SRE Lead
A fast-growing, VC-backed fintech firm in Taiwan seeks a Site Reliability Engineering Lead to build and scale reliable global systems for its blockchain-focused platform.
Key responsibilities:
As Site Reliability Engineering Lead, you will oversee AWS/Kubernetes reliability, automation, and incident management while driving SRE best practices.
- Lead and manage the SRE team to ensure high availability, scalability, and reliability of production systems
- Own AWS cloud infrastructure operations including monitoring, security, resource management, and cost optimisation in a 24/7 environment
- Lead incident management, troubleshooting, RCA, post-incident reviews, and continuous service improvements
- Ensure compliance with security, audit, and regulatory standards (e.g., MAS TRM, ISO 27001) across infrastructure and operations
- Drive SRE best practices including observability, alerting, SLA/SLO/SLI management, capacity planning, DR, and high availability
- Improve system performance and operational efficiency through automation, CI/CD, IaC, and Kubernetes/EKS optimisation
- Collaborate with cross-functional teams (Backend, Data, Security, Product) while mentoring engineers and strengthening operational maturity
Candidate profile:
To excel in this role, you bring expertise in Linux, AWS, and Kubernetes, with strong experience in automation, CI/CD, observability, and SRE practices.
- 8+ years of Linux system administration and large-scale infrastructure experience, including 2+ years in Tech Lead or team management roles
- Hands-on experience operating high-availability, 24/7 cloud platforms with strong AWS expertise (EC2, VPC, IAM, Lambda, EKS, CloudWatch, etc.)
- Strong Kubernetes and container orchestration experience, including EKS administration, troubleshooting, and scaling
- Experience with Infrastructure as Code and CI/CD pipelines using tools such as Terraform, Helm, Kustomize, Jenkins, GitHub Actions, and ArgoCD
- Strong observability and monitoring expertise using tools like Grafana, ELK, Zabbix, and Nagios
- Experience with distributed systems (e.g., Kafka, MongoDB) and SRE/DevOps practices including incident management, DR, capacity planning, and SLO/SLA design
- Proficient in scripting/programming (Bash, Python, or Golang), with strong knowledge of cloud security and a collaborative approach suited to fast-paced production environments
About the company:
The company is a fintech organisation in the digital asset space that builds secure, scalable financial platforms. It offers a fast-paced, collaborative environment focused on innovation, engineering excellence, and high-impact growth.
Keywords: site reliability engineering, blockchain-focused platform, AWS/Kubernetes, incident management, regulatory compliance, fintech
What’s next:
Learn more and apply today!
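About the job
Job type: Temporary/Contract
Specialism: Technology & Transformation
Job category: IT Infrastructure/Network/Systems
Industry: Financial Services
Salary: Negotiable
Work mode: Hybrid
Experience: Mid-level management
Location: Taipei
Temporary job reference: EGPSMG-0AA0B780
Posted: 1 June 2026
Recruitment consultant: Amy Lin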
Agency: Robert Walters (https://www.robertwalters.com.tw)
Listing valid through: 2026-07-31