Site Reliability Engineer (Hybrid)
A leading digital insurance platform is seeking a Site Reliability Engineer to join their Taipei team, offering you the chance to play a pivotal role in building and maintaining reliable, distributed systems that underpin essential business operations.
What you'll do:
As a Site Reliability Engineer based in Taipei, you will be responsible for keeping mission-critical systems robust, scalable, and efficient. Day to day, you will work closely with engineering colleagues to implement monitoring solutions using tools such as Prometheus, Grafana, and the ELK stack. You will take ownership of automating deployment pipelines via GitLab CI/CD while also optimising infrastructure through platforms such as Ansible, Terraform, Docker, and Kubernetes. By defining clear metrics for system health and proactively addressing potential bottlenecks or vulnerabilities, especially those related to network security, you will help maintain seamless service delivery for users across Hong Kong. Your ability to troubleshoot complex issues methodically will be crucial in sustaining high levels of reliability. Success in this role requires not only technical proficiency but also strong interpersonal skills; you will regularly share insights with peers during capacity planning sessions and SDLC reviews. Ultimately, your efforts will empower the organisation’s digital insurance platform to deliver dependable services that customers trust.
- Implement and continuously improve system reliability, availability, scalability, performance, and efficiency by leveraging advanced monitoring, alerting, and automation tools on public cloud platforms such as Azure and GCP.
- Participate actively in capacity planning sessions, analyse software performance metrics, and fine-tune infrastructure components to ensure optimal operation under varying workloads.
- Develop and enhance GitLab CI/CD processes and toolsets to streamline software delivery pipelines and automate deployment workflows for greater consistency and speed.
- Define key metrics for system health monitoring, establish robust alerting mechanisms, and proactively address potential issues before they impact end users or business operations.
- Collaborate closely with engineering teams throughout every stage of the software development life cycle (SDLC) to embed reliability best practices into all aspects of product delivery.
- Troubleshoot complex infrastructure challenges efficiently by identifying root causes quickly and implementing sustainable solutions that prevent recurrence.
- Optimise existing infrastructure through automation of repetitive tasks using platforms like Ansible and Terraform to increase operational effectiveness.
- Enhance orchestration capabilities by integrating Kubernetes for container management while ensuring seamless deployment of microservices architectures.
- Maintain high standards of security by applying in-depth network concepts during system design reviews and ongoing operations.
- Utilise version control systems such as Git to manage source code changes collaboratively with other engineers.
What you bring:
To excel as a Site Reliability Engineer in this environment, your background should demonstrate proven experience with scripting languages that facilitate automation across complex infrastructures. Your familiarity with industry-standard monitoring tools means you can identify trends or anomalies before they escalate into larger problems. A solid grasp of cloud computing principles, especially within Azure or GCP ecosystems, will enable you to design solutions that are both scalable and secure. Your exposure to the full SDLC ensures that reliability considerations are integrated from initial planning through final deployment. Understanding network protocols from both an operational efficiency standpoint and a security perspective is essential for protecting customer data. Experience building automated CI/CD pipelines using GitLab CI will allow you to accelerate release cycles without sacrificing quality. Additionally, hands-on work with configuration management tools like Ansible or Terraform demonstrates your commitment to reducing repetitive manual tasks. Familiarity with Kubernetes orchestration further enhances your ability to manage containers effectively within distributed systems. Finally, your use of Git for version control highlights your appreciation for collaborative coding practices, a vital attribute when working within supportive engineering teams.
- Proficiency in programming languages such as Bash, Python, or Go enables you to automate tasks efficiently while supporting diverse development needs.
- Advanced knowledge of monitoring solutions including Prometheus, Grafana, and ELK (Elasticsearch, Logstash, Kibana) allows you to track system health accurately and respond swiftly to incidents.
- Expertise in cloud technologies—particularly Azure and GCP—ensures you can architect resilient infrastructures tailored for modern distributed applications.
- Experience across the complete software development life cycle (SDLC) equips you with insight into how reliability can be embedded at every stage of product delivery.
- In-depth understanding of network concepts with a focus on security empowers you to safeguard sensitive data while maintaining high availability.
- Hands-on experience implementing CI/CD processes using tools like GitLab CI streamlines software releases for greater consistency.
- Proficiency with automation platforms such as Ansible or Terraform helps reduce manual intervention while increasing operational efficiency.
- Knowledge of orchestration tools like Kubernetes supports effective container management within microservices environments.
- Familiarity with container technologies including Docker ensures smooth application deployment across various environments.
- Experience managing source code using Git fosters collaborative development practices among engineering teams.
About the job
Contract Type: Perm
Specialism: IT & Digital Transformation
Focus: Infra/Network/System
Industry: IT
Salary: Negotiable
Workplace Type: Hybrid
Experience Level: Associate
Location: Taipei
Employment Type: Full-time
Job Reference: HQ5NH7-2CB02463
Date posted: 5 March 2026
Consultant: Amy Lin