Senior SRE

A leading digital payments provider in Taiwan is seeking a Senior Site Reliability Engineer to join their expanding SRE team. This is a unique opportunity to play a pivotal role in shaping the infrastructure that supports over six million users and processes billions in transactions every month.

As a Senior Site Reliability Engineer, you will immerse yourself in a dynamic environment where your day-to-day activities revolve around ensuring the seamless operation of critical payment infrastructure. You will work hand-in-hand with cross-functional teams to architect resilient systems capable of handling vast amounts of data while maintaining high availability. Your responsibilities will include automating deployments through infrastructure as code practices, optimising monitoring frameworks for proactive issue detection, and refining CI/CD pipelines for efficient software delivery. By leveraging your experience with containerisation technologies and distributed systems design, you will help future-proof the platform against evolving business needs. Your commitment to collaboration will see you engaging regularly with colleagues from various departments to deliver solutions that drive both technical excellence and business value.

Collaborate with engineering and product teams to plan, design, and operate robust solutions that support both online transactions and offline analysis.
Manage databases and shape the architecture of the data platform to ensure reliability, scalability, and sustainability for all stakeholders.
Drive the establishment of next-generation infrastructure by working alongside research and development teams on architectural improvements.
Implement and maintain monitoring stacks using industry-standard tools such as Prometheus, Grafana, Consul, Elasticsearch, and Loki to ensure system health.
Utilise infrastructure as code tools like Puppet, Chef, Ansible, or Terraform to automate deployment processes and enhance operational efficiency.
Design CI/CD workflows that streamline development pipelines while supporting continuous integration and delivery best practices.
Oversee containerisation strategies to optimise resource utilisation and simplify application deployment across environments.
Contribute to improving system maintainability and stability by identifying areas for enhancement and implementing effective solutions.
Support large-scale distributed infrastructure design initiatives to accommodate growing business complexity and transaction volumes.
Engage in regular knowledge-sharing sessions with team members to foster a culture of learning and continuous improvement.

What you bring:

Your proven track record as a Senior Site Reliability Engineer demonstrates your ability to manage complex infrastructures supporting high-transaction environments. You bring deep technical skills in automation using industry-leading tools alongside practical experience in monitoring system health at scale. Your comfort navigating both Linux and Windows environments allows you to troubleshoot effectively across diverse technology stacks. With a strong foundation in scripting languages and containerisation methods, you are adept at streamlining deployment processes while ensuring operational resilience. Your collaborative approach enables you to work seamlessly with multidisciplinary teams on solution planning and implementation. A passion for knowledge sharing rounds out your profile, making you an invaluable contributor to ongoing team development.

Proficiency in working with Linux and Windows command line interfaces for system administration tasks.
Hands-on experience with monitoring stacks such as Prometheus, Grafana, Consul, Elasticsearch, or Loki for comprehensive observability.
Demonstrated expertise in using infrastructure as code tools including Puppet, Chef, Ansible, or Terraform to automate complex environments.
Solid background in containerisation technologies for efficient application deployment across diverse platforms.
Experience designing CI/CD flows that enable smooth integration and delivery cycles within engineering teams.
Practical knowledge of shell scripting along with proficiency in at least one additional programming language such as Python, Ruby, or Golang.
Foundational understanding of networking concepts essential for troubleshooting connectivity issues within distributed systems.
Basic familiarity with database management principles relevant to supporting transactional workloads at scale.
Exposure to large-scale distributed infrastructure design is highly desirable for addressing business growth challenges.
Experience with Kubernetes orchestration or cloud platforms like AWS, GCP, or Azure would be advantageous but not mandatory.
