Site Reliability Engineer (SRE)
South Jakarta
Full-time
Posted on 16/05/2025
Job Description
- Analyze Business/Product requirements and propose effective and efficient technical solutions in delivering changes and innovations to the Exchange infrastructure and landscape
- Work with a project focus group (product engineering, product management, architecture, and CTO) to compile a work breakdown structure of tasks for given deliverables and provide realistic estimates for completion or project assignments
- Design, build, maintain and improve Exchange infrastructure and respective tooling. Ensure infrastructure elasticity and automated scalability for cost-efficient resource utilization while maintaining the system’s high availability and fault tolerance
- Collaborate with other Developers, SREs, and QA Engineers to execute full-cycle integration, functional, and regression testing. Own and resolve all priority defects identified within the solution codebase efficiently and in a timely fashion
- Promote software changes safely and responsibly across all environments, from Development through Staging to Production, deploying updates to Production in a zero-downtime manner
- Provide effective infrastructure Level 1 technical support during business hours and, occasionally, off-hours depending on a rotation schedule. Design, build, maintain and improve the respective infrastructure monitoring tooling that is critical for both:
  - momentary, as-is situational awareness and proactive incident response
  - future infrastructure capacity planning activities
- Participate in team exercises to identify and implement areas for continuous improvement, and be proactive in putting your ideas forward
- Educate and mentor your engineering colleagues in the areas of your own expertise and domain knowledge, and be open-minded and approachable
Requirements
- 5+ years of SRE experience, ideally working with one of the major cloud vendors: Amazon Web Services, Google Cloud, Microsoft Azure, etc.
- Experience in designing and implementing AWS and/or GCP setup from scratch
- Experience in architecting, building, deploying, and operating enterprise-ready container solutions on Kubernetes
- Solid experience in setting up and maintaining message broker infrastructure (Kafka, RocketMQ, etc.)
- Experience in setting up a cloud persistence layer (AWS Aurora, GCP BigQuery, etc.)
- Experience implementing a large service mesh with Istio or a comparable solution
- Experience building on-demand, short-lived environments (for debugging, profiling, and load-testing scenarios)
- Experience with operating systems, especially strong knowledge of Linux, and an understanding of network architectures