Are you an experienced Site Reliability Engineer with a strong background in software engineering and a passion for observability?
Join my client, a leading Sydney-based financial services institution, where your expertise will drive the reliability, scalability, and performance of critical systems.
Role Overview: As a Senior Site Reliability Engineer, you'll play a pivotal role in enhancing the observability of my client's infrastructure and applications.
You'll work closely with engineering and operations teams to implement and refine monitoring, logging, and alerting solutions, ensuring seamless service delivery and optimal system performance.
Key Responsibilities:
Design, build, and maintain robust observability solutions on GCP.
Collaborate with cross-functional teams to set SLOs, SLIs, and error budgets that align with business objectives.
Implement monitoring, logging, and tracing strategies to proactively detect and resolve issues.
Build automation and continuous improvement solutions for infrastructure reliability.
Develop software engineering solutions for scalable, high-performance systems, leveraging your coding skills.
Conduct root cause analysis and post-mortems to drive service improvements.

Qualifications and Skills:
Strong software engineering background with proficiency in languages such as Python, Java, or Go.
Hands-on experience with Google Cloud Platform (GCP), including tools like Stackdriver, BigQuery, and Pub/Sub.
Expertise in observability practices and tooling (e.g., Prometheus, Grafana, ELK).
Familiarity with CI/CD, containerisation (Kubernetes), and Infrastructure as Code (e.g., Terraform).
Proven ability to work in a collaborative team environment with effective communication skills.

Why Join?
Work with a dynamic, tech-forward financial services institution.
Opportunity to shape and influence the organization's observability practices.
Competitive salary package with flexible working arrangements.