Staff Site Reliability Engineer
Current
- Design, write and build tools to improve the reliability, latency, availability and scalability of Enterprise products using Kubernetes.
- Expertise in cloud infrastructure management with Python scripting.
- Providing architectural and practical guidance to software developers to improve resiliency, efficiency, performance and cost, and helping the team adopt the engineering practices and tooling needed for right-sizing.
- Expertise in observability and monitoring of applications, services and networks at scale.
- Implementing Kubernetes cluster and application container deployments using EKS on Linux, with applications running on MySQL, MariaDB and PostgreSQL.
- Monitoring and supporting the IT infrastructure environment, diagnosing systems for optimal performance, and adhering to SLOs.