Site Reliability Engineer, Connectivity
ELECTROLUX GLOBAL CONNECTIVITY & TECH PRESENTS
Make ideas come to life. Globally.
For us, going to work every day has an even greater purpose than putting the latest product or technology on the market. It’s about improving the everyday lives of millions. By staying humble and open to new ideas, we can push the boundaries of taste, care and wellbeing at home. But to keep doing so, we need more people who want to innovate and re-imagine what life at home can be.
Site Reliability Engineer, Connectivity – Stockholm (Sweden) or St. Petersburg (Russia).
Site Reliability Engineers (SREs) are people who use engineering-based approaches to solve operations problems. SRE owns and develops the infrastructure needed for the Electrolux Connectivity Platform and supporting services. SRE is also responsible for making sure the services – both internal and external systems – have the characteristics and qualities needed for their intended use.
You will work to understand the operational requirements and develop an infrastructure architecture and tools that meet these requirements. You will monitor the performance of the system and refine the management of the infrastructure from both a performance and cost perspective so that it is optimal and balanced at all times.
You will also work closely with our DevOps teams, empowering them to deliver efficiently with excellent tools that you develop. These might include, for example, monitoring tools and infrastructure pipeline components.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement
• Support services prior to production through activities like system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
• Contribute improvements to the availability, scalability, latency, and efficiency of the services once they are live
• Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity
• Practice sustainable incident response and blameless postmortems
• Contribute to our deployment and automation tools
• Promote Site Reliability Engineering best practices
• Be part of our on-call rotation with other engineers around the world
• BS or MS in Computer Science or a related technical field
• 3+ years of experience in infrastructure engineering in a large-scale production service environment
• 3+ years of experience analyzing and troubleshooting distributed systems using logging, distributed tracing, stack traces and metrics
• Automation skills and a desire to automate everything
• Comfortable with at least one of the following languages: Java, Python, Go, and able to learn a new language quickly
• Systematic problem-solving approach with a strong sense of ownership
• Good communication skills
• You are a Software Engineer
• A good understanding of large-scale distributed systems
• Experience working with Public Cloud (AWS, Azure or GCP)
• Experience working with container orchestration, e.g. Kubernetes
• Experience in monitoring and metrics systems, e.g. Prometheus, Grafana
• A good knowledge of Site Reliability Engineering principles
• Experience with on-call rotation, incident response and blameless postmortems
• CI/CD automation experience
• A great team player
• Fluency in English