Software Developer/ Engineer/ Architect

Cloud Site Reliability Engineer

You became an engineer because you believed in technology’s ability to make a difference in the world. So why would you spend your days building things that don’t matter? At Groupon, we spend our days developing tools, platforms, and experiences that help small businesses thrive in their local communities. We may look like an ordinary e-commerce app, but under the surface we’re using cutting-edge technology to build products that regularly improve the lives of 48M people and 100,000 merchants.

Of course, local merchants aren’t the only ones who will benefit from your work. You will too, as will the engineering teams that become our customers. We are looking for great software engineers who are excited to help us build out Groupon’s Site Reliability Engineering teams. We use technologies like Kubernetes and AWS EKS. We build our automation and tooling using common languages like Go, Python, Ruby, and shell. We need our engineers to have a passion for growth and learning, to be excited to use these technologies and tools, and to be ready to develop and evolve the techniques and procedures that will ensure site reliability.

We want you to be part of the team that delivers the next-generation site reliability platform, automation, and toolset for Groupon Engineering. We think you’ll agree that it’s an exciting challenge and a really great team to be part of.

We provide Groupon’s microservice engineering teams with a solid foundation of tools and practices in the areas of reliability, monitoring, alerting, and automation. One measure of success is that engineering teams can focus more on delivering new features than on thinking about how to get those features into production. Another measure of success in this role is a reduction in the time teams spend bringing systems back online after issues and getting to root causes, which allows them to focus on higher-level tasks around site optimization and new features.

You will:

  • Identify process gaps and implement process improvements to increase operational reliability
  • Drive standardization efforts across services, infrastructure, systems, and practices
  • Improve operational efficiency through automation
  • Develop effective alerts and tooling to quickly identify and address reliability risks
  • Design and develop Groupon’s dev tools to support continuous delivery in a high-scale microservices environment in the cloud
  • Engage with engineering teams to triage outages and carry forward action items to improve reliability
  • Participate in the on-call rotation to support other teams’ first responders
  • Write high-quality code that follows SOLID principles and includes unit and integration tests. The languages we like to use are Go, Python, Java, Ruby, and Bash.
  • Promote and foster an open source/inner source culture at Groupon

We're excited about you if you have:

  • Good knowledge of, or strong interest in, at least two of the following: Kubernetes, AWS, PaaS, IaaS
  • Working knowledge of monitoring, metrics, and alerting tooling
  • A keen interest in tool and platform development for dev teams
  • Good knowledge of toolchains such as Jenkins, Ansible, Maven, Chef, Salt, Puppet, ELK, etc.
  • Excellent programming skills in one of the following: Go, Python, Java, Ruby
  • Deep understanding of Agile and Continuous Delivery concepts and tools
  • Knowledge of the Linux command line and system-level analysis
  • Strong knowledge of all aspects of the Software Development Life Cycle (SDLC)

Preferred:

  • AWS Certified Solutions Architect Associate or AWS Certified DevOps Engineer Associate
  • Experience managing production Kubernetes clusters
  • 3+ years of relevant experience, including 2+ years as a Site Reliability Engineer