Technical Lead/ Manager

Cloud SRE Manager

Groupon’s mission is to become the daily habit in local commerce and fulfil our purpose of building strong communities through thriving small businesses by connecting people to a vibrant, global marketplace for local services, experiences and goods. In the process, we’re positively impacting the lives of millions of customers and merchants globally. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation and celebrates success. 

We're a "best of both worlds" kind of company. We're big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact. We're curious, fun, and love helping local businesses thrive. We believe that our differences as individuals make our team stronger. We value those who look through a different lens, and appreciate other worldviews.

The Groupon Cloud Platform Compute group is looking to hire a Site Reliability Engineering Manager with experience in providing Cloud Platform support to development teams.

You will grow, develop and lead a global team of 8+ site reliability engineers, and promote and champion the SRE charter across the engineering organization. Your function is a blend of leadership and hands-on engineering.

Groupon operates at hyperscale, with 500+ microservices and millions of users and transactions a day. We are migrating this platform to Kubernetes on AWS, which presents new and interesting challenges. The Cloud SRE team is integral to the success of this migration and to the platform's continued operation in its new home in the cloud. We are looking for an Engineering Manager who can be part of this.

You’ll spend time on the following:

  • Be a force multiplier for our ongoing datacenter to cloud migration. 
  • Motivate and develop the careers of our remote, global Cloud SRE engineers.
  • Engage with engineering teams to triage outages and carry forward action items to improve reliability.
  • Identify process gaps and drive improvements to operational reliability, including reducing mean time to detect (MTTD) and mean time to recover (MTTR).
  • Gain a deep understanding of our core business and focus on solutions that form better architecture, not just better code.
  • Drive initiatives across engineering teams with a focus on increasing site performance and stability.
  • Work closely with other engineering managers within your group and across the product and engineering organization to identify reliability problems and build effective technical solutions.
  • Advocate for and improve upon our tech culture principles.
  • Drive standardization efforts across services, infrastructure, systems and practices.

We value people who are:

Customer-focused: We believe that doing what’s right for the customer is ultimately what will drive our business forward.

Concerned with quality: You have standards, do things the right way, and avoid repetition.

Team players: You believe that more can be achieved together. You listen to feedback and also provide supportive feedback to help others grow/improve.

Mindful: You maintain a healthy work-life balance and encourage others to do the same.

Pragmatic: We do things quickly to learn what our customers desire. You know when it’s appropriate to take shortcuts that don’t sacrifice quality or maintainability.

Owners: Engineers at Groupon know how to positively impact the business.

We’re excited about you if you have:

  • 10+ years of experience in Infrastructure Platform Operations.
  • Experience in team development and growth: coaching and mentoring others.
  • Excellent troubleshooting and problem-solving skills, with a calm, level-headed approach.
  • Experience with Kubernetes and service mesh in the cloud.
  • Experience with service-oriented architectures/microservices.
  • Ability to work collaboratively through an agile development process that promotes constant team communication.
  • Experience with influencing and implementing best practices for engineering development in large organizations.
  • Coding skills in one or more languages: Go, Java, Python or Ruby.
  • A view of SRE as an engineering discipline, and the ability to drive the charter forward.