Do you have a passion for being a Cloud Services leader who gets things done?
As a Principal Site Reliability Engineer, you will be a technical leader within an SRE team to ensure that Dell’s Apex Cloud Service delivers the service performance, reliability and availability expected by our customers. You will help ensure that your SRE teams are continuously improving the management automation of our Apex Cloud Service, with the objective of enabling industrialized fleet management at scale. You will help refine and deliver requirements for service enhancements with Apex product engineering teams and collaborate with key Apex Cloud Services Product Managers and stakeholders.
You will be in a dynamic environment with other motivated, talented individuals who inspire greatness in their teammates. Our unique position as a technology leader ensures that you’ll always be challenged in your work and supported in reaching your most ambitious goals.
We are making huge bets and fast progress to bring Cloud to our customers – wherever they may be! If you want to join a massive transformational effort on the ground floor and are excited by the opportunity to use technology to define new computing paradigms and business opportunities – this role may be just the right opportunity for you!
In This Job, You Will:
Work in a global team to define, measure, and optimize SLIs, SLOs, and error budgets for product offerings
Develop software solutions to automate service delivery using technologies such as Bash, Terraform, and Ansible
Help create, manage, and use CI/CD pipelines to deploy to customer environments
Participate in on-call support and work through all aspects of the Incident Management process, including orchestrating blameless post-mortems and encouraging the practice within the organization
Operate in an ever-evolving landscape of product offerings to deliver first-class service to our customers
Work closely with customer-facing Support Teams to help evolve, train, and empower them to better support our customers directly
Essential Requirements:
B.S. or M.S. in Engineering, Computer Science, or a related technical field, or equivalent work experience
8+ years of software engineering or equivalent experience
Proficiency in one or more scripting languages and experience with Infrastructure as Code (IaC) in a production environment
Experience with DevOps and SRE practices, including creating, using, and evolving SLIs/SLOs with enterprise-class monitoring or observability solutions
Incident Management experience coupled with effective communication skills
Deep, fundamental understanding of software development processes and procedures (e.g., Agile/Kanban, Secure Development Lifecycle)
Desirable Qualifications:
Containerization experience
Experience in deployment and operations of Storage and/or VMware platform solutions