The ICG SRE team's primary responsibility is to help application teams improve their reliability through engineering. Instead of dashboards, red tape, and manual intervention, we focus on proactive alerting, eliminating toil, and addressing root causes.
Our Site Reliability Engineers are responsible for developing capabilities that allow application teams to improve reliability and accelerate software delivery. We take a holistic approach to SRE but tend to focus on deployment, automated testing, change control, observability, and incident response.
You will work with our colleagues in ICG DevOps and partners in CTI to develop end-to-end capabilities that application teams can easily adopt and adapt to their particular needs. We write a lot of APIs in Java and Node.js, Kubernetes Operators in Go, and Ansible playbooks, and we use them to configure and integrate tools such as Bitbucket, Artifactory, OpenShift, SonarQube, Harness.io, Splunk, Prometheus, and Grafana.
You will work directly with application owners and senior engineers across ICG to help implement these capabilities in their applications; define SLIs and SLOs and build them into their observability stack; hold blame-free post-mortems; and identify and eliminate toil.
The successful candidate will have a strong background in software development and delivery; a solid understanding of core software engineering principles and the demands of a live operational environment; a relentless desire to improve all aspects of system reliability; and a proven history of doing so.
This is an excellent opportunity to join a new team, make a big impact across a large organization, and gain exposure to a wide variety of ICG businesses and technology initiatives.
Responsibilities
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.
Qualifications
Education