Citi’s technology team is growing at lightning speed, and we’re looking for talented technologists to help build the future of global banking. Our teams are creating innovations used across the globe – we’re changing the way people bank and how the world does business. Citi’s technology team supports business operations in 100+ countries, across multiple lines of business spanning both institutional and retail businesses. The group works to optimize the IT environment by standardizing production platforms, reducing complexity, and introducing innovative solutions that provide new business capabilities, reduce total cost of ownership, and create a competitive advantage for Citi. Join an environment with a laser focus on growth and progress, and take your career to the next level through the power of Citi’s unmatched globality and vast expertise.
Responsibilities
The ICG SRE team's primary responsibility is to help application teams improve their reliability through engineering. Instead of dashboards, red tape and manual intervention, we focus on proactive alerting, eliminating toil, and addressing root causes.
Our Site Reliability Engineers are responsible for developing capabilities that allow application teams to improve reliability and accelerate software delivery. We take a holistic approach to SRE but tend to focus on deployment, automated testing, change control, observability, and incident response.
You will work with our colleagues in ICG DevOps and partners in CTI to develop end-to-end capabilities that application teams can easily adopt and adapt to their particular needs. We write a lot of APIs in Java and Node.js, Kubernetes Operators in Go, and Ansible playbooks, and we use them to configure and integrate tools such as Bitbucket, Artifactory, OpenShift, SonarQube, Harness.io, Splunk, Prometheus and Grafana.
You will work directly with application owners and senior engineers across ICG to help implement these capabilities in their applications; define SLIs and SLOs and build them into their observability stack; hold blame-free post-mortems; identify toil and eliminate it.
The successful candidate will have a strong background in software development and delivery; a solid understanding of core software engineering principles and the demands of a live operational environment; a relentless desire to improve all aspects of system reliability; and a proven history of delivering it.
This is an excellent opportunity to join a new team, make a big impact across a large organization, and gain exposure to a wide variety of ICG businesses and technology initiatives.
Responsibilities
Develop new capabilities, co-ordinating implementation across a large number of teams including infrastructure, developer tools and information security
Prove and iterate capabilities by applying them to real-world applications
Work with application teams to advise on the capabilities they should adopt for maximum impact on reliability and software delivery performance
Take part in blame-free post-mortems after incidents and help guide changes to avoid future incidents
Adapt existing capabilities to the application teams' specific circumstances, technology stack and business requirements
Advise on system architecture and application refactoring efforts
Learn from applications' internally developed best practices and help adopt them across the organisation as new capabilities
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.
Qualifications
Education