Job: Software Developer/Engineer/Architect

AVP - Site Reliability Engineer (SRE) Innovation Lab - Dublin

The ICG SRE team's primary responsibility is to help application teams improve their reliability through engineering. Instead of dashboards, red tape and manual intervention, we focus on proactive alerting, eliminating toil, and addressing root causes.

Our Site Reliability Engineers are responsible for developing capabilities that allow application teams to improve reliability and accelerate software delivery. We take a holistic approach to SRE but tend to focus on deployment, automated testing, change control, observability, and incident response.

You will work with our colleagues in ICG DevOps and partners in CTI to develop end-to-end capabilities that application teams can easily adopt and adapt to their particular needs. We write many APIs in Java and Node.js, Kubernetes Operators in Go, and Ansible playbooks, and we use them to configure and integrate tools such as Bitbucket, Artifactory, OpenShift, SonarQube, Harness.io, Splunk, Prometheus and Grafana.

You will work directly with application owners and senior engineers across ICG to help implement these capabilities in their applications: defining SLIs and SLOs and building them into their observability stack, holding blame-free post-mortems, and identifying and eliminating toil.

The successful candidate will have a strong background in software development and delivery; a solid understanding of core software engineering principles and the demands of a live operational environment; a relentless desire to improve all aspects of system reliability; and a proven history of delivering it.

This is an excellent opportunity to join a new team, make a big impact across a large organization, and gain exposure to a wide variety of ICG businesses and technology initiatives.

Responsibilities

  • Develop new capabilities, co-ordinating implementation across a large number of teams, including infrastructure, developer tools and information security
  • Prove and iterate capabilities by applying them to real-world applications
  • Work with application teams to advise on the capabilities they should adopt for maximum impact on reliability and software delivery performance
  • Take part in blame-free post-incident reviews and help guide changes that prevent future incidents
  • Adapt existing capabilities to the application teams' specific circumstances, technology stack and business requirements
  • Advise on system architecture and application refactoring efforts
  • Learn from applications' internally developed best practices and help adopt them across the organization as new capabilities

Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.

Qualifications

  • 1-3 years of relevant experience in an SRE, Apps Development or Systems Analysis role
  • Programming experience in at least one of Java, JavaScript or Go
  • Experience in developing, deploying and operating software applications
  • Experience in implementing successful multi-stakeholder projects
  • Experience in maintaining and improving live software products over an extended period
  • Ability to adjust priorities quickly as circumstances dictate
  • Clear and concise written and verbal communication

Education

  • Bachelor’s degree/University degree or equivalent experience
  • Master’s degree preferred