
Site Reliability Engineer

Salesforce Ireland
Dublin, Ireland
November 09, 2021

Site Reliability Engineer IT

Salesforce is seeking a Site Reliability Engineer to join the Site Reliability organization in our Dublin location. Working closely with our counterparts in the Infrastructure and R&D organizations, this organization provides a global team of engineers who monitor cloud service availability, support platform operations and resolve incidents. The person in this role will demonstrate a strong focus on tactical operations, as well as large-scale production engineering and orchestration.

The Site Reliability team keeps the Salesforce cloud and our customers protected. Come help us support the services that power the biggest Enterprise Cloud Computing company in the world!

Role Responsibilities (working as part of a shift team, 4 x 10-hour shifts per week):

Maintain continuous system and service availability to deliver top performance for our customers.
Incident management: act in key support roles during major incidents, and participate in the technical review of each incident for problem management.
Work to automate detection and resolution of recurring issues in the production environment.
Operate in a high-pressure environment and troubleshoot complex issues quickly, while successfully handling multiple priorities.
If it breaks, fix it fast, and ensure that any future problems will be automatically resolved.
Analyze performance-related issues and work across the Technology organization to develop solutions that ensure service resiliency.
Automate operations by developing software applications, tooling and API integrations to connect disparate/distributed systems.
Work with and lead other members of the team in staying on top of key industry innovation and technology, and assist in team development and growth.
Participate in the on-call rotation.

Requirements:

Experience in support of distributed systems, with Linux/UNIX administration and internals knowledge
Experience in some of the following programming languages: Python, Java, C++; good knowledge of Bash scripting
Good knowledge of basic large-scale Internet service architectures (DNS, HTTP, Load Balancing, ...)
Experience in container-based architectures (Docker, Kubernetes, etc.)
Strong understanding of monitoring implementations and administration
Experience in a role involving hands-on, complex technical problem solving and troubleshooting
Past experience in incident management and a good understanding of ITIL service operation
Good verbal and written communication skills
Be curious and ask questions

Preferred Qualifications:

BS in Computer Science plus relevant job-related experience
Python certification
Red Hat Certification
AWS/GCP
