Lead Site Reliability Engineer

OpenText
Cork, Ireland
August 24, 2021

The role of the Lead Cloud Applications Engineer is to build solutions that enhance the availability, performance, and stability of OpenText services, and to automate away repetitive work. You'll also respond to pings, pages and alerts, investigating issues in our products that you can really sink your teeth into. You'll work in both non-production and production environments on monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation. Your mission will be to use cutting-edge technology to monitor and maintain the day-to-day operations of the entire production infrastructure for OpenText Discovery on our AWS platform.

You are great at:

Responding to incidents in line with Service Level Agreements.
Taking ownership of, and accountability for, the incident resolution process.
Providing high-quality, timely responses.
Acting as a technical liaison with other teams to evaluate and report bugs.
Establishing and maintaining good relationships with team members, Product Development, Customer Service and Sales.
Participating in training and information-sharing activities.
Acting as backup for other team members when necessary.

Hands-on experience with cloud infrastructure; AWS and GCP a plus
Experience with Ansible and Kubernetes
Experience installing and configuring Apache and Tomcat
Deep expertise in monitoring distributed-system application architectures
Exposure to & maintenance of configuration management tools at scale
Diagnosing and troubleshooting user-facing service incidents and outages
Exposure to system- and application-level telemetry for large distributed cloud architectures
Diagnosing and resolving problems in high-throughput web applications and network services
Expert-level troubleshooting skills across different levels of the solution stack
Ability to lead, drive and implement highly scalable and complex solutions
A strong understanding of security best practices
Experience with microservices architectures and container management tools such as Docker
Application clustering / load balancing concepts and technologies
Understanding of network topologies and common network protocols and services (DNS, HTTP(S), SSH, FTP, SMTP, DHCP, TCP, IP, etc.)
Experience monitoring cloud services with Dynatrace, New Relic, Icinga, Nagios, BMC or any HPE tools
Experience migrating existing on-premises applications and services to AWS
Awareness and insight into industry trends (technology, methods and tooling)
