Software Developer / Engineer / Architect

Lead Site Reliability Engineer

The role of the Lead Cloud Applications Engineer is to build solutions that enhance the availability, performance, and stability of OpenText services, and to automate away repetitive work. You'll also respond to pings, pages, and alerts, investigating product issues that you can really sink your teeth into. You'll work across non-production and production environments on monitoring, data collection, and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives, and platform automation. Your mission will be to use cutting-edge technology to monitor and maintain the day-to-day operations of the entire production infrastructure for OpenText Discovery on our AWS platform.

You are great at:

  • Responding to incidents in accordance with Service Level Agreements.
  • Taking ownership of, and accountability for, the incident resolution process.
  • Providing high-quality, timely responses.
  • Acting as a technical liaison with other teams to evaluate and report bugs.
  • Establishing and maintaining good relationships with team members, Product Development, Customer Service and Sales.
  • Participating in training and information-sharing activities.
  • Acting as backup for other team members when necessary.
  • Hands-on experience with cloud infrastructure; AWS and GCP a plus
  • Experience with Ansible and Kubernetes
  • Experience installing and configuring Apache and Tomcat
  • Deep expertise in monitoring distributed systems and application architectures
  • Exposure to, and maintenance of, configuration management tools at scale
  • Diagnosing and troubleshooting user-facing service incidents and outages
  • Exposure to system- and application-level telemetry for large distributed cloud architectures
  • Diagnosing and resolving problems in high-throughput web applications and network services
  • Expert-level troubleshooting skills across different levels of the solution stack
  • Ability to lead, drive and implement highly scalable and complex solutions
  • A strong understanding of security best practices
  • Experience with container technologies such as Docker and with microservices architectures
  • Knowledge of application clustering and load balancing concepts and technologies
  • Understanding of network topologies and common network protocols and services (DNS, HTTP(S), SSH, FTP, SMTP, DHCP, TCP, IP, etc.)
  • Experience monitoring cloud services with Dynatrace, New Relic, Icinga, Nagios, BMC, or any HPE tools
  • Experience migrating existing on-premises applications and services to AWS
  • Awareness and insight into industry trends (technology, methods and tooling)