Lead Site Reliability Engineer
Company: JPMorganChase
Location: Columbus
Posted on: April 2, 2026
Job Description:
There’s nothing more exciting than
being at the center of a rapidly growing field in technology and
applying your skillsets to drive innovation and modernize the
world's most complex and mission-critical systems. As a Lead Site
Reliability Engineer at JPMorgan Chase within the
Enterprise Technology, Engineering Services and Platform team, you
will independently solve complex and broad business problems with
simple and straightforward solutions. Through code and cloud
infrastructure, you will configure, maintain, monitor, support, and
optimize applications and their associated infrastructure, and
independently decompose and iteratively improve existing
solutions. You are a significant contributor to your team by
sharing your knowledge of end-to-end operations, availability,
reliability, and scalability of your application or platform.

Job responsibilities:
- Guides and assists others in building appropriate-level designs and
  gaining consensus from peers where appropriate
- Collaborates with other software engineers and teams to design and
  implement deployment approaches using automated continuous
  integration and continuous delivery pipelines
- Collaborates with other software engineers and teams to design,
  develop, test, and implement availability, reliability, and
  scalability solutions in their applications
- Implements infrastructure, configuration, and network as code for
  the applications and platforms in your remit
- Collaborates with technical experts, key stakeholders, and team
  members to resolve complex problems
- Understands service level indicators and utilizes service level
  objectives to proactively resolve issues before they impact
  customers
- Leads the adoption of site reliability engineering best practices
  within your team
- Provides 24x7 production support for business-critical
  applications, including participation in the rotational on-call
  support rota

Required qualifications, capabilities, and skills:
- Formal training or certification in software engineering concepts
  with 10 years of applied experience
- Proficient in site reliability culture and principles, and familiar
  with how to implement site reliability within an application or
  platform
- Proficient in at least one programming language such as Python,
  Java/Spring Boot, or shell scripting
- Proficiency in software engineering and technical processes within
  a given technology discipline (e.g., public cloud, artificial
  intelligence)
- Experience in observability such as white and black box monitoring,
  service level objective alerting, and telemetry collection using
  tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and
  others
- Experience with continuous integration and continuous delivery
  tools like Jenkins, Spinnaker, or Terraform, and configuration
  management tools like SaltStack or Ansible
- Experience in managing, administering, and supporting large-scale,
  enterprise-level Splunk and ELK deployments that provide
  application monitoring and observability for a large number of
  applications
- Experience in managing, administering, and supporting vendor
  products such as Netcool, Grafana, and SCOM
- Familiarity with containers and container orchestration
  technologies such as ECS, Kubernetes, and Docker
- Experience with troubleshooting performance issues and common
  networking technologies and issues
- Ability to contribute to large and collaborative teams by
  presenting information in a logical and timely manner with
  compelling language and limited supervision
- Ability to proactively recognize roadblocks and demonstrated
  interest in learning technology that facilitates innovation
- Experience with large-scale, enterprise-level event streaming
  platforms like Kafka
- Experience in handling critical incident and change management,
  including participating in critical incident task force calls
- Familiarity with agile practices, preferably Scrum and Kanban

Preferred qualifications, capabilities, and skills:
- Ability to identify new technologies and
  relevant solutions to ensure design constraints are met by the
  software team
- Proven track record of initiating and executing ideas that address
  complex business challenges
- Networking and systems:
  - Deep understanding of TCP/IP, DNS, load balancing, firewalls, and
    VPN technologies
  - Proficient in tuning Linux performance and troubleshooting
    system-level issues
- Collaborative leadership:
  - Proven track record of mentoring junior engineers and promoting
    SRE best-practice adoption across teams
  - Strong written and verbal communication skills; comfortable
    presenting to technical and non-technical stakeholders
- Certifications (a plus):
  - AWS Certified SysOps Administrator or Professional, Certified
    Kubernetes Administrator (CKA), Terraform Associate level, or
    equivalent

Keywords: JPMorganChase, Kettering, Lead Site Reliability Engineer, IT / Software / Systems, Columbus, Ohio