Site Reliability Engineer

Pittsburgh, PA or Phoenix, AZ
Full Time
Information Technology

Job Description

• Collaborate with cross-functional teams to design, implement, and maintain highly available and resilient systems in a financial environment
• Perform maintenance, including installation of patches and upgrades. Coordinate with delivery teams to ensure changes are implemented in a way that maintains a stable production environment.
• Monitor, analyze, and troubleshoot performance issues, ensuring optimal system performance and reliability
• Implement automation tools and practices to streamline operational processes and improve efficiency.
• Participate in incident response and resolution, minimizing downtime and impact on critical financial systems.
• Analyze and identify the root cause of incidents. Gather information and report on design, reliability, and maintenance problems and bugs.
• Work closely with development teams to influence design and architecture decisions for improved reliability and scalability
• Conduct performance testing and capacity planning to proactively address system limitations
• Evaluate and implement cutting-edge technologies to enhance the reliability and efficiency of the organization’s infrastructure
• Collaborate with security teams to implement and maintain robust security measures for financial systems
• Document system configurations, procedures, and best practices to ensure knowledge transfer within the team

Qualifications

• Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent industry experience in lieu of a degree
• 3 years of experience as a System/Site Reliability Engineer or in a similar role in a large corporation
• Open to working all shifts
• Excellent problem-solving skills and the ability to thrive in a fast-paced, high-pressure environment
• Solid understanding of networking, security, and storage principles in a financial context
• Strong understanding of Agile methodologies
• Strong communication skills with the ability to collaborate effectively across teams
• Proficient in operations analytics methodologies to drive performance improvement (e.g., Lean)
• Certifications (e.g., CISSP, CISM, ITSM, Network+) a plus

Candidates must have working knowledge across the following technology support areas:
• Server: Linux and Windows administration and troubleshooting, patching, and basic scripting skills (PowerShell, Bash)
• Virtualization Solutions: Experience in VCE/UCP (including VMware versions 6 and above), platform and network connectivity, and patching; understanding of current threat analysis and remediation trends, alongside PowerShell and Linux scripting skills
• Storage: CIFS/NFS, Linux and Windows scripting, DPA reporting, Avamar and Data Domain administration, and a solid understanding of Windows and Linux environments
• Database: Oracle, SQL, MongoDB
• Middleware: Linux, Windows, WebSphere, Apache, IIS, WebLogic and Tomcat
• Mainframes: JCL, CICS SYSPLEX
• Networking: Strong understanding of network protocols and the OSI model, as well as Network+ certification
• Workflow and Knowledge Management: ServiceNow
• Collaboration Tools: TrueSight, BigPanda, Jira, and Confluence

Additional Details

Experience: 2-5 years