Site Reliability Engineer

Site Reliability Engineer – Long Term Project – Remote

Genuent is hiring a Site Reliability Engineer for a long-term project with a premier client based in Los Angeles, CA. This is a 100% remote opportunity for candidates who are based in the US and can work in the PST time zone. If you’re a match, please apply.

As an SRE, you will use your software, systems engineering, and operations background to build and run large-scale, fault-tolerant systems. Your role is to ensure the reliability, scalability, and maximum uptime of the Cloud Platform.

Technology Innovation Division

Working in the Technology Innovation group, you will drive, develop, and maintain solutions for clients and colleagues. This is an exciting time of technology advancement and innovation across the bank, particularly within our technology teams.

Responsibilities

Design, develop, and implement solutions that improve the stability, security, scalability, and availability of CNB’s software platforms.
Design mechanisms for alerts and responses to identify and address reliability risks.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, planning, and reviews.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Design and run performance, capacity, and monitoring tests.
Create educational material such as cloud native sample apps and starter code, and contribute to cloud native educational events such as hackathons and live coding sessions.
Create educational documentation on how-tos and best practices, and blog about use cases and architectures that relate to cloud platforms.
Liaise with the team managing our public cloud environments, including setup, management, and troubleshooting.

Qualifications

5+ years of experience in an operational role, DevOps, SRE, or software engineering.
5+ years of development experience in any of Java, NodeJS, .NET Core, or Python.
3+ years of experience with development or administration on any cloud platform (Cloud Foundry, Heroku, AWS, Azure, Google Cloud, IBM Cloud, Bluemix, Kubernetes, and others). The ideal candidate has significant experience with a Platform as a Service cloud such as Cloud Foundry.
Expertise in Prometheus (client library and application instrumentation, PromQL), Grafana (GraphQL, Metadata, Dashboard Skills), Dynatrace, Kubernetes, and PagerDuty, with an ITIL background.

Additional Skills and Knowledge

5+ years of experience developing applications with an active user base, deploying to production, and working through a change management process. The ideal candidate can discuss their change management process in detail, including its happy and pain points.
Experience with Prometheus, Grafana, Splunk, Elasticsearch, and Kibana.
Experience with monitoring tools such as Datadog and Dynatrace.
Experience with automating manual processes and tests.
Creativity, energy, and passion for leveraging technology to transform our industry.
The belief that automation is the only way.
A good understanding of modern, cloud-centric architectures and DevOps principles.
Experience with the operational aspects of software systems, such as monitoring, centralized logging, and alerting.
Experience providing standardized offerings that facilitate and ensure the operational health of stacks throughout their lifecycle, including metrics collection, aggregation, and visualization, as well as inventory, capacity, and billing/tag management.
Above-average performance. You are competitive and passionate. You thrive on challenge and have a proven ability to set ambitious but achievable goals and surpass them.

