What You Will Do:
Design, build and evolve our production cloud infrastructure, making strategic use of automation and infrastructure-as-code (IaC).
Use your depth of cloud experience to put together elegant designs that the team will enjoy supporting.
Apply expertise in incident and problem management, including timely problem identification, successful resolution, and root-cause analysis.
Design, build and evolve our code pipelines and automation to enable agile software development, using self-service where possible.
Partner with the Site Reliability Engineering (SRE) team to improve the efficiency and effectiveness of our monitoring and alerting platform.
Embed and work cross-functionally with SRE, Security and Engineering teams.
Exercise and promote security best practices throughout your workflow.
Who You Are:
You have 4 years of experience working with and troubleshooting Linux operating systems.
You have experience with configuration management tools (Ansible, Chef, Puppet, etc.).
You are passionate about making better software and continuously improving the development, integration, and deployment processes.
You have experience supporting high-traffic, public-facing websites, applications, and services.
You have made heavy use of scripting languages (Bash, Perl, Python or Ruby) in your toolkit.
You possess exceptional written and verbal communication skills and are able to communicate cross-functionally with both technical and non-technical audiences.
You love problem solving and thrive in environments of both high collaboration and strong autonomy.
You have strong working knowledge of containers, container orchestration, and the AWS environment and tools.
You have strong expertise in monitoring tools (AppDynamics, App Insights, Sumo Logic, etc.).
You have strong experience in container orchestration (Kubernetes, Fargate, EKS, OpenShift).