Site Reliability Engineer – Storage

Location: Sacramento, California, United States

THE ROLE:

We are a quickly maturing startup seeking a like-minded Site Reliability Engineer. The technical team is a small, talented, close-knit group, and we need development and systems help to keep business and development operations flowing smoothly.

As a well-rounded site reliability engineer, you should be the type who appreciates variety in your day and challenges outside your comfort zone.

WHAT YOU'LL BE DOING:

  • Managing and automating the care of Linux systems and a large number of disks at scale.
  • Extending the server configuration management systems with new features using Salt.
  • Refactoring existing system management in Ansible as needed, or migrating it to Salt.
  • Working autonomously, or with the software engineering team, to troubleshoot and solve complex or unintuitive system issues.
  • Working with the software engineers to achieve 100% self-service automation of build pipelines.

WHAT YOU BRING:

You are a well-rounded systems engineer and scripter with a diverse set of skills, which makes you one of the best people to troubleshoot, monitor the platform, and stay on top of releases.

  • Experience working across multiple time zones using remote communication and collaboration tools such as Slack and Zoom
  • Experience with Git in a multi-contributor/team environment
  • High degree of drive to improve and automate your environment with minimal guidance
  • Ability to solve immediate problems while planning for future ones
  • Experience in automating tasks through scripting. You should be able to use Python and be familiar with a variety of packages.
  • Extensive experience administering a variety of Linux distributions
  • Extensive experience with Ansible, Salt, and Terraform
  • Experience with bare metal hardware including physical servers, JBODs, physical cabling, and networking equipment.
  • Experience with ZFS, XFS, GPFS, Ceph, or other local and distributed file systems
  • Solid understanding of web protocols and delivery technologies such as HTTP, HTTP/2, TLS, Server-Sent Events, and CDNs
  • Solid understanding of nginx and SSL/TLS

Preferred Experience

  • Experience with Grafana
  • Experience managing Cassandra installations
  • Experience with PXE-based deployments
  • Experience with a message queue system like RabbitMQ or Kafka
  • Experience with build pipelines, integration testing, Jenkins, and GitHub Actions

Requirements

  • You can be located anywhere in the world, but we keep a balanced distribution across time zones. Currently this role is open only to those who can work standard North American working hours (workday starting between UTC-5 and UTC-8).
