RecruiHcare: Remote Cloud Site Reliability Engineer Manager

Company

For more than 30 years, we have been providing industry-specific, cloud-based business management software and services to small and medium-sized businesses.

With divisions focused on manufacturing, wholesale/retail distribution, building and construction, and field service, we integrate into every aspect of a customer's business to help them level the playing field, run day-to-day operations more efficiently, and free them up to focus on what matters most.

It’s how business gets done.

Privately held with more than 16,000 customers, we are headquartered in Fort Worth, Texas, USA, with offices and companies throughout the U.S., Australia, New Zealand, England and the Netherlands.

Overview

Must have software deployment cloud experience. Must have experience managing a small team.

The Cloud Site Reliability Engineer Manager is critical to the success of the company and its customers.

The SRE will bridge the gap between Cloud Ops and the Product/Dev teams, focusing on improving the delivery of our products to customers.

This role will work closely with the product dev teams, participating in weekly sprint planning to provide support and consulting and to advocate for the improvements needed to provide a world-class hosting experience.

The SRE will be responsible for building systems and tooling that enable and empower the dev teams to work more efficiently while keeping a cloud-first mentality.

This is an internal product-facing role that will work and collaborate closely with development teams, support teams, architects and peer engineers for planning, development, and implementation of solutions for various systems.

Responsibilities

- Drive Thought Leadership into the team driving towards the SRE Mission Statement
- Manage the Team’s weekly project work through status updates against deadlines
- Drive Monthly reports from each SRE for their BU’s uptime and reliability metrics
- Handle HR duties, Review and approve Time-Off, Coordinate Schedules, Perform Annual Reviews
- Design processes for improving operational stability of the Cloud
- Identify, document and help improve performance and operational efficiency challenges
- Create tooling with documentation to scale the Cloud
- Validate and enforce best application security practices
- Participate in incident management on-call rotation and drive root cause analysis
- Collaborate with engineering teams, product owners and other stakeholders to develop tooling and CI/CD procedures
- Continual development of monitoring tools and best practices
- Help drive capacity requirements and planning
- Ability to function in a Dev Ops atmosphere
- Support and manage cloud infrastructure and environments (AWS, Azure, IBM, Private Cloud)
- Ability to drive standardization across the team by building repeatable processes to ensure Cloud Stability
- Complies with security standards and technical design
- Complies with ITSM standards and practices

Qualities and Skills Required

- Bachelor’s Degree in Computer Science, Engineering, IS
- 5+ years’ experience in a 24×7 high-availability production Cloud environment
- Configuration management and automation tools such as Ansible, Terraform, vRA, etc.
- Experience with CI/CD tools and implementing best practices
- Strongly prefer prior experience in Microsoft Windows (Server and Guest OS)
- Experience with virtualization technologies such as VMWare
- Experience with Active Directory, PowerShell, SQL, Microsoft Remote Desktop Services
- Experience with configuring and extending monitoring tools
- A background in automating the management of a data center environment
- Experience with cloud-based IAAS (AWS, IBM Cloud)
- Good understanding of Software Development Lifecycle
- Excellent analytical and problem-solving skills
- A passion for system stability, performance, scalability and customer success
- Ability to work with minimal supervision, making decisions based upon priorities, schedules and an understanding of business initiatives
- Strong interpersonal and team building skills
- The desire to take advantage of training and learning opportunities

Key qualifications

Skills: Shell Scripting, SQL, Amazon Web Services, VMWare, Month End Close, ITSM, SDLC, Microsoft Windows, Azure, Terraform
Minimum education: Bachelors
Years experience: 5+ years
Schedule details: 5 days/week
