Clearance: Public Trust
Remote (candidate must be local to Atlanta, GA)
Job Description:
We have an immediate opening for a Data Scientist to contribute to public health projects as part of a cross-functional team for the National Center for Immunization and Respiratory Diseases (NCIRD) at the Centers for Disease Control and Prevention (CDC).
The Data Scientist will analyze structured datasets to determine data relationships, model datasets for loading and storage in relational databases, and create data pipelines and ETL processes using the R environment and the Azure cloud stack of resources.
The Data Scientist may contribute to data analysis projects and data visualization projects, working with the team to connect data streams/pipelines to Power BI, Tableau, and R Shiny.
The Data Scientist may also work with the team on data migration projects from on-premises SQL databases to Azure SQL Database or Azure Synapse databases.
This role will work closely and collaboratively with the Leidos Data Science team, CDC staff, and partners.
This position requires an entrepreneurial mindset and strong communication skills to meet with customers and translate their requirements into working data solutions, while operating within the government’s guidelines and mandates.
This role provides the opportunity to work across a number of technologies within a growing Data Science discipline at the CDC to help solve tomorrow’s public health challenges.
Day-to-Day Responsibilities/Duties:
Design, develop, test, and implement fully automated, event-triggered, or scheduled production data pipelines using R programming and/or the Azure stack of tools and services, working collaboratively within a cross-functional team.
Write, test, and implement R code for cleaning, wrangling, manipulating, and transforming data to prepare it for downstream storage processing, such as inserting into or updating databases, and for downstream analytical processing, including statistical computation, visualization, and standardized reporting.
Develop data visualizations for public-facing Power BI dashboards.
Independently meet and communicate clearly with CDC subject-matter experts, CDC technical staff, and fellow NCIRD Data Science Team members to extract project requirements, translate them into technical implementation plans, and develop solutions that meet the requirements.
Create generalized functions and incorporate them into an R package maintained by the team to perform common data tasks.
Attend team planning meetings, backlog refinement, daily stand-ups, and customer demos.
Qualifications:
Bachelor’s Degree in Statistics, Biostatistics, Data Science, Analytics, Mathematics, Computer Science or Computer Engineering, Computer or Management Information Systems, or a similar scientific degree, and 3 years of experience designing, developing, and implementing data pipelines or analytical processes using RStudio and/or MS Azure.
High proficiency in R programming for data cleaning, wrangling, manipulation, and transformation to prepare data for visualization and analysis.
Expertise in writing R functions and in developing and maintaining in-house R packages.
Expertise in R packages, especially dplyr, tidyverse, Shiny, plotly, knitr, and ggplot2.
Skilled in developing dynamic documents, slide decks, and other deliverables using R Markdown.
Skilled in computing various statistics using statistical methods to extract meaning from data.
Experience merging and integrating disparate datasets and outputting to various target destination databases or systems.
Experience using Application Programming Interfaces (APIs) to retrieve data or submit requests to online data systems.
Experience with Power BI and R Shiny dashboards.
Knowledge of and interest in Statistics, Machine Learning, and/or AI techniques.
Ability to multi-task based on project priorities and deliver solutions on time with excellent quality.
Familiarity with GitLab (or GitHub) as a version control code repository as well as a project management tracking system.
Understanding of enterprise data architecture and data quality controls.
Team player who thrives in a dynamic and sometimes fast-paced environment.
Ability to write technical documentation and create system architecture diagrams.
Knowledge of Agile Development methodologies and the Software Development Lifecycle (SDLC).
Plus, but not required:
University coursework in R, Statistical Computing, and/or Database Design and Administration.
Programming experience in Python, SAS, or SUDAAN.
Familiarity with database stored procedures, tables, views, triggers, and queries.
Experience using data management software and utilities such as Stat/Transfer and DBMS/Copy.
Experience with the O365 application Power Automate.
Work experience with Azure Data Factory, SQL Server, or other Azure databases.
Hands-on experience migrating complex data pipelines from on-premises into Azure cloud environments.
Any of the following relevant certifications: Azure Data Engineer Certification, Azure Solution Architect, Microsoft Certified Solutions Associate, Solutions Expert, or Database Administrator.
Experience using MS SQL Server Management Studio to write SQL and T-SQL code for interacting with project databases.
Familiarity with, or willingness to learn, building and configuring data workflows using Azure Data Factory, Azure Stream Analytics, Azure SQL Database/Warehouse, Azure Databricks, Delta Lake, Lakehouse, PySpark, and Scala.
Familiarity with complex sample surveys and analysis.
Familiarity with continuous integration and continuous delivery or continuous deployment (CI/CD) methods and tools.
Experience with LyX, LaTeX, and/or MiKTeX.
Experience or familiarity with Visual Studio application development in C# or C++.
Experience or familiarity with web development technologies: ability to read/write HTML code, ability to write jQuery and/or JavaScript code, and knowledge of .NET web development and deployment technologies.
Experience or education in the Public Health, Epidemiology, or Medicine domains.
ABBTECH is an EOE/Minorities/Women/Disabled Individuals/Veterans