Data Engineer


Date: May 12, 2024

Location: Gurgaon, HR, IN, 122002

Company: Corning

Requisition Number: 62779


Corning is vital to progress – in the industries we help shape and in the world we share.

We invent life-changing technologies using materials science. Our scientific and manufacturing expertise, boundless curiosity, and commitment to purposeful invention place us at the center of the way the world interacts, works, learns, and lives.

Our sustained investment in research, development, and invention means we’re always ready to solve the toughest challenges alongside our customers. 

As a leading developer, manufacturer, and global supplier of scientific laboratory products for 100 years, Corning’s Life Sciences segment collaborates with researchers seeking new approaches to increase efficiencies, reduce costs, and compress timelines in the drug discovery process. Using unique expertise in the fields of materials science, surface science, optics, biochemistry, and biology, the segment provides innovative solutions that improve productivity and enable breakthrough discoveries.



Scope of Position:

The Data Engineer, Analytics will be responsible for the architecture, implementation, and governance of Corning Life Sciences’ data lake supporting the division’s centralized analytics platform. You will be joining an exciting, newly formed analytics center of excellence for the Corning Life Sciences division. Our mission is to add value to the business by applying our expertise in data engineering, statistics, and machine learning to identify and address the biggest opportunities to grow our $1B+ business. This position is located in our Gurugram, India office.


Required Education and Years of Experience:

  • Bachelor's degree in Computer Science, Engineering, or related discipline
  • 3+ years of experience in data engineering roles, developing and maintaining production ETL and ELT pipelines for data warehousing, on-premises or in cloud-based data lake environments
  • 1+ years of demonstrated production programming proficiency in at least one scripting language such as Python
  • 1+ years of experience developing data ingestion pipelines using Apache Spark APIs such as PySpark


Required Skills:

  • Technical familiarity with Apache Spark, S3, Parquet, and Delta Lake architectures, technologies, and tools
  • Experience with agile software development and continuous integration/continuous deployment (CI/CD) methodologies, along with supporting tools such as Git (GitLab) and Jira
  • Experience with established enterprise ETL and integration tools such as Informatica and MuleSoft
  • Proven success in collaborating with other technical and non-technical teams to collect and understand requirements, and to explain data modeling decisions in their business context
  • Excellent organizational skills, including the ability to prioritize multiple concurrent projects while still delivering timely and accurate results


Day to Day Responsibilities:

  • Design, test, deploy and maintain production big-data ingestion pipelines using agile software development and continuous delivery and/or continuous deployment (CI/CD) practices, collaborating closely with the advanced analytics platform team
  • Work with cross-organizational data source teams to define data ingestion requirements for structured, semi-structured, and unstructured data, pilot their implementation, and ensure user acceptance
  • Define and implement automated validation and profiling capabilities needed to ensure reliable data delivery, using agile software development and CI/CD practices
  • Work with data source teams, domain experts, analysts and data scientists to define and develop data transformation, cleansing and data enrichment processes
  • Actively participate in code reviews and technical information sharing with your team members and the broader software engineering community at Corning
  • Develop and implement data governance processes to support a robust and well-documented data lake environment
  • Stay up to date with industry standards and technological advancements that will improve the quality, productivity, and performance of your work
  • Provide support in a DevOps environment to monitor overall system performance


Desired Experience / Qualifications / Skills:

  • Master’s degree in Computer Science, Engineering, or related discipline
  • Familiarity with Oracle, Microsoft SQL Server, SSIS, SSRS data technologies
  • Familiarity with the Databricks Platform and notebook environments
  • Familiarity with reporting and analysis tools such as Power BI, Tableau, or SAS JMP
