Data Engineer

Date: Nov 19, 2022

Location: Shanghai, SH, CN, 200031

Company: Corning

Requisition Number: 57670


Corning is one of the world’s leading innovators in materials science. For more than 160 years, Corning has applied its unparalleled expertise in specialty glass, ceramics, and optical physics to develop products that have created new industries and transformed people’s lives.

Corning succeeds through sustained investment in R&D, a unique combination of material and process innovation, and close collaboration with customers to solve tough technology challenges.

The global Information Technology (IT) Function is leading efforts to align IT and Business Strategy, leverage IT investments, and optimize end-to-end business processes and associated information integration technologies. Through these efforts, IT helps to improve the competitive position of Corning's businesses through IT-enabled processes. IT also delivers Information Technology applications, infrastructure, and project services in a cost-efficient manner to Corning worldwide.


The Platform Data Engineer will be key to data development activities, working with domain experts, application developers, controls engineers, data engineers, and data scientists. Their primary responsibility will be to develop productized, reliable, and instrumented data ingestion pipelines that land inbound data from multiple process and operational data stores throughout the company into on-premises and cloud-based data lakes. These pipelines will require data validation and data profiling automation, along with version control and CI/CD, to ensure the ongoing resiliency and maintainability of the inbound data flows supporting our advanced analytics projects. These systems need to be reliable, environment-agnostic, and portable across on-premises and cloud compute environments.


As a Data Engineer for our advanced analytics platforms, your main responsibilities will be to:

• Design and implement patterns of practice for productized, portable, modular, instrumented, CI/CD-automated, and highly performant data ingestion pipelines that leverage structured streaming techniques, processing both batch and streamed data in unstructured, semi-structured, and structured form, using Apache Spark, Delta Lake, Delta Engine, Hive, and other relevant tech stacks

• Ensure that data ingestion pipelines built with these patterns validate and profile inbound data reliably, identify anomalous or otherwise unexpected data conditions, and are able to trigger appropriate remediation actions by operations staff when needed

• Work with data source domain experts both within and outside the company who understand the value-delivery potential of their data, and collaborate to harvest, land, and prepare that data at scale

• Ensure pipelines built with these patterns are architecturally and operationally integrated with data contextualization, feature engineering, outbound data engineering and production inferencing pipelines designed by your core platform development peers

• Deliver and present proof-of-concept implementations that explain the key technologies you have selected for your design and the recommended patterns of practice for ongoing development and lifecycle management. The target audience for these efforts spans the company and includes project stakeholders, data scientists, process experts, other core software engineering team members, and relevant technical communities of practice interested in leveraging your code for their own projects

• Work with your fellow developers using agile development practices, and continually improve development methods with the goal of automating the build, integration, deployment, and monitoring of ingestion, enrichment, and ML pipelines

• Using your expertise and influence, help establish patterns of practice for the above, and encourage their adoption by software and data engineering teams across the company

• Work with the relevant communities of practice on component roadmaps, and serve as a trusted committer for your code in inner-sourcing efforts with other development teams in the company
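To make the validation-and-remediation responsibility above concrete, here is a purely illustrative, stdlib-only Python sketch, not Corning's actual implementation: validate each inbound record against an expected schema, profile null rates per field, and trigger an alerting hook when a threshold is crossed. The schema, threshold, and `on_anomaly` hook are all hypothetical.

```python
from collections import Counter

# Hypothetical expected schema for an inbound record: field -> type
EXPECTED_SCHEMA = {"sensor_id": str, "temperature": float, "ts": str}
NULL_RATE_THRESHOLD = 0.10  # flag a field if >10% of its values are missing


def profile_and_validate(records, on_anomaly=print):
    """Validate records against EXPECTED_SCHEMA and profile null rates.

    Returns (valid_records, profile). Calls `on_anomaly` (a stand-in for
    an operations alerting hook) when a field's null rate is too high.
    """
    valid, nulls = [], Counter()
    for rec in records:
        ok = True
        for field, ftype in EXPECTED_SCHEMA.items():
            value = rec.get(field)
            if value is None:
                nulls[field] += 1
                ok = False
            elif not isinstance(value, ftype):
                ok = False
        if ok:
            valid.append(rec)
    # Per-field null-rate profile over the whole batch
    profile = {f: nulls[f] / max(len(records), 1) for f in EXPECTED_SCHEMA}
    for field, rate in profile.items():
        if rate > NULL_RATE_THRESHOLD:
            on_anomaly(f"field {field!r} null rate {rate:.0%} exceeds threshold")
    return valid, profile
```

In a production Spark pipeline these checks would run over DataFrames at scale, but the shape of the logic described in the bullet — validate per record, profile per field, alert operations when a threshold is crossed — is the same.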

Education & Experience

• Advanced degree in computer science strongly preferred, but at a minimum a bachelor's degree in computer science, engineering, mathematics, or a related technical discipline.

• 5+ years of programming proficiency in at least one modern JVM language (e.g., Java; Kotlin or Scala nice to have) and at least one other high-level programming language such as Python (required)

• 5+ years of full-stack experience developing large scale distributed systems and multi-tier applications

• Expert-level proficiency with agile software development and continuous integration / continuous deployment methodologies, along with supporting tools such as Git (GitLab), Jira, and Terraform (New Relic nice to have)

• Expert-level proficiency with both traditional relational and polyglot persistence technologies

• 5+ years of experience in big data engineering roles, developing and maintaining ETL and ELT pipelines for data warehousing and for on-premises and cloud data lake environments

• 5+ years of production experience using SQL and DDL (required; candidates with somewhat fewer years may be considered)

• 3+ years of experience with high-level Apache Spark APIs (Scala, PySpark, Spark SQL), and demonstrated strong, hands-on technical familiarity with Apache Spark architecture

• 3+ years of experience developing batch, micro-batch, and streaming ingestion pipelines on the Apache Spark platform, leveraging both the low-level RDD APIs and the higher-level APIs (SparkContext, DataFrames, Datasets, GraphFrames, Spark SQL).

• Demonstrated deep technical proficiency with Spark core architecture, including physical plans, UDFs, job management, resource management, S3, Parquet and Delta Lake architecture, and structured streaming practices

• 3+ years of DevOps experience with AWS platform services (required), including S3 and EC2, Data Migration Service (DMS), RDS, EMR, Redshift, Lambda, DynamoDB, CloudWatch, and CloudTrail

• Demonstrated experience working with inner sourcing initiatives, serving both as a trusted committer and contributor

• Strong technical collaboration and communication skills

• Unwavering commitment to coding best practices and a strong proponent of code review

• Cultural bias towards continual learning, sharing best practices, and encouraging and elevating less experienced colleagues as they learn

• Proven success in communicating with users, other technical teams, and senior management to collect requirements, describe data modeling decisions and data engineering strategy

Additional Technical Qualifications

• Proficiency with functional programming methods and their appropriate use in distributed systems

• Expert proficiency with data management fundamentals and data storage principles

• Expert proficiency with AWS foundational compute services, including S3 and EC2, ECS and EKS, IAM and CloudWatch

• Prior full-stack app development experience (front-end, back-end, microservices)

• Proficiency working with Ceph, Kubernetes, and Docker (nice to have)

• Familiarity with the following tools and technology practices:
  o Oracle, Microsoft SQL Server, SSIS, SSRS
  o Established enterprise ETL and integration tools, including Informatica and MuleSoft
  o Established open-source data integration and DAG tools, including NiFi, StreamSets, and Airflow
  o Data sources and integration solutions commonly used in manufacturing enterprises, including PI Integrator and Maximo
  o Reporting and analysis tools, including Power BI, Tableau, and SAS JMP (nice to have)

Other Qualifications

• Strong relationship-building skills (important)

• Proven success working in a highly matrixed environment.

• Strong bias for action and an ability to deliver results in a complex and fluid environment.

• Excellent analytical and decision-making abilities.

• Must have a passion for success.

• Must demonstrate a proven willingness to go the extra mile, to take on the things that need to be done and maintain a positive attitude that can adapt to change.

• Strong leadership and excellent verbal and written communication skills, with the ability to develop and sell ideas. Workable English proficiency, with potential to develop further, balanced against technical background.