Senior Linux HPC System Administrator

Date: Feb 1, 2021

Location: Corning, NY, US, 14831

Company: Corning

Corning is one of the world’s leading innovators in materials science. For more than 160 years, Corning has applied its unparalleled expertise in specialty glass, ceramics, and optical physics to develop products that have created new industries and transformed people’s lives.

At Corning, our growth is fueled by a commitment to innovation. We succeed through sustained investment in research & development, a unique combination of material and process innovation, and close collaboration with customers to solve tough technology challenges. We are a four-time National Medal of Technology winner thanks to our technology leadership and R&D environment, which attract and enable the best scientific minds in the world. This pipeline of talent has brought life-changing innovation to your fingertips for more than 160 years.

SCOPE/PURPOSE OF POSITION: As a member of the Scientific Computing team, you will lead and participate in the deployment, management, and optimization of systems, software, and processes in support of Corning’s Scientific Research HPC environment. You will work closely with other HPC System Engineers and with Corning’s Modeling and Machine Learning community to identify and provide solutions and technical support that enable Modeling and Scientific Computing objectives to be met.



• Configures, installs, maintains and upgrades HPC clusters (compute, storage, and network) and applications in support of research computing environments

• Leads and collaborates on projects to maintain and enhance system functionality in areas such as systems monitoring, scheduling and resource management, configuration management, and backups.

• Recommends and implements improvements to existing HPC system management utilities/tools

• Provides technical expertise to improve HPC cluster performance and resiliency

• Diagnoses, isolates and resolves complex application and system technical problems (hardware, software, network).

• Develops scripts and automation to enhance operational services and service quality

• Performs system tuning based upon proactive performance analysis

• Builds, installs, and supports scientific software (Commercial and Open Source)

• Remains current on new technologies as they relate to HPC. Evaluates, tests, documents, and recommends software and hardware.

• Supports compute, storage, and network technology evaluations and assessments

• Interacts with hardware and software vendors and Corning’s Global Sourcing Management team to execute purchases, renewals, and service contracts.

• Provides support and training to the user community

• Develops, implements, and documents system architectures, new capabilities, and operational standards

• Develops and maintains technical documentation for customer use

• Builds relationships that foster collaboration and partnerships to drive better services for the technology community

• Documents troubleshooting and operational techniques and best practices, and mentors other members of the HPC Operations team when necessary




• Bachelor’s degree (B.A./B.S.) in Computer Science, Engineering, or related course of study, or equivalent combination of education and relevant experience



• Minimum of 7 years of Linux (RHEL, CentOS) System Administration experience in a large distributed computing environment

• Experience providing support for Linux HPC clusters used for scientific research is preferred.




• Demonstrated experience as a technical leader responsible for management and optimization of systems, tools, and processes within a large computing environment

• Extensive understanding of infrastructure technologies including server, storage, network, database, and virtualization

• Experience configuring, managing, and optimizing large Linux clusters and servers

• Experience configuring, managing, and optimizing distributed and parallel file systems such as Lustre, GPFS, NFS, Ceph.

• Familiarity with high-performance networks such as InfiniBand, and network management

• Strong scripting/programming capabilities with Python, Bash, Perl

• Experience managing virtualization platforms (VMware, KVM, oVirt)

• Extensive knowledge of CentOS and RHEL, and experience maintaining, upgrading, and tuning the Linux kernel

• Experience with installation and use of system configuration management and orchestration tools such as Puppet, Ansible, Chef, Cobbler

• Experience with installation and configuration of system management and monitoring/alerting tools (e.g. Ganglia, Nagios, Zabbix)

• Experience building applications from source and ability to troubleshoot compilation issues.

• Experience using containerized workflows

• Solid knowledge of protocols such as DNS, HTTP, LDAP, SMTP and SNMP

• Demonstrated ability to quantify, analyze and resolve complex system issues, determine root cause, and develop preventive actions

• Demonstrated ability to research, quickly identify and correct problems (debug) using system utilities and diagnostics.

• Demonstrated ability to perform complex performance analysis including system processes, I/O subsystems, networks and other related components.

• Excellent written and oral communication skills for interacting with customers, team members, and management

• Ability to work independently as well as collaboratively within a team, including the ability to manage or lead moderately complex projects

• Proactive and innovative, with ability to foresee and prevent potential problems

• Organizational and time management skills, exceptional follow-through, and ability to manage multiple priorities while keeping projects moving forward

• Ability to effectively communicate with people of diverse backgrounds and computer knowledge

• Passion for providing excellent customer service



• Experience integrating systems or designing solutions for HPC workloads

• Experience installing, configuring, and maintaining job management tools (such as PBS, SLURM, Moab, TORQUE)

• Experience with performance benchmarking using profilers and debuggers to recommend code improvements for scalability and performance.

• Experience configuring, installing and troubleshooting MPI and OpenMP preferred.

• Knowledge of containerization platforms and technologies such as Singularity and Kubernetes

• Experience with configuration and management of high-performance networks such as InfiniBand or Omni-Path.

• Experience with Linux kernel development and the Linux development community

• Experience with on-prem and public cloud technologies (AWS, Azure, GCP), OpenStack


