We are seeking a highly skilled and experienced Service Delivery Manager to oversee the setup, management, and optimization of data center infrastructure, with a strong focus on networking and virtualization. The ideal candidate will possess a deep understanding of network security, client management, and best practices in data center operations. This role requires excellent leadership, communication, and technical skills to ensure the successful delivery of services to our clients.
Responsibilities:
Data Center Infrastructure Setup: Lead the planning, design, and implementation of data center infrastructure, including servers, storage, networking equipment, and virtualization technologies.
Service Delivery Management: Oversee the delivery of services to clients, ensuring adherence to service level agreements (SLAs), quality standards, and project timelines.
Networking and Virtualization Expertise: Provide expertise in networking technologies, including LAN, WAN, VLAN, routing, and switching. Manage virtualization platforms such as VMware, Hyper-V, or KVM to optimize resource utilization and performance.
Network Security: Implement and maintain robust network security measures to safeguard data center infrastructure from cyber threats and unauthorized access. Ensure compliance with industry standards and regulatory requirements.
Client Management: Serve as the primary point of contact for clients, addressing their requirements, concerns, and escalations in a timely and professional manner. Build strong relationships with clients to understand their business needs and align service delivery accordingly.
Team Leadership: Supervise a team of technical professionals, providing guidance, mentorship, and support to ensure high performance and professional development.
Performance Monitoring and Optimization: Monitor the performance of data center infrastructure, identifying areas for improvement and implementing optimization strategies to enhance efficiency, reliability, and scalability.
Risk Management: Assess risks related to data center operations and develop mitigation plans to minimize downtime, data loss, and security breaches.
Documentation and Reporting: Maintain accurate documentation of data center configurations, processes, and procedures. Generate regular reports on service delivery metrics, performance trends, and client satisfaction levels.
Requirements:
Bachelor's degree in computer science, information technology, or a related field. Master's degree preferred.
Proven experience in a similar role, with a focus on data center infrastructure setup, networking, virtualization, and client management.
In-depth knowledge of networking protocols, security protocols, and best practices in network design and implementation.
Hands-on experience with virtualization technologies such as VMware vSphere, Microsoft Hyper-V, or KVM.
Strong understanding of network security principles, including firewalls, intrusion detection/prevention systems, VPNs, and encryption techniques.
Excellent leadership and communication skills, with the ability to effectively manage teams, interact with clients, and collaborate with cross-functional stakeholders.
Industry certifications such as CCNA, CCNP, CCIE, VCP, or equivalent are preferred.
Proven track record of delivering projects on time and within budget, while maintaining high levels of customer satisfaction.
Ability to work in a fast-paced environment and adapt to changing priorities and requirements.
Strong analytical and problem-solving skills, with keen attention to detail and a commitment to continuous improvement.
Job Title: Senior Engineer-HPC
Department: Production & Support
Location: Faridabad
Position Summary:
Accomplished HPC Systems Engineer with 8-10 years of enterprise Linux administration experience and more than 5 years of hands-on experience managing large-scale HPC clusters (over 500 cores) and multi-petabyte storage environments. Proven expertise in designing, implementing, and optimizing HPC infrastructure, including compute, storage, and high-speed networking, to deliver maximum performance for demanding workloads.
Key Responsibilities:
HPC Cluster Management & Optimization
Design, implement, and maintain HPC environments, including compute, storage, and network components.
Configure and optimize Slurm, PBS Pro, or other workload managers/schedulers for efficient job scheduling and resource allocation.
Implement performance tuning for CPU, GPU, memory, I/O, and network subsystems to meet workload demands.
Manage HPC filesystem solutions such as Lustre, BeeGFS, or GPFS/Spectrum Scale.
Linux Administration
Administer enterprise-grade Linux distributions (RHEL, CentOS, Rocky Linux, Ubuntu) in large-scale compute environments.
Manage kernel upgrades, patching, and security hardening.
Troubleshoot kernel-level and system-level issues affecting performance and stability.
Automation & Configuration Management
Develop and maintain Ansible playbooks/roles for automated provisioning, configuration, and patching of HPC systems.
Integrate Ansible with CI/CD pipelines for infrastructure as code (IaC) practices.
Automate cluster deployment and maintain environment consistency across hundreds of nodes.
Monitoring, Troubleshooting & Support
Implement and maintain monitoring tools (e.g., Grafana, Prometheus, Nagios, Ganglia).
Troubleshoot complex HPC workloads, MPI communication issues, and application performance bottlenecks.
Provide Tier-3 escalation support for Linux/HPC-related incidents.
Collaboration & Documentation
Work closely with research teams, DevOps engineers, and system architects to deliver high-performance solutions.
Document architecture, SOPs, troubleshooting guides, and performance tuning methodologies.
Requirements
Required Skills & Experience
8-10 years of hands-on Linux system administration experience in production environments.
5+ years managing HPC clusters at scale (500+ cores / multiple petabytes of storage).
Strong Ansible automation skills (complex playbooks, roles, variables, templates).
Deep understanding of MPI, OpenMP, and GPU/accelerator integration in HPC workloads.
Proficiency with HPC job schedulers (Slurm, PBS Pro, LSF).
Experience with HPC storage (Lustre, BeeGFS, GPFS).
Strong knowledge of TCP/IP networking, InfiniBand, and RDMA technologies.
Experience with performance tuning and benchmarking tools (perf, HPCToolkit, Intel VTune, iperf, fio).
Scripting proficiency in Bash, Python, or Perl for automation and tooling.
Preferred Qualifications
Experience with containerized HPC (Singularity, Apptainer, or Podman).
Familiarity with cloud-HPC integration (AWS ParallelCluster, Azure CycleCloud, GCP HPC).
Knowledge of security compliance standards (CIS benchmarks, STIG).
Contributions to HPC community tools or open-source projects.
Soft Skills
Strong problem-solving and analytical thinking.
Ability to mentor junior engineers and collaborate across teams.
Excellent communication skills for technical and non-technical stakeholders.