Senior Product Manager - AI/HPC Network Infrastructure Management System

Product & Solutions Engineering

Ra'anana

Description

DriveNets is a leader in cloud-native networking software for hyperscalers and service providers that are building the world's largest infrastructures for network services, AI platforms, and SaaS offerings. Founded in December 2015, DriveNets disrupted some of the most challenging high-scale markets, transforming the way networks are built, scaled, and consumed. We also built the largest network in the world, with more than half of AT&T’s backbone running on DriveNets’ Network Cloud. DriveNets has raised $587 million in three funding rounds, which enables us to dream big and bring on the most talented people.

The role 

DriveNets is seeking a Product Manager focused on switch network management and orchestration for AI/HPC GPU fabrics to be a key member of our Product Management team. Join a dynamic and forward-thinking company at the forefront of network transformation. We leverage advanced technologies to develop innovative solutions that drive efficiency, scalability, and exceptional network performance. Collaborate with the industry’s best as we partner with hyperscalers, emerging NeoClouds, and enterprises building AI/HPC GPU fabrics, shaping the future of GPU Ethernet interconnect. Our environment fosters creativity, teamwork, and growth, and offers you the opportunity to make a meaningful impact while working on groundbreaking projects.

As a Product Manager focused on Network Management and Orchestration, you will drive the roadmap strategy and the development of features that streamline network deployment, monitoring, troubleshooting, and more. Your work will directly impact the efficiency and performance of diverse GPU clusters, enabling operation at scale with greater uptime, utilization, and reliability.

Responsibilities

  • Act as a network orchestration and management system expert specializing in network management, telemetry, and operations (Ops) automation, providing strategic direction and expertise to enhance our orchestration capabilities for our customers.
  • Design and innovate network orchestration solutions for our Network Cloud product, focusing on streamlining GPU fabric deployments, manageability, and automation for hyperscaler, NeoCloud, and enterprise customer networks.
  • Conceptualize and architect network automation and orchestration platforms, enabling end-to-end lifecycle management for switched and routed networks, including bootstrapping, deployment, and ongoing operational management.
  • Create, manage, and present product roadmaps and technology solutions, addressing network orchestration requirements and ensuring alignment with market and customer needs.
  • Collaborate closely with development teams to drive requirements for the management platform and routing and switching technologies that streamline network architecture.
  • Demonstrate proof-of-concept solutions to customers, showcasing the value and capabilities of our offerings.
  • Work effectively with sales teams and other internal groups within DriveNets to support customer engagements and drive business success.

Requirements

What we need to see:

  • 10+ years of experience in the network communications industry, with at least 5 years working for or with a large data center network provider.
  • Deep understanding of deployment flows, monitoring schemes, troubleshooting, and auto-recovery capabilities.
  • Knowledge of orchestrating and/or automating other data center networking router/switch vendor solutions in customer networks is desirable.
  • Proven experience in end-to-end product management or in the operation of a large data center network.
  • Skilled in developing and tailoring product offers in collaboration with cross-functional teams such as engineering, operations, and external stakeholders and partners.
  • Demonstrated experience in bringing network orchestration, management, and automation platforms to market, overseeing the entire product lifecycle.
  • Knowledge and experience with AI-related projects, large language models, and strategic initiatives, leveraging AI to enhance product capabilities and competitiveness, is a plus.
  • Knowledge of data center or high-end enterprise network design and features (e.g. BGP, EVPN, VXLAN, QoS, Multicast).
  • Basic knowledge of cloud technologies and solution offerings from AWS, Azure, and Google Cloud, and of related technologies such as Kubernetes, Docker, and service meshes.
  • Expertise in network management, incident management, fault isolation, telemetry and observability, and root cause analysis.
  • Flexible and adaptable, able to handle a diverse set of daily activities and effectively adjust to shifting priorities in a fast-paced environment.
  • Ability to multitask efficiently in a multifaceted environment and to work with teams across geographical locations.
  • Clear written and oral communication skills with the ability to effectively collaborate with executives and engineering teams.
  • Ability to write extensive technical content (white papers, technical briefs, etc.) for external audiences with a balance of technical accuracy, strategy, and clear messaging.
  • Willingness to travel as needed.

 

Ways to stand out from the crowd:

  • Low-level technical expertise in data center or high-end enterprise network design and features (e.g. BGP, EVPN, VXLAN, QoS, Multicast).
  • Familiarity with AI-relevant data center infrastructure and networking technologies such as InfiniBand, RoCEv2, lossless Ethernet technologies (PFC, ECN, etc.), accelerated computing, GPU, NIC, DPU, etc.
  • Understanding of AI/HPC networking infrastructure solutions and their advantages and disadvantages (AI/HPC networking design, high-speed interconnect technologies):
      • Scale-up: NVLink, UALink, Scale-up Ethernet (SUE), etc.
      • Scale-out: Ethernet and Enhanced Ethernet (Scheduled Ethernet, dynamic load balancing and adaptive routing, Spectrum-X, UEC, etc.), InfiniBand.
  • Understanding of data center operations fundamentals in networking, cooling, and power.
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and telemetry (gRPC, gNMI, OTLP, etc.).
  • Proven experience with one or more Tier-1 clouds (AWS, Azure, GCP, or OCI) or emerging NeoClouds, as well as cloud-native architectures and software.


Education

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.


If your experience is close but doesn’t fulfill all requirements, please apply. DriveNets is on a mission to build a special company composed of individuals with different backgrounds, perspectives, and experiences.