AI Researcher

R&D

Ra'anana

Description

Senior AI Researcher: LLM Performance on Massive GPU Clusters

Location: Tel Aviv

#Hybrid

DriveNets is a leader in high-scale disaggregated networking solutions. Founded in 2015, DriveNets modernizes the way service providers, cloud providers and hyperscalers build networks. Supporting the largest network in the world, DriveNets carries more than half of AT&T’s backbone traffic on its Network Cloud open disaggregated architecture. Having raised $587 million across three funding rounds, DriveNets is disrupting the networking market, from high-scale architecture to AI platforms, and bringing onboard the most talented people. We are seeking people who want to make an impact on the world’s leading communication networks and who are experienced in networking architecture or AI infrastructure solutions.

About the Role

DriveNets is seeking a Senior AI Researcher to join its R&D group and lead the frontier of large-scale LLM optimization. You will focus on maximizing the performance, scalability, and efficiency of LLM training and inference across massive GPU clusters, bridging deep learning research, distributed systems design, and hardware-aware optimization.

At DriveNets, we treat AI performance as a systems problem. Just as we reinvented networking through disaggregation and software-defined scale, we’re applying the same philosophy to AI infrastructure. Your work will directly influence how large models are deployed, scaled, and optimized across high-density compute environments.

Responsibilities

· Research, design, and implement new optimization strategies for large-scale LLM training and inference (e.g., tensor/pipeline/expert parallelism, quantization, prefill/decode disaggregation, GPU communication optimization).

· Profile distributed training and inference pipelines to identify algorithmic, memory, and scheduling inefficiencies.

· Collaborate closely with systems, compiler, and infrastructure teams to co-design efficient communication topologies, memory management, and runtime scheduling.

· Develop internal tools for large-scale LLM benchmarking, profiling, and automatic tuning.

· Validate research through measurable impact: higher throughput, better FLOPS utilization, improved convergence efficiency, or reduced compute cost.

· Present research and engineering results to internal and external technical audiences.

Requirements

· Deep understanding of deep learning internals—transformer architectures, distributed training paradigms, precision scaling, and optimizer behavior.

· Proven hands-on experience training or deploying LLMs on multi-GPU or multi-node clusters.

· Strong grasp of parallel and distributed systems principles, including communication collectives, load balancing, and scaling bottlenecks.

· Proficiency with frameworks like DeepSpeed, Megatron-LM, NeMo, vLLM, SGLang, or equivalent large-scale training ecosystems.

· Demonstrated ability to translate theoretical optimization ideas into practical, production-level performance improvements.


Nice to Have

· Understanding of CUDA, Triton, or low-level GPU kernel development, and experience profiling large models across multi-node GPU systems.

· Experience co-designing algorithms and hardware (NVIDIA, AMD, TPU, or custom accelerators).

· Research or open-source contributions in distributed ML systems, model compression, or systems for ML.

· Exposure to energy or cost-aware optimization techniques in large-scale training environments.


Why DriveNets

DriveNets builds distributed systems that power some of the world’s largest networks. Now, we’re bringing that same scalability mindset to AI infrastructure—rethinking how performance, cost, and efficiency can coexist at scale.

You’ll work alongside systems engineers, ML researchers, network experts, and infrastructure specialists who collaborate to redefine what “optimized” means for large models. If your idea of fun is making massive AI workloads faster, leaner, and smarter, this is your stage.