Location: Tel Aviv / Ra'anana
Job Summary
DriveNets is seeking a Senior AI Researcher to join its R&D group and push the frontier of large-scale LLM optimization. You will focus on maximizing the performance, scalability, and efficiency of LLM training and inference across massive GPU clusters, bridging deep learning research, distributed systems design, and hardware-aware optimization.
At DriveNets, we treat AI performance as a systems problem. Just as we reinvented networking through disaggregation and software-defined scale, we’re applying the same philosophy to AI infrastructure. Your work will directly influence how large models are deployed, scaled, and optimized across high-density compute environments.
Key Responsibilities
● Conduct cutting-edge research in artificial intelligence and machine learning, from problem formulation to experimental validation.
● Research, design, implement, and evaluate novel algorithms, models, optimization strategies, and architectures for large-scale LLM training and inference (e.g., tensor/pipeline/expert parallelism, quantization, prefill/decode disaggregation, GPU communication optimization).
● Translate research ideas into working prototypes and production-ready solutions.
● Stay up to date with state-of-the-art research, frameworks, and emerging trends in the AI ecosystem.
● Publish research findings internally and externally (papers, technical reports, blog posts, or patents) and present results to internal and external technical audiences.
● Collaborate closely with engineers, product teams, and other researchers to align research with real-world impact.
● Profile distributed training and inference pipelines, identifying algorithmic, memory, and scheduling inefficiencies to inform technical decision-making and long-term research roadmaps.
● Validate research through measurable impact: higher throughput, better FLOPs utilization, improved convergence efficiency, or reduced compute cost.
Requirements
● Strong foundation in machine learning, deep learning, and statistical modeling.
● Deep understanding of deep learning internals: transformer architectures, distributed training paradigms, precision scaling, and optimizer behavior.
● Proven hands-on experience training or deploying LLMs on multi-GPU and/or multi-node clusters.
● Ability to read, understand, and critically evaluate academic research papers. Demonstrated ability to translate theoretical ideas into practical, production-level performance improvements.
● Strong problem-solving skills and ability to work independently on open-ended research problems.
● Clear written and verbal communication skills in English.
Optional Qualifications
● MSc or PhD in Computer Science, Electrical Engineering, Mathematics, or a related quantitative field.
● Strong mathematical background, including linear algebra, probability, and optimization.
● Strong grasp of parallel and distributed systems principles, including communication collectives, load balancing, and scaling bottlenecks.
● Proficiency with frameworks such as DeepSpeed, Megatron-LM, NeMo, vLLM, SGLang, or equivalent large-scale training ecosystems.
● Understanding of CUDA, Triton, or low-level GPU kernel development, and experience profiling large models across multi-node GPU systems.