Description
Tel Aviv · Hybrid | Full-Time | Cybersecurity
What You'll Do:
- Design and develop LLM-powered security features and internal AI tools — RAG pipelines, multi-agent workflows, prompt-engineered systems for cybersecurity
- Architect and operate multi-agent systems in production — orchestration, inter-agent communication, task delegation, failure handling at scale
- Build agent monitoring and observability pipelines — tracing, drift/failure detection, alerting, reliability SLAs
- Build and maintain scalable MLOps infrastructure — model serving, eval frameworks, experiment tracking, CI/CD for ML
- Fine-tune and adapt foundation models on internal datasets (network telemetry, security logs, threat intel)
- Establish best practices for model observability, safety, and responsible AI deployment
- Stay current with the LLM/GenAI ecosystem; drive updates to the AI SDLC and AI research cycle
Requirements
Must-Have:
- 5–8 years of software engineering experience (2–3 in AI/ML)
- Production LLM apps (RAG/agents/tool-use/fine-tuning)
- Production multi-agent systems
- Agent observability
- LangChain/LangGraph/Bedrock AgentCore
- Strong Python
- MLOps pipelines
- Transformers/embeddings/vector DBs
- Cloud + Kubernetes
Nice-to-Have:
- Cybersecurity background (significant plus)
- Networking (SDN/BGP)
- Model eval (LLM-as-judge/RAGAS)
- MCP
- Telecom/enterprise SaaS
- Publications or open-source contributions in GenAI
Stack:
Python, PyTorch, OpenAI/Anthropic APIs, LangChain, LangGraph, AWS Bedrock AgentCore, LangSmith, Kubernetes, Kafka, Elasticsearch, AWS, PostgreSQL