Lead AI/ML Engineer
Certis · AI Platform & Security Operations
Built and shipped the enterprise Voice AI platform end-to-end; <600ms p95 across 20+ concurrent sessions.
Designed and shipped the AI platform end-to-end: real-time Voice AI, multi-agent orchestration, vision-grounded agents, and centralized model governance. Now running in production at enterprise scale.
- Multi-agent orchestration. LangGraph topology with LiteLLM routing across SEA-LION v4 (self-hosted vLLM) for complex reasoning, Qwen 2.5 14B as the fast self-hosted path, and Claude Sonnet/Haiku via Bedrock as managed fallback. Iterated on prompts, agent topologies, and workflow tradeoffs in production. Latency- and cost-tiered model selection.
- LoRA fine-tuning on real operator data. Fine-tuned Qwen 7-8B on three months of operator-call recordings for SOP extraction. Surfaces relevant procedures in the operator UI as conversations unfold.
- CV + VLM-grounded agents. YOLO-family detection paired with Qwen3-VL for grounded reasoning and natural-language queries from agents. Production workflows: lost-and-found, hazard recognition, vehicle/parking, occupancy monitoring.
- p95 <600ms on voice. Achieved across 20+ concurrent SIP/WebRTC sessions with vLLM continuous batching, KV/prefix caching, AWQ quantization, Redis semantic caching, and prompt compression on AWS EKS.
- Prototype-to-production lifecycle. Prompt/model versioning, RBAC, audit logging, lineage. Automated evaluation gates, progressive rollout, and rollback on quality regression across Bedrock, SageMaker, vLLM, and HuggingFace endpoints.
- Team build + architecture. Hired and grew the AI engineering team. Defined the architecture review process, weekly design docs, and on-call rotation. Set engineering standards for evaluation gates, rollout, and rollback.
- Languages
  - Python · Go · Rust · JavaScript · TypeScript
- Frameworks
  - FastAPI
- AI / ML
  - LangGraph · LangChain · LiteLLM · vLLM · PyTorch · YOLO · LLaMA · Large Language Models (LLM) · Large Language Model Operations (LLMOps) · Retrieval-Augmented Generation (RAG) · Vector Databases · Word Embeddings · Prompt Engineering · Fine Tuning · LoRA · AI Agents · Multi-agent Systems · Voice AI · Speech Recognition · Computer Vision · Generative AI · Artificial Intelligence (AI) · Applied Machine Learning · Machine Learning · MLOps · Data Science · AI Infrastructure
- Cloud & Platform
  - Amazon Web Services (AWS) · Amazon Bedrock · AWS SageMaker · Amazon EKS · Amazon DynamoDB · Kubernetes · Docker · Helm Charts · Argo · Cloud Computing
- Data
  - PostgreSQL · SQL · Relational Databases
- Networking & APIs
  - WebRTC · SIP · gRPC · REST APIs · API Development
- Observability & DevOps
  - OpenTelemetry · GitHub · DevOps · Continuous Integration (CI)
- Architecture
  - Event Driven Programming · Concurrent Programming · Distributed Systems · Performance Tuning · Low Latency · Software Architecture · Software Design Patterns · Complex Systems · Algorithms · Scalability
- Process
  - Code Review · Agile Methodologies