Shan Wijenayaka
Based in Singapore


Lead Engineer. Production AI on low-latency distributed systems. Hardened in FinTech, running at enterprise scale.

Nine years across AI, FinTech, and Cloud. Microservices in Python, Go, Java, and Rust as the foundation; the last three years on LLM systems built on top: evaluation harnesses, multi-agent workflows, LoRA fine-tuning, and the inference stack underneath.

Open to

Staff and lead AI engineering roles in Singapore. Selective.

Currently focused on

Production LLM evals · multi-agent systems · low-latency voice infrastructure.

Work

A trail of production systems.

From derivatives microservices to production LLM platforms. Each role a different cut at low-latency, correctness-critical systems. Most recent first.

Lead AI/ML Engineer

Certis · AI Platform & Security Operations

Built and shipped the enterprise Voice AI platform end-to-end; <600ms p95 across 20+ concurrent sessions.

Designed and shipped the AI platform end-to-end: real-time Voice AI, multi-agent orchestration, vision-grounded agents, and centralized model governance. Now running in production at enterprise scale.

  • Multi-agent orchestration. LangGraph topology with LiteLLM routing across SEA-LION v4 (self-hosted vLLM) for complex reasoning, Qwen 2.5 14B as the fast self-hosted path, and Claude Sonnet/Haiku via Bedrock as managed fallback. Iterated on prompts, agent topologies, and workflow tradeoffs in production. Latency- and cost-tiered model selection.
  • LoRA fine-tuning on real operator data. Fine-tuned Qwen 7-8B on three months of operator-call recordings for SOP extraction. Surfaces relevant procedures in the operator UI as conversations unfold.
  • CV + VLM-grounded agents. YOLO-family detection paired with Qwen3-VL for grounded reasoning and natural-language queries from agents. Production workflows: lost-and-found, hazard recognition, vehicle/parking, occupancy monitoring.
  • p95 <600ms on voice. Sustained across 20+ concurrent SIP/WebRTC sessions with vLLM continuous batching, KV/prefix caching, AWQ quantization, Redis semantic caching, and prompt compression on AWS EKS.
  • Prototype-to-production lifecycle. Prompt/model versioning, RBAC, audit logging, and lineage. Automated evaluation gates, progressive rollout, and rollback on quality regression across Bedrock, SageMaker, vLLM, and Hugging Face endpoints.
  • Team build + architecture. Hired and grew the AI engineering team. Defined the architecture review process, weekly design docs, and on-call rotation. Set engineering standards for evaluation gates, rollout, and rollback.
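Latency- and cost-tiered model selection with a managed fallback can be illustrated with a minimal sketch. This is not the production LangGraph/LiteLLM code: the tier names, latency figures, and costs are hypothetical placeholders, and `call` stands in for whatever client actually invokes a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Tier:
    name: str               # hypothetical label, e.g. "fast-self-hosted"
    p95_ms: float           # assumed observed p95 latency for this tier
    cost_per_1k: float      # assumed relative cost per 1k tokens
    handles_complex: bool   # trusted with complex multi-step reasoning?
    fallback: bool = False  # managed fallback tiers are tried last


def candidate_order(tiers: List[Tier], complex_task: bool, budget_ms: float) -> List[Tier]:
    """Tiers that fit the latency budget and the task, cheapest first, fallbacks last."""
    eligible = [
        t for t in tiers
        if t.p95_ms <= budget_ms and (t.handles_complex or not complex_task)
    ]
    return sorted(eligible, key=lambda t: (t.fallback, t.cost_per_1k))


def route(
    tiers: List[Tier],
    complex_task: bool,
    budget_ms: float,
    call: Callable[[Tier, str], str],
    prompt: str,
) -> Tuple[str, str]:
    """Try each eligible tier in order; on an error, fall through to the next one."""
    last_error = None
    for tier in candidate_order(tiers, complex_task, budget_ms):
        try:
            return tier.name, call(tier, prompt)
        except Exception as exc:  # e.g. a timeout or provider error
            last_error = exc
    raise RuntimeError("no eligible tier could serve the request") from last_error
```

With a cheap fast tier, a reasoning tier, and a managed fallback, a simple request tries the fast tier first, while a complex request under a tight latency budget may go straight to the fallback.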

LangGraph · LiteLLM · vLLM · Bedrock · Qwen3-VL · AWS EKS

Languages
Python · Go · Rust · JavaScript · TypeScript
Frameworks
FastAPI
AI / ML
LangGraph · LangChain · LiteLLM · vLLM · PyTorch · YOLO · LLaMA · Large Language Models (LLM) · Large Language Model Operations (LLMOps) · Retrieval-Augmented Generation (RAG) · Vector Databases · Word Embeddings · Prompt Engineering · Fine Tuning · LoRA · AI Agents · Multi-agent Systems · Voice AI · Speech Recognition · Computer Vision · Generative AI · Artificial Intelligence (AI) · Applied Machine Learning · Machine Learning · MLOps · Data Science · AI Infrastructure
Cloud & Platform
Amazon Web Services (AWS) · Amazon Bedrock · AWS SageMaker · Amazon EKS · Amazon DynamoDB · Kubernetes · Docker · Helm Charts · Argo · Cloud Computing
Data
PostgreSQL · SQL · Relational Databases
Networking & APIs
WebRTC · SIP · gRPC · REST APIs · API Development
Observability & DevOps
OpenTelemetry · GitHub · DevOps · Continuous Integration (CI)
Architecture
Event Driven Programming · Concurrent Programming · Distributed Systems · Performance Tuning · Low Latency · Software Architecture · Software Design Patterns · Complex Systems · Algorithms · Scalability
Process
Code Review · Agile Methodologies

Senior Software Engineer

RegASK · Regulatory Intelligence Platform

Sole architect on a production LLM platform over 1M+ regulatory docs; uptime from 99.5% to 99.95%, ~35% cloud spend cut.

Sole architect on a production LLM-powered regulatory platform. Search and Q&A over 1M+ regulatory documents, full lifecycle from retrieval pipeline through evaluation, deployment, and operations.

  • Eval + regression harnesses. Tracked answer quality, retrieval drift, and model decay. Gated every deployment and caught regressions before production in a compliance-critical environment.
  • LLM + RAG over 1M+ documents. FastAPI platform with ELT pipelines for ingestion, chunking, embedding, and indexing. Translated regulatory analyst workflows into shippable product features through direct stakeholder partnership.
  • 99.5% to 99.95% uptime. Diagnosed reliability bottlenecks through targeted instrumentation. Error rates down ~20% over the same period.
  • ~35% cloud spend cut. Right-sized compute and tuned auto-scaling policies. Zero SLA regressions.
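An evaluation gate that blocks a deployment on regression can be reduced to a baseline-versus-candidate comparison. A minimal sketch, assuming higher-is-better metric scores and per-metric drop tolerances; the metric names in the usage are hypothetical, not the harness's actual metrics.

```python
from typing import Dict, List


def regression_gate(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: Dict[str, float],
) -> List[str]:
    """Return failed metrics; an empty list means the deployment may proceed.

    Scores are assumed to be higher-is-better; a drop larger than the
    metric's tolerance (default 0.0) counts as a regression.
    """
    failures = []
    for metric, base in baseline.items():
        if metric not in candidate:
            failures.append(f"{metric}: missing from candidate run")
            continue
        drop = base - candidate[metric]
        allowed = tolerance.get(metric, 0.0)
        if drop > allowed:
            failures.append(
                f"{metric}: {base:.3f} -> {candidate[metric]:.3f} (allowed drop {allowed:.3f})"
            )
    return failures
```

In a CI pipeline, a non-empty result would fail the step, so a candidate with a small faithfulness dip inside tolerance passes while a larger retrieval-recall drop blocks the release.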

LangChain · RAG · MongoDB Atlas Vector · AWS EKS

Languages
Python · JavaScript · TypeScript
Frameworks
FastAPI · Django · Node.js · React.js
AI / ML
LangChain · Retrieval-Augmented Generation (RAG) · Large Language Models (LLM) · Large Language Model Operations (LLMOps) · Prompt Engineering · PyTorch · LLaMA · Pandas (Software) · Generative AI · Applied Machine Learning · Machine Learning · Artificial Intelligence (AI) · Data Science
Cloud & Platform
Amazon Web Services (AWS) · Kubernetes · Docker · Terraform · Cloud Computing
Data
SQL · NoSQL · Database Design
Networking & APIs
REST APIs · API Development
Observability & DevOps
OpenTelemetry · Jenkins · Continuous Integration (CI) · DevOps · Linux
Architecture
Microservices · Distributed Systems · Concurrent Programming · Multithreading · Functional Programming · Low Latency · High Throughput · Software Architecture · Software Design Patterns · Complex Systems · Algorithms · Data Structures · Scalability
Leadership
Leading Development Teams · Leadership
Process
Code Review · Agile Methodologies

Senior Software Engineer

TP ICAP · Derivatives Trading Systems

Java 17 + Spring Boot 3 derivatives stack plus new event-driven Go microservices; ~35k trades/day at sub-100ms, zero-downtime AWS cutover across 12+ services.

Daily production work at the world's largest interdealer broker during a legacy-to-cloud migration. Existing Java 17 + Spring Boot 3 derivatives stack, new event-driven Go microservices, Node.js for IO-heavy auxiliary work. Strict consistency and ordering across OTC derivatives.

  • Java 17 + Spring Boot 3 in production. Daily work on the legacy derivatives stack through May 2024: bug fixes, feature additions, and integration with new Go-side services via Kafka, inside a regulated SDLC.
  • ~35k trades/day · sub-100ms. Designed new event-driven Go microservices end-to-end for interest rate and currency swap instruments. Idempotent, replay-safe consumers and consistency checks under high-throughput and failure scenarios.
  • Zero-downtime migration · 12+ services. Java/MSSQL to AWS-native (Kafka/MSK, DynamoDB Streams, Aurora PostgreSQL). Incremental extraction over big-bang; zero trading interruptions.
  • ML in the pre-clearing loop. Anomaly and consistency scoring with explainable flags and a reviewer feedback loop. Downstream operational breaks reduced.
  • OTel + SLI/SLO baselines. Instrumented the full service mesh. Unknown tail-latency causes down ~40%. BDD/Cucumber on Jenkins inside a regulated SDLC cut the defect escape rate by ~30%.
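What "idempotent, replay-safe" means for a consumer can be shown in a few lines. This sketch is in Python for consistency with the other examples here, although the services themselves were in Go. The event fields, the per-trade sequence number, and the in-memory store are assumptions; a production consumer would persist the dedupe marker and the state change in one transaction.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class TradeEvent:
    event_id: str    # globally unique id, assumed to be set by the producer
    trade_id: str    # ordering key (e.g. the Kafka message key)
    seq: int         # per-trade sequence number, assumed gap-free from the producer
    notional: float


@dataclass
class SwapBook:
    """Hypothetical in-memory stand-in for the consumer's state store."""
    processed: Set[str] = field(default_factory=set)
    last_seq: Dict[str, int] = field(default_factory=dict)
    notional: Dict[str, float] = field(default_factory=dict)

    def apply(self, event: TradeEvent) -> str:
        if event.event_id in self.processed:
            return "duplicate"  # redelivery or replay: already applied, no side effects
        expected = self.last_seq.get(event.trade_id, 0) + 1
        if event.seq < expected:
            return "stale"      # older than the state already applied
        if event.seq > expected:
            return "gap"        # out of order: hold or re-fetch, never apply
        # A real system would commit the state change and the dedupe marker atomically.
        self.notional[event.trade_id] = event.notional
        self.last_seq[event.trade_id] = event.seq
        self.processed.add(event.event_id)
        return "applied"
```

Replaying a topic from an earlier offset is then harmless: every already-applied event reports "duplicate" and leaves the book unchanged.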

Java 17 · Spring Boot 3 · Go · Kafka · MSK · DynamoDB · Aurora PostgreSQL

Languages
Java · Go · Rust · Python · JavaScript · TypeScript
Frameworks
Spring Boot · Django · Node.js · React.js
AI / ML
Generative AI · Large Language Models (LLM) · Artificial Intelligence (AI) · Prompt Engineering
Cloud & Platform
Amazon Web Services (AWS) · AWS EKS · Kubernetes · Istio · Docker · Terraform · Cloud Computing
Data
PostgreSQL · Aurora PostgreSQL · DynamoDB · Redis · Kafka · MSK · SQL · NoSQL · Database Design
Networking & APIs
REST APIs · gRPC · API Development
Observability & DevOps
OpenTelemetry · Prometheus · Grafana · Locust · Jenkins · Continuous Integration (CI) · DevOps · BDD/Cucumber · Linux
Architecture
Microservices · Distributed Systems · Event Driven Programming · Concurrent Programming · Multithreading · Functional Programming · Low Latency · High Throughput · Software Architecture · Software Design Patterns · Complex Systems · Algorithms · Data Structures · Scalability
Domain
Trading · FinTech
Leadership
Leading Development Teams · Leadership
Process
Code Review · Agile Methodologies

Senior Software Engineer

Chope · Consumer Dining Platform

Monolith-to-microservices decomposition on a high-traffic APAC reservation platform; latency held through dinner-hour spikes, per-team service ownership.

Modernized the backend of a high-traffic reservation platform: monolith to microservices, keeping real-time booking and availability flows running across APAC markets.

  • Peak-hour SLAs held. Defined service boundaries, rate-limiting, and caching strategies that kept latency stable under dinner-hour traffic spikes.
  • Real-time orchestration. Microservices for restaurant availability, booking, and waitlist, serving concurrent users across multiple markets.
  • Independently deployable services. Monolith decomposition with per-team ownership and faster release cycles.

Java · Spring Boot · Go · Python · PostgreSQL · Redis · AWS

Languages
Java · Go · Python · JavaScript · TypeScript
Frameworks
Spring Boot · Django · Node.js · React.js
AI / ML
Pandas (Software)
Cloud & Platform
Amazon Web Services (AWS) · Google Cloud Platform (GCP) · Kubernetes · Istio · Docker · Terraform · Cloud Computing
Data
PostgreSQL · Redis · Kafka · SQL · NoSQL · Database Design
Networking & APIs
REST APIs · API Development
Observability & DevOps
Jenkins · Continuous Integration (CI) · DevOps · Locust · Linux
Architecture
Microservices · Distributed Systems · Concurrent Programming · Multithreading · Functional Programming · Low Latency · High Throughput · Software Architecture · Software Design Patterns · Complex Systems · Algorithms · Data Structures · Scalability
Domain
FinTech
Leadership
Leading Development Teams · Leadership
Process
Code Review · Agile Methodologies
Earlier · 2017 – 2021

Built a 37M-subscriber USSD platform from scratch at Omobio, then senior backend / full-stack work for Wiley and Sysco across distributed US teams. Senior since 2019.

What I build with

The stack, not the taxonomy.

A cross-cutting view of what's earning its keep right now. Role-by-role depth lives in the Work bullets above. This section is the index.

Backend & distributed systems

Shipping microservices continuously since 2019. Event-driven, multi-tenant, cloud-native, observability-first. The foundation the AI work sits on top of.

  • Python · FastAPI, Django, Flask in production
  • Go · low-latency event-driven services
  • Java / Spring Boot · enterprise microservices
  • Node.js / TypeScript · backend services and API layer
  • Event-driven / CQRS · event sourcing, idempotent consumers
  • Multi-tenant SaaS · domain-driven design across products

Inference & serving

Latency-tuned LLM serving for voice and chat. The constraint is p95, not throughput.

  • vLLM · continuous batching, KV / prefix cache
  • LiteLLM · routing across providers, latency / cost tiers
  • AWQ / GPTQ · 4-bit quantization, quality preserved
  • Redis cache · semantic cache over the RAG hot path
  • OpenSearch · hybrid vector + keyword retrieval
  • pgvector · vector search in Postgres
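A semantic cache returns a stored answer when a new query is close enough in embedding space to one already answered. A minimal sketch under stated assumptions: embeddings are supplied by the caller, the 0.92 threshold is an illustrative placeholder, and a linear scan over an in-memory list stands in for Redis and its vector index. It does not use any Redis client API.

```python
import math
from typing import List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold  # assumed similarity cut-off; tune per workload
        self.entries: List[Tuple[List[float], str]] = []

    def get(self, query_vec: List[float]) -> Optional[str]:
        """Best match above the threshold, else None (a miss goes to the LLM)."""
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query_vec: List[float], answer: str) -> None:
        self.entries.append((query_vec, answer))
```

The threshold is the main design choice: set too low, paraphrases of different questions collide; set too high, the cache rarely hits and the hot path gains nothing.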

Models & providers

Open-weights LLMs self-hosted on vLLM, plus managed inference via Bedrock and Groq. Vision-language models paired with detectors; speech in and out in the same loop.

  • SEA-LION v4 · self-hosted on vLLM
  • Qwen 2.5 14B · self-hosted on vLLM
  • Qwen 3.5 · open-weights LLM, self-hosted
  • Gemma 4 27B · open-weights LLM, larger context window
  • Qwen3-VL · vision-language reasoning over detector output
  • Claude (Bedrock) · in the multi-provider routing layer
  • MERaLiON ASR 10B · Singapore-tuned speech recognition
  • Whisper / Google STT-TTS · speech in and out of the voice loop
  • LoRA · fine-tuning open-weights models

Evaluation & governance

Harnesses that catch regressions before production. Models versioned, lineage tracked, rollouts gated.

  • Regression harnesses · quality gates before deploy
  • Drift & decay · monitoring across the LLM lifecycle
  • Eval gates · automated quality CI in the pipeline
  • Versioning · prompt + model with RBAC
  • Lineage · model and prompt audit trail
  • Progressive rollout · safe deployment with rollback
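Progressive rollout with rollback reduces to a loop over traffic stages with a quality check at each step. A minimal sketch: the stage percentages are hypothetical, and `set_traffic` and `quality_ok` are placeholder callbacks standing in for the real traffic-shifting mechanism and the live eval signal.

```python
from typing import Callable, Tuple

# Hypothetical stages: percentage of requests routed to the candidate version.
STAGES: Tuple[int, ...] = (5, 25, 50, 100)


def progressive_rollout(
    set_traffic: Callable[[int], None],
    quality_ok: Callable[[int], bool],
    stages: Tuple[int, ...] = STAGES,
) -> str:
    """Step traffic up stage by stage; roll back to 0% on the first failed check."""
    for pct in stages:
        set_traffic(pct)
        if not quality_ok(pct):
            set_traffic(0)  # rollback: all traffic returns to the stable version
            return f"rolled back at {pct}%"
    return "promoted"
```

Small early stages limit the blast radius: a regression that only shows up under real traffic is caught while most users are still on the stable version.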

Platforms & infra

AWS-native substrate; Kafka for ordered events; Aurora / Dynamo on the hot path; OpenTelemetry from day one.

  • AWS EKS · production substrate, GPU node pools
  • Kafka / MSK · ordered event flow with replay
  • Aurora PG · transactional hot path at trading scale
  • DynamoDB · KV when latency is non-negotiable
  • OpenTelemetry · traces, metrics, alerting
  • MCP / Claude Code · AI-native development tooling
Meet Donna · a featured callout, not part of the resume

Also: I built the chat on this site.

A small live demo, built end-to-end. LinkedIn-gated sign-in, prompt-injection guards, streaming SSE, per-user rate limits, classifier-driven capture. Common production patterns, in miniature. Ask her about my work.

excerpt · live chat · live now
What does Shan do?
Donna:
Currently leads AI/ML at Certis. Voice AI at p95 sub-600ms across 20+ concurrent sessions, multi-agent orchestration, LoRA fine-tuning on operator data. Before Certis: sub-100ms derivatives processing in Go at TP ICAP, and production RAG over 1M+ regulatory documents at RegASK. Production AI on low-latency distributed systems, often in regulated environments. What brought you in?
Built end-to-end
  • Qwen 2.5 72B · served on DeepInfra; OpenAI-compatible streaming
  • Streaming SSE · delta-by-delta, no spinner theatre
  • Prefix caching · static system prompt cached on DeepInfra for fast TTFT
  • LinkedIn OAuth · real visitors only; gates the chat
  • Prompt-injection + PII guards · input + output filters
  • Per-user rate limit · sliding window per LinkedIn sub
  • Classifier-driven capture · Donna decides when to ask
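A per-user sliding-window limit can be sketched as a log of request timestamps keyed by the LinkedIn `sub` (the OpenID Connect subject identifier). The limit and window values are placeholders, and the in-memory dict stands in for whatever shared store the live service uses, which this page does not state.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within any rolling `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_sub: str) -> bool:
        now = self.clock()
        q = self.hits[user_sub]
        while q and now - q[0] >= self.window_s:
            q.popleft()  # drop timestamps that have left the window
        if len(q) >= self.limit:
            return False  # over the limit: reject without recording
        q.append(now)
        return True
```

Unlike a fixed per-minute counter, a rolling window cannot be gamed by bursting at the boundary between two buckets.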

Selected credentials.

The ones that show up in how the work gets done.

  • Model Context Protocol: Advanced Topics · 2026
    Anthropic

    Advanced MCP features including sampling, notifications, roots-based file access, and JSON-RPC transports (stdio, StreamableHTTP) for production servers.

  • Generative AI: Fundamentals to Advanced Techniques · 2025
    National University of Singapore

    NUS programme covering Python, neural networks, transformer models (BERT, GPT), diffusion models, multimodal AI, and a hands-on generative AI capstone.

  • Machine Learning Specialization · 2024
    Stanford University

    Three-course Andrew Ng program covering supervised and unsupervised learning, neural networks, decision trees, recommenders, and reinforcement learning in Python.

  • Advanced Learning Algorithms · 2024
    Stanford University

    Neural networks with TensorFlow, multiclass classification, model evaluation and bias-variance tuning, decision trees, random forests, and boosted tree ensembles.

  • Machine Learning in Production · 2024
    Stanford University

    MLOps foundations: ML project lifecycle, deployment patterns, modeling strategies, baselines, data definition, and production monitoring for deployed models.

  • Generative AI with Large Language Models · 2025
    DeepLearning.AI

    End-to-end LLM lifecycle: transformer architecture, pretraining, fine-tuning, RLHF, scaling laws, and deployment, built with DeepLearning.AI and AWS.

Education
BSc (Hons) Computer Science · University College Dublin, Ireland
2014 – 2018
Get in touch

Tell me what you're building.

A few sentences will do. Recruiter, founder, fellow engineer. Send it directly below, or reach out on LinkedIn.

Already aligned? Skip the back-and-forth and grab 30 minutes on Shan's calendar.

Goes straight to Shan · reply via email, usually within a day.


Or reach me directly
Email · mail@shanwijenayaka.com · direct line
GitHub · github.com/shanwije · selected work
Based in · Singapore · SGT (UTC+8)
Donna · Open the chat ↗ · ask, then route to Shan