Who is Shan Wijenayaka?

Shan Wijenayaka is a Lead AI/ML Engineer based in Singapore with nine years of experience building production systems across AI, FinTech, and cloud. He has shipped sub-100ms derivatives trading at TP ICAP, sub-600ms real-time voice AI at Certis, and regulatory AI over 1M+ documents at RegASK.

Is Shan Wijenayaka open to new roles?

Yes. Shan is selectively open to Staff and Lead AI engineering roles based in Singapore. The fastest way to reach him is via LinkedIn (linkedin.com/in/shanwije) or the contact form on shanwijenayaka.com.

What is Shan Wijenayaka's technical expertise?

Shan specializes in production AI on low-latency distributed systems: LLM serving and inference (vLLM, LiteLLM, AWQ quantization), multi-agent systems and LangGraph, RAG, LoRA fine-tuning, model governance, and real-time voice AI. He builds backend services in Go, Python, Java, and Rust on AWS, Kafka, and Kubernetes.

Where is Shan Wijenayaka based?

Shan is based in Singapore and is currently a Lead AI/ML Engineer at Certis.

What is Shan currently focused on?

Shan is currently focused on production LLM evaluation harnesses, multi-agent systems, and low-latency voice infrastructure.

Based in Singapore

Shan Wijenayaka.

Lead Engineer. Production AI on low-latency distributed systems. Hardened in FinTech, running at enterprise scale.

Nine years across AI, FinTech, and Cloud. Microservices in Python, Go, Java, and Rust as the foundation; the last three years on LLM systems built on top: evaluation harnesses, multi-agent workflows, LoRA fine-tuning, and the inference stack underneath.

<600ms · Voice AI p95 · Certis · '26
<100ms · Trade processing · TP ICAP · '24
99.95% · Platform uptime · RegASK · '25
1M+ · Regulatory docs · RegASK · '25

Open to

Staff and lead AI engineering roles in Singapore. Selective.

Currently focused on

Production LLM evals · multi-agent systems · low-latency voice infrastructure.

Work

A trail of production systems.

From derivatives microservices to production LLM platforms. Each role a different cut at low-latency, correctness-critical systems. Most recent first.

Lead AI/ML Engineer

Certis · AI Platform & Security Operations

Built and shipped the enterprise Voice AI platform end-to-end; <600ms p95 across 20+ concurrent sessions.

Designed and shipped the AI platform end-to-end: real-time Voice AI, multi-agent orchestration, vision-grounded agents, and centralized model governance. Now running in production at enterprise scale.

Multi-agent orchestration: LangGraph topology with LiteLLM routing across SEA-LION v4 (self-hosted vLLM) for complex reasoning, Qwen 2.5 14B as the fast self-hosted path, and Claude Sonnet/Haiku via Bedrock as managed fallback. Iterated on prompts, agent topologies, and workflow tradeoffs in production. Latency- and cost-tiered model selection.
LoRA fine-tuning on real operator data: fine-tuned Qwen 7-8B on three months of operator-call recordings for SOP extraction. Surfaces relevant procedures in the operator UI as conversations unfold.
CV + VLM-grounded agents: YOLO-family detection paired with Qwen3-VL for grounded reasoning and natural-language queries from agents. Production workflows: lost-and-found, hazard recognition, vehicle/parking, occupancy monitoring.
p95 <600ms on voice: across 20+ concurrent SIP/WebRTC sessions, using vLLM continuous batching, KV/prefix caching, AWQ quantization, Redis semantic caching, and prompt compression on AWS EKS.
Prototype-to-production lifecycle: prompt/model versioning, RBAC, audit logging, lineage. Automated evaluation gates, progressive rollout, and rollback on quality regression across Bedrock, SageMaker, vLLM, and HuggingFace endpoints.
Team build + architecture: hired and grew the AI engineering team. Defined the architecture review process, weekly design docs, and on-call rotation. Set engineering standards for evaluation gates, rollout, and rollback.
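
As an illustration only (not the code running at Certis), latency- and cost-tiered model selection of the kind described above could be sketched like this. The tier labels, latency figures, costs, and the `select_tier` helper are all invented for the example; a real router would read live health and latency data instead of constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real deployment name
    p95_latency_ms: int        # assumed latency figure, for illustration only
    cost_per_1k_tokens: float  # assumed cost figure, for illustration only
    max_complexity: int        # highest task complexity (1-3) this tier handles


# Illustrative numbers only; none of these come from a real system.
TIERS = [
    ModelTier("fast-self-hosted", 150, 0.02, 1),
    ModelTier("reasoning-self-hosted", 450, 0.08, 3),
    ModelTier("managed-fallback", 900, 0.30, 3),
]


def select_tier(complexity, latency_budget_ms, unavailable=frozenset()):
    """Pick the cheapest available tier that fits the task and the budget.

    If nothing fits the latency budget, fall back to the most capable
    available tier, preferring the lower-latency one on ties.
    """
    available = [t for t in TIERS if t.name not in unavailable]
    if not available:
        raise RuntimeError("no model tier available")
    fitting = [
        t for t in available
        if t.max_complexity >= complexity
        and t.p95_latency_ms <= latency_budget_ms
    ]
    if fitting:
        return min(fitting, key=lambda t: t.cost_per_1k_tokens)
    return max(available, key=lambda t: (t.max_complexity, -t.p95_latency_ms))
```

For example, `select_tier(1, 600)` picks the cheap fast path, while `select_tier(3, 600, unavailable={"reasoning-self-hosted"})` degrades to the managed fallback rather than failing the request.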

LangGraph · LiteLLM · vLLM · Bedrock · Qwen3-VL · AWS EKS

Languages: Python · Go · Rust · JavaScript · TypeScript
Frameworks: FastAPI
AI / ML: LangGraph · LangChain · LiteLLM · vLLM · PyTorch · YOLO · LLaMA · Large Language Models (LLM) · Large Language Model Operations (LLMOps) · Retrieval-Augmented Generation (RAG) · Vector Databases · Word Embeddings · Prompt Engineering · Fine Tuning · LoRA · AI Agents · Multi-agent Systems · Voice AI · Speech Recognition · Computer Vision · Generative AI · Artificial Intelligence (AI) · Applied Machine Learning · Machine Learning · MLOps · Data Science · AI Infrastructure
Cloud & Platform: Amazon Web Services (AWS) · Amazon Bedrock · AWS SageMaker · Amazon EKS · Amazon DynamoDB · Kubernetes · Docker · Helm Charts · Argo · Cloud Computing
Data: PostgreSQL · SQL · Relational Databases
Networking & APIs: WebRTC · SIP · gRPC · REST APIs · API Development
Observability & DevOps: OpenTelemetry · GitHub · DevOps · Continuous Integration (CI)
Architecture: Event Driven Programming · Concurrent Programming · Distributed Systems · Performance Tuning · Low Latency · Software Architecture · Software Design Patterns · Complex Systems · Algorithms · Scalability
Process: Code Review · Agile Methodologies

Senior Software Engineer

RegASK · Regulatory Intelligence Platform

Sole architect on a production LLM platform over 1M+ regulatory docs; uptime raised from 99.5% to 99.95%, cloud spend cut ~35%.

Sole architect on a production LLM-powered regulatory platform. Search and Q&A over 1M+ regulatory documents, full lifecycle from retrieval pipeline through evaluation, deployment, and operations.

Eval + regression harnesses: tracking answer quality, retrieval drift, and model decay. Gated every deployment and caught regressions before production in a compliance-critical environment.
LLM + RAG over 1M+ documents: FastAPI platform with ELT pipelines for ingestion, chunking, embedding, and indexing. Translated regulatory analyst workflows into shippable product features through direct stakeholder partnership.
99.5% to 99.95% uptime: diagnosed reliability bottlenecks through targeted instrumentation. Error rates down ~20% over the same period.
~35% cloud spend cut: right-sized compute and tuned auto-scaling policies. Zero SLA regressions.
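
To make "gated every deployment" concrete, here is a minimal sketch of a regression gate, assuming higher-is-better metric scores in the 0 to 1 range. The function name `gate_deployment`, the metric names, and the 0.02 tolerance are assumptions for illustration, not the harness used at RegASK.

```python
def gate_deployment(baseline, candidate, max_drop=0.02):
    """Compare a candidate eval run against the baseline run.

    Both arguments map metric name -> score (higher is better).
    Returns (passed, failures): the gate fails if any baseline metric
    is missing from the candidate or drops by more than max_drop.
    """
    failures = []
    for metric, base_score in baseline.items():
        cand_score = candidate.get(metric)
        if cand_score is None:
            failures.append(f"{metric}: missing from candidate run")
        elif base_score - cand_score > max_drop:
            failures.append(f"{metric}: {base_score:.3f} -> {cand_score:.3f}")
    return (not failures, failures)
```

A CI step could call this with the last released run as `baseline` and block the rollout whenever `passed` is false, printing `failures` as the reason.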

LangChain · RAG · MongoDB Atlas Vector · AWS EKS

Languages: Python · JavaScript · TypeScript
Frameworks: FastAPI · Django · Node.js · React.js
AI / ML: LangChain · Retrieval-Augmented Generation (RAG) · Large Language Models (LLM) · Large Language Model Operations (LLMOps) · Prompt Engineering · PyTorch · LLaMA · Pandas (Software) · Generative AI · Applied Machine Learning · Machine Learning · Artificial Intelligence (AI) · Data Science
Cloud & Platform: Amazon Web Services (AWS) · Kubernetes · Docker · Terraform · Cloud Computing
Data: SQL · NoSQL · Database Design
Networking & APIs: REST APIs · API Development
Observability & DevOps: OpenTelemetry · Jenkins · Continuous Integration (CI) · DevOps · Linux
Architecture: Microservices · Distributed Systems · Concurrent Programming · Multithreading · Functional Programming · Low Latency · High Throughput · Software Architecture · Software Design Patterns · Complex Systems · Algorithms · Data Structures · Scalability
Leadership: Leading Development Teams · Leadership
Process: Code Review · Agile Methodologies

Senior Software Engineer

TP ICAP · Derivatives Trading Systems

Java 17 + Spring Boot 3 derivatives stack plus new event-driven Go microservices; ~35k trades/day at sub-100ms, zero-downtime AWS cutover across 12+ services.

Daily production work at the world's largest interdealer broker during a legacy-to-cloud migration. Existing Java 17 + Spring Boot 3 derivatives stack, new event-driven Go microservices, Node.js for IO-heavy auxiliary work. Strict consistency and ordering across OTC derivatives.

Java 17 + Spring Boot 3 in production: daily work on the legacy derivatives stack through May 2024, covering bug fixes, feature additions, and integration with new Go-side services via Kafka, inside a regulated SDLC.
~35k trades/day · sub-100ms: designed new event-driven Go microservices end-to-end for interest rate and currency swap instruments. Idempotent, replay-safe consumers and consistency checks under high-throughput and failure scenarios.
Zero-downtime migration · 12+ services: Java/MSSQL to AWS-native (Kafka/MSK, DynamoDB Streams, Aurora PostgreSQL). Incremental extraction over big-bang; zero trading interruptions.
ML in the pre-clearing loop: anomaly and consistency scoring with explainable flags and a reviewer feedback loop. Downstream operational breaks reduced.
OTel + SLI/SLO baselines: instrumented the full service mesh. Unknown tail-latency causes down ~40%. BDD/Cucumber on Jenkins inside a regulated SDLC reduced defect escape ~30%.
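
The "idempotent, replay-safe consumers" idea can be sketched in a few lines. The production services were written in Go; this Python version is illustrative only, and the event shape (`trade_id`, `seq`, `notional`) and class name are hypothetical. It keeps the deduplication state in memory, whereas a real consumer would persist it in the same transaction as the state change.

```python
class IdempotentTradeConsumer:
    """Applies each (trade_id, seq) at most once, so replays are harmless."""

    def __init__(self):
        self._last_applied = {}  # trade_id -> highest sequence applied so far
        self.positions = {}      # trade_id -> latest notional (toy state)

    def handle(self, event):
        """Apply the event if it is new; return False for duplicates/replays."""
        trade_id, seq = event["trade_id"], event["seq"]
        if seq <= self._last_applied.get(trade_id, 0):
            return False  # already applied: skip without side effects
        self.positions[trade_id] = event["notional"]
        self._last_applied[trade_id] = seq
        return True
```

Feeding the same batch of events through `handle` twice leaves `positions` unchanged after the first pass, which is the property that makes replaying a Kafka partition after a failure safe.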

Java 17 · Spring Boot 3 · Go · Kafka · MSK · DynamoDB · Aurora PostgreSQL

Languages: Java · Go · Rust · Python · JavaScript · TypeScript
Frameworks: Spring Boot · Django · Node.js · React.js
AI / ML: Generative AI · Large Language Models (LLM) · Artificial Intelligence (AI) · Prompt Engineering
Cloud & Platform: Amazon Web Services (AWS) · AWS EKS · Kubernetes · Istio · Docker · Terraform · Cloud Computing
Data: PostgreSQL · Aurora PostgreSQL · DynamoDB · Redis · Kafka · MSK · SQL · NoSQL · Database Design
Networking & APIs: REST APIs · gRPC · API Development
Observability & DevOps: OpenTelemetry · Prometheus · Grafana · Locust · Jenkins · Continuous Integration (CI) · DevOps · BDD/Cucumber · Linux
Architecture: Microservices · Distributed Systems · Event Driven Programming · Concurrent Programming · Multithreading · Functional Programming · Low Latency · High Throughput · Software Architecture · Software Design Patterns · Complex Systems · Algorithms · Data Structures · Scalability
Domain: Trading · FinTech
Leadership: Leading Development Teams · Leadership
Process: Code Review · Agile Methodologies

Senior Software Engineer

Chope · Consumer Dining Platform

Monolith-to-microservices decomposition on a high-traffic APAC reservation platform; latency held through dinner-hour spikes, per-team service ownership.

Modernized the backend of a high-traffic reservation platform: monolith to microservices, keeping real-time booking and availability flows running across APAC markets.

Peak-hour SLAs held: defined service boundaries, rate-limiting, and caching strategies that kept latency stable under dinner-hour traffic spikes.
Real-time orchestration: microservices for restaurant availability, booking, and waitlist, serving concurrent users across multiple markets.
Independently deployable services: monolith decomposition with per-team ownership and faster release cycles.

Java · Spring Boot · Go · Python · PostgreSQL · Redis · AWS

Languages: Java · Go · Python · JavaScript · TypeScript
Frameworks: Spring Boot · Django · Node.js · React.js
AI / ML: Pandas (Software)
Cloud & Platform: Amazon Web Services (AWS) · Google Cloud Platform (GCP) · Kubernetes · Istio · Docker · Terraform · Cloud Computing
Data: PostgreSQL · Redis · Kafka · SQL · NoSQL · Database Design
Networking & APIs: REST APIs · API Development
Observability & DevOps: Jenkins · Continuous Integration (CI) · DevOps · Locust · Linux
Architecture: Microservices · Distributed Systems · Concurrent Programming · Multithreading · Functional Programming · Low Latency · High Throughput · Software Architecture · Software Design Patterns · Complex Systems · Algorithms · Data Structures · Scalability
Domain: FinTech
Leadership: Leading Development Teams · Leadership
Process: Code Review · Agile Methodologies

Earlier · 2017 – 2021

Built a 37M-subscriber USSD platform from scratch at Omobio, then senior backend / full-stack work for Wiley and Sysco across distributed US teams. Senior since 2019.

What I build with

The stack, not the taxonomy.

A cross-cutting view of what's earning its keep right now. Role-by-role depth lives in the Work bullets above. This section is the index.

Backend & distributed systems

Microservices continuously since 2019. Event-driven, multi-tenant, cloud-native, observability-first. The foundation the AI work sits on top of.

Python · FastAPI, Django, Flask in production
Go · low-latency event-driven services
Java / Spring Boot · enterprise microservices
Node.js / TypeScript · backend services and API layer
Event-driven / CQRS · event sourcing, idempotent consumers
Multi-tenant SaaS · domain-driven design across products

Inference & serving

Latency-tuned LLM serving for voice and chat. The constraint is p95, not throughput.

vLLM · continuous batching, KV / prefix cache
LiteLLM · routing across providers, latency / cost tiers
AWQ / GPTQ · 4-bit quantization, quality preserved
Redis cache · semantic cache over the RAG hot path
OpenSearch · hybrid vector + keyword retrieval
pgvector · vector search in Postgres

Models & providers

Open-weights LLMs self-hosted on vLLM, plus managed inference via Bedrock and Groq. Vision-language models paired with detectors; speech in and out in the same loop.

SEA-LION v4 · self-hosted on vLLM
Qwen 2.5 14B · self-hosted on vLLM
Qwen 3.5 · open-weights LLM, self-hosted
Gemma 4 27B · open-weights LLM, larger context window
Qwen3-VL · vision-language reasoning over detector output
Claude (Bedrock) · in the multi-provider routing layer
MERaLiON ASR 10B · Singapore-tuned speech recognition
Whisper / Google STT-TTS · speech in and out of the voice loop
LoRA · fine-tuning open-weights models

Evaluation & governance

Harnesses that catch regressions before production. Models versioned, lineage tracked, rollouts gated.

Regression harnesses · quality gates before deploy
Drift & decay · monitoring across the LLM lifecycle
Eval gates · automated quality CI in the pipeline
Versioning · prompt + model with RBAC
Lineage · model and prompt audit trail
Progressive rollout · safe deployment with rollback

Platforms & infra

AWS-native substrate; Kafka for ordered events; Aurora / Dynamo on the hot path; OpenTelemetry from day one.

AWS EKS · production substrate, GPU node pools
Kafka / MSK · ordered event flow with replay
Aurora PG · transactional hot path at trading scale
DynamoDB · KV when latency is non-negotiable
OpenTelemetry · traces, metrics, alerting
MCP / Claude Code · AI-native development tooling

On GitHub

Still shipping in public.

Most of the heavy lifting lives in private and company repos, but the public cadence keeps ticking. A year of commits at a glance.

@shanwije on GitHub →

Meet Donna · a featured callout · not part of the resume

Also - I built the chat on this site.

A small live demo, built end-to-end. LinkedIn-gated sign-in, prompt-injection guards, streaming SSE, per-user rate limits, classifier-driven capture. Common production patterns, in miniature. Ask her about my work.

excerpt · live chat · live now

What does Shan do?

Currently leads AI/ML at Certis. Voice AI at p95 sub-600ms across 20+ concurrent sessions, multi-agent orchestration, LoRA fine-tuning on operator data. Before Certis: sub-100ms derivatives processing in Go at TP ICAP, and production RAG over 1M+ regulatory documents at RegASK. Production AI on low-latency distributed systems, often in regulated environments. What brought you in?

opens the chat · LinkedIn sign-in

Built end-to-end

Qwen 2.5 72B · served on DeepInfra; OpenAI-compatible streaming
Streaming SSE · delta-by-delta, no spinner theatre
Prefix caching · static system prompt cached on DeepInfra for fast TTFT
LinkedIn OAuth · real visitors only; gates the chat
Prompt-injection + PII guards · input + output filters
Per-user rate limit · sliding window per LinkedIn sub
Classifier-driven capture · Donna decides when to ask
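
For readers curious what a per-user sliding-window limit looks like, here is a minimal in-memory sketch. It is not the code behind this site's chat: the class name, the injectable `clock` parameter, and the single-process storage are assumptions for illustration (a deployed version would more likely keep the window in a shared store such as Redis).

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per user within the last window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock               # injectable so tests can fake time
        self._hits = defaultdict(deque)   # user id -> timestamps of recent hits

    def allow(self, user_id):
        now = self._clock()
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Keying the limiter on a stable identifier (here, the LinkedIn `sub` claim mentioned above) means one visitor cannot reset the window by opening a new session.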

Built by Shan. Not a wrapper around a chatbot SDK.

Selected credentials.

The ones that show up in how the work gets done.

Model Context Protocol: Advanced Topics · 2026
Anthropic
Advanced MCP features including sampling, notifications, roots-based file access, and JSON-RPC transports (stdio, StreamableHTTP) for production servers.
Generative AI: Fundamentals to Advanced Techniques · 2025
National University of Singapore
NUS programme covering Python, neural networks, transformer models (BERT, GPT), diffusion models, multimodal AI, and a hands-on generative AI capstone.
Machine Learning Specialization · 2024
Stanford University
Three-course Andrew Ng program covering supervised and unsupervised learning, neural networks, decision trees, recommenders, and reinforcement learning in Python.
Advanced Learning Algorithms · 2024
Stanford University
Neural networks with TensorFlow, multiclass classification, model evaluation and bias-variance tuning, decision trees, random forests, and boosted tree ensembles.
Machine Learning in Production · 2024
Stanford University
MLOps foundations: ML project lifecycle, deployment patterns, modeling strategies, baselines, data definition, and production monitoring for deployed models.
Generative AI with Large Language Models · 2025
DeepLearning.AI
End-to-end LLM lifecycle: transformer architecture, pretraining, fine-tuning, RLHF, scaling laws, and deployment, built with DeepLearning.AI and AWS.

Education

BSc (Hons) Computer Science · University College Dublin, Ireland

2014 – 2018

Get in touch

Tell me what you're building.

A few sentences works. Recruiter, founder, fellow engineer. Send it directly below, or reach out on LinkedIn.

Already aligned? Skip the back-and-forth and grab 30 minutes on Shan's calendar.

Or reach me directly

Email · mail@shanwijenayaka.com · direct line

LinkedIn · linkedin.com/in/shanwije · DM open

GitHub · github.com/shanwije · selected work

Based in · Singapore · SGT (UTC+8)

Donna · Open the chat ↗ · ask, then route to Shan