Most recent: Architected a multi-tenant RAG pipeline serving 50K+ queries/day with <200ms p99 latency. Fine-tuned domain-specific language models that reduced manual review time by 73%. Built MLOps infrastructure that took model deployment from 3 weeks to 4 hours.
Production ML infrastructure that handles real traffic, real failures, and real business value. Built for reliability, observability, and iteration speed.
A 90% accurate model that deploys reliably beats a 95% accurate model that breaks in production. I care about latency budgets, error handling, monitoring, and what happens when your database goes down at 3 AM.
Fast feedback loops matter more than perfect architecture. I build prototypes that fail quickly, then productionize what works. Every pipeline I write has observability baked in from day one.
Attention mechanisms are elegant. But deployment scripts, error handling, and load testing are what separate demos from products.
Tools I use daily to ship ML systems that work in production environments.
ReAct patterns, function calling, tool use, and memory systems that hold up under production traffic.
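A minimal sketch of the ReAct loop these agent systems build on: the model alternates thought, action, and observation until it emits a final answer. The `fake_llm` stub and the `calculator` tool are illustrative stand-ins for a real model call and tool registry.

```python
import re

# Hypothetical tool registry -- names and implementations are illustrative.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def fake_llm(scratchpad: str) -> str:
    """Stand-in for a real model call: first turn requests a tool,
    the next turn answers using the observation."""
    if "Observation:" not in scratchpad:
        return "Thought: I need arithmetic.\nAction: calculator[6 * 7]"
    return "Final Answer: 42"

def react_loop(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(scratchpad)
        scratchpad += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        match = re.search(r"Action: (\w+)\[(.+)\]", step)
        if match:
            name, arg = match.groups()
            observation = TOOLS[name](arg)  # execute the requested tool
            scratchpad += f"\nObservation: {observation}"
    return "no answer within step budget"

print(react_loop("What is 6 * 7?"))  # -> 42
```

The production-hardening work lives in the parts this sketch elides: the step budget, tool-call parsing, and what happens when a tool raises.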
Vision-language models, CLIP embeddings, cross-modal retrieval, and unified text-image representations.
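The core idea behind cross-modal retrieval, sketched with toy vectors: a text encoder and an image encoder project into the same space, so retrieval is nearest-neighbor search over unit vectors. The hard-coded embeddings below stand in for real CLIP encoder outputs.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Toy embeddings standing in for CLIP image-tower outputs -- in practice
# both encoders map into the same d-dimensional space.
IMAGE_INDEX = {
    "dog.jpg": normalize([0.9, 0.1, 0.0]),
    "cat.jpg": normalize([0.1, 0.9, 0.0]),
    "car.jpg": normalize([0.0, 0.1, 0.9]),
}

def embed_text(query: str):
    # Hypothetical text encoder: a real system calls the CLIP text tower.
    table = {"a photo of a dog": [1.0, 0.0, 0.1],
             "a photo of a cat": [0.0, 1.0, 0.1]}
    return normalize(table[query])

def retrieve(query: str) -> str:
    q = embed_text(query)
    # On unit vectors, cosine similarity reduces to a dot product.
    score = lambda v: sum(a * b for a, b in zip(q, v))
    return max(IMAGE_INDEX, key=lambda k: score(IMAGE_INDEX[k]))

print(retrieve("a photo of a dog"))  # -> dog.jpg
```

At production scale the `max` over a dict becomes an approximate nearest-neighbor index, but the similarity math is unchanged.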
Speculative decoding, continuous batching, PagedAttention, and serving models at scale efficiently.
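Speculative decoding in miniature: a cheap draft model proposes k tokens, the expensive target model verifies them in one pass, and mismatches are corrected with the target's own token. This sketch uses a simplified greedy-match acceptance rule rather than the probabilistic min(1, p/q) acceptance of the full algorithm; both "models" are hard-coded lookup stubs.

```python
def draft_model(prefix):
    # Cheap draft model: proposes the next token greedily (stubbed here).
    guesses = {"the": "quick", "quick": "brown", "brown": "fox", "fox": "jumps"}
    return guesses.get(prefix[-1], "<eos>")

def target_model(prefix):
    # Expensive target model: the distribution we must match.
    truth = {"the": "quick", "quick": "brown", "brown": "dog", "dog": "barks"}
    return truth.get(prefix[-1], "<eos>")

def speculative_decode(prompt, k=3, max_tokens=8):
    tokens = list(prompt)
    while len(tokens) < max_tokens and tokens[-1] != "<eos>":
        # 1. Draft k tokens cheaply, autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Verify all k in one (conceptual) target forward pass: keep the
        #    longest matching prefix, then take the target's correction.
        accepted, ctx = [], list(tokens)
        for t in draft:
            want = target_model(ctx)
            if t == want:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(want)  # target's token replaces the miss
                break
        else:
            accepted.append(target_model(ctx))  # bonus token on full accept
        tokens.extend(accepted)
    if "<eos>" in tokens:
        tokens = tokens[:tokens.index("<eos>") + 1]
    return tokens

print(speculative_decode(["the"]))
```

The speedup comes from step 2: one target pass can accept several draft tokens, so the expensive model runs far fewer times than tokens generated.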
Chain-of-thought reasoning, few-shot learning, structured outputs, and reliability patterns.
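One of the reliability patterns this refers to, sketched: parse the model's reply as JSON, validate it against the expected keys, and re-prompt on failure instead of trusting the first response. The `stub_llm` function and the schema keys are illustrative; a real system would call an actual model and a proper schema validator.

```python
import json

SCHEMA_KEYS = {"sentiment", "confidence"}  # required output fields

def stub_llm(prompt: str, attempt: int) -> str:
    # Stub model: the first reply is malformed, the retry is valid JSON.
    if attempt == 0:
        return "Sure! The sentiment is positive."
    return '{"sentiment": "positive", "confidence": 0.92}'

def get_structured(prompt: str, max_attempts: int = 3) -> dict:
    """Parse, validate, and retry -- never trust the first reply."""
    for attempt in range(max_attempts):
        raw = stub_llm(prompt, attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed -> retry with the same prompt
        if SCHEMA_KEYS <= set(data):
            return data  # all required keys present
    raise ValueError("no valid structured output within retry budget")

result = get_structured("Classify: 'Great product!' Respond as JSON "
                        "with keys sentiment and confidence.")
print(result["sentiment"])  # -> positive
```

The retry budget is the same idea as the latency budget above: a bounded, observable failure mode instead of an unbounded one.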
Interested in Staff/Principal ML Engineer roles building LLM infrastructure, or technical leadership in teams solving hard NLP/GenAI problems at scale.