Production ML Engineer • Noida, India

I build LLM systems that actually ship to production.

Most recently: architected a multi-tenant RAG pipeline serving 50K+ queries/day with <200ms p99 latency, fine-tuned domain-specific language models that reduced manual review time by 73%, and built MLOps infrastructure that took model deployment from 3 weeks to 4 hours.

50K+ Queries/Day • <200ms P99 Latency • 73% Time Reduction
Prakash Kantumutchu
MLOps Engineer • AI/ML Architect
Azure ML • PyTorch • LangChain • RAG • Fine-tuning
Available for senior roles
Portfolio

Systems I've Built

Production ML infrastructure that handles real traffic and real failures, and delivers real business value. Built for reliability, observability, and iteration speed.

LLM Fine-tuning Infrastructure

  • Trained LoRA adapters on 100M+ token datasets with automated hyperparameter optimization (adapter setup sketched below)
  • Built evaluation frameworks measuring hallucination rates, factual accuracy, and domain coherence
  • Reduced inference costs by 60% through quantization and KV-cache optimization
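
A minimal sketch of the adapter setup from the first bullet, using HuggingFace PEFT and Transformers. The base model, rank, dataset path, and hyperparameters are illustrative placeholders, not the production configuration.

    # LoRA fine-tuning sketch; model, rank, and dataset are illustrative placeholders.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "mistralai/Mistral-7B-v0.1"            # placeholder base model
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Attach low-rank adapters to the attention projections only.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()            # typically well under 1% of base weights

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    data = load_dataset("json", data_files="domain_corpus.jsonl")["train"]
    data = data.map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments("lora-out", per_device_train_batch_size=4,
                               gradient_accumulation_steps=8, num_train_epochs=1,
                               learning_rate=2e-4, logging_steps=50),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained("lora-adapter")         # writes the adapter weights only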

Production RAG Architecture

  • Multi-stage retrieval: keyword → semantic → reranking with cross-encoder models (condensed sketch below)
  • Implemented semantic caching, reducing embedding compute by 40% on repeated queries
  • Built feedback loops for continuous embedding model improvement based on user interactions
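
A condensed sketch of the retrieve-then-rerank flow from the first bullet, using rank_bm25 and sentence-transformers. The corpus, model names, and candidate cut-offs are illustrative assumptions, not the deployed pipeline.

    # BM25 keyword recall -> bi-encoder semantic recall -> cross-encoder rerank.
    # Corpus, model names, and cut-offs are illustrative placeholders.
    from rank_bm25 import BM25Okapi
    from sentence_transformers import CrossEncoder, SentenceTransformer, util

    corpus = ["chunked document text ...", "another chunk ..."]   # placeholder chunks
    bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
    bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)

    def retrieve(query: str, k_keyword: int = 50, k_semantic: int = 50, k_final: int = 5):
        # Stage 1: keyword recall with BM25.
        kw_scores = bm25.get_scores(query.lower().split())
        keyword_ids = sorted(range(len(corpus)), key=lambda i: kw_scores[i], reverse=True)[:k_keyword]
        # Stage 2: semantic recall with the bi-encoder.
        query_emb = bi_encoder.encode(query, convert_to_tensor=True)
        hits = util.semantic_search(query_emb, corpus_emb, top_k=k_semantic)[0]
        # Stage 3: cross-encoder rerank of the merged candidate pool.
        candidates = sorted(set(keyword_ids) | {h["corpus_id"] for h in hits})
        scores = reranker.predict([(query, corpus[i]) for i in candidates])
        ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
        return [corpus[i] for i, _ in ranked[:k_final]]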

MLOps & Orchestration

  • Azure ML pipelines with automatic experiment tracking, model versioning, and A/B deployment
  • Real-time model monitoring detecting distribution shift before accuracy degrades (drift-check sketch below)
  • CI/CD for ML: automated testing for data quality, model performance, and API contracts
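
A stripped-down version of the drift check behind the monitoring bullet: a two-sample Kolmogorov-Smirnov test per feature on a recent serving window. The threshold, window sizes, and feature names are illustrative; the production version runs on streaming windows rather than in-memory arrays.

    # Compare a recent serving window against the training reference, feature by feature.
    # Threshold, window sizes, and feature names are illustrative placeholders.
    import numpy as np
    from scipy.stats import ks_2samp

    def detect_drift(reference, recent, feature_names, p_threshold=0.01):
        """Return the features whose recent distribution differs from the reference."""
        drifted = []
        for j, name in enumerate(feature_names):
            _stat, p_value = ks_2samp(reference[:, j], recent[:, j])
            if p_value < p_threshold:
                drifted.append(name)
        return drifted

    # Synthetic example: one stable feature, one shifted feature.
    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, size=(10_000, 2))
    recent = np.column_stack([rng.normal(0.0, 1.0, 2_000),    # stable
                              rng.normal(0.8, 1.0, 2_000)])   # shifted
    print(detect_drift(reference, recent, ["txn_amount", "doc_length"]))  # expect ['doc_length']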

NLP at Scale

  • Entity extraction pipelines processing 2M+ documents monthly with custom transformer models
  • Multi-label classification achieving 92% F1 on imbalanced datasets through data augmentation
  • Built custom tokenizers and vocabularies for domain-specific text (legal, medical, financial); minimal training sketch below
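
A minimal sketch of the domain-tokenizer training, using the HuggingFace tokenizers library; the corpus file, vocabulary size, and example phrase are placeholders.

    # Train a BPE tokenizer on in-domain text; file and vocab size are placeholders.
    from tokenizers import Tokenizer, models, pre_tokenizers, trainers

    tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

    trainer = trainers.BpeTrainer(
        vocab_size=32_000,
        special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
    )
    tokenizer.train(files=["legal_corpus.txt"], trainer=trainer)
    tokenizer.save("legal-bpe.json")

    # Frequent domain terms learned from in-domain text tend to survive as single
    # tokens instead of being shredded into generic sub-words.
    print(tokenizer.encode("res judicata estoppel").tokens)
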
Approach

Technical Philosophy

I think in systems, not models

A 90% accurate model that deploys reliably beats a 95% accurate model that breaks in production. I care about latency budgets, error handling, monitoring, and what happens when your database goes down at 3 AM.

I optimize for iteration speed

Fast feedback loops matter more than perfect architecture. I build prototypes that fail quickly, then productionize what works. Every pipeline I write has observability baked in from day one.

I read papers, but ship code

Attention mechanisms are elegant. But deployment scripts, error handling, and load testing are what separate demos from products.

Technologies

Tech Stack

Tools I use daily to ship ML systems that work in production environments.

Languages
Python • SQL • R
ML/DL Frameworks
PyTorch • Transformers • LangChain • spaCy • scikit-learn • TensorFlow
LLM Ops
Azure OpenAI • HuggingFace • vLLM • ONNX • Triton
Infrastructure
Azure ML • Docker • Kubernetes • FastAPI • Redis • Airflow • MLflow
Data Platforms
PostgreSQL • MongoDB • Pinecone • Weaviate • PySpark • dbt
Cloud Platforms
Azure (AI-102) • AWS • GCP • Oracle OCI
Current Focus

What I'm Learning Right Now

Agent Architectures

ReAct patterns, function calling, tool use, and memory systems that actually work in production environments.
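
As a concrete reference point, a bare-bones function-calling loop with the OpenAI Python client; the tool, model name, and single-turn routing are toy placeholders rather than a production agent.

    # Single tool-call round trip: the model picks a tool, we run it, then it answers.
    # Tool, model name, and routing are toy placeholders.
    import json
    from openai import OpenAI

    client = OpenAI()

    def lookup_order(order_id: str) -> str:
        return json.dumps({"order_id": order_id, "status": "shipped"})   # stub tool

    tools = [{
        "type": "function",
        "function": {
            "name": "lookup_order",
            "description": "Look up the shipping status of an order by its id.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }]

    messages = [{"role": "user", "content": "Where is order 8812?"}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    msg = response.choices[0].message

    if msg.tool_calls:
        messages.append(msg)
        for call in msg.tool_calls:
            result = lookup_order(**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
        final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
        print(final.choices[0].message.content)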

Multimodal Models

Vision-language models, CLIP embeddings, cross-modal retrieval, and building unified representations.
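
For instance, cross-modal retrieval can be sketched with CLIP embeddings through sentence-transformers; the model name and image paths below are placeholders.

    # Embed images and a text query into the same CLIP space, then rank images by similarity.
    # Model name and file paths are placeholders.
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    clip = SentenceTransformer("clip-ViT-B-32")

    image_paths = ["invoice.png", "contract_scan.png", "site_photo.png"]
    image_emb = clip.encode([Image.open(p) for p in image_paths], convert_to_tensor=True)

    query_emb = clip.encode("a scanned legal contract", convert_to_tensor=True)
    scores = util.cos_sim(query_emb, image_emb)[0]

    best = int(scores.argmax())
    print(image_paths[best], float(scores[best]))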

LLM Inference Optimization

Speculative decoding, continuous batching, PagedAttention, and serving models at scale efficiently.
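
vLLM already bundles continuous batching and PagedAttention behind a small offline API; a minimal sketch, with the model name and prompts as placeholders.

    # Batched offline generation with vLLM; continuous batching and PagedAttention
    # are handled internally by its scheduler. Model name and prompts are placeholders.
    from vllm import LLM, SamplingParams

    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", gpu_memory_utilization=0.90)
    params = SamplingParams(temperature=0.2, max_tokens=256)

    prompts = [
        "Summarize the retention clause in two sentences: ...",
        "List the parties named in this agreement: ...",
    ]

    for output in llm.generate(prompts, params):
        print(output.prompt[:40], "->", output.outputs[0].text[:80])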

Prompt Engineering

Chain-of-thought reasoning, few-shot learning, structured outputs, and reliability patterns.
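
A minimal sketch of the structured-output pattern (few-shot examples plus JSON mode) with the OpenAI Python client; the model name and schema are placeholders.

    # Few-shot extraction constrained to JSON so downstream code can parse it directly.
    # Model name and schema are placeholders.
    import json
    from openai import OpenAI

    client = OpenAI()

    system = (
        "You extract contract metadata. Respond only with JSON using the keys "
        '"party_a", "party_b", and "effective_date" (ISO 8601 or null).'
    )
    few_shot = [
        {"role": "user", "content": "Acme Corp and Beta LLC sign on 3 Jan 2024."},
        {"role": "assistant",
         "content": '{"party_a": "Acme Corp", "party_b": "Beta LLC", "effective_date": "2024-01-03"}'},
    ]

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},    # constrains the reply to valid JSON
        messages=[{"role": "system", "content": system}, *few_shot,
                  {"role": "user", "content": "Gamma Ltd engages Delta GmbH effective 15 March 2025."}],
    )
    print(json.loads(response.choices[0].message.content))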

Let's Connect

Build Something That Works

Interested in Staff/Principal ML Engineer roles building LLM infrastructure, or technical leadership on teams solving hard NLP/GenAI problems at scale.

Location
Noida, India • Open to remote