ombharatiya/ai-system-design-guide

🧠 AI System Design Guide

The Complete Interview & Production Reference

GitHub Twitter LinkedIn


The living reference for production AI systems. Continuously updated. Interview-ready depth.

A practical, continuously updated guide to AI system design, RAG architectures, LLM engineering, agentic AI, MCP and A2A protocols, and AI engineering interview preparation. Covers production patterns, model selection, evaluation, and real-world case studies from staff-level interviews.

New here? Jump to the 110-question Interview Bank, the RAG Fundamentals chapter, or pick the right LLM for production.


📚 Quick Navigation

| I want to... | Start here |
| --- | --- |
| Prepare for interviews | Question Bank, Answer Frameworks |
| Learn AI systems fast | LLM Internals, RAG Fundamentals |
| Build production RAG | Chunking, Vector DBs, Reranking, Production RAG |
| Advanced retrieval | Contextual Retrieval, ColBERT, Multi-modal RAG |
| Design multi-tenant AI | Isolation Patterns, Case Study |
| Build agents | Agent Fundamentals, MCP & A2A, LangGraph |
| Tool-use & computer agents | Landscape, OpenClaw, Safety |
| Autonomous coding agents | Claude Code, OpenCoder Landscape |
| Pick the right model (2026) | Model Taxonomy, Pricing |
| Evaluate AI in production | AI Evals Guide (Phoenix/Langfuse), AI Evals Guide (LangWatch/Langfuse) |
| Find the best courses to learn AI | Recommended Courses & Learning Paths |
| Transition from my current role to AI | Role Transition Guide |
| Understand the 2026 AI job market | Job Market Trends - May 2026 (NEW) |
| Look up a term | Glossary (every term defined) |

Pick a path

flowchart TD
    A[New visitor] --> B{Your goal}
    B -->|Interview prep| C[Question Bank]
    B -->|Build RAG| D[RAG Fundamentals]
    B -->|Build agents| E[Agent Fundamentals]
    B -->|Pick a model| F[Model Taxonomy]
    B -->|Evaluate AI| G[AI Evals Guide]
    C --> H[Answer Frameworks]
    D --> I[Chunking + Vector DBs]
    E --> J[MCP and Tool Use]
    F --> K[Pricing 2026]
    G --> L[Phoenix or LangWatch]

🎯 Why This Guide

Traditional books are outdated before they ship. This is a living document: when new models ship or patterns evolve, the guide is updated.

| This Guide | Printed Books |
| --- | --- |
| May 2026 models (Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, DeepSeek V4 Pro, Llama 4, Kimi K2.6, Qwen 3.6, Mistral Medium 3.5, Gemma 4) | Stuck on GPT-4 |
| MCP 2.0, A2A v1.0, OpenClaw, Computer Use, Agentic RAG, ColBERT, latent reasoning, MoE serving | Does not exist |
| Real pricing with May 2026 verification dates | Already wrong |
| Staff-level interview Q&A (110 questions through May 2026) + Job Market Trends | Generic questions |

Quick model picker (May 2026): Claude Opus 4.7 for tool-use and long-context reasoning, GPT-5.5 for general production, Gemini 3.1 Pro for multimodal, DeepSeek V4 Pro and Llama 4 for self-hosted. Full breakdown in Model Taxonomy.


🎯 What This Guide Is (and Is Not)

This guide IS:

  • A staff-level reference for designing production AI systems (RAG, agents, MCP, eval pipelines, multi-tenant isolation).
  • An interview-prep companion with 110+ real questions, answer frameworks, and whiteboard exercises through May 2026.
  • A living document tracking new model releases, protocol changes, and emerging patterns as they ship.
  • Opinionated about tradeoffs: latency vs cost, accuracy vs faithfulness, single-agent vs multi-agent.
  • Free, MIT-licensed, and open to PRs from practitioners.

This guide is NOT:

  • A tutorial on Python, PyTorch, or basic ML fundamentals (start with a course; see COURSES.md).
  • A vendor-neutral hedge; it names specific models, prices, and frameworks because real systems require real choices.
  • A replacement for hands-on building; read it alongside a project, not instead of one.
  • A research paper digest; it cites papers when they change practice, not for completeness.

📖 Guide Structure

├── 00-interview-prep/           # Questions (110), frameworks, exercises, job-market trends (May 2026)
├── 01-foundations/              # Transformers, attention, embeddings
├── 02-model-landscape/          # Claude Opus 4.7, GPT-5.5, Gemini 3.1, DeepSeek V4, Llama 4, Kimi K2.6, Qwen 3.6, Mistral Medium 3.5
├── 03-training-and-adaptation/  # Fine-tuning, LoRA, DPO, distillation
├── 04-inference-optimization/   # KV cache, PagedAttention, vLLM
├── 05-prompting-and-context/    # Prompt engineering, CoT, Extended Thinking, DSPy, prompt injection
├── 06-retrieval-systems/        # RAG, chunking, GraphRAG, Agentic RAG, ColBERT, Contextual Retrieval
├── 07-agentic-systems/          # MCP 2.0, A2A protocol, multi-agent, computer-use
├── 08-memory-and-state/         # L1-L3 memory tiers, Mem0, caching
├── 09-frameworks-and-tools/     # LangGraph, DSPy, LlamaIndex, Claude Code, OpenCoder
├── 10-document-processing/      # Vision-LLM OCR, multimodal parsing
├── 11-infrastructure-and-mlops/ # GPU clusters, LLMOps, cost management
├── 12-security-and-access/      # RBAC, ABAC, multi-tenant isolation
├── 13-reliability-and-safety/   # Guardrails, red-teaming
├── 14-evaluation-and-observability/ # RAGAS, LangSmith, drift detection
├── 15-ai-design-patterns/       # Pattern catalog, anti-patterns
├── 16-case-studies/             # Real-world architectures with diagrams
├── 17-tool-use-and-computer-agents/ # OpenClaw, Computer Use, tool agents, safety
├── GLOSSARY.md                  # Every term defined
│
├── ai_evals_comprehensive_study_guide.md      # 🔬 Deep-dive: AI Evals (Phoenix + Langfuse)
├── ai_evals_complete_guide_langwatch_langfuse.md  # 🔬 Deep-dive: AI Evals (LangWatch + Langfuse)
├── COURSES.md                   # 🎓 Recommended courses & learning paths
└── TRANSITION_GUIDE.md          # 🔄 Transition from Backend/QA/PM/EM to AI roles

Chapters by AI System Lifecycle Stage

mindmap
  root((AI System Design Guide))
    Foundations
      LLM Internals
      Model Landscape
      Training and Adaptation
    Build
      Prompting and Context
      Retrieval Systems
      Agentic Systems
      Tool Use and Computer Agents
    Operate
      Inference Optimization
      Memory and State
      Frameworks and Tools
      Infrastructure and MLOps
    Govern
      Security and Access
      Reliability and Safety
      Evaluation and Observability
    Apply
      Design Patterns
      Case Studies
      Interview Prep

🔥 Featured Case Studies

Real interview problems with complete solutions and diagrams:

| Case Study | Problem | Key Patterns |
| --- | --- | --- |
| Real-Time Search | 5-minute data freshness at scale | Streaming + Hybrid Search |
| Coding Agent | Autonomous multi-file changes | Sandboxing + Self-Correction |
| Multi-Tenant SaaS | Coca-Cola and Pepsi on same infra | Defense-in-Depth Isolation |
| Customer Support | 60% auto-resolution rate | Tiered Routing + Escalation |
| Document Intelligence | 50K contracts/month extraction | Vision-LLM + Parallel Extractors |
| Recommendation Engine | Personalized explanations at 50M users | ML Ranking + LLM Explanations |
| Compliance Automation | FDA regulation pre-screening | Claim Extraction + Precedent DB |
| Voice Healthcare | Real-time clinical note generation | On-Prem ASR + HIPAA |
| Fraud Detection | 100ms decision with explainability | ML + Rules Hybrid |
| Knowledge Management | 2M docs with access control | Permission-Aware RAG |
| Computer-Use Agent | Expense-report automation across 3 legacy UIs | Firecracker VMs + Action Gate + IPI Defense |
| Multi-Tenant Fine-Tuning | 280 tenants on shared base + per-tenant LoRA | LoRA Hot-Swap + Eval-as-PRD per Tenant |
| Eval-Gated CI/CD | Block PRs that regress AI quality | Golden Sets + LLM Judges + Statistical Correction |
| Customer Distillation | Cut $50K/mo frontier spend to $6K with 3-mo payback | Trace-Based Distillation + Canary Rollout |
| MCP Knowledge Agent | Cross-system answers from Snowflake/Confluence/Jira/Slack | MCP + OAuth Resource Server + Capability Gating |

🔬 Bonus Deep-Dive Guides

Two companion guides (3,000+ lines each) covering AI evaluation end to end, written for engineers, PMs, and QA:

| Guide | Platforms Covered | What's Inside |
| --- | --- | --- |
| AI Evals: Comprehensive Study Guide | Arize Phoenix + Langfuse | LLM-as-a-Judge, RAG eval, multi-turn eval, production safety, statistical correction with judgy, 30-day learning path |
| AI Evals: LangWatch + Langfuse Guide | LangWatch + Langfuse | Same syllabus with LangWatch's 40+ built-in evaluators, side-by-side platform comparisons, platform choice guidance |

Topics covered across both guides:

  • Tracing and observability setup (Phoenix, LangWatch, Langfuse)
  • Error analysis: open coding → axial coding → failure mode taxonomy
  • Building LLM judges with Train/Dev/Test split and ground truth calibration
  • Code-based evaluators (regex, JSON schema, format validators)
  • RAG-specific evals: faithfulness, context recall, answer relevance
  • Multi-step pipeline evaluation and multi-turn conversation eval
  • Production guardrails, safety monitoring, real-time drift detection
  • Statistical correction with the judgy library
  • Human annotation best practices and inter-rater reliability
  • Cost/latency optimization for eval pipelines at scale

🎓 For Interview Prep

AI engineering and system design interviews ask questions like:

"Design a multi-tenant RAG system where competitors cannot see each other's data."

"Your agent takes 15 steps for a 3-step task. How do you debug it?"

This guide gives you concrete patterns, real tradeoffs, and production failure modes: the depth interviewers expect at senior levels.

➡️ Start with Interview Prep


❓ Frequently Asked Questions

What is AI system design?

AI system design is the discipline of architecting production-grade systems built around LLMs, retrieval, agents, and evaluation. It covers model selection, RAG pipelines, agent orchestration, memory, observability, and safety. See LLM Internals and AI Design Patterns to get oriented.

How do I prepare for an AI engineering interview?

Start with the Question Bank (110 questions through May 2026), then practice with Answer Frameworks and Whiteboard Exercises. Most senior interviews test RAG design, agent debugging, multi-tenant isolation, and cost/latency tradeoffs, all covered in the Case Studies.

What is RAG (Retrieval-Augmented Generation)?

RAG is a pattern where an LLM retrieves relevant context from an external knowledge source (vector DB, search index, graph) before generating an answer, reducing hallucinations and grounding responses in your data. The full pipeline is covered in RAG Fundamentals and scaled in Production RAG at Scale.
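
To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not the pipeline from the RAG Fundamentals chapter: the bag-of-words cosine scorer stands in for an embedding model and vector DB, the sample documents are made up, and the final LLM call is left out, so the sketch stops at the grounded prompt.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and strip basic punctuation; a real system would use an embedding model.
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    # Ground the generation step by placing retrieved context ahead of the question.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base, for illustration only.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders over 50 dollars.",
]

prompt = build_prompt("How long do refunds take?", DOCS, k=1)
```

The resulting `prompt` is what would be sent to the model; swapping `retrieve` for a vector search changes the ranking quality, not the shape of the flow.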

What are AI agents and how are they different from chatbots?

AI agents are LLM-driven systems that plan, call tools, and act over multiple steps to accomplish goals, whereas chatbots typically respond in a single turn. Agents introduce loops, memory, error recovery, and tool-use via protocols like MCP. Start with Agent Fundamentals.
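
The loop that separates an agent from a single-turn chatbot can be sketched in a few lines. Everything here is illustrative: `scripted_policy` is a hypothetical stand-in for the LLM call that picks the next action, the two tools are toys, and `max_steps` is one simple way to bound a runaway loop.

```python
from typing import Callable

# Toy tool registry; a real agent would expose tools through a protocol such as MCP.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}

Step = tuple[str, str, str]  # (action, argument, observation)

def scripted_policy(goal: str, history: list[Step]) -> tuple[str, str]:
    # Stand-in for an LLM: choose the next action from what has been observed so far.
    if not history:
        return ("add", "2+3")
    if len(history) == 1:
        return ("upper", f"total is {history[0][2]}")
    return ("finish", history[-1][2])

def run_agent(goal: str, policy: Callable[[str, list[Step]], tuple[str, str]], max_steps: int = 5) -> str:
    history: list[Step] = []
    for _ in range(max_steps):
        action, arg = policy(goal, history)
        if action == "finish":
            return arg
        tool = TOOLS.get(action)
        # Unknown tools become an observation so the policy can recover on the next step.
        observation = tool(arg) if tool else f"error: unknown tool {action}"
        history.append((action, arg, observation))
    return "stopped: step budget exhausted"

result = run_agent("add 2 and 3, then shout the total", scripted_policy)
```

The `history` list is the smallest possible form of agent memory; logging it per step is also the starting point for debugging an agent that takes too many steps.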

What is MCP (Model Context Protocol) and how does it compare to A2A?

MCP is an open protocol that lets LLMs discover and call external tools and data sources in a standardized way. A2A (Agent-to-Agent) is a complementary protocol for inter-agent communication. They solve different layers: MCP is the tool boundary, A2A is the agent boundary. See Tool Use and MCP.
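
At the wire level, MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. The sketch below only builds such a request as a dictionary; the tool name `search_docs` and its arguments are hypothetical, and transport, initialization, and capability negotiation are omitted.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched

def tool_call_request(tool_name: str, arguments: dict) -> dict:
    # Shape of an MCP tool invocation: a JSON-RPC 2.0 request with method "tools/call".
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "search_docs" is a made-up tool name for illustration.
request = tool_call_request("search_docs", {"query": "refund policy"})
print(json.dumps(request, indent=2))
```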

Which LLM should I use in production: Claude, GPT, Gemini, or open-source?

It depends on latency budget, context length, cost per million tokens, tool-use quality, and data residency. The Model Taxonomy and Pricing chapters give a head-to-head for Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, DeepSeek V4, Llama 4, and others as of May 2026.
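
One way to turn those criteria into a decision is to filter on hard constraints first and then compare cost. The sketch below assumes exactly that; the model names, prices, context sizes, and latencies are placeholders, not figures from the Pricing chapter, and tool-use quality and data residency are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ModelOption:
    name: str
    context_tokens: int
    usd_per_m_input: float   # price per million input tokens
    usd_per_m_output: float  # price per million output tokens
    p95_latency_ms: int

def monthly_cost(m: ModelOption, input_tokens: int, output_tokens: int) -> float:
    # Cost = tokens / 1e6 * price per million, summed over input and output.
    return input_tokens / 1e6 * m.usd_per_m_input + output_tokens / 1e6 * m.usd_per_m_output

def pick(options: list[ModelOption], min_context: int, max_latency_ms: int,
         input_tokens: int, output_tokens: int) -> Optional[ModelOption]:
    # Hard constraints first, then cheapest among the survivors.
    eligible = [m for m in options
                if m.context_tokens >= min_context and m.p95_latency_ms <= max_latency_ms]
    return min(eligible, key=lambda m: monthly_cost(m, input_tokens, output_tokens), default=None)

# Placeholder numbers for illustration only.
OPTIONS = [
    ModelOption("model-a", 200_000, 3.0, 15.0, 900),
    ModelOption("model-b", 32_000, 0.5, 1.5, 400),
    ModelOption("model-c", 128_000, 1.0, 4.0, 700),
]

choice = pick(OPTIONS, min_context=100_000, max_latency_ms=1000,
              input_tokens=10_000_000, output_tokens=2_000_000)
```

Here `model-b` is cheapest overall but fails the context requirement, which is why constraints are applied before cost.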

How do I evaluate an LLM or RAG system in production?

Combine offline evals (LLM-as-a-judge with ground-truth calibration), online metrics (faithfulness, context recall, answer relevance), and continuous tracing. The companion deep-dives AI Evals: Phoenix + Langfuse and AI Evals: LangWatch + Langfuse walk through this end-to-end.
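
Ground-truth calibration can be illustrated with a small calculation. The sketch measures how often a judge agrees with human labels on a calibration set, then applies one common bias correction (the Rogan-Gladen estimator) to a pass rate observed in production. This is an illustration of the idea under stated assumptions, not the API of any library or the exact procedure in the companion guides, and the labels are made up.

```python
def judge_rates(human: list[bool], judge: list[bool]) -> tuple[float, float]:
    # True-positive and true-negative rates of the judge against human labels.
    # Assumes the calibration set contains at least one pass and one fail.
    tp = sum(h and j for h, j in zip(human, judge))
    tn = sum((not h) and (not j) for h, j in zip(human, judge))
    positives = sum(human)
    negatives = len(human) - positives
    return tp / positives, tn / negatives

def corrected_pass_rate(observed: float, tpr: float, tnr: float) -> float:
    # Rogan-Gladen correction: (observed + TNR - 1) / (TPR + TNR - 1), clipped to [0, 1].
    denom = tpr + tnr - 1
    if denom <= 0:
        raise ValueError("judge is no better than chance; correction is undefined")
    return min(1.0, max(0.0, (observed + tnr - 1) / denom))

# Hypothetical calibration labels: True means the response passed.
human = [True, True, True, True, False, False]
judge = [True, True, True, False, False, True]

tpr, tnr = judge_rates(human, judge)
estimate = corrected_pass_rate(0.70, tpr, tnr)
```

With these labels the judge misses one real pass and approves one real failure, so the raw 70% pass rate it reports in production is adjusted upward.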

How do I build a multi-tenant RAG system safely?

Use defense-in-depth: per-tenant indexes or namespaces, query-time access checks, and prompt-layer guards. The Multi-Tenant RAG Isolation chapter and Multi-Tenant SaaS Case Study cover the patterns that hold up in interviews and production.
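
Two of those layers, per-tenant namespaces and a query-time access check, can be sketched with an in-memory index. The class and tenant names are hypothetical, substring matching stands in for vector search, and the prompt-layer guard is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str

class TenantIndex:
    """Toy index that keeps one namespace per tenant."""

    def __init__(self) -> None:
        self._by_tenant: dict[str, list[Chunk]] = {}

    def add(self, chunk: Chunk) -> None:
        self._by_tenant.setdefault(chunk.tenant_id, []).append(chunk)

    def search(self, tenant_id: str, query: str) -> list[Chunk]:
        # Layer 1: only the caller's namespace is ever scanned.
        candidates = self._by_tenant.get(tenant_id, [])
        hits = [c for c in candidates if query.lower() in c.text.lower()]
        # Layer 2: re-check ownership at query time in case a chunk was misfiled.
        return [c for c in hits if c.tenant_id == tenant_id]

index = TenantIndex()
index.add(Chunk("acme", "Acme pricing tiers for 2026"))
index.add(Chunk("globex", "Globex pricing strategy memo"))

acme_results = index.search("acme", "pricing")
```

The second check looks redundant when the first works, which is the point of defense-in-depth: each layer assumes another one may fail.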

What is agentic RAG?

Agentic RAG combines retrieval with an agent loop that can decide what to search, when to re-query, and when to escalate, instead of running a single fixed retrieve-then-generate pass. See Agentic RAG for the architectures and tradeoffs.
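
The difference from a fixed retrieve-then-generate pass shows up as a loop with a decision at each round. In this sketch the query rewrites are passed in rather than generated by an LLM, keyword overlap stands in for real retrieval, and "escalate" simply means handing off once the round budget is spent; all names are illustrative.

```python
def search(query: str, corpus: list[str]) -> list[str]:
    # Keyword-overlap retrieval as a stand-in for a vector or hybrid search.
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc.lower().split())]

def agentic_retrieve(question: str, rewrites: list[str], corpus: list[str], max_rounds: int = 3) -> dict:
    # Try the original question first, then each rewrite, within a bounded number of rounds.
    queries = [question, *rewrites][:max_rounds]
    for round_no, query in enumerate(queries, start=1):
        hits = search(query, corpus)
        if hits:
            # Enough context found: stop searching and move on to generation.
            return {"status": "answered", "rounds": round_no, "context": hits}
    # Nothing useful retrieved: escalate instead of generating an ungrounded answer.
    return {"status": "escalate", "rounds": len(queries), "context": []}

CORPUS = ["invoice totals are due monthly"]

outcome = agentic_retrieve("billing schedule", rewrites=["invoice due"], corpus=CORPUS)
```

The first query finds nothing, the rewrite does, and the result records that it took two rounds, which is the kind of signal worth tracking when tuning the re-query policy.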

Is this guide free? Can I contribute?

Yes, MIT-licensed and free. PRs are welcome; see Contributing Guide. If you have production failure modes, new model benchmarks, or interview questions to add, open a PR.

How often is this guide updated?

Continuously. New model releases, protocol changes (MCP, A2A), and emerging patterns are added as they ship. Recent additions include Tool-Use and Computer Agents and the May 2026 Job Market Trends.

Can I use this guide if I am transitioning from backend, QA, PM, or EM into AI?

Yes. The Role Transition Guide maps existing skills to AI engineering, MLE, and AI architect tracks, with reading paths per role. Pair it with COURSES.md for curated learning resources.


🔄 Living Book

This guide tracks:

  • New model releases and real-world performance
  • Emerging patterns (MCP, Agentic RAG, Flow Engineering)
  • Updated pricing and rate limits
  • Deprecations and best practice changes

⭐ Star and Watch to get notified when updates are pushed.


🤝 Contributing

Found outdated info? Have production experience to share? PRs welcome. See Contributing Guide.


📄 License

MIT License. See LICENSE.


Built by Om Bharatiya