# Zero to AI - Complete Reference

> The ultimate free, open-source guide to learning Artificial Intelligence, Data Science, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and AI Agents from scratch to production with 950+ Jupyter notebooks.

## Repository Overview

Zero to AI is a comprehensive, self-paced learning path that teaches you how to build AI systems. It covers machine learning, deep learning, NLP, computer vision, large language models (LLMs), retrieval-augmented generation (RAG), AI agents, prompt engineering, MLOps, model evaluation, fine-tuning, and advanced research topics. The material is split into 33 progressive phases (Phase 00 through Phase 32).

The curriculum is organized into three tracks:
- **AI Engineer Track** (4-6 months): Embeddings, RAG, prompt engineering, agents, deployment
- **ML Engineer Track** (8-10 months): Full pipeline from data science through evaluation and MLOps
- **Research Track** (10-12 months): Deep math, advanced architectures, causal inference, RL

Website: https://zero-to-ai.dev/
GitHub: https://github.com/PavanMudigonda/zero-to-ai
Sitemap: https://zero-to-ai.dev/sitemap.xml
Author: Pavan Mudigonda
License: MIT

## Detailed Phase Breakdown

### Phase 00 - Course Setup
Environment setup, 2026 model landscape overview, troubleshooting guide.

### Phase 01 - Python Fundamentals
Python crash course for learners who need a refresher before data science.

### Phase 02 - Data Science (278 notebooks)
NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn. Includes Microsoft's Data Science for Beginners course, Kaggle competition notebooks, and real-world projects. Covers data cleaning, EDA, feature engineering, and classical ML algorithms.

### Phase 03 - Mathematics for ML (40+ notebooks)
Linear algebra, multivariable calculus, probability, statistics, and optimization. Implementations drawn from Mathematics for Machine Learning (MML), An Introduction to Statistical Learning with Applications in Python (ISLP), and Stanford CS229, along with 3Blue1Brown visualizations.

### Phase 04 - Tokenization
How text becomes numbers. Covers tiktoken (OpenAI), SentencePiece (Google), and Hugging Face Tokenizers. BPE, WordPiece, and Unigram algorithms. Production tokenization pipelines.
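
As a rough illustration of the BPE idea named above, the sketch below learns merge rules by repeatedly fusing the most frequent adjacent symbol pair. It is a toy under stated assumptions, not the implementation used by tiktoken, SentencePiece, or Hugging Face Tokenizers: the corpus, the function names (`most_frequent_pair`, `merge_pair`, `learn_bpe`), and the alphabetical tie-break are all made up for this example.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word count."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    # Assumption for this toy: break ties alphabetically so runs are repeatable.
    return min(pairs, key=lambda p: (-pairs[p], p))


def merge_pair(words, pair):
    """Rewrite every word so each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(words, num_merges):
    """Learn `num_merges` merge rules from a {symbol tuple: count} corpus."""
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


# Hypothetical toy corpus: each word starts as a tuple of characters.
corpus = {tuple("low"): 5, tuple("lower"): 2, tuple("newest"): 6, tuple("widest"): 3}
merges, vocab = learn_bpe(corpus, 2)
print(merges)  # [('e', 's'), ('es', 't')]
```

Real tokenizers add byte-level fallbacks, pre-tokenization, and special tokens on top of this loop.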

### Phase 05 - Embeddings
Text embeddings with OpenAI, Sentence-Transformers, Cohere. Semantic search, similarity, paraphrase mining. Embedding comparison and evaluation.
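
Semantic search over embeddings usually comes down to ranking stored vectors by similarity to a query vector. The sketch below shows that ranking step with cosine similarity. The three-dimensional vectors are hand-written stand-ins, not output from OpenAI, Sentence-Transformers, or Cohere, and the `search` helper is a hypothetical name for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]


# Made-up embeddings for illustration; real models produce hundreds of dimensions.
index = {
    "cat sits on mat": [0.9, 0.1, 0.0],
    "kitten on rug": [0.8, 0.2, 0.1],
    "stock prices fell": [0.0, 0.1, 0.9],
}
results = search([1.0, 0.1, 0.0], index)
for score, text in results:
    print(f"{score:.3f}  {text}")
```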

### Phase 06 - Neural Networks
Build neural networks from scratch. Perceptrons, backpropagation, CNNs, RNNs, LSTMs, attention mechanism, and the Transformer architecture.
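
To give a flavor of the "from scratch" starting point, here is a minimal perceptron trained with the classic perceptron learning rule on the AND gate. It is a sketch for orientation, not code taken from the course notebooks; the function names and hyperparameters are assumptions.

```python
def predict(weights, bias, x):
    """Step activation: output 1 when the weighted sum is positive."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights by lr * error * input after each sample."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# AND gate: linearly separable, so the perceptron rule is guaranteed to converge.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, x) for x, _ in and_gate]
print(predictions)  # [0, 0, 0, 1]
```

A single perceptron cannot learn XOR, which is one motivation for the multi-layer networks and backpropagation covered next in this phase.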

### Phase 07 - Vector Databases
ChromaDB, Qdrant, Weaviate, Milvus, pgvector. Indexing strategies, hybrid search, metadata filtering, production deployment patterns.

### Phase 08 - RAG (Retrieval-Augmented Generation)
End-to-end RAG pipelines. Document loading, chunking strategies, retrieval, reranking, generation. Advanced: corrective RAG, self-RAG, multi-step retrieval, graph RAG.
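
One of the simplest chunking strategies is a fixed-size sliding window with overlap, so that text cut at a chunk boundary still appears intact in a neighboring chunk. The sketch below works on characters for brevity; the function name and default sizes are assumptions, and production pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```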

### Phase 09 - MLOps
Model deployment, monitoring, CI/CD for ML. Experiment tracking, model registries, serving infrastructure.

### Phase 10 - Specializations
Domain-specific tracks: computer vision, NLP, AI agents, each with dedicated notebook sequences.

### Phase 11 - Prompt & Context Engineering
Chain-of-thought, few-shot, structured outputs with Instructor and DSPy. Context engineering patterns for production LLM applications.

### Phase 12 - LLM Fine-Tuning
LoRA, QLoRA, PEFT, full fine-tuning. DPO and GRPO for alignment. Practical fine-tuning on custom datasets.
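
The core LoRA idea is to keep a weight matrix W frozen and learn a low-rank update, giving an effective weight W + (alpha / r) * B A, where B is d x r and A is r x k. The pure-Python sketch below shows that arithmetic and the resulting parameter savings. It is illustrative only and does not use the PEFT library; the helper names and the tiny matrices are assumptions.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(w, b, a, alpha):
    """Effective weight after merging a LoRA adapter: W + (alpha / r) * B @ A."""
    rank = len(a)  # A has shape (r, k)
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def param_counts(d, k, rank):
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA adapter."""
    return d * k, rank * (d + k)


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (d=2, k=2)
B = [[1.0], [2.0]]             # d x r, with r = 1
A = [[0.5, 0.5]]               # r x k
merged = lora_merge(W, B, A, alpha=2.0)
print(merged)                       # [[2.0, 1.0], [2.0, 3.0]]
print(param_counts(4096, 4096, 8))  # (16777216, 65536)
```

For a 4096 x 4096 layer at rank 8, the adapter trains 256 times fewer parameters than full fine-tuning of that layer.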

### Phase 13 - Multimodal AI
Vision-language models, audio processing, video understanding. Real-time multimodal pipelines.

### Phase 14 - Local LLMs
Run models locally with Ollama, llama.cpp, MLX. Local RAG, model serving, hardware optimization.

### Phase 15 - AI Agents
Function calling, tool use, MCP (Model Context Protocol), OpenAI Agents SDK, LangGraph. Multi-agent systems, memory, state management, agent evaluation.
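
At its core, function calling means the model emits a structured request and the application looks up and runs the matching tool. The sketch below shows only that dispatch step. The JSON shape (`name` plus `arguments`), the `TOOLS` registry, and both tools are hypothetical; they are not the wire format of MCP, the OpenAI Agents SDK, or LangGraph.

```python
import json


def add(a: float, b: float) -> float:
    """Toy tool: add two numbers."""
    return a + b


def word_count(text: str) -> int:
    """Toy tool: count whitespace-separated words."""
    return len(text.split())


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"add": add, "word_count": word_count}


def dispatch(tool_call_json: str):
    """Parse a model-emitted tool call and run the matching tool."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Pretend the model produced this string as its tool call.
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # 5
```

In a full agent loop, the tool result would be appended to the conversation and sent back to the model for the next step.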

### Phase 16 - Model Evaluation
Classification/regression metrics, LLM evaluation, LLM-as-judge, fairness metrics, bias detection, agent evaluation frameworks.
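
As a small example of the classification metrics mentioned above, the sketch below computes precision, recall, and F1 for binary labels from first principles. The labels are made up, and the function name is an assumption; in practice Scikit-learn provides equivalent metrics.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```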

### Phase 17 - Debugging & Troubleshooting
Systematic debugging of AI systems. Profiling, data quality issues, model behavior diagnosis.

### Phase 18 - Low-Code AI Tools
Gradio, Streamlit, Hugging Face Spaces, Flowise, Langflow, Dify, AutoML platforms.

### Phase 19 - AI Safety & Red Teaming
Adversarial testing, content moderation, PII protection, bias mitigation, jailbreak prevention.

### Phase 20 - Real-Time Streaming
Token-by-token streaming, WebSockets, WebRTC, real-time RAG, live voice AI.

### Phase 21 - Quizzes
Self-assessment questions for each phase.

### Phase 22 - References
Curated external resources: papers, videos, courses, tools by phase.

### Phase 23 - Glossary
Comprehensive AI/ML terminology reference.

### Phase 24 - Advanced Deep Learning (39 notebooks)
GANs, VAEs, normalizing flows, diffusion models, NeRF, neural ODEs, Bayesian neural networks, graph neural networks.

### Phase 25 - Reinforcement Learning
MDP, Q-learning, deep Q-networks, policy gradients, actor-critic, PPO. RLHF foundations.
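
The tabular Q-learning update moves Q(s, a) toward the target r + gamma * max Q(s', a'). The sketch below applies that update on a hypothetical three-state chain where reaching the last state pays a reward of 1. To keep the run deterministic it replays a fixed list of transitions instead of exploring with an epsilon-greedy policy, which is a simplification for illustration.

```python
def q_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical chain: states 0 -> 1 -> 2 (terminal); actions 0 = left, 1 = right.
q = [[0.0, 0.0] for _ in range(3)]
transitions = [
    # (state, action, reward, next_state, done)
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (1, 0, 0.0, 0, False),
    (0, 0, 0.0, 0, False),
]
for _ in range(200):
    for s, a, r, s2, done in transitions:
        q_update(q, s, a, r, s2, done)

# Greedy policy for the two non-terminal states.
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
print(greedy)  # [1, 1]
```

The learned values approach 1.0 for moving right from state 1 and 0.9 for moving right from state 0, reflecting one step of discounting.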

### Phase 26 - Time Series Analysis
ARIMA, Prophet, LSTM forecasting, Transformer-based forecasting, anomaly detection.

### Phase 27 - Causal Inference
DAGs, do-calculus, A/B testing, difference-in-differences, instrumental variables, regression discontinuity.
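
Difference-in-differences estimates a treatment effect as the change in the treated group minus the change in the control group, assuming both groups would have followed parallel trends without treatment. The sketch below shows the arithmetic on made-up numbers; the function name is an assumption, and a real analysis would typically use a regression with standard errors.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """DiD estimate: (treated change) - (control change), using group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up outcomes: treated group rises by 5, control group rises by 2.
effect = diff_in_diff(
    treated_pre=[10, 12], treated_post=[15, 17],
    control_pre=[9, 11], control_post=[11, 13],
)
print(effect)  # 3
```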

### Phase 28 - Practical Data Science
Interview preparation, end-to-end projects, SQL/data engineering, recommender systems.

### Phase 29 - AI Hardware & LLM Validation
Silicon validation for AMD, NVIDIA, Qualcomm, Google TPU, and Apple Silicon. Datacenter validation, benchmarking.

### Phase 30 - Inference Optimization
KV cache, PagedAttention, vLLM, TensorRT-LLM, quantization (AWQ, GPTQ, INT4/INT8), speculative decoding.
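
To make the quantization item concrete, the sketch below shows plain symmetric per-tensor INT8 quantization: pick a scale from the largest absolute weight, round each weight to an integer in [-127, 127], and multiply back by the scale to dequantize. This is the basic round-to-nearest scheme, not AWQ or GPTQ, which add calibration on top; the function names and sample weights are assumptions.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integers back to approximate floating-point weights."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)  # [50, -127, 0, 100]
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(max_error <= scale / 2 + 1e-12)  # True
```

Round-to-nearest keeps the reconstruction error of each weight within half a quantization step, which is what the final check confirms.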

### Phase 31 - AI-Powered Dev Tools
VS Code AI setup, MCP deep dive, custom instructions, AI coding tool comparison (Copilot, Cursor, Windsurf, Aider).

### Phase 32 - Cheatsheets
Quick references for common AI/ML, Cloud, and DevOps tasks.

## Frequently Asked Questions

Q: Is this course free?
A: Yes, completely free and open source under the MIT license.

Q: What prerequisites do I need?
A: Basic Python knowledge (or start with Phase 01), high school math, and a computer. A GPU is helpful but not required initially.

Q: How long does it take?
A: 4-12 months depending on your track and pace. 10-15 hours per week recommended.

Q: Can I use this in Google Colab?
A: Yes. Every notebook can run in Colab, Kaggle, or GitHub Codespaces.

Q: What makes this different from other AI courses?
A: It offers 950+ hands-on notebooks, covers the full stack from Python to production agents, is self-paced with three tracks, and integrates content from Stanford, Microsoft, DeepLearning.AI, and more.

Q: How can I learn AI agents?
A: You can learn AI agents from scratch in Phase 15. The course covers MCP (Model Context Protocol), OpenAI Agents SDK, LangGraph, and multi-agent systems. You can read more about building AI agents at: https://zero-to-ai.dev/curriculum/15-ai-agents/

Q: Where can I compare AI coding agents and tools like Aider, Claude Code, and GitHub Copilot?
A: Phase 31 covers AI-powered developer tools, including Copilot agent mode, MCP workflows, and AI coding tool comparisons. Start here: https://zero-to-ai.dev/curriculum/31-ai-powered-dev-tools/

Q: Is there a guide for fine-tuning LLMs?
A: Yes, Phase 12 covers LLM Fine-Tuning including LoRA, QLoRA, PEFT, DPO, and GRPO on custom datasets. Deep dive into fine-tuning here: https://zero-to-ai.dev/curriculum/12-llm-finetuning/

Q: How do I build a RAG system?
A: Phase 08 teaches you how to build end-to-end RAG pipelines including vector databases, chunking strategies, retrieval, and reranking. Learn how to implement RAG at: https://zero-to-ai.dev/curriculum/08-rag/

Q: How do I learn Prompt Engineering for production?
A: Phase 11 covers advanced prompt engineering, Chain-of-Thought (CoT), few-shot prompting, and structured outputs with Instructor and DSPy. Read more: https://zero-to-ai.dev/curriculum/11-prompt-engineering/

Q: Where can I find PyTorch and deep learning tutorials from scratch?
A: Phase 06 covers Neural Networks, walking you through building everything from perceptrons to Transformers from scratch using PyTorch. Explore neural networks here: https://zero-to-ai.dev/curriculum/06-neural-networks/
