AGAI-101 · 10 Weeks · 60 Hours

Applied GenAI: LLMs, RAG & Agents

Go from zero GenAI knowledge to building and deploying production-grade AI applications. 10 weeks of live sessions, 8 mini-projects, 10 real-world workshops, and 1 capstone — all on our Jupyter-powered AI platform.

Format
Live Cohort
Schedule
Sat + Sun, 3hrs
Projects
18 + Capstone
Batch Size
15–25 Students

Every Week, Three Tracks

SATURDAY — 3 Hours

Concepts & Live Demos

Instructor-led session. Theory explained with intuition, followed by live coding demos. You watch, ask questions, and understand the “why” before the “how”.

SUNDAY — 3 Hours

Hands-On Lab & Workshop

You build on the bsigma platform. Guided labs of increasing difficulty, ending in a mini-project you push to GitHub. An AI instructor assists you in real time.

SUPPLEMENTAL — 2–3 Hours

Real-World Workshop

Apply the week's techniques to live data from real APIs. Build portfolio projects using Hacker News, USGS Earthquakes, GitHub, and more. Optional Challenge D for production-tier implementations.

4 Phases, 10 Weeks

Phase 1

LLM Foundations

Weeks 1–2

Phase 2

RAG — Retrieval-Augmented Generation

Weeks 3–5

Phase 3

AI Agents

Weeks 6–8

Phase 4

Fine-Tuning, Security & Capstone

Weeks 9–10

Week-by-Week Curriculum

Click any week to see the full session breakdown, lab exercises, and project details.

1
LLM Foundations & Your First AI Interaction
Phase 1 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • What are Large Language Models — the 10,000ft view
  • How LLMs are built: pre-training, SFT, RLHF (intuition, not math)
  • Self-attention as soft dictionary lookup — how transformers actually work
  • Model capability ladder — when to use which model (cost vs quality)
  • Reasoning models (o1, o3, DeepSeek-R1) — the next frontier beyond prompting
  • The LLM landscape: OpenAI, Anthropic, Google, Meta (open-source)
  • Live Demo: Same prompt across GPT-4, Claude, Llama — comparing outputs
  • Tokens, context windows, and why they matter for cost & quality
  • API anatomy: system/user/assistant messages, temperature, top-p
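To give you a feel for the API anatomy covered above: the request every provider expects boils down to a small payload of roles and sampling parameters. A minimal sketch (the model name and defaults here are illustrative, not the lab's exact values):

```python
# Sketch of an OpenAI-style chat completion request body.
# The same three roles (system/user/assistant) appear across providers.
def build_chat_request(user_text, system_text="You are a helpful assistant.",
                       model="gpt-4o-mini", temperature=0.7, top_p=1.0):
    return {
        "model": model,              # which model to call (cost vs quality)
        "temperature": temperature,  # randomness: lower = more deterministic
        "top_p": top_p,              # nucleus sampling cutoff
        "messages": [
            {"role": "system", "content": system_text},  # behavior instructions
            {"role": "user", "content": user_text},      # the actual prompt
        ],
    }

req = build_chat_request("Explain tokens in one sentence.", temperature=0.2)
```

In the Sunday lab you send payloads like this to multiple providers and watch how temperature and top-p change the output.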
Sunday — Hands-On Lab
  • Set up bsigma workspace — note cells, AI prompt cells, code cells
  • Call multiple LLM providers via API (OpenAI, Anthropic, Ollama)
  • Experiment with temperature, top-p, system prompts — observe changes
  • Model selection experiment — same task, different models, quality comparison
Mini-Project

Build a Model Comparison Tool — same prompt, 3 models, side-by-side output with quality assessment

Real-World Workshop

AI News Intelligence Dashboard

Live APIs: Hacker News

Fetch live tech news, AI-powered categorization, audience-adapted briefings, token economics at scale

2
Prompt Engineering & Solving Real NLP Tasks
Phase 1 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • Prompt engineering as a discipline — why it matters
  • Zero-shot vs few-shot prompting with live examples
  • Chain-of-thought (CoT) and step-by-step reasoning
  • Why techniques work — CoT as scratchpad, in-context learning, system message mechanics
  • Role prompting and persona-based approaches
  • Structured output: getting JSON, tables, specific formats
  • Constrained decoding — structured output across providers (OpenAI, Anthropic, Outlines)
  • Prompt templates and reusable patterns
  • Prompt evaluation — consistency metrics and accuracy measurement
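The few-shot and structured-output ideas above combine naturally: in-context examples teach the format, and a defensive parser handles the reply. A sketch of the pattern (the labels and reviews are toy data):

```python
import json

# Few-shot sentiment prompt that requests strict JSON output.
FEW_SHOT = [
    ("The battery died in two hours.", {"sentiment": "negative", "aspect": "battery"}),
    ("Gorgeous screen, very sharp.",   {"sentiment": "positive", "aspect": "screen"}),
]

def build_messages(review):
    msgs = [{"role": "system",
             "content": 'Reply with JSON only: {"sentiment": ..., "aspect": ...}'}]
    for text, label in FEW_SHOT:                      # in-context examples
        msgs.append({"role": "user", "content": text})
        msgs.append({"role": "assistant", "content": json.dumps(label)})
    msgs.append({"role": "user", "content": review})  # the new review to classify
    return msgs

def parse_reply(raw):
    # Defensive parse: models sometimes wrap JSON in markdown fences.
    raw = raw.strip().removeprefix("```json").removesuffix("```").strip()
    return json.loads(raw)

msgs = build_messages("Shipping was slow but support was great.")
```

Constrained decoding (covered Saturday) makes the parsing step unnecessary; this manual version shows what the providers are doing for you.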
Sunday — Hands-On Lab
  • Zero-shot classification — categorize support tickets by topic
  • Few-shot sentiment analysis — extract sentiment + aspects from reviews
  • Chain-of-thought — solve multi-step reasoning problems
  • Cross-provider structured output — OpenAI vs Anthropic comparison
  • Prompt consistency experiment — same prompt, 5 runs, measure agreement
Mini-Project

Build a Customer Review Analyzer — raw reviews to structured JSON with multi-provider comparison and evaluation dashboard

Real-World Workshop

AI Recipe Transformer

Live APIs: TheMealDB

Dietary adaptation, cuisine fusion, chain-of-thought meal planning, structured nutrition extraction

3
Embeddings, Vector Search & RAG Fundamentals
Phase 2 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • Why LLMs hallucinate and why RAG exists
  • What are embeddings — turning text into numbers
  • How embeddings are trained — contrastive learning, InfoNCE loss, training data determines similarity
  • Choosing embedding models — MTEB benchmark, task-specific vs general-purpose
  • Semantic search vs keyword search (live comparison)
  • Vector databases: ChromaDB — and how they search (ANN, HNSW algorithm)
  • The RAG pipeline: Load → Split → Embed → Store → Retrieve → Generate
  • Chunking strategies: fixed-size, recursive, semantic
  • Principled retrieval — K as precision/recall trade-off, distance thresholding
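The retrieval math behind all of the above is just vector similarity plus two knobs: K and a distance threshold. A toy sketch with hand-made 3-dimensional "embeddings" (a real pipeline gets vectors from an embedding model and stores them in ChromaDB):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy document "embeddings" — illustrative only.
store = {
    "refund policy":   [0.9, 0.1, 0.0],
    "shipping times":  [0.1, 0.9, 0.1],
    "office location": [0.0, 0.1, 0.9],
}

def retrieve(query_vec, k=2, min_sim=0.5):
    # K and min_sim are the precision/recall knobs from the session:
    # larger K recalls more but adds noise; the threshold rejects weak matches.
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, vec in ranked[:k] if cosine(query_vec, vec) >= min_sim]

hits = retrieve([0.8, 0.2, 0.0])
```

Note the second document never reaches the answer even though K=2 — the threshold filters it, which is exactly the hallucination-vs-rejection trade-off you tune in Sunday's lab.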
Sunday — Hands-On Lab
  • Generate embeddings, visualize similarity
  • Set up ChromaDB, load and query a vector store
  • Experiment with different chunk sizes — observe retrieval quality
  • K selection experiment — vary K, measure noise vs recall
  • Distance threshold tuning — find the sweet spot between rejection and hallucination
Mini-Project

Build a Company Knowledge Base Bot — RAG with confidence-aware retrieval and distance thresholding

Real-World Workshop

AI Book Recommendation Engine

Live APIs: Open Library

Semantic search over real books, metadata filtering, RAG-powered reading recommendations

4
Building a Production RAG Application
Phase 2 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • The pipeline pattern — RAG in pure Python first, then LangChain
  • Document loaders: text, web pages, CSVs (LangChain + framework-free)
  • RecursiveCharacterTextSplitter — chunking with boundary awareness
  • Persistent ChromaDB with metadata enrichment
  • RAG with source citations — context labeling for traceable answers
  • Retrieval failure taxonomy — the 4 types and how to diagnose each
  • Distance thresholds and metadata filtering — when to say 'I don't know'
  • Hybrid search — combining BM25 keyword + semantic search from first principles
  • Conversational RAG — query rewriting for follow-up questions
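Hybrid search, from first principles, is score fusion: normalize the keyword (BM25) and semantic scores so they are comparable, then blend with a tunable alpha. A sketch with illustrative raw scores:

```python
def normalize(scores):
    # Min-max normalize so keyword and vector scores share a 0–1 scale.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_rank(bm25_scores, vector_scores, alpha=0.5):
    # alpha=1.0 → pure semantic search, alpha=0.0 → pure keyword (BM25).
    b, v = normalize(bm25_scores), normalize(vector_scores)
    fused = {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
             for doc in set(b) | set(v)}
    return sorted(fused, key=fused.get, reverse=True)

bm25 = {"doc_a": 12.0, "doc_b": 3.0, "doc_c": 0.5}   # illustrative raw scores
vecs = {"doc_a": 0.55, "doc_b": 0.80, "doc_c": 0.20}
```

Sliding alpha between 0 and 1 flips the top result here — the same experiment you run in Sunday's lab against real BM25 and ChromaDB scores.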
Sunday — Hands-On Lab
  • Build a multi-source document loader (text + web + CSV)
  • Retrieval failure diagnosis lab — trigger and classify all 4 failure types
  • Build hybrid search from scratch — BM25 + ChromaDB with tunable alpha
  • Distance threshold calibration — find the sweet spot empirically
Mini-Project

Build a Document Q&A System — multi-document RAG with citations, hybrid search, threshold calibration, and failure diagnosis

Real-World Workshop

AI Country Intelligence Briefing

Live APIs: REST Countries + Open-Meteo

Multi-source RAG with live weather data, intelligent chunking, conversational retrieval

5
RAG Evaluation & Optimization
Phase 2 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • How do you know if your RAG is good? Evaluation frameworks
  • RAGAS metrics: faithfulness, answer relevancy, context precision
  • LLM-as-Judge: using one LLM to evaluate another
  • Validating your judge — Cohen's kappa, calibration, and reliability
  • Statistical rigor — confidence intervals, minimum sample sizes, paired comparison
  • Cost/latency trade-offs — the missing dimension in RAG optimization
  • Query expansion and HyDE (Hypothetical Document Embeddings)
  • Common RAG failures and how to debug them
  • Live Demo: Evaluating & improving a RAG pipeline end-to-end
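Judge validation is concrete statistics: Cohen's kappa measures how much an LLM judge agrees with human labels beyond chance. A self-contained sketch (the labels are toy data):

```python
from collections import Counter

def cohens_kappa(judge_a, judge_b):
    # Agreement between two raters (e.g. an LLM judge vs human labels),
    # corrected for the agreement you'd expect by chance alone.
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    ca, cb = Counter(judge_a), Counter(judge_b)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in set(ca) | set(cb))
    return (observed - expected) / (1 - expected)

human = ["good", "good", "bad", "bad"]
llm   = ["good", "bad",  "bad", "bad"]
```

A kappa near 0 means your judge is no better than guessing, regardless of how high its raw agreement looks — the scoring-bias check you run in Sunday's lab.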
Sunday — Hands-On Lab
  • Run RAGAS evaluation on your RAG chatbot — baseline score
  • Implement query expansion — compare retrieval before/after
  • A/B test: chunk size, retrieval method, re-ranking configs
  • Judge validation — compute Cohen's kappa, check for scoring bias
  • Statistical rigor lab — confidence intervals, sample size analysis, paired comparison
Mini-Project

Build a RAG Evaluation Pipeline — baseline scores, optimization experiments, judge validation, statistical rigor, cost/latency tracking

Real-World Workshop

AI Earthquake Analysis

Live APIs: USGS Earthquake

Live seismic data evaluation, retrieval metrics, LLM-as-judge, statistical rigor

6
AI Agents & Function Calling
Phase 3 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • What is an AI agent vs a chatbot — the key difference
  • Function calling / tool use — LLMs interacting with the world
  • Multi-provider tool calling — the common 5-step abstraction (OpenAI, Anthropic)
  • The agent loop pattern: observe → decide → act → repeat
  • Tool design principles — description quality, granularity trade-offs
  • Agent safety — max iterations, cost budgets, loop detection
  • Error handling and guardrails for production agents
  • Live Demo: Multi-tool personal assistant agent
  • Agent design patterns: ReAct, plan-and-execute, router
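The agent loop pattern above fits in a few lines once the LLM's tool-selection step is abstracted away. A sketch where `scripted_decide` stands in for a function-calling API response (in the lab, that step is a real model):

```python
# Minimal agent loop with safety guards. TOOLS and the decide function
# are illustrative stand-ins for real tools and a real LLM call.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def run_agent(decide, max_iterations=5):
    history = []
    for _ in range(max_iterations):           # safety guard: bounded iterations
        action = decide(history)              # observe → decide
        if action["type"] == "final":
            return action["answer"]
        tool = TOOLS.get(action["tool"])
        if tool is None:                      # guardrail: unknown tool name
            history.append({"error": f"no such tool: {action['tool']}"})
            continue
        history.append({"tool": action["tool"],
                        "result": tool(*action["args"])})   # act
    return "stopped: max iterations reached"  # runaway-loop fallback

def scripted_decide(history):
    if not history:
        return {"type": "tool", "tool": "add", "args": (2, 3)}
    return {"type": "final", "answer": f"sum is {history[-1]['result']}"}
```

Swapping `scripted_decide` for an OpenAI or Anthropic tool-calling response is exactly the multi-provider exercise in Sunday's lab.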
Sunday — Hands-On Lab
  • Build single-tool and multi-tool agents from scratch
  • Implement the agent loop with error handling and safety guards
  • Tool description quality experiment — measure selection accuracy
  • Multi-provider comparison — same agent on OpenAI and Anthropic
Mini-Project

Build a Personal Assistant Agent — 6 tools, cost tracking, safety guards, multi-provider comparison

Real-World Workshop

AI Space Tracker Agent

Live APIs: Open Notify (ISS) + Sunrise-Sunset + Open-Meteo

Real-time ISS tracking, multi-tool agent, tool design experiments

7
LangGraph & Agentic Workflows
Phase 3 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • The graph abstraction — why workflows need nodes, edges, and state
  • Pure Python graph executor — understand what frameworks do under the hood
  • LangGraph: nodes, edges, conditional routing, state management
  • Human-in-the-loop: agents that ask for approval
  • Agent memory: short-term (conversation) vs long-term (persistent)
  • Self-reflection pattern — generate, evaluate, revise cycles
  • Graph anti-patterns — god nodes, infinite cycles, state explosion
  • Live Demo: Multi-step research agent with branching logic
  • Comparison: LangChain agents vs LangGraph vs CrewAI
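The pure Python graph executor from the session fits in one screen: nodes are functions over a shared state dict, and edges may be plain strings or conditionals. A sketch (node names and the scoring logic are illustrative):

```python
# A pure-Python graph executor — the abstraction LangGraph builds on.
END = "__end__"

def run_graph(nodes, edges, state, start, max_steps=20):
    current = start
    for _ in range(max_steps):                # guard against infinite cycles
        state = nodes[current](state)         # node returns updated state
        nxt = edges[current]
        current = nxt(state) if callable(nxt) else nxt   # conditional routing
        if current == END:
            return state
    raise RuntimeError("max steps exceeded — possible cycle")

nodes = {
    "draft":  lambda s: {**s, "text": s["topic"] + " draft", "score": 0},
    "review": lambda s: {**s, "score": s["score"] + 1},
}
edges = {
    "draft": "review",
    # Conditional edge: keep revising until the score passes a threshold —
    # a tiny generate/evaluate/revise (self-reflection) cycle.
    "review": lambda s: END if s["score"] >= 2 else "review",
}

result = run_graph(nodes, edges, {"topic": "RAG"}, start="draft")
```

The `max_steps` guard is the defense against the infinite-cycle anti-pattern covered Saturday; without it, a bad conditional edge loops forever.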
Sunday — Hands-On Lab
  • Build your first LangGraph workflow — 3-step research pipeline
  • Add conditional routing — agent decides which path to take
  • Implement human-in-the-loop — agent pauses for approval
  • Build a pure Python graph executor — understand the abstraction
  • Anti-pattern lab — identify and fix god nodes, infinite cycles, state explosion
  • Implement self-reflection — generate, evaluate, revise with scoring
Mini-Project

Build a Content Pipeline Agent — research, draft, review cycles with self-reflection, anti-pattern awareness, and graph design patterns

Real-World Workshop

AI Content Publishing Pipeline

Live APIs: Wikipedia + Hacker News + Quotable

LangGraph editorial workflow, self-reflection review cycles, checkpointing

8
Multi-Agent Systems & MCP
Phase 3 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • Why multi-agent systems — single agent vs specialized team
  • Multi-agent architectures: supervisor, sequential, hierarchical, swarm
  • Coordination strategies — supervisor vs debate vs consensus vs round-robin
  • Supervisor pattern with LangGraph — coordinator + specialized workers
  • Workers with tools — ReAct agents as sub-graphs
  • Multi-agent failure modes — cascading failures, delegation loops, conflicting outputs
  • Model Context Protocol (MCP) — the universal tool standard, JSON-RPC under the hood
  • Raw MCP messages — understanding initialize, tools/list, tools/call
  • Building MCP servers with FastMCP and connecting to agents
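Under the hood, the three MCP messages above are plain JSON-RPC 2.0. A sketch of what you write by hand in Sunday's exercise (the protocol version, client info, and the `search_news` tool name are illustrative):

```python
import itertools
import json

# Hand-built JSON-RPC 2.0 messages in the shape MCP uses.
_ids = itertools.count(1)

def rpc(method, params=None):
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

initialize = rpc("initialize", {
    "protocolVersion": "2025-03-26",                     # illustrative value
    "clientInfo": {"name": "course-lab", "version": "0.1"},
})
list_tools = rpc("tools/list")
call_tool  = rpc("tools/call", {"name": "search_news",   # hypothetical tool
                                "arguments": {"query": "LLM agents"}})

wire = json.dumps(call_tool)   # what actually goes over the transport
```

FastMCP generates and parses these messages for you; writing them once by hand is what makes the framework legible.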
Sunday — Hands-On Lab
  • Build supervisor + workers multi-agent system from scratch
  • Create a custom MCP server with 5 tools and connect it to agents
  • Raw MCP messages exercise — write JSON-RPC by hand
  • Multi-agent failure mode detection and fixing
  • Coordination strategy comparison — supervisor vs round-robin
Mini-Project

Build a Multi-Agent NovaTech System — supervisor + specialized workers + MCP tool server, with failure recovery and coordination strategy comparison

Real-World Workshop

AI OSINT Intelligence Team

Live APIs: GitHub + Hacker News + Wikipedia

Multi-agent competitive intelligence, MCP server, supervisor pattern

9
Fine-Tuning & LLM Security
Phase 4 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • When to fine-tune: prompting vs RAG vs fine-tuning decision framework
  • Fine-tuning pitfalls — catastrophic forgetting, overfitting, cost-benefit analysis
  • LoRA & QLoRA: fine-tuning with limited compute
  • Why LoRA works — low-rank weight updates, rank selection, forgetting prevention
  • Dataset preparation for instruction tuning
  • Live Demo: Fine-tune a small model with Unsloth on Colab
  • LLM security: prompt injection, jailbreaks, data leakage
  • Alignment as the first defense — RLHF, Constitutional AI, guardrails frameworks
  • Guardrails: input/output validation, topic filtering, OWASP Top 10
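A first-pass input guardrail can be as simple as pattern matching on known injection phrasings. A toy sketch (production guardrails layer checks like this with an LLM classifier — this alone is easy to evade, as the red team lab demonstrates):

```python
import re

# Toy input guardrail: flag common prompt-injection phrasings before
# the text ever reaches the LLM. Patterns are illustrative, not exhaustive.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"you are now",
    r"reveal .*system prompt",
    r"disregard .*rules",
]

def is_suspicious(user_input):
    text = user_input.lower()
    return any(re.search(p, text) for p in INJECTION_PATTERNS)

def guarded_query(user_input, answer_fn):
    if is_suspicious(user_input):               # input validation layer
        return "Request blocked by input guardrail."
    return answer_fn(user_input)                # only clean input reaches the LLM
```

In the red team lab, one pair attacks this layer while the other hardens it — which is how you discover why defense-in-depth beats any single filter.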
Sunday — Hands-On Lab
  • Prepare a fine-tuning dataset in chat format
  • Fine-tune a small LLM (Llama 3.2 1B) with QLoRA on Colab
  • Compare base model vs fine-tuned model — with catastrophic forgetting check
  • Add input guardrails to your RAG app — detect prompt injection
  • Red team lab — attack and defend prompt injection in pairs
Mini-Project

Fine-tune a model + add guardrails to RAG app, with forgetting checks, held-out validation, and red team exercise. Capstone kickoff.

Real-World Workshop

AI Security Red Team Lab

Live APIs: NVD (NIST CVE database)

Real vulnerability data, layered defenses, automated red team attacks

10
Capstone Project & Graduation
Phase 4 · 6 hours (Sat + Sun)
Saturday — Concepts & Demos
  • The journey so far — 9 weeks in review, the full AI toolkit
  • Capstone architecture walkthrough — layered system design
  • Framework independence — architecture-first thinking, the portability test
  • Advanced orchestration — self-reflection, parallel execution, human-in-the-loop
  • Production deployment — FastAPI, monitoring, observability, graceful degradation
  • Instructor-guided capstone work session
  • 1-on-1 mentoring and code reviews
Sunday — Hands-On Lab
  • Student Presentations — live demo + architecture walkthrough (10 min each)
  • Framework-independent architecture description exercise
  • Production readiness review — latency tracking, deployment config
  • Peer feedback and Q&A
  • Certificate ceremony
  • Next steps: how to keep learning, community access
Mini-Project

Capstone — NovaTech AI Assistant with self-reflection workflows, framework-independent architecture, and production deployment sketch

Real-World Workshop

Build Your Own AI Startup MVP

Live APIs: Student's choice (all prior APIs)

Choose from 5 startup ideas, combine all course techniques, investor pitch

Every Week

Real-World Workshops

Every week includes a supplemental workshop using live APIs — no mock data, no tutorials, real production data that changes with every run. Each includes an optional Challenge D for production-tier implementations.

1
AI News Intelligence Dashboard
Hacker News

Fetch live tech news, AI-powered categorization, audience-adapted briefings, token economics at scale

2
AI Recipe Transformer
TheMealDB

Dietary adaptation, cuisine fusion, chain-of-thought meal planning, structured nutrition extraction

3
AI Book Recommendation Engine
Open Library

Semantic search over real books, metadata filtering, RAG-powered reading recommendations

4
AI Country Intelligence Briefing
REST Countries + Open-Meteo

Multi-source RAG with live weather data, intelligent chunking, conversational retrieval

5
AI Earthquake Analysis
USGS Earthquake

Live seismic data evaluation, retrieval metrics, LLM-as-judge, statistical rigor

6
AI Space Tracker Agent
Open Notify (ISS) + Sunrise-Sunset + Open-Meteo

Real-time ISS tracking, multi-tool agent, tool design experiments

7
AI Content Publishing Pipeline
Wikipedia + Hacker News + Quotable

LangGraph editorial workflow, self-reflection review cycles, checkpointing

8
AI OSINT Intelligence Team
GitHub + Hacker News + Wikipedia

Multi-agent competitive intelligence, MCP server, supervisor pattern

9
AI Security Red Team Lab
NVD (NIST CVE database)

Real vulnerability data, layered defenses, automated red team attacks

10
Build Your Own AI Startup MVP
Student's choice (all prior APIs)

Choose from 5 startup ideas, combine all course techniques, investor pitch

Week 10

Capstone Project

Choose one of these projects — or propose your own. You'll build it, deploy it, and present it live on Demo Day.

AI Research Assistant

RAG + Agents + Web Search + Summarization

Customer Support Bot

RAG + Guardrails + Multi-turn Conversation

Code Review Agent

Agents + Tool Use + GitHub Integration

Document Intelligence System

Advanced RAG + Multi-modal + Evaluation

Multi-Agent Content Studio

Multi-Agent + MCP + External APIs

+ Propose your own project

What You Walk Away With

📂

18 Projects on GitHub

8 guided mini-projects + 10 real-world workshops with live APIs, each portfolio-ready with README

🚀

1 Deployed Capstone

A full-stack GenAI app — deployed, demo-ready, and shareable

💻

bsigma.ai Platform Access

2 workspaces, 10 notebooks each. Top up credits for more power anytime.

🎓

Certificate of Completion

Official bsigma.ai certificate to showcase on LinkedIn

👥

Private Community Access

Alumni community for ongoing support, networking, and job leads

🎬

Lifetime Session Recordings

All 20 sessions recorded — revisit any topic anytime

Prerequisites

  • Comfortable with Python (variables, functions, loops, dictionaries)
  • Basic understanding of APIs (what a REST API is, HTTP requests)
  • A laptop with internet access
  • No ML/AI background required — we start from zero on GenAI concepts
  • No GPU or expensive hardware needed — everything runs on our platform

Tools & Platforms Used

bsigma.ai
Primary lab environment
LangChain
Agent framework
LangGraph
Stateful workflows
CrewAI
Multi-agent systems
ChromaDB
Vector database
Ollama
Local LLMs
HuggingFace
Models & deployment
FastAPI
API deployment
Hacker News API
Live tech news
GitHub API
Repository analytics
USGS / Open-Meteo
Earthquake & weather
10+ Live APIs
Real-world data sources

Ready to Build with GenAI?

Batch 1 starts March 2026. Limited to 25 seats.