Multi-agent orchestration, tool use, memory architecture, evals, and production debugging — taught by engineers who've shipped agentic systems into production. Not a framework tour. Real patterns, real failure modes, real code.
Day 1 covers agent architecture fundamentals and core production patterns. Day 2 goes deep on multi-agent systems, output reliability, evals, and observability.
ReAct, Plan-Execute, LATS, and reflection loops — when each pattern applies, what breaks them in production, and how to choose the right architecture before you write a line of code.
Beyond hello-world tool calls: parallel tool invocation, tool call failure recovery, tool output validation, streaming tool results, and defensive patterns for tools that return garbage.
Short-term (in-context), long-term (vector + structured store), episodic, and semantic memory — how to design memory that scales across long-running agent sessions without bloating context.
The most underrated production problem. Compaction strategies, selective summarisation, KV cache reuse, structured truncation — how agents stay coherent over thousands of turns.
Supervisor-worker, peer-to-peer, and pipeline topologies for multi-agent systems. State passing between agents, shared memory design, and avoiding the classic failure: agents talking in circles.
Forcing reliable JSON from any LLM, input/output guardrails that don't kill performance, domain-specific validation, and building self-correcting agents that catch their own schema errors.
How to write evals that actually catch failures before users do — LLM-as-judge patterns, trajectory evaluation, regression test suites for agents, and CI/CD integration for AI systems.
Tracing agent execution, identifying where reasoning goes wrong, latency profiling, cost attribution across agent chains, and building dashboards that make silent failures visible.
Every participant leaves with working code and reusable frameworks — not just slides.
Each team ships a working multi-agent system during the workshop — with memory, tool use, and basic evals wired in. Code is yours to keep and extend.
A documented framework for choosing agent architectures, orchestration patterns, and memory strategies for your specific use cases — usable by your team long after the workshop.
A ready-to-run eval harness with LLM-as-judge patterns, trajectory tests, and CI integration templates — so your team can measure agent quality from day one of production.
This is a technical workshop. Participants should be comfortable with Python and have used at least one LLM API. We skip the basics and go straight to production-grade patterns.
The thing that separates an agent demo from a production system isn't the LLM or the framework — it's knowing where the system will fail silently, what to measure, and how to recover without user-visible errors. That's what this workshop is about.
Labs are structured around real business use cases — not toy demos. We adapt these to your industry in the pre-workshop brief.
Participants should be comfortable with Python and have used at least one LLM API (OpenAI, Anthropic, or similar). We don't teach prompt engineering basics — we go straight to production patterns. Senior engineers new to AI but experienced in Python will do fine.
Labs are provider-agnostic by design — we show the same patterns on Anthropic Claude and OpenAI GPT-4o. If your team uses a specific model or hosts your own (Llama, Mistral, etc.), let us know in the pre-workshop brief and we'll adapt the examples.
We send a pre-workshop setup guide 1 week in advance: Python environment, API keys for the LLM provider we'll use, and a few libraries installed. Setup takes about 30 minutes. If your team uses corporate machines with restrictions, we'll plan around that in the brief.
Yes — we adapt the use cases and lab exercises to your domain in the pre-workshop brief. BFSI, healthcare, SaaS, logistics, and manufacturing all have specific agent patterns worth teaching to. Generic demos waste everyone's time.
It's updated every quarter. The current version covers MCP (Model Context Protocol), LangGraph's latest state management APIs, multi-agent patterns from AutoGen v0.4+, and context compaction strategies — not what was best practice in 2023.
30 days of async Q&A with the facilitators is included in every booking. You can ask specific implementation questions, get architecture reviews, or request code walkthroughs as your team moves into production.
Tell us about your team, your tech stack, and what you want to build. We'll tailor the workshop to your use cases and confirm availability.
We respond within 1 business day. No sales scripts.