Track 02

Agentic Distributed Systems

Your agent already is a distributed system. Here is the theory that explains why it breaks.

9 modules · 0 lessons published

The Agent as a Node

Your agent is a JSON-RPC node that hallucinates. Most of distributed systems still applies.

·Tool calls as RPC: HTTP, MCP, A2A as the agent transport
·Agent identity, addressing, and capability declaration
·Idempotency keys and retry semantics for tool calls

Time, Order, and Causality

Async tools, retries, and nested agents have no global clock. Logical time still works.

·Why wall-clock time fails for agent traces
·Lamport and vector clocks applied to agent flows
·Causality as the basis for debugging multi-agent loops

Discovery, Routing, and Gossip

MCP registries, A2A directories, and model routers are service discovery: same patterns, new failures.

·MCP server registries: agent service discovery
·A2A and emerging agent directory protocols
·Model routing as cost-and-capability-aware load balancing

State and Consensus Across Agents

Two agents asked the same question give two answers. Here is when consensus matters and when it does not.

·Why agents disagree even with identical prompts
·Shared scratchpads, CRDTs, and agent memory consistency
·Leader election among agent ensembles

Coordinating Long-Running Work

Sagas, 2PC, and orchestrator-worker patterns, for tool calls that must commit together and agent flows that span hours.

·Sagas for multi-step agent flows with compensation
·Two-phase commit across atomic tool sequences
·Orchestrator-worker patterns for parallel fanout

Scaling Agent Systems

Prompt caches, semantic caches, model gateways, vector indexes, work queues: the agent CDN stack.

·Prompt caching as distributed caching, with new failure modes
·Semantic versus exact caches and cache invalidation
·Agent gateways and proxies (LiteLLM, OpenRouter, internal gateways)

Agent Memory & Storage

Append-only memory logs, artifact stores, and how to migrate a memory schema without amnesia.

·Append-only memory logs and replay
·Artifact stores for agent outputs
·Compaction strategies for long-running agents

Operating Agent Systems

Tracing 100 agents at 3am, prompt injection as a trust boundary, schema rollouts that do not break running agents.

·Distributed tracing for agents (LangSmith, OpenTelemetry GenAI)
·Prompt injection as a security boundary problem
·Sandbox isolation and per-agent IAM

Sharding, Fanout, and Byzantine Agents

Parallel agent fleets, map-reduce fanout, and what to do when sub-agents lie.

·Work partitioning across agent fleets
·Map-reduce style fanout (orchestrator/worker)
·Hot-shard problems with cost-asymmetric agents