Claude Agent SDK
Part 10 of 11: Agent Architecture & AI Model Internals Series | AI Doctor Assistant Project
Table of Contents
- Learning Objectives
- What Is the Claude Agent SDK?
- Architecture: How the SDK Works Under the Hood
- The query() Function
- Structured Output
- Error Handling
- Tool Creation (V2 Preview)
- Testing and Mocking
- Permission Modes
- Hooks and Observability (V2 Preview)
- Summary
Learning Objectives
After reading this document, you will understand:
- Where the Claude Agent SDK sits in the abstraction spectrum (raw API → SDK → framework) and why AI Doctor chose it
- How the SDK works under the hood — the CLI subprocess model, message flow, and why it does not call the API directly
- What query() does, parameter by parameter, and how async iteration works
- How structured output flows from a Pydantic model to a JSON Schema to a validated response
- What errors the SDK can throw and how to wrap them in domain-specific exceptions
- How to create custom tools for V2 using the @tool decorator and MCP server registration
- How to test agent code without calling a real LLM
- When to use each permission mode (bypassPermissions, acceptEdits, default, plan)
- What hooks enable for tool-level control and observability
Key mental models to internalize:
- The SDK is a thin orchestration layer. It manages the agentic loop so you do not have to, but it does not hide the API's semantics.
- query() is an async generator. Each yield is a message in the conversation: you iterate, not poll.
- Structured output is enforced at the API level, not parsed after the fact. The model is constrained to produce valid JSON matching your schema.
What this document is NOT:
- A framework comparison. Doc 07 covers that in the "When to Use What" decision tree and the comparison table. This document goes deep on the SDK itself.
- A tutorial for the raw Anthropic API. Doc 01 covers authentication, messages, and parameters.
1. What Is the Claude Agent SDK?
The Claude Agent SDK is a Python (and TypeScript) library that sits between your application code and the Claude API. It automates the agentic loop — the cycle of sending a prompt, receiving a response, dispatching tool calls, and looping until done — while staying close enough to the API that you can reason about what is happening.
Where It Sits
Doc 07 introduced three levels of abstraction for building with LLMs:
┌────────────────────────────────────────────────────────────┐
│ Level 1: Raw API (anthropic SDK) │
│ You write the loop. You dispatch tools. Full control. │
├────────────────────────────────────────────────────────────┤
│ Level 2: Agent SDK (claude_agent_sdk) ← YOU ARE HERE │
│ Loop handled. Tools registered. Structured output built │
│ in. Still single-provider, still close to the metal. │
├────────────────────────────────────────────────────────────┤
│ Level 3: Framework (LangChain, LlamaIndex, CrewAI) │
│ Multi-provider. Rich ecosystem. Heavy abstraction. │
└────────────────────────────────────────────────────────────┘
What the SDK Automates vs What You Still Control
| SDK Handles | You Control |
|-------------|-------------|
| Agentic loop (call → check → dispatch → repeat) | System prompt content |
| Tool dispatch and result routing | Which tools to register and allow |
| Structured output schema enforcement | Schema design (Pydantic models) |
| CLI process lifecycle | When to call query() and what prompt to send |
| Permission model enforcement | Which permission mode to use |
| Error propagation from CLI subprocess | How to handle each error type |
| Async iteration over message stream | Business logic for each message type |
What This Document Covers vs Doc 07
Doc 07's coverage (~180 lines) answers "why the SDK instead of alternatives?" — positioning it in the framework landscape with a comparison table and decision tree.
This document answers "how does the SDK actually work?" — architecture, query() internals, structured output flow, error handling, tools, testing, permissions, and hooks. If Doc 07 is the menu, this document is the recipe.
2. Architecture: How the SDK Works Under the Hood
The CLI Subprocess Model
The Claude Agent SDK does not make HTTP requests to api.anthropic.com directly. Instead, it spawns the Claude Code CLI as a child process and communicates via stdin/stdout:
┌──────────────────────┐
│ Your Application │
│ (briefing_service) │
└──────────┬───────────┘
│ query(prompt, options)
▼
┌──────────────────────┐
│ Claude Agent SDK │
│ (claude_agent_sdk) │
│ │
│ - Serializes options│
│ - Spawns subprocess │
│ - Parses JSON msgs │
└──────────┬───────────┘
│ stdin: JSON commands
│ stdout: JSON messages
▼
┌──────────────────────┐
│ Claude Code CLI │
│ (child process) │
│ │
│ - Manages API auth │
│ - Runs agentic loop │
│ - Executes tools │
│ - Enforces perms │
└──────────┬───────────┘
│ HTTPS
▼
┌──────────────────────┐
│ Anthropic Messages │
│ API │
│ api.anthropic.com │
└──────────────────────┘
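The pipe protocol is easy to see in miniature. The stdlib-only sketch below spawns a child process, reads newline-delimited JSON from its stdout, and yields each parsed record as it arrives. The child is a short Python script standing in for the Claude Code CLI (an assumption for illustration); the SDK's real transport also writes to the child's stdin and handles errors, which this sketch omits.

```python
import asyncio
import json
import sys
from collections.abc import AsyncIterator

# Stand-in for the CLI: a child process that prints two JSON messages, one per line.
CHILD_SCRIPT = (
    "import json\n"
    "print(json.dumps({'type': 'assistant', 'text': 'Reviewing labs...'}), flush=True)\n"
    "print(json.dumps({'type': 'result', 'is_error': False, 'num_turns': 1}), flush=True)\n"
)


async def stream_messages() -> AsyncIterator[dict]:
    """Spawn the stand-in child and yield one parsed dict per stdout line."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_SCRIPT,
        stdout=asyncio.subprocess.PIPE,
    )
    assert proc.stdout is not None
    async for raw_line in proc.stdout:  # StreamReader yields one line at a time
        line = raw_line.decode().strip()
        if line:
            yield json.loads(line)
    await proc.wait()  # one child process per call; it exits when its output ends


async def collect() -> list[dict]:
    return [message async for message in stream_messages()]


if __name__ == "__main__":
    for message in asyncio.run(collect()):
        print(message["type"], message)
```

Yielding from inside the read loop is what lets a caller act on the first message while the child is still producing the rest.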
Why a Subprocess?
You might ask: why not just call the API directly like the anthropic Python SDK does?
- Tool execution environment. The CLI already implements Claude's built-in tools (reading files, running Bash commands, and so on), so your Python process does not need its own file reader or shell executor.
- Permission enforcement. The CLI's permission model (which tools are allowed, which need approval) is the same one used by interactive Claude Code sessions. The SDK inherits it without reimplementing it.
- Auth delegation. The CLI handles API credentials, reading them from environment variables or the Claude Code configuration. Your code passes model and system_prompt, not API keys (for production you can set ANTHROPIC_API_KEY in the environment; see backend/src/config.py).
- Consistent behavior. The same agentic loop that powers claude on the command line powers your SDK calls, so CLI bug fixes and improvements reach your SDK code when you upgrade.
Message Flow: The Full Round Trip
Here is what happens when you call query():
Your Code SDK CLI Process API
│ │ │ │
│ query(prompt, opts) │ │ │
│───────────────────────>│ │ │
│ │ spawn subprocess │ │
│ │────────────────────────>│ │
│ │ stdin: JSON config │ │
│ │────────────────────────>│ │
│ │ │ POST /messages │
│ │ │───────────────────>│
│ │ │ 200 OK + response │
│ │ │<───────────────────│
│ │ stdout: JSON message │ │
│ │<────────────────────────│ │
│ yield AssistantMessage│ │ │
│<───────────────────────│ │ │
│ │ │ │
│ ... (tool calls if │ │ │
│ max_turns > 1) ... │ │ │
│ │ │ │
│ │ stdout: result JSON │ │
│ │<────────────────────────│ │
│ yield ResultMessage │ │ │
│<───────────────────────│ │ │
│ │ process exits │ │
│ │<────────────────────────│ │
│ iteration ends │ │ │
│ │ │ │
Key observations:
- Async iteration. Each yield corresponds to a message from the CLI's stdout. You process messages as they arrive, not all at once.
- Process lifecycle. The CLI process starts when query() begins iterating and exits when the conversation completes. One query() call = one subprocess.
- JSON over pipes. All communication between the SDK and CLI uses newline-delimited JSON over stdin/stdout. The SDK parses each line into typed Python objects (AssistantMessage, ResultMessage, etc.).
System Prompt Placement
In Doc 01, we discussed how the raw API accepts a system parameter separate from the messages array. The SDK follows the same pattern:
from claude_agent_sdk import ClaudeAgentOptions

# SDK: system_prompt is a separate field, NOT a message
options = ClaudeAgentOptions(
    system_prompt="You are a clinical decision support assistant...",
)
# This becomes the "system" parameter in the API call.
# It is NOT injected into the messages array as {"role": "system", ...}.
This matters because the model is trained to treat the system prompt as standing instructions that frame the whole conversation, and it sits at the start of the context instead of occupying a conversation turn. See Doc 01's discussion of the system parameter for details.
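As a worked example, here is roughly the shape of the Messages API request body this produces: the system prompt is a top-level field and the patient JSON is the only user message. build_request_body() is a hypothetical helper and the max_tokens value is illustrative; the CLI assembles the real request and may add fields not shown here.

```python
import json


def build_request_body(system_prompt: str, user_prompt: str, model: str) -> dict:
    """Sketch of a Messages API body: system is top-level, not a message."""
    return {
        "model": model,
        "max_tokens": 4096,  # illustrative value only
        "system": system_prompt,
        "messages": [
            {"role": "user", "content": user_prompt},
        ],
    }


body = build_request_body(
    system_prompt="You are a clinical decision support assistant...",
    user_prompt=json.dumps({"patient_id": 42, "conditions": ["type 2 diabetes"]}),
    model="claude-opus-4-6",
)
print(json.dumps(body, indent=2))
```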
3. The query() Function
query() is the primary entry point for one-shot SDK interactions. It sends a prompt, runs the agentic loop, and yields messages as an async generator.
Signature
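from claude_agent_sdk import query, ClaudeAgentOptions

# Simplified signature: query(*, prompt: str, options: ClaudeAgentOptions | None = None)
# prompt and options are passed by keyword; the call returns an async iterator of messages.
async for message in query(prompt="...", options=ClaudeAgentOptions()):
    ...  # process each message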
ClaudeAgentOptions
The options object configures every aspect of the agent's behavior:
from claude_agent_sdk import ClaudeAgentOptions

options = ClaudeAgentOptions(
    system_prompt="You are ...",   # str: system prompt (separate from messages)
    model="claude-opus-4-6",       # str: model ID, e.g. "claude-sonnet-4-5-20250929"
    output_format=None,            # dict: structured output schema (see Section 4)
    max_turns=2,                   # int: max agentic loop iterations (safety limit)
    permission_mode="default",     # "default", "acceptEdits", "plan", "bypassPermissions"
    allowed_tools=["Read"],        # list[str]: tools the agent may use without asking
    cwd=".",                       # str: working directory for tool execution
    hooks=None,                    # dict: PreToolUse/PostToolUse hooks (see Section 9)
    can_use_tool=None,             # callable: permission callback function
)
Let us walk through each parameter:
| Parameter | Purpose | AI Doctor Value |
|-----------|---------|-----------------|
| system_prompt | Instructions that frame the agent's role | Clinical decision support prompt (60 lines) |
| model | Which Claude model to use | settings.ai_model → "claude-opus-4-6" |
| output_format | JSON Schema for structured responses | PatientBriefing.model_json_schema() |
| max_turns | Safety limit on agentic loop iterations | 2 (no tool use → only needs 1 turn + result) |
| permission_mode | How tool permissions are handled | "bypassPermissions" (backend automation) |
| allowed_tools | Tools the agent may use without a permission prompt | Not set in V1 (no tools needed) |
| cwd | Working directory for file/bash tools | Not set in V1 (no file operations) |
| hooks | Pre/post tool use callbacks | Not set in V1 (no observability yet) |
Async Iteration Pattern
query() returns an async generator. You iterate with async for:
from claude_agent_sdk import AssistantMessage, ResultMessage, TextBlock, ToolUseBlock, query

# Inside an async function:
async for message in query(prompt="...", options=options):
    if isinstance(message, AssistantMessage):
        # The model's text response or tool use requests
        for block in message.content:
            if isinstance(block, TextBlock):
                print(block.text)
            elif isinstance(block, ToolUseBlock):
                print(f"Tool call: {block.name}")
    elif isinstance(message, ResultMessage):
        # Final message: contains structured output, cost, duration
        if message.structured_output is not None:
            data = message.structured_output  # dict matching your schema
        if message.total_cost_usd is not None:  # cost may be missing
            print(f"Cost: ${message.total_cost_usd:.4f}")
        print(f"Turns: {message.num_turns}")
The message types you will encounter:
| Type | When | Contains |
|------|------|----------|
| AssistantMessage | Each model response | content blocks: TextBlock, ToolUseBlock, ThinkingBlock |
| UserMessage | Tool results fed back | content blocks: TextBlock, ToolResultBlock |
| SystemMessage | System events | subtype and data fields |
| ResultMessage | Conversation complete | structured_output, result, is_error, total_cost_usd, num_turns, duration_ms |
AI Doctor: Walking Through briefing_service.py
AI DOCTOR EXAMPLE:
Here is how the AI Doctor's briefing service (backend/src/services/briefing_service.py)
uses query(), annotated line by line:
# 1. Build the options: everything the agent needs to know
options = ClaudeAgentOptions(
    system_prompt=SYSTEM_PROMPT,  # 60-line clinical prompt (lines 25-63)
    model=settings.ai_model,  # From config.py: "claude-opus-4-6"
    output_format={  # Structured output (Section 4)
        "type": "json_schema",
        "schema": PatientBriefing.model_json_schema(),
    },
    max_turns=2,  # 1 turn for response + 1 safety margin
    permission_mode="bypassPermissions",  # No human approval needed
)
# 2. Send the patient JSON as the prompt and iterate
result = None
async for message in query(prompt=patient_json, options=options):
    # 3. Look for the final result
    if isinstance(message, ResultMessage):
        # 4. Happy path: validate structured output against Pydantic
        if not message.is_error and message.structured_output is not None:
            briefing = PatientBriefing.model_validate(message.structured_output)
            result = BriefingResponse(
                **briefing.model_dump(),
                generated_at=datetime.datetime.now(datetime.UTC),
            )
        # 5. Error path: agent reported an error
        if message.is_error:
            raise BriefingGenerationError(
                code="AGENT_ERROR",
                message=message.result or "Agent returned an error",
            )
Why max_turns=2?
AI Doctor V1 has NO tools. The agent receives the full patient record in
the prompt and responds with structured output. This needs exactly 1 turn.
max_turns=2 provides a safety margin — if the model somehow requests a
tool call, the loop will terminate after 2 iterations instead of running
indefinitely.
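The safety-margin argument can be seen in a toy agentic loop. This is a simplified sketch, not the CLI's implementation (the real CLI's turn counting and limit reporting may differ): run_agent_loop() and tool_happy_model() are hypothetical, and the default model stand-in misbehaves by requesting a tool on every turn, so only the max_turns cap ends the loop.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    turns: int
    stopped_by_limit: bool
    transcript: list[str] = field(default_factory=list)


def tool_happy_model(history: list[str]) -> dict:
    """Hypothetical misbehaving model: asks for a tool on every turn."""
    return {"type": "tool_use", "name": "fetch_patient"}


def run_agent_loop(prompt: str, max_turns: int,
                   model: Callable[[list[str]], dict] = tool_happy_model) -> LoopResult:
    """Call the model, dispatch tool calls, and repeat until it answers or hits the cap."""
    history = [f"user: {prompt}"]
    for turn in range(1, max_turns + 1):
        reply = model(history)
        if reply["type"] == "text":  # final answer: normal exit
            history.append(f"assistant: {reply['text']}")
            return LoopResult(turns=turn, stopped_by_limit=False, transcript=history)
        # Tool requested: record the call and a placeholder result, then loop again
        history.append(f"assistant -> {reply['name']}")
        history.append(f"tool_result: <{reply['name']} output>")
    return LoopResult(turns=max_turns, stopped_by_limit=True, transcript=history)


capped = run_agent_loop("patient JSON...", max_turns=2)
answered = run_agent_loop("patient JSON...", max_turns=2,
                          model=lambda history: {"type": "text", "text": "briefing"})
print(capped.turns, capped.stopped_by_limit)      # 2 True
print(answered.turns, answered.stopped_by_limit)  # 1 False
```

A well-behaved model exits on turn 1; the cap only matters when something goes wrong, which is exactly the case it exists for.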
Why patient_json as the prompt (not in messages)?
query() accepts a single prompt string. The SDK places it as the user
message. The system_prompt goes into the separate system field. This
mirrors the raw API pattern from Doc 01: system is separate, user
message is in the messages array.
query() vs ClaudeSDKClient
The SDK offers two usage patterns:
| | query() | ClaudeSDKClient |
|-|-----------|-------------------|
| Pattern | One-shot, stateless | Multi-turn, stateful |
| Lifecycle | Single async generator | Context manager (async with) |
| State | No conversation memory | Maintains conversation across queries |
| Use when | Single prompt → result | Interactive/conversational agents |
| AI Doctor | Yes (one patient → one briefing) | Not used in V1 |
from claude_agent_sdk import ClaudeSDKClient, query

# query(): stateless, one-shot (AI Doctor pattern)
async for message in query(prompt="Analyze this", options=options):
    ...
# ClaudeSDKClient: stateful, multi-turn
async with ClaudeSDKClient(options=options) as client:
    await client.query("First question")
    async for msg in client.receive_response():
        ...
    await client.query("Follow-up question")  # remembers context
    async for msg in client.receive_response():
        ...
AI Doctor uses query() because each briefing is independent — there is no conversation to maintain. The patient record goes in, the briefing comes out, done.
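The difference between the two patterns comes down to where the conversation history lives. The toy code below is hypothetical and calls no model; it only shows that a stateless function starts from an empty history on every call, while a client object carries its history across calls.

```python
from dataclasses import dataclass, field


def stateless_query(prompt: str) -> list[str]:
    """Like query(): every call starts a fresh conversation."""
    history = [f"user: {prompt}"]
    history.append(f"assistant: reply to {prompt!r}")
    return history


@dataclass
class StatefulClient:
    """Like ClaudeSDKClient: history persists between queries."""
    history: list[str] = field(default_factory=list)

    def query(self, prompt: str) -> list[str]:
        earlier = len(self.history)
        self.history.append(f"user: {prompt}")
        self.history.append(f"assistant: reply to {prompt!r} (saw {earlier} earlier messages)")
        return list(self.history)


stateless_query("First question")
print(len(stateless_query("Follow-up question")))  # 2: no memory of the first call

client = StatefulClient()
client.query("First question")
print(client.query("Follow-up question")[-1])
```

For AI Doctor the stateless shape is a feature: nothing from one patient's briefing can leak into the next.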
4. Structured Output
Structured output is how you get the model to return data in a predictable, validated format instead of free-form text. The SDK enforces this at the API level — the model is constrained to produce JSON matching your schema.
The Flow: Pydantic → JSON Schema → API → Validated Response
┌─────────────────┐ model_json_schema() ┌─────────────────┐
│ Pydantic Model │ ──────────────────────────> │ JSON Schema │
│ (schemas.py) │ │ (dict) │
└─────────────────┘ └────────┬────────┘
│
output_format │
in options │
▼
┌─────────────────┐
│ ClaudeAgent │
│ Options │
└────────┬────────┘
│
query() │
▼
┌─────────────────┐
│ Claude API │
│ (constrained │
│ generation) │
└────────┬────────┘
│
structured_output (dict)
│
▼
┌─────────────────┐
│ model_validate()│
│ → Pydantic obj │
└─────────────────┘
Step 1: Define the Pydantic Models
The schema is your contract. Every field, every type, every constraint.
AI DOCTOR EXAMPLE:
The AI Doctor's output schema lives in backend/src/models/schemas.py.
Three nested models compose into PatientBriefing:
from typing import Literal

from pydantic import BaseModel


class Flag(BaseModel):
    category: Literal["labs", "medications", "screenings", "ai_insight"]
    severity: Literal["critical", "warning", "info"]
    title: str
    description: str
    source: Literal["ai"]  # Always "ai" in V1
    suggested_action: str | None = None


class Summary(BaseModel):
    one_liner: str
    key_conditions: list[str]
    relevant_history: str


class SuggestedAction(BaseModel):
    action: str
    reason: str
    priority: int


class PatientBriefing(BaseModel):
    flags: list[Flag]
    summary: Summary
    suggested_actions: list[SuggestedAction]
Note the use of Literal types. The model CANNOT return severity: "high"
or category: "other" — the JSON Schema constrains generation to only the
enum values you define.
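To make the Literal constraint concrete, here is a hand-rolled sketch of the mapping from a Literal type to a JSON Schema enum, using only typing.get_args. It is a simplified illustration, not Pydantic's generator; Pydantic v2 emits an equivalent constraint for Literal fields, with extra keys such as title.

```python
from typing import Literal, get_args

Severity = Literal["critical", "warning", "info"]
Category = Literal["labs", "medications", "screenings", "ai_insight"]


def literal_schema(literal_type: object) -> dict:
    """Map a Literal of strings to a JSON Schema string enum (simplified)."""
    values = list(get_args(literal_type))
    if not values or not all(isinstance(v, str) for v in values):
        raise TypeError("this sketch only handles Literal types with string values")
    return {"type": "string", "enum": values}


def allowed(schema: dict, value: str) -> bool:
    """True if a schema-constrained generator could emit this value for the field."""
    return value in schema["enum"]


severity_schema = literal_schema(Severity)
print(severity_schema)
print(allowed(severity_schema, "warning"), allowed(severity_schema, "high"))  # True False
```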
Step 2: Convert to JSON Schema
Pydantic v2's model_json_schema() converts your model to a JSON Schema dict:
schema = PatientBriefing.model_json_schema()
# Returns a dict like:
# {
# "type": "object",
# "properties": {
# "flags": {"type": "array", "items": {"$ref": "#/$defs/Flag"}},
# "summary": {"$ref": "#/$defs/Summary"},
# "suggested_actions": {"type": "array", "items": {"$ref": "#/$defs/SuggestedAction"}}
# },
# "required": ["flags", "summary", "suggested_actions"],
# "$defs": { ... }
# }
Step 3: Pass to ClaudeAgentOptions
options = ClaudeAgentOptions(
    output_format={
        "type": "json_schema",  # Tell the API to use JSON Schema mode
        "schema": schema,  # The schema dict from step 2
    },
    # ...other options (system_prompt, model, max_turns, etc.)
)
The output_format dict tells the API: "constrain the model's output to valid JSON matching this schema." This is not the same as asking the model "please respond in JSON" in the system prompt — the API enforces it during token generation.
Step 4: Validate the Response
The ResultMessage.structured_output field contains the parsed dict. You validate it with Pydantic:
if isinstance(message, ResultMessage):
    if message.structured_output is not None:
        # structured_output is a dict: validate it into a typed object
        briefing = PatientBriefing.model_validate(message.structured_output)
        # Now briefing.flags[0].severity is a typed Literal, not a raw string
Why validate if the API already enforced the schema? Defense in depth. The API guarantees valid JSON structure, but model_validate() also runs Pydantic validators (custom validators, field constraints) and gives you a typed Python object instead of a raw dict.
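Here is a stdlib-only sketch of what the validation step adds once the JSON is structurally valid: the dict becomes a typed object, and rules outside the schema still run. SuggestedAction and validate_action() are simplified stand-ins for the Pydantic model and model_validate(), and the priority rule is a hypothetical custom validator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestedAction:  # stand-in for the Pydantic model of the same name
    action: str
    reason: str
    priority: int


def validate_action(data: dict) -> SuggestedAction:
    """Stand-in for SuggestedAction.model_validate(): shape, types, then custom rules."""
    missing = [name for name in ("action", "reason", "priority") if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    priority = data["priority"]
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError("priority must be an integer")
    # Hypothetical custom rule that is not part of the schema sent to the API
    if priority < 1:
        raise ValueError("priority must be 1 or higher")
    return SuggestedAction(data["action"], data["reason"], priority)


action = validate_action({"action": "Order HbA1c", "reason": "Last value elevated", "priority": 1})
print(action)
```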
Common Pitfall: model_validate vs parse_obj
Pydantic v2 renamed parse_obj() to model_validate(). If you see parse_obj in older code, it is the v1 pattern; v2 still accepts it but emits a deprecation warning:
# AVOID (Pydantic v1 pattern, deprecated in v2)
briefing = PatientBriefing.parse_obj(data)
# RIGHT (Pydantic v2 pattern — what AI Doctor uses)
briefing = PatientBriefing.model_validate(data)
5. Error Handling
The SDK defines a hierarchy of exceptions that map to different failure modes in the CLI subprocess model. Each exception tells you where the failure occurred.
Exception Hierarchy
┌────────────────────────────────────────────────────────────┐
│ Your Code calls query() │
│ │
│ What can go wrong? │
│ │
│ 1. CLINotFoundError │
│ └─ The Claude Code CLI binary is not installed │
│ or not on PATH │
│ │
│ 2. CLIConnectionError │
│ └─ The CLI process started but communication │
│ over stdin/stdout failed │
│ │
│ 3. ProcessError │
│ └─ The CLI process crashed (non-zero exit code) │
│ │
│ 4. CLIJSONDecodeError │
│ └─ The CLI sent output that was not valid JSON │
│ │
│ 5. ResultMessage.is_error = True │
│ └─ The conversation completed but the model │
│ reported an error (not a Python exception) │
└────────────────────────────────────────────────────────────┘
| Exception | Cause | Typical Resolution |
|-----------|-------|--------------------|
| CLINotFoundError | CLI binary missing | Install Claude Code CLI (npm install -g @anthropic-ai/claude-code) |
| CLIConnectionError | Pipe communication failed | Check process resources, restart |
| ProcessError | CLI process crashed | Check CLI logs, API key validity |
| CLIJSONDecodeError | Malformed CLI output | SDK/CLI version mismatch, update both |
| is_error=True | Model-level error | Check prompt, schema, model availability |
Wrapping SDK Errors in Domain Exceptions
A good practice is to catch SDK exceptions and re-raise them as your application's domain exceptions. This decouples your business logic from the SDK's error types.
AI DOCTOR EXAMPLE:
The briefing service (backend/src/services/briefing_service.py, lines 66-151)
wraps every SDK error into a BriefingGenerationError with a domain-specific
error code:
from claude_agent_sdk import (
    CLIConnectionError,
    CLIJSONDecodeError,
    CLINotFoundError,
    ProcessError,
    query,
)


class BriefingGenerationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code  # e.g., "CLI_NOT_FOUND", "PROCESS_ERROR"
        self.message = message  # Human-readable description


# In generate_briefing():
try:
    async for message in query(prompt=patient_json, options=options):
        ...
except BriefingGenerationError:
    raise  # domain errors raised inside the loop (e.g. AGENT_ERROR) pass through
except CLINotFoundError as e:
    raise BriefingGenerationError(
        code="CLI_NOT_FOUND",
        message="Claude Code CLI not found. Ensure it is installed.",
    ) from e
except CLIConnectionError as e:
    raise BriefingGenerationError(
        code="CLI_CONNECTION_ERROR",
        message=f"Failed to connect to Claude CLI: {e}",
    ) from e
except ProcessError as e:
    raise BriefingGenerationError(
        code="PROCESS_ERROR",
        message=f"Agent process failed: {e}",
    ) from e
except CLIJSONDecodeError as e:
    raise BriefingGenerationError(
        code="JSON_DECODE_ERROR",
        message=f"Failed to parse agent response: {e}",
    ) from e
This pattern has three benefits:
1. The router layer catches BriefingGenerationError, not SDK errors.
Changing SDKs later does not change the router.
2. Error codes ("CLI_NOT_FOUND") are stable strings for API consumers.
3. The original exception is chained (via raise ... from) for debugging.
Note: BriefingGenerationError is re-raised early in the except chain to
avoid accidentally catching it in later except blocks.
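Clause order also matters among the SDK's own exceptions. In the SDK source at the time of writing, the exceptions share a ClaudeSDKError base and CLINotFoundError subclasses CLIConnectionError; treat that as an assumption to check against your installed version. The stdlib sketch below re-declares stand-in classes with that assumed hierarchy to show two things: a parent-class clause placed first would swallow the more specific error, and raise ... from e keeps the original exception as __cause__. first_handler() and translate() are hypothetical helpers.

```python
from typing import NoReturn


class ClaudeSDKError(Exception):
    """Stand-in base class; names mirror the SDK, hierarchy is assumed."""


class CLIConnectionError(ClaudeSDKError):
    pass


class CLINotFoundError(CLIConnectionError):  # the subclass link that makes order matter
    pass


class BriefingGenerationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message


def first_handler(exc: Exception, handlers: list[type[Exception]]) -> type[Exception]:
    """Return the first class in an except chain that would catch exc."""
    return next(h for h in handlers if isinstance(exc, h))


def translate(exc: Exception) -> NoReturn:
    """Raise exc and convert it to a domain error, chaining the original."""
    try:
        raise exc
    except BriefingGenerationError:
        raise  # already a domain error: pass it through unchanged
    except CLINotFoundError as e:  # must come before its parent class
        raise BriefingGenerationError("CLI_NOT_FOUND", "Claude Code CLI not found.") from e
    except CLIConnectionError as e:
        raise BriefingGenerationError("CLI_CONNECTION_ERROR", f"Failed to connect: {e}") from e


# Wrong order: the parent class catches the child first
print(first_handler(CLINotFoundError(), [CLIConnectionError, CLINotFoundError]).__name__)

try:
    translate(CLINotFoundError("claude not on PATH"))
except BriefingGenerationError as err:
    print(err.code, "caused by", type(err.__cause__).__name__)
```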
6. Tool Creation (V2 Preview)
V1 Note: The AI Doctor V1 does not use tools. This section previews V2 patterns for context. Skip to Section 7 (Testing) if you only need V1 knowledge.
In V2, agents will have tools — functions they can call to retrieve data, perform calculations, or interact with external systems. The SDK provides a decorator-based tool creation pattern.
The @tool Decorator
import json

from claude_agent_sdk import tool


@tool("fetch_patient", "Retrieve a patient record by ID", {"patient_id": int})
async def fetch_patient(args: dict) -> dict:
    # db is the application's data-access layer (not shown here)
    patient = await db.get_patient(args["patient_id"])
    return {
        "content": [{
            "type": "text",
            "text": json.dumps(patient.to_dict()),
        }]
    }
The decorator registers three things:
- Name ("fetch_patient"): how the model refers to the tool
- Description: helps the model decide when to use it (see Doc 07's tool definitions)
- Parameters schema: JSON Schema for the tool's input
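To demystify what such a decorator records, here is a hypothetical, stdlib-only registry (simple_tool is not the SDK's tool) that captures the same three pieces and expands a {"patient_id": int} shorthand into a JSON Schema object. The Python-to-JSON type table is an assumption made for illustration, and the handler returns fake data instead of querying a database.

```python
import asyncio
import json
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class ToolSpec:
    name: str
    description: str
    input_schema: dict[str, Any]
    handler: Callable[[dict], Awaitable[dict]]


def simple_tool(name: str, description: str, params: dict[str, type]):
    """Hypothetical decorator: bundle a handler with its name, description, and schema."""
    schema = {
        "type": "object",
        "properties": {p: {"type": PY_TO_JSON[t]} for p, t in params.items()},
        "required": list(params),
    }

    def decorator(fn: Callable[[dict], Awaitable[dict]]) -> ToolSpec:
        return ToolSpec(name, description, schema, fn)

    return decorator


@simple_tool("fetch_patient", "Retrieve a patient record by ID", {"patient_id": int})
async def fetch_patient(args: dict) -> dict:
    patient = {"id": args["patient_id"], "name": "Test Patient"}  # fake data, no database
    return {"content": [{"type": "text", "text": json.dumps(patient)}]}


if __name__ == "__main__":
    print(fetch_patient.input_schema)
    print(asyncio.run(fetch_patient.handler({"patient_id": 7})))
```

Note that after decoration, fetch_patient is a spec object rather than a plain function; the schema is what the model sees, and the handler is what runs when it calls the tool.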
MCP Server Registration
Tools are grouped into MCP servers (see Doc 09's MCP protocol coverage):
from claude_agent_sdk import create_sdk_mcp_server
tools_server = create_sdk_mcp_server(
    name="briefing_tools",
    version="1.0.0",
    tools=[fetch_patient, check_lab_ranges, get_drug_interactions],
)
Tool Naming Convention
When referencing tools in allowed_tools, use the MCP naming format:
# Format: mcp__<server_name>__<tool_name>
options = ClaudeAgentOptions(
    mcp_servers={"briefing_tools": tools_server},  # register the server under this name
    allowed_tools=[
        "mcp__briefing_tools__fetch_patient",
        "mcp__briefing_tools__check_lab_ranges",
    ],
)
This naming convention connects to the MCP protocol from Doc 09 — each tool is namespaced by its server.
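Because the naming rule is mechanical, a tiny helper can keep allowed_tools in sync with the server definition instead of hand-typing prefixed strings. mcp_tool_name() and split_mcp_tool_name() are hypothetical helpers, not SDK functions.

```python
def mcp_tool_name(server: str, tool: str) -> str:
    """Build the namespaced name: mcp__<server_name>__<tool_name>."""
    return f"mcp__{server}__{tool}"


def split_mcp_tool_name(full_name: str) -> tuple[str, str]:
    """Inverse of mcp_tool_name; assumes the server name itself contains no '__'."""
    parts = full_name.split("__", 2)
    if len(parts) != 3 or parts[0] != "mcp":
        raise ValueError(f"not an MCP tool name: {full_name!r}")
    return parts[1], parts[2]


allowed_tools = [mcp_tool_name("briefing_tools", t) for t in ("fetch_patient", "check_lab_ranges")]
print(allowed_tools)
```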
Why V1 Has No Tools
AI DOCTOR EXAMPLE:
In V1, the briefing service sends the FULL patient record as the prompt:
patient_json = _serialize_patient(patient)
async for message in query(prompt=patient_json, options=options):
    ...
The agent receives everything it needs in one message. There is nothing
to "look up" — no reason to call a tool.
V2 will change this. Instead of sending the full record, the agent will
receive a patient_id and use tools to:
- fetch_patient: Get demographics and conditions
- check_lab_ranges: Compare labs against reference ranges
- get_drug_interactions: Check medication combinations
This lets the agent reason about WHAT data it needs, not just analyze
what it is given. max_turns will increase from 2 to allow multiple
tool call rounds.
7. Testing and Mocking
Golden rule: NEVER call a real LLM in unit tests.
LLM calls are slow (~1-5 seconds), expensive (tokens cost money), and non-deterministic (same prompt can produce different output). Tests must be fast, free, and repeatable.
Strategy: Mock at the SDK Boundary
Mock query() itself — not the HTTP client, not the CLI process. This gives you a clean seam between "SDK behavior" (mocked) and "your business logic" (tested).
from unittest.mock import patch

import pytest


@pytest.mark.asyncio  # assumes the pytest-asyncio plugin runs async tests
@patch("src.services.briefing_service.query")
async def test_generate_briefing_success(mock_query, fake_patient):
    # Set up mock to return a fake ResultMessage
    msg = _make_result_message(structured_output=VALID_STRUCTURED_OUTPUT)
    mock_query.return_value = _async_iter([msg])
    # Call the real business logic
    result = await generate_briefing(fake_patient)
    # Assert on the business logic's output
    assert len(result.flags) == 1
    assert result.flags[0].title == "HbA1c elevated"
Creating Fake ResultMessage Objects
The test needs objects that pass isinstance(msg, ResultMessage) checks. The pattern:
from unittest.mock import MagicMock

from claude_agent_sdk import ResultMessage


def _make_result_message(*, structured_output=None, is_error=False, result=None):
    """Create a mock ResultMessage with the given fields."""
    msg = MagicMock()  # ResultMessage is plain data, so a synchronous mock suffices
    msg.structured_output = structured_output
    msg.is_error = is_error
    msg.result = result
    # Make isinstance() work by setting the class
    msg.__class__ = ResultMessage
    return msg
The msg.__class__ = ResultMessage trick is critical — without it, isinstance(msg, ResultMessage) returns False and your business logic skips the mock.
Async Generator Helper
query() returns an async generator, so your mock must too:
async def _async_iter(items):
    """Async generator that yields each item."""
    for item in items:
        yield item


# Usage in test:
mock_query.return_value = _async_iter([msg])
Testing Error Paths
AI DOCTOR EXAMPLE:
The test suite (backend/tests/test_briefing_service.py) tests four scenarios:
1. Happy path: valid structured output → BriefingResponse
mock_query returns a ResultMessage with structured_output=VALID_DICT
Assert: result.flags[0].title == "HbA1c elevated"
2. Agent error: model reports an error
mock_query returns a ResultMessage with is_error=True
Assert: raises BriefingGenerationError(code="AGENT_ERROR")
3. No result: agent yields nothing
mock_query returns an empty async iterator
Assert: raises BriefingGenerationError(code="NO_RESULT")
4. CLI not found: SDK throws CLINotFoundError
mock_query.side_effect = CLINotFoundError()
Assert: raises BriefingGenerationError(code="CLI_NOT_FOUND")
Each test verifies that the error wrapping from Section 5 works correctly —
SDK exceptions become domain exceptions with stable error codes.
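The error-path tests rely on two levers on the mock: side_effect makes the patched query() raise, and an empty async iterator simulates "no result". This self-contained sketch uses hypothetical stand-ins (a SimpleNamespace as the service module, plus simplified generate_briefing() and exception classes, stdlib only). Because the stand-in wraps errors with raise ... from e, the test can also check the chained cause.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import MagicMock


class CLINotFoundError(Exception):  # stand-in for the SDK exception
    pass


class BriefingGenerationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


service = SimpleNamespace(query=MagicMock())  # query already replaced by a mock


async def generate_briefing(patient_json: str) -> object:
    """Simplified business logic with the Section 5 error wrapping."""
    try:
        async for message in service.query(prompt=patient_json, options=None):
            return message  # success path, not exercised here
    except CLINotFoundError as e:
        raise BriefingGenerationError("CLI_NOT_FOUND", "Claude Code CLI not found.") from e
    raise BriefingGenerationError("NO_RESULT", "Agent produced no result.")


async def _async_iter(items):
    for item in items:
        yield item


def run_and_capture(**mock_config) -> BriefingGenerationError:
    """Configure the mock, run the service, and return the domain error it raised."""
    service.query.reset_mock(return_value=True, side_effect=True)
    service.query.configure_mock(**mock_config)
    try:
        asyncio.run(generate_briefing("{}"))
    except BriefingGenerationError as err:
        return err
    raise AssertionError("expected BriefingGenerationError")


not_found = run_and_capture(side_effect=CLINotFoundError())
no_result = run_and_capture(return_value=_async_iter([]))
print(not_found.code, type(not_found.__cause__).__name__)  # CLI_NOT_FOUND CLINotFoundError
print(no_result.code)  # NO_RESULT
```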
What NOT to Test
- Do not test that the SDK correctly calls the API. That is the SDK's job.
- Do not test that model_json_schema() produces correct JSON Schema. That is Pydantic's job.
- DO test your business logic: prompt construction, response validation, error wrapping, result transformation.
8. Permission Modes
The SDK's permission model controls what the agent can do without human approval. This maps directly to the CLI's permission system.
Available Modes
┌───────────────────────────────────────────────────────────────────────┐
│ PERMISSION MODES │
│ │
│ "default" Standard behavior. CLI prompts for │
│ tool permissions as configured. │
│ │
│ "acceptEdits" Auto-accept file edits (Write, Edit). │
│ Other tools still require approval. │
│ Use for: coding assistants, refactoring tools. │
│ │
│ "plan" Planning mode — the agent can read and │
│ analyze but NOT execute changes. │
│ Use for: code review, architecture analysis. │
│ │
│ "bypassPermissions" Skip ALL permission checks. The agent can │
│ use any allowed tool without approval. │
│ Use for: backend automation, CI/CD pipelines. │
│ ⚠ Use with caution — no human in the loop. │
└───────────────────────────────────────────────────────────────────────┘
When to Use Which
| Mode | Use Case | Risk Level |
|------|----------|------------|
| default | Interactive applications with a human watching | Low |
| acceptEdits | Coding assistants where file changes are expected | Medium |
| plan | Read-only analysis, code review, planning | Low |
| bypassPermissions | Backend services, automation, CI/CD | High (but controlled) |
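The decision table above can be condensed into a small function for reasoning about the modes. This is a deliberately simplified model, not the CLI's actual permission logic, which also depends on settings files, tool lists, and callbacks; decide(), EDIT_TOOLS, and READ_ONLY_TOOLS are hypothetical, and treating read-only tools as always allowed is an assumption of the sketch.

```python
from typing import Literal

Mode = Literal["default", "acceptEdits", "plan", "bypassPermissions"]
Decision = Literal["allow", "ask", "deny"]

EDIT_TOOLS = {"Write", "Edit"}              # assumed set of file-edit tools
READ_ONLY_TOOLS = {"Read", "Glob", "Grep"}  # assumed set of read-only tools


def decide(mode: Mode, tool: str) -> Decision:
    """Simplified model of how each permission mode treats one tool call."""
    if mode == "bypassPermissions":
        return "allow"   # no human in the loop
    if tool in READ_ONLY_TOOLS:
        return "allow"   # reading is treated as low risk in this sketch
    if mode == "plan":
        return "deny"    # plan mode analyzes but never executes changes
    if mode == "acceptEdits" and tool in EDIT_TOOLS:
        return "allow"   # file edits auto-accepted
    return "ask"         # everything else needs approval


if __name__ == "__main__":
    for mode in ("default", "acceptEdits", "plan", "bypassPermissions"):
        print(f"{mode:18} Edit={decide(mode, 'Edit'):5} Bash={decide(mode, 'Bash')}")
```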
Combining with allowed_tools
Permission modes work alongside allowed_tools for defense in depth:
# Read-only agent: pre-approve read tools and block mutating ones outright
options = ClaudeAgentOptions(
    allowed_tools=["Read", "Glob", "Grep"],      # run without a permission prompt
    disallowed_tools=["Write", "Edit", "Bash"],  # always denied
    permission_mode="bypassPermissions",         # acceptable because writes are blocked
)
# Agent with human oversight: only read tools are pre-approved
options = ClaudeAgentOptions(
    allowed_tools=["Read", "Glob", "Grep"],  # pre-approved
    permission_mode="default",  # Write, Bash, etc. need approval (e.g. via can_use_tool)
)
AI DOCTOR EXAMPLE:
The briefing service uses bypassPermissions because:
1. It runs as a backend service — no human is watching each request.
2. V1 has NO tools — the agent only produces structured output.
There is nothing to "permit" or "deny."
3. max_turns=2 limits the blast radius even if the model tried
to use a tool unexpectedly.
options = ClaudeAgentOptions(
    permission_mode="bypassPermissions",
    max_turns=2,
)
In V2, when tools are added, this will likely change to a more
restrictive mode, or use allowed_tools together with disallowed_tools
to limit the briefing agent to the specific tools it needs.
9. Hooks and Observability (V2 Preview)
V1 Note: The AI Doctor V1 does not use hooks or observability. This section previews V2 patterns. Skip to Section 10 (Summary) if you only need V1 knowledge.
Hooks let you intercept tool use before and after execution. They are the SDK's extension point for security, logging, and observability.
Hook Types
| Hook | When It Fires | Use Case |
|------|---------------|----------|
| PreToolUse | Before a tool executes | Validation, blocking dangerous commands, logging |
| PostToolUse | After a tool completes | Logging results, metrics, audit trail |
HookMatcher Pattern
Hooks are registered with matchers that filter which tools they apply to:
from claude_agent_sdk import ClaudeAgentOptions, HookMatcher
async def validate_bash(input_data, tool_use_id, context):
    """Block dangerous bash commands."""
    if input_data["tool_name"] == "Bash":
        command = input_data["tool_input"].get("command", "")
        if "rm -rf" in command:
            return {
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "deny",
                    "permissionDecisionReason": "Dangerous command blocked",
                }
            }
    return {}


async def log_tool_use(input_data, tool_use_id, context):
    """Log all tool invocations for audit."""
    print(f"Tool: {input_data.get('tool_name')}")
    return {}


options = ClaudeAgentOptions(
    hooks={
        "PreToolUse": [
            HookMatcher(matcher="Bash", hooks=[validate_bash]),  # Only Bash
            HookMatcher(hooks=[log_tool_use]),  # All tools
        ],
        "PostToolUse": [
            HookMatcher(hooks=[log_tool_use]),
        ],
    }
)
The matcher field filters by tool name. Omitting it matches all tools.
Langfuse Integration Pattern (V2)
For production observability, hooks can forward data to Langfuse:
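from langfuse import Langfuse

langfuse = Langfuse()  # reads Langfuse credentials from environment variables


async def langfuse_hook(input_data, tool_use_id, context):
    # langfuse.trace() is the Langfuse Python SDK v2 API; newer major versions use span-based calls
    langfuse.trace(
        name=f"tool:{input_data.get('tool_name')}",
        input=input_data.get("tool_input"),
    )
    return {}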
Why V1 Skips Observability
V1 has no tools, so there are no tool calls to observe. The complexity budget for V1 is focused on getting the core briefing pipeline working correctly. Observability (Langfuse traces, hook-based logging) is planned for V2 when tools introduce more moving parts to monitor.
10. Summary
The SDK as the Right Level of Abstraction
The Claude Agent SDK occupies a sweet spot for single-provider applications like AI Doctor:
- More than the raw API: You do not write the agentic loop, tool dispatch, or permission checks.
- Less than a framework: No multi-provider abstraction, no chain system, no memory backends. Fewer moving parts = easier debugging.
- Close to the metal: The SDK's concepts (messages, tools, schemas) map directly to API concepts. Understanding the SDK means understanding the API.
What Doc 07 Covered vs What This Doc Covered
| Topic | Doc 07 | Doc 10 (this doc) |
|-------|--------|-------------------|
| Framework comparison (SDK vs LangChain vs raw API) | Detailed table + decision tree | Reference only |
| SDK architecture (subprocess model, message flow) | Not covered | Full treatment |
| query() internals and parameters | Brief code example | Parameter-by-parameter walkthrough |
| Structured output flow | Mentioned | End-to-end: Pydantic → JSON Schema → API → validate |
| Error handling | Not covered | Full exception hierarchy + wrapping pattern |
| Tool creation | Not covered | @tool decorator + MCP registration (V2) |
| Testing patterns | Not covered | Mock strategy + async generator helpers |
| Permission modes | Listed | When-to-use guide + defense-in-depth |
| Hooks | Not covered | PreToolUse/PostToolUse + Langfuse pattern |
Checklist: What You Should Be Able to Explain
After reading this document, you should be able to explain:
- [ ] Why the SDK spawns a CLI subprocess instead of calling the API directly
- [ ] What query() returns and how to iterate over it
- [ ] The four steps of structured output: Pydantic model → JSON Schema → API constraint → model_validate()
- [ ] The five error types and where each failure occurs in the subprocess chain
- [ ] Why you mock query() (not the HTTP client) in tests
- [ ] The difference between bypassPermissions and acceptEdits
- [ ] Why AI Doctor V1 uses max_turns=2 with bypassPermissions and no tools
- [ ] How hooks intercept tool execution and why V1 does not use them
Previous: Doc 09, MCP & A2A Protocols
Series Overview: Doc 00, Overview