Agentic Engineering

Your engineers are writing code line by line while AI agents can build features in parallel, review their own PRs, and monitor production autonomously. Teams running parallel agent sessions in git worktrees are reporting 3-5x throughput on parallelizable work — and the tooling is improving every month.

I install SDLC 2.0 in your engineering team. Not a slide deck about "AI transformation" — actual infrastructure, workflows, and muscle memory that makes your team measurably faster. From requirements to production monitoring, every phase gets an agent-native upgrade.

  SPECIFY ──→ PLAN ──→ IMPLEMENT ──→ REVIEW ──→ TEST ──→ DEPLOY ──→ MONITOR
     │          │          │            │         │         │          │
  CLAUDE.md  Subagents  Worktrees   Vercel    Agent     Rolling   Autonomous
  Spec Kit   Skills     Codex       Agent     Teams     Releases  Agents
             MCP        Sandbox     Hooks     Sandbox   Gates     AIOps

The methodology

Every engagement follows the same pipeline. I audit your current workflow, install agent-native tooling at each SDLC phase, train your team to operate it, and measure the delta. The engagement runs 4-8 weeks depending on team size and stack complexity.

01. Requirements & Specification

Claude Code, Plan Mode, CLAUDE.md, GitHub Spec Kit

Specs replace tickets. A high-level description of what you're building and why gets expanded into a full specification — user journeys, acceptance criteria, edge cases, technical constraints. The agent explores your codebase in Plan Mode (read-only, no writes) before proposing architecture. CLAUDE.md files establish hierarchical project memory: coding standards, architecture decisions, and three-tier agent boundaries.

# CLAUDE.md — three-tier agent boundaries

## Always (agent acts freely)
- Run tests, linting, type checks
- Read any file in the repository
- Create branches and commits

## Ask First (requires human approval)
- Modify database schemas
- Change authentication logic
- Delete files or directories

## Never (hard stops)
- Commit .env files or secrets
- Push directly to main
- Modify CI/CD pipeline configs

Every spec covers six areas: executable commands with flags, testing strategy with framework and coverage targets, project structure with explicit paths, code style via real examples (not prose descriptions), git workflow with branch naming and PR conventions, and agent boundaries. A well-structured spec eliminates the ambiguity that causes agents to hallucinate or drift.
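As a sketch, a spec skeleton covering those six areas might look like this (the commands, tools, and paths below are illustrative, not prescriptive):

```markdown
# Spec — Invoice Export (illustrative skeleton)

## Commands
- `pnpm test --coverage` — full suite with coverage
- `pnpm lint && pnpm typecheck` — static checks before any commit

## Testing Strategy
- Vitest for units, Playwright for user journeys; 80% line coverage on new code

## Project Structure
- `app/api/invoices/` — route handlers
- `src/components/invoices/` — UI components

## Code Style
- Follow `src/components/InvoiceTable.tsx` as the canonical component shape

## Git Workflow
- Branches: `agent/<task-slug>`; PRs require one human approval before merge

## Agent Boundaries
- Inherit the three-tier policy from the root CLAUDE.md
```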

02. Architecture & Planning

Subagents, Agent Teams, Skills, MCP Servers

The spec decomposes into a technical blueprint via specialized subagents. Each subagent operates with isolated context and a restricted tool set — one explores the existing codebase for reusable abstractions, another audits the security surface, a third evaluates infrastructure requirements. They share a parent context but cannot interfere with each other's state or files.
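Subagents of this kind live as markdown files under .claude/agents/, with frontmatter declaring the restricted tool set; a sketch of the security-surface auditor (file name and prompt wording are illustrative):

```markdown
# .claude/agents/security-audit.md
---
name: security-audit
description: Audits the security surface of proposed changes. Use during planning.
tools: Read, Grep, Glob
---

You are a security auditor. Examine authentication flows, input
validation, and secret handling in the paths the parent agent names.
Report findings with file references. Never modify files.
```

The `tools` line is the enforcement mechanism: this agent can search and read, but has no Write, Edit, or Bash access, so it cannot interfere with sibling agents' state.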

# .claude/skills/react-patterns/SKILL.md
---
name: React Patterns
description: Project-specific React conventions and component patterns
globs: ["src/components/**/*.tsx", "app/**/*.tsx"]
---

## Component Structure
- Use function components with named exports
- Colocate types in the same file
- Use server components by default, add 'use client' only for interactivity

## State Management
- Use React 19 use() for data fetching in server components
- Prefer URL state (searchParams) over client state for filters

Skills activate contextually based on file globs. Editing a React component triggers the React skill, which injects current API documentation and project-specific patterns. Skills are stored in .claude/skills/ with SKILL.md descriptors — they're systematic knowledge injection, not prompt engineering.

MCP (Model Context Protocol) servers connect agents to live infrastructure — databases, APIs, monitoring dashboards, project management tools. The agent queries your actual schema, reads real metrics, and references open issues directly. No guessing, no stale context.
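Wiring those servers up is a small JSON file at the repository root (project-scoped `.mcp.json`); the server names, package, and URL below are illustrative placeholders:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres",
               "postgresql://localhost:5432/app_dev"]
    },
    "issues": {
      "type": "http",
      "url": "https://mcp.example.com/mcp"
    }
  }
}
```

Because the file is checked in, every engineer's agent gets the same live connections, with credentials supplied via environment variables rather than the repo.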

03. Parallel Implementation

Git Worktrees, Claude Code, Codex, Vercel Sandbox

This is where throughput multiplies. Multiple agents work simultaneously on independent tasks, each in its own git worktree — an isolated checkout of the repository with its own branch. The main codebase stays untouched until changes pass review and merge.

# Spawn 3 parallel agents on independent tasks
$ claude --worktree "Implement POST /api/invoices endpoint per spec §3.2"
  → .claude/worktrees/api-invoices/ (branch: agent/api-invoices)

$ claude --worktree "Build InvoiceTable component with sorting and filters"
  → .claude/worktrees/invoice-table/ (branch: agent/invoice-table)

$ claude --worktree "Add integration tests for invoice lifecycle"
  → .claude/worktrees/invoice-tests/ (branch: agent/invoice-tests)

# Each agent works independently — isolated filesystem, isolated branch
# PRs created automatically on completion for human review
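Under the hood this is plain `git worktree`: isolated checkouts sharing one object store. A throwaway demo of the isolation model (paths and branch names are illustrative):

```shell
# Throwaway repo to demo the isolation model
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git -c user.name=agent -c user.email=agent@example.com \
  commit -q --allow-empty -m "init"

# One isolated checkout per agent task, each on its own branch
git worktree add -q .worktrees/api-invoices -b agent/api-invoices
git worktree add -q .worktrees/invoice-table -b agent/invoice-table

# Changes in one worktree cannot touch another's working files
echo "export {}" > .worktrees/api-invoices/route.ts
git worktree list   # main checkout plus one entry per task

# After the PR merges: remove the worktree, delete the branch
git worktree remove --force .worktrees/api-invoices
git branch -q -D agent/api-invoices
```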

Anthropic validated this at scale: 16 Claude Opus agents working in parallel built a 100,000-line C compiler in Rust over ~2,000 sessions, passing 99% of GCC torture tests. Agents self-organized by claiming tasks via files in a shared current_tasks/ directory, with git synchronization preventing duplicate work. incident.io went from zero to 4-5 parallel agents daily within four months, reporting an 18% build performance improvement from a single worktree session that cost $8 in API credits.

# Sandboxed code execution — Firecracker microVM, deny-all network
import { Sandbox } from "@vercel/sandbox";

const sandbox = await Sandbox.create({
  runtime: "python3.13",
  timeout: 30_000,
  networkPolicy: "deny-all",    // zero outbound access
  source: { type: "snapshot", snapshotId: SNAPSHOT_ID },
});

await sandbox.writeFiles([{ path: "model.py", content: code }]);
const result = await sandbox.runCommand("python3", ["model.py"]);
console.log(await result.stdout());  // agent sees output, iterates
await sandbox.stop();                // VM destroyed, nothing persists

For code that requires runtime validation — algorithm correctness, data pipeline outputs, model inference — sandboxed execution runs the code in ephemeral Firecracker microVMs with deny-all network policy. The agent writes code, executes it in an isolated VM, inspects stdout/stderr, and iterates. Zero blast radius.

Cloud agents like Codex handle long-running tasks asynchronously. Each task runs in its own isolated sandbox, preloaded with the repository, internet disabled. Assign a migration, a refactor, or a test coverage sprint — it runs tests iteratively until passing and delivers a PR. You review the diff in the morning.

04. AI Code Review

Vercel Agent, CodeRabbit, Hooks, Conformance

Every PR gets reviewed by AI before a human sees it. Vercel Agent analyzes the full repository context (not just the diff), generates fix patches, and validates them in a secure sandbox with real builds, tests, and linters. It only surfaces suggestions that actually pass CI. CodeRabbit performs multi-layered analysis — AST evaluation, SAST scanning, and generative AI review across 40+ integrated linters and security scanners.

// .claude/settings.json — hooks enforce quality gates structurally

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Write|Edit",
      "command": "eslint --fix $FILE && tsc --noEmit",
      "description": "Lint and type-check before every file write"
    }],
    "PostToolUse": [{
      "matcher": "Write|Edit",
      "command": "pnpm test --related $FILE",
      "description": "Run related tests after every file change"
    }]
  }
}

Hooks aren't optional review comments — they're hard gates that execute on every change, from every agent and every human, and block the action if they fail. PreToolUse hooks run formatting and linting before the agent writes (cheaper than burning LLM tokens on style issues). PostToolUse hooks validate that every modified file still passes type checks and related tests.

05. Testing & Validation

Agent Teams, Sandbox, Spec Conformance, Browser Automation

Agent teams handle testing in parallel with strict isolation. A test-writer agent generates unit and integration tests from the spec. A browser-automation agent verifies UI flows end-to-end via headless Chrome. A security-audit subagent scans for OWASP Top 10 vulnerabilities. Each agent runs with restricted tool access — the test agent can read source but not modify it, the security agent can read but not push to git.

# Agent team — parallel testing with isolated permissions

Agent: test-writer
  Tools: Read, Grep, Glob, Write (tests/ only), Bash (pytest/jest only)
  Prompt: "Generate tests for spec §3.2. Do not modify source files."

Agent: security-audit
  Tools: Read, Grep, Glob, WebFetch (OWASP only)
  Prompt: "Scan for injection, XSS, auth bypass. Report only, no edits."

Agent: e2e-validator
  Tools: Read, BrowserAutomation, Screenshot
  Prompt: "Verify all user journeys from spec §2. Capture failures."

Sandboxed execution runs tests in ephemeral environments that mirror production — same runtime, same dependencies, same OS. Spec conformance checks validate that the implementation matches the original specification, catching drift between intent and reality before it reaches production.

06. Deployment & Rollout

Vercel, Rolling Releases, Preview Deployments, Gated Pipelines

Deployment is the phase where agents execute but humans own the decision. Preview deployments generate a full production-equivalent URL for every PR. Rolling releases route a configurable percentage of traffic to the new version, monitor error rates and performance metrics, and auto-promote or auto-rollback based on real signals.

# Rolling release — canary → promote/rollback based on signals

deploy:
  strategy: rolling
  canary:
    percentage: 1%          # start with 1% of traffic
    duration: 10m           # observe for 10 minutes
    promote_if:
      error_rate: < baseline * 1.05
      p95_latency: < baseline * 1.10
    rollback_if:
      error_rate: > baseline * 1.20
      5xx_count: > 0
  promotion:
    steps: [1%, 5%, 25%, 100%]
    observation_period: 5m

Gated pipelines encode operational knowledge as executable rules. The scarce resource is no longer writing deployment scripts — it's the judgment of what is safe to ship, and that judgment gets codified into automated gates.
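That codified judgment can be as small as a pure function. A TypeScript sketch mirroring the illustrative thresholds above (type and function names are mine, not a specific platform API):

```typescript
// Sketch: the promote/rollback judgment from a canary config,
// expressed as a pure, testable function. All names are illustrative.
interface Metrics {
  errorRate: number;    // fraction of requests failing
  p95LatencyMs: number; // 95th-percentile latency
  count5xx: number;     // server errors during the observation window
}

type Decision = "promote" | "rollback" | "hold";

function evaluateCanary(canary: Metrics, baseline: Metrics): Decision {
  // Hard failures roll back immediately
  if (canary.count5xx > 0 || canary.errorRate > baseline.errorRate * 1.2) {
    return "rollback";
  }
  // Promote only when both signals sit within tolerance of baseline
  if (
    canary.errorRate < baseline.errorRate * 1.05 &&
    canary.p95LatencyMs < baseline.p95LatencyMs * 1.1
  ) {
    return "promote";
  }
  // Otherwise keep observing at the current traffic percentage
  return "hold";
}

// Example: slightly elevated error rate, within rollback tolerance
const baseline: Metrics = { errorRate: 0.01, p95LatencyMs: 180, count5xx: 0 };
console.log(
  evaluateCanary({ errorRate: 0.011, p95LatencyMs: 185, count5xx: 0 }, baseline)
); // "hold"
```

Because the gate is a pure function of observed metrics, it can be unit-tested like any other code, which is exactly what makes the encoded judgment auditable.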

07. Monitoring & Autonomous Agents

AWS DevOps Agent, AIOps, Self-Healing, Scheduled Agents

The final phase is agents that run continuously without human prompting. AWS DevOps Agent monitors applications 24/7, investigates anomalies by correlating telemetry with code changes and deployment history, and resolves incidents autonomously. AWS reports preview customers seeing 75% lower MTTR and 94% root cause accuracy — this class of "frontier agent" operates independently, scales to concurrent incidents, and runs persistently.

# Scheduled agents — cron-triggered, sandboxed, PR-based output

0 6 * * 1    claude "Audit dependencies for CVEs. Create PR if patches found."
0 8 * * *    claude "Run performance regression suite. Page oncall if p95 > 200ms."
0 0 1 * *    claude "Clean up stale branches older than 30 days."
0 9 * * 5    claude "Generate weekly engineering metrics report from Linear + GitHub."

Scheduled agents handle recurring maintenance on cron: CVE patching, performance regression checks, stale branch cleanup, metrics reporting. They run in isolated sandboxes, create PRs with their findings, and page a human only when something requires judgment. The shift is from "automated" (scripts that follow deterministic rules) to "autonomous" (agents that investigate, correlate, and decide).

The stack

coding agents: Claude Code, Codex, Cursor

orchestration: CLAUDE.md, Skills, Hooks, Subagents, Agent Teams

parallelism: Git Worktrees, Vercel Sandbox, Codex Cloud

integration: MCP Servers, Plugins, GitHub Actions

code review: Vercel Agent, CodeRabbit, Hooks Conformance

deployment: Vercel, Rolling Releases, Gated Pipelines

monitoring: AWS DevOps Agent, AIOps, Scheduled Agents

Engagement structure

wk 1: Audit — Map the current SDLC end-to-end, identify bottlenecks, quantify where agent-native tooling has the highest ROI.

wk 2-3: Foundation — Install CLAUDE.md hierarchies, skills, hooks, spec templates, and agent boundary policies.

wk 4-6: Execution — Run the first agent-native sprint alongside your team on a real feature. Parallel worktrees, AI review pipelines, sandboxed validation.

wk 7-8: Handover — Train the team, document the playbook, ensure the system runs independently.

The deliverable is a repeatable system, not a dependency on a consultant. The CLAUDE.md constitutions, skills, hooks, and pipeline configs live in your repository. Your team operates the playbook after I leave.

Why me

I don't just teach this — I ship with it daily. This portfolio site was built with parallel Claude Code agents across git worktrees. The AI chat executes live code in Vercel Sandbox microVMs with deny-all network policy. I've been building production AI for 14 years, co-founded three startups, and I'm actively running agent teams, multi-agent orchestration, and autonomous monitoring in production engagements.

Most "AI transformation" consultants have read the blog posts. I've pushed the commits.

Get in touch

Email me at raman.shrivastava.7@gmail.com. Tell me where your team is and where you want to be.