AI Agent Frameworks Compared: 2026 Guide

LDS Team | Let's Data Science | 15 min read

Eighteen months ago, building an AI agent meant writing a ReAct loop from scratch, wiring up tool calls by hand, and praying your state management held together past the demo. That era is over. By March 2026, at least six production-grade AI agent frameworks are competing for your codebase, each with a distinct philosophy. LangGraph reached 1.0 GA back in October 2025 and has since climbed to v1.0.9. CrewAI crossed 44,600 GitHub stars and shipped v1.10.1 with native MCP and A2A support. OpenAI released an Agents SDK (v0.10.2) that works with 100+ non-OpenAI models, while Anthropic shipped the Claude Agent SDK (v0.1.48) and Google's ADK hit v1.26.0.

Too many choices. Not enough honest comparison. This guide puts five major frameworks through the same test: an email triage agent that reads incoming mail, classifies it as urgent, normal, or spam, drafts responses for routine messages, and escalates urgent ones to a human. Same problem, different architectures.

Why Agent Frameworks Exist

Raw API calls work fine for simple single-tool agents. But once your agent needs any two of the following, a framework starts earning its keep:

  • Multi-step orchestration with branching logic
  • Persistent memory across sessions
  • Tool management across dozens of MCP servers or function definitions
  • Error recovery when an LLM call fails mid-workflow
  • Human-in-the-loop checkpoints for high-stakes decisions
  • Observability and tracing across agent execution

Frameworks abstract these concerns. The question is which abstraction model fits your problem. Building AI agents from first principles teaches you the mechanics, but production agents demand more scaffolding.

In Plain English: Think of agent frameworks like web frameworks (Django, Rails, Express). You could build a web app with raw sockets and HTTP parsing, but a framework handles routing, middleware, sessions, and error handling so you focus on business logic. Agent frameworks do the same thing for LLM orchestration: they handle state, tool dispatch, failure recovery, and multi-step coordination so you focus on what your agent actually does.

Framework decision tree for choosing the right agent framework

LangGraph: Graphs for Complex Workflows

Version: 1.0.9 (1.0 GA since October 2025) | GitHub: 24.6K stars | PyPI: 38M+ monthly downloads

LangGraph models agents as directed graphs. Nodes are functions. Edges are transitions. State flows through the graph as a typed dictionary, and every node can read from and write to that state. This is the most flexible architecture in the comparison, and also the most verbose.

Architecture

The core abstraction is a StateGraph. You define a state schema (usually a TypedDict), add nodes as Python functions, connect them with edges (including conditional edges that branch based on state), and compile the graph into a runnable. Built-in checkpointing means every state transition persists automatically, so a crashed agent resumes exactly where it stopped. LangGraph 1.0 added durable state that survives server restarts, cross-thread memory for sharing context between sessions, and Command for dynamic edgeless flows where nodes decide the next step at runtime.

Email Triage in LangGraph

python
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Literal

# `llm` is assumed to be a LangChain chat model, e.g. from init_chat_model()

class EmailState(TypedDict):
    email: str
    category: Literal["urgent", "normal", "spam"] | None
    draft_response: str | None
    escalated: bool

def classify_email(state: EmailState) -> EmailState:
    # LLM call to classify the email; .content holds the model's text reply
    category = llm.invoke(f"Classify this email: {state['email']}").content.strip()
    return {"category": category}

def draft_response(state: EmailState) -> EmailState:
    response = llm.invoke(f"Draft a reply to: {state['email']}").content
    return {"draft_response": response}

def escalate(state: EmailState) -> EmailState:
    return {"escalated": True}

def route_email(state: EmailState) -> str:
    if state["category"] == "urgent":
        return "escalate"
    elif state["category"] == "normal":
        return "draft_response"
    return END  # spam gets dropped

graph = StateGraph(EmailState)
graph.add_node("classify", classify_email)
graph.add_node("draft_response", draft_response)
graph.add_node("escalate", escalate)
graph.set_entry_point("classify")
graph.add_conditional_edges("classify", route_email)
graph.add_edge("draft_response", END)
graph.add_edge("escalate", END)

app = graph.compile(checkpointer=MemorySaver())

The graph structure is explicit. You can see every possible execution path before running anything. That visibility is LangGraph's biggest strength for production systems where you need to audit agent behavior.

Pro Tip: LangGraph's interrupt_before parameter lets you pause execution before any node, making human-in-the-loop trivial. Add interrupt_before=["escalate"] to the compile call and the agent halts, waits for human approval, then continues.
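The mechanics behind this pattern are worth internalizing: run nodes in order, and when the next node is on the interrupt list, persist a snapshot and stop; a later call resumes from the saved position. A minimal framework-free simulation of pause-and-resume (all names here are illustrative, not LangGraph APIs):

```python
import json

def run_with_interrupts(nodes, state, interrupt_before=(), checkpoint=None):
    """Run nodes (an ordered name -> fn dict), pausing before interrupted nodes.

    Returns (state, checkpoint). A non-None checkpoint means the run paused;
    pass it back in to resume from the saved position.
    """
    order = list(nodes)
    start = order.index(checkpoint["next"]) if checkpoint else 0
    resuming = checkpoint is not None
    for name in order[start:]:
        if name in interrupt_before and not resuming:
            # Persist a snapshot and halt so a human can approve the next step.
            return state, {"next": name, "state": json.loads(json.dumps(state))}
        resuming = False  # the interrupt check is skipped only once, on resume
        state = nodes[name](state)
    return state, None

nodes = {
    "classify": lambda s: {**s, "category": "urgent"},
    "escalate": lambda s: {**s, "escalated": True},
}

state, paused = run_with_interrupts(
    nodes, {"email": "Server down!"}, interrupt_before=["escalate"])
# The run halts before "escalate"; after human approval, resume:
final_state, done = run_with_interrupts(
    nodes, paused["state"], interrupt_before=["escalate"], checkpoint=paused)
```

The key requirement, as in LangGraph proper, is that the snapshot survives the pause; swap the in-memory dict for a database row and this becomes durable human-in-the-loop.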

When LangGraph Fits

Complex, stateful workflows with many conditional branches. Financial compliance agents. Multi-step data pipelines with approval gates. Anything where you need deterministic control flow with LLM decision points. Companies like Uber, LinkedIn, and Klarna have run LangGraph agents in production for over a year.

When It Does Not

Simple single-agent tasks. The graph abstraction adds meaningful overhead for straightforward workflows. Expect a one to two week learning curve before your team is productive.

CrewAI: Role-Based Multi-Agent Teams

Version: 1.10.1 (March 2026) | GitHub: 44.6K stars | Reported usage: 450M+ workflows executed monthly

CrewAI takes a fundamentally different approach. Instead of graphs, you define agents with roles, goals, and backstories, then organize them into crews that collaborate on tasks. The mental model is a team of specialists working together, not a flowchart.

Architecture

Three core concepts: Agents (with roles and tool access), Tasks (units of work assigned to agents), and Crews (the orchestration layer that manages execution order and delegation). CrewAI handles agent communication, task delegation, and result passing automatically. The Flows API adds event-driven orchestration for enterprise deployments. Native MCP support through crewai-tools-mcp lets agents declare MCP servers inline while the framework manages connection lifecycle, transport negotiation (Stdio, SSE, Streamable HTTP), and tool discovery automatically. CrewAI also added A2A protocol support, making it the only major framework alongside Google ADK with both MCP and A2A built in.

Email Triage in CrewAI

python
from crewai import Agent, Task, Crew

classifier = Agent(
    role="Email Classifier",
    goal="Accurately categorize emails as urgent, normal, or spam",
    backstory="Senior executive assistant with 10 years of experience "
              "managing high-volume inboxes for C-suite executives.",
    llm="gpt-4.1",
)

responder = Agent(
    role="Email Responder",
    goal="Draft professional, contextually appropriate email responses",
    backstory="Communications specialist who writes concise, "
              "friendly replies that maintain professional tone.",
    llm="gpt-4.1",
)

classify_task = Task(
    description="Classify this email: {email}. Return: urgent, normal, or spam.",
    agent=classifier,
    expected_output="One word: urgent, normal, or spam",
)

respond_task = Task(
    description="If the classification is 'normal', draft a response. "
                "If 'urgent', output 'ESCALATE'. If 'spam', output 'DISCARD'.",
    agent=responder,
    expected_output="A draft response, ESCALATE, or DISCARD",
    context=[classify_task],
)

crew = Crew(
    agents=[classifier, responder],
    tasks=[classify_task, respond_task],
    verbose=True,
)

result = crew.kickoff(inputs={"email": email_content})

The code reads almost like a job description. That is the point. CrewAI optimizes for rapid prototyping and intuitive multi-agent coordination.

When CrewAI Fits

Team-based workflows where different agents have distinct expertise. Content pipelines (researcher, writer, editor). Customer support triage with specialized handlers. CrewAI gets you from idea to working prototype about 40% faster than LangGraph, according to benchmark comparisons.

When It Does Not

Workflows requiring fine-grained control over execution paths. If you need to specify exactly which node executes after which condition, CrewAI's higher-level abstractions can feel limiting.

OpenAI Agents SDK: Lightweight and Practical

Version: 0.10.2 (February 2026) | GitHub: 19K stars | PyPI: 10.3M monthly downloads

The OpenAI Agents SDK (formerly Swarm) strips agent building down to four primitives: Agents, Handoffs, Guardrails, and Tools. It is the least opinionated framework in this comparison. Despite the name, the SDK now supports 100+ LLMs through the Chat Completions API, not just OpenAI models.

Architecture

An Agent is an LLM with instructions, tools, and optional handoff targets. When an agent decides it cannot handle a request, it performs a handoff to another agent, transferring the conversation context. Handoff history is now packaged into a single assistant message instead of exposing raw turns, giving downstream agents a concise recap. Guardrails validate inputs and outputs. The entire SDK is about 2,000 lines of code, making it easy to read and modify. New in v0.10: WebSocket transport, sessions for maintaining working context, and Python 3.14 compatibility.
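The recap packaging is easy to picture without the SDK: collapse the raw turns into one assistant message before the next agent sees them. A framework-free sketch (the helper name and recap format are illustrative, not the SDK's internals):

```python
def package_handoff(history, target_agent):
    """Collapse raw conversation turns into a single assistant recap message."""
    lines = [f"{m['role']}: {m['content']}" for m in history]
    recap = "Handoff to {}. Conversation so far:\n{}".format(
        target_agent, "\n".join(lines))
    # The receiving agent gets one concise message, not the raw turn list.
    return [{"role": "assistant", "content": recap}]

history = [
    {"role": "user", "content": "My invoice is wrong."},
    {"role": "assistant", "content": "Routing to billing."},
]
packaged = package_handoff(history, "Billing Agent")
```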

Email Triage in OpenAI Agents SDK

python
from agents import Agent, Runner, handoff

spam_filter = Agent(
    name="Spam Filter",
    instructions="You discard spam. Reply with 'SPAM_DETECTED' for spam emails.",
)

responder = Agent(
    name="Email Responder",
    instructions="Draft a professional response to normal emails.",
)

escalator = Agent(
    name="Urgent Handler",
    instructions="Flag urgent emails for human review. Output the urgency reason.",
)

triage_agent = Agent(
    name="Email Triage",
    instructions="Classify incoming emails. Hand off to the appropriate specialist.",
    handoffs=[
        handoff(spam_filter, tool_description_override="Email is spam"),
        handoff(responder, tool_description_override="Email needs a routine response"),
        handoff(escalator, tool_description_override="Email is urgent and needs human attention"),
    ],
)

result = Runner.run_sync(triage_agent, input=email_content)

Under 30 lines. No graph definition, no role backstories, no state schema. The handoff pattern is elegant for delegation-style workflows.

Key Insight: The handoff model is conceptually close to how human teams work. A manager triages incoming work and routes it to specialists. If your workflow naturally maps to delegation, the Agents SDK will feel obvious.

When It Fits

Fast prototyping. Delegation-style workflows. Teams that want minimal framework overhead and direct control.

When It Does Not

Complex stateful workflows that need checkpointing and resumption. The SDK has no built-in persistence layer. If your agent workflow spans multiple sessions or needs to survive server restarts, you will build that yourself.
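If you adopt the SDK and later need cross-restart persistence, the minimal version is a store keyed by session ID. A toy sketch using a JSON file (illustrative only, not an SDK feature; a production version would use a database):

```python
import json
import os
import tempfile

class FileSessionStore:
    """Toy session store: persists per-session agent state as JSON on disk."""

    def __init__(self, path):
        self.path = path

    def save(self, session_id, state):
        data = self._load_all()
        data[session_id] = state
        with open(self.path, "w") as f:
            json.dump(data, f)

    def load(self, session_id):
        return self._load_all().get(session_id)

    def _load_all(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

store = FileSessionStore(os.path.join(tempfile.mkdtemp(), "sessions.json"))
store.save("thread-1", {"messages": ["hello"], "step": 2})
restored = store.load("thread-1")  # survives a process restart
```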

Architecture comparison across frameworks

Claude Agent SDK: MCP-Native Agents

Version: 0.1.48 (March 2026) | Languages: Python, TypeScript | Status: Alpha

Anthropic renamed the Claude Code SDK to the Claude Agent SDK to reflect its broader scope beyond coding tasks. The framework gives Claude access to a computer, including terminal commands, file system operations, and custom tools, using the same engine that powers Claude Code.

Architecture

The defining feature is native MCP (Model Context Protocol) integration. Custom tools are implemented as in-process MCP servers that run directly within your Python application, eliminating the need for separate processes. Hooks provide lifecycle control (before_tool_call, after_tool_call, on_error), letting you inject logging, validation, or human approval at any point.
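The hook pattern itself takes only a few lines to sketch without the SDK: a registry maps lifecycle events to callbacks, and the tool dispatcher fires them around every call. All names below are illustrative, not the SDK's actual API:

```python
HOOKS = {"before_tool_call": [], "after_tool_call": [], "on_error": []}

def hook(event):
    """Decorator: register a callback for a lifecycle event."""
    def register(fn):
        HOOKS[event].append(fn)
        return fn
    return register

def dispatch(tool, *args):
    """Run a tool, firing lifecycle hooks around the call."""
    for fn in HOOKS["before_tool_call"]:
        fn(tool.__name__, args)
    try:
        result = tool(*args)
    except Exception as exc:
        for fn in HOOKS["on_error"]:
            fn(tool.__name__, exc)
        raise
    for fn in HOOKS["after_tool_call"]:
        fn(tool.__name__, result)
    return result

calls = []

@hook("before_tool_call")
def audit(name, args):
    calls.append((name, args))  # e.g. ship to your logging pipeline

def classify(email):
    return "urgent" if "down" in email else "normal"

label = dispatch(classify, "Server down!")
```

Because hooks live outside the tool functions, observability and approval logic stay decoupled from agent behavior, which is exactly the property the SDK's hook system provides.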

Email Triage Concept with Claude Agent SDK

python
# Conceptual sketch: names approximate the SDK's surface; the shipping API
# differs in detail (custom tools register as in-process MCP servers).
from claude_agent_sdk import Agent, Tool, hook

@Tool
def classify_email(email_body: str) -> str:
    """Classify an email as urgent, normal, or spam."""
    pass

@Tool
def send_escalation(email_body: str, reason: str) -> str:
    """Escalate an urgent email to the on-call team via PagerDuty."""
    pass

@hook("before_tool_call")
def log_tool_usage(tool_name, args):
    print(f"Agent calling: {tool_name} with {args}")

agent = Agent(
    model="claude-opus-4-6",
    tools=[classify_email, send_escalation],
    instructions="You are an email triage assistant. Classify each email, "
                 "draft responses for normal emails, and escalate urgent ones.",
)

result = agent.run(f"Process this email: {email_content}")

The hooks system is uniquely powerful for production observability. Every tool call, every error, every state change can trigger custom logic without modifying the agent's core behavior.

When It Fits

Teams already invested in the Anthropic ecosystem. Workflows requiring deep MCP integration with multiple tool servers. Agents that need fine-grained lifecycle hooks for compliance, logging, or approval flows.

When It Does Not

Multi-model workflows. The SDK is Anthropic-only. If your architecture requires routing between Claude, GPT, and Gemini based on task type, you will need a different orchestration layer on top.

Google ADK and the Rest of the Field

Google Agent Development Kit (ADK)

Version: 1.26.0 (February 2026) | GitHub: 17K stars | Languages: Python, TypeScript, Go

Google's ADK is optimized for the Gemini ecosystem but model-agnostic by design. Standout features include native multimodal support (text, image, video, audio in agent workflows), built-in A2A Protocol integration for agent-to-agent communication, and workflow agents (Sequential, Parallel, Loop) for deterministic pipelines. The built-in dev UI at localhost:4200 makes debugging surprisingly pleasant.

Microsoft Agent Framework

Microsoft merged AutoGen and Semantic Kernel into the Microsoft Agent Framework, which reached Release Candidate on February 19, 2026. The framework brings graph-based workflows, A2A and MCP protocol support, streaming, checkpointing, and human-in-the-loop patterns. Available for Python and .NET, it is the natural choice for teams deep in the Azure ecosystem. GA is expected by end of March 2026.

Framework ecosystem comparison

The Comparison Table

This table reflects the state of each framework as of March 2026. Community sizes are approximate.

Feature             | LangGraph             | CrewAI               | OpenAI Agents SDK    | Claude Agent SDK     | Google ADK
Architecture        | Stateful graphs       | Role-based crews     | Handoff chains       | Tools + hooks        | Workflow agents
Latest Version      | 1.0.9                 | 1.10.1               | 0.10.2               | 0.1.48               | 1.26.0
GitHub Stars        | 24.6K                 | 44.6K                | 19K                  | ~8K                  | 17K
Model Support       | Any LLM               | Any LLM              | 100+ LLMs            | Anthropic only       | Any (Gemini optimized)
MCP Support         | Via LangChain tools   | First-class native   | Built-in integration | Native (in-process)  | Via tool adapters
Multi-Agent         | Graph composition     | Built-in crews       | Handoff delegation   | Single agent + tools | Hierarchical agents
Persistence         | Built-in checkpointer | Crew memory          | Sessions (new)       | Session-based        | Cloud Run / Vertex
Human-in-the-Loop   | First-class API       | HITL self-loop       | Guardrail hooks      | Lifecycle hooks      | Tool confirmation
A2A Protocol        | No                    | Yes (new)            | No                   | No                   | Yes (native)
Learning Curve      | Steep (1-2 weeks)     | Moderate (2-3 days)  | Low (hours)          | Moderate (2-3 days)  | Moderate (3-5 days)
Best For            | Complex stateful flows| Team-based workflows | Fast prototyping     | MCP-heavy workflows  | Multimodal agents
Production Maturity | High (1.0 GA)         | High (1.x GA)        | Medium (0.x)         | Alpha (0.x)          | High (1.x GA)

Key Insight: No single framework dominates every category. LangGraph leads on production maturity and persistence. CrewAI leads on community size and protocol breadth. OpenAI Agents SDK leads on simplicity. Claude Agent SDK leads on lifecycle control. Google ADK leads on multimodal and A2A support.

How to Choose Your Framework

The right framework depends on four factors. Answer these honestly and the choice usually becomes obvious.

Workflow complexity

For simple single-agent function calling workflows with one or two tools, skip the framework entirely. Raw API calls with structured outputs will serve you better. For moderate complexity (3-5 agents, conditional routing), CrewAI or OpenAI Agents SDK will get you running fast. For complex graphs with loops, parallel branches, and approval gates, LangGraph is the right tool.
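For the "skip the framework" tier, the entire agent can be a short loop: ask the model for a tool call, execute it, feed the result back, stop when it answers. A sketch with a stubbed model standing in for the real API (the stub, tool, and message shapes are illustrative):

```python
def fake_llm(messages):
    """Stub model: requests one tool call, then answers. Swap for a real API."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "classify_email", "args": {"body": messages[0]["content"]}}
    return {"answer": f"Classified as {messages[-1]['content']}"}

TOOLS = {"classify_email": lambda body: "urgent" if "outage" in body else "normal"}

def run_agent(user_input, max_steps=5):
    """Minimal tool-calling loop: call model, run requested tool, repeat."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = fake_llm(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")

answer = run_agent("Production outage in region us-east-1")
```

If this loop covers your problem, a framework mostly adds indirection; if you find yourself bolting on branching, persistence, and approvals, that is the signal to move up a tier.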

Model provider lock-in

If your entire stack runs on Anthropic, the Claude Agent SDK gives you the tightest integration. All-in on Google Cloud? ADK is the natural fit. Need model flexibility? LangGraph and CrewAI both support any LLM without friction.

Multi-agent collaboration needs

CrewAI was built for multi-agent from day one. LangGraph supports it through graph composition, which is more flexible but more work. The OpenAI Agents SDK handles it through handoffs, which works well for delegation patterns but gets awkward for true parallel collaboration.

Production timeline

Need a demo by Friday? OpenAI Agents SDK or CrewAI. Building a system that will run for years? LangGraph's graph-based checkpointing and LangSmith observability stack are hard to beat.

Complexity vs flexibility tradeoff across frameworks

When to Use a Framework (and When Not To)

Use a framework when:

  • Your agent needs persistent state across sessions
  • You are orchestrating three or more agents or tool sets
  • Human-in-the-loop approval is a requirement
  • You need observability and tracing in production
  • Your team has more than one person building agents

Skip the framework when:

  • A single LLM call with function calling solves your problem
  • You are building a one-off script or experiment
  • The framework's abstraction does not match your workflow shape
  • You need absolute control over every API call and token
  • Your agent is simple enough that the framework adds more code than it removes

The worst outcome is adopting a framework that fights your architecture. I have seen teams spend weeks forcing a delegation-style problem into a graph framework, and other teams trying to build complex stateful workflows in a framework designed for simple handoffs. Match the abstraction to the problem.

Common Pitfall: Framework migration is painful. Choosing LangGraph for a prototype, then trying to move to CrewAI for production (or vice versa) means rewriting most of your agent logic. Pick the framework that fits your production needs, not just your prototyping speed. One practical hedge: build your tool integrations as MCP servers regardless of framework choice. The interoperability pays for itself when you add agents or switch frameworks later.

Conclusion

The AI agent framework market in March 2026 has settled into clear lanes. LangGraph owns complex, stateful orchestration with the strongest persistence and checkpointing story. CrewAI owns rapid multi-agent prototyping with the largest community and the broadest protocol support (MCP + A2A). OpenAI Agents SDK owns simplicity, getting teams from zero to working agent in hours. Claude Agent SDK owns MCP-native development with its in-process server model and lifecycle hooks. Google ADK owns multimodal agent workflows backed by the same infrastructure Google uses internally.

For most teams starting a new agent project today, I would recommend CrewAI if multi-agent collaboration is central, LangGraph if workflow complexity is the primary challenge, and OpenAI Agents SDK if you want the fastest path to something working. All five frameworks can build the email triage agent from this article. The differences only matter at scale, under production pressure, when you need the specific capability one framework provides and the others do not.

Whatever you choose, invest time understanding the underlying patterns: function calling and tool use, MCP for tool integration, and agent-to-agent communication via A2A. Frameworks come and go. The patterns endure.

Frequently Asked Interview Questions

Q: What distinguishes a graph-based agent framework from a role-based one, and when would you pick each?

Graph-based frameworks like LangGraph define agents as nodes in a directed graph with explicit edges controlling execution flow, making the system auditable and deterministic. Role-based frameworks like CrewAI define agents with personas and goals, letting the framework handle coordination. If you can draw your agent's control flow as a flowchart, LangGraph fits. If it's better described as "these roles collaborate toward a goal," CrewAI's abstraction works better.

Q: How would you implement human-in-the-loop approval in an agent workflow?

In LangGraph, use interrupt_before on the node requiring approval; the graph pauses, persists state via the checkpointer, and resumes after human input. In CrewAI, use the HITL self-loop functionality. In OpenAI Agents SDK, implement a guardrail that blocks execution pending external approval. The critical requirement across all frameworks is that state must persist across the interruption.

Q: When should you avoid using an agent framework entirely?

When your problem is solvable with a single LLM call plus one or two tool invocations. Frameworks add abstraction overhead, dependency weight, and learning curve. If the framework adds more code than it removes, skip it.

Q: Explain the handoff pattern and how it differs from graph-based orchestration.

A handoff occurs when one agent transfers control and conversation context to another agent. The triage agent classifies the task, then hands off to a specialist who receives a concise recap rather than raw message history. This differs from graph-based orchestration where the graph structure is predefined and state flows through typed dictionaries. Handoffs are more dynamic but less auditable; graphs are more explicit but require upfront design.

Q: What is MCP and why has it become the standard for agent tool integration?

MCP (Model Context Protocol) standardizes how agents discover and invoke tools using JSON-RPC 2.0 in a client-host-server architecture. Instead of each framework implementing its own tool interface, MCP provides a universal protocol that makes tools portable across frameworks. In March 2026, CrewAI offers the deepest integration with three transport mechanisms (Stdio, SSE, Streamable HTTP), while the Claude Agent SDK runs MCP servers in-process for zero-latency tool calls.
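Concretely, an MCP tool invocation is plain JSON-RPC 2.0. A sketch of a `tools/call` request and a matching response (field shapes follow the MCP spec; the tool name and values are illustrative):

```python
import json

# Client -> server: invoke a tool by name with arguments.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "classify_email",
        "arguments": {"body": "Renew my subscription, please."},
    },
}

# Server -> client: the result carries a list of content blocks.
response = {
    "jsonrpc": "2.0",
    "id": 1,  # matches the request id
    "result": {"content": [{"type": "text", "text": "normal"}]},
}

wire = json.dumps(request)  # what actually crosses the transport
```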

Q: How would you handle failures in a multi-step agent workflow at production scale?

Implement checkpointing so the workflow resumes from the last successful step rather than restarting. Add retry logic with exponential backoff for transient LLM failures. Use guardrails to validate outputs at each step, catching hallucinated or malformed responses before they propagate downstream.
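The retry piece is a few lines in any framework or none. A sketch of exponential backoff around a flaky call (the sleep function is injectable so the example runs instantly; names are illustrative):

```python
import time

def with_retries(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry fn on exception with exponential backoff: 0.5s, 1s, 2s, ..."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the failure
            sleep(base_delay * (2 ** attempt))

attempts = {"n": 0}

def flaky_llm_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("transient failure")
    return "ok"

delays = []  # capture the backoff schedule instead of sleeping
result = with_retries(flaky_llm_call, sleep=delays.append)
```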

Q: You need an agent that processes documents using both Claude and GPT-4.1 for different subtasks. Which framework would you choose?

LangGraph or CrewAI, since both are model-agnostic with first-class multi-model support. LangGraph lets you assign different models to different graph nodes. CrewAI lets you assign different LLMs per agent via the llm parameter, mapping naturally to specialists with different capabilities. The Claude Agent SDK is Anthropic-only and would not work here.

Q: How do A2A and MCP protocols complement each other in multi-agent architectures?

MCP handles agent-to-tool connections: how an agent discovers, authenticates with, and invokes external tools. A2A handles agent-to-agent connections: how agents from different frameworks or organizations discover each other, negotiate capabilities, and collaborate. Build tool integrations as MCP servers; use A2A as the interoperability layer when agents span organizational boundaries.