LangChain: Architecture, LCEL, Agents, LangGraph, Retrieval, and Production Patterns
LangChain is no longer best understood as a grab bag of prompt helpers, legacy chains, and one-off agent wrappers. In the current Python ecosystem, LangChain is the high-level framework for building agentic applications on top of a shared runtime, while LangGraph is the lower-level orchestration layer for stateful workflows that need persistence, branching, interrupts, and precise control.
The old mental model of LangChain centered on LLMChain, prompt templates, memory objects, and a long catalog of agent types. That model still explains a large amount of older code on GitHub, but it does not describe the modern stack very well.
Today the center of gravity is different:
- langchain provides the high-level developer experience for models, tools, structured output, agents, middleware, retrieval composition, streaming, and runtime configuration.
- langgraph provides the durable runtime for stateful agent execution. It is the layer you use when an application has real workflow structure instead of a single request-response loop.
- The common abstractions live in langchain-core: messages, documents, prompts, tools, runnables, output parsers, callbacks, and other shared interfaces.
The practical consequence is simple. If an application is a straightforward tool-using agent, start with LangChain. If it needs long-lived state, resumability, branching control flow, human approval, or multi-agent coordination, drop to LangGraph without abandoning the same model, message, tool, and runnable abstractions.
The package split matters because most outdated tutorials assume everything lives under one import tree. Modern LangChain is intentionally modular.
| Package | Role | When to install it |
| --- | --- | --- |
| langchain | High-level Python framework for agents, models, tools, middleware, structured output, streaming, and application composition. | Install for almost every new LangChain project. |
| langchain-core | Shared interfaces and primitives such as messages, documents, prompts, tools, and runnables. | Usually comes in as a dependency rather than a package you install directly. |
| langgraph | State graphs, checkpoints, persistence, interrupts, human-in-the-loop flows, and durable execution. | Install when the workflow has state beyond a single agent call. |
| langchain-openai, langchain-anthropic, and similar provider packages | Provider-specific chat models, embeddings, and integration code. | Install only the providers you actually use. |
| langchain-community | Community-maintained integrations such as loaders, vector stores, and third-party utilities. | Install when you need community loaders or storage integrations. |
| langchain-text-splitters | Text splitting utilities separated into their own package. | Install for ingestion and chunking pipelines. |
| langchain-classic | Compatibility package for many legacy chains, retrievers, and v0-style APIs. | Install only when migrating or maintaining older code. |
Install the framework, the orchestration runtime, and only the integrations you need. The old langchain[all] habit is obsolete.
```bash
pip install -U langchain langgraph langchain-openai

# Common optional packages
pip install -U langchain-community langchain-text-splitters faiss-cpu

# Only when you are migrating old tutorials or old production code
pip install -U langchain-classic
```
Modern LangChain treats chat models as the default interface. A chat model receives a list of messages and returns an AI message that may contain plain text, tool calls, or provider-specific metadata. The framework normalizes enough of this structure that application code can stay stable while the provider changes.
The most important message kinds are:
- system for developer instructions and global behavior.
- human for user input.
- ai for model output.
- tool for tool execution results returned to the model.
Recent LangChain versions also standardize message content through typed content blocks. That matters for multimodal input, citations, reasoning traces exposed by some providers, and tool call metadata. A modern agent loop is therefore message-centric, not string-centric.
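As a quick illustration (assuming the langchain-openai provider package and an API key are configured), a direct message-list call looks like this:

```python
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, SystemMessage

model = init_chat_model("openai:gpt-4.1")

# The model consumes a list of messages and returns an AI message object,
# whose content may include text, tool calls, or provider metadata.
ai_message = model.invoke(
    [
        SystemMessage("You write concise technical summaries."),
        HumanMessage("What is a tool message used for?"),
    ]
)
print(ai_message.content)
```

Prompt templates sit on top of the same message structure, which is what the next example uses.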
```python
from langchain.chat_models import init_chat_model
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You write concise technical summaries."),
        ("human", "Summarize {topic} in three sentences."),
    ]
)

model = init_chat_model("openai:gpt-4.1")

chain = prompt | model | StrOutputParser()
print(chain.invoke({"topic": "LangChain middleware"}))
```
The Runnable interface is still one of the most important ideas in LangChain, even though modern marketing material talks more about agents. Models, prompts, retrievers, output parsers, and many custom components all implement the same operational shape. That gives you a common set of execution patterns:
- invoke and ainvoke for single synchronous or asynchronous calls.
- batch and abatch for parallel request execution.
- stream and event streaming APIs for incremental output.
- Pipe composition with | for sequential flows.
- Dictionary composition for fan-out and fan-in patterns.
This composition model is often called LCEL, the LangChain Expression Language. Older tutorials treated chains as distinct classes. Current LangChain treats most application logic as composition of runnables.
```python
from operator import itemgetter

from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Answer strictly from the supplied context.\n\n"
    "Context:\n{context}\n\n"
    "Question:\n{question}"
)

model = init_chat_model("openai:gpt-4.1")

simple_chain = (
    {
        "context": itemgetter("context"),
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | StrOutputParser()
)

print(
    simple_chain.invoke(
        {
            "context": "LangGraph adds persistence, interrupts, and stateful graphs.",
            "question": "What does LangGraph add?",
        }
    )
)
```
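Because the chain is a runnable, the other execution modes come for free. A short sketch reusing simple_chain from the block above:

```python
# Stream incremental string chunks instead of waiting for the full answer.
for chunk in simple_chain.stream(
    {
        "context": "LangGraph adds persistence, interrupts, and stateful graphs.",
        "question": "What does LangGraph add?",
    }
):
    print(chunk, end="", flush=True)

# Run several independent inputs in parallel with batch.
answers = simple_chain.batch(
    [
        {"context": "LCEL composes runnables with the | operator.", "question": "What is LCEL?"},
        {"context": "Checkpointers persist thread state.", "question": "What persists thread state?"},
    ]
)
print(answers)
```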
Free-form text is convenient for demos and annoying in production. Modern LangChain therefore makes structured output a first-class workflow instead of an afterthought built from regexes and brittle parsers.
There are two common approaches:
- Call with_structured_output on a chat model when you want model output parsed directly into a schema.
- Pass response_format to create_agent when you want the final agent result in a validated structure.
LangChain will use provider-native structured output when a model supports it. Otherwise it can fall back to tool-calling-based strategies. That abstraction removes a large amount of provider-specific branching from application code.
```python
from pydantic import BaseModel, Field

from langchain.chat_models import init_chat_model


class ExtractedFact(BaseModel):
    subject: str = Field(description="Entity or concept being discussed")
    relation: str = Field(description="Relationship or claim")
    value: str = Field(description="Object or fact value")


model = init_chat_model("openai:gpt-4.1")
structured_model = model.with_structured_output(ExtractedFact)

print(
    structured_model.invoke(
        "LangGraph powers the runtime beneath LangChain create_agent."
    )
)
```
Tools are still the bridge between the model and the external world, but the modern tool story is more standardized than the old agent-tool abstractions suggested. A tool is simply a function with a name, a description, and a schema the model can call.
In practice, tools fall into three categories:
- Pure computation tools such as math, formatting, or local business rules.
- Data access tools such as search, SQL, vector retrieval, and API lookups.
- Action tools such as ticket creation, email sending, code execution, or deployment operations.
Good tools are narrow. They do one thing, return stable output, validate input strictly, and hide messy implementation details from the model.
```python
from langchain.tools import tool


@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    return f"{city}: 18 C, light rain"
```
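When the input needs stricter validation than type hints alone, the schema can be made explicit. A hedged sketch; WeatherQuery and get_weather_strict are illustrative names, and args_schema is the documented way to attach a Pydantic schema to a tool:

```python
from pydantic import BaseModel, Field

from langchain.tools import tool


class WeatherQuery(BaseModel):
    """Input schema the model must satisfy before the tool runs."""

    city: str = Field(description="City name, for example 'Seattle'")
    unit: str = Field(default="celsius", description="'celsius' or 'fahrenheit'")


@tool(args_schema=WeatherQuery)
def get_weather_strict(city: str, unit: str = "celsius") -> str:
    """Return the current weather for a city in the requested unit."""
    temperature = "18 C" if unit == "celsius" else "64 F"
    return f"{city}: {temperature}, light rain"
```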
The modern entry point is create_agent. It replaces most of the old discussion around agent classes, planner types, and specialized enums such as AgentType.ZERO_SHOT_REACT_DESCRIPTION.
A current LangChain agent is built from four ingredients:
- A chat model.
- A set of tools.
- Optional middleware that intercepts and modifies execution.
- Optional response schema, state persistence, and runtime configuration.
The key design change is that the high-level agent API now sits on top of LangGraph. That means the friendly LangChain entry point can still participate in persistence, streaming, memory management, and other stateful runtime features.
```python
from pydantic import BaseModel, Field

from langchain.agents import create_agent
from langchain.tools import tool


@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    return f"{city}: 18 C, light rain"


class WeatherReport(BaseModel):
    city: str = Field(description="Resolved city name")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Practical advice for the user")


agent = create_agent(
    model="openai:gpt-4.1",
    tools=[get_weather],
    response_format=WeatherReport,
)

result = agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "Should I bring an umbrella in Seattle?",
            }
        ]
    }
)

print(result["structured_response"])
```
Middleware is one of the biggest architectural improvements in LangChain v1. Instead of forcing every behavior into prompts or custom wrappers, middleware lets you intervene at clear points in the model and tool execution loop.
Typical middleware responsibilities include:
- Summarizing old conversation history when token usage grows.
- Enforcing human approval before selected tool calls.
- Switching models dynamically based on cost, latency, or risk.
- Adding retries, fallback behavior, or call limits.
- Injecting guardrails, context edits, or compliance checks.
This makes modern LangChain agents far easier to reason about than the old approach of stacking callbacks, prompt hacks, and custom chain subclasses.
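A rough sketch of what this looks like with create_agent. The middleware classes and parameter names below follow the v1 middleware documentation but should be treated as assumptions and checked against the current reference:

```python
from langchain.agents import create_agent
# Assumed import path and class names; verify against the current
# langchain.agents.middleware reference before relying on them.
from langchain.agents.middleware import HumanInTheLoopMiddleware, SummarizationMiddleware

agent = create_agent(
    model="openai:gpt-4.1",
    tools=[get_weather],  # the @tool from the earlier example
    middleware=[
        # Summarize old history once the conversation exceeds a token budget
        # (parameter names are assumptions).
        SummarizationMiddleware(
            model="openai:gpt-4.1-mini",
            max_tokens_before_summary=4000,
        ),
        # Pause for human approval before the listed tool executes.
        HumanInTheLoopMiddleware(interrupt_on={"get_weather": True}),
    ],
)
```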
LangChain is comfortable when the application still looks like an agent loop. LangGraph exists for the point where that abstraction stops being enough.
Use LangGraph when the workflow has explicit state transitions, durable checkpoints, interrupts, or multiple coordinated actors. Common examples include:
- A research agent that plans, searches, extracts, verifies, and writes through distinct stages.
- An approval flow where a human reviews a tool call before execution.
- A multi-agent system where one agent delegates to specialists and merges results.
- A long-running pipeline that must survive process restarts or resume after failure.
LangGraph models these systems as graphs over shared state. Nodes read and update the state. Edges control where execution goes next. Checkpointers persist thread-scoped state so a run can pause and resume instead of restarting from the beginning.
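A minimal sketch of that model: a two-node graph over a typed state, compiled with an in-memory checkpointer so each thread's progress survives between invocations. In production you would swap MemorySaver for a database-backed checkpointer, and the nodes would call models and tools rather than format strings:

```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph


class ResearchState(TypedDict):
    question: str
    notes: str
    answer: str


def research(state: ResearchState) -> dict:
    # A real node would call tools or a model; here we only record a note.
    return {"notes": f"Looked up material about: {state['question']}"}


def write(state: ResearchState) -> dict:
    return {"answer": f"Draft answer based on: {state['notes']}"}


builder = StateGraph(ResearchState)
builder.add_node("research", research)
builder.add_node("write", write)
builder.add_edge(START, "research")
builder.add_edge("research", "write")
builder.add_edge("write", END)

# The checkpointer persists thread-scoped state so a run can pause and resume.
graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "demo-thread"}}
print(graph.invoke({"question": "What does LangGraph add?"}, config))
```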
Short-term memory in the modern stack is not an isolated memory object attached to a chain. It is part of the graph state for a thread. That state normally includes message history, retrieved context, tool outputs, uploaded files, and any other values the workflow needs to carry forward.
This shift matters because memory is now tied to execution semantics. A step can read state, update state, and persist the updated snapshot through a checkpointer. That is a cleaner model than the old pattern of mutating a ConversationBufferMemory object on the side.
Long-term memory is different. It stores information across threads and sessions: user preferences, durable facts, profile data, recurring tasks, or learned application-specific knowledge. LangGraph treats this as stored data with namespaces rather than an ever-growing transcript.
Thread state and durable application storage solve different problems. Thread state holds the execution context for one conversation. Durable storage holds facts and preferences that should survive across conversations. Mixing the two is one of the fastest ways to build a confused agent.
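A minimal sketch of the long-term side, using LangGraph's in-memory store. The namespace and key names are illustrative, and a persistent backend would replace InMemoryStore in production; typically the store is passed into a graph or agent rather than used standalone:

```python
from langgraph.store.memory import InMemoryStore

# Durable, cross-thread storage is namespaced rather than appended to a transcript.
store = InMemoryStore()
store.put(("users", "user-123"), "preferences", {"tone": "terse", "units": "metric"})

item = store.get(("users", "user-123"), "preferences")
print(item.value)  # {'tone': 'terse', 'units': 'metric'}
```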
Retrieval remains the standard way to give a model access to external knowledge without fine-tuning. The core pipeline has not changed, but the framing is clearer than in older LangChain material:
- Load documents from files, web pages, APIs, or SaaS systems.
- Split them into chunks that preserve semantic coherence.
- Embed the chunks into vectors.
- Store the vectors in a vector database or local vector index.
- Expose retrieval as a retriever that returns relevant documents for a query.
- Compose that retriever into a runnable chain or graph.
A vector store is storage plus similarity search. A retriever is the application-facing abstraction. Every vector store can usually be wrapped as a retriever, but not every retriever needs to be backed by a vector store. That distinction is worth keeping straight.
```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader(
    web_paths=("https://docs.langchain.com/oss/python/langchain/overview",)
)
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_documents(docs)

vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4},
)

print(retriever.invoke("What does LangGraph add beyond create_agent?"))
```
The cleanest modern RAG pattern is to treat retrieval as just another runnable. That keeps the application modular and avoids a large amount of v0-style chain scaffolding.
```python
from operator import itemgetter

from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Answer from the supplied context only.\n\n"
    "Context:\n{context}\n\n"
    "Question:\n{question}"
)

rag_chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | init_chat_model("openai:gpt-4.1")
    | StrOutputParser()
)

print(rag_chain.invoke({"question": "What does LangGraph add?"}))
```
Production retrieval rarely stops at plain top-k similarity search. Useful advanced patterns include:
- Hybrid retrieval that mixes lexical and vector search.
- Maximum marginal relevance to reduce redundant chunks.
- Reranking with a second model after an initial broad recall step.
- Context compression to strip irrelevant text from otherwise relevant documents.
- Parent-child or hierarchical retrieval when the chunks you index for search are smaller than the context you want to hand back to the model.
- History-aware retrieval when the user asks follow-up questions that depend on prior turns.
Many of the v0 convenience classes for retrieval still exist, but several live in langchain-classic. The durable design principle in v1 is to keep retrieval as a clean component inside a runnable pipeline or a LangGraph workflow instead of treating RAG as a monolithic chain type.
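As one concrete example, hybrid retrieval does not require a special helper class. A hedged sketch that reuses chunks and the FAISS retriever from the ingestion example above, merging them with a lexical BM25 pass (BM25Retriever needs the rank_bm25 package; the merge logic here is deliberately simple):

```python
from langchain_community.retrievers import BM25Retriever

# Lexical retriever built over the same chunks that were embedded earlier.
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 4


def hybrid_retrieve(query: str):
    # Deduplicate by page content while preserving order: lexical hits first,
    # then vector hits from the FAISS retriever defined earlier.
    seen, merged = set(), []
    for doc in bm25_retriever.invoke(query) + retriever.invoke(query):
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            merged.append(doc)
    return merged


print(len(hybrid_retrieve("What does LangGraph add beyond create_agent?")))
```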
Modern LangChain is built for interactive systems, so streaming is a first-class feature rather than a callback hack. There are three distinct things you may want to stream:
- Model tokens for chat-like latency.
- Agent progress events such as tool calls and intermediate reasoning steps.
- Custom application updates emitted during a workflow.
The same runtime also supports asynchronous execution and request batching. That matters when an application fans out retrieval calls, runs multiple model invocations in parallel, or serves many concurrent users.
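A short sketch of streaming agent progress, reusing the agent built with create_agent earlier; the stream_mode values follow the LangGraph streaming API:

```python
# "updates" yields one chunk per executed step (model call, tool call);
# "messages" would instead stream model tokens as they are generated.
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "Should I bring an umbrella in Seattle?"}]},
    stream_mode="updates",
):
    print(chunk)
```

The same object also exposes ainvoke, astream, and batch-style methods for asynchronous and parallel execution.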
Observability is the other half of production readiness. Without traces, state snapshots, tool call inspection, and prompt/version tracking, agent debugging becomes guesswork. LangSmith is the standard companion product for tracing and evaluating LangChain and LangGraph applications.
Model Context Protocol (MCP) is now part of the LangChain story. MCP standardizes how tools and resources are exposed by external servers, which makes it easier to connect agents to editor state, local resources, internal APIs, and hosted services through a common protocol instead of a one-off adapter for each target.
In practice, MCP does not replace ordinary LangChain tools. It expands the set of systems that can be surfaced as tools and context sources. When an environment already exposes an MCP server, LangChain can consume it instead of reimplementing the integration manually.
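A hedged sketch of that consumption path, assuming the langchain-mcp-adapters package and its MultiServerMCPClient API; the server name, command, and script path are hypothetical:

```python
import asyncio

from langchain.agents import create_agent
# Package name and API are assumptions: langchain-mcp-adapters exposes MCP
# servers as ordinary LangChain tools.
from langchain_mcp_adapters.client import MultiServerMCPClient


async def main():
    client = MultiServerMCPClient(
        {
            # Hypothetical local MCP server exposing filesystem tools.
            "files": {
                "transport": "stdio",
                "command": "python",
                "args": ["./filesystem_mcp_server.py"],
            }
        }
    )
    tools = await client.get_tools()
    agent = create_agent(model="openai:gpt-4.1", tools=tools)
    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": "List the project README."}]}
    )
    print(result["messages"][-1].content)


asyncio.run(main())
```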
Most real failures in LangChain systems are not caused by the model wrapper. They come from weak application boundaries. A production-ready design usually follows a few rules:
- Keep tools narrow and deterministic where possible.
- Separate thread state from durable memory.
- Prefer structured output over free-form parsing.
- Treat retrieval quality as an indexing and ranking problem, not just a prompt problem.
- Use middleware for control-plane behavior instead of burying policies inside prompts.
- Use LangGraph when workflow state and failure recovery matter.
- Trace everything before calling the system unreliable.
The common anti-pattern is to cram a complex workflow into a single prompt and a pile of tools. That works for a demo. It collapses under real data, real users, and real failure modes.
Much of the internet still teaches a LangChain that no longer represents the preferred API surface. The table below maps the most common legacy concepts to current replacements.
| Legacy pattern | Current replacement | Reason |
| --- | --- | --- |
| LLMChain | Runnable composition such as prompt \| model \| parser | Simpler composition, better interoperability, fewer bespoke chain classes. |
| initialize_agent(..., agent=AgentType...) | create_agent(...) | The new API is simpler and sits on top of LangGraph. |
| ConversationBufferMemory and similar memory classes | Thread state, checkpointers, summarization middleware, and long-term stores | Memory is now modeled as execution state plus durable storage. |
| Provider imports under langchain.llms or old chat-model namespaces | Provider packages such as langchain-openai or unified initialization through init_chat_model | Cleaner packaging and fewer unnecessary dependencies. |
| Monolithic chain classes for QA, summarization, or conversational retrieval | Composable runnables or LangGraph workflows | Current APIs make dataflow more explicit and easier to customize. |
| Old retriever helpers under langchain.retrievers | Current retriever interfaces plus langchain-classic for legacy helpers | The ecosystem moved several v0 components into a compatibility package. |
For a new project, the shortest correct learning path is:
- Learn chat models, messages, prompts, and structured output.
- Learn tools and the create_agent API.
- Learn runnable composition for non-agent flows and RAG pipelines.
- Learn LangGraph when the system needs checkpoints, interrupts, or multi-step stateful control flow.
- Learn LangSmith before the project reaches production.
That sequence matches how the current framework is actually designed. It also avoids the trap of spending days on APIs that only exist because old tutorials have a long afterlife.