Introducing Managed Deep Agents | Interrupt 26

60 slides extracted.

Slide 1 — 0:08 (watch)

Hello, I'm Sydney, an open source engineer at LangChain.

Slide 2 — 0:14 (watch)

Hi, I'm Victor, a product manager, and today we'll discuss managed deep agents.

Slide 3 — 0:30 (watch)

Before we discuss managed deep agents, Sydney will explain deep agents as a harness. First, let's define an agent. An agent is a simple model and a tool-calling loop that iteratively calls tools until it completes a task and returns a final result.

Slide 4 — 0:52 (watch)

A harness can be defined as the combination of a model and its surrounding components. It connects the model to the real world and facilitates task completion. The harness includes skills, memory, the base system prompt, tools, sub-agents, and any additional context.

Slide 5 — 1:16 (watch)

The job of a harness is to provide the model with the right context at the right time for a given task. A model's power is directly tied to the context it receives, and the harness serves to bridge this gap.

Slide 6 — 1:30 (watch)

A harness is essential because agents have many responsibilities. They must operate in an environment where they can take actions.

Slide 7 — 1:52 (watch)

This action-taking provides agents with agency, making them useful. They must connect to your data to ensure their actions are relevant to your use case. Agents need to manage growing context over long runs to avoid context overflow. They should be able to parallelize tasks to complete complex operations efficiently. Additionally, they need to connect with humans in the loop for sensitive workflows.

Slide 8 — 2:12 (watch)

Finally, they should improve over time to remain relevant and useful.

Slide 9 — 2:20 (watch)

Deep agents are customizable agent harnesses specifically designed for complex real-world tasks.

Slide 10 — 2:30 (watch)

First, I will cover the four main capabilities of the deep agents harness, followed by an in-depth exploration of each one.

Slide 11 — 2:46 (watch)

First, we have the execution environment, which serves as the backbone of a deep agent. It begins with the file system, and you can optionally enhance this with a sandbox or a code interpreter. Next, we have context management, which I believe is the most important capability.

Slide 12 — 3:14 (watch)

Deep agents come equipped with various utilities to assist with context management. These include skill support, built-in short- and long-term memory, summarization capabilities, context offloading, and prompt caching. Another key capability is delegation. As agents operate over extended periods and handle complex workflows, they must plan and organize tasks effectively, utilizing sub-agents to delegate work.

Slide 13 — 3:38 (watch)

Finally, we incorporate steering into the deep agents harness, providing first-class human-in-the-loop support.

Slide 14 — 3:44 (watch)

Now, let's take a deep dive.

Slide 15 — 4:10 (watch)

Starting with the execution environment, which is the backbone of a deep agent, this capability powers all other functions. An agent utilizes a file system to read and write scratch files while tackling tasks, load and store persistent memories in the hot path, invoke skills relevant to specific tasks, and more. Agents excel at using file systems because they are trained in environments that emphasize file system usage and are exposed to extensive code. This is why providing an agent with a sandbox or its lighter-weight counterpart, the code interpreter, is highly effective.

Slide 16 — 4:40 (watch)

Providing an agent with code execution tools creates a secure environment for writing and running code. This capability enhances the agent's ability for creative problem-solving and enables dynamic runtime behavior.

Slide 17 — 4:50 (watch)

The second capability is context management.

Slide 18 — 5:10 (watch)

Deep agents come with built-in summarization and context offloading. As shown in the graph, deep agents periodically evict large messages—such as human messages, tool results, and tool calls—to the file system. This prevents context from building up too quickly in the window. Summarization is triggered less frequently, occurring when the history approaches the model's context limit. These features aim to avoid context overflow, a common issue for long-running or high-context agents.

Slide 19 — 5:34 (watch)

Deep agents also include built-in memory support. I would argue that memory is the most important type of context because it changes from run to run, allowing your agent to improve over time.

Slide 20 — 6:16 (watch)

It also includes provider-agnostic prompt caching, which is crucial for long-running, high-context agents that need to operate cost-effectively. Additionally, deep agents come with built-in skills support. Skills are integral to the context management system through a mechanism called progressive disclosure. Deep agents load minimal information about available skills into the system prompt initially, allowing the agent to dynamically access full skill resources and invoke those skills and their scripts as needed for specific tasks.

Slide 21 — 6:36 (watch)

This umbrella of context management addresses the need for a framework that provides the model with the appropriate context at the right time for each specific task.

Slide 22 — 6:48 (watch)

The third capability of the deep agents harness is delegation.

Slide 23 — 7:04 (watch)

The deep agents harness includes a planning tool that enables the model to organize work for complex tasks. It also supports sub-agents out of the box, which can be either general-purpose or specialized. For instance, when building a coding agent, you might want to attach specialized sub-agents for architecture design, code review, security review, and test writing and execution.

Slide 24 — 7:20 (watch)

Sub-agents are important for several reasons.

Slide 25 — 7:30 (watch)

First, sub-agents operate with isolated context, which aids in overall context management.

Slide 26 — 7:42 (watch)

When the main agent invokes a sub-agent, it begins with a fresh context that is relevant only to its specific task. The sub-agent then returns a streamlined final result to the main agent, ensuring that the main context window remains unpolluted.

Slide 27 — 7:56 (watch)

Secondly, sub-agents can be used to parallelize work, allowing your agent to run tasks end-to-end more efficiently through this parallelization.

Slide 28 — 8:00 (watch)

Finally, sub-agents can utilize any model and any provider, allowing you to align model capabilities with task complexity.

Slide 29 — 8:34 (watch)

The fourth and final capability of the deep agents harness is support for steering through first-class human-in-the-loop primitives. Human-in-the-loop is essential for two reasons. First, it allows for real-time user feedback on sensitive actions, or tool calls. Second, it enables real-time feedback from the user when needed to unblock the model. There are four common decision patterns we incorporate into deep agents. The first is an approval flow, such as approving an email before it is sent. The second is an edit, like editing a tweet before publication. The third is a reject decision, which involves rejecting a proposed financial transaction. The fourth is the respond pattern, where the agent interrupts to ask the user a question to facilitate its future progress.

Slide 30 — 9:06 (watch)

We have thoroughly explored the capabilities of the deep agents harness. Now, let's discuss the reasons for using deep agents.

Slide 31 — 9:12 (watch)

Deep agents are provider agnostic.

Slide 32 — 9:30 (watch)

You can use any provider or model and swap them at any time, even mixing and matching. Your main agent can utilize a different model than your sub-agents. Major providers include Anthropic, OpenAI, and Google. You can also use local models with Olama, or opt for increasingly performant and cost-effective open-source models, such as those from Fireworks, NVIDIA, and OpenRouter.

Slide 33 — 9:48 (watch)

Deep agents are highly customizable. Here’s a quick recap of the core agent loop and an overview of the deep agents loop.

Slide 34 — 10:02 (watch)

Deep agents provide a set of hooks around the core agent loop. We refer to this system as middleware, which allows you to incorporate any custom logic into your agent.

Slide 35 — 10:18 (watch)

This middleware can accommodate bespoke business logic, deterministic code, policy enforcement such as PII redaction, or dynamic agent control. For instance, it allows for changing the model and tools available at runtime based on the current task.

Slide 36 — 10:38 (watch)

Even with a capable harness, deploying to production is challenging. Your agent must operate for extended periods, recover from unexpected failures, manage human-in-the-loop interactions and unpredictable behavior, and support bursty traffic. All of this must be done while maintaining a secure posture and adhering to ever-changing interoperability standards. Now, I will hand it off to Vic, who will explain how we simplify this process. Thank you, Sydney.

Slide 37 — 11:00 (watch)

We just discussed what a deep agent is. Now, let's talk about what it takes to deploy one of these deep agents into production.

Slide 38 — 11:12 (watch)

I recognize several familiar faces in the audience who have successfully deployed these agents to customers in production. Based on our firsthand experience, today we will discuss the introduction of managed deep agents in private beta, aimed at simplifying this process.

Slide 39 — 11:30 (watch)

There are four core pillars to managed deep agents. The first is the harness, which Sydney just explained. The second is the runtime, which enables the use of this agent in production. The third pillar is integration with Context Hub. Lastly, we have the ability to execute safe code in a sandbox.

Slide 40 — 11:38 (watch)

Let's first discuss the runtime.

Slide 41 — 12:02 (watch)

Managed deep agents are built on top of Langsmith deployment, providing essential primitives for real-time interaction and scalability required by agents in production. This includes endpoints for creating, updating, and invoking agents as needed. We also have a purpose-built task queue and horizontal scaling to manage bursty request loads that agents may encounter. For example, if a support agent is overwhelmed when systems go down, it is crucial to handle that traffic effectively. Additionally, SDKs enable the use of these agents across various platforms, with integrations for CopilotKit, Assistant UI for Gen UI, and more.

Slide 42 — 12:26 (watch)

The second core pillar of this production runtime is durable execution.

Slide 43 — 12:44 (watch)

Durable execution may seem unexciting, but it is highly reliable in production. By running deep agents on the Langraph runtime, we can checkpoint each step your agent takes. These checkpoints are stored in durable storage, allowing us to resume and restart from any point. If your agent fails at step 49 out of 50, you can pick up from step 49 without restarting the entire process. Additionally, we can replay and fork the agent from any point in its state.

Slide 44 — 13:02 (watch)

This enables advanced user workflows, such as forking a conversation from a specific point.

Slide 45 — 13:08 (watch)

It also enables human approval through a human-in-the-loop approach. Since everything is checkpointed in the database, we can await human input indefinitely.

Slide 46 — 13:42 (watch)

This enables ambient agent use cases such as Langsmith Engine, which will be discussed in the next session. Security and authentication are crucial for production agents, requiring multiple layers of authentication to support these use cases. The first layer is inbound from your application, ensuring that users are authenticated and authorized to use the agent. The second layer is outbound from within the agent to your external services, where reliable authentication and correct permissions must be assumed at runtime. The third layer involves managing who can create, update, and manage these agents. This includes internal AI engineers or a CTO who may need to make quick changes, necessitating role-based access control (RBAC) and attribute-based access control (ABAC) to define permissions for these actions.

Slide 47 — 14:08 (watch)

Agent interoperability is increasingly important as you aim to utilize your agents across various use cases. A key feature of Langsmith deployment that we are incorporating into Managed Deep Agents is the capability to access your agent through a remote graph.

Slide 48 — 14:24 (watch)

You can call your agent built on Managed Deep Agents in a custom LandGraph application deployed on Langsmith with just one line of code. Additionally, we support the A2A protocol.

Slide 49 — 14:38 (watch)

Many agents currently utilize the A2A protocol for agent-to-agent communication, and we support this functionality out of the box in both our standard Langsmith deployment and Managed Deep Agents. Additionally, we enable you to bring this agent to the environments where you actually perform work.

Slide 50 — 14:46 (watch)

We build and deploy many agents internally on the Langsmith platform, and we utilize them in our work. Whether it's Deep Agents code or Cloud Desktop, it's essential to bring these agents to the environments where they are needed.

Slide 51 — 14:54 (watch)

We considered various production use cases so you don't have to.

Slide 52 — 15:00 (watch)

Features such as double texting and canceling a run mid-flight require significant engineering effort, often taking weeks or even months to develop. We provide all of these functionalities out of the box for you.

Slide 53 — 15:10 (watch)

The third core pillar is the context hub integration.

Slide 54 — 15:26 (watch)

The Context Hub is integrated into Managed Deep Agents, allowing us to version and save all the files that your agent operates on. This includes the agent metadata and the skills mentioned by Harrison during the keynote, which are increasingly popular, as well as the memories your agent retains about users.

Slide 55 — 15:38 (watch)

All of these are saved and versioned within Context Hub, allowing your team to manage various levels of promotion. You can begin in staging and then move to production for different skills. This approach enables you to democratize these skills across different agents.

Slide 56 — 16:02 (watch)

The next core feature of Context Hub in the Managed Deep Agents primitive is our integration with Langsmith Engine. You will hear more about Langsmith Engine in the next session. Essentially, it will analyze real production usage of these deep agents to make quality improvements and adjustments to your prompts, systems, and skills. This process will lead to better behavior over time as the loop and engine continue to evolve.

Slide 57 — 16:14 (watch)

The fourth key component of Managed Deep Agents is sandboxes.

Slide 58 — 16:26 (watch)

As Sydney mentioned, nearly every agent is evolving into a coding agent. Even in research use cases, agents may need to analyze quick statistics and incorporate them into reports. Enabling your agents to perform these tasks in production can lead to more creative outcomes.

Slide 59 — 16:50 (watch)

We are launching Langsmith Sandboxes, which will be integrated directly with Managed Deep Agents. This Langsmith Sandbox primitive includes several core features. The first is an authentication proxy that securely injects credentials at runtime, ensuring that none of your important environment variables are exposed to the agent or the sandbox itself. The second feature is the ability to snapshot and restore, allowing your agent to maintain the correct execution environment. We will have a session for a deeper dive tomorrow with Mikhail, but these sandboxes are highly effective for most agent use cases.

Slide 60 — 17:12 (watch)

This encompasses everything you need to take an idea or a working deep agent into production. That's why we're launching Managed Deep Agents, starting the private beta today. We encourage you to join the waitlist, and thank you for your time.

Slide 1 — 0:08 (watch)#

Slide 2 — 0:14 (watch)#

Slide 3 — 0:30 (watch)#

Slide 4 — 0:52 (watch)#

Slide 5 — 1:16 (watch)#

Slide 6 — 1:30 (watch)#

Slide 7 — 1:52 (watch)#

Slide 8 — 2:12 (watch)#

Slide 9 — 2:20 (watch)#

Slide 10 — 2:30 (watch)#

Slide 11 — 2:46 (watch)#

Slide 12 — 3:14 (watch)#

Slide 13 — 3:38 (watch)#

Slide 14 — 3:44 (watch)#

Slide 15 — 4:10 (watch)#

Slide 16 — 4:40 (watch)#

Slide 17 — 4:50 (watch)#

Slide 18 — 5:10 (watch)#

Slide 19 — 5:34 (watch)#

Slide 20 — 6:16 (watch)#

Slide 21 — 6:36 (watch)#

Slide 22 — 6:48 (watch)#

Slide 23 — 7:04 (watch)#

Slide 24 — 7:20 (watch)#

Slide 25 — 7:30 (watch)#

Slide 26 — 7:42 (watch)#

Slide 27 — 7:56 (watch)#

Slide 28 — 8:00 (watch)#

Slide 29 — 8:34 (watch)#

Slide 30 — 9:06 (watch)#

Slide 31 — 9:12 (watch)#

Slide 32 — 9:30 (watch)#

Slide 33 — 9:48 (watch)#

Slide 34 — 10:02 (watch)#

Slide 35 — 10:18 (watch)#

Slide 36 — 10:38 (watch)#

Slide 37 — 11:00 (watch)#

Slide 38 — 11:12 (watch)#

Slide 39 — 11:30 (watch)#

Slide 40 — 11:38 (watch)#

Slide 41 — 12:02 (watch)#

Slide 42 — 12:26 (watch)#

Slide 43 — 12:44 (watch)#

Slide 44 — 13:02 (watch)#

Slide 45 — 13:08 (watch)#

Slide 46 — 13:42 (watch)#

Slide 47 — 14:08 (watch)#

Slide 48 — 14:24 (watch)#

Slide 49 — 14:38 (watch)#

Slide 50 — 14:46 (watch)#

Slide 51 — 14:54 (watch)#

Slide 52 — 15:00 (watch)#

Slide 53 — 15:10 (watch)#

Slide 54 — 15:26 (watch)#

Slide 55 — 15:38 (watch)#

Slide 56 — 16:02 (watch)#

Slide 57 — 16:14 (watch)#

Slide 58 — 16:26 (watch)#

Slide 59 — 16:50 (watch)#

Slide 60 — 17:12 (watch)#