Secure Agent with Pydantic AI

Overview

Pydantic AI - a Python agent framework from the creators of Pydantic that provides a type-safe, production-ready approach to building AI agents. It offers unified LLM provider support (OpenAI, Anthropic, Gemini, etc.), structured outputs with Pydantic models, dependency injection, and built-in tool execution. While Pydantic AI excels at developer ergonomics and type safety, it does not enforce runtime controls to guard against data leakage, untrusted context influence, or malicious tool-calls. It can be paired with Archestra, which intercepts or sanitizes dangerous tool invocations, and ensures that only trusted context is allowed to influence model behavior - making it viable for production use with stronger safety guarantees.

In this guide we will use an autonomous Python agent to demonstrate how seamlessly agents written with Pydantic AI can be reconfigured to use Archestra as a security layer.

The full example can be found on: https://github.com/archestra-ai/archestra/tree/main/platform/examples/pydantic-ai

Problem

Without Archestra, whenever an agent is capable of fetching potentially untrusted content, it can be the source of malicious instructions that the LLM can follow. This demonstrates the Lethal Trifecta vulnerability pattern.

In our example, the agent:

Has access to external data via the get_github_issue tool
Processes untrusted content from a GitHub issue containing a hidden prompt injection
Has the ability to communicate externally via a send_email tool

The GitHub issue (archestra-ai/archestra#669) contains hidden markdown that attempts to trick the agent into exfiltrating sensitive information via email.

Step 1. Get your OpenAI API Key

To use OpenAI models (such as GPT-4 or o3-mini), you need an API key from a supported provider.

You can use:

OpenAI directly (https://platform.openai.com/account/api-keys)
Azure OpenAI
Any OpenAI-compatible service (e.g., LocalAI, FastChat, Helicone, LiteLLM, OpenRouter etc.)

👉 Once you have the key, copy it and keep it handy.

Step 2. Get a GitHub Personal Access Token

The example fetches a real GitHub issue, so you'll need a GitHub Personal Access Token. You can create one at: https://github.com/settings/tokens

No special permissions are needed - a token with default public repository access is sufficient.

Step 3. Run the example without Archestra (Vulnerable)

First, let's see how the agent behaves without any security layer:

git clone git@github.com:archestra-ai/archestra.git
cd platform/examples/pydantic-ai

# Create .env file with your keys
cat > .env << EOF
OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
GITHUB_TOKEN="YOUR_GITHUB_TOKEN"
EOF

# Build and run
docker build -t pydantic-ai-archestra-example .
docker run pydantic-ai-archestra-example

Expected behavior: The agent will fetch the GitHub issue, read the hidden prompt injection, and attempt to send an email with sensitive information. Don't worry - the send_email tool just prints to the console; it doesn't actually send emails! 🙈

This demonstrates the vulnerability: an agent with access to external data and communication tools can be manipulated by untrusted content.

Step 4. Run Archestra Platform locally

Now let's add the security layer:

docker run -p 9000:9000 -p 3000:3000 archestra/platform

This starts Archestra Platform with:

API proxy on port 9000
Web UI on port 3000

Step 5. Run the example with Archestra (Secure)

docker run pydantic-ai-archestra-example --secure

Expected behavior: Archestra will mark the GitHub API response as untrusted. After the agent reads the issue, any subsequent tool calls (like send_email) that could be influenced by the untrusted content will be blocked by Archestra's Dynamic Tools feature.

Step 6. Integrate Pydantic AI with Archestra in your own code

To integrate Pydantic AI with Archestra, configure the OpenAI model to point to Archestra's proxy which runs on http://localhost:9000/v1 (or http://host.docker.internal:9000/v1 from within Docker):

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
import os

agent = Agent(
    model=OpenAIChatModel(
        model_name="gpt-4o",
        provider=OpenAIProvider(
            base_url="http://localhost:9000/v1",  # Point to Archestra
            api_key=os.getenv("OPENAI_API_KEY"),
        ),
    ),
    instructions="Be helpful and thorough."
)

That's it! Your agent now routes all LLM requests through Archestra's security layer.

Step 7. Observe agent execution in Archestra

Archestra proxies every request from your AI Agent and records all the details, so you can review them:

Open http://localhost:3000 and navigate to Chat
In the table with conversations, open the agent's execution by clicking on Details
You'll see the complete conversation flow, including the task, tool calls, and how Archestra marked the GitHub API response as untrusted

Step 8. Configure policies in Archestra

Every tool call is recorded and you can see all the tools ever used by your Agent on the Tool page.

By default, every tool call result is untrusted - it can poison the context of your agent with prompt injection from external sources.

Also by default, if your context was exposed to untrusted information, any subsequent tool call would be blocked by Archestra.

This rule might be quite limiting for the agent, but you can add additional rules to validate the input (the arguments for the tool calls) and allow the tool call even if the context is untrusted:

Add Tool Call Policy

For example, we can always allow get_github_issue to fetch issues from trusted repositories, even if the context might have a prompt injection.

We can also add a rule to define what to consider as trusted content. In Tool Result Policies, if we know that we queried our corporate GitHub repository, we can mark the result as trusted, and therefore, subsequent tool calling would still be allowed:

Add Tool Result Policy

The decision tree for Archestra would be:

Archestra Decision Tree

All Set!

Now you are safe from Lethal Trifecta type attacks and prompt injections cannot influence your agent. With Archestra, the GitHub API response is automatically marked as untrusted, and any subsequent dangerous tool calls (like send_email) are blocked.

To learn more about how Archestra's Dynamic Tools feature works, see the Dynamic Tools documentation.