5 Thoughts On Agent Engineering
While agents became the buzzword of 2024, their real-world usage has so far fallen short. Agentic systems are likely to embed themselves further into software engineering discourse even though most developers still have only a hazy understanding of what an agent is, let alone how to design and engineer for this new paradigm.
It's not hard to imagine this paradigm becoming as practical and commonplace as using a web browser for office work, but the path there is far from clear-cut: there are still many open questions about how these agents should be designed so that they can be easily reused, adapted, deployed, and monitored. Here are some of my initial thoughts on the topic:
1. Design Patterns Will Diverge From CRUD Apps
Arguably the most capable application of LLMs for designing and implementing end-to-end software has been building create-read-update-delete (CRUD) web applications. This is largely beneficial, as it lets developers spin up complex web apps in a fraction of the time it used to take. Unfortunately, the design patterns that LLMs tend to prefer (and that are extremely common for CRUD apps) are often at odds with building agentic systems. At present, this mostly produces initial agents built on brittle patterns and design choices, resulting in systems that resemble a workflow more than an open-ended agent.
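To make the contrast concrete, here is a minimal sketch; the ticket example and function names are mine, and the model calls are stubbed out so it runs without any provider. In the workflow shape the control flow is fixed in code and the model only fills in each step, while in the agent shape the model decides what to do next.

import random

def fake_llm(prompt: str) -> str:
    # Stand-in for a model call so the sketch runs without any provider.
    return f"response to: {prompt[:40]}"

# CRUD/workflow shape: control flow is fixed in code, the model fills in each step.
def triage_workflow(ticket: str) -> str:
    category = fake_llm(f"classify: {ticket}")
    draft = fake_llm(f"draft a {category} reply to: {ticket}")
    return fake_llm(f"polish: {draft}")

# Agent shape: the model (randomly stubbed here) picks the next action until it decides to stop.
def triage_agent(ticket: str) -> str:
    tools = {"search_docs": fake_llm, "ask_user": fake_llm}
    notes = ticket
    for _ in range(5):  # cap the loop so the sketch always terminates
        action = random.choice(list(tools) + ["finish"])
        if action == "finish":
            break
        notes += "\n" + tools[action](notes)
    return fake_llm(f"answer using notes: {notes}")

print(triage_workflow("my export button is broken"))
print(triage_agent("my export button is broken"))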
2. Data Structures Become Footguns
Competency in modern programming is underpinned by understanding the data flow and the underlying data structures of complex programs. That is not going away for agent systems, but having models produce these data structures directly will likely result in brittle systems and additional complexity that ends up being model specific. In practice this means extra tests and checks for transforming text to and from data structures, and it creates hurdles when updating models or trying to stay provider agnostic. As a minimal example using pydantic-ai, a straightforward attempt to marshal the model's output into a data structure will fail:
from typing import Annotated

from pydantic import Field
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from typing_extensions import NotRequired, TypedDict


class Whale(TypedDict):
    name: str
    length: Annotated[float, Field(description="Average length of an adult whale in meters.")]
    weight: NotRequired[
        Annotated[
            float,
            Field(description="Average weight of an adult whale in kilograms.", ge=50),
        ]
    ]
    description: NotRequired[Annotated[str, Field(description="Short Description")]]


agent = Agent(
    OpenAIModel("qwq", provider=OpenAIProvider(base_url="http://localhost:11434/v1")),
    result_type=list[Whale],
)
result = await agent.run("Generate me details of 3 species of Whale.")
print(result.data)
Regardless of the model being used (all of the self-hosted and most of the external models I tested), it is common to run into issues:
pydantic_core._pydantic_core.ValidationError: 3 validation errors for typed-dict
response.0
Input should be an object [type=dict_type, input_value='1. Blue Whale (Balaenopt...angered due to whaling.', input_type=str]
For further information visit https://errors.pydantic.dev/2.10/v/dict_type
response.1
Input should be an object [type=dict_type, input_value='2. Humpback Whale (Megap...le conservation status.', input_type=str]
For further information visit https://errors.pydantic.dev/2.10/v/dict_type
response.2
Input should be an object [type=dict_type, input_value='3. Sperm Whale (Physeter...ales are more solitary.', input_type=str]
For further information visit https://errors.pydantic.dev/2.10/v/dict_type
The above exception was the direct cause of the following exception:
(...continues with additional errors...)
In the case where we forgo the data structure and solely ask the model to output text, the model is quite capable:
agent = Agent(OpenAIModel("qwq", provider=OpenAIProvider(base_url="http://localhost:11434/v1")))
result = await agent.run("Generate me details of 3 species of Whale.")
print(result.data)
**Species 1: Humpback Whale**
* Scientific Name: Megaptera novaeangliae
* Habitat: Warm and temperate waters around the world, including oceans and seas in all major continents.
...(continues and provides additional details and additional species)
It is easy to disregard the trade-offs that come with models outputting data structures (or to just assume the next generation of models will be more capable), but the better design decision can be to forgo data structures unless absolutely necessary when using models in agentic systems. When designing systems where text is the model's primary data structure, many of the standard ideas about how to use data structures that are commonplace in software development will need to be reconfigured.
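One way this can look in practice, keeping text as the primary representation and only opportunistically extracting structure, is sketched below. It reuses the same local qwq/Ollama setup as the examples above; the two-pass prompting and the key names are illustrative, not a prescription.

import json

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Plain-text agent, same local endpoint as the earlier examples.
agent = Agent(OpenAIModel("qwq", provider=OpenAIProvider(base_url="http://localhost:11434/v1")))

async def whale_details() -> list | str:
    # First pass: ask for text, which the model handles reliably.
    result = await agent.run("Generate me details of 3 species of Whale.")
    text = result.data
    # Second pass: opportunistically ask for JSON, but treat it as best-effort.
    structured = await agent.run(
        "Rewrite the following as a JSON list of objects with name, length and weight keys:\n" + text
    )
    try:
        return json.loads(structured.data)
    except json.JSONDecodeError:
        # Fall back to the text the rest of the system already understands.
        return text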
3. Thinking Models Are The Greatest Leverage Point For Open Ended Tasks
I am relatively surprised at how well the latest open-source thinking models work when integrated into agentic systems. Rather than having to explicitly define workflows and directed graphs for the agent to act upon, thinking models let you offload a large share of these design decisions to the prompt and the agent's tools, allowing the model to direct its own workflow. I've found this particularly true for a deep-research implementation, where letting the model direct its own research resulted in vastly more intricate research directions than trying to define the process explicitly.
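As a rough sketch of what offloading those decisions looks like with pydantic-ai (same local setup as above; the search tool here is a stub standing in for a real backend), the model is given a tool and a prompt and left to decide how many searches to run and in what order:

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

research_agent = Agent(
    OpenAIModel("qwq", provider=OpenAIProvider(base_url="http://localhost:11434/v1")),
    system_prompt=(
        "You are a research assistant. Use the search tool as many times as you need, "
        "choose your own follow-up queries, and stop once you can write a summary."
    ),
)

@research_agent.tool
async def search(ctx: RunContext[None], query: str) -> str:
    """Return raw text results for a query (stubbed; swap in a real search backend)."""
    return f"placeholder results for: {query}"

# No hand-written graph: the thinking model decides which searches to run and when to stop.
result = await research_agent.run("What is known about humpback whale migration routes?")
print(result.data)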
4. The Low Hanging Fruit Are Not Workflows
The AI-obsessed seem to envision agents replacing the majority of tasks office workers do with a mouse and keyboard. This presumes those tasks are straightforward workflows of the kind tools such as LangChain seem well suited for, but in reality most tasks that can be automated into a workflow either already have a better interface for doing so or face barriers to being done via code. Rather, the low-hanging fruit, the agents that can actually be built and have measurable impact, will require more open-ended approaches that let the system generate and iterate on its own approach.
5. Agents Will Drive Local Model Usage
Right now, the overwhelming majority of AI usage goes through hosted models. Once AI becomes more saturated and embedded into most software, there will be increasing interest in offloading parts of these systems to the client. Today this is almost non-existent despite already being possible. For instance, in recent versions of Chrome, pull up the JavaScript console and run the following:
session = await ai.languageModel.create({systemPrompt: "Generate a JSON that contains a question and answer based on the following post"})
output = await session.prompt(document.documentElement.innerText)
console.log(output)
Which can give you the following result:
{
"question": "What are the key challenges and potential solutions associated with designing agents?",
"answer": "Designing agents presents several challenges. One is the need for more sophisticated data structures beyond simple key-value pairs, as LLMs excel at working directly with text. Another challenge is the lack of clear guidelines for designing and deploying these agents, especially in complex scenarios where open-ended tasks are required. However, thinking models offer a potential solution by allowing agents to generate and iterate on their own workflows, potentially automating tasks that are difficult to define as traditional workflows. The increasing availability of local AI models and the desire to reduce reliance on external APIs could also contribute to the development of agents."
}
While tools like MCP are giving agents more breadth across applications, agents that use a browser have mostly been limited to frameworks such as Playwright. To integrate these systems more closely, various forms of local models (e.g. running in the browser) may be the primitive that lets agentic systems become part of day-to-day office work. Or, as the pervasiveness of AI slop becomes unbearable, local models will intermediate experiences such as browsing the web without requiring vast amounts of personal data to be sent to external services. Uses such as cleaning up the web-browsing experience may not seem like agentic systems, but the engineering underneath these amalgamations is likely to be far more probabilistic than the traditional software engineering most engineers are familiar with.
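As a small sketch of what that intermediation could look like today without the browser API, the same local Ollama endpoint from the earlier examples can clean a page down to its substantive content before you ever read it; the model name and prompt here are illustrative, not a recommendation.

import requests

def clean_page_locally(page_text: str, model: str = "qwq") -> str:
    # Send the page to a local model via Ollama's OpenAI-compatible endpoint,
    # so nothing leaves the machine.
    resp = requests.post(
        "http://localhost:11434/v1/chat/completions",
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": "Remove ads, boilerplate and filler; keep only the substantive content."},
                {"role": "user", "content": page_text},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]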
Conclusion
I am very interested in building or working on ML/agentic software. If this is something you are working on or would like to integrate at your company, please reach out to me at graham.annett@gmail.com. And if you have any feedback at all, please reach out as well.