Drops of determinism: the fix for MCP agents that keep getting it wrong

MCP execution failures aren't a model accuracy problem. They're an architecture problem. Here's why raw tool design produces inconsistent agent behavior, and what composite, workflow-backed tools do differently.

Marcus Dubreuil did not set out to solve a philosophy problem. He was trying to make AI agents reliable enough to use in production at J.W. Pepper, one of the largest sheet music retailers in the world. The answer his team landed on was practical rather than theoretical: instead of asking agents to reason through hundreds of raw tools, give them a small set of purpose-built workflows that encode the business logic directly.

He described it as “adding little drops of determinism into what the agent can do.”

That phrase captures something the broader MCP conversation keeps missing. When AI agents produce inconsistent results, send messages to the wrong people, or silently fail to complete tasks, the instinct is to blame the model, tweak the prompt, or upgrade to the latest version. The model isn’t the problem. It’s the architecture of the tools the model is working with.

What most MCP tools actually do

The majority of MCP tools deployed today are thin wrappers around APIs. They expose individual operations: get record, update field, send message, run query. The agent receives a task, surveys its available tools, and decides how to sequence them.

On paper, reasonable. In practice, this hands every consequential execution decision to probabilistic inference.

Take a simple workflow: a sales rep asks an agent to send a follow-up Slack message to the account owner after updating a CRM opportunity. That requires looking up the opportunity, identifying the owner, resolving their Slack handle, and sending the message. Four tool calls. Each one is an opportunity for the model to take a slightly different path.
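
To make that concrete, here is a toy sketch of what that raw tool surface might look like. The tool names and the in-memory stand-ins for the CRM and Slack are hypothetical, purely illustrative; the point is where the decision points sit.

```python
# Illustrative only: four raw, single-purpose tools over a toy in-memory "CRM".
# The agent has to sequence these itself, and every call is a decision point.

CRM = {
    "OPP-42": {"name": "Acme renewal", "owner_id": "U-7"},
}
USERS = {
    "U-7": {"name": "Tom Ruiz", "email": "tom.ruiz@example.com"},
    "U-9": {"name": "Tom Marsh", "email": "tom.marsh@example.com"},  # the wrong Tom
}
SLACK_HANDLES = {"tom.ruiz@example.com": "@tomr"}

def get_opportunity(opportunity_id: str) -> dict:
    """Raw wrapper: fetch one CRM record. The model chooses which ID to pass."""
    return CRM[opportunity_id]

def search_users(name: str) -> list[dict]:
    """Raw wrapper: fuzzy user search. 'Tom' returns two candidates; the model picks one."""
    return [u for u in USERS.values() if name.lower() in u["name"].lower()]

def resolve_slack_handle(email: str) -> str | None:
    """Raw wrapper: map an email to a Slack handle. Returns None on a miss."""
    return SLACK_HANDLES.get(email)

def send_slack_message(handle: str, text: str) -> None:
    """Raw wrapper: deliver the message. Nothing upstream guarantees 'handle' is right."""
    print(f"-> {handle}: {text}")
```

Nothing in that surface forces the model to resolve the owner by owner_id rather than by a name search, or to confirm the handle lookup succeeded before sending.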

Run it three times. On the first, the message goes to the wrong Tom. On the second, the SOQL query fails silently and no message gets sent at all. On the third, it works, and you cannot reproduce why.

This is not a hallucination. The model did not fabricate anything. It made a series of plausible but inconsistent decisions about how to execute a task that should have had one correct, repeatable path.

Where the variance compounds

LLMs are probabilistic by design. The same input does not always produce the same output, and that variability is useful for reasoning, synthesis, and judgment. But when an agent is executing a business process, you do not want variability. You want the same correct result every time.

Raw, granular tool architectures amplify this problem in two specific ways.

First, they multiply decision points. Every tool call is a chance for the model to make a slightly different choice about sequencing, parameterization, or error handling. Chain five tools and you have five compounding probabilities, each feeding the next.
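
A rough, purely illustrative calculation shows how fast that compounds (the per-call figure below is an assumption, not a measured rate):

```python
# Purely illustrative: assume the model takes the intended path 95% of the time
# at each of five chained tool calls. The chance an entire run stays on track:
per_call = 0.95
print(per_call ** 5)  # ~0.77, so roughly one run in four drifts somewhere in the chain
```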

Second, they leave business logic in the prompt. If you need the agent to always look up the canonical account owner rather than the last person to modify the record, that rule has to live somewhere. In a raw-tool architecture, it lives in the system prompt. Prompts drift. Instructions get rewritten. The logic is invisible, unversioned, and untested.

The result is an agent whose behavior is only as stable as the language used to describe it.

The fix is in the tool, not the prompt

Composite tools move decisions that should not vary out of the model and into the tool itself.

Good MCP tool design replaces those prompt-level rules with purpose-specific workflows that encode exactly what should happen, in what order, every time. A composite tool called “send follow-up to account owner” handles every step internally: the CRM lookup, the deduplication, the handle resolution, the message send.

The model decides what to do. The tool handles how to do it reliably.
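
As a sketch of what that looks like in practice (illustrative Python only, not J.W. Pepper's implementation; the record stores and the Slack send are toy stand-ins for whatever clients your integration layer provides), the composite tool below fixes the ordering, the owner resolution, the deduplication, and the error handling internally:

```python
# Illustrative sketch only: one composite tool owns the whole sequence.
CRM = {"OPP-42": {"name": "Acme renewal", "owner_id": "U-7"}}
USERS = {"U-7": {"name": "Tom Ruiz", "email": "tom.ruiz@example.com"}}
SLACK_HANDLES = {"tom.ruiz@example.com": "@tomr"}
_SENT: set[tuple[str, str]] = set()   # dedup log: one follow-up per opportunity/recipient

def send_followup_to_account_owner(opportunity_id: str, message: str) -> dict:
    """The only tool the agent sees. Ordering, owner resolution, deduplication,
    and error handling are fixed here; the model never decides any of it."""
    opp = CRM[opportunity_id]                    # 1. canonical record lookup
    owner = USERS[opp["owner_id"]]               # 2. owner by ID, never by name search
    handle = SLACK_HANDLES.get(owner["email"])   # 3. deterministic handle resolution
    if handle is None:                           # 4. failures surface explicitly, never silently
        return {"status": "error", "reason": f"no Slack handle for {owner['email']}"}
    if (opportunity_id, handle) in _SENT:        # 5. dedup: never double-send for the same opp
        return {"status": "skipped", "reason": "follow-up already sent"}
    print(f"-> {handle}: {message}")             # 6. send happens only after every check passes
    _SENT.add((opportunity_id, handle))
    return {"status": "sent", "recipient": handle}

print(send_followup_to_account_owner("OPP-42", "Updated the opp -- can you review?"))
```

The agent supplies an opportunity ID and a message; everything else is decided by code that can be reviewed, versioned, and tested.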

Composite tools are specifically suited to workflows that require deterministic execution, strict ordering, transactional consistency, and state management: categories where an LLM’s probabilistic reasoning is the wrong instrument for orchestration.

Think of composite tools as microservices for agent access. You are not handing the agent a catalog of raw API endpoints. You are building purpose-built operations with guarantees baked in, and exposing only those to the agent.
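
The exposure side of that principle can be sketched too. Assuming the official MCP Python SDK's FastMCP helper (your server framework may differ), only the composite operation is registered; the raw wrappers never appear in the agent's tool list:

```python
# Exposure sketch, assuming the MCP Python SDK's FastMCP helper. Only the
# composite operation is registered; the raw CRM, user, and Slack wrappers
# stay internal and never show up in the agent's tool list.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("revops-workflows")

@mcp.tool()
def send_followup_to_account_owner(opportunity_id: str, message: str) -> dict:
    """Send a follow-up Slack message to the canonical owner of a CRM opportunity."""
    # ...the composite logic sketched above runs here...
    return {"status": "sent"}

if __name__ == "__main__":
    mcp.run()
```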

What J.W. Pepper actually built

Early in their MCP adoption, J.W. Pepper’s instinct was the common one: expose as many tools as possible. More context, more options, more flexibility. The model could figure out how to use them.

What they found was the opposite. “Instead of trying to give the agent 500 different MCP tools and saying ‘good luck,’” as Marcus Dubreuil, Director of Systems Architecture at J.W. Pepper, put it, the team shifted to fine-tuned workflows that represent specific business actions: looking up an order, updating a ticket, sending a notification to the right person.

That shift reduced over 500 available tools to approximately 20 structured workflows. Zero raw database access. The agent could no longer run arbitrary queries against production systems. It could only execute the specific, scoped operations the IT team had defined, reviewed, and approved.

The reliability gain was architectural. “Even if it has access to systems,” Marcus noted, “it doesn’t know how to use them.” Encoding business logic into the tools themselves closed that gap without relying on prompting.

Over time, those workflows started to feel like something familiar. “As I’ve begun transitioning our use of Tray into MCP, I’m finding our workflows are becoming almost like microservices,” Marcus said.

“Instead of us trying to build this whole robust all-in-one iPaaS solution, we’re just adding these little drops of determinism into what the agent can do.” — Marcus Dubreuil, Director of Systems Architecture, J.W. Pepper

The phrase is worth holding onto. Determinism is not a constraint on the model’s judgment. It is a property of the tools the model is given to work with. Add it in drops, at the tool level, and the agent’s behavior becomes predictable without removing its ability to reason.

Why authentication scope matters for MCP tools

There is a second dimension of tool predictability that composite architecture addresses: authentication scope.

A raw MCP tool that uses a service account creates a broad, undifferentiated blast radius. Every user invoking that tool acts with the same permissions, regardless of their own access level. The audit log shows the service account, not the person who triggered the call. You cannot trace the action back to an individual.

A well-designed composite tool handles this at the architecture level. Tools that require broad system access can use a scoped service account, where the workflow logic constrains what that account can do. Tools that act on behalf of a specific user — reading their inbox, sending a Slack message in their name — can use that user’s own OAuth credentials. The agent acts with the permissions of the person asking, not the permissions of a shared admin.
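
A minimal sketch of that identity handling, with placeholder names and a toy token (in practice the gateway would resolve the caller's own OAuth credentials before the tool runs):

```python
# Illustrative only: the same composite operation, parameterized by the identity
# of the person who invoked it. The token below is a placeholder.
from dataclasses import dataclass
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool-audit")

@dataclass
class Caller:
    user_id: str            # the human behind the request, not a shared service account
    slack_user_token: str   # that user's own OAuth token

def send_followup_as_user(caller: Caller, opportunity_id: str, message: str) -> dict:
    # The audit trail names the person, so every action traces back to an individual.
    log.info("user=%s tool=send_followup opp=%s", caller.user_id, opportunity_id)
    # The Slack send would use caller.slack_user_token, so the message goes out
    # with the caller's permissions and in the caller's name.
    return {"status": "sent", "as_user": caller.user_id}

print(send_followup_as_user(Caller("mdubreuil", "xoxp-placeholder"), "OPP-42", "Ping"))
```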

This is what a drop of determinism at the identity layer looks like: the blast radius is bounded, the log is traceable, and the tool behaves predictably regardless of who invokes it.

Where to start

If your agents are producing inconsistent results, the right diagnostic question is not “which model performs better on this task?” It is “how much of the execution logic lives in the tool versus in the prompt?”

A practical test: take your ten most commonly invoked agent workflows and ask what happens if the model takes a slightly different path through the tool calls. If a different sequencing produces a meaningfully different outcome, the business logic is in the wrong place.

Audit your tool surface. Identify which tools are raw API wrappers and which encode actual business logic. The ratio tells you how much of your agent’s execution reliability depends on the model getting it exactly right every time.

Start consolidating. Pick one high-frequency agent workflow and build a composite tool that handles the full sequence internally. Measure execution variance before and after. The difference will be concrete.

Treat tool definitions as production code. They should be versioned, reviewed, and tested. A tool definition that changes silently is a workflow that behaves differently without warning.
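
To illustrate the “tested” part: if the composite tool sketched earlier lived in a hypothetical tools/followup.py, its guarantees could be pinned down with tests as small as these.

```python
# Illustrative tests only. They assume the toy send_followup_to_account_owner
# sketched earlier lives in a hypothetical tools/followup.py module.
from tools.followup import send_followup_to_account_owner

def test_followup_is_deterministic_and_deduplicated():
    first = send_followup_to_account_owner("OPP-42", "Review please")
    second = send_followup_to_account_owner("OPP-42", "Review please")
    assert first == {"status": "sent", "recipient": "@tomr"}  # the right Tom, every run
    assert second["status"] == "skipped"                      # dedup lives in the tool, not the prompt
```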

The goal is not to remove the model from the picture. It is to give the model tools that behave reliably when invoked, so the agent’s judgment is applied to the right decision rather than spent reasoning through how to do it.

Tray Agent Gateway gives enterprise teams a governed environment to build, manage, and monitor composite MCP tools — with full audit logging, RBAC, and scoped authentication. For a broader framework on what predictable MCP execution requires, download the ebook: Getting MCP right.