MCP Tool Sprawl: How I Cut 69 Tools to 43 With a Decorator

When your MCP server crosses 40 tools, something shifts. The agent still calls tools. But it starts guessing.

An agent with 69 tools does not have 69 capabilities. It has 69 ways to guess, and the guesses get worse as the count climbs.

I maintain mcp-redmine, an open-source production MCP server for Redmine with 689 tests, OAuth2 authentication, and community contributors who added tools faster than the architecture could absorb them. When the tool count hit 69, the server hit every one of those failure modes. I cut it to 43 by introducing a single decorator and a naming convention. Tool sprawl is a good problem to have; it means your server already works. If you’re not there yet, start by building an MCP server with FastMCP from a single file, then come back when the tool count gets unwieldy.

TL;DR: Group operations on the same domain entity into one manage_X(action=...) tool. A single @action_dispatch decorator validates actions, enforces read-only mode, and handles cleanup initialization. This is the Resource-Action Pattern: one tool per domain, with an action parameter that dispatches to the operation. 34 tools became 9, and a 35th was retired. Context overhead dropped. Agent tool selection improved.

Why tool count breaks agent behavior

The problem is mechanical, not philosophical.

GitHub’s official MCP server has been benchmarked at roughly 42,000 tokens in tool definitions alone, before the system prompt, before any conversation history. Cursor caps MCP tools at 40 in current versions. GitHub Copilot’s engineering team documented measurable degradation past 60 tools and reduced their default toolset from 40 to 13. Redis engineering benchmarked tool pre-filtering and found the difference stark: without filtering, 3+ seconds, ~23k tokens, wrong tool selected. With filtering: 392ms, 800 tokens, correct selection.

The degradation has two causes.

Context bloat. Tool definitions eat context. At 69 tools, a significant fraction of the model’s working memory is consumed by descriptions it will not use for the current task.

Semantic blur. When tools share a domain, their descriptions start to sound alike. list_issue_categories, create_issue_category, update_issue_category, delete_issue_category: four separate tool entries, each with its own schema. The model must discriminate between them every time issue categories are relevant. At scale, the semantic boundaries blur. The model blends parameters from one schema while calling another.
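The context-bloat side can be roughly measured. The sketch below is an illustration, not code from mcp-redmine: it serializes tool definitions in the name/description/inputSchema shape that MCP uses and applies a crude 4-characters-per-token heuristic, which is an assumption and will differ from any real tokenizer. The helper names (tool, estimate_tokens) and the sample schemas are hypothetical.

```python
import json


def tool(name: str, description: str, properties: dict) -> dict:
    """Build a minimal MCP-style tool definition (hypothetical helper)."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {"type": "object", "properties": properties},
    }


def estimate_tokens(tool_definitions: list) -> int:
    """Very rough estimate. Assumption: about 4 characters per token."""
    serialized = json.dumps(tool_definitions, separators=(",", ":"))
    return len(serialized) // 4


# Illustrative parameters only; the real schemas may differ.
shared = {
    "project_id": {"type": "string"},
    "category_id": {"type": "integer"},
    "name": {"type": "string"},
}
verbs = ("list", "create", "update", "delete")

# Four separate tools, each repeating the shared parameters.
separate = [
    tool(f"{verb}_issue_category", f"{verb} issue categories for a project", shared)
    for verb in verbs
]

# One consolidated tool with an action enum.
merged = [
    tool(
        "manage_issue_category",
        "List, create, update, or delete issue categories for a project",
        {"action": {"type": "string", "enum": list(verbs)}, **shared},
    )
]

print(estimate_tokens(separate), estimate_tokens(merged))
```

The absolute numbers mean little; the useful signal is the ratio between the two surfaces when the same estimate is run over a real tool list.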

The standard fix is dynamic tool loading. Anthropic now ships a first-party Tool Search Tool that defers definitions until Claude needs them, cutting context by over 85%. At 200+ tools it is the right answer. But it composes better with a well-designed surface. 43 tools with clear domain boundaries search more accurately than 69 tools with overlapping schemas. Fix the surface first; add Tool Search when the catalog genuinely grows beyond what architecture can contain. This echoes what I saw on pdf-mcp: agents converged on a scout-then-read pattern and almost never touched the bulk-read tool. How Claude Code Actually Reads PDFs has the full observation.

The common growth pattern

Most MCP servers grow tool-by-tool. In an open-source project, this accelerates: a new contributor means a new tool.

add_watcher
remove_watcher
edit_note
set_note_private
get_private_notes
create_time_entry
update_time_entry
log_time_for_user

This is natural. Each tool has a clear name and a focused schema. It reads as clean design.

But after enough features, the server carries dozens of tools that operate on the same domain entities. The agent sees add_project_member, update_project_member, and remove_project_member as three distinct evaluation candidates. It reads all three descriptions when project membership is relevant. It pays context on all three schemas. And when the descriptions converge enough, it occasionally calls the wrong one.

The common instinct when this happens: add tool descriptions, tune prompts, or switch to dynamic loading. These are workarounds. The root cause is the tool surface itself.

The Resource-Action Pattern

The fix is one abstraction: one tool per domain entity, with an action parameter that dispatches to the operation.

# Before: three tools, three context slots
add_project_member(project_id, user_id, role_ids)
update_project_member(membership_id, role_ids)
remove_project_member(membership_id)

# After: one tool, one context slot
manage_project_member(
    action,        # "add"|"update"|"remove"
    project_id=None,
    user_id=None,
    membership_id=None,
    role_ids=None,
)

The tool count drops. The context cost drops with it. But the bigger change is cognitive.

This is the Resource-Action Pattern: group operations by domain entity, not by operation type. Instead of choosing between four similar tools, the model learns a convention: domain and action. “Which of these four category tools do I need?” becomes “I need to act on categories. Action is delete.” Tool selection becomes parameter filling. That is a much easier reasoning task.
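One consequence of merging signatures is that most parameters become optional, so the handler has to check which ones each action needs. Here is a minimal sketch of that check, assuming the required parameters follow the three original signatures above. The REQUIRED map, the synchronous handler, and the error shape are hypothetical, and a real handler would call the Redmine API instead of echoing the request.

```python
from typing import Optional

# Hypothetical map of required parameters per action,
# derived from the three "before" signatures.
REQUIRED = {
    "add": ("project_id", "user_id", "role_ids"),
    "update": ("membership_id", "role_ids"),
    "remove": ("membership_id",),
}


def manage_project_member(
    action: str,
    project_id: Optional[str] = None,
    user_id: Optional[int] = None,
    membership_id: Optional[int] = None,
    role_ids: Optional[list] = None,
) -> dict:
    if action not in REQUIRED:
        return {"error": f"Unknown action '{action}'"}

    supplied = {
        "project_id": project_id,
        "user_id": user_id,
        "membership_id": membership_id,
        "role_ids": role_ids,
    }
    # Report every missing parameter at once so the agent can retry in one step.
    missing = [name for name in REQUIRED[action] if supplied[name] is None]
    if missing:
        return {"error": f"Action '{action}' requires: {', '.join(missing)}"}

    # Placeholder for the real API call: echo back the validated request.
    return {"action": action, **{name: supplied[name] for name in REQUIRED[action]}}


print(manage_project_member("remove", membership_id=7))
print(manage_project_member("add", project_id="demo"))
```

Naming the missing parameters in the error gives the model something concrete to correct on the next call.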

Figure: Tool surface before and after consolidation. 12 individual tools on the left collapse into 4 manage_X tools on the right, each with an action parameter.

The @action_dispatch decorator

Writing manage_X tools by hand would mean repeating the same logic in every handler: validate the action string, block writes in read-only mode, register cleanup tasks. That boilerplate is where bugs accumulate.

I codified it into a single decorator.

@action_dispatch({
    "list":   ActionMode.READ,
    "create": ActionMode.WRITE,
    "update": ActionMode.WRITE,
    "delete": ActionMode.WRITE,
})
async def manage_issue_category(
    action: str,
    project_id: str,
    category_id: int | None = None,
    name: str | None = None,
    assigned_to_id: int | None = None,
) -> dict | list:
    if action == "list":
        ...
    elif action == "create":
        ...
    elif action == "update":
        ...
    elif action == "delete":
        ...

The decorator does three things before the handler runs:

Validates action against the declared set and returns a clean error on mismatch.
Enforces read-only mode: WRITE actions are blocked when REDMINE_MCP_READ_ONLY=true. READ actions pass through unchanged.
Initializes cleanup tasks for tools that write files to disk, so file expiry registration happens consistently.

Adding a new manage_X tool is one @action_dispatch declaration, not a copy-pasted pattern. Changing read-only behavior across the entire server is one place.
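For readers who want to build something similar, here is a minimal sketch of how such a decorator could be written. It is not the mcp-redmine implementation. The error shape, the is_read_only and ensure_cleanup_started helpers, and the manage_demo tool are all hypothetical, and this sketch runs the cleanup step for every tool rather than only for tools that write files.

```python
import asyncio
import functools
import os
from enum import Enum
from typing import Optional


class ActionMode(Enum):
    READ = "read"
    WRITE = "write"


def is_read_only() -> bool:
    # Assumption: the flag is read from the environment on every call.
    return os.environ.get("REDMINE_MCP_READ_ONLY", "").lower() == "true"


async def ensure_cleanup_started() -> None:
    # Hypothetical stand-in for the cleanup-task initialization step.
    return None


def action_dispatch(actions: dict):
    """Wrap a manage_X handler with the three pre-handler checks."""

    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(action: str, *args, **kwargs):
            mode = actions.get(action)
            if mode is None:
                allowed = ", ".join(sorted(actions))
                return {"error": f"Unknown action '{action}'. Valid actions: {allowed}"}
            if mode is ActionMode.WRITE and is_read_only():
                return {"error": f"Action '{action}' is blocked in read-only mode"}
            await ensure_cleanup_started()
            return await handler(action, *args, **kwargs)

        return wrapper

    return decorator


@action_dispatch({"list": ActionMode.READ, "delete": ActionMode.WRITE})
async def manage_demo(action: str, item_id: Optional[int] = None) -> dict:
    # Demo handler: echoes its arguments instead of calling Redmine.
    return {"action": action, "item_id": item_id}


if __name__ == "__main__":
    print(asyncio.run(manage_demo("list")))
    print(asyncio.run(manage_demo("bogus")))
```

functools.wraps keeps the handler's name, docstring, and signature visible to anything that introspects the function, which matters when a framework derives the tool schema from the signature.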

Figure: The @action_dispatch decorator flow. An agent tool call enters, passes through the action validation, read-only mode check, and cleanup initialization gates, then reaches the handler or returns an error.

The numbers

34 individual tools collapsed into 9 manage_X tools:

Consolidated Into	Replaces
`manage_project_member`	add, update, remove member
`manage_issue_category`	list, create, update, delete
`manage_issue_relation`	list, create, delete
`manage_issue_watcher`	add, remove watcher
`manage_issue_note`	edit note, set note private
`manage_time_entry`	create, update, log for user
`manage_redmine_wiki_page`	get, create, update, delete, list, rename
`manage_product`	list, get, add, edit
`manage_contact`	list, get, create, edit, delete, assign, remove

Total: 69 tools to 43. The 9 manage_X tools carry identical functionality to the 34 they replaced, and a 35th tool, mark_checklist_done, was retired in favor of calling update_checklist_item(is_done=True) directly. The remaining 34 standalone tools stayed individual because each operates on a genuinely distinct domain with no consolidation benefit.

The module split that followed

There is a second refactor the tool consolidation made necessary.

Before: redmine_handler.py, 6,591 lines, all tools in one file. After consolidation, 43 tools in the same single file.

When the tools were isolated, the single-file structure was manageable. Once the manage_X tools shared helper logic across domains, related code was separated by thousands of lines. The file had become structurally misleading, not just large.

The natural next step: a tools/ package. 11 per-resource files under src/redmine_mcp_server/tools/, shared helpers in flat _X.py modules (_client.py, _errors.py, _validation.py, _serialization.py, _ssrf.py, and others).

The public MCP surface is unchanged. The internal structure is navigable. Any new contributor can open tools/issues.py and see every issue-related tool in one file.

I would not have seen the need for this refactor without the consolidation first. The tool count reduction made the structural problem visible. The two problems were always connected; the consolidation just surfaced the link.

What you give up

The Resource-Action Pattern has one real cost: schema-level discoverability.

A tool named create_issue_category communicates its purpose from the name alone. A tool named manage_issue_category requires the model to inspect the action parameter to understand the full capability set.

In practice, modern LLMs generalize the manage_X(action=...) convention within the first call. But the discoverability cost is real, which is why the decision rules below matter.

Use individual tools when:

Each operation has a distinct schema that shares few parameters with siblings
The domain entity is complex enough that one handler would become unreadable
You have 2 or fewer operations per domain

Use the Resource-Action Pattern when:

You have 3 or more operations on the same entity
The operations share most of their parameters
You are already seeing context pressure or selection errors

One rule that made the decision easy in practice: if three tools can share a handler with a top-level if action == ... branch, they belong in a manage_X tool.

How to apply this to your server

Start with an audit. List every tool and annotate the domain entity it operates on. Group by entity. Any group with 3 or more tools is a consolidation candidate.
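The audit step fits in a few lines. This is a sketch under the assumption that the entity annotation is done by hand, since names like log_time_for_user do not reveal their entity mechanically. The consolidation_candidates function and the annotated sample are hypothetical.

```python
from collections import defaultdict


def consolidation_candidates(tool_entities: dict, threshold: int = 3) -> dict:
    """Group tools by annotated entity; keep groups at or above the threshold."""
    groups = defaultdict(list)
    for tool_name, entity in tool_entities.items():
        groups[entity].append(tool_name)
    return {
        entity: sorted(tools)
        for entity, tools in groups.items()
        if len(tools) >= threshold
    }


# Hand-written annotation: tool name -> domain entity.
annotated = {
    "add_project_member": "project_member",
    "update_project_member": "project_member",
    "remove_project_member": "project_member",
    "add_watcher": "issue_watcher",
    "remove_watcher": "issue_watcher",
    "create_time_entry": "time_entry",
    "update_time_entry": "time_entry",
    "log_time_for_user": "time_entry",
}

for entity, tools in consolidation_candidates(annotated).items():
    print(entity, tools)
```

The threshold of 3 only flags candidates. Smaller groups can still be worth merging when their schemas overlap heavily.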

Write the decorator once. Read-only mode enforcement, action validation, cleanup initialization: these belong in one place. If you are repeating that logic per handler, you are accumulating drift.

Consolidate conservatively. One tool per domain only helps when the domain is stable. If you are still exploring the API surface, individual tools give you faster iteration. Consolidate once the shape is settled.

The goal isn’t a minimal tool count. It’s a surface the agent can reason about without guessing.

Specificity stops scaling

The instinct when building an MCP server is to be specific: one tool per operation, named clearly and scoped tightly. That instinct is correct at 10 tools.

At 60, specificity becomes noise. The agent cannot discriminate add_watcher from remove_watcher from edit_note without reading all three schemas. Every disambiguation costs context. And when the descriptions are close enough, the model stops selecting and starts guessing.

The Resource-Action Pattern reverses this by grouping tools at the right level of abstraction rather than hiding them: one tool per domain, one decorator per convention. The agent stops choosing between variants of the same action and starts filling two fields: domain and operation.

69 tools to 43 is not the interesting number. The interesting number is 35: the tools that disappeared without losing a single capability, 34 folded into 9 grouped ones and one retired for an equivalent that already existed.


Kevin Tan

Cloud Solutions Architect and Engineering Leader based in Singapore. I write about AWS, distributed systems, and building reliable software at scale.
