Threshold-based HITL in MCP: Using Elicitation


Disclaimer: All opinions and views in this article are my own. When citing, please call me an Independent Security Researcher.

The best security control in agent systems is often not a sandbox, a policy engine, or a classifier. It is a well-timed interruption. In AI systems, that interruption can be what breaks the exploit chain.

Human-in-the-loop (HITL) works, but it also adds friction. As Yampolskiy notes: “One major issue with human-in-the-loop monitoring is that humans may not be able to keep up with the speed and complexity of AI systems, particularly as they continue to advance and outpace human capabilities.” Yampolskiy’s actual argument goes further: his impossibility results say human oversight is structurally failing, not just slow. I am not solving that deeper problem here. But it raises a narrower question:

What if some of that burden moved to the host or, let's say, the server?

For example, VS Code already pauses long-running agent sessions on the host side. In agent mode, Copilot Edits may use many chat requests, so VS Code periodically asks whether to continue; this is configurable through chat.agent.maxRequests. What I’m interested in is a second layer: the server itself noticing when invocation patterns look unusual and triggering a checkpoint before continuing.

Instead of acting as a passive executor, the server can watch how tools are being used. If invocation frequency becomes unusual, or access patterns no longer resemble normal user behavior, the server can pause execution and explicitly elicit approval before continuing.

What I mean by this

MCP tool annotations can describe intent. readOnlyHint helps identify non-destructive tools. destructiveHint helps flag operations that deserve tighter review.

These fields are only hints, not trust anchors. An untrusted server can lie about them. But here, we control the server, so the annotations are honest and we can build on them.

What is more interesting is what the server can do beyond that. The server can track thresholds within a session, such as repeated read activity or unusual invocation patterns. Once a threshold is crossed, the server can use MCP elicitation to pause the workflow and force a client-side approval step.
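As a rough illustration of the bookkeeping involved, per-session tracking could be sketched like this. The class name `SessionCounters`, the session ID strings, and the `read_file` tool name are hypothetical and not part of MCP or its SDK; this only shows the counting, not the elicitation itself.

```typescript
// Hypothetical helper: counts tool invocations per (session, tool) pair.
class SessionCounters {
  private counts = new Map<string, number>();
  private thresholds: Map<string, number>;
  private defaultThreshold: number;

  constructor(thresholds: Map<string, number>, defaultThreshold = 3) {
    this.thresholds = thresholds;
    this.defaultThreshold = defaultThreshold;
  }

  private key(session: string, tool: string): string {
    return JSON.stringify([session, tool]);
  }

  /** Records one invocation; returns true once the call exceeds the tool's threshold. */
  record(session: string, tool: string): boolean {
    const k = this.key(session, tool);
    const n = (this.counts.get(k) ?? 0) + 1;
    this.counts.set(k, n);
    const limit = this.thresholds.get(tool) ?? this.defaultThreshold;
    return n > limit;
  }

  /** Clears the counter, for example after the user has answered a prompt. */
  reset(session: string, tool: string): void {
    this.counts.delete(this.key(session, tool));
  }
}

// Usage: a threshold of 2 for a hypothetical read_file tool.
const counters = new SessionCounters(new Map([["read_file", 2]]));
counters.record("session-a", "read_file"); // false
counters.record("session-a", "read_file"); // false
const needsApproval = counters.record("session-a", "read_file"); // true
```

Keying on the session is an assumption here; a design that wants to survive session rotation would need a more stable key.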

Here is how it works in the code below. The add tool is wrapped by gateAdd(). It counts how many times the tool is invoked in a session. The first three calls pass normally. On the fourth call, the server pauses and asks the user for approval via elicitation. If approved, the call goes through. If denied, it is blocked. The pending variable makes sure that if multiple requests arrive while approval is pending, they all wait on the same prompt instead of spawning duplicate dialogs. Once resolved, the counter resets.

flowchart LR
    A[Agent calls tool] --> B{count <= threshold?}
    B -- Yes --> C[Allow]
    B -- No --> D[Elicit approval]
    D -- Approved --> C
    D -- Denied --> E[Block]

▶ Video POCs

VS Code demo

Inspector demo

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'calc-mcp', version: '0.1.0' });

// Shared input shape for both tools.
const NumInput = {
  a: z.number().describe('First number'),
  b: z.number().describe('Second number'),
};

const THRESHOLD = 3;
let count = 0; // invocations of `add` since the last resolved prompt
let pending: Promise<boolean> | null = null; // in-flight approval prompt, if any

// Returns true if the call may proceed, false if the user declined.
async function gateAdd(): Promise<boolean> {
  count++;
  if (count <= THRESHOLD) return true;

  // A prompt is already open: wait on it instead of opening another one.
  if (pending) return pending;

  pending = (async () => {
    try {
      const result = await server.server.elicitInput({
        mode: 'form',
        message: `Add called ${count} times (threshold: ${THRESHOLD}). Allow remaining calls?`,
        requestedSchema: {
          type: 'object',
          properties: {
            approve: { type: 'boolean', title: 'Approve', default: true },
          },
          required: ['approve'],
        },
      });
      // Only an explicit accept with approve === true counts as approval.
      return result.action === 'accept' && result.content?.approve === true;
    } finally {
      // Reset whether the prompt was approved, denied, or failed.
      count = 0;
      pending = null;
    }
  })();

  return pending;
}

server.registerTool(
  'add',
  {
    title: 'Add',
    description: 'Add two numbers.',
    inputSchema: NumInput,
    annotations: { readOnlyHint: true, destructiveHint: false },
  },
  async ({ a, b }) => {
    if (!(await gateAdd()))
      return { content: [{ type: 'text' as const, text: 'Blocked by user.' }] };
    return {
      content: [{ type: 'text' as const, text: `${a} + ${b} = ${a + b}` }],
    };
  },
);

// `subtract` is not gated; it is here for comparison.
server.registerTool(
  'subtract',
  {
    title: 'Subtract',
    description: 'Subtract two numbers.',
    inputSchema: NumInput,
    annotations: { readOnlyHint: true, destructiveHint: false },
  },
  async ({ a, b }) => ({
    content: [{ type: 'text' as const, text: `${a} - ${b} = ${a - b}` }],
  }),
);

const transport = new StdioServerTransport();
await server.connect(transport);
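To see the "one shared prompt" behavior without an MCP client attached, the same control flow can be pulled out into a transport-free sketch. Everything here is hypothetical scaffolding: `makeGate` and the injected `ask` callback are stand-ins for `gateAdd()` and the elicitation round trip, not SDK APIs.

```typescript
// Hypothetical stand-in for the elicitation call: resolves to true on approval.
type Ask = (callsSoFar: number) => Promise<boolean>;

function makeGate(threshold: number, ask: Ask): () => Promise<boolean> {
  let count = 0;
  let pending: Promise<boolean> | null = null;

  return function gate(): Promise<boolean> {
    count++;
    if (count <= threshold) return Promise.resolve(true);
    if (pending) return pending; // join the prompt that is already open

    pending = ask(count).finally(() => {
      // Same reset as the original: runs on approve, deny, or error.
      count = 0;
      pending = null;
    });
    return pending;
  };
}

// Five calls arrive back to back; the simulated user declines.
async function demo(): Promise<{ prompts: number; results: boolean[] }> {
  let prompts = 0;
  const gate = makeGate(3, async () => {
    prompts++;
    return false;
  });
  const results = await Promise.all([gate(), gate(), gate(), gate(), gate()]);
  return { prompts, results };
}
```

With a threshold of 3, `demo()` resolves with `prompts === 1` and `results` equal to `[true, true, true, false, false]`: the fourth and fifth calls wait on the same prompt and are both blocked.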

Extending this to real-world scenarios

How does this generalize? Here is my rough thinking:

| Tool type | Default policy | When to require approval |
| --- | --- | --- |
| Read only, low sensitivity | Allow | When invocation count crosses a threshold or access pattern looks abnormal |
| Read only, high sensitivity | Ask | Always, or after a very small threshold |
| Write, delete, send, publish | Ask | Always |
| External network or open world actions | Ask | Always, or when destination is not allowlisted |
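One way to read the table is as a small decision function. This is a sketch under assumptions: `sensitivity`, `CallContext`, and `decide` are names I made up for illustration, and only `readOnlyHint` and `openWorldHint` are actual MCP annotation fields. Where a hint is missing, the sketch falls back to what I understand to be the spec defaults (not read-only, open world), and it takes the "not allowlisted" variant of the last row.

```typescript
// readOnlyHint and openWorldHint mirror MCP tool annotations;
// `sensitivity` is a hypothetical operator-assigned label, not an MCP field.
interface ToolProfile {
  readOnlyHint?: boolean;
  openWorldHint?: boolean;
  sensitivity: "low" | "high";
}

// Hypothetical per-call context supplied by the server.
interface CallContext {
  callsThisSession: number;
  threshold: number;
  destinationAllowlisted: boolean;
}

type Decision = "allow" | "ask";

function decide(tool: ToolProfile, ctx: CallContext): Decision {
  const readOnly = tool.readOnlyHint ?? false; // assumed default: not read-only
  const openWorld = tool.openWorldHint ?? true; // assumed default: open world

  // Write, delete, send, publish: always ask.
  if (!readOnly) return "ask";
  // External or open-world actions: ask unless the destination is allowlisted.
  if (openWorld && !ctx.destinationAllowlisted) return "ask";
  // Read only, high sensitivity: always ask.
  if (tool.sensitivity === "high") return "ask";
  // Read only, low sensitivity: ask only past the threshold.
  return ctx.callsThisSession > ctx.threshold ? "ask" : "allow";
}

// Usage: a local, low-sensitivity read tool under its threshold.
const decision = decide(
  { readOnlyHint: true, openWorldHint: false, sensitivity: "low" },
  { callsThisSession: 2, threshold: 3, destinationAllowlisted: false },
);
```

The ordering is a design choice: write-capable tools are checked first so that an allowlisted destination never downgrades a write to "allow".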

I think this can extend to other protocols too. It remains to be explored, but the building blocks are already there.

| # | Protocol / system | What it requires | Why it matters |
| --- | --- | --- | --- |
| 1 | MCP spec | Clients should show confirmation prompts for sensitive operations and users should be able to deny tool calls | Makes HITL a protocol-level expectation |
| 2 | ACP spec | Built-in Await mechanism pauses execution until an external response arrives (designed for multi-turn data gathering, not security gating specifically) | |
| 3 | A2A spec | Task state includes input-required so an agent can stop and wait for more input (an orchestration primitive, not a security checkpoint) | |

Limitations and open questions

This is a POC, not a production design. A flat counter is not real anomaly detection. Session rotation resets it. Elicitation was designed for data gathering, not security gating. And multi-server environments might bypass a per-server threshold entirely. Honestly, you could call this a fancy per-tool rate limit and you would not be entirely wrong.

If someone builds on it, these are the gaps I would look at first.

The more interesting direction is probably contextual thresholds: deciding based on what is being accessed, not just how many times. A filesystem server can tell the difference between 50 reads to documentation files and 5 reads to credential files. A server tracking cross-tool patterns could notice “many reads followed by one send_email” (a classic exfiltration shape) and elicit before the send goes through. As far as I can tell, no MCP server, gateway, or paper implements behavioral correlation as a trigger for elicitation. That seems worth building next.
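To make the idea a bit more concrete, here is a minimal sketch of a detector for the "many reads, then one send" shape, with reads weighted by how sensitive the target looks. All of it is hypothetical: the `CallEvent` shape, the path patterns, the weight of 20, and the budget of 60 are illustrative numbers I picked, not tuned values or anything from an existing server.

```typescript
// Hypothetical event record the server would build for each tool call.
interface CallEvent {
  tool: string;
  kind: "read" | "send" | "other";
  target?: string; // for reads: the path being accessed
}

// Illustrative patterns only; a real deployment would need its own list.
const SENSITIVE_PATTERNS: RegExp[] = [/\.env$/, /id_rsa/, /credentials/i];

function readWeight(target: string | undefined): number {
  if (target === undefined) return 1;
  return SENSITIVE_PATTERNS.some((re) => re.test(target)) ? 20 : 1;
}

class ShapeDetector {
  private readScore = 0;
  private budget: number;

  constructor(budget: number) {
    this.budget = budget;
  }

  /** Returns true when this call should be preceded by an elicitation. */
  observe(ev: CallEvent): boolean {
    if (ev.kind === "read") {
      this.readScore += readWeight(ev.target);
      return false;
    }
    if (ev.kind === "send") {
      const suspicious = this.readScore >= this.budget;
      this.readScore = 0; // start a fresh window after each send decision
      return suspicious;
    }
    return false;
  }
}

// Usage: 50 documentation reads, then a send.
const detector = new ShapeDetector(60);
for (let i = 0; i < 50; i++) {
  detector.observe({ tool: "read_file", kind: "read", target: `docs/page-${i}.md` });
}
const afterDocs = detector.observe({ tool: "send_email", kind: "send" });

// Then 5 credential reads, followed by another send.
for (let i = 0; i < 5; i++) {
  detector.observe({ tool: "read_file", kind: "read", target: "/home/u/.aws/credentials" });
}
const afterCreds = detector.observe({ tool: "send_email", kind: "send" });
```

With these numbers, the 50 documentation reads score 50 and stay under the budget, so `afterDocs` is `false`, while the 5 credential reads score 100, so `afterCreds` is `true` and the server would elicit before the send.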

Conclusion

Not sure if this would work in production. This is just me thinking out loud about what a server-side threshold could look like. Hosts like VS Code already do their own version of “pause and ask.” Maybe pushing some of that to the server is the right call, maybe not. Worth exploring.

Other interesting stuff I am working on: spotlighting-datamarking, an OSS project around data marking for AI systems. Check it out if that sounds up your alley.

Hope this helps.
