In the world of large language models (LLMs) and AI assistants, the Model Context Protocol (MCP) has emerged as a modular, standardized method for exposing tools, structured prompts, and resources to LLMs. Over the past months, it has become the de facto standard for extending capabilities across nearly all major LLM agent frameworks and frontends, including OpenAI's ChatGPT, Anthropic's Claude, and integrations like Kilo Code in Cursor and VSCode.
This article introduces the MCP protocol, explains how it differs from traditional function calling, and walks through its architecture, lifecycle, and transport mechanisms. Later, we will implement a simple MCP server using Python and the FastMCP SDK to expose an MQTT interface to LLM agents.
It is important to emphasize that MCP does not invent anything new. All tasks that MCP can solve can already be solved with function calling, simple JSON documents and file or HTTP resources. But it provides a standardized and modular way to establish loose dynamic coupling and resource injection into orchestrator frameworks.

From Function Calling to Model Context Protocol
Originally, LLMs could interact with external systems using function calling: the orchestrator framework would inspect the current user intent or agent goal, collect the currently relevant tools from its internal registry (or a custom dynamic mechanism), serialize their definitions into OpenAI-compatible JSON, and pass them to the LLM along with the full conversation history in a stateless API call. When the model responds with a tool_call, it means it has processed the input and is requesting a specific method execution. The orchestrator then executes the tool, adds the result to the message stream, and initiates a new inference call. This loop enables dynamic and modular behavior, but the registry and discovery of tools remains framework-specific and not standardized across different agents or environments. This approach, while flexible within a custom orchestrator, still comes with practical limitations in typical usage:
- While tools can be built dynamically, this requires implementation-specific logic and is not standardized.
- Discovery and invocation of tools across systems or services must be manually integrated, often tightly coupled to the orchestrator's architecture.
- Remote execution or tool delegation is possible, but again, it requires bespoke communication layers (e.g., message queues or custom APIs) without a shared protocol.
MCP addresses these issues by offering a standardized way for external components to declare and serve tools, prompts, and resources, enabling agent frameworks to discover and use them with minimal custom integration.
It is important to note that MCP does not change how stateless LLM inference APIs like OpenAI's chat/completions work - each call is still a one-shot inference using the full message history and tool definitions passed at that moment. The difference lies in how the agent orchestrator behaves between those calls.
If you are building your own orchestration loop, you can fully implement dynamic behavior even with standard function calling: just maintain an internal registry of available tools and pass a different tools array on each iteration based on logic, user state, or context. This works perfectly well for custom, tightly coupled pipelines.
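A minimal sketch of such a hand-rolled loop, assuming a hypothetical llm.chat helper and an in-memory tool registry (none of these names come from a specific framework), might look like this:

def select_tools(registry, state):
    # Hypothetical selection logic: only expose the tools relevant to the current state
    return [registry[name] for name in state["relevant_tools"]]

def agent_loop(llm, registry, messages, state):
    while True:
        tools = select_tools(registry, state)                 # a different tools array per iteration
        response = llm.chat(messages=messages, tools=[t["schema"] for t in tools])
        if not response.tool_calls:                           # the model answered directly - done
            return response.content
        for call in response.tool_calls:                      # execute each requested tool locally
            result = registry[call.name]["fn"](**call.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})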
What MCP offers is a standardized interface for discovering tools, prompts, and resources from third-party services or independently authored modules. Instead of managing internal configuration or hardcoding tool schemas, your agent can query one or more MCP servers (via stdio or HTTP), retrieve metadata, and invoke tools in a modular and loosely coupled fashion.
This makes MCP ideal for plugin-like systems, distributed agents, or any setup where the components evolve independently but need a shared protocol for coordination.
Core Components of MCP
At its core, the MCP defines a few key concepts:
Tools are executable methods that the LLM can call. They are described in OpenAI-compatible JSON schema (similar to function calling) and provide input/output specifications.
Examples:
search(query: str)
fetch_weather(location: str)
publish(topic: str, payload: str)
These tools can be exposed by the MCP server dynamically and invoked over the wire. If this sounds like function calling then yes - it is the same. The difference is that MCP specifies the format of the function declarations (the JSON schema) as well as the transport over which those methods are exchanged. The orchestrator still fetches the list of relevant methods, passes them into the tools array of the LLM request - from where they get rendered into the chat template - and executes LLM inference exactly the same way as for traditional function calling. When the orchestrator receives a tool_call response, it executes the method by doing an RPC call through the transport that has been used (a network request or passing the request to an external process). The idea is exactly the same.
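For illustration: a tool definition served by an MCP server carries a name, a description, and a JSON-Schema inputSchema, so mapping it onto the tools array used by function calling is a purely mechanical transformation. A sketch, assuming the usual OpenAI-style function declaration format:

def mcp_tool_to_openai(tool: dict) -> dict:
    # tool: a single entry from an MCP server's tools/list response
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }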
Prompts (Templates)
Prompts are modular prompt fragments or full templates that the orchestrator can dynamically retrieve and inject into the LLM's context. This mechanism allows external components to provide specialized behavior, instructions, or personality traits to LLMs in a structured way - especially useful for sub-task handling, formatting conventions, or chain-of-thought scaffolding.
Prompts exposed via MCP each have a name, title, description, and content field. They can be selected either by the LLM itself (from a known list) or injected by the orchestrator based on configuration, the current context, model responses, or user intent.
Once selected, the prompt is fetched from the MCP server and incorporated into the LLM context - typically in one of the following ways:
- As a system message when initiating a subcontext
- As part of the running conversation history
- Appended or prepended to a user query
From the model's perspective, it sees the prompt as ordinary text input. There is no special API-level difference - the value lies in the modularity and flexibility of where the content comes from. This design offers several advantages:
- Avoids hardcoding prompt templates in the agent codebase
- Prompts can be versioned, reused, and maintained independently
- Multiple agents can share prompt libraries
- Prompts can be authored and managed by non-developers
- There is a standardized protocol for discovery and injection, eliminating the need for bespoke JSON formats or incompatible APIs between frameworks
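As a rough orchestrator-side sketch (assuming an already-initialized MCP client session; the prompt name is made up), fetching a prompt and injecting it as a system message could look like this:

async def inject_prompt(session, history):
    # session: an initialized MCP client session (setup omitted); history: the running message list
    available = await session.list_prompts()                  # discover what the server offers
    result = await session.get_prompt("formatting-rules")     # fetch one prompt by (made-up) name
    for message in result.messages:
        history.insert(0, {"role": "system", "content": message.content.text})
    return history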
Resources
Resources are structured pieces of non-executable content - such as documents, configuration files, or knowledge snippets - that can be discovered, retrieved, and injected into an LLM's context.
Examples:
- Markdown files with API documentation
- PDF manuals or changelogs
- JSON or YAML configuration data
- Graph or vector database summaries
While such data could also be retrieved via a custom HTTP API or internal agent logic, MCP resources provide a standardized discovery and delivery interface. Each resource includes metadata (name, description, MIME type, path) and can be listed, previewed, and retrieved through the same protocol as other MCP elements.
Why not just use HTTP?
You could expose your documentation or database snapshots via HTTP endpoints. But then you'd need to implement:
- A discovery layer (what files exist?)
- MIME type inference
- Metadata schema
- Access control or filtering
- Compatibility logic for multiple agents
MCP solves this by offering a unified interface that:
- Makes resources self-describing
- Allows structured querying across different MCP servers
- Integrates into the same transport (pipe or HTTP/SSE)
- Allows agents to dynamically discover and load context-relevant documents without hardcoding logic
How are resources used by an orchestrator?
The orchestrator can list the available resources exposed by MCP servers and decide (based on model requests, current task, or configuration) which ones to load. The content can then be injected:
- As a system or user message
- As part of few-shot context
- Or displayed to the user to let the model comment or reason about it
In many cases, a resource may represent a dynamic wrapper - for example:
- A database-backed MCP resource could stream structured results from a SQL or graph query
- A vector database interface could expose document chunks with embeddings as selectable resources
This makes the MCP server a standardized proxy to external knowledge systems, giving agents the ability to explore and use data on demand - without tightly coupling the orchestrator to each specific backend implementation.
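A corresponding orchestrator-side sketch (again assuming an initialized MCP client session; the resource URI is only an example) might look like this:

async def load_resource_into_context(session, messages, uri="docs://api/reference"):
    # List what the MCP server exposes, then pull one resource into the conversation
    available = await session.list_resources()
    result = await session.read_resource(uri)
    for content in result.contents:
        if hasattr(content, "text"):                   # only inject textual content here
            messages.append({"role": "user", "content": content.text})
    return messages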

Lifecycle of an MCP Server
A typical MCP server follows this lifecycle:
- Startup and Declaration: On launch, the server declares which tools, prompts, and resources it provides. While some implementations (such as those using FastMCP) use a manifest.json-like structure, this is not mandated by the protocol itself - the format and mechanism are implementation-specific and may vary depending on the server architecture or underlying framework.
- Transport Initialization: The server waits for incoming connections, which can come through stdin/stdout pipes (if run as a subprocess) or over HTTP/SSE (for multi-agent deployments).
- Discovery and Listing: When queried, the server returns lists of available tools, prompts, and resources, each with descriptions and schema (for tools).
- Invocation: Agents send requests to invoke tools, retrieve prompts, or load resources. These can happen multiple times over the connection.
- Termination or Keep-Alive: The server continues running as a background service or subprocess, responding to further queries until terminated.
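To make this lifecycle concrete, here is a small client-side sketch based on my understanding of the official mcp Python SDK (treat the exact calls as an assumption): it launches a server as a subprocess over stdio, performs discovery, and invokes a tool.

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python3", args=["/path/to/my_mcp_server.py"])
    async with stdio_client(params) as (read, write):        # startup + transport initialization
        async with ClientSession(read, write) as session:
            await session.initialize()                       # protocol handshake
            tools = await session.list_tools()               # discovery and listing
            print([t.name for t in tools.tools])
            result = await session.call_tool("search", {"query": "MCP"})   # invocation
            print(result)

asyncio.run(main())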
Transport Mechanisms
MCP supports three major transport methods:
Standard I/O (stdin/stdout)
- Suitable for launching the MCP server as a subprocess.
- Enables fast local communication between a single agent framework and the MCP process.
- Cannot be shared across multiple agents; each orchestrator needs its own subprocess instance.
- Commonly used in local tools like Kilo Code to wrap a Python script or binary in MCP format.
Streamable HTTP
- Stateless interaction via HTTP POST.
- Each request/response cycle is standalone, making it ideal for cloud-hosted or REST-integrated MCP services.
- Supports concurrent requests from multiple agent orchestrators.
- Easily integrated into load-balanced environments or secured with HTTP headers, API tokens, or mTLS.
- Session handling is orchestrator-defined; typical patterns include passing agent identifiers in headers or query parameters.
Session Handling in the FastMCP SDK
The FastMCP SDK uses a session-based access model to manage streamable HTTP endpoints securely and contextually:
- Before any tool, prompt, or resource can be accessed, the client must call the /register endpoint to establish a session.
- The server responds with a token or cookie identifying the session.
- Subsequent requests must include this session token (e.g., via an Authorization or X-Session header).
- If no valid session is provided, requests to endpoints like /invoke or /stream will result in 400 Bad Request or 401 Unauthorized responses.
This mechanism allows FastMCP to isolate agents, apply per-session filtering, and potentially enforce authentication and rate limits - without relying on external reverse proxies or middleware.
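Purely as an illustration of the flow described above (the endpoint paths, header name, and payload shape are assumptions that mirror this description; check your FastMCP version for the actual API), a client might do something like this:

import requests

BASE = "https://mcp.example.org"   # hypothetical MCP server URL

# Establish a session first; the server hands back a token identifying it
resp = requests.post(f"{BASE}/register", json={"agent": "my-orchestrator"})
resp.raise_for_status()
token = resp.json()["token"]

# Subsequent requests carry the session token; without it the server answers 400/401
invoke = requests.post(
    f"{BASE}/invoke",
    headers={"X-Session": token},
    json={"tool": "search", "arguments": {"query": "mcp"}},   # hypothetical payload shape
)
print(invoke.status_code, invoke.json())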
Server-Sent Events (SSE)
- Persistent connection allowing real-time push-style communication.
- Suitable for streaming outputs, monitoring subscriptions, or feeding the model incrementally.
- Multiple clients can connect concurrently, with the server maintaining separate channels for each stream.
- Supports authentication and authorization via HTTP headers (e.g. tokens) at connection initiation.
- Can be combined with structured session registration (e.g. via initial payload or handshake) for access control and audit logging.
While stdin/stdout is fast and simple for single-process local integration, only the HTTP-based transports (streamable HTTP and SSE) support true multi-agent sharing, persistent availability, and network-based security models. These are essential for distributed architectures and plugin-based ecosystems.

Typical applications
MCP is especially well-suited for:
- Plugin-style injection: Tools, prompts, and resources can be registered and served by independent components, allowing agents to integrate third-party capabilities without modifying core logic. This enables true plugin architectures.
- Remote agents and device control: MCP servers can run on other machines, embedded systems, or gateways - allowing LLMs to interact with remote databases, information repositories, long running tasks, lab equipment, home automation, industrial machinery, or sensor networks in a modular way.
- Dynamic capability discovery: An orchestrator or LLM can first reason about the problem at hand (e.g., using semantic search or graph traversal), then query MCP servers to assemble just the right subset of tools needed for that task, reducing overload and improving contextual relevance.
- Shared multi-agent services: MCP servers can be reused across multiple orchestrators or sessions, enabling central services to be shared and scaled cleanly.
- Standardized integration for context enrichment: Prompts and resources can be retrieved from MCP servers on demand, supporting context construction workflows that evolve during a session, rather than being statically defined up front.
Actual (hopefully useful) implementations
In the following sections we take a look at some small-scale MCP server implementations.
- First we take a quick glance at a typical minimal example - a toy server exposing a single tool that you can call from your LLM orchestrator, including an example of how to wire it up for the Kilo Code plugin in Cursor or Code-OSS (VSCode).
- Then we turn towards a very useful extension for many agent pipelines - an MQTT MCP that allows the LLM to access arbitrary MQTT topics to interact with different microservices.
The toy MCP
To demystify MCPs, here is a minimal server you can paste into toy_mcp.py and execute. It exposes a single tool, now(), returning the current ISO timestamp, plus a tiny read-only resource.
from datetime import datetime, timezone
from fastmcp import FastMCP, Context

mcp = FastMCP("toy-mcp")

@mcp.tool(annotations={
    "title": "Return the current time (UTC).",
    "readOnlyHint": True,
    "destructiveHint": False,
})
def now(ctx: Context = None) -> str:
    # Return the current UTC time as an ISO 8601 string
    return datetime.now(timezone.utc).isoformat()

@mcp.resource("toy://hello")
def hello() -> str:
    # A tiny read-only resource the orchestrator can list and fetch
    return "Hello from Toy MCP! Try the `now` tool."

if __name__ == "__main__":
    mcp.run()  # stdio transport; launchable by your orchestrator
You can utilize this in your orchestrator by configuring the quasi-standardized mcp.json configuration file - one has to look up where this file is located for your specific orchestrator. For Kilo Code, for example, you can either store the settings in the global mcp.json or relative to your current project folder at .kilocode/mcp.json.
{
  "mcpServers": {
    "toy": {
      "command": "python3",
      "args": ["/home/exampleuser/toy_mcp.py"],
      "alwaysAllow": [
        "now"
      ]
    }
  }
}

Interacting with MQTT via an MCP Server
Now let's do something useful - let's make MQTT a first-class citizen for LLM agent orchestrators. MQTT is the lingua franca of devices, labs, and home automation. Exposing it through MCP lets an agent safely discover topics, subscribe to live data, and publish commands - all within the same agent workflow (either interactively through a chat session or via a background agent). A few concrete things this unlocks:
- Sense the real world. Subscribe to temperature, vibration, or power meters; read machine status from CNC machines or 3D printers; collect GPS data from trackers; watch door, window or motion sensors.
- Trigger actions on hardware. Publish start/stop, mode changes or setpoints to robots, pumps, lights, HVAC, irrigation, shutters - anything that already speaks MQTT (directly or via a bridge).
- 3D printer and CNC control. Start prints; send "pause", "resume" or "set temperature" commands; request the current layer/state; route alerts into chat; kick off a maintenance macro; react to finished prints.
- Home automation. Toggle scenes, dim lights, arm/disarm alarms, open gates - while keeping commands constrained to allowed topics.
- RPC workflows. Ask a device or service for status, health or metrics on a response topic; trigger one-off jobs (like executing calibrations, taking snapshots, homing devices, etc.) and wait for the reply.
- Interact with different services and microservices. Start CI jobs, deploy canary services, fan out data processing tasks or notify downstream microservices - MQTT as a light control bus for your microservices zoo.
- Update information on dashboards. Subscribe once and forward readings into a database, generate alerts, or feed a live dashboard - handy for experiments and long-running tests.
- Human-in-the-loop safety. Route dangerous commands to an approval queue; separate dry-run from apply topics; log every action and response for audit.
- Edge-cloud bridge. You can utilize this as a bridge between edge devices and the cloud.
In short: the MQTT MCP gives your agent eyes (subscribe), hands (publish) and a voice for structured conversations with devices and services (request/response) - without leaving the chat or agent workflow or having to program in the traditional way.
The implementation
The implementation of this MCP is a little bit more complex; it can be found on GitHub and can also be installed from PyPI.
The stdio-protocol-based MCP server is configured via a single configuration file at ~/.config/mcpmqtt/config.json or at a configurable location specified via the --config parameter. An example configuration file looks like this:
{
  "mqtt": {
    "host": "localhost",
    "port": 1883,
    "username": null,
    "password": null,
    "keepalive": 60
  },
  "topics": [
    {
      "pattern": "sensors/+/temperature",
      "permissions": ["read"],
      "description": "Temperature sensor data from any location (+ matches single level like 'room1', 'room2'. Known rooms are 'exampleroom1' and 'exampleroom2'). Use subscribe, not read on this topic. Never publish."
    },
    {
      "pattern": "sensors/+/humidity",
      "permissions": ["read"],
      "description": "Humidity sensor data from any location. (+ matches single level like 'room1', 'room2'. Known rooms are 'exampleroom1' and 'exampleroom2'). Use subscribe, not read on this topic. Never publish. Data returned as %RH"
    },
    {
      "pattern": "actuators/#",
      "permissions": ["write"],
      "description": "All actuator control topics (# matches multiple levels like 'lights/room1'. To enable a light you write any payload to 'lights/room1/on', to disable you write to 'lights/room1/off')"
    },
    {
      "pattern": "status/system",
      "permissions": ["read"],
      "description": "System status information - exact topic match"
    },
    {
      "pattern": "commands/+/request",
      "permissions": ["write"],
      "description": "Command request topics for request/response patterns"
    },
    {
      "pattern": "commands/+/response",
      "permissions": ["read"],
      "description": "Command response topics for request/response patterns"
    }
  ],
  "logging": {
    "level": "INFO",
    "logfile": null
  }
}
The sections in the main JSON object are:
- mqtt: contains the broker configuration
- topics: provides the pattern, the permissions on the given topic and a description that is used by the LLM to select which topic to use
- logging: provides logging configuration (in a crude way) for debugging
The topic permissions allow one to filter which topics can be subscribed or published to by the agent orchestrator - this works in addition to the message broker's own access configuration.
The tools provided are:
- mqtt_publish simply publishes an event to the message broker (including payload)
- mqtt_subscribe subscribes to a topic and collects a specified maximum number of messages (or returns when the timeout is reached)
- mqtt_read subscribes to a specific topic, waits for a single message and then removes the subscription again
- mqtt_query transmits a request and waits for a single reply (on two different topics - do not forget to use correlation IDs to identify which response belongs to which request; this is not done by the MCP server since it knows nothing about the service)
In addition the MCP server exposes two resources. The mcpmqtt://topics/allowed resource provides a list of all usable topics and their permissions; mcpmqtt://topics/examples additionally provides examples to the agent orchestrators.
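As a usage illustration (the argument names are assumptions derived from the tool descriptions above; check the server's tool schemas for the exact fields), an orchestrator-side request/response call carrying a correlation ID might look like this:

import json, uuid

async def query_device_status(session, device="exampledevice"):
    # session: an initialized MCP client session connected to the MQTT MCP server
    correlation_id = str(uuid.uuid4())   # lets us match the reply to our request
    payload = json.dumps({"action": "status", "correlation_id": correlation_id})
    result = await session.call_tool("mqtt_query", {
        "request_topic": f"commands/{device}/request",    # write-permitted pattern from the config
        "response_topic": f"commands/{device}/response",  # read-permitted pattern from the config
        "payload": payload,
        "timeout": 10,
    })
    return result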
Keep in mind that the MCP grants whatever the broker permits and whatever is whitelisted in its configuration - lock it down properly at the broker and in the MCP config, or you will encounter interesting situations.
Why This Design Works Well for Agents
The design makes the MQTT space self-describing to an agent. Instead of handing the model a static set of methods, the server exposes resources that enumerate what's actually allowed at runtime. mcpmqtt://topics/allowed returns concrete patterns, permissions, and broker hints (host/port; whether auth is required), and you can also serve example expansions so the agent sees how + and # wildcards materialize into real topics. Under the hood this is populated straight from the live config via a small global that the resource reads, so discoverability stays in sync with whatever you've deployed.
Safety comes from checking the rules before touching the wire. Every tool - publish, subscribe, one-shot read and request/response - verifies the requested topic against the permission set (read and/or write) and only proceeds if it matches one of the configured patterns. Because the same wildcard semantics used by MQTT are enforced in the MCP layer, an agent can't accidentally publish to or subscribe to a disallowed branch even if it guesses a valid-looking string or wants to exploit your architecture in a way that you do not desire; the tool simply refuses and returns a clear error. This pushes guardrails to the controllable edge, where they belong, and keeps the agent's often nondeterministic behavior constrained to the intended slice of the broker. This prevents bad surprises from hallucinations or a runaway context.
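A minimal sketch of such a check - not the project's actual validate_topic_permission implementation, just an illustration using paho-mqtt's wildcard matcher - could look like this:

from paho.mqtt.client import topic_matches_sub

topics_config = [
    {"pattern": "sensors/+/temperature", "permissions": ["read"]},
    {"pattern": "actuators/#", "permissions": ["write"]},
]

def is_allowed(topic: str, needed: str, configured_topics: list) -> bool:
    # needed is "read" or "write"; configured_topics mirrors the "topics" section of the config
    for entry in configured_topics:
        if needed in entry["permissions"] and topic_matches_sub(entry["pattern"], topic):
            return True
    return False

assert is_allowed("actuators/lights/room1/on", "write", topics_config)
assert not is_allowed("sensors/room1/temperature", "write", topics_config)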
The async story is robust. Paho delivers messages from its own thread; the client manager records "who's waiting for what" in a dictionary of asyncio Futures, and when a message arrives it completes the right future on the correct event loop using loop.call_soon_threadsafe(...) - the thread-safe handoff that avoids racy cross-thread mutations. Waiting is bounded (asyncio.wait_for) with cleanup that removes stale futures, and the RPC helper arms the response listener before publishing the request to prevent missed replies. The MCP server also wraps connection setup/teardown in a lifespan context so tools only run with a live broker and shut down cleanly. The result is an agent interface that's non-blocking, race-aware, and predictable under timeouts.
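Reduced to its essentials, the pattern looks roughly like this (a sketch, not the project's actual code): a Future is created on the asyncio side and completed from paho's network thread via call_soon_threadsafe.

import asyncio

class PendingResponses:
    def __init__(self, loop: asyncio.AbstractEventLoop):
        self.loop = loop
        self.pending = {}   # topic -> Future waiting for the next message on that topic

    async def wait_for_message(self, topic: str, timeout: float = 10.0) -> str:
        future = self.loop.create_future()
        self.pending[topic] = future
        try:
            return await asyncio.wait_for(future, timeout)   # bounded wait
        finally:
            self.pending.pop(topic, None)                    # clean up stale futures

    def on_message(self, client, userdata, msg):
        # Called from paho's network thread: hand the payload over to the event loop thread-safely
        future = self.pending.get(msg.topic)
        if future is not None and not future.done():
            self.loop.call_soon_threadsafe(future.set_result, msg.payload.decode())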
A mini code tour
mcp_server.py
The server is created again via the FastMCP constructor. In contrast to the simple example above, a lifespan context manager is also supplied that loads the configuration, constructs the MQTT client, yields a context, and cleans up on shutdown. Topic permissions are validated before any network action with validate_topic_permission(...) against your configuration's patterns. A global _current_config lets other parts of the application read the active configuration to render discoverable topic info.
The tools are again exposed via the @mcp.tool annotation - but in contrast to our simple example the functions carry additional annotations: a title for each tool as well as the following hints:
- readOnlyHint signals that the method is a simple getter that has no side effects.
- destructiveHint signals that a method has side effects that cannot be undone.
- idempotentHint tells the orchestrator that the method is idempotent: multiple calls do not accumulate and yield the same result as a single call.
- openWorldHint tells the orchestrator that it is interacting with the open world and not a closed environment.
Resources are annotated with @mcp.resource and get an mcpmqtt:// URL-style prefix.
mqtt_client.py
This contains the MQTT client manager. It builds on paho-mqtt with a thin async layer, manages connection state, and performs all actions exposed to the orchestrator. A pending_responses dictionary maps to the asyncio Futures that are triggered whenever an operation finishes in the paho-mqtt thread. This allows signalling finished operations into the Futures' asyncio loop (a different thread) in a thread-safe way. The wait_for_message method utilizes this mechanism by awaiting the Future that represents its request.
Conclusion
MCP provides a standardized, modular way to expose tools, resources and prompts to orchestrators - dynamically, at runtime, and without baking assumptions into any single agent framework. By separating capability description from transport and execution, it gives you discoverability, permission boundaries, and composability out of the box. Writing an MCP server is intentionally simple: the toy MCP shows that a few clear tool definitions and a tiny event loop are enough to get a real, inspectable capability surface that any MCP-aware orchestrator can use.
Building on that, the shown example MQTT MCP provides a clean interface for controlled interaction with the real world - IoT devices, home automation, lab gear and robots. Topics are discoverable yet fenced by allowlists; publish/subscribe and request/reply are guarded by timeouts and correlation; async hand-off keeps the client robust under load. The result is a portable, permissioned "hands-and-eyes" layer that scales from a breadboard sensor to a building automation system while remaining easy to reason about and safe to operate.
