AI Engineering

Agent control planes are not meta-harnesses

Agent control planes govern identity, policy, visibility, and sessions. An operational meta-harness governs the work path from intent to evidence.

By Petru Arakiss

Jul 8, 2026 (updated)

12 min read

The phrase "agent control plane" is already being stretched.

GitHub ships an agent control plane for enterprise AI controls, session activity, audit logs, custom agents, and MCP allowlists. Microsoft tells organizations to establish a single control plane for AI agents so they can identify agents, assign ownership, limit access, observe activity, and stop bad actions. LiteLLM describes the next stack as models, harnesses, runtimes, and an agent control plane. Security vendors use the same phrase because coding agents now read repos, select dependencies, call MCP tools, write code, and open PRs.

The term is useful. It also hides a boundary that matters.

An agent control plane governs agents as resources.

An operational meta-harness governs agent work as a process.

Those are not the same job.

Scope note: this is a category boundary, not a claim that every vendor is using the words wrong. A mature product can combine control-plane and meta-harness responsibilities. The risk is treating one responsibility as proof of the other.

What the control plane actually controls

The control plane term comes from infrastructure, where the control plane decides and the data plane does the work. In agent systems, the data plane is the model call, tool call, file edit, browser action, package install, shell command, PR creation, deployment operation, or API call. The control plane is the administrative and policy layer that decides which agents exist, who owns them, which identities they use, which tools they can reach, which sessions are running, what cost they create, what audit trail they leave, and which policies apply.

That layer is necessary. Without it, a company running agents at scale is mostly hoping.

A useful agent control plane should answer basic questions:

Which agents exist?
Who owns each agent?
Which user is the agent acting on behalf of?
Which MCP servers and tools are allowed?
Which policies were active during a session?
What did the agent invoke?
Which session started, finished, failed, or escalated?
Which activity is visible to security, platform, and compliance reviewers?
Which permissions can be revoked centrally?

This is real work. GitHub's public changelog for enterprise AI controls talks about audit entries that mark an actor as an agent, user attribution for who the agent acts on behalf of, agent session task events, session activity views, custom agent definitions, and MCP allowlists. Microsoft's guidance says a centralized agent control plane should provide identity, policy enforcement, inventory, ownership, behavioral visibility, and cross-platform governance. LiteLLM frames the control plane as the layer above many agent runtimes: one place to manage runtimes, schedules, memory, and sessions.
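To make "agents as resources" concrete, here is a minimal sketch of the records such a layer keeps. Every name in it (AgentRecord, SessionEvent, ControlPlane, the field names) is hypothetical and chosen for illustration; it is not the data model of GitHub, Microsoft, or LiteLLM.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    """One inventory row: an agent treated as a managed resource (hypothetical shape)."""
    agent_id: str
    owner: str
    allowed_mcp_servers: set[str] = field(default_factory=set)
    revoked: bool = False


@dataclass
class SessionEvent:
    """One audit row: which agent did what, and on whose behalf."""
    agent_id: str
    on_behalf_of: str
    action: str


class ControlPlane:
    """Answers resource questions: who exists, who owns it, what it may reach."""

    def __init__(self) -> None:
        self.agents: dict[str, AgentRecord] = {}
        self.audit_log: list[SessionEvent] = []

    def register(self, agent: AgentRecord) -> None:
        self.agents[agent.agent_id] = agent

    def revoke(self, agent_id: str) -> None:
        # Central revocation: one switch disables the agent everywhere.
        self.agents[agent_id].revoked = True

    def may_use_server(self, agent_id: str, server: str) -> bool:
        agent = self.agents.get(agent_id)
        if agent is None or agent.revoked:
            return False
        return server in agent.allowed_mcp_servers

    def log_event(self, event: SessionEvent) -> None:
        self.audit_log.append(event)
```

Nothing in this sketch knows what task the agent was given or whether the result was right. That absence is the point of the next section.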

That is the administrative layer.

But administration is not execution quality.

What the control plane does not prove

A control plane can tell you that an agent ran. It can tell you which identity it used, which tool boundary applied, which MCP registry was allowed, which session emitted events, and which audit row was recorded.

It does not automatically prove the work was correct.

It does not prove the agent got the right repository context.

It does not prove the task was scoped well enough to delegate.

It does not prove the chosen harness was the right one.

It does not prove the shell command ran in a safe worktree.

It does not prove the browser evidence matched the user's flow.

It does not prove a migration was reversible.

It does not prove a security boundary stayed intact.

It does not prove the agent's final answer matches the diff.

It does not prove the workflow should be kept after the next model release.

Those are meta-harness questions.

The distinction is not academic. It is the difference between "we can see agent sessions" and "we can trust this agent-mediated change enough to ship it."

Harness, control plane, meta-harness

A harness is the operating envelope around one agent or agent loop. It defines tool access, context, permissions, memory, command execution, file editing, state, and validation for that agent. Codex has a harness. Claude Code has a harness. Cursor has one. OpenHands has one. MCP servers expose parts of a tool surface into one.

A control plane sits above agents as managed resources. It centralizes identity, policy, inventory, session visibility, runtime configuration, audit, and sometimes cost.

A meta-harness sits above harnesses as execution engines. It governs the path from human intent to verified outcome.

That path includes:

task intake
risk classification
context compilation
harness selection
worktree and sandbox setup
policy decision
approval flow
execution trace
test and browser evidence
diff review
acceptance decision
rollback path
memory update
workflow evolution

An agent control plane can implement several of those components. It often should. But when people collapse the meta-harness into the control plane, they make the system look safer than it is.

Session visibility is not acceptance.

Tool allowlisting is not correctness.

Audit logging is not rollback.

Cost tracking is not governance.

Centralized policy is not a task contract.
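One way to keep the work path from collapsing into a session log is to model it as explicit stages. The sketch below assumes a strict ordering of the stages listed above, which is a simplification (real workflows loop and branch), and the names WORK_PATH and WorkRun are hypothetical.

```python
from __future__ import annotations

# Stage names mirror the work path listed above; strict ordering is an assumption.
WORK_PATH = [
    "task_intake",
    "risk_classification",
    "context_compilation",
    "harness_selection",
    "worktree_and_sandbox_setup",
    "policy_decision",
    "approval_flow",
    "execution_trace",
    "test_and_browser_evidence",
    "diff_review",
    "acceptance_decision",
    "rollback_path",
    "memory_update",
    "workflow_evolution",
]


class WorkRun:
    """Tracks which stages of the work path a run has actually completed."""

    def __init__(self) -> None:
        self.completed: list[str] = []

    def advance(self, stage: str) -> None:
        if len(self.completed) >= len(WORK_PATH):
            raise ValueError("work path already complete")
        expected = WORK_PATH[len(self.completed)]
        if stage != expected:
            raise ValueError(f"cannot run {stage!r} before {expected!r}")
        self.completed.append(stage)

    def missing_before(self, stage: str) -> list[str]:
        """Stages that should have happened before `stage` but did not."""
        index = WORK_PATH.index(stage)
        return [s for s in WORK_PATH[:index] if s not in self.completed]
```

A run that only has session events would report nearly every stage as missing before "acceptance_decision", however clean the session looks.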

Where Omnigent fits

Databricks made "meta-harness" visible with Omnigent, a system for combining, controlling, and sharing agents across tools such as Claude Code, Codex, Pi, and custom agents. That is important because it names a layer above individual agent harnesses instead of pretending one agent runtime will own the whole future.

The useful read is this: Omnigent is productizing part of the layer above agent harnesses. It gives teams a shared way to use, control, collaborate around, and potentially run different agents.

That does not mean every meta-harness is Omnigent, or that every control plane becomes a meta-harness because it has agent sessions. The category boundary is sharper:

If the system mainly centralizes identity, sessions, policies, audit, runtime registration, and MCP access, it is acting as a control plane.
If the system governs the full work path from task to verified evidence and recovery, it is acting as an operational meta-harness.

A mature product may do both. The architecture still has two responsibilities.

The control plane manages the governed resources. The meta-harness governs the work those resources perform.

The task contract is the missing object

Most agent governance products talk about users, agents, sessions, tools, policies, and logs.

They talk less about the task contract.

That is a problem because a coding agent does not fail only by violating policy. It fails by misunderstanding the job.

A task contract should define:

the target repo and branch
the allowed file area
the forbidden file area
the intended behavior change
the non-goals
the validation commands
the expected browser flow, if UI is involved
the evidence required before completion
the escalation triggers
the rollback condition

Without that contract, the control plane can show you a clean session that did the wrong work.
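A task contract can be a small, immutable object that exists before the agent starts. The sketch below is one possible shape, not a standard: the class name, field names, and glob-based scope check are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class TaskContract:
    """Hypothetical task contract; fields follow the list above."""
    repo: str
    branch: str
    allowed_paths: tuple[str, ...]
    forbidden_paths: tuple[str, ...]
    intended_change: str
    non_goals: tuple[str, ...] = ()
    validation_commands: tuple[str, ...] = ()
    browser_flow: str | None = None
    required_evidence: tuple[str, ...] = ()
    escalation_triggers: tuple[str, ...] = ()
    rollback_condition: str = ""

    def out_of_scope(self, changed_files: list[str]) -> list[str]:
        """Return changed files that are forbidden or outside the allowed area.

        Patterns use fnmatch globs, where `*` also matches `/`.
        """
        violations = []
        for path in changed_files:
            forbidden = any(fnmatch(path, p) for p in self.forbidden_paths)
            allowed = any(fnmatch(path, p) for p in self.allowed_paths)
            if forbidden or not allowed:
                violations.append(path)
        return violations


# Usage with made-up repo and paths:
contract = TaskContract(
    repo="example/shop",
    branch="fix/tax-rounding",
    allowed_paths=("src/billing/*", "tests/billing/*"),
    forbidden_paths=("src/billing/migrations/*",),
    intended_change="Round tax per line item instead of per order",
)
```

The frozen dataclass is a deliberate choice: the contract is fixed at intake, so a session that drifts can be compared against something the agent did not write.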

Useful coding-agent systems are not only observable. They are executable, verifiable, and stateful. Recent research on "code as agent harness" describes repository-level work as a managed development loop that controls repository access, edits, command execution, approval boundaries, context isolation, logging, and validation. That is closer to the operational frame. The repo is not a prompt accessory. It is the work substrate.

MCP governance is not enough

MCP makes this more urgent.

An MCP server gives an agent a tool surface. That tool surface may read email, query a database, inspect a browser, create a ticket, mutate cloud state, search a repo, or call an internal service. The governance problem is not "does this agent have an MCP server?" The problem is whether this MCP call is appropriate for this task, at this point in the workflow, under this approval state, with this evidence requirement.

An MCP allowlist is a control-plane primitive.

It tells the system which servers or tools are allowed in general.

It does not answer whether the current call should happen now.

For that, the meta-harness needs task-aware policy:

Is the call inside the task scope?
Is it read-only or mutating?
Does it expose secrets or personal data?
Is the result trusted context or untrusted input?
Does a human need to approve it?
Does the trace preserve enough evidence?
Does the output need validation before it enters memory or code?

The difference matters because prompt injection and poisoned tool output do not respect a neat diagram. A server can be allowed and still return hostile data. A tool can be safe in one task and dangerous in another. A read can become a leak. A generated dependency name can become a supply-chain decision.

The control plane says which doors exist and who has keys. The meta-harness decides whether walking through this door is part of the job.
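The split between "allowed in general" and "appropriate now" can be sketched as two checks in one decision function. This is an illustrative sketch, not an MCP API: ToolCall, TaskState, decide, and the server and tool names in the usage are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    server: str
    tool: str
    mutating: bool
    touches_sensitive_data: bool = False


@dataclass(frozen=True)
class TaskState:
    tools_in_scope: frozenset[str]  # "server/tool" names this task may use
    human_approved: frozenset[str]  # calls a human has already approved


def decide(call: ToolCall, allowlist: set[str], task: TaskState) -> Decision:
    name = f"{call.server}/{call.tool}"
    # Control-plane check: is the server allowed at all?
    if call.server not in allowlist:
        return Decision.DENY
    # Meta-harness check: is this call part of the current job?
    if name not in task.tools_in_scope:
        return Decision.DENY
    # Mutating or sensitive calls wait for a human unless already approved.
    if (call.mutating or call.touches_sensitive_data) and name not in task.human_approved:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW


# Usage: "db" is on the allowlist, but querying it is not part of this task.
allowlist = {"github", "db"}
task = TaskState(
    tools_in_scope=frozenset({"github/search_code", "github/create_pr"}),
    human_approved=frozenset(),
)
```

The sketch does not cover whether a result is trusted context or untrusted input. That check belongs on the return path and needs its own handling.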

Evidence is the operating boundary

Agent systems keep trying to turn governance into visibility. Visibility helps. It is not the same thing as control.

The operational boundary is evidence.

For software work, evidence is not the final answer. It is the diff, tests, typecheck, lint, screenshot, browser trace, network log, migration check, security scan, benchmark, policy decision, approval record, deployment status, and rollback record.

A control plane can store some of that. A meta-harness requires it.

This is the test:

Can the system reject an agent's success claim even when the session looks clean?

If yes, there is an operational meta-harness above the agent.

If no, there is only session administration and trust in prose.

Coding agents are especially dangerous here because they explain themselves well. A fluent final answer can make a weak operator stop looking. The system should do the opposite. The better the agent speaks, the more the workflow should demand evidence outside the agent's narrative.
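The rejection test above fits in a few lines. The sketch assumes each piece of evidence is tagged with whether it came from outside the agent's own narrative; the Evidence type and accept function are hypothetical names, and it keeps only one item per evidence kind.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    kind: str                # e.g. "diff", "tests", "typecheck"
    passed: bool
    produced_by_agent: bool  # True if the only source is the agent's own report


def accept(
    agent_claims_success: bool,
    required: list[str],
    evidence: list[Evidence],
) -> tuple[bool, list[str]]:
    """Accept only when every required kind has passing, independent evidence."""
    reasons: list[str] = []
    # Evidence the agent merely reported about itself does not count.
    independent = {e.kind: e for e in evidence if not e.produced_by_agent}
    for kind in required:
        item = independent.get(kind)
        if item is None:
            reasons.append(f"missing independent evidence: {kind}")
        elif not item.passed:
            reasons.append(f"evidence failed: {kind}")
    if not agent_claims_success:
        reasons.append("agent did not claim success")
    return (not reasons, reasons)


# Usage: the agent says the tests pass, but nothing outside the agent confirms it.
ok, reasons = accept(
    agent_claims_success=True,
    required=["diff", "tests"],
    evidence=[
        Evidence("diff", passed=True, produced_by_agent=False),
        Evidence("tests", passed=True, produced_by_agent=True),
    ],
)
```

Here `ok` is False with the reason "missing independent evidence: tests", even though the session and the final answer both look clean.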

The pieces I can show

I do not want this argument to live only as terminology. My public agent tooling maps to separate parts of the boundary, with the limits visible:

Gommage is a deterministic policy layer for coding-agent tool calls.
slod is an append-only trace layer for agent runs.
Nahuali is an auditable memory layer that exposes provenance and trust state.
Greco is an experiment in harness self-improvement under operator-owned evals and budgets.

None of those is a complete meta-harness. They are public fragments: what an agent may do, what it did, what it remembers, and what evidence lets a change become accepted. A control plane can host or observe some of those pieces. The overall system still has to prove the work path, not only the session.

The architecture I would trust in a repo

For a serious repo, the minimum useful stack looks like this:

Layer	Job	Failure caught
Native harness	Tools, files, shell, memory, permissions, local loop	Raw model cannot operate the repo
Sandbox	OS, container, worktree, network, credentials	Agent reaches state it should not touch
Control plane	Identity, inventory, sessions, MCP registry, policies, audit	Organization cannot see or govern agents
Task contract	Scope, non-goals, acceptance, evidence, escalation	Agent does the wrong work cleanly
Policy gateway	Task-aware action decision and approvals	Allowed tool is used in the wrong context
Evidence layer	Diff, logs, tests, browser, traces, signed decisions	Final answer replaces verification
Acceptance gate	Compare result to task contract	Work ships because it sounded done
Evolution loop	Retire stale scaffolding and preserve governance invariants	Old model workarounds become architecture

Some products will combine layers. That is fine. What matters is that the responsibilities exist and do not all collapse into a dashboard.

The native harness should be used first. Codex and Claude Code already have real controls: permissions, sandboxing, hooks, settings, skills, MCP configuration, logs, and project instructions. Use them. But do not confuse native configuration with cross-harness governance, and do not confuse cross-harness governance with proof that the work is correct.

The reference-monitor line

A control you rely on for safety cannot be editable by the thing it watches.

That old reference-monitor rule keeps becoming relevant. If an agent can edit the hook, policy file, approval script, or evaluation suite that decides whether its own work is acceptable, the control is convenience, not enforcement.

This is why the operational meta-harness has to own some things outside the watched agent:

approval records
policy fixtures
validation criteria
signed audit logs
task contracts
rollback rules
eval suites used to admit workflow changes

Native harness controls are still valuable. They catch a lot. But safety-critical governance needs a boundary the agent cannot silently rewrite during the same task.
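A minimal sketch of that boundary, assuming the gate runs outside the agent's sandbox and holds its own pinned digests of the control files. The protected path patterns and function names are hypothetical, not taken from any of the tools named above.

```python
from __future__ import annotations

import hashlib
from fnmatch import fnmatch

# Hypothetical locations of controls the agent must not edit during a task.
PROTECTED = ("policy/*", "hooks/*", "evals/*", "approvals/*")


def touches_protected(
    changed_files: list[str],
    protected: tuple[str, ...] = PROTECTED,
) -> list[str]:
    """Return changed files that fall under a protected control path."""
    return [p for p in changed_files if any(fnmatch(p, pat) for pat in protected)]


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def controls_intact(pinned: dict[str, str], current: dict[str, bytes]) -> bool:
    """True only if every pinned control file is present and byte-identical.

    `pinned` maps path -> SHA-256 hex digest recorded before the task started,
    stored somewhere the agent cannot write.
    """
    return all(
        path in current and digest(current[path]) == expected
        for path, expected in pinned.items()
    )
```

The digest check only means something if `pinned` lives outside the worktree. Pinning the hashes in a file the agent can edit recreates the problem one level up.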

The public claim worth making

The useful claim is not "control planes are wrong."

They are necessary.

The useful claim is stricter:

The agent control plane governs agents as resources. The operational meta-harness governs agent work as an accountable process.

That boundary explains why the terms keep colliding. Centralized governance is now easy to name. The work path itself is less often named: intake, context, harness selection, execution environment, policy, evidence, acceptance, rollback, and evolution.

That is where most failures happen.

The next wave of agent tooling will sell control planes. Some will be useful. Some will be dashboards over chaos. The buying question should be blunt:

Can this system prove that the right work happened, under the right constraints, with evidence I can replay, and a rollback path I can execute?

If the answer is no, it may still be a good control plane.

It is not a complete operational meta-harness.