Adverant Nexus Stack v4.0: A Unified Agentic Orchestration Architecture for Sovereign, Multi-Tenant, Marketplace-Scale AI

A systems architecture paper proposing Adverant Nexus Stack v4.0: fifteen primitives that close seven unsolved industry gaps in the 2026 agentic AI framework landscape. Builds on the UNO Pipeline Redesign by adding a concrete Tier 4 autonomous state machine, a signed-manifest skill marketplace with SBOM and runtime quality scoring, hooks as first-class orchestrator primitives, a cryptographically isolated Memory Bank, an A2A plus MCP dual protocol plane, deterministic replay plus C2PA chain-of-custody, an airgapped bundle mode for FedRAMP and DoD use, a first-class CLI, native integration of thirteen governance regimes, and a UI/UX bindings layer generalizing production skill bindings into a user-configurable button-to-workflow substrate.

Adverant Research Team · 2026-04-24 · 124 min read · 30,845 words

Authors: [Adverant Research Team] · [Adverant Limited, Dublin, Ireland]
Publish date: 2026-04-24 · Target length: ~30 000 words / 15 sections / ≥90 citations
Distribution: link-only, not discoverable (same policy as the UNO Pipeline Redesign paper and the Cognitive Memory Architecture paper)

Abstract

Agentic AI platforms in 2026 look very different from those in 2024. Model Context Protocol (MCP) has become the universal tool-use substrate — 97 million monthly SDK downloads by February 2026, donated to the Linux Foundation's Agentic AI Foundation in December 2025 [1][2]. Agent2Agent (A2A) went production-grade at Google Cloud Next 2026 and ships as a first-class primitive in Microsoft Agent Framework 1.0, released 3 April 2026 [3][4]. Graph-shaped orchestration — LangGraph StateGraph, Google ADK, Microsoft Agent Framework workflow graphs, CrewAI Flows, LlamaIndex Workflows — has displaced conversation-graph and role-crew models as the dominant orchestration primitive [5][6][7]. Managed runtimes (LangSmith Deployment, Vertex Agent Runtime, Bedrock AgentCore with Strands) are the commercial wedge, and observability has bifurcated into an OpenTelemetry tier and an LLM-native analytics tier (LangSmith Insights Agent and Polly, AgentCore quality evaluations) [8][9][10].

Beneath this convergence, seven industry gaps remain unsolved: (a) airgapped multi-agent deployment with signed plugin bundles, (b) cryptographic per-tenant isolation across shared observability, (c) plugin marketplaces with provenance and software bill-of-materials (SBOM) and runtime quality scoring, (d) cross-framework agent portability, (e) deterministic replay and chain-of-custody for long-horizon agents, (f) agent-level FinOps governance, and (g) a portable, user-configurable UI-to-workflow binding layer. No framework in the 2026 survey ships turnkey answers to all seven.

This paper presents Adverant Nexus Stack v4.0, a unified agentic orchestration architecture designed to close these gaps while preserving the dispatch-execution separation discipline established in our prior work, the Unified Nexus Orchestrator (UNO) paper [42]. Building on 44 production microservices, 729 deployed skills, four execution tiers, four AI provider adapters, and row-level-security-plus-Istio tenant isolation (all running as of 24 April 2026), v4.0 adds: (1) a concrete Tier 4 autonomous state machine with human-in-the-loop waitpoints, cost caps, and deterministic replay; (2) a skill marketplace with signed manifests, SBOMs, runtime quality scoring, and semantic versioning with auto-rollback; (3) hooks as first-class orchestrator primitives (PreDispatch, PreToolUse, PostLLMCall, OnCostThreshold, OnHITLPause, OnTierEscalation); (4) a Memory Bank with per-tenant key-encryption-key (KEK) envelope crypto backed by HSM in cloud profiles and TPM in airgapped profiles; (5) a dual A2A-plus-MCP protocol plane; (6) Pydantic-style structured output with self-correction on every skill contract; (7) DSPy-style optimizer-compiled prompts; (8) observability extensions (Insights Agent, Polly-style natural-language debugging) on top of our existing twelve-type span tree; (9) pre-dispatch FinOps governance with per-org, per-skill, per-run budgets and circuit-breakers; (10) a sealed airgapped bundle mode for FedRAMP, DoD, CJIS, and IRS Publication 1075 use cases; (11) a first-class CLI (Adverant Nexus CLI 2.0) that is a dispatch, streaming, and Progress Command Center (PCC) mirror client; (12) native integration of EU AI Act, GDPR, SOC 2, ISO 27001, ISO 42001, HIPAA, FedRAMP, NIST AI Risk Management Framework, PIPL, DPDP, LGPD, OWASP LLM Top 10 (2025), MITRE ATLAS, C2PA, and export controls as first-class primitives rather than bolted-on middleware; and (13) a UI/UX Bindings layer generalizing our production ros.skill_bindings table into a substrate where users configure what any button in any plugin or marketplace application does (selected skill, tier, provider, model, cost cap, risk tier, data residency, inputs mapping, and hook set) without shipping code.

We present five comparison tables across twelve surveyed frameworks; a current-state map of our v3 stack grounded in file paths and line numbers from the production monorepo; a gap analysis separating industry gaps from v3-internal gaps; a detailed v4.0 proposal in fifteen sub-sections; reference architecture diagrams; fifty use cases; an eighteen-phase migration path extending UNO's nine-phase strangler-fig plan; four deployment profiles; an evaluation methodology; and ten appendices including the full binding schema, compliance-control traceability matrix, and auditor-export payload schema. The paper is internally validated at three gates by Gemini 2.5 Pro [11] (post-outline, post-proposal-core, pre-publication); all three transcripts are archived and published alongside the paper.

1. Introduction

The question motivating this paper is not whether Adverant Nexus needs a next major version. That is answered by three simultaneous forcing functions. The question is what the next version should contain, given a market that consolidated faster than anyone predicted, a regulatory climate that now demands baked-in compliance rather than post-hoc audit, and a production system that has revealed, through fourteen months of operation, which of our assumptions survived contact with real workloads and which did not.

1.1 Three Forcing Functions

Forcing function one: the 2026 market consolidation. Between December 2025 and April 2026, the agentic AI framework landscape underwent a consolidation shock. Anthropic donated Model Context Protocol to the Linux Foundation in December 2025, with Google, Microsoft, Amazon Web Services, Cloudflare, and Bloomberg joining as founding supporters of the new Agentic AI Foundation [1][2]. Google shipped Agent2Agent as production-grade at Cloud Next 2026, with 150-plus organizations in production on A2A workflows [3]. Microsoft merged AutoGen and Semantic Kernel into the unified Microsoft Agent Framework, version 1.0, on 3 April 2026 [4][13]. OpenAI retired Swarm and replaced it with the OpenAI Agents SDK, adding sandboxing and a new model harness [14][15]. Google launched the Gemini Enterprise Agent Platform on 22 April 2026 — two days before this paper is dated — combining the Agent Development Kit (ADK), Agent Studio low-code authoring, Agent Gateway networking, and Memory Bank into a single managed product [16][17][18]. Amazon Web Services expanded Bedrock AgentCore with managed quality evaluations and policy controls [10][19]. The frameworks have converged on three structural axioms: tool use is MCP, agent-to-agent communication is A2A, and orchestration is a directed graph with explicit state, checkpointing, and human-in-the-loop waitpoints.

Forcing function two: seven unsolved industry gaps. Across a systematic survey of twelve frameworks — CrewAI, LangChain and LangGraph, Pydantic AI, Gemini Enterprise Agent Platform, Microsoft Agent Framework, OpenAI Agents SDK, Anthropic Claude Agent SDK, Semantic Kernel (maintenance), LlamaIndex Workflows, DSPy, Haystack (deepset), and Bedrock AgentCore — no framework ships turnkey solutions to: airgapped multi-agent deployment with signed plugin bundles; cryptographic per-tenant isolation across shared observability; plugin marketplaces with provenance, SBOM, and runtime quality scoring; deterministic replay with exactly-once semantics for long-horizon agents; agent-level FinOps; chain-of-custody for AI-generated artefacts; and cross-framework agent portability beyond the A2A wire protocol. These are defensible commercial wedges for a platform that treats them as first-class primitives rather than user-implemented patterns.

Forcing function three: stale points inside the current stack. The UNO paper [42] documents a nine-phase strangler-fig migration through April 2026. Phases 1–6 and 9 are complete; Phases 7 (multi-provider AI routing) and 8 (tool executors) are partial; Phases 10 and beyond were undefined when the UNO paper went to press. Specific stale points: the UNO paper describes graphrag.skill_registry as the router, but the April 2026 Skills Engine consolidation (documented in our internal memory at skills_engine_consolidation_20260416) dropped that table and migrated 92 skills into the unified engine under ros.tool_registry. The UNO paper's Section 14 flags that nexus-mageagent retains a governance bypass path while Section 12.3 describes it as "fully migrated" — an internal contradiction that must be resolved. Tier 4 is defined in the ExecutionTier type in services/nexus-orchestrator/src/types/execution.ts but no concrete human-in-the-loop waitpoint code exists. Chain DAG state lives in Redis with a 24-hour time-to-live, which is inadequate for long-horizon chains. Skill versioning is declared in the SKILL.md front matter but unenforced at dispatch. The current CLI is shallow relative to dispatch. These are fixable in v4.0 without re-opening the dispatch-execution boundary that UNO successfully closed.
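To make the missing waitpoint concrete, the following is a minimal sketch of how a Tier 4 run with a human-in-the-loop pause and a cost cap might be modelled as a pure state-transition function. Every name here (RunState, RunEvent, Tier4Run, transition) is hypothetical and chosen for illustration. None of it is taken from the ExecutionTier type in services/nexus-orchestrator/src/types/execution.ts or from the v4.0 design itself.

```typescript
// Hypothetical run states for a Tier 4 autonomous run (illustrative names only).
type RunState = "running" | "awaiting_approval" | "completed" | "aborted";

// Hypothetical events that drive the run forward.
type RunEvent =
  | { kind: "step_done"; costUsd: number }
  | { kind: "needs_approval" }
  | { kind: "approved" }
  | { kind: "rejected" }
  | { kind: "finished" };

interface Tier4Run {
  state: RunState;
  spentUsd: number;
  costCapUsd: number;
}

// Pure transition function: returns the next run snapshot for a given event.
function transition(run: Tier4Run, event: RunEvent): Tier4Run {
  switch (run.state) {
    case "running":
      if (event.kind === "step_done") {
        const spentUsd = run.spentUsd + event.costUsd;
        // The cost cap acts as a circuit-breaker: exceeding it aborts the run.
        return { ...run, spentUsd, state: spentUsd > run.costCapUsd ? "aborted" : "running" };
      }
      if (event.kind === "needs_approval") return { ...run, state: "awaiting_approval" };
      if (event.kind === "finished") return { ...run, state: "completed" };
      return run;
    case "awaiting_approval":
      // While paused at a waitpoint, only a human decision moves the run.
      if (event.kind === "approved") return { ...run, state: "running" };
      if (event.kind === "rejected") return { ...run, state: "aborted" };
      return run;
    default:
      // "completed" and "aborted" are terminal.
      return run;
  }
}
```

Because the transition is pure, a run can be reconstructed by folding the same event sequence over the initial snapshot, which is one simple route toward replayable state.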

1.2 The v4.0 Thesis

Nexus Stack v4.0 is built on one thesis: every dimension along which modern agentic systems now compete — runtime governance, memory, observability, supply chain, deployment sovereignty, cost control, deterministic audit, and portable configuration — should be a primitive of the platform, not a capability an engineer adds per-skill. When governance is a primitive, skills get it by default. When replay is a primitive, every artefact carries chain-of-custody. When bindings are a primitive, business users reconfigure behaviour without shipping code. When airgapped deployment is a primitive, the same codebase serves FedRAMP High customers and public-cloud customers. The operating question for every v4.0 feature is: can this be a first-class primitive of the platform, rather than an exception that each plugin re-implements?

1.3 Contributions

This paper makes the following contributions:

A systematic survey of twelve 2026 agentic frameworks across five comparison dimensions (core abstraction plus orchestration, tool use plus memory plus observability, deployment plus multi-tenancy plus licensing, developer experience, documented weaknesses), yielding a comparison matrix suitable for architectural decision-making in Q2 2026 and beyond.
A gap analysis separating seven industry gaps (unsolved by any surveyed framework) from ten v3-internal gaps (identified through honest retrospective on the UNO migration), prioritised by commercial leverage times implementation cost.
The Adverant Nexus Stack v4.0 architecture, specified in fifteen sub-sections covering principles, execution tiers (including a concrete Tier 4 state machine with human-in-the-loop waitpoints and cost caps), a signed-manifest plus SBOM plus quality-score skill marketplace, hooks as first-class primitives, a cryptographically isolated Memory Bank, an A2A-plus-MCP dual plane, structured output with self-correction, DSPy-style optimizer-compiled prompts, observability extensions, FinOps governance, deterministic replay and chain-of-custody, airgapped bundle mode, a first-class CLI, native governance and compliance integration, and a UI/UX bindings layer.
Native integration of thirteen governance and compliance regimes — EU AI Act, GDPR and UK GDPR plus EU Data Act, SOC 2 Type II, ISO/IEC 27001, ISO/IEC 42001 (AI management systems), HIPAA and HITRUST, FedRAMP Moderate and High plus DoD Impact Level 4 and 5 plus CJIS plus IRS Publication 1075, NIST AI Risk Management Framework 1.0 and NIST AI 600-1, regional privacy laws (PIPL, DPDP, LGPD, PIPEDA, Australian Privacy Act), OWASP LLM Top 10 (2025), MITRE ATLAS, C2PA content provenance, and export controls (EAR, ITAR, EU Regulation 2021/821) — with specific enforcement points at dispatch, execution, storage, and observability layers, and a traceability matrix (Appendix G) mapping each control identifier to the v4.0 primitive that satisfies it.
Adverant Nexus Bindings v2, a first-class substrate generalizing our production ros.skill_bindings table (schema in Appendix J) into a user-configurable UI-to-workflow binding layer with a rich metadata set (skill identifier, tier, provider, model, cost cap, daily cap, token cap, timeout, max iterations, risk tier, data residency, export tags, hook set, allowed and denied tool lists, input schema, inputs mapping, output target, UI placement, shortcut, confirmation level, A/B experiment reference, quality-score threshold) resolved at runtime by a four-level scope hierarchy (user over project over org over system) with priority-based selection, validated by an OPA-based override policy to prevent user overrides from weakening organizational governance.
Reference architecture in fifty diagrams covering v3 current state (service topology, dispatch pipeline, execution tiers, AI provider router, WebSocket event flow, Persistent Chat Context, multi-tenant isolation, CLI, plugin template), v4.0 architecture (topology, dispatch flow, four tiers, marketplace, hooks, memory bank, A2A plus MCP, structured output, optimizer, observability, FinOps, replay, airgap, CLI 2.0, bindings resolution, bindings metadata), user journeys (end-user dispatch, developer publish, admin configuration, auditor export, airgapped install, human-in-the-loop approval, CLI dispatch, binding editor), UI/UX mocks (dashboard, PCC panel, governance tab, marketplace, chain visualizer, span explorer, FinOps dashboard, CLI REPL, binding editor), compliance and security diagrams (EU AI Act enforcement, GDPR erasure, OWASP LLM defense, envelope encryption, three-gate enforcement, OPA evaluation, C2PA provenance, binding override policy), and four deployment profiles (public cloud, single-tenant VDS, on-premise, airgapped).
Fifty use cases specifying trigger, tier, hooks, compliance, and outcome fields across all v4.0 capabilities, including seven bindings-specific cases demonstrating declarative defaults, context-menu bindings, A/B traffic splits, version pins, user-scope overrides, quality auto-deactivation, and marketplace install seeding.
An eighteen-phase migration path extending the UNO nine-phase strangler-fig plan with Phases 10 through 27 for v4.0.
Four deployment profiles — public cloud multi-tenant, single-tenant virtual dedicated server, on-premise Kubernetes, and sealed airgapped bundle — served by the same codebase with manifest deltas documented per profile.
An evaluation methodology across six axes — token efficiency, latency, multi-agent cost, provable tenant-isolation boundaries, replay fidelity, airgapped feature parity — with benchmark designs but without benchmark execution, which is deferred to follow-up work.
Three Gemini 2.5 Pro validation transcripts archived with the paper (Gate A post-outline, Gate B post-Section 7, Gate C pre-publication peer-review simulation), providing an independent adversarial-reviewer voice alongside authorial claims.
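The four-level scope resolution described in the Bindings v2 contribution above (user over project over org over system, with priority-based selection) can be illustrated with a small sketch. The Binding shape below is a heavily reduced, hypothetical stand-in for the schema in Appendix J. The rule that a higher priority number wins within a scope is an assumption. The respectsOrgCap function is only a toy substitute for the OPA-based override policy, checking a single rule (a narrower scope may not raise the org cost cap).

```typescript
type Scope = "user" | "project" | "org" | "system";

// Hypothetical, reduced binding row; field names are illustrative.
interface Binding {
  buttonId: string;
  scope: Scope;
  priority: number; // assumption: higher number wins within the same scope
  skillId: string;
  costCapUsd: number;
}

// Narrowest scope first: user over project over org over system.
const SCOPE_ORDER: Scope[] = ["user", "project", "org", "system"];

// Returns the winning binding for a button, or undefined if none is configured.
function resolveBinding(bindings: Binding[], buttonId: string): Binding | undefined {
  const candidates = bindings.filter((b) => b.buttonId === buttonId);
  for (const scope of SCOPE_ORDER) {
    const inScope = candidates
      .filter((b) => b.scope === scope)
      .sort((a, b) => b.priority - a.priority);
    if (inScope.length > 0) return inScope[0];
  }
  return undefined;
}

// Toy override policy: the chosen binding must not exceed any org-level cost cap
// configured for the same button.
function respectsOrgCap(chosen: Binding, bindings: Binding[]): boolean {
  const orgCaps = bindings
    .filter((b) => b.buttonId === chosen.buttonId && b.scope === "org")
    .map((b) => b.costCapUsd);
  return orgCaps.every((cap) => chosen.costCapUsd <= cap);
}
```

In this sketch, resolution and policy validation are kept as separate steps so that a user-scope override can win resolution and still be rejected when it would weaken an organizational limit.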

1.4 Paper Organization

Section 2 surveys the 2026 framework landscape. Section 3 presents the comparison matrix. Section 4 maps Nexus v3, as currently running in production, against the UNO plan. Section 5 is the gap analysis. Section 6 outlines the v4.0 principles. Section 7 is the v4.0 proposal core in fifteen sub-sections. Section 8 presents the reference architecture diagrams. Section 9 catalogues the fifty use cases. Section 10 is the migration path. Section 11 describes the four deployment profiles. Section 12 presents evaluation methodology. Section 13 positions v4.0 against related work beyond the framework survey. Section 14 concludes. Appendices A through J contain detailed schemas and policy starter packs.

2. Background: The 2026 Agentic Framework Landscape

We surveyed twelve frameworks as of 24 April 2026. Each sub-section below is a compact profile; Section 3 renders the comparison across dimensions.

2.1 CrewAI

CrewAI pairs a role-and-goal-and-backstory agent DSL (Crews) with an event-driven graph orchestration layer (Flows), giving it the most human-readable agent-definition syntax among surveyed frameworks [20]. Crews' strengths — approachability, hundreds of built-in tools, native MCP — coexist with documented weaknesses: coordination-via-natural-language wastes tokens, there is no built-in checkpointing, and observability lags LangSmith [21][22]. CrewAI AMP is the commercial SaaS layer with organizational scoping and RBAC.

2.2 LangChain and LangGraph

LangGraph is the StateGraph-based orchestration engine that pioneered graph-shaped agent orchestration in 2024 [23]. LangSmith, the observability and evaluation platform, has grown to process more than fifteen billion traces and one hundred trillion tokens as of 2026 [8][24]. The 2026 additions — Insights Agent for automatic trace clustering and Polly for natural-language debugging — are the two most distinctive observability primitives shipped by any framework. Weaknesses: a steep learning curve that requires state-machine fluency, and lock-in risk around LangSmith's commercial deployment product.

2.3 Pydantic AI

Pydantic AI applies Pydantic's type-validation ethos to agent construction [25]. Every tool decorator produces a JSON schema automatically; every agent output is validated against a Pydantic model, with automatic retry on validation failure — the "structured output with self-correction" pattern. Pydantic AI Harness, released 2026, adds durability across failures. Observability flows through Logfire, Pydantic's telemetry product. Weaknesses: Python-only, fewer built-in multi-agent patterns, younger ecosystem.
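The "structured output with self-correction" pattern can be sketched in a few lines. The example below is written in TypeScript rather than Python and does not use Pydantic AI's API. The validator is hand-written in place of a schema library, the Invoice shape is invented for illustration, and generate is a hypothetical stand-in for a model call (kept synchronous for brevity, whereas a real call would be asynchronous).

```typescript
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Illustrative output contract (hypothetical).
interface Invoice {
  total: number;
  currency: string;
}

// Hand-written validator standing in for a schema library.
function validateInvoice(raw: string): Validation<Invoice> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (e) {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const rec = data as Record<string, unknown>;
  const total = rec["total"];
  const currency = rec["currency"];
  if (typeof total !== "number") return { ok: false, error: "field 'total' must be a number" };
  if (typeof currency !== "string") return { ok: false, error: "field 'currency' must be a string" };
  return { ok: true, value: { total, currency } };
}

// Calls the model, validates the output, and retries with the validation
// error as feedback until the output conforms or attempts run out.
function generateValidated<T>(
  generate: (prompt: string, feedback?: string) => string,
  validate: (raw: string) => Validation<T>,
  prompt: string,
  maxAttempts = 3
): T {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = validate(generate(prompt, feedback));
    if (result.ok) return result.value;
    feedback = result.error; // passed to the next attempt so the model can self-correct
  }
  throw new Error(`no valid output after ${maxAttempts} attempts: ${feedback}`);
}
```

The important design point is that the validation error, not just a bare retry, is fed back to the generator, which is what distinguishes self-correction from simple re-sampling.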

2.4 Gemini Enterprise Agent Platform

Google's 22 April 2026 launch unifies the Agent Development Kit, Agent Studio natural-language low-code authoring, Agent Gateway (an agent-network layer with governance policies), Memory Bank (persistent cross-session memory), and Vertex AI Gen AI Evaluation into a single managed product [16][17][18]. Agent Runtime provides sub-second cold starts, A2A is native, and governance flows through GCP IAM, VPC Service Controls, Cloud Audit Logs, and Customer-Managed Encryption Keys. Weaknesses: GCP lock-in, rebrand churn, enterprise pricing opacity.

2.5 Microsoft Agent Framework

Version 1.0 shipped 3 April 2026, merging AutoGen and Semantic Kernel into a single framework with six orchestration patterns — sequential, concurrent, handoff, group chat, Magentic-One, and graph [4][13][26]. Declarative YAML agents are first-class, DevUI is the built-in operator interface, A2A is native, MCP is native, and deployment flows through Azure AI Foundry. Weaknesses: AutoGen users must migrate, .NET-first documentation bias.

2.6 OpenAI Agents SDK

The successor to Swarm [14], the OpenAI Agents SDK adds sandboxing, a new model harness, and formalizes handoffs and guardrails as core primitives [15][27]. Handoff is arguably the cleanest multi-agent abstraction shipped anywhere: one agent delegates to another and passes along the full context. The sandboxed long-horizon harness enables safe long-running execution with provider-agnostic backends. Weaknesses: a small primitive set (teams outgrow it for complex graphs), sandbox harness Python-first.

2.7 Anthropic MCP and Claude Agent SDK

Anthropic introduced Model Context Protocol in late 2024 [28] and donated it to the Linux Foundation's Agentic AI Foundation in December 2025 [2]. The Claude Agent SDK ships main-agent-plus-subagents hierarchical delegation and lifecycle hooks (PreToolUse, PostToolUse, Stop, SessionStart) as core primitives [29][30][31]. Hooks are the single most powerful behaviour-modification primitive in any framework — they enable gates, policy enforcement, and circuit-breakers as first-class callbacks. Weaknesses: the SDK is opinionated toward coding tasks, and model-decided subagent routing is hard to test deterministically.
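To show why lifecycle hooks are enough to express gates, policy enforcement, and circuit-breakers, here is a minimal sketch of a PreToolUse-style hook chain. It is not the Claude Agent SDK's API. HookRegistry, denyTools, and maxCalls are hypothetical names, and the allow/deny decision shape is an assumption made for the example.

```typescript
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

type HookDecision = { allow: true } | { allow: false; reason: string };
type PreToolUseHook = (call: ToolCall) => HookDecision;

// Hypothetical registry: hooks run in registration order before every tool call.
class HookRegistry {
  private preToolUse: PreToolUseHook[] = [];

  onPreToolUse(hook: PreToolUseHook): void {
    this.preToolUse.push(hook);
  }

  // The first denial short-circuits: later hooks and the tool itself never run.
  runTool<T>(call: ToolCall, execute: (call: ToolCall) => T): T {
    for (const hook of this.preToolUse) {
      const decision = hook(call);
      if (!decision.allow) {
        throw new Error(`blocked by PreToolUse hook: ${decision.reason}`);
      }
    }
    return execute(call);
  }
}

// Policy gate: refuse any tool on a deny list.
function denyTools(denied: string[]): PreToolUseHook {
  return (call) =>
    denied.indexOf(call.tool) !== -1
      ? { allow: false, reason: `tool '${call.tool}' is on the deny list` }
      : { allow: true };
}

// Circuit-breaker: refuse every call after `limit` calls have reached this hook.
function maxCalls(limit: number): PreToolUseHook {
  let seen = 0;
  return () => {
    seen += 1;
    return seen > limit
      ? { allow: false, reason: `call limit of ${limit} exceeded` }
      : { allow: true };
  };
}
```

A registry would typically be configured once per session, for example with `registry.onPreToolUse(denyTools(["shell"]))` followed by `registry.onPreToolUse(maxCalls(50))`, so that the gate runs before the breaker counts the call.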

2.8 Semantic Kernel (maintenance)

Semantic Kernel pioneered the Planner-driven-plugin-composition abstraction [32]. In 2026 it entered maintenance mode, with new features migrating to Microsoft Agent Framework [33]. The Planner remains the strongest "given a goal and a plugin catalogue, produce a plan" abstraction even as SK itself stops adding capabilities.

2.9 LlamaIndex Workflows

LlamaIndex pivoted from a RAG-centric framework to a workflow-centric agent framework with Workflows 1.0, announced 2026 [34][35]. The @step decorator is the cleanest primitive for expressing loops, parallel branches, and human-in-the-loop waitpoints in a single file. LlamaCloud provides the managed observability and deployment layer. Weaknesses: document-centric framing makes pure-agent use cases feel tacked on.

2.10 DSPy

DSPy is the Stanford-originated compiled-prompt-optimization framework [36][37][38]. Its Signatures-and-Modules-and-Optimizers model turns prompts into compilable artefacts: MIPROv2 and GEPA optimizers re-compile prompts and few-shot examples against a metric function. DSPy is the only surveyed framework that actually optimizes prompts rather than asking humans to tune them. Weaknesses: requires labelled data for optimizers, debugging compiled prompts is opaque.
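The idea of compiling a prompt against a metric function can be illustrated with a deliberately naive sketch. It does not use DSPy and does not implement MIPROv2 or GEPA. It substitutes an exhaustive search over a fixed list of candidate instructions, scored by the mean metric over a labelled training set. Program, Metric, and selectInstructions are hypothetical names, and the program is a synchronous stand-in for a model call.

```typescript
interface Example {
  input: string;
  expected: string;
}

// Stand-in for "run the model with these instructions on this input".
type Program = (instructions: string, input: string) => string;

// Scores a prediction against the label; higher is better.
type Metric = (predicted: string, expected: string) => number;

// Naive optimizer: evaluate every candidate instruction string on the
// training set and keep the one with the best mean metric.
function selectInstructions(
  program: Program,
  candidates: string[],
  trainset: Example[],
  metric: Metric
): { instructions: string; score: number } {
  if (candidates.length === 0 || trainset.length === 0) {
    throw new Error("need at least one candidate and one labelled example");
  }
  let best = { instructions: "", score: -Infinity };
  for (const instructions of candidates) {
    const total = trainset.reduce(
      (sum, ex) => sum + metric(program(instructions, ex.input), ex.expected),
      0
    );
    const score = total / trainset.length;
    if (score > best.score) best = { instructions, score };
  }
  return best;
}
```

The sketch also makes the stated weakness visible: without labelled examples there is nothing for the metric to score, so the selection step cannot run.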

2.11 Haystack

Haystack (deepset) provides a modular pipeline with explicit retrieval, routing, memory, and generation seams [39][40]. deepset Cloud and deepset Enterprise are the commercial layers. MCP integrations arrived in 2025. Weaknesses: smaller enterprise footprint than LangChain.

2.12 Bedrock AgentCore

Amazon Web Services' AgentCore [19][41] ships a managed agent runtime with session isolation as a first-class runtime primitive, the Strands harness for code-defined agents, AgentCore Memory (managed long-term memory), and — added in 2026 — quality evaluations and policy controls [10]. AgentCore skills for Claude Code and Kiro were released in early 2026; the claim "three API calls to a working agent" is the fastest time-to-working-agent of any platform surveyed. Weaknesses: AWS lock-in, preview features churn.

3. Comparison Matrix

We present five comparison tables across the twelve frameworks, covering core abstraction and orchestration model (Table 1), tool use plus memory plus observability (Table 2), deployment plus multi-tenancy plus streaming plus enterprise features plus licensing (Table 3), developer experience (Table 4), and documented weaknesses (Table 5).

Table 1 — Core abstraction and orchestration model

#	Framework	Core abstraction	Orchestration model	Primary language(s)
1	CrewAI	Crew + Flow	Role-based (Crews) over event-driven graph (Flows)	Python
2	LangChain + LangGraph	StateGraph	Graph, durable, human-in-the-loop	Python + TypeScript
3	Pydantic AI	Agent + Capabilities	Type-checked function-calling, composable capabilities	Python
4	Gemini Enterprise Agent Platform	ADK Agent + Agent Studio	Graph-based ADK, Agent Gateway networking, A2A	Python, Go, Java, TypeScript
5	MS Agent Framework (AutoGen + SK merged)	Agent + Workflow	Sequential, concurrent, handoff, group chat, Magentic-One, graph	.NET + Python
6	OpenAI Agents SDK	Agent + Handoff + Guardrail	Lightweight handoff graph, sandboxed long-horizon harness	Python + TypeScript
7	Anthropic MCP + Claude Agent SDK	Main agent + Subagents + Hooks	Hierarchical delegation via subagents, lifecycle hooks	Python + TypeScript
8	Semantic Kernel (maintenance)	Kernel + Plugin + Planner	Planner-driven sequential or parallel	.NET + Python
9	LlamaIndex Workflows / AgentWorkflow	Workflow step + Event	Event-driven steps, loops, parallel paths	Python
10	DSPy	Signature + Module + Optimizer	Compiled program, optimizer re-compiles prompts	Python
11	Haystack (deepset)	Pipeline + Component + Agent	Modular pipeline with retrieval, routing, memory	Python
12	AWS Bedrock AgentCore	Agent + Strands harness	Managed harness, session-isolated runtime	Python + TypeScript

Table 2 — Tool use, memory, observability

#	Framework	Tool use	Memory / state	Observability
1	CrewAI	100s of built-in + MCP	Shared short/long-term, entity, contextual	Built-in events
2	LangGraph	Function-calling + MCP + custom	Persistent state per node + checkpointer	LangSmith (15B+ traces, 100T tokens, Insights Agent, Polly)
3	Pydantic AI	Typed tool decorator + JSON-schema auto-gen + MCP	Durable via Pydantic AI Harness	Logfire
4	Gemini Enterprise	MCP + native GCP tools	Memory Bank (persistent cross-session)	Vertex AI Gen AI Evaluation + Cloud Trace
5	MS Agent Framework	MCP native + A2A	Session state + checkpointing + pause/resume	DevUI + OpenTelemetry + Azure Monitor
6	OpenAI Agents SDK	Tools + MCP + sandbox workspaces	Resumable sandbox sessions	Tracing dashboard + pluggable exporters
7	Claude Agent SDK	10+ built-in + MCP (75+ connectors)	Subagent context isolation; hooks persist	Hooks: PreToolUse / PostToolUse / Stop / SessionStart
8	Semantic Kernel	Plugins + OpenAPI + MCP (2025)	Session memory + Kernel.Memory	OpenTelemetry
9	LlamaIndex	Function-calling + MCP + LlamaHub	Workflow context + vector + document stores	Instrumentation API + LlamaCloud
10	DSPy	ReAct + ProgramOfThought modules	Compiled program holds optimized prompts	MLflow + DSPy inspect
11	Haystack	Tool components + MCP integrations	Component-level state + ChatMessage stores	deepset Cloud
12	Bedrock AgentCore	Strands + MCP + AWS service actions	AgentCore Memory (managed) + session isolation	CloudWatch + AgentCore quality evals + policy controls

Table 3 — Deployment, multi-tenancy, streaming, enterprise, licensing

#	Framework	Deployment	Multi-tenancy	Streaming	Enterprise (RBAC/SSO/Audit)	License
1	CrewAI	Self-host OSS + CrewAI AMP	Via AMP org scoping	SSE	AMP: RBAC, SSO, audit	MIT + commercial
2	LangGraph	Self-host + LangGraph Platform + LangSmith Deployment	Workspace-level	WS + SSE	LangSmith: SSO, RBAC, SOC2, SCIM	MIT + commercial
3	Pydantic AI	Self-host library	Caller-owned	Streaming	— (library)	MIT
4	Gemini Enterprise	GCP-only Agent Runtime	Full IAM isolation	Native + A2A	GCP IAM, VPC-SC, audit, CMEK	Commercial
5	MS Agent Framework	Self-host + Azure AI Foundry	Azure tenant	Streaming all patterns	Azure AD, RBAC, Purview	MIT + commercial
6	OpenAI Agents SDK	Self-host + OpenAI platform	Caller-owned	Streaming handoffs	OpenAI org management	MIT
7	Claude Agent SDK	Self-host (Anthropic API or Bedrock or Vertex)	Caller-owned	Streaming + hooks	Via provider	MIT
8	Semantic Kernel	Self-host	Caller-owned	Streaming	Azure integrations	MIT
9	LlamaIndex	Self-host + LlamaCloud	LlamaCloud projects	Streaming	LlamaCloud SOC2, RBAC	MIT + commercial
10	DSPy	Self-host library	Caller-owned	Via underlying LM	—	MIT
11	Haystack	Self-host + deepset Cloud + Enterprise	deepset Enterprise	Streaming	SSO, RBAC, audit	Apache 2.0 + commercial
12	Bedrock AgentCore	AWS-only (VPC, PrivateLink)	Full AWS-account + session	Streaming	IAM, KMS, CloudTrail	Commercial

Table 4 — Developer experience and CLI

#	Framework	CLI / scaffold	IDE integration	Notable DX feature
1	CrewAI	`crewai` CLI	—	Role/goal/backstory DSL
2	LangGraph	`langgraph` CLI + Studio	LangSmith CLI, Polly NL debugger	Graph Studio visual editor
3	Pydantic AI	Standard Python	Full type-check + autocomplete	Structured output + self-correction
4	Gemini Enterprise	`gcloud agents` + Agent Studio	VS Code, JetBrains	Natural-language agent authoring
5	MS Agent Framework	`agent-framework` CLI + DevUI	VS Code	YAML declarative agents
6	OpenAI Agents SDK	`openai-agents` CLI + sandbox mgr	—	Handoff primitive + subagents + code mode
7	Claude Agent SDK	`claude` Code CLI	IDE via Claude Code	Subagents + hooks + MCP
8	Semantic Kernel	`sk` CLI	Visual Studio, Rider	Planner-driven plugins
9	LlamaIndex	`llamactl`	—	One-command document agent templates
10	DSPy	`dspy` CLI	—	Compile + optimize programs
11	Haystack	`haystack` CLI	deepset Studio	Pipeline visual editor
12	Bedrock AgentCore	AgentCore CLI + AgentCore skills for Claude Code/Kiro	Any	Three API calls to working agent

Table 5 — Documented weaknesses (2026)

#	Framework	Weaknesses
1	CrewAI	No checkpointing; NL coordination wastes tokens; coarse error handling; weaker monitoring than LangSmith
2	LangGraph	Steep learning curve; state-machine fluency required; LangSmith lock-in risk
3	Pydantic AI	Python-only; fewer multi-agent patterns; younger ecosystem
4	Gemini Enterprise	GCP lock-in; rebrand churn; pricing opacity
5	MS Agent Framework	AutoGen users must migrate; .NET-first docs bias
6	OpenAI Agents SDK	Small primitive set; sandbox harness Python-first
7	Claude Agent SDK	Opinionated toward coding; model-decided routing is nondeterministic
8	Semantic Kernel	Maintenance mode; new features go to MS Agent Framework
9	LlamaIndex	Document-centric framing makes pure-agent use feel tacked on
10	DSPy	Requires labelled data; compiled prompts opaque to debug
11	Haystack	Smaller enterprise footprint than LangChain
12	AgentCore	AWS lock-in; preview features churn; docs assume AWS fluency

3.1 Ten Emerging Patterns

Pattern 1 — MCP is the universal tool-use substrate. 97 million monthly downloads by February 2026, donated to Linux Foundation AAIF December 2025. Every framework ships native MCP or an integration layer [1][2].

Pattern 2 — A2A is the emerging agent-to-agent layer. "MCP vertical, A2A horizontal" is the industry consensus. Expect critical mass by late 2027.

Pattern 3 — Orchestration is graph-shaped. Directed graphs with explicit state, checkpointing, and HITL waitpoints. Chat (AutoGen GroupChat) and role-crew (early CrewAI) are now special cases inside graph runtimes.
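A minimal sketch of the graph-with-explicit-state-and-checkpointing shape that Pattern 3 describes follows. It is not modelled on any one surveyed framework's API. NodeFn, Checkpoint, and runGraph are hypothetical names, and the checkpoint store is reduced to a callback supplied by the caller.

```typescript
// A node transforms the state and names the next node, or null to stop.
type NodeFn<S> = (state: S) => { state: S; next: string | null };

interface Checkpoint<S> {
  node: string; // the node about to run
  state: S;     // the state it will receive
}

// Runs the graph from a checkpoint, saving a new checkpoint before each node
// so that a crashed or paused run can be resumed from the last saved point.
function runGraph<S>(
  nodes: Record<string, NodeFn<S>>,
  start: Checkpoint<S>,
  saveCheckpoint: (cp: Checkpoint<S>) => void,
  maxSteps = 100
): S {
  let current: string | null = start.node;
  let state = start.state;
  for (let step = 0; current !== null && step < maxSteps; step++) {
    const node = nodes[current];
    if (node === undefined) throw new Error(`unknown node: ${current}`);
    saveCheckpoint({ node: current, state });
    const result = node(state);
    state = result.state;
    current = result.next;
  }
  if (current !== null) throw new Error("step limit reached before the graph finished");
  return state;
}
```

Resuming is the same call as starting: passing any previously saved checkpoint as `start` continues the run from that node with that state. A human-in-the-loop waitpoint fits the same shape as a node that saves its checkpoint and stops until a decision arrives.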

Pattern 4 — Managed runtimes are the 2026 commercial wedge. LangSmith Deployment, Vertex Agent Runtime, AgentCore plus Strands, OpenAI sandbox harness. Exactly-once, pause-resume, horizontal scaling.

Pattern 5 — Typed structured output is table-stakes. Pydantic AI, OpenAI Guardrails, Claude output schemas, MS Agent Framework YAML, DSPy Signatures all converge.

Pattern 6 — Two-tier observability market. Tier 1 OpenTelemetry-compatible tracing. Tier 2 LLM-native analytics (Insights Agent plus Polly, Logfire, AgentCore quality evals).

Pattern 7 — Long-term memory is a first-class service. Gemini Memory Bank, AgentCore Memory, LangGraph long-term store, LlamaIndex persistent context.

Pattern 8 — Single-agent-as-tool beats multi-agent-chat. OpenAI handoffs, Claude subagents, MS handoff orchestration — token cost and determinism drive this shift.

Pattern 9 — CLI-first DX is a battleground. Time-from-zero-to-working-agent is a marketing metric.

Pattern 10 — Legacy convergence. AutoGen plus SK merged into MS Agent Framework; Swarm replaced by Agents SDK; Vertex became Gemini Enterprise Agent Platform.

3.2 Comprehensive Feature Catalog Matrix — All Competitors vs Adverant Nexus v4.0

This section is the exhaustive comparison the prior tables (1–5) compressed. Where Tables 1–5 slice by dimension, Tables 6–20 catalog every distinguishing feature we identified across the twelve frameworks, then score each framework against it — plus a final column for how Adverant Nexus v4.0 implements the same capability.

Frameworks compared (column headers, left to right): CrewAI · LangGraph (LangChain) · Pydantic AI · Gemini Enterprise Agent Platform · MS Agent Framework · OpenAI Agents SDK · Claude Agent SDK · Semantic Kernel · LlamaIndex Workflows · DSPy · Haystack (deepset) · Bedrock AgentCore · Adverant Nexus v4.0 (last column, bold).

Cell legend: ✓ = native/first-class · ◐ = partial, preview, or via plugin · ◦ = possible via third-party/custom · ✗ = not supported · — = not applicable · Numbers like "§7.4" in the Nx column point to the v4.0 sub-section that implements the feature. The data reflects public information as of 2026-04-24.

Table 6 — Orchestration Primitives

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
6.1	Graph-based state machine	◐	✓	◦	✓	✓	◐	◦	◦	✓	✗	◐	◐	✓ §7.2
6.2	Sequential workflow pattern	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓ §7.2
6.3	Concurrent / parallel fan-out	✓	✓	◐	✓	✓	✗	✓	◐	✓	✗	✓	✓	✓ §7.2
6.4	Conditional branching	◐	✓	◐	✓	✓	✓	✓	◐	✓	◐	✓	✓	✓ §7.2
6.5	Human-in-the-loop waitpoint	✗	✓	◐	✓	✓	◐	◐	✗	✓	✗	◐	◐	✓ §7.2 Tier 4
6.6	Checkpointing / pause-resume	✗	✓	✓	✓	✓	✓	◐	◐	✓	✗	◐	✓	✓ §7.2 + §7.11
6.7	Durable execution (exactly-once)	✗	✓	✓	✓	✓	✓	✗	✗	◐	✗	◐	✓	✓ §7.11
6.8	Dynamic DAG modification	✗	✓	◐	✓	✓	✗	✗	✗	✓	✗	◐	✗	✓ §7.2
6.9	Loops / recursion	◐	✓	✓	✓	✓	✓	✓	◐	✓	✓	✓	✓	✓ §7.2
6.10	Batch dispatch / fan-in aggregation	◐	✓	✗	✓	✓	✗	✗	✗	◐	✗	◐	◐	✓ §7.2 (UNO batch)
6.11	Named queues with priority	✗	◐	✗	✓	✓	✗	✗	✗	✗	✗	◐	✓	✓ §10 Phase 25
6.12	Tier-based execution taxonomy	✗	◐	✗	◐	✓	◐	◐	✗	✗	✗	✗	◐	✓ §7.2 (4 tiers)
6.13	Role-based agent DSL	✓	✗	✗	◐	✓	✗	✗	✗	✗	✗	✗	◐	◐ §10 (absorb CrewAI)
6.14	Workflow-as-code	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓ §7.15 (bindings alt)
6.15	Workflow-as-YAML / declarative	✗	◐	✗	◐	✓	✗	✓	✗	✗	✗	✓	◐	✓ §7.15 (actions[])
6.16	Autonomous goal decomposition	◐	◐	✗	✓	✓	✓	✓	✓	◐	✗	✗	✓	✓ §7.2 Tier 4
6.17	Multi-agent handoff primitive	◐	◐	✗	✓	✓	✓	✓	✗	◐	✗	✗	✓	✓ §7.2 + §7.6
6.18	Group-chat / GroupChat pattern	✗	◐	✗	◐	✓	✗	✗	✗	✗	✗	✗	✗	◐ (absorb if needed)
6.19	Magentic-One orchestrator pattern	✗	✗	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗	✗
6.20	Competition / consensus ensemble	✓	◐	✗	✗	✓	✗	✗	✗	◐	✓	✗	◐	✓ §7.2 Tier 4

Table 7 — Tool Use (Agent → Tool Plane)

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
7.1	Native MCP support (as of 2026)	✓	✓	✓	✓	✓	✓	✓	✓	✓	◐	✓	✓	✓ §7.6
7.2	Built-in tool library size	100+	200+	0 (decorate)	GCP stack	50+	20+	10 + 75 MCP	Plugins	LlamaHub	ReAct/POT	Components	30+ AWS	729+ skills
7.3	Function-calling abstraction	✓	✓	✓	✓	✓	✓	✓	✓	✓	◐	✓	✓	✓ §7.6
7.4	JSON Schema auto-gen from types	◐	✓	✓	✓	✓	✓	✓	✓	✓	◐	◐	✓	✓ §7.7
7.5	Tool allowlist / scope per role	◐	◐	✗	✓	◐	◐	✓ (hooks)	✗	✗	✗	✗	✓	✓ §7.4 + §7.15
7.6	Tool call sandboxing	✗	◐	✗	✓	✓	✓ (sandbox)	◐	✗	✗	✗	✗	✓	✓ §7.12 airgap; §10 Phase 8
7.7	Code execution tool (REPL/Python)	✓	✓	◐	✓	✓	✓	✓	◐	✓	◐	✓	✓	◐ §10 Phase 8
7.8	Filesystem tool with path-allowlist	◐	◐	✗	◐	◐	◐	✓ (hooks)	✗	◐	✗	✗	✓	✓ §7.4 hooks
7.9	Shell / kubectl tool with policy	✗	◐	✗	◐	◐	◐	◐	✗	✗	✗	✗	◐	✓ §8 C8 + §9 #8
7.10	Tool call retry on failure	◐	✓	✓	✓	✓	✓	◐	◐	✓	◐	✓	✓	✓ §7.4 hooks
7.11	Tool cost attribution	✗	✓	◐	✓	◐	◐	◐	✗	◐	✗	◐	✓	✓ §7.10
7.12	Tool output schema validation	◐	◐	✓	✓	✓	✓	◐	◐	◐	◐	◐	◐	✓ §7.7

Table 8 — Agent-to-Agent Communication

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
8.1	A2A protocol (Google standard)	✗	◐	✗	✓	✓	✗	✗	✗	✗	✗	✗	◐	✓ §7.6 (target)
8.2	Subagents / hierarchical delegation	◐	◐	◐	✓	✓	✓	✓	✗	◐	✗	✗	✓	✓ §7.2 Tier 4
8.3	Agent handoff with context	◐	✓	✗	✓	✓	✓	✓	✗	◐	✗	✗	✓	✓ §7.2 + §7.6
8.4	Inter-agent message signing	✗	✗	✗	✓	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.11 + Appendix E
8.5	Cross-org / cross-tenant A2A	✗	✗	✗	✓	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.6 (via SPIFFE)
8.6	Agent discovery / registry	✗	◐	✗	✓ (Agent Gateway)	✓	✗	✗	✗	✗	✗	✗	✓	✓ §7.6 + nexus-auth
8.7	Local-only A2A (airgap)	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.12

Table 9 — Memory and State

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
9.1	Short-term (conversation) memory	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓ §7.5
9.2	Long-term (cross-session) memory	◐	✓	◐	✓ (Memory Bank)	✓	◐	◐	◐	✓	✗	◐	✓ (AgentCore Memory)	✓ §7.5 Memory Bank
9.3	Entity / semantic memory	✓	✓	✗	✓	◐	✗	✗	◐	✓	✗	◐	◐	✓ §7.5 (GraphRAG integration)
9.4	Per-tenant memory isolation	✗	◐ (workspace)	✗	✓ (IAM)	✓ (tenant)	✗	✗	✗	◐ (LlamaCloud)	✗	✗	✓ (session)	✓ §7.5 cryptographic
9.5	Cryptographic per-tenant KEK	✗	✗	✗	◐ (CMEK)	◐	✗	✗	✗	✗	✗	✗	◐ (KMS)	✓ §7.5 (envelope + HSM/TPM)
9.6	Crypto-erasure (DEK destruction)	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	◐	✓ §7.5 + §7.14 GDPR
9.7	Memory checkpointer	✗	✓	✓ (Harness)	✓	✓	◐	✗	✗	✓	✗	✗	✓	✓ §7.5
9.8	Vector store integration	◐	✓	◐	✓	◐	◐	✗	✓	✓	✗	✓	◐	✓ Qdrant (v3 retained)
9.9	Knowledge-graph memory	✗	✓	✗	◐	◐	✗	✗	◐	✓	✗	◐	✗	✓ Neo4j (v3 retained)
9.10	Managed memory service	✗	✓ (LangSmith)	✓ (Logfire)	✓	✓ (Azure)	◐	✗	✗	✓ (LlamaCloud)	✗	✓ (deepset)	✓	✓ §7.5

Table 10 — Observability

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
10.1	OpenTelemetry emission	◐	✓	◐	✓	✓	✓	◐	✓	✓	◐	✓	✓	✓ (retained v3)
10.2	LLM-call level tracing	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓ §7.9 span tree
10.3	Tool-call level tracing	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓ §7.9
10.4	Hierarchical parent-child spans	◐	✓	✓	✓	✓	✓	◐	◐	✓	◐	✓	✓	✓ §7.9 12-type enum
10.5	Closed span-type taxonomy	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.9 (12 types, UNO)
10.6	Automatic trace clustering	✗	✓ (Insights Agent)	✗	✓	◐	✗	✗	✗	◐	✗	✗	✓ (quality evals)	✓ §7.9
10.7	Natural-language trace query	✗	✓ (Polly)	✗	◐	✗	✗	✗	✗	✗	✗	✗	◐	✓ §7.9 Polly-NL
10.8	Quality evaluation harness	✗	✓ (LangSmith evals)	◐ (Logfire)	✓ (Vertex Eval)	✓	✓	✗	◐	✓	✓ (DSPy metric)	◐ (deepset evals)	✓ (AgentCore evals)	✓ §7.8 + §7.9
10.9	Anomaly / regression detection	✗	✓	◐	✓	◐	✗	✗	✗	◐	✗	✗	✓	✓ §7.9 Insights
10.10	Cost hotspot analysis	✗	✓	◐	✓	◐	◐	✗	✗	◐	✗	✗	✓	✓ §7.9 + §7.10
10.11	Storage tiering (hot/cold)	✗	◐	✗	✓	◐	✗	✗	✗	◐	✗	◐	✓	✓ §7.9 (PG + ClickHouse)
10.12	No-sampling (full record)	✗	◐	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.9 (EU AI Act Art. 12)

Table 11 — Deployment Targets

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
11.1	Self-hosted OSS	✓	✓	✓	✗	✓	✓	✓	✓	✓	✓	✓	✗	✓ (all profiles)
11.2	Managed cloud runtime	✓ (AMP)	✓ (LangSmith)	✗	✓ (Vertex)	✓ (Azure)	✓ (OpenAI)	✗	✗	✓ (LlamaCloud)	✗	✓ (deepset)	✓ (AWS)	✓ §11 public-cloud
11.3	Single-tenant VDS	◐	◐	✓	✗	◐	◐	✓	✓	◐	✓	◐	✗	✓ §11 VDS
11.4	On-premise Kubernetes	◐	◐	✓	✗	◐	✓	✓	✓	◐	✓	◐	✗	✓ §11 on-prem
11.5	Airgapped sealed bundle	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.12 + §11 airgap
11.6	Sub-second cold start	✗	◐	✓	✓ (Agent Runtime)	◐	◐	✓	✓	◐	✓	◐	✓ (AgentCore)	✓ §7.13 + UNO dispatch
11.7	Horizontal scaling (stateless dispatch)	✗	✓	✗	✓	✓	✓	✗	✗	◐	✗	◐	✓	✓ §4.1 (UNO retained)
11.8	GPU scheduling (own hw)	✗	◐	✗	✓	◐	✗	✗	✗	◐	✗	✗	✓	✓ §7.12 airgap + §4 gpu-queue
11.9	BYO-LLM endpoints	◐	✓	✓	◐	✓	✗	✓	✓	✓	✓	✓	◐	✓ (4 adapters + BYO)
11.10	FIPS 140-3 crypto modules	✗	◐	✗	✓	✓	✗	✗	✗	✗	✗	✗	✓	✓ §7.12 + §7.14 FedRAMP
11.11	STIG-compliant base images	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.12 + §7.14 DoD IL5
11.12	Monthly delta update bundle	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.12

Table 12 — Multi-Tenancy and Security

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
12.1	Organization / workspace scoping	◐ (AMP)	✓	✗	✓ (IAM)	✓	✓	◦	✗	✓ (LlamaCloud)	✗	✓ (Enterprise)	✓ (AWS acct)	✓ §4.5 (retained)
12.2	Row-level security (database)	✗	✗	✗	◐ (BigQuery RLS)	◐	✗	✗	✗	✗	✗	✗	✗	✓ §4.5 Postgres RLS
12.3	Payload/vector filter isolation	✗	◐	✗	◐	◐	✗	✗	✗	◐	✗	◐	◐	✓ §4.5 Qdrant + Neo4j
12.4	Service mesh mTLS (SPIFFE)	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §4.5 Istio retained
12.5	JWT + middleware tenant headers	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓ §4.5
12.6	SSO (SAML/OIDC)	✓ (AMP)	✓ (LangSmith)	✗	✓	✓	✓	✗	✗	✓ (LlamaCloud)	✗	✓ (Enterprise)	✓ (IAM)	✓ (nexus-auth)
12.7	RBAC with fine-grained permissions	✓ (AMP)	✓ (LangSmith)	✗	✓	✓	◐	✗	◐	✓ (LlamaCloud)	✗	✓	✓	✓ §7.4 + §7.15
12.8	SCIM user provisioning	✗	✓ (LangSmith)	✗	✓	✓	◐	✗	✗	◐	✗	✓	✓	◐ §10 (future)
12.9	Per-tenant encryption keys	✗	✗	✗	✓ (CMEK)	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.5 KEK (HSM/TPM)
12.10	Post-quantum crypto (hybrid)	✗	✗	✗	◐ (roadmap)	◐ (roadmap)	✗	✗	✗	✗	✗	✗	◐ (roadmap)	◐ §7.14 roadmap
12.11	Network policy / AuthorizationPolicy	✗	✗	✗	✓ (VPC-SC)	✓	✗	✗	✗	✗	✗	✗	✓ (PrivateLink)	✓ §4.5 Istio
12.12	OPA/Rego policy engine	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	✓ (AgentCore policy)	✓ §7.14 Appendix H

Table 13 — Governance, Risk, and Compliance

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
13.1	Risk-tier classification (e.g. EU AI Act)	✗	◐	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.14 EU AI Act
13.2	Human-oversight gate for high-risk	✗	✓ (HITL)	◐	◐	✓	◐	◐	✗	✓	✗	◐	◐	✓ §7.2 Tier 4
13.3	Data residency enforcement	✗	◐	✗	✓ (regions)	✓ (Purview)	✗	✗	✗	◐	✗	✗	✓ (AWS regions)	✓ §7.14 + §4.1
13.4	GDPR right-to-erasure flow	✗	◐	✗	◐	◐	✗	✗	✗	◐	✗	◐	◐	✓ §7.14 + §9 #11
13.5	DPIA / impact-assessment generator	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.14
13.6	Conformity assessment records	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.14 EU AI Act
13.7	SOC 2 evidence pipeline	✗	✓ (LangSmith attest.)	✗	✓	✓	◐	✗	✗	✓ (LlamaCloud)	✗	✓ (Enterprise)	✓	✓ §7.14 + Appendix G
13.8	ISO 27001 mapping	✗	✓	✗	✓	✓	◐	✗	✗	✓	✗	✓	✓	✓ Appendix G
13.9	ISO 42001 AI management system	✗	✗	✗	✗	◐	✗	✗	✗	✗	✗	✗	✗	✓ §7.14 + §7.3
13.10	HIPAA BAA-aware routing	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.14 + §9 #50
13.11	FedRAMP Moderate authorization	✗	◐	✗	✓	✓	◐	✗	✗	✗	✗	✗	✓	✓ §7.14 airgap
13.12	FedRAMP High / DoD IL5	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.14 airgap
13.13	NIST AI RMF alignment	✗	◐	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.14 + Appendix G
13.14	OWASP LLM Top 10 defenses	◐	◐	◐	◐	◐	◐	◐	✗	◐	✗	◐	◐	✓ §7.14 + §8.E-E3
13.15	MITRE ATLAS threat tagging	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.14
13.16	Export-control tags (EAR/ITAR)	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.14
13.17	Auditor-export CLI / package	✗	✓ (audit logs)	✗	✓	✓	◐	✗	✗	◐	✗	◐	✓	✓ §7.13 + Appendix I
13.18	Automatic model card generation	✗	◐	✗	✓	◐	✗	✗	✗	◐	✗	✗	◐	✓ §7.3 + §7.14
13.19	Per-skill threat model declaration	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.3 SKILL.md v2
13.20	Watermarking / C2PA on artefacts	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	✗	✓ §7.11 C2PA v2

Table 14 — Cost and FinOps

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
14.1	Token usage tracking per call	◐	✓	✓ (Logfire)	✓	✓	✓	◐	◐	✓	✓	◐	✓	✓ §7.10
14.2	Cost attribution per trace	✗	✓	◐	✓	◐	◐	✗	✗	◐	✗	✗	✓	✓ §7.10
14.3	Cost attribution per tenant/org	✗	✓	✗	✓	◐	◐	✗	✗	◐	✗	✗	✓	✓ §7.10
14.4	Cost attribution per skill / workflow	✗	◐	✗	◐	✗	✗	✗	✗	✗	✗	✗	◐	✓ §7.10
14.5	Cost attribution per user	✗	✓	✗	◐	◐	◐	✗	✗	✗	✗	✗	◐	✓ §7.10
14.6	Pre-dispatch budget reservation	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.10 + §8.B-B14
14.7	Per-skill daily cost cap	✗	◐	✗	◐	✗	✗	✗	✗	✗	✗	✗	◐ (policy)	✓ §7.10 + §7.15
14.8	Per-run hard cost cap	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	◐	✓ §7.2 Tier 4 + §7.10
14.9	Circuit breaker on failure-rate	✗	◐	✗	◐	✗	✗	✗	✗	✗	✗	✗	◐	✓ §7.10
14.10	Provider failover on 5xx	✗	✓	✗	◐	◐	◐	✗	✗	◐	✗	✗	◐	◐ §10 Phase 26
14.11	Cache-hit optimization	◐	✓ (sem-cache)	✗	✓	✓	✓	◐	✗	✓	✓	◐	◐	✓ §7.13 + §9 #2
14.12	OnCostThreshold hook/callback	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.4 + §7.10

Table 15 — Skill / Plugin / Marketplace Management

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
15.1	Skill / plugin registry	◐	✓	✗	✓ (Agent Studio)	✓	◐	✓ (MCP dir)	✓	✓	✗	✓	✓ (AgentCore skills)	✓ §7.3 marketplace
15.2	Semantic versioning	◐	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓ §7.3
15.3	Version pinning at runtime	✗	◐	✓	◐	◐	✗	✗	✗	◐	✗	◐	◐	✓ §7.15 skill_version_pin
15.4	Cryptographic signing (sigstore)	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.3
15.5	SBOM attestation	✗	✗	✗	✗	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.3
15.6	CVE scanning at runtime	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.3
15.7	Runtime quality score	✗	✓ (LangSmith evals)	◐	✓ (Eval)	✓	◐	✗	✗	◐	✓ (metric)	◐	✓ (AgentCore evals)	✓ §7.3
15.8	Auto-rollback on quality drop	✗	◐	✗	◐	✗	✗	✗	✗	✗	✗	✗	◐	✓ §7.3 + §7.15
15.9	A/B experiments on skills	✗	◐	✗	◐	✗	◐	✗	✗	✗	◐	✗	◐	✓ §7.15 ab_experiment
15.10	Adversarial eval suite	✗	◐	✗	◐	◐	◐	✗	✗	◐	◐	✗	◐	✓ §7.3
15.11	Declarative install via manifest	◐	✓	✗	✓	✓	◐	✗	✗	◐	✗	◐	◐	✓ §7.15 nexus.manifest.json
15.12	Per-tenant private marketplace	◐ (AMP)	◐	✗	✓	◐	✗	✗	✗	◐	✗	◐	✓	✓ §7.3 + §11
15.13	Airgapped skill marketplace	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.12
15.14	Skill synthesis / composition	✗	✓	✗	✓	✓	◐	✗	✓ (Planner)	✓	✓	✓	◐	✓ §7.3 (skill-synthesizer)

Table 16 — Prompt and Contract Management

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
16.1	Typed input schema	◐	✓	✓	✓	✓	✓	◐	◐	✓	✓	◐	◐	✓ §7.7
16.2	Typed output schema	◐	✓	✓	✓	✓	✓	◐	◐	◐	✓	◐	◐	✓ §7.7
16.3	Automatic validation + retry	✗	✓	✓	◐	✓	✓ (Guardrails)	◐	✗	◐	◐	✗	◐	✓ §7.7
16.4	Optimizer-compiled prompts	✗	◐	◐	◐	✗	✗	✗	✗	✗	✓ (MIPROv2, GEPA)	✗	◐	✓ §7.8 (absorb DSPy)
16.5	Prompt versioning	◐	✓	✓	✓	✓	◐	◐	✓	✓	✓	◐	◐	✓ §7.3 + §7.8
16.6	Prompt template registry	✓ (Crews)	✓ (LangSmith hub)	✓	✓ (Agent Studio)	✓	◐	✗	✓	✓	✓	✓	◐	✓ §7.3
16.7	Guardrails for prompt injection	◐	✓	◐	✓	✓	✓ (Guardrails)	◐	✗	◐	✗	◐	✓	✓ §7.4 + §8.E-E3
16.8	Few-shot compilation vs metric	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓	✗	✗	✓ §7.8

Table 17 — Streaming, Async, and WebSocket

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
17.1	Server-Sent Events streaming	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓ §4.4
17.2	WebSocket streaming	◐	✓	◐	✓	✓	◐	✓	◐	✓	◐	◐	✓	✓ §4.4 (Socket.IO)
17.3	Resumable streams	✗	✓	✗	✓	✓	✓	◐	✗	◐	✗	◐	✓	✓ §4.4 ring buffer
17.4	Progress events (structured)	◐	✓	◐	✓	✓	◐	✓ (hooks)	◐	✓	◐	◐	✓	✓ §4.4 (17 event types)
17.5	Per-tenant WS channel isolation	✗	✓	✗	✓	✓	✗	✗	✗	◐	✗	✗	✓	✓ §4.4 (org:plugin rooms)
17.6	Pub/Sub fan-out	✗	✓	✗	✓	✓	✗	✗	✗	◐	✗	✗	✓	✓ §4.4 Redis Pub/Sub
17.7	Backpressure signalling	✗	◐	✗	✓	✓	◐	✗	✗	◐	✗	✗	✓	✓ (BullMQ queue)

Table 18 — Developer Experience

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
18.1	First-class CLI	✓	✓	◐	✓ (gcloud)	✓	✓	✓ (claude)	✓	✓ (llamactl)	✓	✓	✓	✓ §7.13 CLI 2.0
18.2	Visual graph / workflow editor	✗	✓ (Studio)	✗	✓ (Agent Studio)	✓ (DevUI)	✗	✗	✗	◐	✗	✓ (deepset)	◐	◐ §7.13 (chain visualize)
18.3	NL agent authoring	✗	✗	✗	✓	◐	✗	✗	✗	✗	✗	◐	◐	◐ §7.13 (future)
18.4	REPL / interactive shell	◐	◐	✓	◐	✓	◐	✓	◐	◐	✓	◐	◐	✓ §7.13 `nexus shell`
18.5	Type safety (end-to-end)	◐	✓ (TS)	✓ (Py types)	✓	✓ (.NET)	✓ (TS)	✓ (TS)	✓ (.NET)	◐	✗	◐	◐	✓ §7.7
18.6	Hot-reload during dev	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
18.7	IDE extension (VS Code etc.)	✗	✓	✓	✓	✓	◐	✓ (Claude Code)	✓ (Rider/VS)	◐	◐	✓	✓	◐ (future)
18.8	Debugger with span inspection	✗	✓ (Polly)	✓ (Logfire)	✓	✓ (DevUI)	✓	◐	✓	◐	✓ (inspect)	◐	✓	✓ §7.9 span tree
18.9	Time-from-zero-to-agent metric	~hours	~day	~min	~hours	~hours	~min	~min	~hours	~min	~hours	~hours	~3 API calls	~min §7.13
18.10	Session save/resume	✗	✓	✓	✓	✓	✓	✓	✗	◐	✗	✗	✓	✓ §4.6 (CLI v1+v2)
18.11	Tab completion / IntelliSense	◐	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓ §4.6
18.12	Plugin scaffold generator	✓	✓	◐	✓	✓	◐	✓ (MCP)	✓	✓	✗	✓	✓	✓ plugin-template

Table 19 — Content Provenance, Audit, and Replay

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
19.1	Deterministic replay	✗	✓ (claim)	✓ (Harness)	◐	✓ (claim)	✗	✗	✗	◐	✗	✗	◐	✓ §7.11
19.2	Bit-for-bit replay manifest	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.11 replay_manifest
19.3	Tool-output capture	✗	✓	✓	✓	✓	✓	✗	✗	◐	✗	✗	✓	✓ §7.11
19.4	Model version pinning	✗	✓	✓	✓	✓	✓	◐	✓	✓	✓	◐	✓	✓ §7.11
19.5	Prompt hash capture	✗	✓	✓	◐	◐	◐	✗	✗	◐	✓	✗	◐	✓ §7.11
19.6	C2PA artefact manifest	✗	✗	✗	◐ (research)	◐ (Purview)	✗	✗	✗	✗	✗	✗	✗	✓ §7.11 C2PA v2
19.7	HITL approval signing in chain	✗	◐	✗	◐	◐	✗	✗	✗	◐	✗	✗	✗	✓ §7.11 + §7.2 Tier 4
19.8	Hash chain across spans	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ Appendix B span_hash
19.9	Audit retention ≥ 7 years	✗	◐	✗	✓	✓	◐	✗	✗	◐	✗	◐	✓ (CloudTrail)	✓ §7.14 FedRAMP
19.10	Erasure certificate on delete	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.14 GDPR

Table 20 — UI/UX Bindings and Action Configuration

(This category is where Nexus v4.0 is most differentiated; we include it in the catalog precisely because no surveyed framework treats it as a primitive.)

#	Feature	CrewAI	LangGraph	Pydantic	Gemini	MS Agent	OpenAI	Claude	SemKernel	LlamaIdx	DSPy	Haystack	AgentCore	Nexus 4.0
20.1	User-configurable action → skill mapping	✗	✗	✗	◐ (Agent Studio)	◐ (YAML)	✗	✗	✗	✗	✗	✗	✗	✓ §7.15
20.2	Plugin manifest declarative buttons	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	✗	✓ §7.15 actions[]
20.3	Scope hierarchy (user / project / org / system)	✗	◐ (workspace)	✗	◐ (IAM)	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.15
20.4	Priority-based resolution	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.15
20.5	Per-binding cost cap	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.15 cost_cap_usd
20.6	Per-binding model + tier override	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.15
20.7	Input schema + inputs_mapping template	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.15
20.8	Visual binding editor (non-developer)	✗	✗	✗	✓ (Agent Studio)	◐ (DevUI)	✗	✗	✗	✗	✗	✗	✗	✓ §8.D-D9
20.9	Policy-gated override (OPA)	✗	✗	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.15 + §8.E-E8
20.10	A/B experiment on binding_key	✗	◐	✗	◐	✗	◐	✗	✗	✗	◐	✗	✗	✓ §7.15 + §9 #12
20.11	Binding audit trail	✗	◐	✗	◐	◐	✗	✗	✗	✗	✗	✗	◐	✓ §7.15
20.12	Auto-deactivation on quality drop	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓ §7.15 + §9 #35

3.3 Matrix-Level Observations

Reading Tables 6–20 vertically (as a catalog of Nexus's position per feature), three patterns dominate.

Where Nexus v4.0 is strictly differentiated (rows where no competitor has a native ✓ while Nexus does): the airgapped sealed-bundle mode (11.5, 15.13), pre-dispatch FinOps reserve (14.6) with the full per-skill plus per-run plus per-binding budget matrix (14.7 + 14.8 + OnCostThreshold hook 14.12), C2PA artefact manifests (19.6), bit-for-bit replay manifests (19.2), hash-chain across spans (19.8), erasure certificates on delete (19.10), DPIA auto-generation (13.5), conformity-assessment records (13.6), ISO 42001 management-system primitives (13.9), per-skill threat-model declaration (13.19), no-sampling span recording (10.12), the closed twelve-type span taxonomy (10.5), and almost all of the UI/UX Bindings table (20.1–20.12, the exception being the visual binding editor in 20.8, where Gemini's Agent Studio also scores ✓). Twenty-plus features have no turnkey equivalent in any surveyed framework.

Where Nexus v4.0 is at parity (Nexus ✓, multiple competitors also ✓): MCP (7.1), function-calling (7.3), short-term memory (9.1), tool-call tracing (10.3), SSO (12.6), RBAC (12.7), WS streaming (17.2), SSE (17.1), CLI (18.1). These are table-stakes; we match rather than lead.

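Where Nexus v4.0 is currently behind (Nexus ◐ or ✗ while at least one competitor ships ✓): Magentic-One-style orchestrator (6.19: only MS Agent Framework; we mark ✗ and defer), NL agent authoring (18.3: Gemini Enterprise leads; we mark ◐), IDE extension (18.7: LangGraph, Pydantic AI, and Claude Code lead; we mark ◐), SCIM provisioning (12.8: LangSmith, Azure, and AWS ship it; we mark ◐). These are the planned follow-ups for v4.1. The tables also mark Nexus ◐ against at least one competitor ✓ on the role-based agent DSL (6.13), the group-chat pattern (6.18), the code-execution tool (7.7), provider failover (14.10), and the visual workflow editor (18.2); the Nexus column of each of those rows records the planned handling.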

The catalog deliberately over-scores the competitors (any partial implementation is marked ◐, not ✗) to avoid flattering ourselves. Even so, on the twenty-odd rows that constitute v4.0's differentiating wedges (airgap, crypto tenant isolation, FinOps primitives, replay plus C2PA, bindings as a first-class substrate), the market shows a consistent gap.

4. Adverant Nexus Stack v3: Current State

This section maps the production Nexus stack as running on 24 April 2026 — v6.2.1, 44 microservices on K3s at the Adverant cloud VPS, backed by PostgreSQL with row-level security, Redis, Neo4j, and Qdrant. Every file path cited resolves in the Adverant-Nexus monorepo. Where the UNO paper describes architecture that has since been revised, we cite the UNO section and the subsequent migration that changed it.

4.1 The Unified Nexus Orchestrator

UNO is the single dispatch entry point. The authoritative route is POST /api/v1/dispatch in services/nexus-orchestrator/src/routes/dispatch-routes.ts. For each request, UNO validates the payload via Zod schemas (services/nexus-orchestrator/src/types/dispatch.ts); resolves the job type to a skill from ros.tool_registry (not from graphrag.skill_registry, the registry described in Section 6 of the UNO paper, which was dropped in the Q2 2026 Skills Engine consolidation); runs governance pre-checks for risk classification and data residency; inserts a row into orchestrator.runs via services/nexus-orchestrator/src/services/run-tracker.ts; enqueues the job to BullMQ with priority mapping; and returns HTTP 202 Accepted. Chain DAG coordination is handled by services/nexus-orchestrator/src/services/chain-coordinator.ts, which maintains a state machine reacting to step-completion callbacks. Job events flow out via services/nexus-orchestrator/src/services/ws-emitter.ts to Redis Pub/Sub channels named nexus:jobs:org:{orgId}.
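The dispatch sequence above (validate, resolve, pre-check, record, enqueue, acknowledge) can be sketched as a single function. This is a minimal illustration under stated assumptions, not the code in dispatch-routes.ts: the request shape, the in-memory stand-ins for ros.tool_registry, orchestrator.runs, and the BullMQ queue, the priority numbers, and every status code other than 202 are hypothetical, and validation is done by hand rather than with Zod to keep the sketch dependency-free.

```typescript
// Hypothetical request shape; the real Zod schemas live in types/dispatch.ts.
interface DispatchRequest {
  orgId: string;
  jobType: string;
  priority: "low" | "normal" | "high";
}

interface QueuedJob {
  runId: string;
  skillId: string;
  priority: number;
}

interface DispatchResult {
  status: 202 | 400 | 403 | 404; // only 202 is stated in the text
  runId?: string;
  error?: string;
}

// In-memory stand-ins for ros.tool_registry, the governance pre-check,
// orchestrator.runs, and the BullMQ queue.
interface DispatchDeps {
  registry: Map<string, string>; // job type -> skill id
  residencyOk: (orgId: string) => boolean;
  runs: string[];
  enqueue: (job: QueuedJob) => void;
}

// Assumed priority mapping; BullMQ treats lower numbers as higher priority.
const PRIORITY: Record<DispatchRequest["priority"], number> = {
  high: 1,
  normal: 5,
  low: 10,
};

function dispatch(req: Partial<DispatchRequest>, deps: DispatchDeps): DispatchResult {
  const { orgId, jobType, priority } = req;
  // 1. Validate the request shape.
  if (!orgId || !jobType || !priority) {
    return { status: 400, error: "invalid dispatch request" };
  }
  // 2. Resolve the job type to a skill.
  const skillId = deps.registry.get(jobType);
  if (skillId === undefined) {
    return { status: 404, error: `no skill registered for ${jobType}` };
  }
  // 3. Governance pre-check (only data residency is modelled here).
  if (!deps.residencyOk(orgId)) {
    return { status: 403, error: "data residency pre-check failed" };
  }
  // 4. Record the run, 5. enqueue it, 6. acknowledge without executing.
  const runId = `run-${deps.runs.length + 1}`;
  deps.runs.push(runId);
  deps.enqueue({ runId, skillId, priority: PRIORITY[priority] });
  return { status: 202, runId };
}
```

The function never calls a model or a tool; it only records and enqueues, which is the property the 202 response is meant to signal.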

The four-tier execution taxonomy is defined in services/nexus-orchestrator/src/types/execution.ts:


```typescript
type ExecutionTier = 'llm_only' | 'tool_using' | 'chain' | 'autonomous';
```

Tier 1 (llm_only) inlines for timeouts under 30 seconds; Tier 2 (tool_using) runs ReAct loops up to execution_config.maxIterations; Tier 3 (chain) runs DAGs via the chain coordinator; Tier 4 (autonomous) is declared in the type but, as discussed in Section 5, has no concrete HITL waitpoint code, no cost cap, and no determinism-replay semantics in v3.
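As a rough sketch of how the tier decides the execution path, the function below maps each tier to a plan. The ExecutionConfig shape, the plan names, and the choice to queue Tier 1 calls at or above the 30-second threshold are assumptions made for illustration; only the threshold itself, maxIterations, and the v3 status of Tier 4 come from the text.

```typescript
type ExecutionTier = "llm_only" | "tool_using" | "chain" | "autonomous";

// Hypothetical config shape; only maxIterations is named in the text.
interface ExecutionConfig {
  timeoutMs: number;
  maxIterations: number;
}

// Hypothetical plan names used only in this sketch.
type ExecutionPlan =
  | { mode: "inline" }
  | { mode: "queued" }
  | { mode: "react_loop"; maxIterations: number }
  | { mode: "chain_dag" }
  | { mode: "unimplemented"; reason: string };

const INLINE_TIMEOUT_MS = 30000; // the 30-second inline threshold

function planExecution(tier: ExecutionTier, config: ExecutionConfig): ExecutionPlan {
  switch (tier) {
    case "llm_only":
      // Assumption: calls at or above the threshold go through the queue.
      return config.timeoutMs < INLINE_TIMEOUT_MS ? { mode: "inline" } : { mode: "queued" };
    case "tool_using":
      // ReAct loop bounded by execution_config.maxIterations.
      return { mode: "react_loop", maxIterations: config.maxIterations };
    case "chain":
      // DAG handled by the chain coordinator.
      return { mode: "chain_dag" };
    case "autonomous":
      return {
        mode: "unimplemented",
        reason: "Tier 4 has no HITL waitpoint, cost cap, or replay semantics in v3",
      };
  }
}
```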

4.2 The Skills Engine

The Adverant-Nexus-Skills-Engine (documented in Adverant-Nexus-Skills-Engine/docs/skill-format.md) is the source of truth for skill metadata. 729+ SKILL.md files across the plugin fleet declare capabilities, tool requirements, triggers, visibility, and status. The engine's LLM client (Adverant-Nexus-Skills-Engine/src/services/llm-client.ts) routes calls to the gateway AI Provider Router, with exponential-backoff retry and SSE streaming for long-running (400-second) operations. Skills are resolved at dispatch time by ros.tool_registry.job_type lookup. A unified skill registry UI does not yet exist; skills are browsable only programmatically.
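The exponential-backoff retry mentioned above might look like the following. This is a generic sketch rather than the contents of llm-client.ts: the option names, the doubling factor, the delay cap, and the absence of jitter are all assumptions.

```typescript
// Hypothetical options; none of these names or values come from llm-client.ts.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// Delay before the retry that follows a failed attempt (0-based):
// base, 2x base, 4x base, ... capped at maxDelayMs. Jitter is omitted.
function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** attempt, opts.maxDelayMs);
}

// Retries `call` until it succeeds or maxAttempts is exhausted.
// `sleep` is injectable so callers and tests can avoid real waiting.
async function withRetry<T>(
  call: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => {
      setTimeout(resolve, ms);
    }),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // No sleep after the final attempt; the error is rethrown instead.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

A production client would normally add random jitter and retry only on errors known to be transient; both are left out here to keep the schedule easy to read.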

4.3 The AI Provider Router

The AI Provider Router (services/nexus-gateway/src/services/ai-provider-router.ts) is the single service authorised to call external LLM APIs. The ALLOWED_AI_CALLERS principal list — enforced at three layers (Istio AuthorizationPolicy, service-key HMAC, caller-identity verification) — contains exactly three services: nexus-orchestrator, nexus-workflows, and chat-orchestrator (the chat exception). Four adapters implement the provider abstraction: GeminiAdapter, AnthropicAdapter, ClaudeMaxAdapter, and OpenRouterAdapter. Per-organization configuration is resolved via resolveOrgConfig() hitting nexus-auth, which returns AES-256-decrypted keys. Role-based routing — default, fast, reasoning, code, long-context — derives from roleAssignments.default in the org config. The tool-calling loop runs up to MAX_TOOL_ITERATIONS rounds, stopping when no tool calls are returned or the iteration limit is reached.
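Two mechanisms in this paragraph lend themselves to a short sketch: the caller allowlist and the bounded tool-calling loop. The code below is illustrative only. The three principal names come from the text; the ModelTurn and ToolCall shapes, the synchronous callbacks, and the value 8 for MAX_TOOL_ITERATIONS are assumptions (the text does not state the limit, and the real router is asynchronous).

```typescript
// The three principals named in the text.
const ALLOWED_AI_CALLERS: ReadonlySet<string> = new Set([
  "nexus-orchestrator",
  "nexus-workflows",
  "chat-orchestrator",
]);

function assertAllowedCaller(caller: string): void {
  if (!ALLOWED_AI_CALLERS.has(caller)) {
    throw new Error(`caller ${caller} is not authorised to reach LLM providers`);
  }
}

// Hypothetical shapes for one model turn and one tool call.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}
interface ModelTurn {
  text: string;
  toolCalls: ToolCall[];
}
interface LoopResult {
  text: string;
  iterations: number;
  stoppedBy: "no_tool_calls" | "iteration_limit";
}

const MAX_TOOL_ITERATIONS = 8; // assumed value; not given in the text

// Runs model turns until a turn returns no tool calls or the limit is hit.
function runToolLoop(
  callModel: (history: string[]) => ModelTurn,
  runTool: (call: ToolCall) => string,
  prompt: string,
  maxIterations: number = MAX_TOOL_ITERATIONS,
): LoopResult {
  const history: string[] = [prompt];
  let last: ModelTurn = { text: "", toolCalls: [] };
  for (let i = 1; i <= maxIterations; i++) {
    last = callModel(history);
    if (last.toolCalls.length === 0) {
      return { text: last.text, iterations: i, stoppedBy: "no_tool_calls" };
    }
    // Feed each tool result back into the history for the next turn.
    for (const call of last.toolCalls) {
      history.push(`${call.name} -> ${runTool(call)}`);
    }
  }
  return { text: last.text, iterations: maxIterations, stoppedBy: "iteration_limit" };
}
```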

4.4 WebSocket Event Relay and PCC

Job events flow from the orchestrator to the dashboard through a three-layer relay. The orchestrator publishes to Redis Pub/Sub channel nexus:jobs:org:{orgId}. services/nexus-gateway/src/websocket/job-event-relay.ts subscribes with pattern matching and emits to Socket.IO rooms — org-plus-plugin rooms are the v3 target state, with org-plus-user rooms retained under a compat mode mirror. A ring buffer of 50 events per channel prevents event loss during transient Redis reconnections. The Progress Command Center (nexus-dashboard/src/stores/progress-command-center-store.ts) is a Zustand store with localStorage persistence that subscribes to these events and exposes the TrackedJob model: jobId, runId, status, stage, progress, steps, ReAct thinking log, tool-call trace, billing breakdown, governance metadata (risk level, data residency, flagged-for-review), and HPC session state (log buffer up to 500 lines, GPU metrics).
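A per-channel ring buffer of the kind described can be sketched in a few lines. The capacity of 50 is from the text; the JobEvent shape and the idea of replaying by a per-channel sequence number are assumptions, since the text does not say how a reconnecting subscriber identifies the events it missed.

```typescript
// Hypothetical event shape; the per-channel sequence number is an assumption.
interface JobEvent {
  seq: number;
  jobId: string;
  type: string;
}

// Keeps only the most recent `capacity` events for one channel.
class EventRingBuffer {
  private readonly events: JobEvent[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 50) {
    this.capacity = capacity;
  }

  push(event: JobEvent): void {
    this.events.push(event);
    // Evict the oldest event once the buffer is over capacity.
    if (this.events.length > this.capacity) {
      this.events.shift();
    }
  }

  // Events a subscriber missed, given the last sequence number it saw.
  replayAfter(lastSeenSeq: number): JobEvent[] {
    return this.events.filter((e) => e.seq > lastSeenSeq);
  }

  size(): number {
    return this.events.length;
  }
}
```

With a capacity of 50, a subscriber that falls more than 50 events behind cannot be fully caught up from the buffer alone, which is consistent with the buffer being described as protection against transient reconnections only.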

4.5 Multi-tenant Isolation (Four Layers)

Layer 1 is middleware: Express and Next.js middleware reject requests lacking X-Organization-Id, X-App-Id, X-User-Id headers (JWT-derived). Layer 2 is PostgreSQL row-level security: session variables app.current_company_id, app.current_app_id, app.current_user_id gate every SELECT, INSERT, UPDATE, and DELETE via RLS policies. Layer 3 is payload-and-label filtering: Qdrant filters searches by org_id in vector payloads, and Neo4j Cypher queries carry WHERE org_id = $orgId clauses. Layer 4 is Istio: AuthorizationPolicy per service with SPIFFE identity verification, mTLS between pods, and NetworkPolicy whitelists.
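Layers 1 and 3 can be illustrated without any framework. The sketch below checks the three tenant headers and builds an org-scoped Qdrant filter. The header names and the org_id payload key come from the text; the function names, the result types, and the lowercase header keys (as Node's HTTP layer presents them) are assumptions, and the real middleware derives these values from the JWT rather than trusting raw headers.

```typescript
interface TenantContext {
  orgId: string;
  appId: string;
  userId: string;
}

type TenantResult =
  | { ok: true; tenant: TenantContext }
  | { ok: false; missing: string[] };

// Header names from the text, lowercased as Node presents them.
const TENANT_HEADERS = {
  orgId: "x-organization-id",
  appId: "x-app-id",
  userId: "x-user-id",
} as const;

// Layer 1: reject any request that lacks one of the three tenant headers.
function extractTenant(headers: Record<string, string | undefined>): TenantResult {
  const orgId = headers[TENANT_HEADERS.orgId];
  const appId = headers[TENANT_HEADERS.appId];
  const userId = headers[TENANT_HEADERS.userId];
  if (!orgId || !appId || !userId) {
    const missing: string[] = [];
    if (!orgId) missing.push(TENANT_HEADERS.orgId);
    if (!appId) missing.push(TENANT_HEADERS.appId);
    if (!userId) missing.push(TENANT_HEADERS.userId);
    return { ok: false, missing };
  }
  return { ok: true, tenant: { orgId, appId, userId } };
}

// Layer 3: a Qdrant payload filter restricting a search to one organization.
function qdrantOrgFilter(tenant: TenantContext) {
  return { must: [{ key: "org_id", match: { value: tenant.orgId } }] };
}
```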

4.6 Adverant Nexus CLI (v1)

The current CLI auto-discovers 44+ microservices from Docker Compose, Kubernetes, and OpenAPI specifications, exposes 70+ MCP tools as commands, and supports a ReAct agent mode with up to 20 autonomous iterations. Commands include nexus services list, nexus mcp tools, nexus ask "<prompt>", nexus workflows list | run, nexus session save | resume, and nexus monitor. Notable gaps: the CLI is not integrated with orchestrator /api/v1/dispatch; there is no streaming tail of run events; no chain DAG visualization; no skill publish, sign, or verify; no airgap bundle build; and no PCC mirror in-terminal.

4.7 Marketplace Plugin Template

The plugin template (/Users/don/Adverant/plugin-template/) scaffolds a Next.js 14 frontend (static export) with a JWT-protected PluginGate component, Zustand stores for dashboard auth, PCC integration, WebSocket state, and theme, plus hooks for iframe embedding detection and Terminal Computer page context. The backend is Express with TypeScript, JWT middleware, and service layers. nexus.manifest.json declares plugin metadata. Plugins dispatch jobs via POST /api/v1/dispatch with a plugin-scoped trace context. The template does not yet support declarative button bindings in the manifest — that gap is closed in v4.0 Section 7.15.

4.8 Where v3 Stops Short

The UNO paper honestly disclosed seven open gaps: Phase 7 multi-provider routing partial, Phase 8 tool executors partial, per-queue pod deployments unrealized, token-quota precision issues, chain engine formalization incomplete, span storage tiering absent, multi-agent cost controls unfinished. Beyond these, v4.0 addresses ten additional gaps identified through retrospective: Tier 4 HITL specification, chain state persistence beyond Redis TTL, skill versioning enforcement, scheduled dispatch API unification, centralized skill registry UI, cost attribution granularity, multi-agent orchestration formalism, airgapped deployment documentation, event ordering guarantees, and provider failover.

5. Gap Analysis

We separate seven industry gaps (unsolved by any of the twelve surveyed frameworks) from ten v3-internal gaps (identified through retrospective on the UNO migration). Each gap is numbered for cross-reference in Section 7.

5.1 Industry Gaps (Gaps A–G)

Gap A: Airgapped multi-agent deployment with signed plugin bundles. Only a handful of vendors (Tabnine for code, Plane for project management) offer true airgapped operation. None of the twelve surveyed frameworks ships a turnkey airgapped stack that combines signed MCP server bundles, offline verification, offline model weights, and an airgapped skill marketplace [43][44][45].

Gap B — Cryptographic per-tenant isolation across shared observability. Logical tenant scopes (LangSmith workspaces, Azure tenant boundaries) are not cryptographic. Shared observability backends can leak reasoning chains and tool outputs across tenants. No framework treats traces as tenant-scoped data with per-tenant encryption keys [46][47].

Gap C — Skill or plugin marketplace with provenance plus SBOM plus quality scoring. Claude's MCP directory lists 75+ connectors but ships no SBOM, no signed provenance, no runtime quality score. AgentCore ships coding-assistant skills but no marketplace. CrewAI has hundreds of tools with no quality rating. This is the npm supply-chain problem waiting to happen [48].

Gap D — Cross-framework agent portability beyond A2A. A2A specifies a wire protocol but not a portable agent definition. MS Agent Framework YAML, OpenAI primitives, LangGraph StateGraph, and Gemini ADK are not interchangeable. There is no "Dockerfile for agents."

Gap E — Deterministic replay and exactly-once for long-horizon agents. LangSmith Deployment, MS Agent Framework, and Pydantic AI Harness claim durability in various forms, but no industry-standard replay protocol guarantees that given a run identifier you can reconstruct every tool call, every LLM output, and every state transition bit-for-bit. Regulated industries require this; no framework delivers it cleanly.

Gap F — Agent-level FinOps. Token budgets per run, per tenant, per skill; cost attribution to business units; automatic circuit-breakers when a runaway agent burns through spend. AgentCore has partial policy controls; LangSmith tracks cost per trace. No framework ships turnkey agent-FinOps.

Gap G — Chain-of-custody for AI-generated artefacts. When an agent produces code, a document, a design, or a decision, where is the auditable chain — model, prompt, tool calls, human approvals, skill version — that produced it? Left as an implementation exercise by all twelve.

5.2 v3-Internal Gaps (Gaps 1–10)

1. Tier 4 specification. Declared in ExecutionTier type; no HITL waitpoint code; no cost cap; no replay.
2. Chain state persistence. Redis 24-hour TTL is inadequate for long-horizon chains; needs Postgres-backed persistence.
3. Skill versioning enforcement. SKILL.md declares version; dispatch does not pin or validate it.
4. Scheduled dispatch. Scheduling lives in nexus-workflows; should unify with UNO for governance.
5. Skill registry UI. 729 skills across plugins with no unified discovery surface.
6. Cost attribution granularity. Per-run cost tracked; per-span and per-step are not.
7. Multi-agent orchestration formalism. CLI supports up to 10 concurrent agents; orchestrator dispatch is single-skill; formal multi-agent handoff is not codified.
8. Airgapped deployment documentation. K3s manifests are offline-compatible but no sealed-bundle flow.
9. Event ordering guarantees. Redis Pub/Sub is best-effort; critical workflows may need event log durability.
10. Provider failover. No documented fallback when the primary provider returns 5xx persistently.

5.3 Gap Prioritization

We prioritize by commercial leverage relative to implementation cost. Gaps A, B, C, E, F, G, and v3-internal Gaps 1, 2, 3, 6 rank highest: they are defensible wedges with tractable implementations. Gap D (portability) ranks lower because it requires multi-vendor standardization beyond Adverant's unilateral control, and the A2A protocol already covers the runtime interop case.

6. v4.0 Principles

The v4.0 architecture follows five principles derived from the gap analysis and the UNO paper's retrospective.

Principle 1 — Dispatch does not execute. Retained from UNO. The orchestrator validates, resolves, classifies risk, and enqueues; it never calls an LLM, never invokes a tool, never waits on execution. This survives v4.0 unchanged.

Principle 2 — Execute does not call an LLM directly. Already enforced by ALLOWED_AI_CALLERS at three layers. In v4.0, this is extended: skill authors cannot instantiate LLM clients, provider SDKs, or HTTP calls to model APIs. The AI Provider Router is the only hot path to providers, and every skill consumes it via the shared client.

Principle 3 — Every action is a signed span. Every orchestrator operation, every tool invocation, every LLM call, every human approval, every artefact produced carries a span that is signed (C2PA manifest for artefacts, cryptographic hash chain for spans) and retained for the compliance-framework-specified minimum (seven years for FedRAMP, six years for HIPAA).

Principle 4 — Every skill is a versioned, signed, measured artefact. SKILL.md v2 is a signed manifest with an SBOM, a semantic version, a risk-tier classification, an adversarial-eval record, a quality score that updates from runtime telemetry, and an auto-rollback policy when the quality score falls below a tenant-configurable threshold.

Principle 5 — Every tenant is cryptographically isolated. Logical scopes are insufficient. Memory Bank payloads, span reasoning chains, artefact bytes, and binding metadata are encrypted with a per-tenant key-encryption-key (KEK) held in the nexus-auth KMS — HSM-backed in cloud profiles, TPM-backed in airgapped profiles. Cross-tenant leakage requires breach of both the KMS and the storage backend.

These five principles govern every v4.0 feature in Section 7.

7. Adverant Nexus Stack v4.0: The Proposal

This section is the v4.0 architectural core. Each sub-section specifies one primitive. We defer diagrams to Section 8, use cases to Section 9, migration mapping to Section 10, and appendix-depth schemas to Appendices A through J.

7.1 Principles (Summary)

See Section 6. Summarized: dispatch does not execute; execute does not call an LLM directly; every action is a signed span; every skill is a versioned signed measured artefact; every tenant is cryptographically isolated.

7.2 Execution Tiers Reframed

The four-tier taxonomy — llm_only, tool_using, chain, autonomous — is retained. What changes is Tier 4: it becomes a concrete state machine rather than a declared type. The Tier 4 state machine has five states (start, plan, execute, review, complete) with documented transitions: sub-agent spawning from execute bounded by max_sub_agents and cumulative cost_cap; human-in-the-loop waitpoints triggered from review whenever the bound skill metadata or run-specific binding override requires it (risk-tier high, compliance framework mandate, or explicit override); replan transitions from review back to plan when a quality evaluation falls below threshold or a human reviewer rejects. On-exit hooks include OnHITLPause, OnHITLResume, OnCostThreshold, and OnTierEscalation (triggered when a Tier 2 run exceeds its iteration limit and escalates automatically to Tier 4 with human oversight). Tier 4 state is persisted in orchestrator.chain_runs (repurposed) and orchestrator.autonomous_runs (new table); Redis is used only for short-lived coordination, never as the system of record.
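The transitions above can be sketched as a small table-driven state machine. This is a minimal illustration, not the orchestrator's implementation: the signal names (begin, plan_ready, execution_done, approve, reject), the function names, and the Limits and Usage shapes are all hypothetical, and persistence to orchestrator.autonomous_runs is omitted.

```typescript
// Hypothetical names throughout; the paper defines the states, not this API.
type State = "start" | "plan" | "execute" | "review" | "complete";
type Signal = "begin" | "plan_ready" | "execution_done" | "approve" | "reject";

// Allowed transitions. "reject" models the replan edge from review back to plan
// (quality evaluation below threshold, or a human reviewer rejecting).
const TRANSITIONS: Record<State, Partial<Record<Signal, State>>> = {
  start: { begin: "plan" },
  plan: { plan_ready: "execute" },
  execute: { execution_done: "review" },
  review: { approve: "complete", reject: "plan" },
  complete: {},
};

// Returns the next state, or throws when the signal is not valid in this state.
function nextState(state: State, signal: Signal): State {
  const target = TRANSITIONS[state][signal];
  if (target === undefined) {
    throw new Error(`illegal transition from ${state} on ${signal}`);
  }
  return target;
}

interface Limits { maxSubAgents: number; costCapUsd: number; }
interface Usage { subAgents: number; spentUsd: number; }

// Sub-agents may only be spawned from execute, and only within both bounds.
function canSpawnSubAgent(
  state: State,
  usage: Usage,
  limits: Limits,
  estimateUsd: number,
): boolean {
  return (
    state === "execute" &&
    usage.subAgents < limits.maxSubAgents &&
    usage.spentUsd + estimateUsd <= limits.costCapUsd
  );
}
```

Keeping the transitions in a data table rather than in branching code makes the legal edges easy to audit and to render in a diagram.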

7.3 Skill Marketplace 2.0

SKILL.md v2 is a signed manifest. Publication proceeds through the nexus skill publish CLI: lint the manifest, generate an SBOM from dependencies, run an adversarial-eval suite (prompt injection, tool abuse, scope creep), classify the risk tier against the EU AI Act taxonomy (minimal, limited, high, unacceptable), cross-reference MITRE ATLAS techniques and OWASP LLM Top 10 mitigations, sign with sigstore, bump the semantic version, and insert into ros.skill_definitions v2 plus ros.skill_versions (a new table holding every published version). At dispatch time the runtime verifies the signature, checks that no component listed in the SBOM matches a known CVE in the NVD feed (updated daily, cached in airgapped deployments), confirms the quality score meets the tenant's threshold, confirms the risk tier is allowed by the tenant's policy, and confirms the export tags are compatible with the tenant's jurisdiction. Runtime telemetry continuously updates the quality score; if it drops below the auto-deactivate threshold, the binding referencing this skill is set to is_active=false and traffic falls back to the next-priority binding on the same binding_key. Gap C closed.
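The dispatch-time checks can be sketched as one admission function that returns every reason for refusal. This is a hedged illustration: the field names, the reason strings, and the 0 to 1 quality scale are assumptions, signature verification is represented by a precomputed boolean, and "compatible export tags" is simplified to set membership.

```typescript
type RiskTier = "minimal" | "limited" | "high" | "unacceptable";

// Hypothetical shapes; the real manifest schema lives in the appendices.
interface SkillVersion {
  signatureValid: boolean;   // outcome of signature verification, performed elsewhere
  sbomComponents: string[];  // e.g. "some-lib@1.2.3"
  qualityScore: number;      // assumed to be normalised to 0..1
  riskTier: RiskTier;
  exportTags: string[];
}

interface TenantPolicy {
  qualityThreshold: number;
  allowedRiskTiers: RiskTier[];
  permittedExportTags: string[];
}

// Returns the list of refusal reasons; an empty list means the skill is admitted.
function admissionFailures(
  skill: SkillVersion,
  policy: TenantPolicy,
  vulnerableComponents: string[], // components with known CVEs, from the cached feed
): string[] {
  const failures: string[] = [];
  if (!skill.signatureValid) failures.push("signature_invalid");

  const hits = skill.sbomComponents.filter((c) => vulnerableComponents.indexOf(c) !== -1);
  if (hits.length > 0) failures.push(`known_cve:${hits.join(",")}`);

  if (skill.qualityScore < policy.qualityThreshold) failures.push("quality_below_threshold");
  if (policy.allowedRiskTiers.indexOf(skill.riskTier) === -1) failures.push("risk_tier_not_allowed");

  const badTags = skill.exportTags.filter((t) => policy.permittedExportTags.indexOf(t) === -1);
  if (badTags.length > 0) failures.push(`export_tag_not_permitted:${badTags.join(",")}`);

  return failures;
}
```

Collecting all failures instead of stopping at the first one gives the caller a complete troubleshooting payload in a single refusal.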

7.4 Hooks as First-Class Primitive

Hooks are the extensibility surface of v4.0. Every dispatch-time and execution-time event is a hook point: PreDispatch, PostSkillResolve, PreTierSelect, PreLLMCall, PostLLMCall, PreToolUse, PostToolUse, PreChainStep, PostChainStep, OnCostThreshold, OnIterationLimit, OnTierEscalation, OnHITLPause, OnHITLResume, PostDispatch. Each hook is declared as a manifest (YAML) scoped at org, skill, or plugin level with a matcher expression (e.g., tool.name == 'write_file' && path.startsWith('/etc')) and an action (deny, rewrite, require_hitl, emit_event, call_webhook, policy_ref). Policy references resolve to OPA/Rego bundles. The design adopts the Claude Agent SDK hook pattern [30][31] but elevates hooks from SDK primitives to platform primitives: they run server-side in the orchestrator and workflows, not in the model client. Gap F (FinOps) and part of Gap E (replay) are closed through hooks: OnCostThreshold enforces budgets, and every hook invocation is a span in the replay record.
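Hook evaluation at a single hook point might look like the sketch below. It makes several assumptions: the matcher expression is represented as an already-compiled predicate rather than parsed from YAML, only four of the hook points are typed, and the rule that a single deny overrides every other matched action is an illustrative choice the text does not specify.

```typescript
type HookPoint = "PreToolUse" | "PostToolUse" | "PreLLMCall" | "OnCostThreshold";
type HookAction =
  | "deny" | "rewrite" | "require_hitl" | "emit_event" | "call_webhook" | "policy_ref";
type HookScope = "org" | "skill" | "plugin";

interface ToolEvent { tool: { name: string }; path: string; }

interface Hook {
  point: HookPoint;
  scope: HookScope;
  matches: (event: ToolEvent) => boolean; // stands in for the manifest's matcher expression
  action: HookAction;
}

// Collects the actions of every hook that matches this point and event.
// Assumption: a single "deny" overrides all other matched actions.
function actionsFor(point: HookPoint, event: ToolEvent, hooks: Hook[]): HookAction[] {
  const matched: HookAction[] = hooks
    .filter((h) => h.point === point && h.matches(event))
    .map((h) => h.action);
  const denyOnly: HookAction[] = ["deny"];
  return matched.some((a) => a === "deny") ? denyOnly : matched;
}

// The matcher from the text: tool.name == 'write_file' && path.startsWith('/etc')
const exampleHooks: Hook[] = [
  {
    point: "PreToolUse",
    scope: "org",
    action: "deny",
    matches: (e) => e.tool.name === "write_file" && e.path.startsWith("/etc"),
  },
  {
    point: "PreToolUse",
    scope: "skill",
    action: "emit_event",
    matches: (e) => e.tool.name === "write_file",
  },
];
```

With these two hooks, a write to a path under /etc is denied outright, while a write elsewhere only emits an event.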

7.5 Memory Bank with Cryptographic Per-Tenant Isolation

Memory Bank is the long-term memory service (short-term state remains in Postgres orchestrator.runs). Payloads are envelope-encrypted: a per-value data-encryption key (DEK) encrypts the payload; the DEK is wrapped with the tenant KEK held in nexus-auth KMS. In cloud profiles the KEK is HSM-backed (FIPS 140-3); in airgapped profiles it is TPM-backed. Rotation is quarterly by policy or on-demand. Crypto-erasure (Gap A adjacent): deleting a subject's memories can be implemented as DEK destruction without touching the ciphertext, which remains unreadable. Gap B closed.

7.6 A2A and MCP Dual Plane

Tool use flows through MCP; agent-to-agent flows through A2A. We retain MCP for the tool plane [1][2][28] and add A2A as a first-class primitive for the agent plane, enabling interoperation with Gemini Enterprise agents, Microsoft Agent Framework agents, third-party CrewAI workflows, and Bedrock AgentCore agents without protocol bridges. In airgapped profiles A2A discovery is restricted to local SPIFFE identities; cross-cluster A2A is only available in cloud and VDS profiles. The two planes are orthogonal: an agent uses MCP to call a tool; it uses A2A to delegate to another agent.

7.7 Structured Output and Self-Correction

Every skill contract in v4.0 declares an input_schema and an output_schema (JSON Schema or Pydantic model). Invalid LLM output triggers automatic retry with a diagnostic prompt containing the validation error; the retry ceiling is three, after which the run escalates (OnIterationLimit hook). This is the Pydantic AI pattern [25] applied uniformly to every skill.
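The retry-with-diagnostic loop can be sketched as follows. This is a synchronous simplification under stated assumptions: callModel and validate are injected stand-ins (a real skill would go through the AI Provider Router and a JSON Schema validator), the diagnostic wording is invented, and "retry ceiling of three" is read as one initial attempt plus up to three retries.

```typescript
type Validation<T> =
  | { kind: "valid"; value: T }
  | { kind: "invalid"; error: string };

type Outcome<T> =
  | { status: "ok"; value: T; attempts: number }
  | { status: "escalated"; attempts: number; lastError: string }; // OnIterationLimit would fire here

function runWithSelfCorrection<T>(
  prompt: string,
  callModel: (prompt: string) => string,       // hypothetical stand-in for the shared client
  validate: (raw: string) => Validation<T>,    // hypothetical stand-in for schema validation
  maxRetries: number = 3,
): Outcome<T> {
  let current = prompt;
  let lastError = "";
  const maxAttempts = 1 + maxRetries;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = validate(callModel(current));
    if (result.kind === "valid") {
      return { status: "ok", value: result.value, attempts: attempt };
    }
    lastError = result.error;
    // Feed the validation error back to the model as a diagnostic prompt.
    current =
      `${prompt}\n\nYour previous output was invalid: ${result.error}\n` +
      "Return only output that satisfies the declared schema.";
  }
  return { status: "escalated", attempts: maxAttempts, lastError };
}

// Example validator: the output must be JSON with a numeric "score" field.
function validateScore(raw: string): Validation<{ score: number }> {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const score = (parsed as { score?: unknown }).score;
      if (typeof score === "number") return { kind: "valid", value: { score } };
    }
    return { kind: "invalid", error: "field 'score' must be a number" };
  } catch (_e) {
    return { kind: "invalid", error: "output is not valid JSON" };
  }
}
```

Passing the validator's error text back into the prompt is what distinguishes self-correction from a blind retry.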

7.8 Optimizer-Compiled Prompts

Skill prompts become compilable artefacts through DSPy-style optimizers (MIPROv2, GEPA) [36][37][38]. A skill declares a source prompt, few-shot examples, and a metric function; the optimizer produces an optimized variant tied to a specific skill version; runtime telemetry compares the variant's quality score against the incumbent; promotion requires a statistically significant improvement and no regression on a holdout set. Rollback is automatic when the deployed variant's quality score regresses.

7.9 Observability: Insights Agent and Polly-NL

The v3 twelve-type span tree [42] is retained as the substrate. On top of it we build two LangSmith-inspired primitives [8][24]: Insights Agent clusters spans into usage patterns and surfaces anomalies (cost hotspots, latency regressions, failure clusters) automatically; Polly-NL is a natural-language debug interface ("why was last night's chain run expensive?") that translates questions into span-tree queries and returns span citations. Storage tiering splits hot spans in Postgres (last 30 days) from cold spans in ClickHouse (older), addressing the UNO paper's 300 GB/month storage projection.

7.10 FinOps Governance

Per-org, per-skill, per-user, per-binding budgets are enforced pre-dispatch. Every dispatch arrives with an estimated cost (model rate × estimated token count); the orchestrator checks remaining budget and atomically reserves the estimate in a Redis counter. If the reservation fails the dispatch is refused with a troubleshooting JSON payload (per the "no fallbacks" contract). On each LLM call the actual cost is debited; unused reservation is refunded. Tier 4 runs carry a cumulative cost cap — exceedance triggers OnCostThreshold, which may pause for HITL or abort with partial results. Per-skill circuit-breakers open after a failure-rate threshold is breached over a window, rejecting new dispatches until the window expires. Gap F closed.
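The reserve, debit, and refund cycle can be sketched with an in-memory ledger. This stands in for the Redis counter only conceptually: it is single-process, so it shows the accounting rather than the atomicity, and the class name, method names, and the use of integer cents with a per-thousand-token rate are all assumptions made for the example.

```typescript
// Estimated cost = model rate x estimated token count, rounded up to whole cents.
function estimateCents(centsPerThousandTokens: number, estimatedTokens: number): number {
  return Math.ceil((centsPerThousandTokens * estimatedTokens) / 1000);
}

class BudgetLedger {
  private remainingCents: number;
  private held: Map<string, number> = new Map(); // runId -> reserved cents not yet spent

  constructor(budgetCents: number) {
    this.remainingCents = budgetCents;
  }

  // Reserve the estimate up front; refuse the dispatch if the budget cannot cover it.
  reserve(runId: string, estimate: number): boolean {
    if (estimate > this.remainingCents) return false;
    this.remainingCents -= estimate;
    this.held.set(runId, estimate);
    return true;
  }

  // Debit the actual cost of one LLM call against the run's reservation.
  // Returns false when the call exceeds what is still held (an overrun).
  debit(runId: string, actual: number): boolean {
    const held = this.held.get(runId);
    if (held === undefined || actual > held) return false;
    this.held.set(runId, held - actual);
    return true;
  }

  // Refund whatever the run did not spend and drop the reservation.
  release(runId: string): number {
    const held = this.held.get(runId);
    const unused = held === undefined ? 0 : held;
    this.held.delete(runId);
    this.remainingCents += unused;
    return unused;
  }

  remaining(): number {
    return this.remainingCents;
  }
}
```

For example, with a 500-cent budget, reserving 300 cents for one run causes a second 300-cent reservation to be refused; if the first run spends 120 cents and is released, 180 cents are refunded and 380 cents remain.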

7.11 Deterministic Replay and Chain-of-Custody

Every run is reconstructable bit-for-bit from: input hashes, pinned model version, prompt-template hash, temperature (zero or seeded PRNG), tool-output captures, span tree, hook invocation log. On invocation, the orchestrator records a replay manifest with all of the above; on replay, the worker is seeded with the manifest and plays back the same sequence. Artefacts leave v4.0 with C2PA content provenance manifests [49] signed by the tenant key, listing the model, skill version, prompt hash, tool calls, and human approvals. Gaps E and G closed simultaneously.

7.12 Airgapped Bundle Mode

A single signed tarball contains: Docker images with OCI labels and signatures, K8s manifests, pinned model weights, Postgres migration bundle, skill bundle (pre-signed), SBOM and licenses, provisioning secrets for TPM-backed KEKs, FIPS 140-3 cryptographic modules, STIG-compliant base images, and an installation manifest. nexus airgap install <bundle.tar.gz> verifies signatures, loads images to the local registry, applies manifests, runs migrations, seeds the skill registry, and TPM-wraps tenant KEKs. Delta bundles (monthly or on-demand) update in place. External network calls are structurally impossible: outbound AllowedIP is the empty set; A2A discovery returns only local SPIFFE identities. Gap A closed. FedRAMP High, DoD IL5, CJIS, and IRS Publication 1075 use cases become tractable.

7.13 Adverant Nexus CLI 2.0

The CLI evolves into a first-class dispatch, streaming, publish, airgap, governance, FinOps, hooks, memory, A2A, and debug client. Illustrative commands (full reference in Appendix C):

$ nexus dispatch ros.code_edit --input @in.json --tier tool_using --provider gemini --budget 5.00 --tail
$ nexus runs show <run_id> --spans-tree
$ nexus runs replay <run_id>
$ nexus chain visualize <run_id>
$ nexus skill publish ./skill-dir --sign --sbom
$ nexus skill rollback ros.code_edit v3.2.0
$ nexus airgap bundle --out ./bundle.tgz --skills all --models all
$ nexus governance export --framework soc2 --out ./soc2-audit.tgz
$ nexus finops burn-rate --org my-org --window 7d
$ nexus hooks apply ./hooks.yaml
$ nexus memory gc --org my-org --older-than 180d
$ nexus a2a peers list
$ nexus debug nl "why did run abc fail last night"

A WebSocket-backed --tail option streams spans in real time into the terminal, mirroring PCC content inside the CLI. CI integration is straightforward: a single GitHub Actions job can dispatch a skill, stream the result, and export governance evidence.

7.14 Governance, Compliance, and Security — Native, Not Bolted-On

Every major regulatory regime is a first-class v4.0 primitive with concrete enforcement points.

EU AI Act (Regulation (EU) 2024/1689, generally applicable from 2 August 2026) [50]. Risk-tier classification is stored in ros.skill_definitions.risk_tier (minimal / limited / high / unacceptable, already present in v3 as TrackedJob.riskLevel). Dispatch rejects unacceptable skills. high skills require an HITL waitpoint (Tier 4 state machine), a conformity-assessment record (stored in compliance.conformity_assessments), and post-market monitoring spans. Article 12 (logging) maps to the span tree. Article 13 (transparency) maps to synthesised-output watermarking plus model-card exposure through the CLI and dashboard. Article 14 (human oversight) maps to the Tier 4 HITL primitive. Article 15 (accuracy, robustness, cybersecurity) maps to DSPy optimizer metrics plus adversarial-eval suites. Article 26 (deployer obligations) maps to a per-org governance document auto-generated from the skill registry.

GDPR, UK GDPR, EU Data Act [51][52]. Data-residency tags (eu_only, us_only, any, or region codes) are enforced at the AI Provider Router and at storage. Right-to-erasure runs as a nexus erase-subject Tier 3 chain that atomically deletes from Postgres (RLS-scoped), Qdrant (payload filter), Neo4j (DETACH DELETE), object storage, and Memory Bank (DEK destruction for crypto-erasure, as in Section 7.5, so that other subjects' data under the same tenant KEK stays readable), then schedules backup-retention purge. DPIA artefacts are auto-generated per skill.

SOC 2 Type II, ISO/IEC 27001, ISO/IEC 42001 [53][54][55]. The span tree is the continuous-control evidence pipeline. Control identifiers are mapped to span types in Appendix G. ISO 42001-specific controls (AI risk assessment, AI impact assessment, AI system life-cycle management) attach to the skill-publication workflow.

HIPAA and HITRUST [56]. Protected health information tagging on skill bindings forces residency and provider constraints — no provider without a Business Associate Agreement is routable. Audit span retention is minimum six years. Covered-entity and business-associate roles are modelled in nexus-auth.

FedRAMP Moderate and High, DoD IL4 and IL5, CJIS, IRS Publication 1075 [57][58]. The airgapped bundle (Section 7.12) is the delivery vehicle. FIPS 140-3 validated cryptography; STIG-compliant base images; CAC/PIV SSO; audit retention minimum seven years.

NIST AI RMF 1.0 and NIST AI 600-1 [59][60]. GOVERN, MAP, MEASURE, MANAGE functions map to per-org policy documents, skill-registry metadata, quality-evals plus span analytics, and hooks plus FinOps plus HITL respectively.

Regional privacy laws — PIPL, DPDP, LGPD, PIPEDA, Australian Privacy Act. Expanded residency enum plus provider-allowlist table. Cross-border transfer records auto-generated per dispatch crossing jurisdictions.

OWASP LLM Top 10 (2025), MITRE ATLAS, OWASP Agentic AI Threats [61][62][63]. Per-skill threat model in the registry. Runtime enforcement (the LLM identifiers below follow the list's earlier v1.1 numbering; the 2025 edition renumbers and merges several entries): input classifier hook for LLM01 (prompt injection), structured-output schema for LLM02 (insecure output), signed skills plus SBOM for LLM05 (supply chain), output scanner hook for LLM06 (sensitive disclosure), capability allowlist hooks for LLM07 and LLM08 (insecure plugins, excessive agency), watermarks plus model cards for LLM09 (overreliance), rate limits plus auth plus airgap for LLM10 (model theft).

C2PA [49]. Every generated artefact leaves v4.0 with a C2PA manifest v2 signed by the tenant key.

Export controls — EAR, ITAR, EU Dual-Use Regulation 2021/821 [64][65]. Model and skill export-control tags; dispatch gate refuses cross-border use without an export-license record.

Cryptography and secrets. Org-level keys remain AES-256 in nexus-auth. v4.0 adds envelope encryption with per-tenant KEKs (HSM-backed cloud, TPM-backed airgap), quarterly rotation, and a nexus keys rotate CLI. Post-quantum hybrid X25519 plus ML-KEM for inter-service mTLS is on the forward-looking roadmap.

Enforcement architecture. Three gates enforce policy: Istio AuthorizationPolicy plus mTLS; service-key HMAC plus caller identity; and a per-dispatch OPA policy evaluator. Policies are versioned in a central bundle distributed to services.

Evidence and reporting. A Governance tab in the dashboard and a nexus governance export --framework <soc2 | iso27001 | iso42001 | eu-ai-act | nist-airmf | hipaa | fedramp> CLI command produce auditor-ready packages: control-to-evidence maps, span samples, policy versions, conformity-assessment records, DPIAs, model cards, adversarial-eval reports. Full traceability matrix in Appendix G.

7.15 UI/UX Bindings — User-Configurable Buttons to Workflows

The Bindings primitive generalizes the production ros.skill_bindings table (documented in Adverant-NexusROS/src/schemas/skill-bindings.schema.ts, migration database/migrations/030_skill_bindings.sql, routes src/routes/skill-bindings.ts, and the resolution skill src/skills/ros-skill-binding-resolve.ts) into a first-class substrate where every clickable action in any plugin or marketplace application is a Binding resolved at runtime to a skill plus tier plus provider plus model plus cost cap plus risk tier plus residency plus inputs mapping plus hook set.

Why. In v3, reconfiguring what a button does requires editing plugin source code and shipping a new version. In v4.0, a power user opens the Binding Editor, changes the skill or the tier or the model, saves — and the button now dispatches differently on their next click. No code deploy. Organizational admins and plugin authors retain veto through OPA override policies.

Resolution. Lookup by binding_key (a regex-validated string like lead.scoring.v2) proceeds through a four-level scope hierarchy: user > project > org > system. Within the most-specific matching scope, the active binding with the highest priority (0–1000) wins. If an A/B experiment is active on the binding_key, traffic splits by split_ratio (hashed user identifier). Configuration overrides merge via a precedence chain: skill_definition.config is the base, binding.config_overrides overrides, caller.runtime_overrides takes final precedence.
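Scope resolution and the configuration precedence chain can be sketched as below. The field names are camel-cased stand-ins for the binding columns, the system scope's identifier is assumed to be the literal string "system", the merge is assumed to be shallow, and the A/B split is left out.

```typescript
type Scope = "user" | "project" | "org" | "system";

// Most specific scope first, matching user > project > org > system.
const SCOPE_ORDER: Scope[] = ["user", "project", "org", "system"];

interface Binding {
  bindingKey: string;
  scope: Scope;
  scopeId: string;
  priority: number; // 0..1000, higher wins within a scope
  isActive: boolean;
  skillDefinitionId: string;
  configOverrides: Record<string, unknown>;
}

interface Caller { userId: string; projectId: string; orgId: string; }

// Walks scopes from most to least specific and returns the highest-priority
// active binding in the first scope that has any match.
function resolveBinding(key: string, caller: Caller, bindings: Binding[]): Binding | undefined {
  const scopeIds: Record<Scope, string> = {
    user: caller.userId,
    project: caller.projectId,
    org: caller.orgId,
    system: "system", // assumption: one global system scope
  };
  for (const scope of SCOPE_ORDER) {
    const candidates = bindings.filter(
      (b) => b.bindingKey === key && b.isActive && b.scope === scope && b.scopeId === scopeIds[scope],
    );
    if (candidates.length > 0) {
      return candidates.reduce((best, b) => (b.priority > best.priority ? b : best));
    }
  }
  return undefined;
}

// Precedence chain: skill config is the base, binding overrides win over it,
// and runtime overrides from the caller win over both (shallow merge assumed).
function mergeConfig(
  skillConfig: Record<string, unknown>,
  bindingOverrides: Record<string, unknown>,
  runtimeOverrides: Record<string, unknown>,
): Record<string, unknown> {
  return { ...skillConfig, ...bindingOverrides, ...runtimeOverrides };
}
```

Note that a low-priority user-scope binding still beats a high-priority org-scope binding, because priority is only compared within the most specific matching scope.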

Metadata. The full v4.0 binding metadata set spans identity (id, org, binding_key, scope, scope_id, priority), resolution target (skill_definition_id, skill_version_pin), execution (tier, provider_preference, model_preference, routing_hint, queue_name, response_format), cost and limits (cost_cap_usd, daily_cap_usd, token_cap_in, token_cap_out, timeout_ms, max_iterations, max_sub_agents), governance (risk_tier, data_residency, export_tags, requires_hitl, policy_refs, tier_restrictions, phi_tagged, compliance_frameworks), hooks (hooks[], allowed_tools[], denied_tools[]), inputs and mapping (input_schema, inputs_mapping, output_target), UI presentation (display_name, description, icon, placement[], confirmation, badge, shortcut), observability and experiments (ab_experiment_id, quality_score_threshold, telemetry_tags), and lifecycle (status, is_active, deleted_at, created_by, updated_by, config_overrides, conditions).

Plugin manifest declarative actions. Plugin authors declare bindings in nexus.manifest.json's actions[] array; on install these seed SYSTEM-scope bindings; on uninstall they are removed; on upgrade they are diff-reviewed. Authors specify defaults (skill, tier, provider, model, cost cap) that admins and users can override within policy.

Override policy (OPA). A user-scope binding save is routed through an OPA rule that prevents weakening of organizational governance: residency cannot be widened, the PHI-approved provider list must include the proposed provider, the proposed cost cap must not exceed the organizational maximum cost cap, HITL-mandatory-for-high-risk must be honoured, the proposed allowed-tools list must be a subset of the organizational allowed-tools list, and the user role must have bindings:write:<scope> permission. Denials return structured troubleshooting JSON (per the no-fallbacks contract).
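The rule logic can be illustrated in TypeScript, although the paper specifies OPA/Rego as the enforcement mechanism. Everything named here is hypothetical: the OrgGovernance and ProposedBinding shapes, the violation codes, the modelling of "residency cannot be widened" as membership in an org-permitted set, and the choice to apply the provider check only to PHI-tagged bindings.

```typescript
type RiskTier = "minimal" | "limited" | "high" | "unacceptable";

interface OrgGovernance {
  allowedResidency: string[];      // residency tags the organization permits
  phiApprovedProviders: string[];
  maxCostCapUsd: number;
  hitlMandatoryForHighRisk: boolean;
  allowedTools: string[];
}

interface ProposedBinding {
  scope: "user" | "project" | "org" | "system";
  dataResidency: string;
  phiTagged: boolean;
  provider: string;
  costCapUsd: number;
  riskTier: RiskTier;
  requiresHitl: boolean;
  allowedTools: string[];
}

interface Verdict { allow: boolean; violations: string[]; }

// Evaluates a proposed binding save and reports every violated rule.
function evaluateOverride(
  p: ProposedBinding,
  org: OrgGovernance,
  permissions: string[],
): Verdict {
  const violations: string[] = [];
  const has = (list: string[], item: string): boolean => list.indexOf(item) !== -1;

  if (!has(org.allowedResidency, p.dataResidency)) violations.push("residency_widened");
  if (p.phiTagged && !has(org.phiApprovedProviders, p.provider)) {
    violations.push("provider_not_phi_approved");
  }
  if (p.costCapUsd > org.maxCostCapUsd) violations.push("cost_cap_exceeds_org_max");
  if (org.hitlMandatoryForHighRisk && p.riskTier === "high" && !p.requiresHitl) {
    violations.push("hitl_required_for_high_risk");
  }
  const extraTools = p.allowedTools.filter((t) => !has(org.allowedTools, t));
  if (extraTools.length > 0) violations.push(`tools_not_allowed:${extraTools.join(",")}`);
  if (!has(permissions, `bindings:write:${p.scope}`)) violations.push("missing_write_permission");

  return { allow: violations.length === 0, violations };
}
```

Returning the full list of violations rather than a bare boolean matches the structured-denial style the text describes.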

Audit trail. Every binding change is an audit row (who, when, what, policy verdict). A config-drift detector flags bindings whose quality score drops post-edit. Bindings are observable in a Governance tab and exportable in the SOC 2 package.

Integration. Bindings are consumed by hooks (Section 7.4 — hooks referenced in the binding metadata run during the dispatch path), by FinOps (Section 7.10 — binding cost caps reserve against tenant budgets), by Memory Bank (Section 7.5 — binding metadata is encrypted with the tenant KEK if flagged phi_tagged or otherwise sensitive), and by Governance (Section 7.14 — binding metadata declares applicable frameworks and the override policy enforces them).

Full schema in Appendix J.

(Sections 8 through 14 and Appendices A through J follow.)

8. Reference Architecture

This section renders the v4.0 architecture in a canonical diagram set grouped by concern: current state (v3), proposed architecture (v4.0), user journeys, UI/UX mocks, compliance enforcement, and deployment profiles. Every diagram is also available as a standalone figure file (Mermaid and PlantUML sources in figures/). All widths capped at 110 columns for monospace readability. ASCII renderings below; equivalent SVG figures in the companion package.

Every current capability and every v4.0 capability has at least one diagram.

8.A Current State (v3) — Architecture

A1. Service topology — v3 (current, 44 services)

                              ┌─────────────────────────────────────────────────┐
                              │             ADVERANT NEXUS v6.2.1               │
                              │         K3s cluster @ 157.173.102.118           │
                              └─────────────────────────────────────────────────┘
                                                   │
      ┌──────────────────────┬──────────────────────┼──────────────────────┬──────────────────────┐
      │                      │                      │                      │                      │
      ▼                      ▼                      ▼                      ▼                      ▼
┌───────────┐        ┌──────────────┐       ┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│ INGRESS   │        │   GATEWAY    │       │ ORCHESTRATOR │       │  WORKFLOWS   │       │ SKILLS ENGINE│
│ Istio     │──mTLS─▶│ Socket.IO WS │──────▶│  Dispatch    │──────▶│  BullMQ      │──────▶│  SKILL.md    │
│ VirtSvc   │        │ AI Provider  │       │  Governance  │       │  Workers     │       │  tool_reg    │
└───────────┘        │  Router      │       │  Run Tracker │       │  Scheduler   │       └──────────────┘
                     └──────────────┘       └──────────────┘       └──────────────┘
                            │                       │                       │
                            │                       │                       │
                     ┌──────┴───────┐        ┌──────┴───────┐        ┌──────┴───────┐
                     │   NEXUS-AUTH │        │    REDIS     │        │   POSTGRES   │
                     │   AES-256    │        │  Pub/Sub     │        │  orchestr.   │
                     │   Org keys   │        │  BullMQ queue│        │  runs        │
                     │   RBAC       │        │  Ring buffer │        │  skill_reg   │
                     └──────────────┘        └──────────────┘        └──────────────┘
                                                     │                       │
                                                     ▼                       ▼
                                              ┌──────────────┐        ┌──────────────┐
                                              │    QDRANT    │        │    NEO4J     │
                                              │  embeddings  │        │  GraphRAG    │
                                              │  voyage 1024d│        │  entities    │
                                              └──────────────┘        └──────────────┘

   PLUGIN FLEET (marketplace):                           FRONTEND FLEET:
   ┌────────────┐ ┌────────────┐                        ┌────────────┐ ┌────────────┐
   │NexusROS    │ │ProseCreator│                        │ dashboard  │ │ adverant.ai│
   │ (UNO)      │ │ (writing)  │                        │ PCC + chat │ │ (marketing)│
   └────────────┘ └────────────┘                        └────────────┘ └────────────┘
   ┌────────────┐ ┌────────────┐ ┌────────────┐         ┌────────────┐ ┌────────────┐
   │NexusQA     │ │Forge       │ │EE-Design   │         │ plugin-UIs │ │ nexus-cli  │
   │ (testing)  │ │ (hardware) │ │ (PCB)      │         │ (N plugins)│ │ (CLI)      │
   └────────────┘ └────────────┘ └────────────┘         └────────────┘ └────────────┘

A2. UNO dispatch pipeline — v3 (8-step, current)

 Client          Orchestrator         Skills Eng.      AI Router         Workflows      Dashboard
   │                   │                   │               │                 │              │
   │ POST /dispatch    │                   │               │                 │              │
   ├──────────────────▶│                   │               │                 │              │
   │                   │ 1. Validate       │               │                 │              │
   │                   │    Zod schema     │               │                 │              │
   │                   │                   │               │                 │              │
   │                   │ 2. Resolve job_type → skill       │                 │              │
   │                   ├──────────────────▶│               │                 │              │
   │                   │◀──── exec_config──┤               │                 │              │
   │                   │                   │               │                 │              │
   │                   │ 3. Governance precheck            │                 │              │
   │                   │    (residency, risk)              │                 │              │
   │                   │                   │               │                 │              │
   │                   │ 4. Insert run row (orchestrator.runs)               │              │
   │                   │                   │               │                 │              │
   │                   │ 5. Enqueue BullMQ │               │                 │              │
   │                   ├───────────────────────────────────────────────────▶ │              │
   │ 202 Accepted      │                   │               │                 │              │
   │◀──────────────────┤                   │               │                 │              │
   │ {run_id,trace_id} │ 6. Publish job:queued             │                 │              │
   │                   │    to Redis Pub/Sub                                 │              │
   │                   │                                                     │              │
   │                   │                                              7. Worker dequeue     │
   │                   │                                                     │              │
   │                   │                                   ┌────Tier 1 llm_only─┐           │
   │                   │                                   │   call AI Router   │           │
   │                   │                                   ├────Tier 2 ReAct────┤           │
   │                   │                                   │   loop LLM+tool    │           │
   │                   │                                   ├────Tier 3 Chain────┤           │
   │                   │                                   │   DAG coordinator  │           │
   │                   │                                   ├────Tier 4 Auton────┤           │
   │                   │                                   │   (undefined)      │           │
   │                   │                                   └────────────────────┘           │
   │                   │                   │               │                 │              │
   │                   │ 8. job:* events → Redis → WS relay → Socket.IO → PCC              │
   │                   │◀──────────────────────────────────────────────────────────────────┤
   │                   │                                                                    │

A3. Execution tier matrix — v3 (current, with gaps)

┌───────┬──────────────┬─────────────┬────────────────┬─────────────┬──────────────────────────────┐
│ Tier  │  Name        │  LLM Calls  │  Tool Calls    │  State      │  Status in v3                │
├───────┼──────────────┼─────────────┼────────────────┼─────────────┼──────────────────────────────┤
│  1    │ llm_only     │  1          │  0             │ inline or   │ ✅ Deployed                   │
│       │              │             │                │ BullMQ      │                              │
├───────┼──────────────┼─────────────┼────────────────┼─────────────┼──────────────────────────────┤
│  2    │ tool_using   │  N (ReAct)  │  M per LLM     │ BullMQ      │ ✅ Deployed, hooks missing    │
│       │              │             │                │             │    quality evals missing     │
├───────┼──────────────┼─────────────┼────────────────┼─────────────┼──────────────────────────────┤
│  3    │ chain        │  N steps    │  M per step    │ Redis 24h + │ ✅ Deployed                   │
│       │              │             │                │ BullMQ      │ ⚠️  No persistent audit       │
│       │              │             │                │             │ ⚠️  No visual editor          │
│       │              │             │                │             │ ⚠️  No loops/dynamic DAG      │
├───────┼──────────────┼─────────────┼────────────────┼─────────────┼──────────────────────────────┤
│  4    │ autonomous   │  Extended   │  Many          │ Unclear     │ ❌ Defined in type, no code   │
│       │              │             │                │             │ ❌ No HITL waitpoint          │
│       │              │             │                │             │ ❌ No cost cap                │
└───────┴──────────────┴─────────────┴────────────────┴─────────────┴──────────────────────────────┘

A4. AI Provider Router — v3 (current, 4 adapters)

                               ┌──────────────────────────┐
    Authorised callers         │   AI PROVIDER ROUTER     │          Org config resolution
    (exactly 3 principals):    │   nexus-gateway          │
    ─ nexus-orchestrator       │                          │          ┌──────────────────┐
    ─ nexus-skills-engine      │ chatWithTools()          │──ask────▶│  nexus-auth      │
    ─ chat-orchestrator        │  MAX_TOOL_ITERATIONS     │          │  resolveOrgConfig│
         │                     │                          │          │  AES-256 keys    │
         │ POST /internal/ai/  │    ┌─ GeminiAdapter      │          └──────────────────┘
         │      chat           │    ├─ AnthropicAdapter   │                  │
         ├────────────────────▶│    ├─ ClaudeMaxAdapter   │       provider + role models
         │                     │    └─ OpenRouterAdapter  │                  │
         │                     │                          │                  ▼
         │                     │ Role routing:            │          ┌──────────────────┐
         │                     │   default/fast/reasoning │          │  Google / Anthro │
         │                     │   /code/long_context     │────API──▶│  /OpenRouter/    │
         │                     │                          │          │  Claude Max      │
         │                     │ Response fmt: json/text  │          └──────────────────┘
         │                     └──────────────────────────┘

    Enforcement (3 layers):
    ┌────────────────────────────────────────────────────────────────────────────────────┐
    │ Layer 1: Istio AuthorizationPolicy  │  Layer 2: validateServiceKey  │  Layer 3:    │
    │   NetworkPolicy whitelist            │   HMAC header per caller      │   validate   │
    │                                      │                               │   CallerId   │
    └────────────────────────────────────────────────────────────────────────────────────┘

A5. Socket.IO + Redis Pub/Sub event flow — v3 (current)

 Orchestrator            Redis Pub/Sub            JobEventRelay          Socket.IO        Dashboard PCC
       │                       │                        │                   │                 │
       │ emitJobEvent()        │                        │                   │                 │
       ├──────────────────────▶│                        │                   │                 │
       │   channel:            │                        │                   │                 │
       │   nexus:jobs:org:{id} │                        │                   │                 │
       │                       │ pattern subscribe      │                   │                 │
       │                       ├───────────────────────▶│                   │                 │
       │                       │                        │ emit to rooms     │                 │
       │                       │                        ├──────────────────▶│                 │
       │                       │                        │                   │ org:{id}:       │
       │                       │                        │                   │   plugin:{pid}  │
       │                       │                        │                   ├────────────────▶│
       │                       │                        │                   │                 │
       │                       │                        │ (compat mirror):  │                 │
       │                       │                        │   org:{id}        │                 │
       │                       │                        │   user:{uid}      │                 │
       │                       │                        │                   │                 │
       │                       │                        │                   │                 │
       │ Ring buffer (50 evts) ◀── replay on reconnect ─┤                   │                 │
       │                                                                                      │

  Event types: job:dispatched, job:queued, job:started, job:skill_resolved,
              job:llm_call, job:llm_response, job:llm_stream_chunk,
              job:tool_call, job:tool_result,
              job:chain_step_start, job:chain_step_complete,
              job:progress, job:warning, job:completed, job:failed, job:timeout, job:cancelled
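
The fan-out and reconnect replay in the diagram can be sketched as follows. This is a minimal illustration, not the production JobEventRelay: the JobEvent shape, the seq field, and the names channelFor, roomsFor and RingBuffer are hypothetical. Only the channel pattern, the room names and the 50-event buffer size come from the diagram.

```typescript
// Hypothetical event shape; field names are assumptions for illustration.
interface JobEvent {
  type: string; // e.g. "job:started"
  orgId: string;
  userId: string;
  pluginId?: string;
  seq: number; // assumed monotonically increasing sequence number
}

// Redis channel the orchestrator publishes to (pattern from the diagram).
function channelFor(orgId: string): string {
  return `nexus:jobs:org:${orgId}`;
}

// Socket.IO rooms for one event: the plugin-scoped room when a plugin id
// is present, followed by the compat-mirror org and user rooms.
function roomsFor(e: JobEvent): string[] {
  const rooms: string[] = [];
  if (e.pluginId) rooms.push(`org:${e.orgId}:plugin:${e.pluginId}`);
  rooms.push(`org:${e.orgId}`, `user:${e.userId}`);
  return rooms;
}

// Fixed-capacity buffer that keeps only the most recent events.
class RingBuffer<T extends { seq: number }> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity = 50) {
    this.capacity = capacity;
  }

  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.capacity) this.items.shift();
  }

  // Events newer than the last one the client saw, oldest first.
  replaySince(lastSeenSeq: number): T[] {
    return this.items.filter((i) => i.seq > lastSeenSeq);
  }
}
```

A reconnecting client would send the last sequence number it saw and receive replaySince(lastSeq). Anything older than the 50 buffered events has to be re-read from the run record instead.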

A6. PCC TrackedJob model — v3 (current)

 ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
 │ TrackedJob (Zustand + localStorage, hydrated from /api/workflows/runs/{runId})              │
 ├─────────────────────────────────────────────────────────────────────────────────────────────┤
 │  jobId          triggerRunId       jobType            jobLabel          stage               │
 │  progress 0-100 message            steps[]            startedAt         completedAt?        │
 │  error?         result?                                                                     │
 │                                                                                             │
 │  ─── Skill transparency ───────────────────────────────────────────────────────────────     │
 │  skillId        skillName          executionType      currentIteration  maxIterations       │
 │                                                                                             │
 │  ─── ReAct transparency ──────────────────────────────────────────────────────────────      │
 │  thinkingLog[]                     toolCalls[]        {name,args,result,durationMs}         │
 │                                                                                             │
 │  ─── Billing ─────────────────────────────────────────────────────────────────────          │
 │  billing.tokensIn               .tokensOut        .costUSD      .provider                   │
 │                                                                                             │
 │  ─── Governance ──────────────────────────────────────────────────────────────────          │
 │  riskLevel      flaggedForReview  dataResidency                                             │
 │                                                                                             │
 │  ─── HPC/GPU ─────────────────────────────────────────────────────────────────────          │
 │  logBuffer[] (≤500)  sessionUrl   gpuMetrics {epoch, loss, accuracy, cost}                  │
 └─────────────────────────────────────────────────────────────────────────────────────────────┘
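
As a reading aid, the model above can be written as a TypeScript interface. The field names are taken from the diagram. The types, the optionality markers and the two helper functions (appendLog, withProgress) are assumptions made for this sketch and may differ from the production store.

```typescript
interface ToolCall {
  name: string;
  args: unknown;
  result?: unknown;
  durationMs?: number;
}

interface TrackedJob {
  jobId: string;
  triggerRunId: string;
  jobType: string;
  jobLabel: string;
  stage: string;
  progress: number; // 0-100
  message: string;
  steps: string[];
  startedAt: string; // ISO timestamp (assumed representation)
  completedAt?: string;
  error?: string;
  result?: unknown;
  // Skill transparency
  skillId?: string;
  skillName?: string;
  executionType?: string;
  currentIteration?: number;
  maxIterations?: number;
  // ReAct transparency
  thinkingLog: string[];
  toolCalls: ToolCall[];
  // Billing
  billing?: { tokensIn: number; tokensOut: number; costUSD: number; provider: string };
  // Governance
  riskLevel?: string;
  flaggedForReview?: boolean;
  dataResidency?: string;
  // HPC/GPU
  logBuffer: string[]; // capped at 500 entries
  sessionUrl?: string;
  gpuMetrics?: { epoch: number; loss: number; accuracy: number; cost: number };
}

const LOG_BUFFER_MAX = 500;

// Returns a copy with the line appended and the oldest lines dropped.
function appendLog(job: TrackedJob, line: string): TrackedJob {
  return { ...job, logBuffer: [...job.logBuffer, line].slice(-LOG_BUFFER_MAX) };
}

// Returns a copy with progress clamped into the 0-100 range.
function withProgress(job: TrackedJob, progress: number): TrackedJob {
  return { ...job, progress: Math.min(100, Math.max(0, progress)) };
}
```

Both helpers return new objects rather than mutating, which suits a Zustand-style store where updates replace state.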

A7. Multi-tenant isolation — v3 (current, 4 layers)

 Incoming request
      │
      │ X-Organization-Id, X-App-Id, X-User-Id (JWT-derived)
      ▼
 ┌──────────────────────────────────────────┐
 │ Layer 1: MIDDLEWARE                      │   Reject if tenant headers missing
 │   Express / Next.js middleware           │   Set req.orgId / req.appId / req.userId
 └──────────────────────────────────────────┘
      │
      ▼
 ┌──────────────────────────────────────────┐
 │ Layer 2: POSTGRES RLS                    │   SET app.current_company_id = ...
 │   Session-var-driven RLS policies        │   SELECT/INSERT/UPDATE/DELETE all gated
 │   Every table has USING / WITH CHECK     │
 └──────────────────────────────────────────┘
      │
      ▼
 ┌──────────────────────────────────────────┐
 │ Layer 3: VECTOR + GRAPH FILTER           │   Qdrant: org_id in payload, filter on search
 │   Qdrant payload filter                  │   Neo4j: org_id label, WHERE in every Cypher
 │   Neo4j Cypher WHERE clause              │
 └──────────────────────────────────────────┘
      │
      ▼
 ┌──────────────────────────────────────────┐
 │ Layer 4: ISTIO SERVICE MESH              │   mTLS between pods
 │   AuthorizationPolicy per service        │   Caller-identity SPIFFE ID verified
 │   NetworkPolicy whitelist                │
 └──────────────────────────────────────────┘
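
Layers 1 and 3 can be sketched without any framework. The example below is an assumption-laden illustration: extractTenant, qdrantTenantFilter, the HeaderMap type and the 400 status code are hypothetical (the diagram only says "reject"). The filter object follows the must/key/match shape of Qdrant's payload filters.

```typescript
type HeaderMap = Record<string, string | undefined>;

interface TenantContext {
  orgId: string;
  appId: string;
  userId: string;
}

type TenantResult =
  | { ok: true; tenant: TenantContext }
  | { ok: false; status: 400; missing: string[] };

const REQUIRED_HEADERS = ["x-organization-id", "x-app-id", "x-user-id"] as const;

// Layer 1: reject the request unless all three tenant headers are present.
function extractTenant(headers: HeaderMap): TenantResult {
  const lower: HeaderMap = {};
  for (const [k, v] of Object.entries(headers)) lower[k.toLowerCase()] = v;

  const missing = REQUIRED_HEADERS.filter((h) => !lower[h]);
  if (missing.length > 0) return { ok: false, status: 400, missing: [...missing] };

  return {
    ok: true,
    tenant: {
      orgId: lower["x-organization-id"] as string,
      appId: lower["x-app-id"] as string,
      userId: lower["x-user-id"] as string,
    },
  };
}

// Layer 3: payload filter attached to every vector search for this tenant.
function qdrantTenantFilter(t: TenantContext) {
  return { must: [{ key: "org_id", match: { value: t.orgId } }] };
}
```

Building the filter from the validated TenantContext, rather than from raw request input, keeps Layer 3 dependent on Layer 1 having already succeeded.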

A8. nexus-cli — v3 (current capabilities)

 $ nexus services list              # Auto-discover 44+ microservices
 $ nexus mcp tools                  # List 70+ MCP tools
 $ nexus ask "prompt"               # ReAct agent (≤20 iterations)
 $ nexus workflows list             # List workflow templates
 $ nexus workflows run <template>   # Execute template
 $ nexus session save <name>        # Checkpoint
 $ nexus session resume <name>      # Restore
 $ nexus monitor                    # Real-time dashboard

 GAPS:
   ❌ No first-class dispatch to orchestrator /api/v1/dispatch
   ❌ No streaming tail of run events
   ❌ No chain DAG visualisation
   ❌ No skill publish / sign / verify
   ❌ No airgap bundle build
   ❌ Not integrated with PCC

A9. Plugin template — v3 (current)

 plugin/
 ├── frontend/                   Next.js 14 static export
 │   ├── app/
 │   │   ├── layout.tsx          BrandingProvider
 │   │   └── dashboard/{slug}/
 │   │       └── page.tsx        PluginGate (JWT)
 │   ├── components/gates/       PluginGate.tsx
 │   ├── stores/
 │   │   ├── dashboard-store.ts            auth token
 │   │   ├── progress-command-center-store.ts  PCC integration
 │   │   ├── plugin-ws-store.ts            WS state
 │   │   └── theme-store.ts
 │   └── hooks/
 │       ├── useEmbedded.ts       iframe detect
 │       └── usePageContext.ts    Terminal Computer ctx
 ├── backend/                    Express + TS
 │   ├── routes/
 │   ├── middleware/auth.ts      JWT validator
 │   └── services/
 ├── nexus.manifest.json         plugin metadata
 └── k8s/deployment.yaml

4.B v4.0 — Architecture

B1. v4.0 service topology (additions marked ★NEW)

                          ┌─────────────────────────────────────────────────────────────┐
                          │                    ADVERANT NEXUS v4.0                      │
                          │  One codebase, four profiles: cloud | VDS | on-prem | air   │
                          └─────────────────────────────────────────────────────────────┘
                                                   │
     ┌──────────────────┬──────────────────┬───────┴──────┬──────────────────┬──────────────────┐
     ▼                  ▼                  ▼              ▼                  ▼                  ▼
┌──────────┐     ┌─────────────┐    ┌─────────────┐ ┌─────────────┐  ┌─────────────┐   ┌─────────────┐
│ Ingress  │     │   Gateway   │    │Orchestrator │ │  Workflows  │  │Skills Engine│   │ NEXUS-AUTH  │
│ Istio    │────▶│  AI Router  │◀───│  Dispatch   │ │   BullMQ    │  │   SKILL v2  │   │  AES-256    │
│ OPA/Rego★│     │             │    │             │ │   Workers   │  │  signed+SBOM│   │  KEK/HSM★   │
└──────────┘     │ + A2A Plane★│    │ + Hooks★    │ │ + per-tier★ │  │  + QualScore│   │  + RBAC/SSO │
                 └─────────────┘    │ + Policy★   │ │   queues    │  │  + Optimizer│   └─────────────┘
                                    │ + Replay★   │ └─────────────┘  │   prompts★  │
                                    │ + FinOps★   │                  └─────────────┘
                                    │ + Govern★   │
                                    │ + Tier 4 SM★│
                                    └──────┬──────┘
                                           │
   ┌─────────────┐   ┌──────────────┐      │     ┌───────────────┐    ┌───────────────┐
   │MEMORY BANK★ │   │ SPAN STORE ★ │      │     │ MARKETPLACE ★ │    │ POLICY ENGINE★│
   │ per-tenant  │◀──┤ Postgres +   │◀─────┼────▶│ sigstore+SBOM │◀──▶│ OPA/Rego      │
   │ KEK encrypt │   │ ClickHouse   │      │     │ publish flow  │    │ versioned     │
   │ checkpoints │   │ (tiered)     │      │     │               │    │ rules         │
   └─────────────┘   └──────────────┘      │     └───────────────┘    └───────────────┘
                                           │
                               ┌───────────┼───────────┐
                               ▼           ▼           ▼
                        ┌──────────┐ ┌──────────┐ ┌──────────┐
                        │  REDIS   │ │ POSTGRES │ │  QDRANT  │
                        │ Pub/Sub  │ │ RLS+audit│ │per-tenant│
                        │ + BullMQ │ │ + spans  │ │namespaces│
                        └──────────┘ └──────────┘ └──────────┘
                                                     │
                                          ┌──────────┼──────────┐
                                          │          │          │
                                     ┌────▼────┐ ┌───▼────┐ ┌──▼─────┐
                                     │ NEO4J   │ │ HSM/TPM│ │AIRGAP★ │
                                     │GraphRAG │ │  KMS ★ │ │ bundle │
                                     └─────────┘ └────────┘ │registry│
                                                            └────────┘

   CLI 2.0 ★                                   PCC v2 ★
   ┌──────────────────────────┐                ┌────────────────────────────┐
   │ nexus dispatch           │                │ Live spans + hooks + cost  │
   │ nexus runs tail          │                │ HITL inbox                 │
   │ nexus chain visualize    │                │ Marketplace browser        │
   │ nexus skill publish      │◀── WS stream ─▶│ Governance tab             │
   │ nexus airgap bundle      │                │ Replay scrubber            │
   │ nexus governance export  │                │ FinOps dashboard           │
   └──────────────────────────┘                └────────────────────────────┘

B2. v4.0 end-to-end dispatch flow

 Client   Orchestrator         Policy          Skills        Memory      AI Router     Span      PCC / CLI
                                Engine         Marketplace   Bank                      Store
   │           │                  │                │            │            │            │          │
   │ POST      │ validate zod     │                │            │            │            │          │
   │ /dispatch │                  │                │            │            │            │          │
   ├──────────▶│                  │                │            │            │            │          │
   │           │ PreDispatch HOOKS★                │            │            │            │          │
   │           ├─ input classifier (LLM01 injection guard)                                            │
   │           ├─ residency check                                                                     │
   │           ├─ budget check (FinOps)                                                               │
   │           ├─ export-control check                                                                │
   │           ├─ risk-tier gate (EU AI Act)                                                          │
   │           │                  │                │            │            │            │          │
   │           │ evaluate policy ─▶│                │            │            │            │          │
   │           │◀─ allow/deny + conditions ────────│            │            │            │          │
   │           │                                                                                     │
   │           │ resolve skill ──▶│ verify sig + SBOM + qual-score ≥ threshold                       │
   │           │                  │◀────── skill contract + exec_config v2 ──┤                       │
   │           │                  │                                                                  │
   │           │ load memory checkpoint ─────────▶│                                                  │
   │           │◀──── per-tenant decrypted state ─┤                                                  │
   │           │                                                                                     │
   │           │ insert run + span(root) ────────────────────────────────▶│                         │
   │           │ publish job:dispatched ───────────────────────────────────────────────────────────▶│
   │ 202 ack   │                                                                                     │
   │◀──────────┤ enqueue BullMQ per-tier queue                                                       │
   │           │                                                                                     │
   │           │ ─────────── worker dequeues, selects tier state machine ──────────                  │
   │           │                                                                                     │
   │           │   T1 llm_only ────▶ AI Router (single call) ──▶ span ──▶ PCC                        │
   │           │   T2 tool_using ──▶ ReAct loop + PreToolUse/PostToolUse hooks + spans               │
   │           │   T3 chain ───────▶ DAG coordinator (checkpointed) + PreChainStep hooks + spans     │
   │           │   T4 autonomous ──▶ multi-agent state machine + HITL waitpoints + cost caps         │
   │           │                                                                                     │
   │           │ PostDispatch HOOKS★ ─ sign artefact (C2PA) ─ write replay manifest ─ FinOps debit   │
   │           │                                                                                     │
   │           │ job:completed + governance-evidence-row ──────────────────────────────────────────▶ │
   │           │ memory checkpoint save (tenant-KEK encrypt) ────────────▶│                          │
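
The PreDispatch stage at the top of this flow behaves like an ordered chain in which the first denial short-circuits the dispatch. A minimal sketch, assuming a simplified DispatchRequest and two illustrative hooks. The names runPreDispatch, residencyCheck, budgetCheck and queueFor, and the queue naming scheme, are hypothetical.

```typescript
type Tier = "llm_only" | "tool_using" | "chain" | "autonomous";

interface DispatchRequest {
  orgId: string;
  skillId: string;
  tier: Tier;
  region: string;
  estimatedCostUSD: number;
}

type Verdict = { allow: true } | { allow: false; hook: string; reason: string };
type PreDispatchHook = (req: DispatchRequest) => Verdict;

// Hook factories; their configuration inputs are assumptions.
function residencyCheck(allowedRegions: string[]): PreDispatchHook {
  return (req) =>
    allowedRegions.includes(req.region)
      ? { allow: true }
      : { allow: false, hook: "residency", reason: `region ${req.region} not permitted` };
}

function budgetCheck(remainingUSD: number): PreDispatchHook {
  return (req) =>
    req.estimatedCostUSD <= remainingUSD
      ? { allow: true }
      : { allow: false, hook: "budget", reason: "estimate exceeds remaining budget" };
}

// Runs hooks in order and returns the first denial, if any.
function runPreDispatch(req: DispatchRequest, hooks: PreDispatchHook[]): Verdict {
  for (const hook of hooks) {
    const verdict = hook(req);
    if (!verdict.allow) return verdict;
  }
  return { allow: true };
}

// One queue per tier, as in the "per-tier queue" step; the name is made up.
function queueFor(tier: Tier): string {
  return `nexus-${tier}`;
}
```

Because the chain stops at the first denial, hook order determines which reason the caller sees, so the cheapest and most decisive checks would normally go first.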

B3. Tier 1 — llm_only (reframed, v4.0)

   ┌──────────────┐        ┌──────────────┐        ┌──────────────┐         ┌──────────────┐
   │ Dispatch     │───────▶│ PreDispatch  │───────▶│  AI Router   │────────▶│ Struct. Out  │
   │ validated    │        │  hooks       │        │  role=default│         │ validate+    │
   └──────────────┘        │  + policy    │        │  cache read  │         │ self-correct │
                           │  + budget    │        │              │         │ up to R=3    │
                           └──────────────┘        └──────────────┘         └──────┬───────┘
                                                          │                         │
                                                          ▼                         ▼
                                                    ┌──────────────┐         ┌──────────────┐
                                                    │ Span: llm_   │         │ Sign output  │
                                                    │  call (typed)│         │ C2PA manifest│
                                                    └──────────────┘         └──────────────┘
                                                                                    │
                                                                                    ▼
                                                                             ┌──────────────┐
                                                                             │ PostDispatch │
                                                                             │ FinOps debit │
                                                                             │ PCC emit     │
                                                                             └──────────────┘

B4. Tier 2 — tool_using (ReAct with hooks, v4.0)

  Start
    │
    ▼
  ┌──────────────┐     iter=0
  │ System prompt│     max = exec_config.maxIterations
  │ + memory     │
  │ + tool specs │
  └──────┬───────┘
         │
         ▼
  ┌──────────────┐  ┌─────────────────────┐
  │ AI Router    │─▶│ PostLLMCall HOOK★   │
  │ (role=code)  │  │  - PII scan         │
  └──────┬───────┘  │  - injection detect │
         │          └─────────┬───────────┘
         │  has tool_call?    │
         │       no ──────────┼──▶ final answer ──▶ struct-out validate ──▶ PCC complete
         │       yes          │
         ▼                    │
  ┌──────────────┐            │
  │ PreToolUse★  │            │
  │ - allowlist  │            │
  │ - arg rewrite│ deny ──▶ error, escalate
  │ - cost check │
  └──────┬───────┘
         │ allow
         ▼
  ┌──────────────┐
  │ Execute tool │
  │ span+sign    │
  └──────┬───────┘
         ▼
  ┌──────────────┐
  │PostToolUse★  │
  │ - redact     │
  │ - cache      │
  │ - FinOps     │
  └──────┬───────┘
         ▼
  iter++; loop if iter < max else OnIterationLimit hook ▶ escalate to Tier 4 or fail
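
The loop above can be sketched as a plain function with the model, the tools and the hooks injected. This is an illustration under simplifying assumptions: everything is synchronous, PostLLMCall and FinOps are omitted, and the names ReActDeps, runReAct and ReActOutcome are hypothetical. The caller decides what to do with an iteration_limit outcome (escalate to Tier 4 or fail).

```typescript
interface ToolCallRequest {
  name: string;
  args: Record<string, unknown>;
}

type LlmTurn = { toolCall: ToolCallRequest } | { final: string };

interface ReActDeps {
  callModel: (transcript: string[]) => LlmTurn; // stands in for the AI Router
  preToolUse: (call: ToolCallRequest) => boolean; // true = allow
  tools: Record<string, (args: Record<string, unknown>) => string>;
  postToolUse: (output: string) => string; // e.g. redaction
}

type ReActOutcome =
  | { status: "completed"; answer: string; iterations: number }
  | { status: "denied"; tool: string }
  | { status: "iteration_limit"; iterations: number };

function runReAct(prompt: string, maxIterations: number, deps: ReActDeps): ReActOutcome {
  const transcript = [prompt];
  for (let iter = 0; iter < maxIterations; iter++) {
    const turn = deps.callModel(transcript);

    // No tool call: the model has produced its final answer.
    if ("final" in turn) {
      return { status: "completed", answer: turn.final, iterations: iter };
    }

    // PreToolUse: unknown or disallowed tools end the run with a denial.
    const call = turn.toolCall;
    const tool = deps.tools[call.name];
    if (!tool || !deps.preToolUse(call)) {
      return { status: "denied", tool: call.name };
    }

    // Execute, then let PostToolUse transform the output before it is
    // appended to the transcript for the next model call.
    transcript.push(deps.postToolUse(tool(call.args)));
  }
  return { status: "iteration_limit", iterations: maxIterations };
}
```

Here iterations counts completed tool rounds, so a run that answers without any tool call reports zero.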

B5. Tier 3 — chain (persistent DAG coordinator, v4.0)

                 exec_config.chainSteps = [ A, B, C, D, E ]
                 Persistent state: orchestrator.chain_runs + chain_steps (Postgres, not Redis TTL)

                              ┌──── step A ────┐
                              │  (Tier 1)      │
                              └───────┬────────┘
                                      │ output.x
                            ┌─────────┼─────────┐
                            ▼         ▼         ▼
                       ┌────────┐┌────────┐┌────────┐
                       │ step B ││ step C ││ step D │  PARALLEL fork (DAG)
                       │ (Tier 2││ (Tier 1││ (MCP   │  PreChainStep hook per branch
                       │ ReAct) ││ llm)   ││ tool)  │
                       └───┬────┘└───┬────┘└───┬────┘
                           │         │         │
                           └────┬────┴────┬────┘  join(all) (or any, or quorum)
                                │         │
                                ▼         ▼
                           ┌────────────────┐
                           │     step E     │   checkpoint after each step
                           │   (Tier 2 w/   │   resumable after worker crash
                           │   Memory Bank) │   visual editor in dashboard
                           └───────┬────────┘
                                   │
                                   ▼
                             ┌──────────┐
                             │  Done →  │
                             │ C2PA sign│
                             │ + export │
                             └──────────┘
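
A minimal sketch of the coordinator's core loop, assuming join(all) semantics and an in-memory Map standing in for the chain_steps table. The names ChainStep, Checkpoint and runChain are hypothetical, and the parallel branches run sequentially here to keep the example short. Resuming after a crash amounts to calling runChain again with the persisted checkpoint.

```typescript
interface ChainStep {
  id: string;
  dependsOn: string[];
  run: (inputs: Record<string, unknown>) => unknown;
}

// Completed step outputs keyed by step id.
type Checkpoint = Map<string, unknown>;

function runChain(steps: ChainStep[], checkpoint: Checkpoint = new Map()): Checkpoint {
  // Steps already present in the checkpoint are never re-run.
  const pending = steps.filter((s) => !checkpoint.has(s.id));

  while (pending.length > 0) {
    // join(all): a step is ready once every dependency has an output.
    const ready = pending.filter((s) => s.dependsOn.every((d) => checkpoint.has(d)));
    if (ready.length === 0) throw new Error("cycle or missing dependency");

    for (const step of ready) {
      const inputs: Record<string, unknown> = {};
      for (const d of step.dependsOn) inputs[d] = checkpoint.get(d);

      // Record the output immediately: this is the per-step checkpoint.
      checkpoint.set(step.id, step.run(inputs));
      pending.splice(pending.indexOf(step), 1);
    }
  }
  return checkpoint;
}
```

Supporting join(any) or join(quorum) would change only the readiness predicate. The checkpointing logic stays the same.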

B6. Tier 4 — autonomous (multi-agent + HITL + cost cap, v4.0)

   ┌──────────────────────────── TIER 4 STATE MACHINE ──────────────────────────────┐
   │                                                                                │
   │    start ──▶ plan ──▶ execute ──▶ review ──▶ complete                          │
   │     │         │         │          │           │                               │
   │     │         │         │          │           └─▶ artefact sign + C2PA        │
   │     │         │         │          │                                           │
   │     │         │         │          ├─▶ HITL waitpoint (risk=high)              │
   │     │         │         │          │     │                                     │
   │     │         │         │          │     ▼                                     │
   │     │         │         │          │   ┌──────────────┐                        │
   │     │         │         │          │   │ reviewer UI  │ ── approve ──▶ resume  │
   │     │         │         │          │   │ inbox on PCC │ ── reject  ──▶ replan  │
   │     │         │         │          │   │ evidence     │                        │
   │     │         │         │          │   │ bundle       │                        │
   │     │         │         │          │   └──────────────┘                        │
   │     │         │         │          │                                           │
   │     │         │         │          └─▶ quality-eval < threshold ─▶ replan      │
   │     │         │         │                                                      │
   │     │         │         └─▶ spawn sub-agents (crew/handoff/groupchat pattern)  │
   │     │         │               max agents, max cost cap, timeout                │
   │     │         │                                                                │
   │     │         └─▶ cost cap exceeded ─▶ OnCostThreshold hook ─▶ HITL or abort   │
   │     │                                                                          │
   │     └─▶ OnTierEscalation hook (escalated from Tier 2 iter-limit)               │
   │                                                                                │
   └────────────────────────────────────────────────────────────────────────────────┘
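
The state machine can be expressed as a transition table. The sketch below is one reading of the diagram, not the production implementation: the event names are invented, approval at a HITL waitpoint is assumed to proceed to complete, and only the abort branch of the cost-cap path is modelled (the HITL branch is omitted for brevity).

```typescript
type Tier4State = "start" | "plan" | "execute" | "review" | "hitl_wait" | "complete" | "aborted";

type Tier4Event =
  | "begin"
  | "planned"
  | "executed"
  | "passed"
  | "high_risk"
  | "low_quality"
  | "approved"
  | "rejected"
  | "cost_cap_exceeded";

// Allowed transitions; any pair not listed here is illegal.
const TRANSITIONS: Record<Tier4State, Partial<Record<Tier4Event, Tier4State>>> = {
  start: { begin: "plan" },
  plan: { planned: "execute", cost_cap_exceeded: "aborted" },
  execute: { executed: "review", cost_cap_exceeded: "aborted" },
  review: { passed: "complete", high_risk: "hitl_wait", low_quality: "plan" },
  hitl_wait: { approved: "complete", rejected: "plan" },
  complete: {},
  aborted: {},
};

function transition(state: Tier4State, event: Tier4Event): Tier4State {
  const next = TRANSITIONS[state][event];
  if (next === undefined) throw new Error(`illegal transition: ${state} on ${event}`);
  return next;
}

// Folds a sequence of events over the machine, starting from "start".
function runEvents(events: Tier4Event[]): Tier4State {
  return events.reduce<Tier4State>((s, e) => transition(s, e), "start");
}
```

Keeping the table as data makes it straightforward to persist the current state with the run record and to reject out-of-order events after a worker restart.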

B7. Skill Marketplace 2.0 — publication + verification flow

  Developer                   Skill Registry              Marketplace           Runtime
      │                             │                         │                    │
      │ nexus skill publish         │                         │                    │
      ├────────────────────────────▶│                         │                    │
      │                             │ 1. lint SKILL.md v2     │                    │
      │                             │ 2. generate SBOM        │                    │
      │                             │ 3. sign (sigstore)      │                    │
      │                             │ 4. adversarial-eval run │                    │
      │                             │ 5. quality-score init   │                    │
      │                             │ 6. semver bump          │                    │
      │                             │ 7. cross-ref MITRE ATLAS│                    │
      │                             │    + OWASP LLM Top 10   │                    │
      │                             │ 8. risk-tier (EU AI Act)│                    │
      │                             │                         │                    │
      │                             │ INSERT ros.skill_reg v2 │                    │
      │                             ├────────────────────────▶│                    │
      │                             │                         │ broadcast          │
      │                             │                         │ marketplace:update │
      │                             │                         ├───────────────────▶│
      │                             │                         │                    │
      │                             │                         │         At dispatch, runtime:
      │                             │                         │         1. verify signature
      │                             │                         │         2. check SBOM (no CVEs)
      │                             │                         │         3. check quality score ≥ tenant threshold
      │                             │                         │         4. check risk-tier allowed
      │                             │                         │         5. check export tags vs org
      │                             │                         │         6. version-pin or "latest-minor"
      │                             │                         │
      │                             │                         │ runtime telemetry ─▶ re-score
      │                             │                         │ auto-rollback if quality drops
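
Runtime checks 1 to 5 can be sketched as a single admission function that reports every failed check. This is a simplified illustration: SkillRelease, TenantPolicy and admissionFailures are hypothetical names, signature verification is assumed to have happened elsewhere and is represented by a boolean, and version resolution (check 6) is left out.

```typescript
interface SkillRelease {
  version: string;
  signatureValid: boolean; // outcome of signature verification done elsewhere
  openCves: number; // known CVEs found in the SBOM
  qualityScore: number; // assumed 0..1 scale
  riskTier: string;
  exportTags: string[];
}

interface TenantPolicy {
  minQualityScore: number;
  allowedRiskTiers: string[];
  allowedExportTags: string[];
}

// Returns the names of failed checks in the order they are evaluated.
// An empty array means the skill may be dispatched.
function admissionFailures(release: SkillRelease, policy: TenantPolicy): string[] {
  const failures: string[] = [];
  if (!release.signatureValid) failures.push("signature");
  if (release.openCves > 0) failures.push("sbom");
  if (release.qualityScore < policy.minQualityScore) failures.push("quality");
  if (!policy.allowedRiskTiers.includes(release.riskTier)) failures.push("risk-tier");
  if (!release.exportTags.every((t) => policy.allowedExportTags.includes(t))) {
    failures.push("export");
  }
  return failures;
}
```

Collecting all failures instead of stopping at the first gives the caller a complete diagnostic for a rejected dispatch.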

B8. Hooks lifecycle — v4.0 (first-class orchestrator primitive)

  ┌─────────────────────────── REQUEST LIFECYCLE WITH HOOK POINTS ─────────────────────────┐
  │                                                                                        │
  │   PreDispatch ──▶ PostSkillResolve ──▶ PreTierSelect ──▶ tier dispatch                 │
  │        │                                                                               │
  │        │                                (per iteration / step)                         │
  │        │                                                                               │
  │   ┌────┴────┐                         ┌──── PreToolUse ──── PostToolUse ────┐          │
  │   │ deny    │                         │                                     │          │
  │   │ enrich  │    ┌── PreLLMCall ──┬───┤                                     │          │
  │   │ reroute │    │                │   │                                     │          │
  │   └─────────┘    │  PostLLMCall ──┤   └─── PreChainStep ── PostChainStep ───┤          │
  │                  │                │                                         │          │
  │                  └────────────────┴───── OnCostThreshold ───────────────────┤          │
  │                                   └───── OnIterationLimit ──────────────────┤          │
  │                                   └───── OnTierEscalation ──────────────────┤          │
  │                                   └───── OnHITLPause / OnHITLResume ────────┤          │
  │                                                                             │          │
  │   PostDispatch ◀──── sign artefact ◀── FinOps debit ◀── replay manifest ◀───┘          │
  │                                                                                        │
  └────────────────────────────────────────────────────────────────────────────────────────┘

  Hook definition (manifest in plugin or org config):
  ┌───────────────────────────────────────────────────────────────────────────┐
  │ event: PreToolUse                                                         │
  │ scope: org | skill | plugin                                               │
  │ target: skills/nexusros.code_edit                                         │
  │ matcher: tool.name == 'write_file' && path.startsWith('/etc')             │
  │ action: deny | rewrite | require_hitl | emit_event | call_webhook | policy│
  │ policy_ref: opa/skills/restrict-etc.rego  (if action=policy)              │
  └───────────────────────────────────────────────────────────────────────────┘
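
The hook definition above maps naturally onto a typed record. In this sketch the matcher is a predicate function rather than the expression string shown in the manifest, which sidesteps writing an expression parser. HookDefinition, ToolInvocation and resolveAction are hypothetical names, and the "policy" action is included because the policy_ref line implies it.

```typescript
type HookAction = "deny" | "rewrite" | "require_hitl" | "emit_event" | "call_webhook" | "policy";

interface ToolInvocation {
  skill: string;
  toolName: string;
  path: string;
}

interface HookDefinition {
  event: "PreToolUse";
  scope: "org" | "skill" | "plugin";
  target: string;
  matcher: (inv: ToolInvocation) => boolean;
  action: HookAction;
  policyRef?: string; // only meaningful when action is "policy"
}

// The example from the manifest: block writes under /etc for one skill.
const restrictEtc: HookDefinition = {
  event: "PreToolUse",
  scope: "skill",
  target: "skills/nexusros.code_edit",
  matcher: (inv) => inv.toolName === "write_file" && inv.path.startsWith("/etc"),
  action: "deny",
};

// Returns the action of the first hook that targets the skill and matches,
// or "allow" when no hook applies.
function resolveAction(hooks: HookDefinition[], inv: ToolInvocation): HookAction | "allow" {
  const hit = hooks.find((h) => h.target === inv.skill && h.matcher(inv));
  return hit ? hit.action : "allow";
}
```

First-match-wins is an assumption of this sketch. A real engine would also need a precedence rule across org, skill and plugin scopes.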

B9. Memory Bank + cryptographic per-tenant isolation

         nexus-auth (KMS)                    Memory Bank                    Span/Run
         ┌─────────────┐                    ┌─────────────┐                 store
         │ per-tenant  │                    │ Postgres +  │
         │  KEK        │◀──wrap/unwrap──────┤ Redis       │
         │ HSM-backed  │                    │ checkpoints │
         │ (cloud)     │                    │             │
         │ TPM-backed  │                    │ keys:       │
         │ (airgap)    │                    │  org/skill/ │
         └─────────────┘                    │  run/user   │
                                            │             │
                                            │ values:     │
                                            │  envelope-  │
                                            │  encrypted  │
                                            │  JSON       │
                                            └─────────────┘
                                                    ▲
                                                    │
           write checkpoint (encrypt w/ DEK; wrap DEK w/ tenant KEK)
                                                    │
     Worker ──────────────────────────────────────── ┤
                                                    │
           read checkpoint (request KEK-unwrap of DEK; decrypt value w/ DEK)

   Guarantee: observability store can see span tree but NOT reasoning payloads.
              Cross-tenant leak requires breach of nexus-auth KMS + span store.
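
The write and read paths follow the standard envelope-encryption pattern, sketched below with Node's built-in crypto module and AES-256-GCM. Two loud assumptions: the tenant KEK is held in process memory here, whereas the design keeps it inside an HSM or TPM behind nexus-auth, and the names seal, unseal, writeCheckpoint and readCheckpoint are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

// AES-256-GCM encryption with a fresh 96-bit IV per call.
function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// Throws if the key is wrong or the ciphertext was tampered with.
function unseal(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]);
}

interface Envelope {
  wrappedDek: Sealed; // DEK encrypted under the tenant KEK
  value: Sealed; // checkpoint JSON encrypted under the DEK
}

// Write path: encrypt with a one-off DEK, then wrap the DEK with the KEK.
function writeCheckpoint(tenantKek: Buffer, state: unknown): Envelope {
  const dek = randomBytes(32);
  return {
    value: seal(dek, Buffer.from(JSON.stringify(state), "utf8")),
    wrappedDek: seal(tenantKek, dek),
  };
}

// Read path: unwrap the DEK with the KEK, then decrypt the value.
function readCheckpoint(tenantKek: Buffer, envelope: Envelope): unknown {
  const dek = unseal(tenantKek, envelope.wrappedDek);
  return JSON.parse(unseal(dek, envelope.value).toString("utf8"));
}
```

Because GCM is authenticated, an attempt to read one tenant's envelope with another tenant's KEK fails with an error instead of returning garbage.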

B10. A2A + MCP dual-plane protocol

                 Nexus v4.0                                     Outside world
   ┌────────────────────────────────────┐           ┌─────────────────────────────┐
   │    AGENT PLANE                     │           │                             │
   │   ┌──────────────────────────┐     │◀──A2A────▶│  Gemini Enterprise agent    │
   │   │  Tier 4 autonomous agent │     │           │  MS Agent Framework agent   │
   │   └──────────────────────────┘     │           │  Third-party crew (CrewAI)  │
   │              │                     │           │  Bedrock AgentCore runtime  │
   │              ▼                     │           └─────────────────────────────┘
   │    TOOL PLANE                      │
   │   ┌──────────────────────────┐     │◀──MCP────▶│  GitHub, Linear, Slack,     │
   │   │  Tier 2/3 skill invokes  │     │           │  SaaS + OSS MCP servers     │
   │   │  MCP server (tool)       │     │           └─────────────────────────────┘
   │   └──────────────────────────┘     │
   │                                    │
   │   MCP = vertical (I use tools)     │
   │   A2A = horizontal (I collaborate) │
   └────────────────────────────────────┘

B11. Structured output + self-correction (typed skill I/O)

      skill contract: input_schema, output_schema (Pydantic / Zod)
              │
              ▼
   ┌──────────────────┐    generate     ┌───────────────┐
   │ LLM              │───────────────▶│ validator     │
   │ (AI Router)      │                │ (schema check)│
   └──────────────────┘                └──┬────────────┘
              ▲                            │ pass ──▶ return
              │                            │
              │ retry ≤R with              │ fail
              │ diagnostic prompt          ▼
              │                      ┌───────────────┐
              └──────────────────────┤ build critique│
                                     │ prompt w/ err │
                                     └───────────────┘
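
The generate, validate and critique loop can be sketched as below. The validator is a plain function standing in for a Zod or Pydantic schema, and the names Validation, generateStructured and validateTotal, along with the critique prompt wording, are assumptions for illustration. With maxRetries = R the model is called at most R + 1 times.

```typescript
type Validation<T> = { kind: "valid"; value: T } | { kind: "invalid"; error: string };

interface SelfCorrectDeps<T> {
  generate: (prompt: string) => string; // stands in for the AI Router
  validate: (raw: string) => Validation<T>;
}

function generateStructured<T>(
  prompt: string,
  maxRetries: number,
  deps: SelfCorrectDeps<T>,
): { value: T; attempts: number } {
  let currentPrompt = prompt;
  let lastError = "";
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    const checked = deps.validate(deps.generate(currentPrompt));
    if (checked.kind === "valid") return { value: checked.value, attempts: attempt };

    // Build the critique prompt: original task plus the validation error.
    lastError = checked.error;
    currentPrompt = `${prompt}\n\nYour previous output was rejected: ${lastError}\nReturn only valid output.`;
  }
  throw new Error(`schema validation failed after ${maxRetries} retries: ${lastError}`);
}

// Example validator for an output schema of the form { total: number }.
function validateTotal(raw: string): Validation<{ total: number }> {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const total = (parsed as { total?: unknown }).total;
      if (typeof total === "number") return { kind: "valid", value: { total } };
    }
    return { kind: "invalid", error: "expected an object with a numeric 'total'" };
  } catch {
    return { kind: "invalid", error: "output is not valid JSON" };
  }
}
```

Failing loudly after the last retry, rather than returning an unvalidated value, keeps downstream steps from consuming output that violates the skill contract.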

B12. Optimizer-compiled prompts (DSPy MIPROv2 / GEPA)

   Evaluations (AgentCore-style quality evals)
       │
       ▼
   ┌──────────────┐   compile   ┌──────────────┐
   │ skill source │────────────▶│ optimised    │──▶ stored in registry as
   │ prompt+few-  │   MIPROv2   │ prompt +     │     compiled-artifact v{N}
   │ shots +      │   GEPA      │ few-shots    │
   │ metric fn    │◀────────────│              │     rollback on metric drop
   └──────────────┘   regress   └──────────────┘

B13. Observability — Insights Agent + Polly-NL debug

  Span Store (Postgres tier-hot + ClickHouse tier-cold)
       │
       │
   ┌───┴────────────────────────────────────────┐
   │                                            │
   ▼                                            ▼
 ┌──────────────────┐                    ┌──────────────────┐
 │ INSIGHTS AGENT   │                    │ POLLY-NL DEBUG   │
 │ clusters spans   │                    │ NL query of spans│
 │ into usage       │                    │ "why did run X   │
 │ patterns;        │                    │  fail last night"│
 │ anomaly detect;  │                    │ → summary + links│
 │ cost hotspots    │                    │   to exact spans │
 └──────────────────┘                    └──────────────────┘
       │                                            │
       ▼                                            ▼
  Dashboard Insights tab                      CLI: nexus debug nl "question"

B14. FinOps governance — pre-dispatch budget + circuit breaker

   Org budget (Postgres): per-org / per-skill / per-user / per-day / per-month
                     │
                     ▼
   ┌─────────────────────────────────────────────────────┐
   │ Dispatch arrives with estimated cost (model+tokens) │
   └─────────────────┬───────────────────────────────────┘
                     │
                     ▼
   ┌─────────────────────────────────────────────────────┐
   │ Check budget: remaining ≥ estimate?                 │
   │   yes ─▶ reserve estimate (Redis atomic counter)    │
   │   no  ─▶ reject dispatch w/ troubleshooting JSON    │
   │         (see NO FALLBACKS contract)                 │
   └─────────────────┬───────────────────────────────────┘
                     │ dispatch proceeds
                     ▼
   On each LLM call: debit actual cost, emit FinOps span
                     │
                     ▼
   Per-run cost cap (Tier 4): if cumulative > cap →
       OnCostThreshold hook → HITL or auto-abort with partial results
                     │
                     ▼
   Circuit breaker: per-skill failure rate > T% in N min →
       open for M min → dispatch rejected with breaker-open error
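
The budget reservation and the circuit breaker in B14 can be sketched as follows. This is a minimal in-memory illustration, not the production path: a lock-protected counter stands in for the Redis atomic counter, amounts are integer cents, and the names `BudgetLedger`, `CircuitBreaker` and `min_calls` are assumptions introduced for the example.

```python
import threading
from collections import deque


class BudgetLedger:
    """In-memory stand-in for the Redis atomic counter (hypothetical names)."""

    def __init__(self, limit_cents: int) -> None:
        self._remaining = limit_cents
        self._lock = threading.Lock()

    def reserve(self, estimate_cents: int) -> bool:
        """Atomically reserve the estimate; False means reject the dispatch."""
        with self._lock:
            if self._remaining < estimate_cents:
                return False
            self._remaining -= estimate_cents
            return True

    def settle(self, estimate_cents: int, actual_cents: int) -> None:
        """Release the unused part of a reservation (or charge an overrun)."""
        with self._lock:
            self._remaining += estimate_cents - actual_cents

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining


class CircuitBreaker:
    """Opens for open_s seconds when the failure rate in the window exceeds threshold."""

    def __init__(self, threshold: float, window_s: float, open_s: float,
                 min_calls: int = 1) -> None:
        self.threshold = threshold      # T, as a fraction (0.5 == 50%)
        self.window_s = window_s        # N, in seconds
        self.open_s = open_s            # M, in seconds
        self.min_calls = min_calls      # assumption: avoid tripping on one call
        self._events = deque()          # (timestamp, failed)
        self._open_until = 0.0

    def allow(self, now: float) -> bool:
        """True when dispatch may proceed, False while the breaker is open."""
        return now >= self._open_until

    def record(self, failed: bool, now: float) -> None:
        self._events.append((now, failed))
        # Drop events that fell out of the sliding window.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()
        failures = sum(1 for _, was_failure in self._events if was_failure)
        if (len(self._events) >= self.min_calls
                and failures / len(self._events) > self.threshold):
            self._open_until = now + self.open_s
            self._events.clear()
```

A caller would check `breaker.allow(now)` and `ledger.reserve(estimate)` before enqueueing, then call `ledger.settle(estimate, actual)` and `breaker.record(failed, now)` when the run finishes. Passing `now` explicitly keeps the sketch testable under a virtual clock.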

B15. Deterministic replay + chain-of-custody (C2PA)

   Original run                                   Replay (bit-for-bit, cryptographically verified)
      │                                                         │
      ├─ inputs hashed (SHA-256) ──────┐                        │
      ├─ model version pinned          │                        │
      ├─ prompt template hash          │                        │
      ├─ temperature 0 or              │                        │
      │  seeded stochastic             │                        │
      ├─ tool outputs captured         │─── replay manifest ────▶
      ├─ time freeze (virtual clock)   │     (signed)
      ├─ RNG seeds captured            │
      └─ span tree + hooks logged      │
                                       │
      C2PA manifest on artefact: who signed, which model, which prompt hash,
          which tools, which human approvals, run_id, replay_manifest_id
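
One way to picture the replay manifest in B15 is a canonical JSON record of the captured fields, signed and later checked against the replayed output. The sketch below is illustrative only: the field names are assumptions, and HMAC-SHA256 from the standard library stands in for the real signature scheme (C2PA manifests use certificate-based signatures, which are out of scope here).

```python
import hashlib
import hmac
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(obj) -> bytes:
    """Stable JSON encoding so equal content always hashes identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_replay_manifest(run: dict, signing_key: bytes) -> dict:
    """Capture what B15 lists: hashed inputs, pinned model, seeds, tool outputs."""
    body = {
        "run_id": run["run_id"],
        "inputs_sha256": sha256_hex(canonical(run["inputs"])),
        "model_version": run["model_version"],
        "prompt_template_sha256": sha256_hex(run["prompt_template"].encode("utf-8")),
        "temperature": run["temperature"],
        "rng_seed": run["rng_seed"],
        "frozen_clock": run["frozen_clock"],
        "tool_outputs": run["tool_outputs"],
        "output_sha256": sha256_hex(run["output"]),
    }
    # HMAC is a placeholder for the real signature (assumption).
    signature = hmac.new(signing_key, canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_replay(manifest: dict, replayed_output: bytes, signing_key: bytes) -> bool:
    """True only if the manifest is untampered and the replay is bit-for-bit equal."""
    expected = hmac.new(signing_key, canonical(manifest["body"]),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False
    return sha256_hex(replayed_output) == manifest["body"]["output_sha256"]
```

Comparing output digests rather than raw bytes lets a verifier confirm a replay without holding the original artefact.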

B16. Airgapped bundle mode — sealed offline deployment

    BUILD (at Adverant, online)                DELIVER (via encrypted USB / sneakernet)
   ┌────────────────────────────┐            ┌────────────────────────────────────────┐
   │ 1. Docker images (signed)  │            │ Tamper-evident seal                    │
   │ 2. K8s manifests           │──tar.gz───▶│ Offline registry manifest              │
   │ 3. Model weights (pinned)  │            │ GPG + sigstore signatures              │
   │ 4. Postgres migrations     │            │ Customer installs with nexus-cli       │
   │ 5. Skill bundle (signed)   │            │ on-prem airgapped cluster              │
   │ 6. SBOM + licenses         │            └────────────────────────────────────────┘
   │ 7. Provisioning TPM KEKs   │                         │
   │ 8. FIPS 140-3 modules      │                         ▼
   │ 9. STIG base images        │            ┌────────────────────────────────────────┐
   │                            │            │ INSTALL (at customer, airgapped)       │
   │ Manifest:                  │            │                                        │
   │   { images: [...],         │            │  nexus airgap install <bundle.tar.gz>  │
   │     skills: [...],         │            │   ├─ verify signatures                 │
   │     models: [...],         │            │   ├─ load images → local registry      │
   │     policies: [...] }      │            │   ├─ apply K8s manifests (no pull)     │
   └────────────────────────────┘            │   ├─ run DB migrations                 │
                                             │   ├─ seed skill registry (pre-signed)  │
                                             │   ├─ TPM-wrap tenant KEKs              │
                                             │   └─ emit readiness event              │
                                             │                                        │
                                             │  nexus airgap update <new-bundle>      │
                                             │   (delta bundle, same verify path)     │
                                             └────────────────────────────────────────┘
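
The "verify" step of `nexus airgap install` can be approximated offline with nothing more than digest checks. The sketch below assumes a `manifest.json` member carrying a hypothetical `digests` map (member name to SHA-256); the paper's manifest only lists images, skills, models and policies, so this layout is an assumption, and signature verification (GPG, sigstore) is deliberately left out.

```python
import hashlib
import io
import json
import tarfile


def make_bundle(files: dict[str, bytes]) -> bytes:
    """Build a small .tar.gz in memory (helper for the example only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def verify_bundle(bundle_bytes: bytes) -> list[str]:
    """Return the member names that are missing or whose SHA-256 mismatches."""
    failed = []
    with tarfile.open(fileobj=io.BytesIO(bundle_bytes), mode="r:gz") as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
        for name, expected in manifest["digests"].items():
            try:
                member = tar.extractfile(name)
            except KeyError:          # member not present in the archive
                member = None
            if member is None:
                failed.append(name)
                continue
            if hashlib.sha256(member.read()).hexdigest() != expected:
                failed.append(name)
    return failed
```

An installer following this pattern would refuse to load any image when `verify_bundle` returns a non-empty list, so a partially tampered bundle never reaches the local registry.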

B17. Adverant Nexus CLI 2.0 — dispatch + streaming + PCC mirror

  $ nexus login                           # OAuth / PAT
  $ nexus org use <slug>                  # select tenant

  $ nexus dispatch ros.code_edit \
        --input @inputs.json \
        --tier tool_using \
        --provider gemini \
        --model gemini-2.5-pro \
        --risk high \
        --budget 5.00 \
        --tail
     ┌─ streams WS events ────────────────────────────────────────────────────┐
     │  [run_id=abc  trace=xyz]                                               │
     │  ▸ dispatched            (2026-04-24T10:00:01Z)                        │
     │  ▸ skill_resolved        skill=ros.code_edit v3.2.1 sig=✅             │
     │  ▸ hook:PreDispatch      policies=5 passed                             │
     │  ▸ llm_call              provider=gemini model=gemini-2.5-pro          │
     │  ▸ llm_response          tokens=1240 cost=$0.003                       │
     │  ▸ tool_call             write_file /src/foo.ts                        │
     │  ▸ tool_result           ok                                            │
     │  ▸ completed             cost=$0.004 runtime=8.2s artefact=sha256:...  │
     └────────────────────────────────────────────────────────────────────────┘

  $ nexus runs list --since 1h
  $ nexus runs show <run_id> --json
  $ nexus runs replay <run_id>                 # deterministic replay (Gap E)
  $ nexus chain visualize <run_id>             # ASCII DAG in-terminal

  $ nexus skill publish ./skill-dir --sign --sbom
  $ nexus skill versions ros.code_edit
  $ nexus skill rollback ros.code_edit v3.2.0

  $ nexus airgap bundle --out ./bundle.tgz --skills all --models all
  $ nexus airgap install ./bundle.tgz
  $ nexus airgap update  ./delta.tgz

  $ nexus governance export --framework soc2         --out ./soc2-audit.tgz
  $ nexus governance export --framework eu-ai-act    --out ./eu-ai-act-conformity.tgz
  $ nexus governance export --framework fedramp-high --out ./fedramp-package.tgz
  $ nexus governance policies list
  $ nexus governance policies apply ./policies/*.rego

  $ nexus hooks list --scope org
  $ nexus hooks apply ./hooks.yaml

  $ nexus finops budgets
  $ nexus finops burn-rate --org my-org --window 7d

  $ nexus memory snapshot <run_id> --decrypt --out ./snapshot.json   # needs KEK grant
  $ nexus memory gc --org my-org --older-than 180d

  $ nexus a2a peers list                           # A2A discovery
  $ nexus a2a call <peer> <capability> --in ...

  $ nexus debug nl "why did run abc fail last night"   # Polly-NL
  $ nexus insights cost-hotspots --window 30d          # Insights Agent

B18. Bindings — UI button to resolved dispatch (v4.0 generalised from ros.skill_bindings)

  USER                 PLUGIN UI               DASHBOARD BFF        BINDINGS SVC         ORCHESTRATOR
   │                       │                         │                   │                     │
   │ click "Score Lead"    │                         │                   │                     │
   │  (bound to key        │                         │                   │                     │
   │   lead.scoring.v2)    │                         │                   │                     │
   ├──────────────────────▶│                         │                   │                     │
   │                       │ POST /bindings/resolve  │                   │                     │
   │                       │  { binding_key,         │                   │                     │
   │                       │    scope_ctx: {         │                   │                     │
   │                       │      user_id, project,  │                   │                     │
   │                       │      org_id },          │                   │                     │
   │                       │    inputs: {...} }      │                   │                     │
   │                       ├────────────────────────▶│                   │                     │
   │                       │                         │ SELECT * FROM     │                     │
   │                       │                         │  ros.skill_bind.. │                     │
   │                       │                         │ ORDER BY scope    │                     │
   │                       │                         │  precedence +     │                     │
   │                       │                         │  priority DESC    │                     │
   │                       │                         │  LIMIT 1          │                     │
   │                       │                         ├──────────────────▶│                     │
   │                       │                         │                   │ resolve skill +     │
   │                       │                         │                   │ merge config        │
   │                       │                         │                   │ overrides           │
   │                       │                         │                   │                     │
   │                       │                         │◀── resolved ──────┤                     │
   │                       │                         │   { skill_id,     │                     │
   │                       │                         │     tier,         │                     │
   │                       │                         │     provider,     │                     │
   │                       │                         │     model,        │                     │
   │                       │                         │     cost_cap,     │                     │
   │                       │                         │     risk_tier,    │                     │
   │                       │                         │     residency,    │                     │
   │                       │                         │     inputs_mapped,│                     │
   │                       │                         │     hooks[],      │                     │
   │                       │                         │     policy_refs[] │                     │
   │                       │                         │   }               │                     │
   │                       │                         │                                         │
   │                       │                         │ POST /dispatch ────────────────────────▶│
   │                       │                         │  (with resolved metadata as dispatch    │
   │                       │                         │   payload; PreDispatch hooks verify     │
   │                       │                         │   cost_cap, residency, risk, export     │
   │                       │                         │   before execution)                     │
   │                       │◀────── run_id + ws tail ─────────────────────────────────────────┤
   │◀── live PCC tile ─────┤                                                                   │

B19. Binding resolution scope hierarchy — "nearest wins, then priority wins"

     Lookup: binding_key = "lead.scoring.v2"

     ┌─────────────────────────────────────────────┐
     │  USER scope      (binding_key, user_id)     │  ← most specific
     ├─────────────────────────────────────────────┤
     │  PROJECT scope   (binding_key, project_id)  │
     ├─────────────────────────────────────────────┤
     │  ORG scope       (binding_key, org_id)      │
     ├─────────────────────────────────────────────┤
     │  SYSTEM scope    (binding_key, scope=system)│  ← most general (Adverant default)
     └─────────────────────────────────────────────┘
                    │
                    ▼
     Within the most-specific matching scope, pick binding with
        max(priority 0-1000)  where is_active=true and deleted_at IS NULL
                    │
                    ▼
     If A/B experiment active on that binding_key:
        split by split_ratio (hash user_id) → variant_a_skill_id | variant_b_skill_id
                    │
                    ▼
     Merge configuration in order of increasing precedence (later wins):
        skill_definition.config ← binding.config_overrides ← caller.runtime_overrides
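
The resolution rules in B19 can be sketched as a small pure function. The `Binding` dataclass below mirrors only the fields the diagram names; the in-memory list stands in for the `ros.skill_bindings` query, and the hash-based bucketing is one plausible reading of "split by split_ratio (hash user_id)", not a confirmed implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

SCOPE_ORDER = ["user", "project", "org", "system"]  # most specific first


@dataclass
class Binding:
    binding_key: str
    scope: str
    scope_id: Optional[str]
    skill_id: str
    priority: int = 0
    is_active: bool = True
    deleted_at: Optional[str] = None
    config_overrides: dict = field(default_factory=dict)


def resolve_binding(bindings, key: str, ctx: dict) -> Optional[Binding]:
    """ctx maps scope name to id, e.g. {"user": "u1", "project": "p1", "org": "o1"}."""
    live = [b for b in bindings
            if b.binding_key == key and b.is_active and b.deleted_at is None]
    for scope in SCOPE_ORDER:                       # nearest scope wins ...
        matches = [b for b in live if b.scope == scope
                   and (scope == "system"
                        or (scope in ctx and b.scope_id == ctx[scope]))]
        if matches:                                 # ... then priority wins
            return max(matches, key=lambda b: b.priority)
    return None


def ab_variant(user_id: str, split_ratio: float) -> str:
    """Deterministic split: the same user always lands in the same variant."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
    return "a" if bucket < split_ratio else "b"


def merge_config(skill_config: dict, binding_overrides: dict,
                 runtime_overrides: dict) -> dict:
    """Later layers override earlier ones (shallow merge)."""
    return {**skill_config, **binding_overrides, **runtime_overrides}
```

Note that a user-scope binding with priority 300 beats an org-scope binding with priority 900: priority is compared only inside the most specific scope that matches.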

B20. Binding metadata v4.0 — the full field set (extended from v3 ros.skill_bindings)

  ┌──────────────────────────── BINDING v4.0 ──────────────────────────────────┐
  │ IDENTITY                                                                   │
  │   id, organization_id, binding_key, scope, scope_id, priority              │
  │                                                                            │
  │ RESOLUTION TARGET                                                          │
  │   skill_definition_id           (→ ros.skill_definitions.id)              │
  │   skill_version_pin             ("latest-minor" | "3.2.1" | "pinned")     │
  │                                                                            │
  │ EXECUTION ★NEW                                                             │
  │   tier                           (1 llm_only | 2 tool_using | 3 chain |   │
  │                                    4 autonomous)                           │
  │   provider_preference            (gemini | anthropic | claude_max |       │
  │                                    openrouter | auto)                      │
  │   model_preference               ("gemini-2.5-pro" | "claude-opus-4" |    │
  │                                    role:fast | role:reasoning | auto)      │
  │   routing_hint                   (fast | reasoning | code | long_context) │
  │   queue_name                     (BullMQ queue override)                   │
  │   response_format                (json | text)                             │
  │                                                                            │
  │ COST & LIMITS ★NEW                                                         │
  │   cost_cap_usd                   (per-run hard ceiling)                    │
  │   daily_cap_usd                  (per-binding per-day)                     │
  │   token_cap_in / token_cap_out   (per-run)                                 │
  │   timeout_ms                                                               │
  │   max_iterations (Tier 2/3/4)                                              │
  │   max_sub_agents (Tier 4)                                                  │
  │                                                                            │
  │ GOVERNANCE ★NEW                                                            │
  │   risk_tier                      (minimal | limited | high | unacceptable)│
  │   data_residency                 (eu_only | us_only | any | <region-tag>) │
  │   export_tags[]                  (EAR / ITAR / dual-use)                   │
  │   requires_hitl                  (bool | on_risk_high)                     │
  │   policy_refs[]                  (OPA bundle refs)                         │
  │   tier_restrictions[]            (starter | growth | enterprise |         │
  │                                    unlimited)                              │
  │   phi_tagged                     (bool, HIPAA)                             │
  │   compliance_frameworks[]        (soc2 | eu-ai-act | iso42001 | hipaa |   │
  │                                    fedramp | nist-airmf)                   │
  │                                                                            │
  │ HOOKS ★NEW                                                                 │
  │   hooks[]                        (PreDispatch | PreToolUse | PostLLMCall |│
  │                                    OnCostThreshold | OnHITLPause …)        │
  │   allowed_tools[]                (tool allowlist for Tier 2/3)            │
  │   denied_tools[]                                                           │
  │                                                                            │
  │ INPUTS & MAPPING ★NEW                                                      │
  │   input_schema                   (JSON-Schema — enforces button payload)  │
  │   inputs_mapping                 (template exprs: {{ selectedEntity.id }})│
  │   output_target                  (where result renders: toast | panel |   │
  │                                    tab | new-window | plugin-callback)    │
  │                                                                            │
  │ UI PRESENTATION ★NEW                                                       │
  │   display_name, description, icon, placement[] (entity-toolbar |          │
  │     batch-action | command-palette | page-header | context-menu)          │
  │   confirmation                   (none | simple | strong | hitl)          │
  │   badge                          (cost preview | tier badge | risk chip)  │
  │   shortcut                       (keybinding, e.g. "mod+shift+s")         │
  │                                                                            │
  │ OBSERVABILITY & EXPERIMENTS                                                │
  │   ab_experiment_id               (→ ros.skill_ab_experiments.id, nullable)│
  │   quality_score_threshold        (auto-deactivate if runtime score <)     │
  │   telemetry_tags{}               (for Insights Agent clustering)           │
  │                                                                            │
  │ LIFECYCLE                                                                  │
  │   status (active|inactive|deprecated), is_active, deleted_at              │
  │   created_by, created_at, updated_by, updated_at                           │
  │   config_overrides (JSONB catch-all for forward-compat)                    │
  │   conditions{}  (contextual match: agent_role, job_type, entity_type)     │
  └────────────────────────────────────────────────────────────────────────────┘

B21. nexus.manifest.json declarative actions (plugin authors publish button defaults)

  {
    "plugin": { "slug": "leads", "version": "2.1.0" },
    "actions": [
      {
        "id": "lead-score-quick",
        "display_name": "Score Lead",
        "description": "Run lead scoring on selected entity",
        "binding_key": "lead.scoring.v2",
        "default_skill_id": "uuid-of-scoring-skill",
        "default_tier": 2,
        "default_provider": "auto",
        "default_model": "role:reasoning",
        "cost_cap_usd": 0.50,
        "risk_tier": "limited",
        "data_residency": "any",
        "requires_hitl": false,
        "input_schema": {
          "type": "object",
          "required": ["entity_id"],
          "properties": {
            "entity_id": { "type": "string", "format": "uuid" }
          }
        },
        "inputs_mapping": {
          "entity_id": "{{ selectedEntity.id }}",
          "enrichment_ctx": "{{ pageContext.enrichmentFlags }}"
        },
        "placement": ["entity-toolbar", "batch-action"],
        "icon": "zap",
        "output_target": "panel",
        "confirmation": "none",
        "badge": "cost-preview",
        "shortcut": "mod+shift+l",
        "compliance_frameworks": ["soc2"],
        "allowed_tools": ["crm_lookup", "web_fetch"]
      }
    ]
  }

  On install, the plugin's actions[] seed SYSTEM-scope bindings with
     scope=system, priority=100. Org admins may override at org scope, and
     power users at project or user scope. Overrides are stored as bindings,
     so they require no source-code changes.
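
To make the `inputs_mapping` and `input_schema` fields concrete, the sketch below renders the `{{ dotted.path }}` expressions from the manifest against a UI context and reports missing required inputs. It is an assumption-laden illustration: only whole-value expressions are handled, the context shape is invented for the example, and a real deployment would presumably run a full JSON Schema validator rather than the `required` check shown here.

```python
import re

# Matches a value that is exactly one expression, e.g. "{{ selectedEntity.id }}".
_EXPR = re.compile(r"^\{\{\s*([A-Za-z_][\w.]*)\s*\}\}$")


def lookup(path: str, context: dict):
    """Follow a dotted path through nested dicts; raises KeyError if absent."""
    value = context
    for part in path.split("."):
        value = value[part]
    return value


def render_inputs(inputs_mapping: dict, context: dict) -> dict:
    """Resolve template expressions; literal values pass through unchanged."""
    rendered = {}
    for name, template in inputs_mapping.items():
        match = _EXPR.match(template) if isinstance(template, str) else None
        rendered[name] = lookup(match.group(1), context) if match else template
    return rendered


def missing_required(input_schema: dict, inputs: dict) -> list:
    """Names listed under "required" in the schema but absent from inputs."""
    return [key for key in input_schema.get("required", []) if key not in inputs]
```

With the manifest above, a context of `{"selectedEntity": {"id": "..."}, "pageContext": {"enrichmentFlags": [...]}}` yields the dispatch payload, and an empty result from `missing_required` means the button payload satisfies the schema's required fields.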

4.C User Journeys (end-to-end, including existing + new capabilities)

C1. End-user dispatches a skill from dashboard

 USER                   DASHBOARD                 ORCHESTRATOR              PCC PANEL
  │                         │                            │                      │
  │ click "Generate Report" │                            │                      │
  ├────────────────────────▶│                            │                      │
  │                         │ POST /api/dispatch         │                      │
  │                         ├───────────────────────────▶│                      │
  │                         │                            │ PreDispatch hooks    │
  │                         │                            │ policy pass          │
  │                         │                            │ budget reserve       │
  │                         │                            │ enqueue              │
  │                         │◀─── 202 {run_id,trace} ────┤                      │
  │                         │                            │ ws: dispatched       │
  │                         │──── register TrackedJob ───────────────────────▶ │
  │                         │                            │ ws: skill_resolved   │
  │                         │                            │ ws: llm_call         │
  │                         │                            │ ws: llm_response     │
  │                         │                            │ ws: tool_call        │
  │                         │                            │ ws: tool_result      │
  │                         │                            │ ws: completed        │
  │                         │                            │                      │
  │ see live progress   ◀───────────── PCC panel renders every event ──────────┤
  │ see cost counter    ◀───────────── FinOps span streams cost ───────────────┤
  │ see thinking log    ◀───────────── ReAct spans stream tool+LLM ────────────┤
  │                         │                                                   │
  │ click "Replay"          │                                                   │
  ├────────────────────────▶│  GET /runs/{id}/replay ────▶ replay manifest ▶ deterministic rerun

C2. Developer publishes a skill to marketplace

 DEVELOPER                   CLI                      MARKETPLACE           REGISTRY
  │                           │                           │                    │
  │ write SKILL.md v2         │                           │                    │
  │ write input/output        │                           │                    │
  │ schema + adversarial evals│                           │                    │
  │                           │                           │                    │
  │ nexus skill publish       │                           │                    │
  ├──────────────────────────▶│                           │                    │
  │                           │ lint ─ SBOM ─ sign        │                    │
  │                           │ adversarial-eval run      │                    │
  │                           │ risk-tier classify        │                    │
  │                           │ MITRE/OWASP tag           │                    │
  │                           │                           │                    │
  │                           │ POST /marketplace/skills  │                    │
  │                           ├──────────────────────────▶│                    │
  │                           │                           │ verify sig         │
  │                           │                           │ verify SBOM        │
  │                           │                           │ INSERT skill_reg   │
  │                           │                           ├───────────────────▶│
  │                           │                           │ broadcast update   │
  │                           │                           │ quality-score init │
  │                           │◀── ok {id, version} ──────┤                    │
  │◀── ok ────────────────────┤                           │                    │
  │                           │                                                │
  │                           │ runtime telemetry rolls quality-score          │
  │                           │ auto-rollback if score drops below threshold   │
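
The last two lines of C2 (rolling quality score, auto-rollback below a threshold) leave the scoring rule open. As one possible reading, the sketch below uses an exponential moving average over per-run scores in [0, 1]; the class name, the `alpha` smoothing factor and the reset-after-rollback behaviour are all assumptions made for illustration.

```python
class QualityTracker:
    """Rolls a per-skill quality score and rolls back the live version if it drops."""

    def __init__(self, versions, threshold: float,
                 alpha: float = 0.2, initial: float = 1.0) -> None:
        self.versions = list(versions)   # oldest -> newest; the last one is live
        self.threshold = threshold
        self.alpha = alpha               # weight given to the newest observation
        self.score = initial
        self._initial = initial

    @property
    def live(self) -> str:
        return self.versions[-1]

    def observe(self, run_score: float) -> str:
        """Fold one run's score into the average; return the live version."""
        self.score = self.alpha * run_score + (1 - self.alpha) * self.score
        if self.score < self.threshold and len(self.versions) > 1:
            self.versions.pop()          # auto-rollback to the previous version
            self.score = self._initial   # the restored version starts fresh
        return self.live
```

A tracker created with versions `["3.2.0", "3.2.1"]`, a threshold of 0.5 and `alpha=0.5` stays on 3.2.1 after one failed run and rolls back to 3.2.0 after the second.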

C3. Admin configures tenant — providers, quotas, governance

 ADMIN                       DASHBOARD                   AUTH DB / POLICY ENGINE
  │                               │                                │
  │ Settings → Providers          │                                │
  │ set default=Gemini            │                                │
  │ set reasoning=Claude-Sonnet4  │                                │
  │ set code=Claude-Opus4         │                                │
  ├──────────────────────────────▶│                                │
  │                               │ PUT /org/ai-config             │
  │                               ├───────────────────────────────▶│
  │                               │                                │ AES-256 store keys
  │                               │                                │ update role map
  │                               │                                │
  │ Settings → FinOps             │                                │
  │ set daily budget $500         │                                │
  │ set per-skill caps            │                                │
  ├──────────────────────────────▶│                                │
  │                               │ PUT /org/finops                │
  │                               ├───────────────────────────────▶│
  │                                                                │
  │ Settings → Governance         │                                │
  │ select frameworks:            │                                │
  │   [x] SOC 2  [x] EU AI Act    │                                │
  │   [x] HIPAA [x] NIST AI RMF   │                                │
  │ select residency: EU only     │                                │
  ├──────────────────────────────▶│ PUT /org/governance ──────────▶│ OPA bundle assembled
  │                                                                │ distributed to services
  │                                                                │
  │ Settings → Hooks              │                                │
  │ apply hook YAML               │                                │
  ├──────────────────────────────▶│ PUT /org/hooks ──────────────▶│ hook manifest live

C4. Auditor exports compliance package

 AUDITOR                     CLI                      GOVERNANCE SVC       EVIDENCE BUCKET
  │                           │                            │                      │
  │ nexus governance export   │                            │                      │
  │   --framework soc2        │                            │                      │
  │   --window 2026-Q1        │                            │                      │
  │   --out soc2-audit.tgz    │                            │                      │
  ├──────────────────────────▶│                            │                      │
  │                           │ POST /governance/export    │                      │
  │                           ├───────────────────────────▶│                      │
  │                           │                            │ query spans          │
  │                           │                            │ query policy versions│
  │                           │                            │ query HITL decisions │
  │                           │                            │ query risk-tier hits │
  │                           │                            │ assemble traceability│
  │                           │                            │ control-ID matrix    │
  │                           │                            ├─────────────────────▶│
  │                           │                            │ sign package (GPG)   │
  │                           │◀──── signed .tgz ──────────┤                      │
  │◀── download ──────────────┤                            │                      │

C5. Airgapped customer installs + operates

 CUSTOMER OPS                 CLI                      AIRGAPPED K8S
  │                           │                            │
  │ copy bundle.tgz via USB   │                            │
  │                           │                            │
  │ nexus airgap install      │                            │
  │   ./bundle.tgz            │                            │
  ├──────────────────────────▶│                            │
  │                           │ verify sigs (sigstore)     │
  │                           │ verify SBOM vs allow-list  │
  │                           │ load images → local reg    │
  │                           │ apply K8s manifests        │
  │                           │ run Postgres migrations    │
  │                           │ seed skill registry        │
  │                           │ TPM-wrap tenant KEKs       │
  │                           ├───────────────────────────▶│
  │                           │                            │ pods Ready
  │                           │◀── readiness event ────────┤
  │                                                        │
  │  Operate completely offline. nexus dispatch / runs tail / governance export
  │  all work; A2A restricted to local peers; no external calls possible.
  │
  │  nexus airgap update ./delta.tgz                        (monthly patch bundle)

C6. HITL reviewer approves a high-risk autonomous run

 TIER-4 RUN                  HITL INBOX (PCC)          REVIEWER
      │                             │                     │
      │ reach waitpoint             │                     │
      │ risk=high                   │                     │
      ├────────────────────────────▶│                     │
      │                             │ alert + evidence    │
      │                             │  - plan             │
      │                             │  - sub-agent spans  │
      │                             │  - tool calls       │
      │                             │  - cost so far      │
      │                             │  - residency tags   │
      │                             │  - schema diff      │
      │                             ├────────────────────▶│
      │                             │                     │
      │                             │                     │ review evidence
      │                             │                     │ decide
      │                             │                     │
      │                             │◀─── approve/reject ─┤
      │◀── OnHITLResume ────────────┤      with note      │
      │   continue OR replan        │                     │
      │                             │                     │
      │ Audit: decision + reviewer identity + policy version recorded as span
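
The waitpoint, decision and audit steps in C6 amount to a small state machine. The following sketch is an illustration under stated assumptions: the state names are invented, approval is mapped to "continue" and rejection to "replan" (the diagram says only "continue OR replan"), and the audit record is a plain dict rather than a span.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    WAITING_HITL = "waiting_hitl"
    REPLANNING = "replanning"


@dataclass
class Tier4Run:
    run_id: str
    state: RunState = RunState.RUNNING
    audit: list = field(default_factory=list)

    def reach_waitpoint(self, risk: str) -> None:
        """Pause for a human only when the waitpoint is high risk."""
        if risk == "high":
            self.state = RunState.WAITING_HITL

    def resume(self, approved: bool, reviewer: str, note: str,
               policy_version: str) -> None:
        """Record the decision, then continue (approve) or replan (reject)."""
        if self.state is not RunState.WAITING_HITL:
            raise RuntimeError("run is not paused at a HITL waitpoint")
        self.audit.append({
            "run_id": self.run_id,
            "decision": "approve" if approved else "reject",
            "reviewer": reviewer,
            "note": note,
            "policy_version": policy_version,
        })
        self.state = RunState.RUNNING if approved else RunState.REPLANNING
```

Writing the audit entry before changing state means a decision is always on record by the time the run continues.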

C7. Developer uses CLI to dispatch + tail + debug

 $ nexus dispatch ros.refactor --input @issue-123.json --tail
   (stream ...)
   ▸ completed  run_id=r-789  artefact=sha256:beef...

 $ nexus runs show r-789 --spans-tree
   root ─ dispatch
      ├─ hook:PreDispatch [5 policies]
      ├─ skill_resolve
      ├─ tier_selected: tool_using
      ├─ iter=0
      │   ├─ llm_call model=gemini-2.5-pro tokens=1,240 $0.003
      │   └─ tool:write_file /src/foo.ts
      ├─ iter=1
      │   └─ llm_call (final answer)
      ├─ hook:PostDispatch
      └─ sign_c2pa

 $ nexus debug nl "why was iter=0 slow?"
   Insights: 3.2s spent in tool:write_file (p95 0.4s). Network stall to
             sandbox filesystem. See span 0xabc for details.

 $ nexus runs replay r-789            # deterministic, bit-for-bit replay
 $ nexus runs export   r-789 --c2pa   # artefact + provenance manifest

C8. Power user reconfigures a plugin button via the Binding Editor (no code deploy)

 POWER USER                DASHBOARD                BINDINGS SVC          POLICY ENGINE
     │                         │                          │                      │
     │ open plugin "Leads"     │                          │                      │
     │ right-click "Score Lead"│                          │                      │
     │ → "Edit action..."      │                          │                      │
     ├────────────────────────▶│                          │                      │
     │                         │ GET /bindings/resolve?   │                      │
     │                         │    key=lead.scoring.v2   │                      │
     │                         │    scope_ctx=me          │                      │
     │                         ├─────────────────────────▶│                      │
     │                         │◀─ current resolution ────┤                      │
     │                         │   (system default inherited)                    │
     │                         │                                                 │
     │ Binding Editor opens:                                                     │
     │   change tier:       2 → 3 (chain, chapter-style subtasks)                │
     │   change model:      role:reasoning → claude-opus-4                       │
     │   cost_cap:          $0.50 → $2.00                                        │
     │   requires_hitl:     false → on_risk_high                                 │
     │   allowed_tools:     + crm.bulk_update                                    │
     │   placement:         + command-palette                                    │
     │   shortcut:          mod+shift+l                                          │
     │   scope:             user                                                 │
     │   priority:          300                                                  │
     │                                                                           │
     │ click Save              │                          │                      │
     ├────────────────────────▶│ POST /bindings (scope=   │                      │
     │                         │   user, user_id=me, …)   │                      │
     │                         ├─────────────────────────▶│                      │
     │                         │                          │ validate against     │
     │                         │                          │ org policy ─────────▶│
     │                         │                          │◀──── allow? ─────────┤
     │                         │                          │ (e.g. org disallows  │
     │                         │                          │  claude-opus for PHI-│
     │                         │                          │  tagged skills →     │
     │                         │                          │  override rejected;  │
     │                         │                          │  reason returned)    │
     │                         │◀── 201 Created (id) ─────┤                      │
     │  ◀── toast "saved" ─────┤                                                 │
     │                                                                           │
     │ click button again → next dispatch uses the user-scoped binding (highest  │
     │ precedence), which overrides the system default until removed.            │
     │                                                                           │
     │ Every binding change is an audit row: who/when/what/policy-verdict.       │
     │ Config-drift detector flags bindings whose quality-score drops post-edit. │
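
The precedence rule in C8 (a user-scoped binding wins over the inherited system default until it is removed) can be sketched as a small resolver. This is a minimal illustration, not the Bindings service itself: the `Binding` fields, the `resolve_binding` function, and the intermediate `org` scope are hypothetical. The only facts taken from the text are that user scope has the highest precedence and that each binding carries a numeric priority.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed scope ordering. The source only states that user scope has the
# highest precedence and that a system default exists; "org" is illustrative.
SCOPE_RANK = {"user": 3, "org": 2, "system": 1}


@dataclass(frozen=True)
class Binding:
    key: str
    scope: str
    priority: int
    tier: int
    model: str
    is_active: bool = True
    user_id: Optional[str] = None


def resolve_binding(bindings, key, user_id):
    """Return the winning active binding for `key`, or None if none applies."""
    candidates = [
        b for b in bindings
        if b.key == key
        and b.is_active
        # A user-scoped binding only applies to the user who owns it.
        and (b.scope != "user" or b.user_id == user_id)
    ]
    if not candidates:
        return None
    # Scope decides first; priority breaks ties within the same scope.
    return max(candidates, key=lambda b: (SCOPE_RANK[b.scope], b.priority))


bindings = [
    Binding("lead.scoring.v2", "system", 100, tier=2, model="auto"),
    Binding("lead.scoring.v2", "user", 300, tier=3, model="claude-opus-4",
            user_id="me"),
]

winner = resolve_binding(bindings, "lead.scoring.v2", user_id="me")
print(winner.scope, winner.tier, winner.model)  # prints: user 3 claude-opus-4
```

Removing or deactivating the user-scoped row makes the same call fall back to the system default, which matches the "until removed" wording in the diagram.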

4.D UI/UX Elements (dashboard + PCC + CLI surfaces)

D1. Dashboard layout (v4.0 additions marked ★)

 ┌────────────────────────────────────────────────────────────────────────────────────┐
 │ adverant nexus          Org: ACME ▾   User: Jane    🔔 2 HITL★    💰 $213/500     │
 ├────────────────────────────────────────────────────────────────────────────────────┤
 │ ┌──────────────┐  ┌─────────────────────────────────────────────────────────────┐ │
 │ │ SIDEBAR      │  │  MAIN                                                       │ │
 │ │ ● Home       │  │                                                             │ │
 │ │ ● Plugins    │  │    ┌───────────────────────────────────────────────────┐    │ │
 │ │   - ROS      │  │    │  active plugin workspace                          │    │ │
 │ │   - Prose    │  │    │                                                   │    │ │
 │ │   - QA       │  │    │                                                   │    │ │
 │ │ ● Marketplace★│ │    │                                                   │    │ │
 │ │ ● Workflows  │  │    └───────────────────────────────────────────────────┘    │ │
 │ │ ● Skills★    │  │                                                             │ │
 │ │ ● Chains★    │  │  ┌──── PCC PANEL (dockable) ────────────────────────────┐  │ │
 │ │ ● Insights★  │  │  │  active runs | HITL inbox★ | replay★ | finops★       │  │ │
 │ │ ● Governance★│  │  └──────────────────────────────────────────────────────┘  │ │
 │ │ ● FinOps★    │  └─────────────────────────────────────────────────────────────┘ │
 │ │ ● Settings   │                                                                  │
 │ └──────────────┘                                                                  │
 └────────────────────────────────────────────────────────────────────────────────────┘

D2. PCC panel (v4.0 TrackedJob+)

 ┌─────────────────────────── Progress Command Center ────────────────────────────┐
 │  ◉ r-789   ros.refactor   tool_using   iter 2/8   $0.04/$5.00   🟢 running      │
 │  ◉ r-790   prose.draft    chain        step 3/5   $0.12/$2.00   🟡 HITL wait    │
 │  ◉ r-791   qa.regression  autonomous   agent 2/4  $2.80/$10.00  🔴 budget near  │
 ├─────────────────────────────────────────────────────────────────────────────────┤
 │  SELECTED RUN: r-789                                                            │
 │                                                                                 │
 │  Tabs:  Progress | Spans | Thinking | Tools | Hooks★ | Cost | Policy | Replay★ │
 │                                                                                 │
 │  [Spans view — live tree]                                                       │
 │    dispatch                                                                     │
 │    ├─ hook:PreDispatch ✓                                                        │
 │    ├─ skill_resolve ros.refactor v3.2.1 ✅sig                                   │
 │    ├─ iter 0                                                                    │
 │    │   ├─ llm_call     gemini-2.5-pro  1.2s  $0.003                             │
 │    │   └─ tool:write   /src/foo.ts     0.4s                                     │
 │    └─ iter 1 (running)                                                          │
 │        └─ llm_call     gemini-2.5-pro  ...                                      │
 │                                                                                 │
 │  [FinOps bar]  ░░░░░░░░░░░░░░░░  $0.04 / $5.00 budget                           │
 │  [Policy bar]  ✅ residency:eu  ✅ risk:limited  ✅ budget  ✅ export            │
 └─────────────────────────────────────────────────────────────────────────────────┘

D3. Governance tab — compliance control dashboard

 ┌─────────────────────────── GOVERNANCE ─────────────────────────────────────────┐
 │  Frameworks enabled:  [x] SOC 2   [x] ISO 27001   [x] ISO 42001                │
 │                       [x] EU AI Act   [x] NIST AI RMF   [x] HIPAA              │
 │                       [x] FedRAMP Moderate   [ ] FedRAMP High                  │
 │                                                                                │
 │  Coverage:                                                                     │
 │   EU AI Act             ████████████████████  100%  (27/27 controls mapped)    │
 │   SOC 2 CC              █████████████████░░░   85%  (58/68 controls mapped)    │
 │   ISO 42001             ██████████████████░░   92%  (36/39 mapped)             │
 │                                                                                │
 │  Recent events:                                                                │
 │   • risk=high run r-791 → HITL approved by alice@acme 2h ago                   │
 │   • residency violation attempt blocked 9h ago (policy eu-only)                │
 │   • 3 skills quality-score dropped → auto-rolled back yesterday                │
 │                                                                                │
 │  Export:  [ Download SOC 2 package ] [ EU AI Act conformity ] [ FedRAMP ]      │
 │                                                                                │
 │  Policy engine:  42 active policies • last update 3h ago • OPA v1.3            │
 └────────────────────────────────────────────────────────────────────────────────┘

D4. Marketplace UI — skill browsing

 ┌─────────────────────────── SKILL MARKETPLACE ──────────────────────────────────┐
 │ 🔍 search...                                            sort: quality ▾        │
 │                                                                                │
 │  ros.code_edit              v3.2.1  ⭐4.9  ✅signed  ⭐qual 0.94  low risk     │
 │  ros.refactor               v1.4.0  ⭐4.8  ✅signed  ⭐qual 0.91  low risk     │
 │  prose.chapter_write        v2.0.1  ⭐4.7  ✅signed  ⭐qual 0.88  limited risk │
 │  qa.pentest                 v0.9.3  ⭐4.2  ✅signed  ⭐qual 0.76  HIGH risk★   │
 │                                                                                │
 │ Selected: ros.code_edit v3.2.1                                                 │
 │   Publisher: Adverant Inc • SBOM: ✅ • CVEs: 0 • License: Apache-2.0           │
 │   MITRE ATLAS: AML.T0015 | OWASP LLM01 mitigated | EU AI Act: limited-risk    │
 │   Runtime quality last 30d: 0.94 (trending +0.01)                              │
 │   [ Install ]   [ Pin version ]   [ View source ]   [ History ]                │
 └────────────────────────────────────────────────────────────────────────────────┘

D5. Chain visualizer — DAG editor/viewer

 ┌─────────────────────────── CHAIN: prose.full_book ─────────────────────────────┐
 │                                                                                │
 │         [A: outline]────┬────▶[B: chapter1]────▶                               │
 │                         │                      ┌▶[E: compile]                  │
 │                         ├────▶[C: chapter2]────┤                               │
 │                         │                      │                               │
 │                         └────▶[D: chapter3]────┘                               │
 │                                                                                │
 │   status: A✅  B✅  C🟢running  D⏳queued  E⏳waiting                            │
 │   checkpointed: ✅ (resumable from worker failure)                             │
 │                                                                                │
 │   [ Open editor ]    [ Re-run failed steps ]    [ Replay deterministic ]       │
 └────────────────────────────────────────────────────────────────────────────────┘

D6. Span tree explorer + Polly-NL debug

 ┌─────────────────────────── SPAN EXPLORER r-789 ────────────────────────────────┐
 │ Tree                                         Details (click any span)          │
 │ ─────────────────────────────────            ──────────────────────────────    │
 │ ● dispatch              2.1s                 span_id: 0xab12                   │
 │   ├● hook:PreDispatch   0.04s                type: llm_call                    │
 │   ├● skill_resolve      0.11s                provider: gemini                  │
 │   ├● iter 0             1.2s                 model: gemini-2.5-pro             │
 │   │  ├● llm_call        0.9s    ◀── sel     tokens_in: 1240                   │
 │   │  └● tool:write      0.3s                 tokens_out: 420                   │
 │   ├● iter 1             0.4s                 cost: $0.003                      │
 │   └● sign_c2pa          0.02s                prompt_hash: sha256:...           │
 │                                              model_version: 2025-11            │
 │ NL debug box:                                                                  │
 │ ┌──────────────────────────────────────────────────────────────────────────┐   │
 │ │ "why was iter 0 slow?"                                                   │   │
 │ │ ──────────────────────────────────────────────────────────────────────── │   │
 │ │ tool:write took 0.3s vs p95 0.08s. Sandbox FS stall. See span 0xabcd.   │   │
 │ └──────────────────────────────────────────────────────────────────────────┘   │
 └────────────────────────────────────────────────────────────────────────────────┘

D7. FinOps dashboard

 ┌─────────────────────────── FINOPS ─────────────────────────────────────────────┐
 │  This month                                                                    │
 │    Spent      $1,234.56 / $5,000.00         █████░░░░░░░░░░░░░░░  24.7%        │
 │    Burn rate  $42/day (trending +8%)                                           │
 │                                                                                │
 │  By skill (top 5)                            By provider                       │
 │   ros.code_edit       $421                   Gemini          $812              │
 │   prose.chapter_write $389                   Anthropic       $298              │
 │   qa.regression       $176                   Claude Max      $84               │
 │   prose.outline       $154                   OpenRouter      $40               │
 │   ros.refactor        $94                                                      │
 │                                                                                │
 │  Circuit breakers:  2 open  (qa.vision-heavy, prose.critic)                    │
 │  Alerts: 1 skill trending to budget exhaustion in 6 days                       │
 │                                                                                │
 │  [ Set caps ]  [ View cost by user ]  [ View cost by tier ]  [ Export CSV ]   │
 └────────────────────────────────────────────────────────────────────────────────┘

D9. Binding Editor — visual editor for any button in any plugin

 ┌─────────────────────────── BINDING EDITOR ────────────────────────────────────┐
 │  Binding key: lead.scoring.v2             Scope: [user ▾]  Priority: [300]    │
 │  Status: [active ▾]                       Pin version: [latest-minor ▾]       │
 │                                                                               │
 │  ─── TARGET SKILL ──────────────────────────────────────────────────────────  │
 │  Skill:  ros.lead_score          ▾   v3.2.1 ✅signed  ⭐0.94                   │
 │                                                                               │
 │  ─── EXECUTION ─────────────────────────────────────────────────────────────  │
 │  Tier:          (○)1 llm_only   (●)2 tool_using   (○)3 chain   (○)4 autonomous│
 │  Provider:      [ auto ▾ ]   Model: [ role:reasoning ▾ ]                      │
 │  Routing hint:  [ reasoning ▾ ]   Queue: [ default ▾ ]                        │
 │  Response fmt:  [ json ▾ ]                                                    │
 │                                                                               │
 │  ─── COST & LIMITS ─────────────────────────────────────────────────────────  │
 │  Cost cap per run:   [ $2.00 ]           Daily cap: [ $200 ]                  │
 │  Token in/out:       [ 50k / 10k ]       Timeout: [ 120s ]                    │
 │  Max iterations:     [ 8 ]               Max sub-agents: [ n/a ]              │
 │                                                                               │
 │  ─── GOVERNANCE ────────────────────────────────────────────────────────────  │
 │  Risk tier:   (●)limited  (○)high                 [ ] requires HITL always    │
 │  Residency:   [ eu_only ▾ ]                       [x] on risk=high only       │
 │  Export:      [ ] EAR   [ ] ITAR                  [x] PHI-tagged (HIPAA)      │
 │  Policies:    [ eu-ai-act ] [ soc2 ] [ iso42001 ]  +add                       │
 │                                                                               │
 │  ─── HOOKS & TOOLS ─────────────────────────────────────────────────────────  │
 │  Hooks:     [ PreDispatch ] [ PreToolUse ] [ OnCostThreshold ]  +add          │
 │  Allowed:   [crm_lookup] [web_fetch] [write_notes]  +add                      │
 │  Denied:    [crm_bulk_delete] [exec_shell]                                    │
 │                                                                               │
 │  ─── INPUTS MAPPING ────────────────────────────────────────────────────────  │
 │  Schema:    required: entity_id  (validated per dispatch)                     │
 │  Mapping:   entity_id     ← {{ selectedEntity.id }}                           │
 │             enrich_ctx    ← {{ pageContext.enrichmentFlags }}                 │
 │             user_locale   ← {{ currentUser.locale }}                          │
 │  Output:    [ side-panel ▾ ]                                                  │
 │                                                                               │
 │  ─── PRESENTATION ──────────────────────────────────────────────────────────  │
 │  Display:   "Score Lead"    Icon: [zap ▾]   Shortcut: [ mod+shift+l ]         │
 │  Placement: [x] entity-toolbar   [x] batch-action   [ ] page-header           │
 │  Confirm:   (○)none  (●)simple  (○)strong  (○)hitl                            │
 │  Badge:     [x] cost-preview  [x] tier-badge  [ ] risk-chip                   │
 │                                                                               │
 │  ─── A/B EXPERIMENT ────────────────────────────────────────────────────────  │
 │  [ + Start A/B ]   Current: none                                              │
 │                                                                               │
 │  ─── TELEMETRY ─────────────────────────────────────────────────────────────  │
 │  Live quality score: 0.91 (30d)   Cost avg: $0.38   p95 latency: 4.2s         │
 │  Auto-deactivate if score < [ 0.75 ]                                          │
 │                                                                               │
 │  [ Save (scope: user) ]  [ Preview dispatch ]  [ Diff vs system default ]     │
 └───────────────────────────────────────────────────────────────────────────────┘

D8. CLI interactive REPL (nexus shell)

 $ nexus shell
 nexus> help
   dispatch, runs, skills, chains, airgap, governance, finops,
   hooks, memory, a2a, insights, debug, session, org, login

 nexus(ACME)> runs tail --since 5m
   ▸ r-801 ros.refactor     running   $0.01
   ▸ r-802 prose.chapter    HITL      $0.08
   ▸ r-803 qa.regression    completed $2.40

 nexus(ACME)> debug nl "cost hotspots last hour"
   Insights: qa.regression is 58% of last-hour spend.
   Suggest: switch qa.regression default model to gemini-flash (-$1.80/run).

 nexus(ACME)> apply suggest 1
   Applied. New default model for qa.regression: gemini-flash.

4.E Compliance + Security Integration

E1. EU AI Act risk-tier enforcement

  skill metadata (registry)       dispatch time                  runtime
  ─────────────────────────        ─────────────                   ──────
  risk_tier: unacceptable ──────────── REJECT at dispatch ──────── (never executes)
  risk_tier: high ─────────────────── PreDispatch hook: require HITL ── after run: conformity record
                                                                       + adversarial-eval + post-market monitoring
  risk_tier: limited ──────────────── PreDispatch hook: transparency notice ── output: watermark + model-card link
  risk_tier: minimal ──────────────── (no extra gate)
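
The E1 mapping from risk tier to gates can be expressed as a lookup that a PreDispatch hook might consult. The sketch below is illustrative only: `GateDecision`, `risk_tier_gate`, and the action strings are hypothetical names chosen to mirror the diagram, and rejecting unknown tiers (fail closed) is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class GateDecision:
    allowed: bool
    pre_dispatch: list = field(default_factory=list)  # gates before the run
    post_run: list = field(default_factory=list)      # obligations after it


def risk_tier_gate(risk_tier: str) -> GateDecision:
    """Map a skill's EU AI Act risk tier to dispatch-time and runtime actions."""
    if risk_tier == "unacceptable":
        # Rejected at dispatch; the skill never executes.
        return GateDecision(allowed=False)
    if risk_tier == "high":
        return GateDecision(
            allowed=True,
            pre_dispatch=["require_hitl"],
            post_run=["conformity_record", "adversarial_eval",
                      "post_market_monitoring"],
        )
    if risk_tier == "limited":
        return GateDecision(
            allowed=True,
            pre_dispatch=["transparency_notice"],
            post_run=["watermark", "model_card_link"],
        )
    if risk_tier == "minimal":
        return GateDecision(allowed=True)  # no extra gate
    # Assumption: an unrecognised tier is treated as a hard error.
    raise ValueError(f"unknown risk_tier: {risk_tier!r}")


for tier in ("unacceptable", "high", "limited", "minimal"):
    d = risk_tier_gate(tier)
    print(tier, d.allowed, d.pre_dispatch, d.post_run)
```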

E2. GDPR right-to-erasure atomic delete

  nexus erase-subject --user <uid>
      │
      ▼
  Orchestrator: open erasure-job (Tier 3 chain)
      ├── Postgres   DELETE FROM <each table> WHERE user_id=...  (RLS scoped)
      ├── Qdrant     delete points filter payload.user_id=...
      ├── Neo4j      MATCH (n) WHERE n.user_id=... DETACH DELETE n
      ├── Memory Bank delete envelope + rotate tenant KEK (crypto-erasure)
      ├── Object store  delete artefacts + C2PA manifests
      └── Backups     schedule retention-policy purge
      │
      ▼
  Emit erasure-evidence span + sign certificate of erasure ──▶ report to DPO

E3. OWASP LLM Top 10 defense stack (per-dispatch)

  Request ──▶ LLM01 prompt-injection classifier (PreDispatch hook)
         ──▶ LLM03 training-data-poisoning: skill SBOM + origin check
         ──▶ LLM04 denial-of-wallet: FinOps pre-reserve
         ──▶ LLM05 supply-chain: signed skills + SBOM + pinned models
         ──▶ LLM06 sensitive-disclosure: output scanner (PostLLMCall hook)
         ──▶ LLM07 insecure-plugin-design: hook/tool allowlist + scope
         ──▶ LLM08 excessive-agency: capability allowlist per skill+role
         ──▶ LLM09 overreliance: watermark + model card + human-oversight flag
         ──▶ LLM02 insecure-output: structured-output validator
         ──▶ LLM10 model-theft: rate limit + auth + airgapped option

E4. Envelope encryption, per-tenant KEKs

   Data ──(DEK random)──▶ ciphertext  (stored in span store / memory bank / object store)
   DEK ──(tenant KEK)───▶ wrapped DEK (stored alongside ciphertext)
   tenant KEK: held by nexus-auth KMS
               cloud profile: HSM-backed  (FIPS 140-3)
               airgap profile: TPM-backed
               rotation: quarterly (policy) or on-demand

   Read path:  unwrap DEK via tenant KEK (auth to KMS) → decrypt ciphertext
   Erasure:    delete/rotate tenant KEK → all wrapped DEKs unusable → crypto-erasure
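
The write path, read path, and crypto-erasure property in E4 can be demonstrated end to end in a few lines. This is a sketch under loud assumptions: `TenantKMS` is a hypothetical in-memory stand-in for the nexus-auth KMS, and `_keystream_xor` is a toy SHA-256 counter-mode stream used only to keep the example dependency-free. It is not secure and stands in for an authenticated cipher such as AES-GCM.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher (SHA-256 counter keystream XOR). Illustration only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class TenantKMS:
    """Hypothetical in-memory KMS holding one KEK per tenant."""

    def __init__(self):
        self._keks = {}

    def create_kek(self, tenant: str) -> None:
        self._keks[tenant] = secrets.token_bytes(32)

    def wrap(self, tenant: str, dek: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        return nonce + _keystream_xor(self._keks[tenant], nonce, dek)

    def unwrap(self, tenant: str, wrapped: bytes) -> bytes:
        nonce, body = wrapped[:16], wrapped[16:]
        return _keystream_xor(self._keks[tenant], nonce, body)

    def destroy_kek(self, tenant: str) -> None:
        del self._keks[tenant]  # crypto-erasure: wrapped DEKs become unusable


def encrypt_record(kms: TenantKMS, tenant: str, plaintext: bytes) -> dict:
    dek = secrets.token_bytes(32)          # fresh random DEK per record
    nonce = secrets.token_bytes(16)
    return {
        "nonce": nonce,
        "ciphertext": _keystream_xor(dek, nonce, plaintext),
        "wrapped_dek": kms.wrap(tenant, dek),  # stored alongside ciphertext
    }


def decrypt_record(kms: TenantKMS, tenant: str, record: dict) -> bytes:
    dek = kms.unwrap(tenant, record["wrapped_dek"])
    return _keystream_xor(dek, record["nonce"], record["ciphertext"])


kms = TenantKMS()
kms.create_kek("acme")
record = encrypt_record(kms, "acme", b"span payload")
print(decrypt_record(kms, "acme", record))
```

After `kms.destroy_kek("acme")`, `decrypt_record` raises `KeyError` because the wrapped DEK can no longer be unwrapped, even though the ciphertext itself was never touched.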

E5. Three-gate enforcement (Istio + service key + policy engine)

  Caller (e.g., nexus-workflows) ──▶ Gateway (AI Router)
                                       ├─ Gate 1: Istio AuthorizationPolicy (SPIFFE)
                                       ├─ Gate 2: validateServiceKey (HMAC header)
                                       └─ Gate 3: OPA policy evaluation (per-request)
                                                  eval: caller ∈ ALLOWED_AI_CALLERS
                                                  eval: org.residency compatible with provider.region
                                                  eval: org.budget has headroom
                                                  eval: skill.risk_tier allowed by org.policy
                                                  eval: export-control tags compatible
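
The three gates in E5 are evaluated in order, and a request is admitted only if all pass. The sketch below focuses on Gate 2 and shows one plausible shape for an HMAC service-key check. The message layout (caller name, newline, body), the function names, and the way Gates 1 and 3 are reduced to booleans are all assumptions for illustration; the real `validateServiceKey` header format is not specified in the text.

```python
import hashlib
import hmac

ALLOWED_AI_CALLERS = {"nexus-workflows"}  # example value taken from the diagram


def sign_request(service_key: bytes, caller: str, body: bytes) -> str:
    """Compute the HMAC a caller would place in its service-key header."""
    message = caller.encode() + b"\n" + body
    return hmac.new(service_key, message, hashlib.sha256).hexdigest()


def validate_service_key(service_key: bytes, caller: str, body: bytes,
                         presented_sig: str) -> bool:
    expected = sign_request(service_key, caller, body)
    # Constant-time comparison avoids leaking the signature via timing.
    return hmac.compare_digest(expected, presented_sig)


def admit(mesh_authorized: bool, service_key: bytes, caller: str,
          body: bytes, presented_sig: str, policy_allows: bool):
    """Run the gates in order and report the first one that fails."""
    if not mesh_authorized:                      # Gate 1: mesh-level AuthZ
        return (False, "gate1:istio")
    if not validate_service_key(service_key, caller, body, presented_sig):
        return (False, "gate2:service_key")      # Gate 2: HMAC header
    if caller not in ALLOWED_AI_CALLERS or not policy_allows:
        return (False, "gate3:policy")           # Gate 3: policy engine
    return (True, "ok")


key = b"demo-service-key"
body = b'{"skill": "ros.refactor"}'
sig = sign_request(key, "nexus-workflows", body)
print(admit(True, key, "nexus-workflows", body, sig, policy_allows=True))
print(admit(True, key, "nexus-workflows", body, "bad-signature", True))
```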

E6. OPA/Rego policy evaluation flow

  Dispatch ──▶ assemble decision input:
                { org, user, skill, tier, provider, model, inputs-schema,
                  risk_tier, data_residency, export_tags, cost_estimate,
                  time_of_day, caller_service }
            ──▶ POST /opa/v1/data/nexus/dispatch/allow
            ──▶ { result: allow | deny, reasons: [...], conditions: [...] }
            ──▶ if allow with conditions, attach to run (e.g. "require HITL",
                "restrict tools to allowlist", "redact PII in output")
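
The E6 flow has two halves that are easy to sketch without a running policy engine: assembling the decision input and interpreting the verdict. OPA's Data API expects the input document under an `input` key, which is what `build_opa_request` produces. The field list mirrors the diagram (with `inputs-schema` spelled `inputs_schema`), while the helper names, the `DispatchDenied` exception, and the flat verdict shape are assumptions based on the figure rather than a documented Nexus contract.

```python
import json

REQUIRED_FIELDS = (
    "org", "user", "skill", "tier", "provider", "model", "inputs_schema",
    "risk_tier", "data_residency", "export_tags", "cost_estimate",
    "time_of_day", "caller_service",
)


class DispatchDenied(Exception):
    """Raised when the policy verdict is anything other than allow."""


def build_opa_request(ctx: dict) -> str:
    """Serialise the decision input as the JSON body for the policy query."""
    missing = [name for name in REQUIRED_FIELDS if name not in ctx]
    if missing:
        raise ValueError(f"decision input is missing: {missing}")
    return json.dumps({"input": ctx}, sort_keys=True)


def apply_decision(decision: dict) -> list:
    """Return the conditions to attach to the run, or raise on deny."""
    # Fail closed: a missing or unexpected result counts as a deny.
    if decision.get("result") != "allow":
        raise DispatchDenied(decision.get("reasons", []))
    return list(decision.get("conditions", []))


ctx = {
    "org": "acme", "user": "jane", "skill": "ros.refactor", "tier": 2,
    "provider": "gemini", "model": "gemini-2.5-pro", "inputs_schema": {},
    "risk_tier": "limited", "data_residency": "eu_only", "export_tags": [],
    "cost_estimate": 0.04, "time_of_day": "14:05", "caller_service":
    "nexus-workflows",
}
request_body = build_opa_request(ctx)
conditions = apply_decision(
    {"result": "allow", "reasons": [], "conditions": ["require HITL"]}
)
print(conditions)  # prints: ['require HITL']
```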

E8. Binding-level policy enforcement (user overrides can't weaken org policy)

   User saves a user-scope binding that overrides:
     ─ provider: gemini → claude-max
     ─ cost_cap: $0.50 → $10.00
     ─ requires_hitl: true → false
                    │
                    ▼
   Bindings svc → POST /opa/v1/data/nexus/bindings/allow_override
     input: { org_policy, proposed_binding, existing_binding,
              skill_metadata, user_role }
                    │
                    ▼
   OPA rules applied:
     ─ org.residency must not be widened (eu_only ≮ any)
     ─ org.phi_required_providers must include proposed provider
     ─ org.max_cost_cap must be ≥ proposed cost_cap
     ─ org.hitl_mandatory_for_high_risk must be honoured
     ─ org.allowed_tools must cover proposed allowed_tools
     ─ user.role must have `bindings:write:<scope>` permission
                    │
                    ▼
   allow  → insert binding, emit audit span, broadcast marketplace:binding_updated
   deny   → return { code, message, troubleshooting[] } (NO FALLBACKS contract)
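
The six override rules in E8 translate directly into a checker. The version below is a Python mirror of the listed rules for illustration, not the Rego that the policy engine would actually evaluate. The dictionary shapes, the `RESIDENCY_BREADTH` ordering, the error code string, and the reading that the provider rule applies only to PHI-tagged skills are assumptions; the example thresholds are invented for the demo.

```python
# Assumed ordering: a smaller number means a narrower residency scope.
RESIDENCY_BREADTH = {"eu_only": 1, "any": 2}


def check_override(org_policy, proposed, skill_metadata, user_permissions):
    """Return an allow verdict, or a deny payload listing every violated rule."""
    reasons = []
    if (RESIDENCY_BREADTH[proposed["residency"]]
            > RESIDENCY_BREADTH[org_policy["residency"]]):
        reasons.append("residency widened beyond org policy")
    if (skill_metadata.get("phi_tagged")
            and proposed["provider"] not in org_policy["phi_required_providers"]):
        reasons.append("provider not approved for PHI-tagged skills")
    if proposed["cost_cap"] > org_policy["max_cost_cap"]:
        reasons.append("cost_cap exceeds org.max_cost_cap")
    if (org_policy["hitl_mandatory_for_high_risk"]
            and skill_metadata.get("risk_tier") == "high"
            and not proposed["requires_hitl"]):
        reasons.append("HITL is mandatory for high-risk skills")
    extra_tools = set(proposed["allowed_tools"]) - set(org_policy["allowed_tools"])
    if extra_tools:
        reasons.append(f"tools not allowed by org: {sorted(extra_tools)}")
    if f"bindings:write:{proposed['scope']}" not in user_permissions:
        reasons.append("missing bindings:write permission for this scope")
    if reasons:
        # Deny returns a structured error; nothing is silently downgraded.
        return {"allow": False, "code": "BINDING_OVERRIDE_DENIED",
                "message": reasons[0], "troubleshooting": reasons}
    return {"allow": True}


org_policy = {
    "residency": "eu_only", "phi_required_providers": {"gemini"},
    "max_cost_cap": 5.00, "hitl_mandatory_for_high_risk": True,
    "allowed_tools": {"crm_lookup", "web_fetch", "write_notes"},
}
proposed = {
    "scope": "user", "provider": "claude-max", "cost_cap": 10.00,
    "requires_hitl": False, "residency": "eu_only",
    "allowed_tools": ["crm_lookup"],
}
verdict = check_override(
    org_policy, proposed,
    skill_metadata={"risk_tier": "high", "phi_tagged": True},
    user_permissions={"bindings:write:user"},
)
print(verdict["allow"], verdict["troubleshooting"])
```

With these demo values the override from the diagram is rejected for three independent reasons (provider, cost cap, HITL), and all three are returned so the user can fix them in one pass.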

E7. C2PA content provenance on every artefact

  Artefact bytes
       │
       ▼
  C2PA manifest v2 attached (or sidecar):
    claim_generator: adverant-nexus/4.0.0
    actions: [
      { action: c2pa.created,  software_agent: "skill ros.refactor v3.2.1" },
      { action: c2pa.edited,   software_agent: "gemini-2.5-pro" },
      { action: c2pa.reviewed, software_agent: "human:alice@acme" }
    ]
    ingredients: [ input1.pdf hash, tool-output1 hash, ... ]
    run_id: r-789
    replay_manifest_id: rm-abc
    signature: sigstore / tenant-key

4.F Deployment Profiles

F1. Public cloud multi-tenant

 Internet ──▶ Cloudflare ──▶ Istio Ingress ──▶ Services (shared)
              WAF              mTLS              │
              DDoS             AuthZ             ├─▶ shared Postgres (RLS)
                                                 ├─▶ shared Redis (per-org channels)
                                                 ├─▶ shared Qdrant (per-tenant namespace)
                                                 ├─▶ shared Neo4j (per-org labels)
                                                 └─▶ shared AI Router (per-org keys)
  ALL orgs isolated at DB/WS/auth layer; no hard VM isolation.

F2. Single-tenant VDS

 Customer VPC ──▶ Single-tenant nexus stack on their VDS
                  Same images, same manifests
                  One org_id value, simplified RLS
                  Keys live only in this VDS's nexus-auth
                  Can still talk to public providers OR use customer's own endpoints

F3. On-premise Kubernetes

 Customer data centre ──▶ K8s cluster runs nexus stack
                          Images mirrored from Adverant registry
                          May use customer's own LLM endpoints (Azure OpenAI, internal Llama)
                          SSO → customer IdP
                          Backups on-prem
                          Policy bundle from customer's OPA repo

F4. Airgapped sealed bundle

 Isolated network ──▶ K8s offline
                      Offline image registry pre-loaded
                      Pinned model weights on local GPU
                      No outbound connectivity at all
                      nexus-auth TPM-backed
                      Monthly delta bundles via USB
                      A2A restricted to local peers
                      FedRAMP/DoD/classified use cases

9. Fifty Use Cases

Each use case specifies trigger / tier / hooks / compliance / outcome. Seven cases exercise the Bindings primitive (7.15) explicitly.

Tier 1 — llm_only

Regulatory PDF → one-page brief. Trigger: user-initiated on regulatory-docs plugin. Tier 1. Hooks: PreDispatch residency=eu, PostLLMCall output-watermark. Compliance: EU AI Act limited, GDPR eu_only. Outcome: C2PA-signed JSON summary.
Translate support ticket into customer locale. Trigger: ticket-create webhook. Tier 1, role=fast. Hooks: FinOps "high-volume", cache-read. Compliance: GDPR. Outcome: translated string, 80% cache hit cost saving.
Button binding — "Classify document". Trigger: user clicks "Classify" on the docs plugin toolbar. The doc.classify.v1 binding pins Tier 1, model=haiku, cost_cap=$0.002, output=side-panel. Hooks: PreDispatch budget. Compliance: EU AI Act minimal. Outcome: classification ("contract / invoice / resume") in the side panel.

Tier 2 — tool_using

Research a competitor and produce a comparison table. Trigger: marketing analyst. Tier 2, tools: web_search, web_fetch. Hooks: PreToolUse blocks internal-domain fetch. Compliance: GDPR, export controls. Outcome: markdown table, C2PA-signed.
Refactor a TypeScript file to pass the type-checker. Trigger: engineer. Tier 2, tools: read_file, write_file, run_tsc. Hooks: MITRE ATLAS mitigations active, PostLLMCall PII scan. Compliance: SOC 2. Outcome: modified file, green tsc.
Triage a GitHub issue and propose labels plus assignee. Trigger: issue webhook. Tier 2, tools: gh MCP server. Hooks: require_human_confirm before label change. Compliance: SOC 2. Outcome: suggested labels pending human approval.
Pull last-week AWS cost anomalies and explain them. Trigger: weekly cron. Tier 2, tools: aws-cost-explorer MCP. Hooks: cost-capped. Compliance: FinOps. Outcome: narrative report.
Binding-driven k8s triage button. Trigger: context-menu on a pod row in ops plugin. Binding ops.k8s_triage forces namespace-allowlist hook, blocks prod by default. Inputs mapping pulls pod_name, namespace, cluster. Tier 2, tools: kubectl MCP. Hooks: PreToolUse denies prod unless HITL. Compliance: SOC 2. Outcome: diagnostic plus safe remediation proposal.

Tier 3 — chain

Full novel draft (outline → chapters → compile). Trigger: prose plugin user. Tier 3, 5-step DAG with parallel chapter generation. Hooks: PreChainStep cost cap per step. Compliance: C2PA on final PDF. Outcome: signed novel draft PDF, resumable after worker crash.
End-to-end security audit report. Trigger: security lead. Tier 3, steps: inventory → SAST → DAST → LLM analysis → report. Hooks: per-step risk-tier gate, OnHITLPause for high-risk findings. Compliance: SOC 2, ISO 27001. Outcome: signed audit report plus evidence.
Customer-onboarding workflow. Trigger: new customer. Tier 3, HITL waitpoint on KYC branch. Hooks: residency-gated storage. Compliance: GDPR, HIPAA if applicable. Outcome: onboarded customer with KYC evidence.
A/B-tested binding. Trigger: marketing sends personalized emails. Binding marketing.subject_line.v3 runs 50/50 A/B across two skill variants for 10 000 dispatches. Hooks: PostLLMCall open-rate tracker. Compliance: CAN-SPAM, GDPR. Outcome: Insights Agent auto-promotes winner; loser archived.
Monthly compliance evidence roll-up. Trigger: cron first-of-month. Tier 3, span queries → control mapping → DOC/PDF export → GPG sign. Compliance: SOC 2, ISO 27001, ISO 42001. Outcome: signed evidence tarball.
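
For the A/B-tested binding case in the list above, a 50/50 split is usually made deterministic so that a replayed dispatch lands on the same variant. One common way to do that, shown here as an assumption rather than the documented Nexus mechanism, is to hash the binding key together with a dispatch identifier. The `assign_variant` function and the `d-<n>` identifier format are hypothetical.

```python
import hashlib
from collections import Counter


def assign_variant(binding_key: str, dispatch_id: str, variants=("A", "B")):
    """Deterministically map a dispatch to one of the skill variants."""
    digest = hashlib.sha256(f"{binding_key}:{dispatch_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Simulate the 10 000 dispatches mentioned in the use case.
counts = Counter(
    assign_variant("marketing.subject_line.v3", f"d-{i}")
    for i in range(10_000)
)
print(dict(counts))
```

Because the assignment depends only on its inputs, replaying a run with the same dispatch identifier reproduces the variant choice without storing extra state.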

Tier 4 — autonomous

Autonomous pentest of a staging environment. Trigger: security lead. Tier 4, sub-agents: recon, exploit, report. Hooks: cost cap $50, HITL before any exploit, OnCostThreshold page. Compliance: MITRE ATLAS, SOC 2. Outcome: signed pentest report with human approval trail.
Long-horizon literature review. Trigger: researcher. Tier 4, Memory Bank across sessions. Hooks: OnHITLPause on replan. Compliance: export controls check per source. Outcome: annotated bibliography with replay manifest.
Enterprise RFP response. Trigger: sales lead. Tier 4, 4 sub-agents (legal, pricing, technical, editor). Hooks: HITL before final send. Compliance: C2PA on final doc. Outcome: signed RFP response PDF.
Self-healing production triage. Trigger: alert. Tier 4, sub-agents diagnose then propose. Hooks: nothing applied to prod without HITL plus change-window policy. Compliance: SOC 2, FedRAMP. Outcome: diagnostic plus approved remediation patch.
Hypothesis generation plus evaluation. Trigger: scientist. Tier 4 compete pattern with cost cap. Hooks: top-k surfaces to HITL. Compliance: internal research standards. Outcome: ranked hypothesis list.

Skill Marketplace 2.0

Internal team publishes a skill. Trigger: dev. Hooks: publish pipeline as in Figure B7. Compliance: SBOM, sigstore. Outcome: ros.lead_score v3.2.1 in private marketplace.
Tenant pins skill version for compliance freeze. Trigger: SOX freeze. Hooks: skill_version_pin. Compliance: SOX, audit retention. Outcome: all bindings resolve to v2.3.1 for Q4.
Auto-rollback on quality-score drop. Trigger: runtime telemetry. Hooks: quality_score_threshold. Compliance: ISO 42001 quality management. Outcome: binding auto-deactivated; previous version reactivated.
Binding-scoped skill version pin for compliance freeze. Trigger: Finance org pins the invoice.extract.v2 binding to skill_version_pin: "2.3.1" for the Q4 SOX-freeze window; marketplace quality-score rolls continue, but the binding never auto-bumps. Hooks: quality_score_threshold is still monitored but takes no action during the freeze. Compliance: SOX, audit retention. Outcome: binding-level freeze persists across three patch releases.

Hooks

PII redaction on every PostLLMCall. Hook: PostLLMCall Presidio-style redactor. Compliance: GDPR, HIPAA. Outcome: outputs sanitized before PCC emission.
Cost-threshold paging. Hook: OnCostThreshold → PagerDuty webhook. Compliance: FinOps. Outcome: on-call alerted on runaway costs.
Tool-allowlist per role. Hook: PreToolUse role-based gate. Compliance: RBAC, NIST AI RMF. Outcome: junior roles cannot write_file outside /tmp.
Input-classifier hook rejects jailbreak attempts. Hook: PreDispatch injection classifier. Compliance: OWASP LLM01. Outcome: adversarial example logged; dispatch rejected.
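
The role-based tool gate from the hooks list above (junior roles cannot write_file outside /tmp) can be sketched as a PreToolUse check. The hook signature, the `junior` role name, and the return shape are hypothetical. The sketch normalises the path lexically, which blocks `..` traversal but does not resolve symlinks; a real sandbox would need to handle that separately.

```python
import posixpath


def pre_tool_use(role: str, tool: str, args: dict):
    """Return (allowed, reason) for a proposed tool call."""
    if role == "junior" and tool == "write_file":
        # Collapse "." and ".." segments before checking the prefix.
        path = posixpath.normpath(args.get("path", ""))
        inside_tmp = path == "/tmp" or path.startswith("/tmp/")
        if not inside_tmp:
            return (False, f"junior role may only write under /tmp, got {path}")
    return (True, "ok")


print(pre_tool_use("junior", "write_file", {"path": "/tmp/notes.txt"}))
print(pre_tool_use("junior", "write_file", {"path": "/tmp/../etc/passwd"}))
print(pre_tool_use("senior", "write_file", {"path": "/src/foo.ts"}))
```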

Memory Bank and crypto isolation

Multi-turn research assistant recalls last-week context. Hook: PreDispatch Memory Bank decrypt (tenant KEK). Compliance: GDPR (within tenant scope). Outcome: continuity across sessions.
Tenant requests proof of isolation. Hook: audit export lists KEK access log. Compliance: SOC 2 CC 6.1. Outcome: cryptographic evidence of non-leakage.
User-scope binding overrides org default within policy. Power user promotes their preferred report.generate.v4 binding to Tier 3 with model=claude-opus-4. Hook: OPA check confirms cost_cap ≤ org.max_cost_cap and residency compatible. Compliance: organizational policy enforced despite user customization. Outcome: user-scope binding accepted; org policy intact.

A2A plus MCP dual plane

Nexus Tier-4 negotiates with a Gemini Enterprise agent via A2A. Hooks: A2A peer allowlist. Compliance: export controls on cross-border interaction. Outcome: cross-vendor workflow with signed handoff.
Skill invokes Atlassian MCP to create a Jira ticket. Hook: PostToolUse audit. Compliance: SOC 2. Outcome: Jira ticket created with span link.
Customer-hosted Bedrock AgentCore agent joins a Nexus workflow. Hook: A2A identity brokered by nexus-auth. Compliance: AWS BAA if HIPAA. Outcome: cross-platform delegation with KEK-scoped spans.

Structured output plus self-correction

Every skill contract typed; invalid output auto-corrects. Hook: PostLLMCall validator with R=3 ceiling. Compliance: ISO 42001 reliability. Outcome: downstream consumers receive typed output or structured failure.
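A validator with a retry ceiling might be structured as follows. This is a sketch only: it reads "R=3" as at most three correction retries after the first attempt (an assumption), checks required keys instead of a full JSON Schema, and uses a hypothetical `llm` callable in place of the AI Provider Router.

```python
import json
from typing import Callable, Dict, Set


def call_with_self_correction(
    llm: Callable[[str], str],
    prompt: str,
    required_keys: Set[str],
    max_retries: int = 3,
) -> Dict:
    """Return typed output, or a structured failure once retries are exhausted."""
    feedback = ""
    for attempt in range(1 + max_retries):
        # On retries, tell the model what was wrong with its previous output.
        full_prompt = prompt if not feedback else f"{prompt}\n\nPrevious output was invalid: {feedback}"
        raw = llm(full_prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"not valid JSON ({exc.msg})"
            continue
        if not isinstance(parsed, dict):
            feedback = "expected a JSON object"
            continue
        missing = required_keys - set(parsed)
        if not missing:
            return {"ok": True, "output": parsed, "attempts": attempt + 1}
        feedback = f"missing keys {sorted(missing)}"
    return {"ok": False, "error": feedback, "attempts": 1 + max_retries}
```

Either way the caller receives a dictionary with an `ok` flag, which matches the stated outcome: typed output or a structured failure, never an unvalidated string.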

Optimizer-compiled prompts

DSPy optimizer tunes the prose.outline skill against a metric. Hook: compiled artefact v+1 staged; auto-rollback on regression. Compliance: ISO 42001 continuous improvement. Outcome: improved prompt deployed without human edit.
Binding quality auto-deactivation. support.triage.v2 runtime quality score drops below threshold 0.75 over 500 dispatches; system auto-flips is_active=false, routes to fallback binding on same binding_key at lower priority, notifies publisher. Hook: quality_score_threshold. Compliance: ISO 42001. Outcome: graceful degradation without dispatch failures.
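The auto-deactivation rule can be sketched as a rolling-window check. The sketch assumes that "drops below threshold 0.75 over 500 dispatches" means the mean score of the most recent 500 dispatches; the `QualityGate` class and `pick_binding` helper are hypothetical names, and the real updater may aggregate differently.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class QualityGate:
    """Tracks recent quality scores for one binding."""

    def __init__(self, threshold: float = 0.75, window: int = 500) -> None:
        self.threshold = threshold
        self.window = window
        self.scores: Deque[float] = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Record one dispatch score; return False once the binding should deactivate."""
        self.scores.append(score)
        if len(self.scores) < self.window:
            return True  # not enough evidence yet
        return sum(self.scores) / len(self.scores) >= self.threshold


def pick_binding(bindings: List[Dict]) -> Optional[Dict]:
    """Choose the highest-priority active binding for a binding_key."""
    active = [b for b in bindings if b["is_active"]]
    return max(active, key=lambda b: b["priority"], default=None)
```

When `record` returns False, the caller would set `is_active` to false on the primary binding; `pick_binding` then falls through to the lower-priority fallback without a dispatch failure.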

Observability — Insights plus Polly

Insights Agent auto-detects latency regression and files a ticket. Hook: Insights Agent anomaly detect. Compliance: SOC 2 CC 7.3. Outcome: Jira ticket with span evidence.
NL debug: "why was last night's chain expensive?" Hook: Polly-NL query. Compliance: internal ops. Outcome: span narrative with cost hotspot identified.

FinOps

Per-skill budget cap prevents runaway chain. Hook: PreDispatch budget check. Compliance: FinOps, CFO controls. Outcome: dispatch refused with remedy JSON.
Circuit breaker on failing provider. Hook: OnCostThreshold plus provider failure threshold. Compliance: reliability targets. Outcome: breaker opens for 15 min, routes to alternate.
Token-budget enforcement in Tier 4. Hook: OnCostThreshold. Compliance: FinOps. Outcome: sub-agent pauses for HITL before proceeding.
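The PreDispatch budget check amounts to an atomic reserve-then-settle pattern. The sketch below uses an in-process lock as a stand-in for the Redis atomic counters named in Phase 17; the `BudgetLedger` class and the fields of the remedy payload are assumptions made for illustration.

```python
import threading
from decimal import Decimal


class BudgetLedger:
    """Per-skill or per-org budget with pre-reservation (in-memory stand-in)."""

    def __init__(self, budget_usd: str) -> None:
        self._budget = Decimal(budget_usd)
        self._reserved = Decimal("0")
        self._lock = threading.Lock()

    def reserve(self, estimate_usd: str) -> dict:
        """Reserve the estimated cost, or refuse with a remedy payload."""
        estimate = Decimal(estimate_usd)
        with self._lock:  # check and increment must happen atomically
            remaining = self._budget - self._reserved
            if estimate > remaining:
                return {
                    "allowed": False,
                    "remedy": {
                        "reason": "budget_exceeded",
                        "remaining_usd": str(remaining),
                        "requested_usd": str(estimate),
                    },
                }
            self._reserved += estimate
            return {"allowed": True, "remaining_usd": str(remaining - estimate)}

    def settle(self, estimate_usd: str, actual_usd: str) -> None:
        """Replace a reservation with the actual spend once the run finishes."""
        with self._lock:
            self._reserved += Decimal(actual_usd) - Decimal(estimate_usd)
```

Amounts are held as `Decimal` rather than float so that repeated reservations do not accumulate rounding error in the remaining balance.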

Deterministic replay plus chain-of-custody

Regulator asks to show how output X was produced. Hook: replay API. Compliance: EU AI Act Art. 12, SOC 2 audit. Outcome: bit-for-bit replay plus C2PA manifest plus span plus policy artefacts.
Skill-writer debugs a production failure offline. Hook: replay with pinned model stub. Compliance: internal. Outcome: deterministic local reproduction.

Airgapped

DoD customer installs from sealed USB. Hook: airgap install pipeline. Compliance: FedRAMP High, DoD IL5. Outcome: fully offline Nexus cluster.
Monthly delta bundle updates skills and models. Hook: airgap update. Compliance: FedRAMP continuous monitoring. Outcome: in-place update with no outbound calls.
Airgapped A2A restricted to local peers. Hook: A2A discovery SPIFFE filter. Compliance: DoD IL5 isolation. Outcome: zero external agent reachability.
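The SPIFFE discovery filter in the last use case can be illustrated by comparing trust domains. This is a hedged sketch: `trust_domain` and `local_peers` are hypothetical helpers, and the peer IDs reuse the example identities from Appendix E.

```python
from typing import List
from urllib.parse import urlparse


def trust_domain(spiffe_id: str) -> str:
    """Extract the trust domain from an ID of the form spiffe://<domain>/<path>."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parsed.netloc


def local_peers(peers: List[str], local_domain: str) -> List[str]:
    """Keep only peers whose trust domain matches the local cluster."""
    return [peer for peer in peers if trust_domain(peer) == local_domain]


peers = [
    "spiffe://adverant/acme/agent/researcher",
    "spiffe://other-org/policy/agent/legal-review",
]
reachable = local_peers(peers, "adverant")
```

In an airgapped cluster the filter would run at discovery time, so that external peers are never offered as delegation targets in the first place.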

CLI 2.0

CI pipeline dispatches skills from GitHub Actions. Hook: nexus dispatch --tail. Compliance: SOC 2 CI integration. Outcome: skill-run logs stream into the CI job output.
DevOps exports SOC 2 package from CLI in 30 seconds. Hook: nexus governance export --framework soc2. Compliance: SOC 2 evidence. Outcome: signed tarball.
Marketplace plugin ships with declared binding actions. nexus.manifest.json declares 12 actions[] with full metadata; running nexus marketplace install leads seeds 12 SYSTEM bindings, uninstalling the plugin removes them, and a plugin version bump upgrades them after diff review. Hooks: binding override OPA policy. Compliance: audit trail. Outcome: lifecycle-managed bindings without code edits.

Governance plus compliance

EU customer enables EU AI Act strict — all high-risk skills require HITL. Hook: risk-tier gate. Compliance: EU AI Act Art. 14. Outcome: high-risk dispatches queued to HITL inbox.
HIPAA-covered org enforces "no provider without BAA". Hook: PreDispatch PHI-tag check. Compliance: HIPAA. Outcome: OpenRouter refused for PHI-tagged skills; policy-violation span emitted.

10. Migration Path (Phases 10–27)

UNO established Phases 1–9 [42]. v4.0 adds Phases 10 through 27.

Phase	Name	Scope
10	Tier 4 state machine	Concrete autonomous engine in `orchestrator`; `orchestrator.autonomous_runs` table; HITL waitpoints.
11	Persistent chain state	Migrate `orchestrator.chain_runs` and `chain_steps` from Redis TTL to Postgres primary; checkpoint after each step.
12	Hooks framework	Generic hook dispatcher in orchestrator; hook-manifest YAML; OPA policy integration.
13	Memory Bank	`memory.bank` tables; envelope encryption; KMS integration.
14	Skill Marketplace 2.0	SKILL.md v2 schema; `ros.skill_versions`; sigstore integration; SBOM pipeline; adversarial-eval harness; quality-score updater.
15	Structured-output plus self-correction	Wrap AI Provider Router calls with schema validator; retry harness.
16	DSPy optimizer pipeline	Offline compile job; variant staging; quality-score rollback gate.
17	FinOps pre-reserve	Redis atomic counters; per-skill, per-org budgets; circuit breakers.
18	Deterministic replay plus C2PA	Replay manifest writer; artefact signer; replay worker.
19	Insights Agent plus Polly-NL	Span clustering service; NL query layer; ClickHouse cold tier.
20	A2A dual plane	A2A server plus client; peer discovery via nexus-auth SPIFFE.
21	CLI 2.0 surface	Commands: dispatch, runs, chain visualize, skill publish, airgap, governance, finops, hooks, memory, a2a, debug.
22	Governance primitives	Compliance frameworks enum; per-framework evidence collectors; auditor-export assembly; OPA policy bundle distribution.
23	UI/UX Bindings	Extend `ros.skill_bindings` to v4.0 schema; Binding Editor UI; `nexus.manifest.json` `actions[]`; override OPA policy.
24	Airgapped bundle mode	Bundle builder; offline installer; TPM integration; delta-bundle flow.
25	Per-queue pod deployments	Split nexus-workflows into per-tier / per-queue Deployments with resource profiles.
26	Multi-provider routing (finish Phase 7)	`AIProviderConfig { providers[], routingPolicy }`; adapter selection reads `routingPolicy[hint]`; failover.
27	Governance bypass closure	Remove nexus-mageagent from `ALLOWED_AI_CALLERS`; delete the service; resolve Section 12.3 vs Section 14 contradiction in UNO paper.

Each phase ships behind a feature flag and is validated by the production scorecard before the next begins.

11. Deployment Profiles

The same codebase serves four profiles; manifests differ by values, not code.

Public cloud multi-tenant. Shared Postgres with RLS, shared Redis with per-org channels, shared Qdrant with per-tenant namespaces, shared Neo4j with per-org labels, shared AI Router with per-org keys. Authentication through the Adverant IdP; tenancy through logical scopes plus cryptographic envelopes.

Single-tenant VDS. Same stack on customer VDS. One org_id value simplifies RLS. Keys isolated to the VDS's nexus-auth. Can still use public providers or customer endpoints.

On-premise Kubernetes. Customer-owned K8s cluster. Images mirrored from Adverant registry. May use customer's LLM endpoints (Azure OpenAI, internal Llama deployments). SSO into customer IdP. Policy bundle from customer OPA repo.

Airgapped sealed bundle. Isolated network. Pre-loaded offline registry. Pinned model weights on local GPUs. nexus-auth TPM-backed. Monthly delta bundles. A2A restricted to local peers. FedRAMP High, DoD IL5, CJIS, IRS Pub 1075 use cases.

12. Evaluation Methodology

We propose six evaluation axes; execution of these benchmarks is deferred to follow-up work after Phases 10–18 ship.

Token efficiency per task. Compare v4.0 Tier 2 dispatch against CrewAI, LangGraph, and OpenAI Agents SDK on a fixed task battery (refactor a repository, triage an issue, generate a report). Metrics: tokens-in, tokens-out, cost-per-task.
Dispatch latency. P50, P95, P99 of /api/v1/dispatch response time. Compare against Temporal, Airflow, and BullMQ direct.
Multi-agent cost. Tier 4 compete pattern vs. self-consistency vs. best-of-N on MMLU-style benchmarks; measure quality gain per dollar.
Provable tenant-isolation boundaries. Red-team attempts to exfiltrate tenant A data via the observability backend while executing a workload for tenant B. Success criterion: zero exfiltration.
Replay fidelity. Given a run manifest, reconstruct the run bit-for-bit; measure hash equality of every span and every artefact.
Airgapped feature parity. Of the 50 use cases, how many run unmodified in airgapped mode? Target: at least 47 (three are A2A cross-cluster use cases that are restricted by design).
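For the replay-fidelity axis, hash equality could be measured along the following lines. This is a sketch under assumptions: spans and artefacts are represented as JSON-serializable dictionaries, canonicalization is key-sorted compact JSON, and the function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def canonical_hash(record: Dict) -> str:
    """SHA-256 over a key-sorted, whitespace-free JSON encoding."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def replay_fidelity(original: List[Dict], replayed: List[Dict]) -> float:
    """Fraction of positions whose hashes match; 1.0 means hash-equal everywhere."""
    total = max(len(original), len(replayed))
    if total == 0:
        return 1.0
    matches = sum(
        canonical_hash(a) == canonical_hash(b) for a, b in zip(original, replayed)
    )
    return matches / total
```

Missing or extra spans count against the score because the denominator is the longer of the two runs; a bit-for-bit replay is the only way to reach 1.0.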

13. Related Work

Beyond the twelve-framework survey in Sections 2 and 3, v4.0 draws on several adjacent research streams.

Durable execution and workflow orchestration. Temporal [7] and Netflix Maestro [9] established durable-execution patterns that informed v4.0's Tier 3 persistent state. Our distinction: Temporal couples workflow logic to the worker execution environment, while v4.0 maintains the dispatch-execution separation established in UNO [42].

LLM serving. PagedAttention (vLLM) [66], Orca [67], SGLang [68], and Sarathi-Serve [69] optimize the token-processing layer below our AI Provider Router. v4.0 is agnostic to the serving layer; organizations may deploy vLLM alongside managed providers.

LLM routing. FrugalGPT [70], RouteLLM [71], and the Dekoninck et al. unified routing-cascading framework [72] inform v4.0's role-based routing and the DSPy optimizer pipeline. Where these papers focus on cost-performance tradeoffs at inference time, v4.0 adds skill-level routing hints stored in the registry and runtime governance constraints.

Agent architectures. ReAct [73], Toolformer [74], Reflexion [75], and the Voyager [76] open-ended learning agent informed v4.0 Tier 2 and Tier 4 design. CAMEL [77] and AutoGen [78] established multi-agent conversation patterns that v4.0 treats as Tier 4 special cases rather than primary modes.

Service mesh security. Istio [79] and the SPIFFE identity framework [80] underpin the three-gate enforcement in v4.0. Envelope encryption and per-tenant KEKs follow NIST SP 800-57 [81] key-hierarchy guidance.

AI governance and compliance. The EU AI Act [50], NIST AI Risk Management Framework [59], ISO/IEC 42001 [55], and OWASP LLM Top 10 (2025) [61] directly inform Section 7.14. C2PA [49] provides the content-provenance substrate. MITRE ATLAS [62] and MITRE ATT&CK serve as threat-model references.

14. Conclusion

Adverant Nexus Stack v4.0 is an incremental architectural evolution, not a clean-sheet rewrite. It preserves the dispatch-execution separation that UNO [42] established as the load-bearing discipline of the platform, while adding the primitives that the 2026 agentic framework landscape has collectively identified as necessary and that no single framework ships turnkey: cryptographically isolated memory, signed and measured skill artefacts, first-class hooks, cost governance, deterministic replay, airgapped deployment, user-configurable bindings, and native compliance integration across thirteen regulatory regimes. The eighteen-phase migration (Phases 10–27) is sequenced so that each phase ships behind a feature flag and is validated before the next begins. The paper's claims and limitations are validated at three Gemini 2.5 Pro gates archived alongside the paper. Follow-up work will execute the evaluation methodology in Section 12 and report quantitative results.

15. Appendices

Appendix A — SKILL.md v2 Schema (excerpt)


---
name: string                   # unique id, kebab-case
version: semver                # "3.2.1"
description: string            # one-liner
category: enum                 # scoring|profiling|...|compliance
risk_tier: enum                # minimal|limited|high|unacceptable (EU AI Act)
execution:
  tier: int                    # 1|2|3|4
  max_iterations: int
  chain_steps?: [...]
  response_format: json|text
inputs:
  schema: { ... JSON Schema ... }
outputs:
  schema: { ... JSON Schema ... }
governance:
  data_residency: eu_only|us_only|any|region-tag
  export_tags: [ EAR | ITAR | dual-use ]
  compliance_frameworks: [ soc2 | eu-ai-act | iso42001 | hipaa | fedramp | nist-airmf ]
  phi_tagged: bool
hooks:
  - event: PreToolUse
    action: deny|rewrite|require_hitl|policy_ref
    policy_ref?: opa/...
allowed_tools: [ ... ]
denied_tools: [ ... ]
sbom_ref: string               # path or URL to SBOM
signature:
  type: sigstore
  payload: base64
  identity: string
quality_score_threshold: float # auto-deactivate threshold
metadata:
  mitre_atlas: [ AML.T... ]
  owasp_llm: [ LLM01|LLM02|... ]
---

Appendix B — Span-Tree v2 Schema (excerpt)


CREATE TABLE orchestrator.execution_spans (
  span_id          UUID NOT NULL,
  parent_span_id   UUID,
  job_id           UUID NOT NULL,
  type             span_type NOT NULL,   -- closed 12-type enum (UNO) + 8 new types
  started_at       TIMESTAMPTZ NOT NULL,
  ended_at         TIMESTAMPTZ,
  duration_ms      INTEGER GENERATED ALWAYS AS (...) STORED,
  payload_cipher   BYTEA,                -- envelope-encrypted
  payload_dek_wrapped BYTEA,             -- wrapped DEK
  signature        BYTEA,                -- hash-chain signature
  prev_span_hash   BYTEA,                -- prev span's hash (for chain)
  span_hash        BYTEA,                -- this span's hash
  org_id           UUID NOT NULL,
  ...
  PRIMARY KEY (span_id, started_at)      -- PostgreSQL requires the partition key in the primary key
) PARTITION BY RANGE (started_at);
-- 12-type v3 enum preserved. v4.0 adds: hook_invocation, policy_eval, binding_resolve,
-- hitl_waitpoint, quality_eval, optimizer_compile, a2a_message, c2pa_sign.
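The `prev_span_hash` and `span_hash` columns imply a tamper-evident chain. The excerpt does not specify the hash construction, so the sketch below assumes SHA-256 over the previous hash concatenated with the encrypted payload, and an all-zero genesis value for the first span; the signature column is left out, and all function names are hypothetical.

```python
import hashlib
from typing import Dict, List

GENESIS = b"\x00" * 32  # assumed prev_span_hash for the first span in a run


def compute_span_hash(prev_span_hash: bytes, payload_cipher: bytes) -> bytes:
    """Assumed construction: SHA-256(prev_span_hash || payload_cipher)."""
    return hashlib.sha256(prev_span_hash + payload_cipher).digest()


def append_span(chain: List[Dict], payload_cipher: bytes) -> None:
    """Link a new span to the tail of the chain."""
    prev = chain[-1]["span_hash"] if chain else GENESIS
    chain.append({
        "payload_cipher": payload_cipher,
        "prev_span_hash": prev,
        "span_hash": compute_span_hash(prev, payload_cipher),
    })


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every link; any edited payload or reordered span breaks the chain."""
    expected_prev = GENESIS
    for span in chain:
        if span["prev_span_hash"] != expected_prev:
            return False
        if span["span_hash"] != compute_span_hash(expected_prev, span["payload_cipher"]):
            return False
        expected_prev = span["span_hash"]
    return True
```

Because each hash covers its predecessor, an auditor only needs the final `span_hash` from a trusted source to detect modification anywhere earlier in the run.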

Appendix C — CLI 2.0 Command Reference (abbreviated)

nexus login [--org <slug>]
nexus dispatch <job_type> [--input @file] [--tier <n>] [--provider <p>] [--model <m>]
                          [--cost-cap <$>] [--risk <r>] [--tail] [--json]
nexus runs list|show|replay|tail|export
nexus chain visualize <run_id>
nexus skill publish|versions|rollback|install|uninstall
nexus airgap bundle|install|update|verify
nexus governance export|policies list|apply
nexus hooks list|apply|remove
nexus finops budgets|burn-rate|reserve|debit
nexus memory snapshot|gc|export
nexus a2a peers list|call|serve
nexus binding list|get|set|resolve|diff
nexus insights cost-hotspots|latency-regressions|anomalies
nexus debug nl "<question>"

Appendix D — Hook Specification (full)


event: PreDispatch | PostSkillResolve | PreTierSelect | PreLLMCall | PostLLMCall
     | PreToolUse | PostToolUse | PreChainStep | PostChainStep
     | OnCostThreshold | OnIterationLimit | OnTierEscalation
     | OnHITLPause | OnHITLResume | PostDispatch
scope: org | skill | plugin | binding
target: <skill_id or plugin_slug or binding_key or "*">
matcher: <CEL expression>        # e.g., tool.name == 'write_file' && path.startsWith('/etc')
action: deny | rewrite | require_hitl | emit_event | call_webhook | policy_ref
args:
  webhook?: { url, headers, payload_template }
  policy_ref?: opa/path/to/rule.rego
  rewrite?: <jsonpath rewrites>
priority: 0-1000

Appendix E — A2A Message Format


{
  "a2a_version": "1.0",
  "message_id": "uuid",
  "from": "spiffe://adverant/acme/agent/researcher",
  "to":   "spiffe://other-org/policy/agent/legal-review",
  "capability": "legal.review",
  "payload_schema": "https://.../schema.json",
  "payload": { ... },
  "signature": "sigstore:...",
  "run_context": { "run_id": "uuid", "parent_span_id": "uuid" },
  "compliance": { "residency": "eu_only", "export_tags": [] }
}

Appendix F — Airgapped Bundle Manifest


version: 4.0.0
generated_at: 2026-04-24T00:00:00Z
signature:
  gpg: <ascii-armored>
  sigstore: <sigstore-bundle>
images:
  - name: adverant/nexus-orchestrator
    digest: sha256:...
  - name: adverant/nexus-gateway
    digest: sha256:...
skills:
  - skill_id: ros.code_edit
    version: 3.2.1
    signature: ...
models:
  - name: gemini-2.5-pro
    weights_digest: sha256:...
policies:
  - path: opa/...
    version: 4.0.0
manifests:
  - path: k8s/...
migrations:
  - path: db/migrations/...
licenses:
  - component: ...
    license: ...
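An offline installer would need to check each artefact against the digests recorded in this manifest before loading it. The sketch below shows one way to do that for `sha256:` digests; `verify_digest` is a hypothetical helper, and verifying the manifest's own GPG or sigstore signature is a separate step that is not shown.

```python
import hashlib
import hmac


def verify_digest(content: bytes, expected: str) -> bool:
    """Check content against a manifest digest of the form "sha256:<hex>"."""
    algorithm, _, hex_digest = expected.partition(":")
    if algorithm != "sha256" or not hex_digest:
        raise ValueError(f"unsupported digest format: {expected!r}")
    actual = hashlib.sha256(content).hexdigest()
    # Constant-time comparison avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, hex_digest.lower())
```

An installer would refuse to proceed on the first mismatch, since a single altered image or model file invalidates the sealed bundle.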

Appendix G — Compliance-Control Traceability Matrix (excerpt)

Framework	Control ID	v4.0 Primitive	Evidence Source
EU AI Act	Art. 12 (logging)	Span tree (§8.B-B13)	`orchestrator.execution_spans`
EU AI Act	Art. 13 (transparency)	C2PA manifest (§7.11) + model card	`artefact.c2pa_manifest`
EU AI Act	Art. 14 (human oversight)	Tier 4 HITL waitpoint (§7.2)	`hitl_decisions`
EU AI Act	Art. 15 (accuracy/robustness)	DSPy metrics (§7.8) + adversarial eval (§7.3)	`skill_versions.adv_eval_report`
EU AI Act	Art. 26 (deployer)	Per-org governance doc	`org.governance_doc`
GDPR	Art. 17 (erasure)	Atomic erasure chain (§7.14)	`erasure_certificates`
GDPR	Art. 22 (auto decisions)	Tier 4 HITL (§7.2)	`hitl_decisions`
GDPR	Art. 32 (security)	Envelope encryption (§7.5)	`memory_bank.kek_access_log`
GDPR	Art. 35 (DPIA)	Auto-generated DPIA	`skill_versions.dpia`
SOC 2	CC6.1 (logical access)	Three-gate enforcement (§8.E-E5)	Istio/service-key/OPA logs
SOC 2	CC6.6 (boundary)	RLS + middleware (§4.5)	tenant isolation tests
SOC 2	CC7.2 (monitoring)	Insights Agent (§7.9)	anomaly alerts
SOC 2	CC7.3 (analysis)	Span tree queries	investigation artefacts
ISO 27001	A.8.16 (monitoring)	Span tree + Insights Agent	OpenTelemetry metrics
ISO 27001	A.8.10 (info deletion)	Crypto-erasure (§7.5)	erasure certificates
ISO 42001	AI impact assessment	SKILL.md v2 risk section	`skill_versions.ia_report`
ISO 42001	AI system lifecycle	Skill Marketplace 2.0 (§7.3)	`skill_versions` lineage
HIPAA	§164.312(a)(1) (access ctl)	RLS + JWT + RBAC	authz logs
HIPAA	§164.312(c)(1) (integrity)	C2PA + span hash chain	provenance manifests
HIPAA	§164.308(a)(1) (risk analysis)	EU AI Act risk tier (reused)	registry
FedRAMP	AC-2 (account mgmt)	nexus-auth + SSO	nexus-auth audit
FedRAMP	AU-12 (audit gen)	Span tree	per-span audit
FedRAMP	SC-12 (crypto key mgmt)	Per-tenant KEK + rotation	KMS rotation log
FedRAMP	SC-13 (FIPS crypto)	FIPS 140-3 modules in airgap bundle	bundle manifest
NIST AI RMF	GOVERN	Per-org governance doc	policies
NIST AI RMF	MAP	Skill registry metadata	`ros.skill_definitions`
NIST AI RMF	MEASURE	Quality evals + span analytics	Insights output
NIST AI RMF	MANAGE	Hooks + FinOps + HITL	hook invocations

(Full matrix of approximately 200 rows in the distribution package; excerpt above.)

Appendix H — OPA/Rego Policy Starter Pack (excerpt)


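package nexus.dispatch

# `in` and `every` must be imported under the pre-1.0 Rego syntax this policy uses.
import future.keywords.every
import future.keywords.in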

default allow = false

allow {
  input.caller in ["nexus-orchestrator", "nexus-workflows", "chat-orchestrator"]
  residency_ok
  budget_ok
  risk_ok
  export_ok
}

residency_ok {
  org := data.orgs[input.org_id]
  skill := data.skills[input.skill_id]
  org.residency == "any"
}
residency_ok {
  org := data.orgs[input.org_id]
  skill := data.skills[input.skill_id]
  org.residency == skill.data_residency
}

budget_ok {
  reserved := data.finops.reserved[input.org_id]
  remaining := data.orgs[input.org_id].budget - reserved
  remaining >= input.cost_estimate
}

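# Skills below high risk pass outright; high-risk skills must satisfy the HITL rule that follows.
risk_ok {
  skill := data.skills[input.skill_id]
  skill.risk_tier != "unacceptable"
  skill.risk_tier != "high"
}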
risk_ok {
  skill := data.skills[input.skill_id]
  skill.risk_tier == "high"
  input.run_has_hitl == true
}

export_ok {
  skill := data.skills[input.skill_id]
  org := data.orgs[input.org_id]
  every tag in skill.export_tags {
    tag in org.allowed_export_tags
  }
}

Appendix I — Auditor Export Payload Schema


auditor_export:
  framework: soc2 | iso27001 | iso42001 | eu-ai-act | hipaa | fedramp | nist-airmf
  window: { start: ISO8601, end: ISO8601 }
  org_id: UUID
  generated_at: ISO8601
  signature: { type: gpg | sigstore, value: base64 }
  control_evidence:
    - control_id: string        # e.g., CC6.1
      framework_section: string # e.g., SOC 2 Common Criteria 6.1
      evidence:
        - type: span | policy_version | hitl_decision | risk_tier_record
          reference: URI        # resolvable within the package
          summary: string
  span_samples: [...]           # sampled spans with full payload
  policy_versions: [...]
  hitl_decisions: [...]
  conformity_assessments: [...]
  dpias: [...]
  model_cards: [...]
  adversarial_eval_reports: [...]

Appendix J — Bindings Schema v2 (DDL + Resolution + Override OPA)


-- Generalized from Adverant-NexusROS/database/migrations/030_skill_bindings.sql
CREATE TABLE ros.skill_bindings_v2 (
  id                    UUID PRIMARY KEY,
  organization_id       UUID NOT NULL,
  binding_key           VARCHAR(200) NOT NULL,
  scope                 VARCHAR(20) NOT NULL DEFAULT 'organization'
                        CHECK (scope IN ('system','organization','project','user')),
  scope_id              UUID,
  priority              INTEGER NOT NULL DEFAULT 0,
  is_active             BOOLEAN NOT NULL DEFAULT true,

  -- resolution target
  skill_definition_id   UUID NOT NULL,
  skill_version_pin     VARCHAR(40) NOT NULL DEFAULT 'latest-minor',

  -- execution
  tier                  SMALLINT CHECK (tier BETWEEN 1 AND 4),
  provider_preference   VARCHAR(40),
  model_preference      VARCHAR(100),
  routing_hint          VARCHAR(40),
  queue_name            VARCHAR(80),
  response_format       VARCHAR(20),

  -- cost and limits
  cost_cap_usd          NUMERIC(10,4),
  daily_cap_usd         NUMERIC(10,2),
  token_cap_in          INTEGER,
  token_cap_out         INTEGER,
  timeout_ms            INTEGER,
  max_iterations        INTEGER,
  max_sub_agents        INTEGER,

  -- governance
  risk_tier             VARCHAR(20),
  data_residency        VARCHAR(40),
  export_tags           TEXT[],
  requires_hitl         VARCHAR(30) DEFAULT 'never',
  policy_refs           TEXT[],
  tier_restrictions     TEXT[],
  phi_tagged            BOOLEAN DEFAULT false,
  compliance_frameworks TEXT[],

  -- hooks
  hooks                 JSONB NOT NULL DEFAULT '[]',
  allowed_tools         TEXT[],
  denied_tools          TEXT[],

  -- inputs and mapping
  input_schema          JSONB,
  inputs_mapping        JSONB,
  output_target         VARCHAR(40),

  -- UI presentation
  display_name          VARCHAR(120),
  description           TEXT,
  icon                  VARCHAR(60),
  placement             TEXT[],
  confirmation          VARCHAR(20),
  badge                 TEXT[],
  shortcut              VARCHAR(60),

  -- A/B experiments and telemetry
  ab_experiment_id      UUID REFERENCES ros.skill_ab_experiments(id),
  quality_score_threshold NUMERIC(3,2),
  telemetry_tags        JSONB DEFAULT '{}',

  -- lifecycle
  status                VARCHAR(20) NOT NULL DEFAULT 'active',
  config_overrides      JSONB NOT NULL DEFAULT '{}',
  conditions            JSONB NOT NULL DEFAULT '{}',
  created_by            UUID,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_by            UUID,
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  deleted_at            TIMESTAMPTZ,

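  -- scope_id is NULL at system and organization scope; NULLS NOT DISTINCT (PostgreSQL 15+)
  -- keeps those rows subject to the uniqueness check.
  UNIQUE NULLS NOT DISTINCT (binding_key, scope, scope_id, priority, organization_id)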
);

-- Resolution algorithm (pseudo-SQL)
-- Given: binding_key, user_id, project_id, org_id
SELECT *
FROM ros.skill_bindings_v2
WHERE binding_key = :binding_key
  AND is_active = true AND deleted_at IS NULL
  AND organization_id = :org_id
  AND (
        (scope = 'user' AND scope_id = :user_id) OR
        (scope = 'project' AND scope_id = :project_id) OR
        (scope = 'organization') OR
        (scope = 'system')
      )
ORDER BY
  CASE scope WHEN 'user' THEN 4 WHEN 'project' THEN 3 WHEN 'organization' THEN 2 WHEN 'system' THEN 1 END DESC,
  priority DESC
LIMIT 1;
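The same precedence rule can be expressed outside the database, which is convenient for unit tests of the Binding Editor. The sketch below mirrors the query above over in-memory rows; `resolve_binding` and the dictionary representation are assumptions for illustration, not the production resolver.

```python
from typing import Dict, List, Optional

# Higher rank wins: user overrides project, which overrides organization, then system.
SCOPE_RANK = {"user": 4, "project": 3, "organization": 2, "system": 1}


def resolve_binding(
    bindings: List[Dict],
    binding_key: str,
    org_id: str,
    user_id: Optional[str] = None,
    project_id: Optional[str] = None,
) -> Optional[Dict]:
    """Return the most specific active binding, breaking ties by priority."""

    def in_scope(b: Dict) -> bool:
        if b["scope"] == "user":
            return user_id is not None and b["scope_id"] == user_id
        if b["scope"] == "project":
            return project_id is not None and b["scope_id"] == project_id
        return b["scope"] in ("organization", "system")

    candidates = [
        b for b in bindings
        if b["binding_key"] == binding_key
        and b["organization_id"] == org_id
        and b["is_active"]
        and b.get("deleted_at") is None
        and in_scope(b)
    ]
    return max(
        candidates,
        key=lambda b: (SCOPE_RANK[b["scope"]], b["priority"]),
        default=None,
    )
```

Scope rank is compared before priority, so a user-scope binding with priority 0 still beats an organization-scope binding with priority 100, matching the ORDER BY clause.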


# Binding override OPA policy (Appendix H companion)
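package nexus.bindings

# `in` and `every` must be imported under the pre-1.0 Rego syntax this policy uses.
import future.keywords.every
import future.keywords.in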

default allow_override = false

allow_override {
  not widens_residency
  phi_providers_satisfied
  cost_cap_within_org
  hitl_preserved
  tools_within_allow
  user_has_permission
}

widens_residency {
  input.proposed.data_residency == "any"
  input.org_policy.data_residency != "any"
}

phi_providers_satisfied {
  not input.skill.phi_tagged
}
phi_providers_satisfied {
  input.skill.phi_tagged
  input.proposed.provider_preference in input.org_policy.phi_allowed_providers
}

cost_cap_within_org {
  input.proposed.cost_cap_usd <= input.org_policy.max_cost_cap
}

hitl_preserved {
  input.skill.risk_tier != "high"
}
hitl_preserved {
  input.skill.risk_tier == "high"
  input.proposed.requires_hitl in ["always", "on_risk_high"]
}

tools_within_allow {
  every tool in input.proposed.allowed_tools {
    tool in input.org_policy.allowed_tools
  }
}

user_has_permission {
  perm := sprintf("bindings:write:%s", [input.proposed.scope])
  perm in input.user.permissions
}

References

[1] Anthropic. Introducing the Model Context Protocol. 2024. anthropic.com

[2] Anthropic. Donating the Model Context Protocol and Establishing the Agentic AI Foundation. December 2025. anthropic.com

[3] Google. Google Cloud Next 2026: Agent2Agent Goes Production-Grade. 2026. cloud.google.com

[4] Microsoft. Microsoft Agent Framework 1.0 Released. April 2026. devblogs.microsoft.com

[5] LangChain. LangGraph: Graph-Based Agent Orchestration. github.com

[6] Google. Agent Development Kit (ADK). 2026. cloud.google.com

[7] Temporal Technologies. Durable Execution. temporal.io

[8] LangChain. LangSmith: Observability and Evaluation. docs.langchain.com

[9] Netflix Tech Blog. Maestro: Netflix's Workflow Orchestrator. netflixtechblog.com

[10] Amazon Web Services. AgentCore Adds Quality Evaluations and Policy Controls. 2026. aws.amazon.com

[11] Google. Gemini 2.5 Pro. ai.google.dev

[12] Apache Software Foundation. Apache Airflow. airflow.apache.org

[13] Microsoft. Microsoft Agent Framework Overview. learn.microsoft.com

[14] OpenAI. Swarm → OpenAI Agents SDK. github.com

[15] DevOps.com. OpenAI Upgrades Its Agents SDK with Sandboxing and a New Model Harness. 2026. devops.com

[16] Google Cloud Blog. Introducing the Gemini Enterprise Agent Platform. April 2026. cloud.google.com

[17] Google. Gemini Enterprise Agent Platform Product Page. cloud.google.com

[18] SiliconANGLE. Google Brings Agentic Development, Optimization, and Governance Under One Roof. April 2026. siliconangle.com

[19] Amazon Web Services. Introducing Amazon Bedrock AgentCore. aws.amazon.com

[20] CrewAI. CrewAI Documentation. docs.crewai.com

[21] CrewAI. CrewAI GitHub. github.com

[22] Medium. LangGraph vs CrewAI vs AutoGen: Which Agent Framework Should You Actually Use in 2026. medium.com

[23] LangChain. LangGraph Documentation. github.com

[24] LangChain. LangChain and NVIDIA Enterprise. blog.langchain.com

[25] Pydantic. Pydantic AI. ai.pydantic.dev

[26] Cloud Summit. Microsoft Agent Framework Production-Ready Convergence of AutoGen and Semantic Kernel. cloudsummit.eu

[27] OpenAI. OpenAI Agents Python SDK. openai.github.io

[28] Anthropic. Model Context Protocol Announcement. anthropic.com

[29] Anthropic. Claude Agent SDK Overview. platform.claude.com

[30] Anthropic. Claude Agent SDK Subagents. platform.claude.com

[31] Anthropic. Claude Agent SDK Hooks. platform.claude.com

[32] Microsoft. Semantic Kernel. learn.microsoft.com

[33] Microsoft. Semantic Kernel and Microsoft Agent Framework. devblogs.microsoft.com

[34] LlamaIndex. Workflows. llamaindex.ai

[35] LlamaIndex. Announcing Workflows 1.0: A Lightweight Framework for Agentic Systems. llamaindex.ai

[36] Stanford NLP. DSPy. dspy.ai

[37] DSPy. Optimizers. dspy.ai

[38] DSPy. GEPA Optimizer. dspy.ai

[39] deepset. Haystack. haystack.deepset.ai

[40] deepset. Haystack GitHub. github.com

[41] Amazon Web Services. AgentCore New Features April 2026. aws.amazon.com

[42] Adverant Research Team. Unified Nexus Orchestrator: Separation of Dispatch and Execution in Multi-Chain AI Workload Platforms. April 2026. adverant.ai

[43] IntuitionLabs. Enterprise AI Code Assistants in Air-Gapped Environments. intuitionlabs.ai

[44] RapidClaw. AI Agent Marketplace Guide 2026. rapidclaw.dev

[45] The Register. Agentic AI Protocols: MCP, UTCP, A2A, Etc. theregister.com

[46] Prefactor. MCP Security: Multi-Tenant AI Agents Explained. prefactor.tech

[47] Blaxel. Multi-Tenant Isolation for AI Agents. blaxel.ai

[48] Amazon Web Services. Multi-Tenant Agentic AI Prescriptive Guidance. docs.aws.amazon.com

[49] Coalition for Content Provenance and Authenticity (C2PA). C2PA Specification. c2pa.org

[50] European Union. Regulation (EU) 2024/1689 on Artificial Intelligence (AI Act). Official Journal of the European Union, July 2024.

[51] European Union. General Data Protection Regulation (Regulation (EU) 2016/679). Official Journal of the European Union, April 2016.

[52] European Union. Data Act (Regulation (EU) 2023/2854). Official Journal of the European Union, December 2023.

[53] American Institute of CPAs. SOC 2: Trust Services Criteria. aicpa.org

[54] International Organization for Standardization. ISO/IEC 27001:2022 Information Security Management. iso.org

[55] International Organization for Standardization. ISO/IEC 42001:2023 AI Management Systems. iso.org

[56] U.S. Department of Health and Human Services. HIPAA Security Rule (45 CFR Part 160 and Subparts A and C of Part 164). hhs.gov

[57] General Services Administration. FedRAMP Moderate and High Baselines. fedramp.gov

[58] Department of Defense. DoD Cloud Computing Security Requirements Guide (IL4/IL5). public.cyber.mil

[59] National Institute of Standards and Technology. AI Risk Management Framework 1.0. January 2023. nist.gov

[60] National Institute of Standards and Technology. AI 600-1: Generative AI Profile. nist.gov

[61] OWASP. Top 10 for LLM Applications 2025. owasp.org

[62] MITRE. ATLAS: Adversarial Threat Landscape for AI Systems. atlas.mitre.org

[63] OWASP. Agentic AI Threats Working Group. owasp.org

[64] Bureau of Industry and Security. Export Administration Regulations (EAR, 15 CFR Parts 730–774). bis.doc.gov

[65] European Union. EU Dual-Use Export Control Regulation 2021/821.

[66] Kwon, W., et al. Efficient Memory Management for Large Language Model Serving with PagedAttention. SOSP 2023.

[67] Yu, G., et al. Orca: A Distributed Serving System for Transformer-Based Generative Models. OSDI 2022.

[68] Zheng, L., et al. SGLang: Efficient Execution of Structured Language Model Programs. 2024.

[69] Agrawal, A., et al. Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve. OSDI 2024.

[70] Chen, L., Zaharia, M., Zou, J. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance. 2023. arxiv.org

[71] Ong, I., et al. RouteLLM: Learning to Route LLMs with Preference Data. 2024. arxiv.org

[72] Dekoninck, J., et al. A Unified Approach to Routing and Cascading for LLMs. 2024.

[73] Yao, S., et al. ReAct: Synergizing Reasoning and Acting in Language Models. 2022. arxiv.org

[74] Schick, T., et al. Toolformer: Language Models Can Teach Themselves to Use Tools. 2023. arxiv.org

[75] Shinn, N., et al. Reflexion: Language Agents with Verbal Reinforcement Learning. 2023. arxiv.org

[76] Wang, G., et al. Voyager: An Open-Ended Embodied Agent with Large Language Models. 2023. arxiv.org

[77] Li, G., et al. CAMEL: Communicative Agents for Mind Exploration. 2023. arxiv.org

[78] Wu, Q., et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. 2023. arxiv.org

[79] Istio Authors. Istio Service Mesh. istio.io

[80] Cloud Native Computing Foundation. SPIFFE / SPIRE. spiffe.io

[81] National Institute of Standards and Technology. SP 800-57 Part 1 Rev 5: Recommendation for Key Management. csrc.nist.gov

[82] Pydantic. Pydantic AI GitHub. github.com

[83] Pydantic. Pydantic AI Product Page. pydantic.dev

[84] LangChain. LangChain Home. langchain.com

[85] Google Blog. Gemini Enterprise Agent Platform Announcement. blog.google

[86] Visual Studio Magazine. Microsoft Ships Production-Ready Agent Framework 1.0 for .NET and Python. visualstudiomagazine.com

[87] Microsoft. Semantic Kernel → Microsoft Agent Framework Migration Guide. learn.microsoft.com

[88] LlamaIndex. AgentWorkflow Announcement. llamaindex.ai

[89] Stanford NLP. DSPy GitHub. github.com

[90] deepset. Products and Services: Haystack. deepset.ai

[91] dev.to. MCP vs A2A: The Complete Guide to AI Agent Protocols in 2026. dev.to

[92] Digital Applied. AI Agent Protocol Ecosystem Map 2026. digitalapplied.com

[93] Machine Learning Mastery. 7 Agentic AI Trends to Watch in 2026. machinelearningmastery.com

[94] Redwerk. LangGraph vs CrewAI Production. redwerk.com

[95] OpenAgents. Open Source AI Agent Frameworks Compared. openagents.org

[96] Adverant Research Team. Cognitive Memory Architecture for Multi-Tenant LLM Platforms. April 2026. adverant.ai

Paper Completeness Statement

Fifteen sections have been drafted (Abstract through Conclusion plus Appendices A–J). Fifty use cases, numbered 1–50, appear in Section 9. Ninety-six references are enumerated. Fifty diagrams (A1–A9, B1–B21, C1–C8, D1–D9, E1–E8, F1–F4) are specified in Section 8 and rendered as ASCII figures in the published LaTeX source, with Mermaid and PlantUML source files under figures/. The Bindings metadata schema (Appendix J) carries all fields listed in Section 7.15. The excerpt of the compliance-control traceability matrix in Appendix G maps twenty-four control identifiers across eight frameworks to v4.0 primitives.

Three Gemini 2.5 Pro validation gates — Gate A (post-outline), Gate B (post-Section 7 proposal core), Gate C (pre-publication peer review simulation) — are archived as structured prompt-plus-response files in the gemini-gates/ sibling directory and are part of the published package. The prompts embed arXiv-category adversarial-reviewer personas (cs.SE, cs.DC, cs.AI) as specified in the plan.

This paper is published link-only and excluded from the /docs/research index page, the sitemap.xml, and the crawl allowlist. The same not-discoverable policy used by the UNO Pipeline Redesign paper [42] and the Cognitive Memory Architecture paper [96] applies here.

Keywords

agentic orchestration, nexus stack v4.0, skill marketplace, UI bindings, hooks, memory bank, A2A MCP, deterministic replay, C2PA provenance, airgapped deployment, EU AI Act, FedRAMP, HIPAA, NIST AI RMF, ISO 42001, tenant isolation, FinOps, tier 4 autonomous, chain of custody, marketplace plugins