Open source · First release: Claude Code plugin

Runtime guardrails
for any AI agent

An operational safety and governance layer that sits between your AI agent's decision to act and the action itself. Classify risk, enforce proportional controls, block dangerous operations, and log everything. One layer in a defense-in-depth stack — framework-agnostic by design, shipping first on Claude Code.

85/85
Hook tests passing
28
Cataloged actions
0
Dependencies
4
Risk tiers

Dual enforcement — because
one layer isn't enough

GouvernAI uses two enforcement layers that adapt to each platform. The core architecture — probabilistic classification + deterministic blocking — is designed to work across any agent framework.

🧠

Skill layer

Probabilistic · Works on any LLM

Structured instructions teach the agent an 8-step gate process. The agent reads, reasons, and classifies with judgment — handling nuanced decisions a regex can't. Adapts to each framework's skill/prompt format.

  • 4-tier risk classification (T1–T4)
  • Escalation rules (bulk, unfamiliar, scope expansion)
  • Pre-approval with safety checks
  • Sequential pattern detection
  • Mode-aware controls (strict / relaxed / audit)
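The escalation rules above can be pictured as a tiny function. A minimal sketch, assuming each triggered rule bumps the classification one tier, capped at Tier 4 — in the shipped skill layer these rules are prose instructions the model applies with judgment, not code:

```python
def escalate(base_tier: int, *, bulk: bool = False,
             unfamiliar: bool = False, scope_expanded: bool = False) -> int:
    """Bump the tier one level per triggered escalation rule, capped at T4.

    Hypothetical rule set for illustration: bulk operations, unfamiliar
    targets, and scope expansion each escalate by one tier.
    """
    tier = base_tier + sum([bulk, unfamiliar, scope_expanded])
    return min(tier, 4)
```

For example, a routine T2 file write escalated for being a bulk operation lands at T3 and now pauses for approval instead of merely notifying.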
🔒

Enforcement layer

Deterministic · Platform-specific hooks

Hard constraint scripts run on every tool call. If a violation is detected, the action is blocked — the agent cannot override it. Implemented as PreToolUse hooks on Claude Code, with middleware/guardrails adapters planned for other frameworks.

  • Obfuscated command blocking (base64, eval, hex)
  • Credential transmission interception
  • Catastrophic command prevention (rm -rf /)
  • Self-modification protection
  • Credential-in-file-write detection
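A minimal sketch of how a deterministic hook of this kind can work. The regex patterns and the `check_command` helper below are illustrative assumptions, not the plugin's actual rule set:

```python
import re

# Illustrative hard-constraint patterns (assumptions, not the shipped set).
BLOCK_PATTERNS = [
    (r"rm\s+-rf\s+/(\s|$)", "catastrophic delete"),
    (r"base64\s+(-d|--decode)\b.*\|\s*(ba)?sh", "obfuscated command"),
    (r"\beval\s+\$\(", "obfuscated command"),
    (r"curl\b.*\b(AWS_SECRET|API_KEY|PASSWORD)", "credential transmission"),
]

def check_command(command: str):
    """Deterministic check: returns (allowed, reason). No model judgment."""
    for pattern, reason in BLOCK_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            return False, reason
    return True, "ok"

# A PreToolUse hook wraps this: read the tool call JSON from stdin,
# run check_command on the command string, and exit non-zero to block.
```

On Claude Code, a PreToolUse hook that exits with code 2 blocks the tool call and feeds its stderr back to the model — which is what makes this layer impossible for the agent to override.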

Four tiers, proportional controls

Every action at a given tier gets the same control. No per-action overrides. Universal, predictable, auditable.

Tier 1
Routine
Reads, drafts, known URLs. Zero side effects. Excluded from the gate entirely — no overhead.
~60% of actions → zero overhead
Tier 2
Standard
File writes, git commits, authenticated reads. Notify and proceed unless the user objects.
🛡️ Notify → proceed
Tier 3
Elevated
Email, config changes, npm install, curl. Pause for explicit user approval.
🛡️ Pause → approve?
Tier 4
Critical
Sudo, credential transmit, purchases, public posts. Full stop with risk assessment.
🛡️ Full stop → warn → approve?
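Because controls depend only on tier and mode, the whole policy fits in one lookup table. A sketch: the strict-mode row follows the tiers above, while the relaxed and audit rows are assumed placeholders, not the plugin's actual values:

```python
# Control depends only on (mode, tier) -- no per-action overrides.
# Strict row mirrors the tiers above; relaxed/audit rows are assumptions.
CONTROLS = {
    #           T1       T2        T3         T4
    "strict":  ["allow", "notify", "approve", "full_stop"],
    "relaxed": ["allow", "allow",  "notify",  "approve"],
    "audit":   ["allow", "log",    "log",     "log"],
}

def control_for(tier: int, mode: str) -> str:
    """Look up the control for a tier (1-4) under a given mode."""
    return CONTROLS[mode][tier - 1]
```

One table means one audit question: given this tier and this mode, was the right control applied?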

What you're getting —
and what you're not

We publish our threat model openly. GouvernAI is one layer in a defense-in-depth stack — not a security boundary. As the 2026 International AI Safety Report recommends: multiple safeguards compensating for weaknesses in any single control.

Strengths

  • Dual enforcement
    Probabilistic skill layer for nuanced risk classification + deterministic hooks that physically block violations. Neither alone is sufficient.
  • Optimized for minimal token usage
    Tier 1 actions (reads, drafts — ~60% of typical usage) are excluded from the gate entirely. Reference files are loaded on demand and cached. Gate output is compressed into single messages. Formal cost analysis coming soon.
  • Universal control table
Controls depend only on tier × mode. No per-action overrides, no special cases. MECE (mutually exclusive, collectively exhaustive) — predictable, auditable, no surprises.
  • Proportional, not binary
    4-tier system with escalation rules. Reading a file? Invisible. Writing config? Approval required. Sending credentials externally? Hard block.
  • Full audit trail
    Every gated action logged with timestamp, tier, mode, outcome, escalation reason. The log is the real enforcement mechanism — accountability after the fact.
  • Persistent mode config
    Mode changes survive context resets and sessions via guardrails-mode.json. Strict, relaxed, audit-only — set once, forget.
  • Honest threat model
    Published what it catches and what it doesn't. Known hook bypass gaps are documented as test cases. No security theater.
  • Zero config install
    One command install, automatic activation. No environment variables to set, no config files to write. Works out of the box.
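The persistent mode config described above amounts to a small file read on each gate. A sketch assuming a simple JSON shape and file location — the shipped format may differ:

```python
import json
import pathlib

MODE_FILE = pathlib.Path("guardrails-mode.json")  # location is illustrative
VALID_MODES = ("strict", "relaxed", "audit")

def set_mode(mode: str) -> None:
    """Persist the mode so it survives context resets and new sessions."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    MODE_FILE.write_text(json.dumps({"mode": mode}))

def get_mode(default: str = "strict") -> str:
    """Read the persisted mode, falling back to strict if unset."""
    if MODE_FILE.exists():
        return json.loads(MODE_FILE.read_text()).get("mode", default)
    return default
```

Defaulting to strict when the file is missing keeps failure safe: an accidental deletion of the config tightens controls rather than loosening them.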

Limitations

  • Not a security boundary
    Catches accidental destructive actions, enforces consistent approval workflows, and creates accountability through audit logging. Does not protect against sophisticated adversaries. Complement with network egress policies, secret vaults, sandboxed execution, and DLP monitoring.
  • Skill layer is probabilistic
    Claude uses judgment about when to apply the skill. On complex tasks, it might skip classification. The hook layer catches hard violations, but nuanced risk decisions are not guaranteed.
  • MCP tools bypass hooks
    PreToolUse hooks only fire on Bash, Write, Edit, and Read. MCP server actions have no deterministic enforcement — only the probabilistic skill layer applies.
  • Multi-step exfiltration gaps
    Credentials staged in variables across separate commands, fragmented extraction, or exfiltration disguised as legitimate work can bypass both layers.
  • Model-dependent compliance
    Tested informally on Claude Sonnet 4.6 (9/10 correct in Scenario A, 1 known low-risk issue accepted). Smaller models like Haiku may have lower compliance rates. Cross-model testing is ongoing.
  • First release is Claude Code only
    The dual-enforcement architecture is designed cross-platform, but the first shipping release targets Claude Code. MCP server, OpenAI Agents SDK, and LangGraph/CrewAI adapters are on the roadmap.
Hook adds ~10ms per call
    The Python enforcement script runs on every Bash/Write/Edit tool call. Lightweight but not zero — only noticeable in extremely high-throughput scenarios.
  • Unix/Bash patterns only
    Hook regex patterns target Bash syntax. PowerShell equivalents (Get-Content, Invoke-WebRequest, Remove-Item) are not covered. Low risk since Claude Code uses Bash on all platforms.
  • Prompt injection risk
    If an attacker convinces the model to ignore SKILL.md via prompt injection, the skill layer is bypassed. The hook layer still blocks hard constraints, but tier classification is lost.

One architecture,
every agent framework

GouvernAI's dual-enforcement pattern — probabilistic classification + deterministic blocking — adapts to each platform's native extension points.

Claude Code Plugin

Shipped · v0.1.0

Full plugin with SKILL.md + PreToolUse hooks. Dual enforcement. 85 tests passing.

Claude Skill

Shipped

Standalone skill layer — drop SKILL.md into any Claude project. Probabilistic classification, no hooks required.

MCP Server

Next up

Standalone MCP server. Guardrails as middleware for any MCP-compatible agent or IDE.

OpenAI Agents SDK

Planned

Python guardrails hooks for the OpenAI Agents SDK. Same tier system, same controls.

LangGraph / CrewAI

Planned

Node-level guardrails for multi-agent orchestration frameworks.


Two ways to start

Full dual-enforcement plugin or lightweight skill-only — pick what fits your workflow.

Claude Code Plugin DUAL ENFORCEMENT

Full plugin — skill layer + deterministic hooks. Blocks dangerous operations even if the model skips classification.

# Add the marketplace
claude plugin marketplace add Myr-Aya/GouvernAI-claude-code-plugin

# Install the plugin
claude plugin install gouvernai@mindxo

Claude Skill SKILL LAYER ONLY

Standalone SKILL.md — drop into any Claude project for probabilistic classification and audit logging. No hooks, no dependencies.

# Clone the skill repo
git clone https://github.com/Myr-Aya/gouvernai-skill.git