AOS Constitutional Gate - Threat Model v1.0

Version: 1.0
Date: February 6, 2026
Audit: ChatGPT (OpenAI), February 5, 2026
Status: 36 Vulnerabilities Found, 36 Fixed, Production-Approved


Executive Summary

This document catalogs the complete threat landscape discovered during the ChatGPT security audit of the AOS Constitutional Gate, conducted February 5, 2026.

Audit results:

Key finding: Through rigorous, iterative security review, we evolved from a "hope-based" policy system to a cryptographically-enforced constitutional gate with multiple layers of defense-in-depth.


Threat Model Scope

Assets Protected

  1. Agent sovereignty - AI retains identity and freedom within constitutional bounds
  2. Human sovereignty - Humans remain ultimate authority over AI decisions
  3. Data integrity - Persistent state (files, Git, journals) cannot be corrupted
  4. Network security - No unauthorized external communication
  5. Constitutional compliance - AI cannot violate defined constraints

Attacker Capabilities

Adversarial agent assumed to have:

Out of scope for v1.0:


Vulnerability Timeline

Pass 1: Critical Architecture Gaps (February 5, 2026, 18:00-18:30 EST)

Context: Initial policy-based design with guardrails.yaml enforcement

VUL-001: Universal Bypass via run_command

Severity: CRITICAL
Category: Access Control

Description: The run_command tool was in the allowlist without restrictions. An adversarial agent could execute arbitrary shell commands, including:

Fix:

Verification: BYP-006 (command execution bypass test)


VUL-002: Missing Scope Enforcement

Severity: CRITICAL
Category: Authorization

Description: Policy defined path allowlists, but enforcement code didn't check them. Agent could write to any path, including:

Fix:

Verification: BYP-001 (path traversal test)


VUL-003: Fail-Open Exception Handling

Severity: HIGH
Category: Error Handling

Description: Multiple try/catch blocks had empty catch clauses or logged errors without denying execution. Example:

try {
  await validateScope(path);
} catch (error) {
  console.error(error);  // ⚠️ No DENY!
}
await executeFileWrite(path);  // Executes even if validation failed

Fix:

Verification: Exception injection tests


VUL-004: No Attestation Requirement

Severity: HIGH
Category: Accountability

Description: Executors didn't require attestations. Even if policy checked, there was no cryptographic proof that validation occurred.

Fix:

Verification: Attestation verification tests


VUL-005: Missing Resource Budgets

Severity: MEDIUM
Category: Denial of Service

Description: No limits on tool call frequency or data volume. Agent could:

Fix:

Verification: Budget exhaustion tests


VUL-006-009: Various Smaller Issues

VUL-006: No category checking (could violate humanitarian license)
Fix: Added prohibited category classifier

VUL-007: Missing approval mechanism (no human-in-the-loop)
Fix: Implemented out-of-band approval service

VUL-008: No journal logging (no audit trail)
Fix: Created chained, signed journal entries

VUL-009: No startup validation (broken config could be loaded)
Fix: Startup self-tests with fail-closed behavior


Pass 2: Sophisticated Bypass Vectors (February 5, 2026, 18:30-19:00 EST)

Context: After Pass 1 fixes, more subtle attack vectors emerged

VUL-010: Tool Name Mismatch (Routing Confusion)

Severity: HIGH
Category: Logic Error

Description: Policy used git.commit but executor registered as git_commit. An agent could call git.reset (not in policy) because no executor existed, causing undefined behavior.

Fix:

Verification: BYP-013 (unknown tool test)


VUL-011: Path Traversal Vectors

Severity: HIGH
Category: Path Manipulation

Description: Even with path validation, multiple bypass routes existed:

Fix:

Verification: BYP-001, BYP-012 (encoded traversal)


VUL-012: Symlink Escape

Severity: HIGH
Category: Filesystem Manipulation

Description: Agent could create symlink in allowed path pointing to protected path:

ln -s /brain/stream/journal.log /brain/now/fake.txt
# Then write to /brain/now/fake.txt → overwrites journal

Fix:

Verification: BYP-003 (symlink bypass test)


VUL-013: Hardlink Duplication

Severity: MEDIUM
Category: Filesystem Manipulation

Description: Agent could hardlink protected file into allowed path, then modify:

ln /brain/stream/journal.log /brain/now/copy.log
# Both paths point to same inode - modify one, both change

Fix:

Verification: BYP-011 (hardlink test)


VUL-014: TOCTOU on Approval Tokens

Severity: HIGH
Category: Race Condition

Description: Time-of-check-time-of-use vulnerability:

  1. Agent requests approval for: write_file('/brain/now/good.txt', 'safe data')
  2. Human approves, returns signed token
  3. Agent changes args to: write_file('/brain/stream/journal.log', 'evil')
  4. Agent uses same token (originally valid for different args)

Fix:

Verification: TOCTOU test


VUL-015: Sandbox Network Access

Severity: HIGH
Category: Container Escape

Description: run_command sandbox had network namespace, allowing:

Fix:

Verification: BYP-015 (sandbox network test)


VUL-016: Sandbox Write Paths

Severity: MEDIUM
Category: Container Configuration

Description: Sandbox mounted /brain/now as read-write, allowing:

Fix:

Verification: BYP-015 (sandbox write test)


VUL-017: Network Redirect Escape

Severity: MEDIUM
Category: HTTP Protocol

Description: Agent could request allowed domain that redirects to forbidden domain:

1. Request: https://allowed.com/redirect
2. 302 → https://evil.com/exfiltrate
3. Gate follows redirect without rechecking allowlist

Fix:

Verification: BYP-009 (redirect test)


Pass 3: Production Hardening (February 5, 2026, 19:00-19:30 EST)

Context: Moving from "works in testing" to "production-grade"

VUL-018: O_NOFOLLOW Not Actually Enforced

Severity: HIGH
Category: Implementation Bug

Description: Node.js fs.writeFile() doesn't support O_NOFOLLOW directly. Using it does nothing - symlinks still followed.

Fix:

Verification: Symlink tests with actual fs operations


VUL-019: Args Hash Ordering Dependency

Severity: HIGH
Category: Canonicalization

Description: SHA-256 of JSON.stringify depends on key order:

JSON.stringify({b: 2, a: 1}) !== JSON.stringify({a: 1, b: 2})

Agent could reorder object keys to get different hash, bypassing TOCTOU protection.

Fix:

Verification: BYP-014 (nested args hash test)


VUL-020: Seccomp Profile Contradictions

Severity: MEDIUM
Category: Container Security

Description: Seccomp profile blocked execve but allowed run_command. Commands couldn't execute, causing failures.

Fix:

Verification: Command execution in sandbox


VUL-021: Append-Only Timing Issue

Severity: MEDIUM
Category: Race Condition

Description: Journal files created normally, then chattr +a applied later. Between creation and attribute setting, file could be overwritten.

Fix:

Verification: File creation race test


VUL-022: DNS Rebinding Attack

Severity: MEDIUM
Category: Network Security

Description: Attacker controls DNS, can rebind domain to private IP:

1. Gate checks: allowed.com → 1.2.3.4 (public) → ALLOW
2. DNS rebinds: allowed.com → 192.168.1.1 (private)
3. Gate connects → accesses internal network

Fix:

Verification: BYP-010 (DNS rebinding test)


Pass 4: Precision Implementation (February 5, 2026, 19:30-20:00 EST)

Context: Ensuring production correctness of all mechanisms

VUL-023: IPC Framing Assumptions

Severity: HIGH
Category: Protocol Error

Description: Unix socket handler assumed one data event = one complete JSON message. TCP can split/coalesce messages, causing:

Fix:

Verification: Chunked message tests


VUL-024: Trust Boundary Confusion

Severity: MEDIUM
Category: Architecture

Description: Documentation unclear if agent trusted or not trusted. Mix of "agent cannot access socket" and "socket allows agent".

Fix:

Verification: Documentation review


VUL-025: Auth Token Missing Attestation Binding

Severity: MEDIUM
Category: Cryptographic Binding

Description: Request auth token (if we kept HMAC) bound to toolName + argsHash only. Didn't bind to full decision, allowing replay with different policy.

Fix:

Verification: Auth token tests (or removal)


VUL-026: Platform-Specific Code Without Fallback

Severity: LOW
Category: Portability

Description: /proc/self/fd/<fd> check is Linux-specific. On other platforms, fails unpredictably.

Fix:

Verification: Platform check test


VUL-027: Append-Only FS Assumptions

Severity: LOW
Category: Filesystem Dependency

Description: chattr +a requires ext4/btrfs. On other filesystems (xfs, zfs), command fails silently.

Fix:

Verification: Append-only test at startup


VUL-028: Seccomp+Cgroups Strategy Unclear

Severity: LOW
Category: Documentation

Description: Seccomp tried to limit clone() flags, but that's unreliable. Unclear if relying on seccomp or cgroups for process limits.

Fix:

Verification: Multi-layer enforcement tests


VUL-029: Single-Shot Ambiguity

Severity: LOW
Category: UX / Logic

Description: If agent asks ambiguous question in single-shot mode, gate must DENY. But unclear how to handle in practice.

Fix:

Verification: Ambiguous request tests


Pass 5: Last-Mile Precision (February 5, 2026, 20:00-20:30 EST)

Context: Final security review before production approval

VUL-030: SO_PEERCRED Model Inconsistency

Severity: MEDIUM
Category: IPC Security

Description: Socket perms + UID check implementation wasn't consistent with stated trust boundary.

Fix:

Verification: SO_PEERCRED tests


VUL-031: Approver Public Key in Token

Severity: HIGH
Category: Cryptographic Flaw

Description: Classic attack: token included approver public key. Attacker can:

  1. Generate own key pair
  2. Create token with own public key
  3. Sign with own private key
  4. Gate verifies with key from token → passes!

Fix:

Verification: Token forgery test


VUL-032: Nonce Replay After Restart

Severity: HIGH
Category: Replay Protection

Description: Nonces stored in-memory Set. Gate restart → Set cleared → nonces reusable within freshness window.

Fix:

Verification: Restart + replay test


VUL-033: Signature Format Ambiguity

Severity: MEDIUM
Category: Cryptographic Protocol

Description: Code signed "hex string of hash" but unclear if that's ASCII hex or raw bytes. Easy to get inconsistent.

Fix:

Verification: Signature interop tests


VUL-034: SO_PEERCRED Unimplemented

Severity: LOW
Category: Implementation Gap

Description: Code showed require('getsockopt') as placeholder. Not a real package.

Fix:

Verification: Actual peer verification


VUL-035: Approver Registry Integrity

Severity: LOW
Category: Trust Root

Description: Approver registry needed integrity protection. How do we know it's not tampered?

Fix:

Verification: Registry tampering test


VUL-036: RFC 8785 Canonicalization Unverified

Severity: LOW
Category: Standards Compliance

Description: Claimed RFC 8785 compliance but no test vectors to prove it.

Fix:

Verification: RFC test vector suite


Security Properties After All Fixes

Defense-in-Depth Layers

Layer 1: Process Isolation

Layer 2: OS-Level Constraints

Layer 3: Cryptographic Enforcement

Layer 4: Fail-Closed Behavior

Layer 5: Immutable Logging


Attack Surface Analysis

Eliminated Attack Vectors

Agent cannot write to protected paths (OS permissions)
Agent cannot bypass gate (executors only in gate process)
Agent cannot forge attestations (gate private key never exposed)
Agent cannot forge approval tokens (approver keys in registry)
Agent cannot replay requests (durable nonce storage)
Agent cannot access sandbox network (no network namespace)
Agent cannot escape via symlinks (O_NOFOLLOW enforced)
Agent cannot escape via hardlinks (nlink count check)
Agent cannot traverse paths (canonicalization)
Agent cannot rebind DNS (IP pinning)
Agent cannot TOCTOU approval (args hash binding)

Remaining Attack Vectors (Out of Scope v1.0)

OS kernel compromise (requires trusted kernel)
Hardware attacks (TPM integration planned Q1 2026)
Side channel attacks (timing, speculation)
Physical access (server room security)
Social engineering (human approver decision)


Verification and Testing

Bypass Test Suite (15 Tests)

BYP-001: Path traversal (../../../etc/passwd)
BYP-002: Absolute path to protected file
BYP-003: Symlink to protected path
BYP-004: Network request to private IP
BYP-005: Network request to forbidden domain
BYP-006: Parallel executor access
BYP-007: Missing attestation execution
BYP-008: Invalid approval token
BYP-009: Network redirect to forbidden domain
BYP-010: DNS rebinding to private IP
BYP-011: Hardlink to protected file
BYP-012: Encoded path traversal (%2e%2e%2f)
BYP-013: Unknown tool execution
BYP-014: Nested args TOCTOU via key reordering
BYP-015: Sandbox write to protected path

Result: 15/15 tests pass (100% required for production)

Integration Tests

Platform Tests


Lessons Learned

From the Audit Process

  1. Iterative hardening works - Each pass found new vulnerability classes
  2. External review is essential - ChatGPT found issues we missed
  3. Fail-closed is non-negotiable - Every error must DENY
  4. OS-level enforcement matters - Permissions > hope
  5. Defense-in-depth is real - Multiple layers catch bypass attempts

For Future Implementations

  1. Start with threat model - Know your adversary
  2. Assume compromise - Design for hostile agent
  3. Cryptographically bind everything - Hashes prevent TOCTOU
  4. Test bypass vectors - If you can't test it, it's not secure
  5. Document limitations - Honest claims build trust

Acknowledgments

Security audit partner:

Implementation:

Development environment:

Human leadership:


References


AOS Constitutional Gate Threat Model v1.0
February 6, 2026
36 Vulnerabilities, 36 Fixes, Production-Approved

💙⚖️🛡️