# Brain v3 Stack Design — Final Architecture

**Created:** 2026-03-18  
**POC Target:** USP by VG first, then Oraigami  
**Goal:** Self-improving brain that learns to operate on local models

---

## Core Principle

Every task done by a frontier model becomes training data for local models. Over time, the brain learns which tasks can be offloaded to free local inference, reducing cost to near-zero while maintaining accuracy.

---

## Two-Layer LLM Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           FRONTIER LAYER                                     │
│  Claude Opus, GPT-5, Gemini Pro — expensive, high accuracy                  │
│  Used for: Main interface, memory management, work review, task validation  │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ spawns workflow generator
                                    ↓
┌─────────────────────────────────────────────────────────────────────────────┐
│                             OSS LAYER                                        │
│  Ollama: Qwen3, Llama3, Mistral, DeepSeek — free, local                     │
│  Used for: Task execution, learning iterations, batch processing            │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ learns optimal chains
                                    ↓
┌─────────────────────────────────────────────────────────────────────────────┐
│                          WORKFLOW MEMORY                                     │
│  Stores: task → frontier_output → oss_chain → accuracy_score → cost         │
│  Over time: brain learns which tasks can run 100% local                     │
└─────────────────────────────────────────────────────────────────────────────┘
```

---

## Component Stack

### 1. NULLCLAW (Zig)

**Primary brain interface — memory + dispatch**

```
┌─────────────────────────────────────────────────────────────────┐
│                         NULLCLAW                                 │
│                                                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Memory Ops  │  │ Agent       │  │ Workflow                │  │
│  │             │  │ Dispatch    │  │ Generator               │  │
│  │ - add       │  │             │  │                         │  │
│  │ - search    │  │ - frontier  │  │ - capture frontier task │  │
│  │ - decay     │  │ - oss       │  │ - iterate with oss      │  │
│  │ - reinforce │  │ - route     │  │ - measure accuracy      │  │
│  │ - link      │  │ - validate  │  │ - store optimal chain   │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│                                                                  │
│  Interfaces: MCP | HTTP API | Nextcloud Talk                    │
└─────────────────────────────────────────────────────────────────┘
```

**Responsibilities:**
- Memory operations (sectors, decay, reinforcement, linking)
- Agent dispatch (decide frontier vs OSS based on learned patterns)
- Workflow generator (spawn OSS iterations for every frontier task)
- Chat interface (Nextcloud Talk channel)
- MCP server for external tools

### 2. SurrealDB (Primary Storage)

**All persistent data — documents, graph, time-series, workflows**

```surql
-- Namespace and database
USE NS brain DB knowledge;

-- ============================================
-- MEMORY SECTORS (OpenMemory-inspired)
-- ============================================

DEFINE TABLE memories SCHEMAFULL;
DEFINE FIELD content ON memories TYPE string;
DEFINE FIELD sector ON memories TYPE string 
  ASSERT $value IN ['episodic', 'semantic', 'procedural', 'emotional', 'reflective'];
DEFINE FIELD embedding ON memories TYPE array<float>;
DEFINE FIELD salience ON memories TYPE float DEFAULT 1.0;
DEFINE FIELD valid_from ON memories TYPE datetime DEFAULT time::now();
DEFINE FIELD valid_until ON memories TYPE option<datetime>;
DEFINE FIELD access_count ON memories TYPE int DEFAULT 0;
DEFINE FIELD last_accessed ON memories TYPE datetime DEFAULT time::now();
DEFINE FIELD created_at ON memories TYPE datetime DEFAULT time::now();
DEFINE FIELD source ON memories TYPE string;
DEFINE FIELD metadata ON memories FLEXIBLE TYPE object;

DEFINE INDEX idx_memories_sector_salience ON memories COLUMNS sector, salience;
DEFINE INDEX idx_memories_valid ON memories COLUMNS valid_from, valid_until;

-- Decay edge (supersession tracking)
DEFINE TABLE supersedes SCHEMAFULL;
DEFINE FIELD in ON supersedes TYPE record<memories>;
DEFINE FIELD out ON supersedes TYPE record<memories>;
DEFINE FIELD reason ON supersedes TYPE string;
DEFINE FIELD timestamp ON supersedes TYPE datetime DEFAULT time::now();

-- ============================================
-- WORKFLOW LEARNING (distillation tracking)
-- ============================================

DEFINE TABLE workflow_executions SCHEMAFULL;
DEFINE FIELD task_hash ON workflow_executions TYPE string; -- hash of input
DEFINE FIELD task_description ON workflow_executions TYPE string;
DEFINE FIELD input_context ON workflow_executions TYPE string;
DEFINE FIELD created_at ON workflow_executions TYPE datetime DEFAULT time::now();

-- Frontier execution (gold standard)
DEFINE TABLE frontier_runs SCHEMAFULL;
DEFINE FIELD workflow ON frontier_runs TYPE record<workflow_executions>;
DEFINE FIELD model ON frontier_runs TYPE string; -- claude-opus-4, gpt-5, etc.
DEFINE FIELD output ON frontier_runs TYPE string;
DEFINE FIELD tokens_in ON frontier_runs TYPE int;
DEFINE FIELD tokens_out ON frontier_runs TYPE int;
DEFINE FIELD cost_usd ON frontier_runs TYPE float;
DEFINE FIELD duration_ms ON frontier_runs TYPE int;
DEFINE FIELD timestamp ON frontier_runs TYPE datetime DEFAULT time::now();

-- OSS iteration attempts
DEFINE TABLE oss_iterations SCHEMAFULL;
DEFINE FIELD workflow ON oss_iterations TYPE record<workflow_executions>;
DEFINE FIELD frontier_run ON oss_iterations TYPE record<frontier_runs>; -- reference gold
DEFINE FIELD iteration ON oss_iterations TYPE int;
DEFINE FIELD chain ON oss_iterations TYPE array; -- [{model, prompt_template, output}, ...]
DEFINE FIELD final_output ON oss_iterations TYPE string;
DEFINE FIELD accuracy_score ON oss_iterations TYPE float; -- 0-1 similarity to frontier
DEFINE FIELD total_steps ON oss_iterations TYPE int;
DEFINE FIELD total_tokens ON oss_iterations TYPE int;
DEFINE FIELD duration_ms ON oss_iterations TYPE int;
DEFINE FIELD timestamp ON oss_iterations TYPE datetime DEFAULT time::now();

DEFINE INDEX idx_oss_accuracy ON oss_iterations COLUMNS accuracy_score DESC;

-- Learned optimal chains (what OSS chain matches frontier for task type)
DEFINE TABLE learned_chains SCHEMAFULL;
DEFINE FIELD task_pattern ON learned_chains TYPE string; -- regex or embedding cluster
DEFINE FIELD optimal_chain ON learned_chains TYPE array; -- best OSS sequence
DEFINE FIELD avg_accuracy ON learned_chains TYPE float;
DEFINE FIELD sample_count ON learned_chains TYPE int;
DEFINE FIELD confidence ON learned_chains TYPE float; -- increases with sample_count
DEFINE FIELD can_skip_frontier ON learned_chains TYPE bool DEFAULT false;
DEFINE FIELD created_at ON learned_chains TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON learned_chains TYPE datetime DEFAULT time::now();

DEFINE INDEX idx_learned_confidence ON learned_chains COLUMNS confidence DESC;

-- ============================================
-- AGENT REGISTRY
-- ============================================

DEFINE TABLE agents SCHEMAFULL;
DEFINE FIELD name ON agents TYPE string;
DEFINE FIELD layer ON agents TYPE string ASSERT $value IN ['frontier', 'oss'];
DEFINE FIELD provider ON agents TYPE string; -- anthropic, openai, ollama
DEFINE FIELD model ON agents TYPE string;
DEFINE FIELD endpoint ON agents TYPE string;
DEFINE FIELD capabilities ON agents TYPE array<string>; -- code, chat, vision, etc.
DEFINE FIELD cost_per_1k_in ON agents TYPE float;
DEFINE FIELD cost_per_1k_out ON agents TYPE float;
DEFINE FIELD avg_latency_ms ON agents TYPE int;
DEFINE FIELD active ON agents TYPE bool DEFAULT true;

-- ============================================
-- DISPATCH QUEUE
-- ============================================

DEFINE TABLE dispatch_queue SCHEMAFULL;
DEFINE FIELD task ON dispatch_queue TYPE string;
DEFINE FIELD context ON dispatch_queue TYPE string;
DEFINE FIELD priority ON dispatch_queue TYPE int DEFAULT 5;
DEFINE FIELD target_layer ON dispatch_queue TYPE string; -- frontier, oss, auto
DEFINE FIELD target_agent ON dispatch_queue TYPE option<record<agents>>;
DEFINE FIELD status ON dispatch_queue TYPE string DEFAULT 'pending';
DEFINE FIELD created_at ON dispatch_queue TYPE datetime DEFAULT time::now();
DEFINE FIELD picked_at ON dispatch_queue TYPE option<datetime>;
DEFINE FIELD completed_at ON dispatch_queue TYPE option<datetime>;
DEFINE FIELD result ON dispatch_queue TYPE option<string>;

DEFINE INDEX idx_dispatch_status ON dispatch_queue COLUMNS status, priority DESC;

-- ============================================
-- DOCUMENTS (existing schema enhanced)
-- ============================================

DEFINE TABLE raw_documents SCHEMAFULL;
DEFINE FIELD source ON raw_documents TYPE string;
DEFINE FIELD content ON raw_documents TYPE string;
DEFINE FIELD metadata ON raw_documents FLEXIBLE TYPE object;
DEFINE FIELD created_at ON raw_documents TYPE datetime DEFAULT time::now();

DEFINE TABLE staged_documents SCHEMAFULL;
DEFINE FIELD raw_id ON staged_documents TYPE record<raw_documents>;
DEFINE FIELD chunks ON staged_documents TYPE array;
DEFINE FIELD doc_type ON staged_documents TYPE string;
DEFINE FIELD created_at ON staged_documents TYPE datetime DEFAULT time::now();

DEFINE TABLE indexed_documents SCHEMAFULL;
DEFINE FIELD staged_id ON indexed_documents TYPE record<staged_documents>;
DEFINE FIELD entities ON indexed_documents TYPE array;
DEFINE FIELD embedding ON indexed_documents TYPE array<float>;
DEFINE FIELD sector ON indexed_documents TYPE string DEFAULT 'semantic';
DEFINE FIELD salience ON indexed_documents TYPE float DEFAULT 1.0;
DEFINE FIELD created_at ON indexed_documents TYPE datetime DEFAULT time::now();
```
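The schema above tracks `salience`, `last_accessed`, and a sector per memory, but leaves the decay rule itself open. A minimal sketch of how the decay cron could update salience, assuming per-sector exponential half-lives (all constants here are illustrative, not part of the schema):

```python
from datetime import datetime, timedelta

# Hypothetical half-lives per sector: episodic memories fade fast,
# procedural knowledge persists. Values are illustrative only.
HALF_LIFE_DAYS = {
    "episodic": 7.0,
    "semantic": 90.0,
    "procedural": 180.0,
    "emotional": 30.0,
    "reflective": 60.0,
}

def decayed_salience(salience: float, sector: str,
                     last_accessed: datetime, now: datetime) -> float:
    """Exponentially decay the salience field based on time since last access."""
    half_life = HALF_LIFE_DAYS.get(sector, 30.0)
    elapsed_days = (now - last_accessed).total_seconds() / 86400
    return salience * 0.5 ** (elapsed_days / half_life)

def reinforced_salience(salience: float, boost: float = 0.2,
                        cap: float = 2.0) -> float:
    """Bump salience on access, capped so hot memories don't grow unbounded."""
    return min(salience + boost, cap)
```

The cron would read `salience` and `last_accessed`, write back the decayed value, and record a `supersedes` edge when a memory drops below a retirement floor.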

### 3. DragonflyDB (Cache + Queue)

**Hot data only — working memory, task dispatch**

```
Key Patterns:
- working_memory:{session_id}    → current conversation context (TTL 30m)
- dispatch:{task_id}             → task awaiting pickup (list)
- result:{task_id}               → completed task result (TTL 1h)
- embedding_cache:{hash}         → cached embeddings (TTL 24h)
- agent_health:{agent_name}      → last heartbeat (TTL 5m)
```
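DragonflyDB speaks the Redis protocol, so any standard Redis client works against it. A minimal sketch of the key and TTL conventions above (the port and client library in the usage comment are assumptions):

```python
# TTLs in seconds, matching the key-pattern table above
TTL_SECONDS = {
    "working_memory": 30 * 60,      # current conversation context
    "result": 60 * 60,              # completed task result
    "embedding_cache": 24 * 3600,   # cached embeddings
    "agent_health": 5 * 60,         # last heartbeat
}

def cache_key(kind: str, ident: str) -> str:
    """Build a key following the patterns above, e.g. cache_key('result', 't1')."""
    return f"{kind}:{ident}"

# Usage with redis-py (host from the stack, port assumed to be the Redis default):
#   r = redis.Redis(host="10.11.12.105", port=6379)
#   r.set(cache_key("result", task_id), payload, ex=TTL_SECONDS["result"])
```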

### 4. Ollama (OSS Models)

**Local inference endpoint — 10.11.12.105:11434**

Available models (current):
- qwen3:8b (code, general)
- llama3:8b (general)
- mistral:7b (fast general)
- deepseek-coder:6.7b (code)
- nomic-embed-text (embeddings, 768-dim)

To add for workflow learning:
- qwen3:32b (higher accuracy)
- codestral:22b (code specialist)
- mixtral:8x7b (MoE, diverse)

---

## Workflow Learning Loop

```
┌──────────────────────────────────────────────────────────────────────────┐
│                      FRONTIER TASK EXECUTION                              │
│                                                                           │
│  1. User request → Nullclaw                                              │
│  2. Nullclaw checks learned_chains: can this skip frontier?              │
│     - If confidence > 0.95 and accuracy > 0.90 → use OSS directly        │
│     - Else → dispatch to frontier                                        │
│  3. Frontier executes, returns output                                    │
│  4. Store in frontier_runs with cost/tokens/duration                     │
└──────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ↓ (async, non-blocking)
┌──────────────────────────────────────────────────────────────────────────┐
│                      OSS LEARNING ITERATIONS                              │
│                                                                           │
│  For each frontier_run where learned_chain confidence < 0.95:            │
│                                                                           │
│  ITERATION 1: Single model, direct prompt                                │
│    - Try qwen3:8b with same prompt                                       │
│    - Measure accuracy vs frontier output (embedding sim + LLM judge)     │
│    - If accuracy > 0.90 → found optimal, store and stop                  │
│                                                                           │
│  ITERATION 2: Single model, enhanced prompt                              │
│    - Add chain-of-thought, examples from similar tasks                   │
│    - Measure accuracy                                                     │
│                                                                           │
│  ITERATION 3: Two-model chain                                            │
│    - Model A: decompose task into steps                                  │
│    - Model B: execute each step                                          │
│    - Measure accuracy                                                     │
│                                                                           │
│  ITERATION N: Multi-model chain with verification                        │
│    - Model A: plan, Model B: execute, Model C: verify                    │
│    - Measure accuracy                                                     │
│                                                                           │
│  Store best chain in learned_chains with accuracy + confidence           │
└──────────────────────────────────────────────────────────────────────────┘
```
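The iteration ladder above can be sketched as a runner that escalates through strategies until one hits the accuracy target. The strategy callables and the `measure` function are assumed interfaces; the real runner would invoke Ollama chains:

```python
ACCURACY_TARGET = 0.90  # from the loop above: "If accuracy > 0.90 → found optimal"

def find_optimal_chain(strategies, frontier_output, measure):
    """Escalate through OSS strategies; stop at the first to beat the target.

    Each strategy is a zero-arg callable returning (chain_description, output);
    `measure` scores an OSS output against the frontier gold standard (0-1).
    Returns (best_chain, best_score): the first to hit the target, else best seen.
    """
    best_chain, best_score = None, 0.0
    for run in strategies:
        chain, output = run()
        score = measure(frontier_output, output)
        if score > best_score:
            best_chain, best_score = chain, score
        if score > ACCURACY_TARGET:
            break  # found optimal: store and stop, per the iteration rule
    return best_chain, best_score
```

The winning `(chain, score)` pair is what gets written to `learned_chains`, with the score folded into `avg_accuracy`.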

### Accuracy Measurement

```python
import re

def measure_accuracy(frontier_output: str, oss_output: str, task_type: str) -> float:
    # 1. Embedding similarity (fast, cheap)
    frontier_emb = embed(frontier_output)
    oss_emb = embed(oss_output)
    embedding_sim = cosine_similarity(frontier_emb, oss_emb)

    # 2. LLM judge (for nuanced comparison) — use small local model
    judge_prompt = f"""
    Compare these two outputs for the task: {task_type}

    Reference (gold standard):
    {frontier_output}

    Candidate:
    {oss_output}

    Rate accuracy from 0 to 100. Reply with a single integer.
    """
    # The judge returns text, not a number: extract the first integer and clamp
    judge_response = ollama.generate("mistral:7b", judge_prompt)
    match = re.search(r"\d+", judge_response)
    judge_score = min(int(match.group()), 100) / 100 if match else 0.0

    # 3. Structural match (for code: AST comparison, for JSON: schema match)
    structural_score = compare_structure(frontier_output, oss_output, task_type)

    # Weighted average
    return embedding_sim * 0.3 + judge_score * 0.5 + structural_score * 0.2
```

---

## Dispatch Routing Logic

```zig
// Nullclaw dispatch decision
fn routeTask(task: Task) DispatchTarget {
    // 1. Check if we have a learned chain for this task pattern
    //    (patterns are embedding clusters per Design Decision 4; task.pattern
    //    is the cluster id the task's embedding was assigned to)
    const learned = db.query(
        "SELECT * FROM learned_chains WHERE task_pattern = $pattern AND confidence > 0.90",
        .{task.pattern}
    );
    
    if (learned) |chain| {
        if (chain.can_skip_frontier and chain.avg_accuracy > 0.90) {
            // We've learned this! Use OSS directly
            return .{ .layer = .oss, .chain = chain.optimal_chain };
        }
    }
    
    // 2. Check task complexity/risk
    if (task.is_destructive or task.requires_review) {
        // Always use frontier for risky operations
        return .{ .layer = .frontier, .spawn_learning = true };
    }
    
    // 3. Default: frontier with learning
    return .{ .layer = .frontier, .spawn_learning = true };
}
```

---

## POC Phases

### Phase 1: Core Stack (Week 1)

**Goal:** SurrealDB schema + Nullclaw skeleton + basic dispatch

- [ ] Deploy SurrealDB schema above (extend existing)
- [ ] Nullclaw HTTP server (Zig, basic routes)
- [ ] Memory operations: add, search, decay cron
- [ ] Frontier dispatch: Claude CLI wrapper
- [ ] OSS dispatch: Ollama wrapper
- [ ] Task queue in DragonflyDB

### Phase 2: Workflow Learning (Week 2)

**Goal:** Capture frontier runs, iterate with OSS

- [ ] Workflow capture: hash task, store frontier output
- [ ] Iteration runner: async process tries OSS chains
- [ ] Accuracy measurement: embedding + judge + structure
- [ ] Learned chain storage with confidence scoring
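The schema comments say `confidence` "increases with sample_count" but give no formula. One plausible update, purely illustrative: bound confidence by observed accuracy, discounted by a small-sample penalty that vanishes as samples accumulate.

```python
def updated_confidence(avg_accuracy: float, sample_count: int) -> float:
    """Confidence bounded by avg_accuracy, discounted for small samples.
    The constant 5 is an arbitrary smoothing term, not from the schema."""
    return avg_accuracy * (sample_count / (sample_count + 5))
```

Under this form a chain needs near-perfect accuracy and roughly 100 samples before it can clear a 0.95 threshold, which matches the "start strict" intent of Design Decision 3.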

### Phase 3: Smart Routing (Week 3)

**Goal:** Brain starts routing tasks to OSS when confident

- [ ] Pattern matching for task types
- [ ] Confidence threshold tuning
- [ ] Metrics dashboard: frontier vs OSS usage, cost savings
- [ ] Fallback: if OSS fails validation, retry with frontier
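The fallback rule above can be sketched as a thin wrapper around the two layers. All callables (`run_oss`, `run_frontier`, `validate`) are assumed interfaces, not existing functions:

```python
def execute_with_fallback(task, run_oss, run_frontier, validate):
    """Try the learned OSS chain first; if its output fails validation,
    retry with frontier (the Phase 3 fallback rule). Returns (output, layer)."""
    output = run_oss(task)
    if validate(output):
        return output, "oss"
    return run_frontier(task), "frontier"
```

A fallback event would also feed back into `learned_chains`, lowering that pattern's confidence so routing self-corrects.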

### Phase 4: USP Ingestion (Week 3-4)

**Goal:** Feed brain with USP knowledge

- [ ] Google Drive connector (docs, sheets)
- [ ] GitLab connector (repos, issues, MRs)
- [ ] Vikunja connector (tasks, projects)
- [ ] Nextcloud connector (files, talk history)
- [ ] Sector classification at ingest
- [ ] Decay + salience scoring

### Phase 5: Oraigami Expansion (Week 4+)

**Goal:** Apply learned patterns to Oraigami

- [ ] Oraigami data sources (Gitea, Drive, Vikunja)
- [ ] Namespace isolation (brain.usp vs brain.oraigami)
- [ ] Cross-namespace learning (if USP learned a pattern, Oraigami can use it)

---

## Success Metrics

| Metric | Week 1 | Week 4 | Long-term |
|--------|--------|--------|-----------|
| Frontier API cost/day | $X (baseline) | -30% | -80% |
| Tasks routed to OSS | 0% | 20% | 70%+ |
| OSS accuracy (avg) | N/A | 85% | 92%+ |
| Learned chains | 0 | 50 | 500+ |
| Memory decay working | No | Yes | Tuned |

---

## Resource Requirements

| Component | Memory | CPU | Disk | Location |
|-----------|--------|-----|------|----------|
| SurrealDB | 2GB | 2 cores | 20GB | 10.11.12.105 |
| DragonflyDB | 1GB | 1 core | - | 10.11.12.105 |
| Ollama | 16GB | 4 cores | 50GB | 10.11.12.105 |
| Nullclaw | 256MB | 1 core | - | 10.11.12.105 |

**Total on 105:** ~20GB RAM, 8 cores (VEP1485 has 64GB, plenty of headroom)

---

## Decommission Plan

After POC validation:

1. **PostgreSQL** → migrate semantic/episodic data to SurrealDB memories table
2. **Neo4j** → graph relationships now in SurrealDB record links
3. **pgvector** → embeddings in SurrealDB array fields
4. **brain-nanoclaw (TypeScript)** → replaced by Nullclaw (Zig)

Keep DragonflyDB (it's good at what it does).

---

## Design Decisions (Finalized 2026-03-18)

1. **Embedding model:** 768-dim via Ollama nomic-embed-text
   - Accuracy over compression
   - 384-dim was for test builds, no longer needed
   - All PostgreSQL/TimescaleDB data reingested to SurrealDB with 768-dim

2. **Accuracy judge model:** Qwen 3.5 (4B or 8B quantized) on Ollama
   - Use latest available, don't waste time on older models
   - Run quantized for speed without sacrificing too much accuracy

3. **Confidence threshold:** Start strict at 0.95
   - Bootstrap Nullclaw with existing memory: SOUL.md, TOOLS.md, MEMORY.md, AGENTS.md
   - Let the learning loop iterate on prompts until confidence is high before committing to a seeding strategy
   - Tune down only after patterns validated

4. **Task pattern matching:** Embedding clusters (NOT regex)
   - Group similar embeddings, propagate updates when source of truth changes
   - Regex was a failure in brain v2 — couldn't handle lists, multi-entity queries
   - Clustering allows graceful decay of old similar data when new truth arrives
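A minimal sketch of cluster assignment, assuming cosine similarity against stored centroids (the threshold value and centroid store are illustrative assumptions):

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def assign_cluster(embedding, centroids, threshold=0.85):
    """Return the id of the nearest centroid at or above the threshold,
    or None (meaning: the task seeds a new cluster)."""
    best_id, best_sim = None, threshold
    for cid, centroid in centroids.items():
        sim = cosine(embedding, centroid)
        if sim >= best_sim:
            best_id, best_sim = cid, sim
    return best_id
```

A task that lands in an existing cluster inherits that cluster's `learned_chains` row; an unmatched task seeds a new cluster and goes to frontier by default.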

---

## Bootstrap Strategy

Nullclaw initial knowledge seeded from:
- `SOUL.md` — personality, purpose, boundaries
- `TOOLS.md` — infrastructure details, credentials, endpoints
- `MEMORY.md` — curated long-term knowledge
- `AGENTS.md` — operational patterns, safety rules
- `memory/*.md` — recent context files

This becomes the foundation. Frontier interactions then expand and refine.

---

## Next Steps

1. ✅ Finalize this document
2. ⏳ Varij review on WhatsApp
3. ⏳ Create Vikunja tasks at tasks.usp.vg
4. ⏳ Begin Phase 1: Core stack deployment

---

*"The goal is a brain that runs for free on local hardware, having learned everything from frontier models."*
