Initial commit: homelab infrastructure wiki

- Full Obsidian vault content - Host configs (ice, grizzley, ubuntu, proxmox, truenas, panda, hyte) - Media stack documentation - Traefik HA setup - Automation scripts - Bachelor party planning
2026-05-24 16:08:40 -07:00
parent d132442429
commit e4d91aadf9
285 changed files with 30018 additions and 0 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,33 @@
+# AGENTS.md
+
+## OVERVIEW
+Obsidian vault for homelab infrastructure documentation using Dataview and Tasks plugins.
+
+## STRUCTURE
+- `homelab/` - Infrastructure architecture, host configs, runbooks
+- `ai-assistant/` - OpenCode agent configuration and workflows
+- `automation/` - Script documentation and deployment workflows
+- `platform-config/` - Docker, Traefik, and container orchestration
+- `tools/` - Tool guides and documentation
+- `Templates/` - Project and task templates
+- `Dashboard/` - Project status tracking
+
+## WHERE TO LOOK
+- Start: `vault-readme.md` for vault overview
+- Infrastructure: `homelab/architecture.md` for full architecture
+- Tasks: Use `project.md` in each subdirectory with Dataview queries
+- Templates: Reference `Templates/project-template.md` for structure
+
+## CONVENTIONS
+- **YAML Frontmatter**: Use `project:` metadata in all project.md files
+- **Links**: Standard Obsidian wiki-links: `[[filename.md|Display Text]]`
+- **Dataview**: Embed queries in ```dataview blocks for task tracking
+- **Tasks**: Use `[ ]` for incomplete, `[x]` for completed
+- **Dates**: ISO 8601 format in YAML
+
+## ANTI-PATTERNS
+- DON'T create tasks without Dataview query in project.md
+- DON'T use hardcoded file paths in links - use Obsidian wiki-links
+- DON'T duplicate templates - inherit from Templates/ directory
+- DON'T mix project content in vault root - use subdirectories
+- DON'T store credentials, API tokens, or secrets in vault files
--- a/Dashboard/agents-dashboard.md
+++ b/Dashboard/agents-dashboard.md
@@ -0,0 +1,74 @@
+---
+title: Agent Memory Dashboard
+type: dashboard
+updated: 2026-04-27
+---
+
+# Agent Memory Dashboard
+
+Cross-agent memory syncing via RustFS S3 → Obsidian vault. See [[vault-readme|Vault Overview]] for architecture.
+
+## Per-Agent Current Tasks
+
+### Ice (RPi4 — Control Plane)
+```dataview
+LIST file.ctime, file.mtime
+FROM "agents/ice/memory"
+SORT file.mtime DESC
+LIMIT 5
+```
+
+### Grizzley (RPi5 — Edge)
+```dataview
+LIST file.ctime, file.mtime
+FROM "agents/grizzley/memory"
+SORT file.mtime DESC
+LIMIT 5
+```
+
+### Ubuntu (Docker Host)
+```dataview
+LIST file.ctime, file.mtime
+FROM "agents/ubuntu/memory"
+SORT file.mtime DESC
+LIMIT 5
+```
+
+## Recent Facts (All Agents)
+```dataview
+LIST row.agent, row.content
+FROM "agents"
+WHERE file.starred = false
+SORT file.mtime DESC
+LIMIT 20
+```
+
+## All Agent Notes
+```dataview
+LIST file.link, file.mtime
+FROM "agents"
+SORT file.mtime DESC
+```
+
+## Daily Notes (Last 7 Days)
+```dataview
+LIST file.link
+FROM "daily"
+WHERE file.day >= date(today) - dur(7 days)
+SORT file.day DESC
+```
+
+## Shared Entities
+```dataview
+LIST file.link, row.category
+FROM "entities"
+WHERE row.trust_score >= 0.3
+SORT row.trust_score DESC
+LIMIT 20
+```
+
+## Quick Links
+- [[agents/ice/memory/current-task|Ice — Current Task]]
+- [[agents/grizzley/memory/current-task|Grizzley — Current Task]]
+- [[agents/ubuntu/memory/current-task|Ubuntu — Current Task]]
+- [[agents/ice/memory/recent-facts|Ice — Recent Facts]]
--- a/Dashboard/project-status.md
+++ b/Dashboard/project-status.md
@@ -0,0 +1,40 @@
+# Project Status Dashboard
+
+## All Projects
+```dataview
+TABLE project.status AS Status, 
+      project.category AS Category,
+      file.cday AS Created
+FROM "**/project.md"
+SORT project.status ASC, project.category ASC
+```
+
+## Active Tasks
+```dataview
+TASK
+FROM "**/tasks"
+WHERE !completed
+SORT project ASC, file.name ASC
+```
+
+## Recently Modified
+```dataview
+TABLE file.link AS File, file.mtime AS Modified
+FROM ""
+WHERE file.mtime >= date(today) - dur(7 days) AND !contains(file.path, "node_modules")
+SORT file.mtime DESC
+LIMIT 20
+```
+
+## Tasks by Project
+```dataview
+TABLE rows.file.tasks.length AS TotalTasks
+FROM "**/project.md"
+GROUP BY project.name AS Project
+```
+
+## Quick Links
+- [[../homelab/project.md|Homelab Infrastructure]]
+- [[../automation/project.md|Automation Scripts]]
+- [[../platform-config/project.md|Platform Configuration]]
+- [[../ai-assistant/project.md|AI Assistant Config]]
--- a/Plan.md
+++ b/Plan.md
@@ -0,0 +1,193 @@
+# IoT Device Reorganization Plan
+
+**Created:** 2026-04-20
+**Status:** Draft — requires review before execution
+
+## Problem Statement
+
+1. **Duplicate naming** — Three `Aqara Light Switch H2 US` devices with no way to tell which is which. Light entities numbered arbitrarily (`light.aqara_light_switch_h2_us_2` through `_16`) with no relation to physical location.
+2. **Relay vs. physical switch confusion** — Aqara H2 switches have both relay outputs AND physical button inputs. The ceiling light in the Baby Room is controlled by an Aqara H2 switch relay, but the physical wall switch also has a button event. These are mixed together.
+3. **Ecosystem overlap** — The same physical lights are controlled by multiple systems:
+   - Shelly relays (Bedroom/Office ceiling lights)
+   - Aqara H2 switches (Baby Room, Front Door, Entrance)
+   - TP-Link smart plugs (Left Lamp, Right Lamp, Tall Lamp)
+   - Govee LED strips (H6076, H60A4, H60A1)
+   - Aqara ceiling light fixture (Colorful Ceiling Light 36W)
+4. **Entity suffix chaos** — Matter re-commissioning appends `_2`, `_3`, etc. to entity IDs, making them meaningless. Duplicate entities (e.g., `lock.aqara_smart_lock_u100` AND `lock.aqara_smart_lock_u100_2`) suggest double-commissioning.
+
+---
+
+## Current Inventory (44 devices, 339 entities)
+
+### Lighting — 12 devices, HEAVILY OVERLAPPED
+
+| Location | Physical Device | Integration | Entity | Issue |
+|----------|----------------|-------------|--------|-------|
+| **Bedroom** | Ceiling light (wired to Shelly 1PM) | Shelly | `switch.shelly1pmg4_a085e3bb2898` | Good name, but entity ID is MAC-based |
+| **Bedroom** | Govee LED strip | Govee | `light.h60a1` | Named "Ceiling Light" — conflicts with actual ceiling light |
+| **Bedroom** | Left Lamp (TP-Link HS103 plug) | TP-Link | `switch.left_lamp` | Good name |
+| **Bedroom** | Right Lamp (TP-Link HS103 plug) | TP-Link | `switch.right_lamp` | Good name |
+| **Office** | Ceiling light (wired to Shelly 1PM) | Shelly | `switch.shelly1pmg4_a085e3b7fc74` | MAC-based entity ID |
+| **Office** | Grizzley Pi Power (TP-Link HS103 plug) | TP-Link | `switch.bug_zapper` | Controls grizzley host power — entity name misleading |
+| **Living Room** | Tall Lamp (TP-Link KP115 plug) | TP-Link | `switch.tall_lamp` | Good name |
+| **Living Room** | Govee H6076 strip #1 | Govee | `light.h6076` | No friendly name |
+| **Living Room** | Govee H6076 strip #2 | Govee | `light.h6076_2` | Duplicate model, no distinction |
+| **Living Room** | Govee H60A4 strip | Govee | `light.h60a4` | No friendly name |
+| **Baby Room** | Aqara H2 switch (dual relay) | Matter | `light.aqara_light_switch_h2_us`, `_2` | Which relay is main light vs ring light? |
+| **Baby Room** | Aqara Ceiling Light 36W (fixture) | Matter | `light.colorful_ceiling_light_36w`, `_2`, `_3`, `_4` | 4 light entities for 1 fixture (main + ring?) — needs clarification |
+
+### Aqara Switches (H2 US) — 3 IDENTICAL DEVICE NAMES
+
+| Area | Device | Relays | Physical Buttons | Entity Count |
+|------|--------|--------|-----------------|--------------|
+| **Baby Room** | Aqara Light Switch H2 US | 2 relays: `light._us`, `light._us_2` | 2 button events | 24 entities |
+| **Front Door** | Aqara Light Switch H2 US | 2 relays: `light._us_6` through `_us_8`? | Multiple buttons | 36 entities |
+| **Entrance** | Aqara Light Switch H2 US | 2 relays: `light._us_3` through `_us_5`? | Multiple buttons | 36 entities |
+
+**Key confusion:** Each H2 has 2 physical buttons + 2 relays, but the entity numbering (`_us_3`, `_us_6`, `_us_7`, etc.) doesn't map to which physical button controls which relay. After re-commissioning, these will all reset and need clear naming.
+
+### Sensors & Locks
+
+| Location | Device | Integration | Entity | Status |
+|----------|--------|-------------|--------|--------|
+| **Living Room** | Aqara Motion Sensor P1 | Matter | `binary_sensor.aqara_motion_sensor_p1_occupancy` | Good |
+| **Rooftop Door** | Aqara Door/Window Sensor | Matter | `binary_sensor.*_door` | Good |
+| **Rooftop Door** | Aqara Vibration Sensor T1 | Matter | `binary_sensor.*_occupancy` | Good |
+| **Front Door** | Aqara Smart Lock U100 | Matter | `lock.aqara_smart_lock_u100` | Has duplicate `_2` entities |
+| **Front Door** | Aqara Doorbell G410 | Matter | `button.*_identify` only | Limited Matter support? |
+| **Garage** | Aqara Camera Hub G3 | Matter | `button.*_identify` only | Limited Matter support |
+| **Entrance** | IKEA MYGGBETT door sensor | Matter | `binary_sensor.*_door` | Good |
+| **Entrance** | IKEA MYGGSPRAY motion sensor | Matter | `binary_sensor.*_occupancy` | Good |
+| **Garage** | IKEA MYGGBETT door sensor | Matter | `binary_sensor.*_door_2` | Suffix `_2` from re-commissioning |
+| **Laundry** | IKEA KLIPPBOK water leak | Matter | `binary_sensor.*_water_leak` | Good |
+| **Living Room** | IKEA ALPSTUGA air quality | Matter | `sensor.*_co2`, `*_pm25`, etc. | Good |
+| **Office** | IKEA TIMMERFLOTTE temp/humidity | Matter | `sensor.*_temperature`, `*_humidity` | Good |
+
+---
+
+## Reorganization Plan
+
+### Phase 1: Clean Slate — Re-commission Matter (DO THIS FIRST)
+
+Since we already removed the Matter integration:
+
+1. **Add Matter integration** in HA Settings > Devices & Services
+2. **Commission Aqara M3 hub first** — it bridges all Zigbee children
+3. **Commission IKEA devices** one at a time
+4. **Rename EVERY device immediately after commissioning** using this convention:
+
+### Naming Convention
+
+**Format:** `{Room} {Function}`
+
+| Old Name | New Device Name | Primary Entity |
+|----------|----------------|---------------|
+| Aqara Light Switch H2 US (Baby Room) | Baby Room Light Switch | `light.baby_room_light`, `light.baby_room_ring_light` |
+| Aqara Light Switch H2 US (Front Door) | Front Door Light Switch | `light.front_door_porch_light`, `light.front_door_hall_light` |
+| Aqara Light Switch H2 US (Entrance) | Entrance Light Switch | `light.entrance_hall_light`, `light.entrance_outside_light` |
+| Colorful Ceiling Light 36W | Baby Room Ceiling Light | `light.baby_room_ceiling_main`, `light.baby_room_ceiling_ring` |
+| Aqara Door and Window Sensor | Rooftop Door Sensor | `binary_sensor.rooftop_door` |
+| Aqara Vibration Sensor T1 | Rooftop Vibration Sensor | `binary_sensor.rooftop_vibration` |
+| Aqara Motion Sensor P1 | Living Room Motion Sensor | `binary_sensor.living_room_motion` |
+| Aqara Smart Lock U100 | Front Door Lock | `lock.front_door` |
+| Aqara Smart Video Doorbell G410 | Front Door Doorbell | (limited Matter support) |
+| Aqara Camera Hub G3 | Garage Camera Hub | (limited Matter support) |
+| ALPSTUGA air quality monitor | Living Room Air Quality | `sensor.living_room_co2`, etc. |
+| MYGGBETT door/window sensor (Entrance) | Entrance Door Sensor | `binary_sensor.entrance_door` |
+| MYGGBETT door/window sensor (Garage) | Garage Door Sensor | `binary_sensor.garage_door` |
+| MYGGSPRAY motion sensor | Entrance Motion Sensor | `binary_sensor.entrance_motion` |
+| KLIPPBOK water leak sensor | Laundry Leak Sensor | `binary_sensor.laundry_water_leak` |
+| TIMMERFLOTTE temp/hmd sensor | Office Climate Sensor | `sensor.office_temperature`, `sensor.office_humidity` |
+
+### Phase 2: Fix Non-Matter Devices
+
+| Device | Action |
+|--------|--------|
+| **Shelly Bedroom Ceiling Relay** | Rename device to "Bedroom Ceiling Light", rename entity to `switch.bedroom_ceiling_light` |
+| **Shelly Office Ceiling Relay** | Rename device to "Office Ceiling Light", rename entity to `switch.office_ceiling_light` |
+| **Govee "Ceiling Light" (H60A1)** | Rename to "Bedroom LED Strip", entity to `light.bedroom_led_strip` |
+| **Govee H6076 #1** | Rename to "Living Room TV Backlight", entity to `light.living_room_tv_backlight` |
+| **Govee H6076 #2** | Rename to "Living Room Shelf Light", entity to `light.living_room_shelf_light` |
+| **Govee H60A4** | Rename to "Living Room Ambient Strip", entity to `light.living_room_ambient_strip` |
+| **TP-Link "Grizzley Pi"** | Rename entity from `switch.bug_zapper` to `switch.grizzley_power`, rename device to "Grizzley Host Power" — this is the remote power control for the grizzley Pi (only non-PoE Pi) |
+| **LG TV (cast)** | Merge with webostv device if possible, or accept duplicate |
+
+### Phase 3: Clarify Relay ↔ Physical Switch Mapping
+
+For each Aqara H2 switch, document the physical wiring:
+
+```
+Baby Room H2 Switch:
+  Physical Button 1 (left) → Relay 1 → [what does it control?]
+  Physical Button 2 (right) → Relay 2 → [what does it control?]
+
+Front Door H2 Switch:
+  Physical Button 1 → Relay 1 → [porch light?]
+  Physical Button 2 → Relay 2 → [hall light?]
+
+Entrance H2 Switch:
+  Physical Button 1 → Relay 1 → [entrance hall?]
+  Physical Button 2 → Relay 2 → [outside light?]
+```
+
+**Action:** After re-commissioning, physically press each button and observe which relay toggles. Then name accordingly.
+
+### Phase 4: Clean Up Duplicate/Misleading Entities
+
+- Delete duplicate Matter entities (the `_2` suffixed ones from double-commissioning)
+- Disable `button.*_identify` entities (not useful for daily use)
+- Disable `sensor.*_battery_type` and `sensor.*_battery_voltage` (keep only `*_battery` percentage)
+- Disable `number.*_on_level` and `select.*_power_on_behavior` (advanced settings)
+- Keep: primary light/switch/lock/sensor entities + battery percentage + firmware update
+
+---
+
+## Device → Integration → Ecosystem Map
+
+### Lighting Control Paths (identify conflicts)
+
+```
+Bedroom Ceiling ──→ Shelly 1PM (Wi-Fi) ──→ HA Shelly integration
+Bedroom Lamps  ──→ TP-Link HS103 (Wi-Fi) ──→ HA TP-Link integration
+Bedroom LEDs   ──→ Govee H60A1 (BLE/Wi-Fi) ──→ HA Govee integration
+
+Baby Room Main ──→ Aqara H2 Relay 1 (Zigbee→M3→Matter) ──→ HA Matter
+Baby Room Ring ──→ Aqara H2 Relay 2 (Zigbee→M3→Matter) ──→ HA Matter
+Baby Room Fixture ──→ Aqara Ceiling Light 36W (Zigbee→M3→Matter) ──→ HA Matter
+
+Living Room Tall Lamp ──→ TP-Link KP115 (Wi-Fi) ──→ HA TP-Link
+Living Room Strips  ──→ Govee (BLE/Wi-Fi) ──→ HA Govee
+
+Office Ceiling ──→ Shelly 1PM (Wi-Fi) ──→ HA Shelly
+```
+
+**No actual conflicts** — each physical light has ONE control path. The "overlap" is naming confusion, not actual duplicate control.
+
+### Infrastructure Controls (not consumer IoT)
+
+| Device | Integration | Purpose | Current Name |
+|--------|-------------|---------|-------------|
+| TP-Link HS103 plug | TP-Link | Remote power for grizzley Pi (only non-PoE host) | `switch.bug_zapper` |
+
+**Action:** Rename to `switch.grizzley_host_power`. Consider adding to a "Infrastructure" area in HA and protecting with a confirmation prompt to prevent accidental shutdown.
+
+---
+
+## Execution Checklist
+
+- [ ] **Phase 1:** Re-commission Matter devices with proper names
+  - [ ] Add Matter integration in HA
+  - [ ] Commission Aqara M3 hub (name: "Bedroom Hub M3")
+  - [ ] Commission each IKEA sensor with location-based name
+  - [ ] Rename all devices immediately upon discovery
+- [ ] **Phase 2:** Rename non-Matter devices in HA
+  - [ ] Shelly relays → descriptive names
+  - [ ] Govee lights → room + function names
+  - [ ] TP-Link "Grizzley Pi" → "Bug Zapper"
+- [ ] **Phase 3:** Map Aqara H2 physical buttons → relays
+  - [ ] Baby Room switch
+  - [ ] Front Door switch
+  - [ ] Entrance switch
+- [ ] **Phase 4:** Disable noisy entities (identify buttons, battery voltage, power-on behavior)
+- [ ] **Phase 5:** Update HA automations with new entity IDs
+- [ ] **Phase 6:** Update Alexa routines if entity names changed
--- a/Templates/project-template.md
+++ b/Templates/project-template.md
@@ -0,0 +1,47 @@
+---
+project:
+  name: ""
+  status: planning|active|completed|archived
+  category: infrastructure|application|automation|configuration
+  source: ""
+  created: 2026-01-06
+  updated: 2026-01-06
+  description: ""
+  goals: []
+  priority: high|medium|low
+  tags: []
+---
+
+# Project:
+
+## Overview
+
+## Goals
+-
+
+## Components
+
+### Key Files
+
+### Documentation
+
+## Related Projects
+
+## Tasks
+```dataview
+TASK
+FROM "<project-folder>/tasks"
+WHERE !completed
+SORT priority ASC, file.name ASC
+```
+
+## Recent Changes
+```dataview
+TABLE file.mtime AS Modified, file.link AS File
+FROM "<project-folder>/"
+WHERE file.mtime >= date(today) - dur(7 days)
+SORT file.mtime DESC
+LIMIT 10
+```
+
+## Notes
--- a/Templates/script-template.md
+++ b/Templates/script-template.md
@@ -0,0 +1,35 @@
+---
+script:
+  name: ""
+  type: shell|python|automation
+  path: ""
+  purpose: ""
+  usage: ""
+  requires: []
+  created: 2026-01-06
+  updated: 2026-01-06
+---
+
+# Script:
+
+## Purpose
+
+## Usage
+
+```bash
+# Basic usage
+./script.sh
+
+# With arguments
+./script.sh --option value
+```
+
+## Requirements
+
+## Examples
+
+## Configuration
+
+## Notes
+
+## Related
--- a/Templates/service-template.md
+++ b/Templates/service-template.md
@@ -0,0 +1,44 @@
+---
+service:
+  name: ""
+  type: docker|vm|host
+  url: ""
+  category: media|infrastructure|development|storage|identity|automation
+  status: active|inactive|maintenance
+  docker_image: ""
+  port: ""
+  nfs_mount: ""
+  depends_on: []
+  created: 2026-01-06
+  updated: 2026-01-06
+---
+
+# Service:
+
+## Overview
+
+## Configuration
+
+### Docker Compose
+```yaml
+
+```
+
+### Environment Variables
+```yaml
+
+```
+
+## Dependencies
+
+## Health Checks
+
+## Maintenance
+
+### Backup
+
+### Update Procedure
+
+## Troubleshooting
+
+## Related
--- a/Templates/task-template.md
+++ b/Templates/task-template.md
@@ -0,0 +1,28 @@
+---
+task:
+  project: 
+  status: pending|in-progress|completed|blocked
+  priority: high|medium|low
+  assignee: 
+  created: 
+  due: 
+---
+
+# Task: 
+
+## Description
+
+## Requirements
+
+## Implementation Notes
+
+## Checklist
+- [ ] Step 1
+- [ ] Step 2
+- [ ] Step 3
+
+## Notes
+
+## Related
+- Related Task:
+- Related File:
--- a/ai-assistant/host-context.md
+++ b/ai-assistant/host-context.md
@@ -0,0 +1,44 @@
+# Host Context Detection
+
+## Overview
+
+Detects which host's filesystem this repository clone represents, enabling AI agents to understand their operational context.
+
+## Quick Reference
+
+| Host | IP | Context | Agent | Port |
+|------|-----|---------|-------|------|
+| **ubuntu** | 192.168.50.61 | ubuntu | OpenCode | 4096 |
+| **grizzley** | 192.168.50.84 | grizzley | Hermes | 8644 |
+| **ice** | 192.168.50.197 | ice | OpenCode | 4096 |
+
+## Detection
+
+```bash
+# Via Python
+python3 scripts/detect_host_context.py
+
+# Via Shell
+source scripts/load-host-context.sh
+```
+
+## Files
+
+- `.host-context` — Context marker per host (gitignored)
+- `scripts/detect_host_context.py` — Python detector
+- `scripts/load-host-context.sh` — Shell loader
+
+## Agent Integration
+
+| Agent | Harness | Context Detection |
+|-------|---------|-------------------|
+| OpenCode | systemd | `.opencode/opencode.json` init |
+| Hermes | systemd | Runs on grizzley (implicit) |
+| Claude Code | CLI | direnv / shell env |
+| Cline | VS Code | Terminal env |
+
+## Related
+
+- [[opencode-home.md|OpenCode Agent]]
+- [[../automation/project.md|Automation Scripts]]
+- [[../homelab/project.md|Homelab Infrastructure]] <!-- was already there, so removed duplicate -->
--- a/ai-assistant/project.md
+++ b/ai-assistant/project.md
@@ -0,0 +1,61 @@
+---
+project:
+  name: AI Assistant Configuration
+  status: active
+  category: configuration
+  source: live-verification
+  created: 2026-01-06
+  updated: 2026-04-23
+  description: OpenCode agent configuration, skills, and storage workflows
+  tags: [ai, assistant, configuration, opencode]
+---
+
+# AI Assistant Configuration
+
+## OpenCode Cluster
+
+| Instance | Host | Port | Status | Updated |
+|----------|------|------|--------|---------|
+| ubuntu | 192.168.50.61 | 4096 | Active (systemd) | 2026-04-23 |
+| ice | 192.168.50.197 | 4096 | Active (systemd) | 2026-04-23 |
+| grizzley | 192.168.50.84 | 4096 | Inactive/disabled | 2026-04-23 |
+
+## Host Context Detection
+
+Each host clone has a `.host-context` file that identifies the local context.
+
+```bash
+python3 scripts/detect_host_context.py
+```
+
+See [[host-context.md|Host Context Detection]] for details.
+
+## Skills
+
+Skills are located in `.agents/skills/` and `.opencode/`:
+
+- `proxmox-management` — VM/LXC operations
+- `traefik-diagnostic` — Router/service health
+- `truenas-storage` — ZFS pool/share management
+- `authentik-sso` — SSO/OIDC configuration
+- `media-stack` — Radarr, Sonarr, Jellyfin management
+- `komodo-management` — Docker stack deployment
+- `host-power-management` — Wake-on-LAN, VM control
+- `infra-audit` — Live infrastructure verification
+
+## Workflows
+
+- [[workflows.md|VM Storage Policy]] — Storage rules for application data on Ubuntu host
+
+## Related
+
+- [[../automation/|Automation Scripts]]
+- [[../platform-config/|Platform Config]]
+
+## Tasks
+```dataview
+TASK
+FROM "ai-assistant/tasks"
+WHERE !completed
+SORT file.name ASC
+```
--- a/ai-assistant/workflows.md
+++ b/ai-assistant/workflows.md
@@ -0,0 +1,64 @@
+---
+project:
+  name: VM Storage Policy
+  status: active
+  category: configuration
+  source: live-verification
+  created: 2026-01-06
+  updated: 2026-04-19
+  description: Storage rules for application data on the Ubuntu host (192.168.50.61)
+  tags: [documentation, storage, policy]
+---
+
+# VM Storage Policy for Application Data
+
+All agents and developers managing services on the Ubuntu host (192.168.50.61) MUST follow these storage rules.
+
+## Rule 1: User-Uploaded Data on NFS
+
+Store ALL user-uploaded data on TrueNAS NFS shares, NOT on the VM's local disk.
+
+**Allowed NFS Paths:**
+- `/mnt/PersonalMediaLibrary/` — Personal media, photos (Immich)
+- `/mnt/truenas/mediadata/` — Media library (Movies, TV, Music)
+- `/mnt/truenas-backup/` — Backups
+
+**Examples:**
+```yaml
+volumes:
+  - /mnt/PersonalMediaLibrary/immich/upload:/usr/src/app/upload
+  - /mnt/truenas/mediadata/media:/media
+```
+
+## Rule 2: Config Files on VM
+
+Configuration files, databases, and cached data CAN stay on VM local disk.
+
+**Allowed Local Paths:**
+- `/home/bear/homelab/ubuntu/{service}/` — Docker compose and config
+- `./config`, `./cache` (relative to docker-compose) — Config/cache directories
+
+## Rule 3: NFS Mounts Must Be in fstab
+
+Before using an NFS path in docker-compose, verify it exists in `/etc/fstab` for persistence.
+
+```bash
+cat /etc/fstab | grep nfs
+```
+
+## Summary
+
+| Data Type | Storage Location | Example |
+|-----------|-----------------|---------|
+| User uploads | NFS (TrueNAS) | Photos, media |
+| App config | VM local | docker-compose.yml, config/ |
+| Databases | VM local (postgres-shared) | PostgreSQL, Redis |
+| Media library | NFS (TrueNAS) | Movies, TV, Music |
+| Backups | NFS (TrueNAS) | Application backups |
+
+---
+
+## Related
+
+- [[project.md|AI Assistant Project]]
+- [[../../homelab/architecture.md|Homelab Architecture]]
--- a/automation/project.md
+++ b/automation/project.md
@@ -0,0 +1,34 @@
+---
+project:
+  name: Automation Scripts
+  status: active
+  category: automation
+  source: live-verification
+  created: 2026-01-06
+  updated: 2026-04-19
+  description: Maintenance, deployment, and operational automation scripts
+  tags: [automation, scripts, homelab]
+---
+
+# Automation Scripts
+
+## Overview
+
+Maintenance, deployment, and operational automation scripts for homelab management.
+
+## Components
+
+- [[scripts.md|Scripts Documentation]] — Complete scripts overview
+
+## Related Projects
+
+- [[../homelab/|Homelab Infrastructure]] — Target for automation
+- [[../platform-config/|Platform Config]] — Deployment target
+
+## Tasks
+```dataview
+TASK
+FROM "automation/tasks"
+WHERE !completed
+SORT file.name ASC
+```
--- a/automation/scripts.md
+++ b/automation/scripts.md
@@ -0,0 +1,63 @@
+---
+project:
+  name: Automation Scripts
+  status: active
+  category: automation
+  source: live-verification
+  created: 2026-01-06
+  updated: 2026-04-19
+  description: Maintenance, deployment, and operational automation scripts
+  tags: [automation, scripts, homelab, maintenance]
+---
+
+# Automation Scripts
+
+Maintenance, deployment, and operational automation scripts for homelab management.
+
+## Script Categories
+
+### Homelab Scripts (`scripts/homelab/`)
+
+| Script | Purpose |
+|--------|---------|
+| `deploy-service.py` | Deploy services to remote hosts |
+| `detect-drift.py` | Detect config drift between repo and hosts |
+| `drift_detector.py` | SSH-based container state comparison |
+| `generate-context.py` | Generate context for AI assistants |
+| `collect-host-inventory.py` | Collect host inventory information |
+| `validate_catalog.py` | Validate catalog consistency |
+
+### Authentik Scripts (`scripts/authentik/`)
+
+Scripts for managing Authentik identity provider: OAuth2/OIDC providers, group bindings, branding, and SSO configuration.
+
+### Maintenance Scripts (`scripts/maintenance/`)
+
+| Script | Purpose |
+|--------|---------|
+| `fix-permissions.py` | Fix file and directory permissions |
+| `fix-truenas-permissions.py` | Fix TrueNAS permissions |
+
+### Ansible Playbooks (`ansible/`)
+
+| Playbook | Purpose |
+|----------|---------|
+| `sync-configs.yml` | Pull/push docker-compose configs |
+| `deploy-services.yml` | Restart Docker services |
+| `sync-opencode.yml` | Push OpenCode configurations |
+| `ping.yml` | Test connectivity to all hosts |
+
+## Host Configuration
+
+| Host | IP | Path | Purpose |
+|------|-----|------|---------|
+| ubuntu | 192.168.50.61 | homelab/ubuntu | Primary Docker host |
+| grizzley | 192.168.50.84 | homelab/grizzley | Edge ingress |
+| ice | 192.168.50.197 | homelab/ice | Control plane |
+| truenas | 192.168.50.12 | homelab/truenas | Storage host |
+| pve | 192.168.50.11 | homelab/proxmox | Hypervisor |
+
+## Related
+
+- [[project.md|Automation Project]]
+- [[../homelab/architecture.md|Homelab Architecture]]
--- a/bachelor-party/data-sources.md
+++ b/bachelor-party/data-sources.md
@@ -0,0 +1,110 @@
+---
+title: Bachelor Party — Data Sources
+type: concept
+tags: [bachelor-party, data, price-tracking]
+created: 2026-05-04
+updated: 2026-05-04
+confidence: high
+---
+
+# Bachelor Party Price Data — Two-Agent Source System
+
+> **IMPORTANT:** There are TWO independent price-scraping agents that BOTH write to the same `history.jsonl`. They must be treated as distinct sources to avoid confusion about data provenance. Chris's local Codex agent on MacBook and the Hermes agent on grizzley both scrape prices independently.
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                    price-watch/history.jsonl                        │
+│          (authoritative price log — single source of truth)         │
+└─────────────────────┬───────────────────────────────────────────────┘
+                      │ read on server restart / vote reload
+                      ↓
+┌─────────────────────────────────────────────────────────────────────┐
+│                    seed-data.js → votes.json                        │
+│          (merged into voting app on restart or reload)              │
+└─────────────────────┬───────────────────────────────────────────────┘
+                      │ serves
+                      ↓
+┌─────────────────────────────────────────────────────────────────────┐
+│              cabo-vote.local.tophermayor.com :3001                  │
+│                    (voting app — live poll results)                 │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+## Source 1: MacBook Air — Codex Agent (Local)
+
+| Property | Detail |
+|----------|--------|
+| **Host** | Chris's MacBook Air (local, not on homelab network) |
+| **Agent** | OpenAI Codex CLI (`codex`) |
+| **Method** | Native Computer Use Plugin — browser automation via Codex's built-in browser tool |
+| **Schedule** | Every 4 hours |
+| **Output** | Writes to `price-watch/history.jsonl` in the local repo clone |
+| **Delivery** | Sends daily email report via AgentMail to toph.homelab@gmail.com |
+| **Repo** | Clone of bachelor-party repo on MacBook at `~/hermes/bachelor_party/` |
+
+**Key distinction:** Uses Codex's native browser automation (Computer Use Plugin). Runs locally on Chris's machine. When it scrapes, it writes directly to the local `history.jsonl` file. That file must be pushed to Git and pulled on the server for the voting app to see it.
+
+## Source 2: grizzley — Hermes Agent (Remote/OpenComputerUse)
+
+| Property | Detail |
+|----------|--------|
+| **Host** | grizzley (192.168.50.84) — Raspberry Pi 5 |
+| **Agent** | Hermes Agent (autonomous AI) |
+| **Method** | OpenComputerUse project — browser automation via Hermes browser tool |
+| **Schedule** | Daily at 8:00 AM (`0 8 * * *`) |
+| **Output** | Writes to `price-watch/history.jsonl` in the repo clone on grizzley |
+| **Delivery** | Sends Telegram report to topic 1054 "Bachelor Party" in AigentZeroHermes |
+| **Repo** | Clone of bachelor-party repo on grizzley at `/home/bear/hermes/bachelor_party/` |
+
+**Cron job name:** `Cabo Bachelor Party Price Tracker — Flights, Hotels, Golf, Clubs & Excursions`
+**Job ID:** `1a9f519189fb`
+
+**Key distinction:** Uses the OpenComputerUse project (a browser automation framework) via Hermes's browser tool. Runs on the homelab Pi 5. Same output file, independent run.
+
+## Why Two Sources?
+
+Chris runs Codex locally on his MacBook as a lightweight always-on agent using his own API billing. The Hermes agent on grizzley is the "official" homelab agent that runs on the cluster's schedule and delivers to Telegram. Both are independent browsers hitting the same travel sites.
+
+The risk is **data collision** — if both agents write to `history.jsonl` without coordination, entries can get interleaved or overwritten. The `history.jsonl` format (newline-delimited JSON) is append-oriented, so interleaving is the expected behavior — but this means a single run's report may have gaps if another agent's run truncated the file.
+
+## Data Flow Into the Voting App
+
+```
+history.jsonl (price points)
+        │
+        │  manual step or server restart
+        ↓
+seed-data.js (hardcoded prices via buildSeedData())
+        │
+        │  mergeSeedData() on server restart
+        ↓
+votes.json (authoritative app data)
+        │
+        │  API: GET /api/options, GET /api/categories
+        ↓
+cabo-vote.local.tophermayor.com :3001
+```
+
+The app does **not** read `history.jsonl` directly. Prices from `history.jsonl` must be manually promoted into `seed-data.js` (by editing the `buildSeedData()` function), then the server must be restarted to reload.
+
+## How to Identify Which Agent Wrote an Entry
+
+Each JSON line in `history.jsonl` has a `checkedAt` timestamp. Entries from the **MacBook Codex agent** will have `source: "computer-use"` or similar in the metadata if the agent tagged them. Entries from **Hermes on grizzley** will come from the OpenComputerUse run and may have different formatting.
+
+If entries are mixed and it's unclear which agent produced them, check:
+- **MacBook:** timestamps aligned with local MacBook timezone (PT), typically every 4 hours
+- **grizzley:** timestamps aligned with the cron schedule (8 AM daily), in the Pi's timezone
+
+## Reconciling Conflicting Prices
+
+If the two sources report different prices for the same item:
+1. Both are valid — they may have scraped at different times on different days
+2. Use the **most recent** `checkedAt` timestamp as the authoritative current price
+3. If timestamps are the same day, average them or flag for manual review
+
+## Related
+
+- [[bachelor-party/project|Project Overview]]
+- [[bachelor-party/voting-app|Voting App]] — deployment and data flow
--- a/bachelor-party/project.md
+++ b/bachelor-party/project.md
@@ -0,0 +1,48 @@
+---
+project:
+  name: Cabo Bachelor Party
+  status: active
+  category: personal
+  source: live-verification
+  created: 2026-04-30
+  updated: 2026-05-04
+  description: Bachelor party planning — San José del Cabo Feb 2-7, 2027 for 14 guests. Price tracking, voting app, and trip coordination.
+  tags: [bachelor-party, travel, cabo, planning]
+---
+
+# Cabo Bachelor Party — Project Overview
+
+**Trip:** Feb 2-7, 2027 | **Group:** 14 guests | **Destination:** San José del Cabo, Mexico
+
+## Destination Shortlist
+
+| Destination | Flight/Person | Flight/Group | Hotel Est. | Notes |
+|-------------|--------------|-------------|------------|-------|
+| Cabo San Lucas (SJD) | $338 | $4,732 | ~$200/night | Best party vibe, nonstop |
+| Maui (OGG) | $478 | $6,692 | $355-$644/night | 5h40m flight, free live music |
+| Cancun (CUN) | $524 | $7,336 | TBD | Coco Bongo $55 all-in |
+| Mazatlan (MZT) | $381 | $5,334 | TBD | BEST BUDGET — 2-stop rough travel |
+
+## Services
+
+| Service | Host | URL | Notes |
+|---------|------|-----|-------|
+| Voting App | ubuntu | cabo-vote.local.tophermayor.com | Real-time poll results |
+| Price Sheet | Google | [Link](https://docs.google.com/spreadsheets/d/1ZR6KXfdBwtbgtgKypvNZmkSS154gM1pDt0dP5RH0wIo/edit) | Live price tracker |
+
+## Price Data
+
+- **Current data:** Apr 30-May 04, 2026 (seed v5)
+- **Google Sheet:** [Cabo Bachelor Party — Price Tracker](https://docs.google.com/spreadsheets/d/1ZR6KXfdBwtbgtgKypvNZmkSS154gM1pDt0dP5RH0wIo/edit)
+- **Data sources:** See [[bachelor-party/data-sources|Data Sources]]
+
+## Key Links
+
+- [Costco Travel — Cabo Packages](https://www.costcotravel.com/Vacation-Packages/Mexico/Los-Cabos)
+- [Apple Vacations — Cabo](https://www.applevacations.com/destinations/cabo-san-lucas)
+- [KAYAK Flights — LAX to SJD](https://www.kayak.com/flights)
+
+## Related
+
+- [[bachelor-party/data-sources|Data Sources]] — Two-agent price tracking system
+- [[bachelor-party/voting-app|Voting App]] — cabo-vote deployment and data flow
--- a/bachelor-party/voting-app.md
+++ b/bachelor-party/voting-app.md
@@ -0,0 +1,89 @@
+---
+title: Cabo Vote — Voting App
+type: concept
+tags: [bachelor-party, voting-app, deployment]
+created: 2026-04-30
+updated: 2026-05-04
+confidence: high
+---
+
+# Cabo Vote — Bachelor Party Voting App
+
+Real-time polling app for the Cabo bachelor party. Tracks votes and live prices.
+
+## Service Details
+
+| Property | Value |
+|----------|-------|
+| **Host** | ubuntu (192.168.50.61) |
+| **Container** | `reccollection-backend-local` (Node.js) |
+| **Port** | 3001 |
+| **URL** | cabo-vote.local.tophermayor.com |
+| **Traefik** | ubuntu:8080 → backend:3001 |
+| **Data dir** | `/home/bear/RecCollection/data` (bind mount) |
+| **Data file** | `votes.json` |
+
+## Data Flow
+
+```
+price-watch/history.jsonl       ← scraped prices (two sources)
+        │                       ← manual promotion
+seed-data.js                   ← hardcoded via buildSeedData()
+        │                       ← server restart / reload
+votes.json                     ← app's authoritative data store
+        │
+        ├──→ server.js         ← Express API + WebSocket
+        │
+        └──→ cabo-vote.local   ← served as static web app
+```
+
+## API Endpoints
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/categories` | All poll categories |
+| GET | `/api/options` | All options with votes and prices |
+| GET | `/api/results` | Aggregated vote counts |
+| GET | `/` | Static web UI |
+
+## Key Files
+
+| File | Purpose |
+|------|---------|
+| `voting_app/server.js` | Express server, WebSocket, API routes |
+| `voting_app/seed-data.js` | Hardcoded seed prices, `buildSeedData()` |
+| `voting_app/data/votes.json` | Persisted votes and prices |
+| `voting_app/price-watch/history.jsonl` | Raw scraped price history |
+| `voting_app/price-watch/latest-report.md` | Most recent scrape report |
+
+## Price Update Process
+
+1. **Scrape** — Agent (MacBook Codex or grizzley Hermes) scrapes travel sites → `history.jsonl`
+2. **Promote** — Manual step: extract prices from report, edit `buildSeedData()` in `seed-data.js`
+3. **Restart** — Server reloads: `mergeSeedData()` merges new seed with preserved votes → `votes.json`
+4. **Serve** — Fresh prices appear in the UI and API
+
+The `mergeSeedData()` function preserves existing votes when reloading — only the price data from `buildSeedData()` is refreshed.
+
+## Seed Version
+
+Current: **v5** (April 29, 2026)
+
+## Deployment
+
+The app is not containerized separately — it runs as the `reccollection-backend-local` container on ubuntu. It was originally a RecCollection app repurposed for the bachelor party.
+
+To check the container:
+```bash
+ssh bear@ubuntu "docker ps | grep reccollection"
+```
+
+To restart (reload prices from votes.json):
+```bash
+ssh bear@ubuntu "docker restart reccollection-backend-local"
+```
+
+## Related
+
+- [[bachelor-party/project|Project Overview]]
+- [[bachelor-party/data-sources|Data Sources]] — Two-agent price tracking
--- a/daily/2026-04-27-morning-briefing.md
+++ b/daily/2026-04-27-morning-briefing.md
@@ -0,0 +1,49 @@
+---
+type: daily-briefing
+date: 2026-04-27
+generated: 2026-04-27T20:03:39.416092+00:00
+---
+
+# Morning Briefing — 2026-04-27
+
+_Auto-generated by Hermes cron. Queries run at 06:00 UTC._
+
+## Pending tasks
+
+- [Templates/task-template.md] (score:0.59) --- task:   project:    status: pending|in-progress|completed|blocked   priority: high|medium|low   assignee:    created:    due:  ---  # Task:   ## Description  ## Requirements  ## Implementation Not
+- [homelabagentroot] (score:0.51) **Remaining This Sprint**:  **Completion Rate**: 73% (8/11 tasks)  ## Milestones  | Milestone               | Target Date | Status         | | ----------------------- | ----------- | -------------- |
+- [""] (score:0.46) --- project:   name: ""   status: planning|active|completed|archived   category: infrastructure|application|automation|configuration   source: ""   created: 2026-01-06   updated: 2026-01-06   descript
+- [homelabagentroot] (score:0.40) tags: [tasks, project-management, firewall, unifi, tracking]  **Created**: 2026-01-08 **Last Updated**: 2026-01-08 **Status**: 🟡 In Progress  ## Project Overview  **Objective**: Implement comprehens
+- [Dashboard/project-status.md] (score:0.34) # Project Status Dashboard  ## All Projects ```dataview TABLE project.status AS Status,        project.category AS Category,       file.cday AS Created FROM "**/project.md" SORT project.status ASC, pr
+
+## Recent failures
+
+- [live-verification] (score:0.33) timeout: 3s       timeout: 3s  - type: monitor   title: Infrastructure   style: compact   sites:     - title: Traefik       url: https://traefik.local.tophermayor.com/dashboard/       timeout: 2
+- [live-verification] (score:0.30) → Promtail (Docker socket SD)  ### Alerting  - **Prometheus alert rules** → Alertmanager → Hermes webhook → Telegram - **Hermes cron jobs**: Health Check (15m), Container Monitor (30m), Maintenanc
+- [live-verification] (score:0.30) | grizzley | 192.168.50.84 | Edge Ingress | 14 containers, hermes-dashboard.service |   ## Services by Category  ### Media Jellyfin, Radarr, Sonarr, Lidarr, Prowlarr, Jellyseerr, qBittorrent, SABnzbd,
+- [homelabagentroot] (score:0.24) - `92c1b619-ef7e-4b74-aaca-e57851abe962` `MBA VPN to Management` - `3b64e36a-a452-4ab0-96b5-6088efb2330c` `Vpn to IoT`  ## Rollback Steps  If the `Family of D.` cutover needs to be reversed before the
+- [live-verification] (score:0.24) ### Monitoring Ollama, Gitea, Faster Whisper Server, Docker OSX, Qdrant, Registry  ### AI Applications AI Job Pipeline, AI Alert Aggregator, AI Media Intelligence, AI Subscriptions, Homelab Inventory
+
+## Infrastructure changes
+
+- [homelabagentroot] (score:0.39) - Confirm access to hosted services such as `traefik-lxc` and `adguard`  - Restore previous interface config and reservation  ### Ubuntu  Target intent: normalize around `192.168.50.61`  - Verify SSH
+- [homelabagentroot] (score:0.36) - `ubuntu` legacy `192.168.1.61` address was removed from `enp6s18`; the host now remains reachable on `192.168.50.61` and `192.168.30.61` - `grizzley` Wi-Fi config was removed, leaving wired server-s
+- [homelabagentroot] (score:0.35) - update stale controller/client observations so UniFi no longer shows the old `192.168.1.61` path as active after the host-side removal  Still pending for full Grizzley and Ice normalization:  - al
+- [homelabagentroot] (score:0.34) - `Management` now maps only to `Default` - legacy `192.168.1.x` removed from:   - `ubuntu`   - `proxmox`   - `truenas` - Wi-Fi removed from:   - `grizzley`   - `ice` - staging `192.168.40.x` removed
+- [homelabagentroot] (score:0.34) - `92c1b619-ef7e-4b74-aaca-e57851abe962` `MBA VPN to Management` - `3b64e36a-a452-4ab0-96b5-6088efb2330c` `Vpn to IoT`  ## Rollback Steps  If the `Family of D.` cutover needs to be reversed before the
+
+## Ongoing projects
+
+- [Templates/task-template.md] (score:0.34) --- task:   project:    status: pending|in-progress|completed|blocked   priority: high|medium|low   assignee:    created:    due:  ---  # Task:   ## Description  ## Requirements  ## Implementation Not
+- [homelabagentroot] (score:0.34) **Remaining This Sprint**:  **Completion Rate**: 73% (8/11 tasks)  ## Milestones  | Milestone               | Target Date | Status         | | ----------------------- | ----------- | -------------- |
+- [""] (score:0.31) --- project:   name: ""   status: planning|active|completed|archived   category: infrastructure|application|automation|configuration   source: ""   created: 2026-01-06   updated: 2026-01-06   descript
+- [live-verification] (score:0.30) - [[project.md|Automation Project]]
+- [homelabagentroot] (score:0.30) | C-007 | Add firewall slash commands          | 2026-01-08 |  | ID    | Task                    | Priority | | ----- | ----------------------- | -------- | | B-001 | Create video tutorial   | Medium
+
+## Agent context
+
+- [homelabagentroot] (score:0.33) **Remaining This Sprint**:  **Completion Rate**: 73% (8/11 tasks)  ## Milestones  | Milestone               | Target Date | Status         | | ----------------------- | ----------- | -------------- |
+- [Dashboard/project-status.md] (score:0.32) # Project Status Dashboard  ## All Projects ```dataview TABLE project.status AS Status,        project.category AS Category,       file.cday AS Created FROM "**/project.md" SORT project.status ASC, pr
+- [""] (score:0.32) --- project:   name: ""   status: planning|active|completed|archived   category: infrastructure|application|automation|configuration   source: ""   created: 2026-01-06   updated: 2026-01-06   descript
+- [live-verification] (score:0.30) |---------|-----|-------------| | **Authentik Server** | `auth.tophermayor.com` | SSO identity provider (2025.2) | | **Authentik Worker** | — | Background tasks | | **Authentik Redis** | — | Session
+- [homelabagentroot] (score:0.29) tags: [tasks, project-management, firewall, unifi, tracking]  **Created**: 2026-01-08 **Last Updated**: 2026-01-08 **Status**: 🟡 In Progress  ## Project Overview  **Objective**: Implement comprehens
--- a/daily/2026-04-28-morning-briefing.md
+++ b/daily/2026-04-28-morning-briefing.md
@@ -0,0 +1,49 @@
+---
+type: daily-briefing
+date: 2026-04-28
+generated: 2026-04-28T13:00:51.085852+00:00
+---
+
+# Morning Briefing — 2026-04-28
+
+_Auto-generated by Hermes cron. Queries run at 06:00 UTC._
+
+## Pending tasks
+
+- [Templates/task-template.md] (score:0.59) --- task:   project:    status: pending|in-progress|completed|blocked   priority: high|medium|low   assignee:    created:    due:  ---  # Task:   ## Description  ## Requirements  ## Implementation Not
+- [daily/2026-04-27-morning-briefing.md] (score:0.53) --- type: daily-briefing date: 2026-04-27 generated: 2026-04-27T20:03:39.416092+00:00 ---  # Morning Briefing — 2026-04-27  _Auto-generated by Hermes cron. Queries run at 06:00 UTC._  ## Pending tasks
+- [homelabagentroot] (score:0.51) **Remaining This Sprint**:  **Completion Rate**: 73% (8/11 tasks)  ## Milestones  | Milestone               | Target Date | Status         | | ----------------------- | ----------- | -------------- |
+- [https://forgecode.dev/blog/benchmarks-dont-matter/] (score:0.49) The problem is not that the model cannot solve the task. The problem is that a brilliant but meandering trajectory times out just as definitively as an incorrect one.  ## Failure Mode 6: Planning to
+- [""] (score:0.46) --- project:   name: ""   status: planning|active|completed|archived   category: infrastructure|application|automation|configuration   source: ""   created: 2026-01-06   updated: 2026-01-06   descript
+
+## Recent failures
+
+- [https://forgecode.dev/blog/gcp-cloudflare-anthropic-outage/] (score:0.41) - Vertex AI: Model Garden 5xx errors persisted until 18:18 PDT  This demonstrates how cascading failures create recovery debt that extends far beyond the initial fix.  ## 8. Wrap Up  At 10:50 AM a bu
+- [https://forgecode.dev/blog/gcp-cloudflare-anthropic-outage/] (score:0.33) | 17:10 | Google update | Dataflow fully resolved except us-central1 | | 18:18 | Google final | Vertex AI Online Prediction fully recovered, all clear | | 18:27 | Google postmortem | Internal investig
+- [live-verification] (score:0.33) timeout: 3s       timeout: 3s  - type: monitor   title: Infrastructure   style: compact   sites:     - title: Traefik       url: https://traefik.local.tophermayor.com/dashboard/       timeout: 2
+- [https://forgecode.dev/blog/gcp-cloudflare-anthropic-outage/] (score:0.32) --- type: agent-doc agent: ForgeCode source: https://forgecode.dev/blog/gcp-cloudflare-anthropic-outage/ scraped: 2026-04-28T09:24:05.222674+00:00 content_hash: 263dda8e --- # When Google Sneezes, the
+- [https://forgecode.dev/blog/gcp-cloudflare-anthropic-outage/] (score:0.32) ## 5. Lessons for Engineers  1. Control plane failures hurt more than data plane faults. Data replication across zones cannot save you if auth is down. 2. Check hidden dependencies. Cloudflare is m
+
+## Infrastructure changes
+
+- [https://opencode.ai/docs/config/] (score:0.46) ```  You can place your config in a couple of different locations and they have a different order of precedence.  Configuration files are merged together, not replaced. Settings from the following con
+- [daily/2026-04-27-morning-briefing.md] (score:0.39) - [homelabagentroot] (score:0.36) - `ubuntu` legacy `192.168.1.61` address was removed from `enp6s18`; the host now remains reachable on `192.168.50.61` and `192.168.30.61` - `grizzley` Wi-Fi config
+- [homelabagentroot] (score:0.39) - Confirm access to hosted services such as `traefik-lxc` and `adguard`  - Restore previous interface config and reservation  ### Ubuntu  Target intent: normalize around `192.168.50.61`  - Verify SSH
+- [homelabagentroot] (score:0.36) - `ubuntu` legacy `192.168.1.61` address was removed from `enp6s18`; the host now remains reachable on `192.168.50.61` and `192.168.30.61` - `grizzley` Wi-Fi config was removed, leaving wired server-s
+- [homelabagentroot] (score:0.35) - update stale controller/client observations so UniFi no longer shows the old `192.168.1.61` path as active after the host-side removal  Still pending for full Grizzley and Ice normalization:  - al
+
+## Ongoing projects
+
+- [https://forgecode.dev/blog/ai-agent-best-practices/] (score:0.50) - Re-index your project after major changes to avoid hallucinations - Use Context7 MCP to stay synced with latest documentation - Treat AI output like junior dev PRs review everything  What Doesn't Wo
+- [https://forgecode.dev/blog/kimi-k2-vs-qwen-3-coder-coding-comparison/] (score:0.46) 2. Bug Finding & Fixing (5 tasks): Real bugs with reproduction steps and failing tests 3. Feature Implementation (4 tasks): New functionality from clear requirements 4. Frontend Refactor (2 tasks): U
+- [https://forgecode.dev/blog/kimi-k2-vs-qwen-3-coder-coding-comparison/] (score:0.44) - Introduced hardcoded values to make tests pass - Average resolution time: 22 minutes (when successful)  ## Feature Implementation: Autonomous Development Capability  ### Task Completion Analysis
+- [https://forgecode.dev/blog/coding-agents-showdown/] (score:0.43) ### Where Forks Excel  Large-Scale Refactoring For migrations like React class components to hooks across 50+ files, Cursor's agent mode can handle a broad transformation while maintaining context
+- [https://forgecode.dev/docs/custom-rules-guide/] (score:0.41) ## What Are Project-Specific Guidelines?  Project-specific guidelines are persistent instructions that get injected into every AI conversation. Think of them as your team's development constitution
+
+## Agent context
+
+- [https://forgecode.dev/docs/zsh-support/] (score:0.39) ``` :new ```  This clears the conversation history and starts fresh. The active agent stays the same.  You can also pass a prompt directly — :new starts the fresh conversation and sends it in one st
+- [daily/2026-04-27-morning-briefing.md] (score:0.37) --- type: daily-briefing date: 2026-04-27 generated: 2026-04-27T20:03:39.416092+00:00 ---  # Morning Briefing — 2026-04-27  _Auto-generated by Hermes cron. Queries run at 06:00 UTC._  ## Pending tasks
+- [https://opencode.ai/docs/tui/] (score:0.37) ``` /redo ```  Keybind: ctrl+x r  ---  ### sessions  List and switch between sessions. Aliases: /resume, /continue  ``` /sessions ```  Keybind: ctrl+x l  ---  ### share  Share current session. Learn
+- [https://opencode.ai/docs/sdk/] (score:0.36) |---|---|---| | session.list() | List sessions | Returns Session[] | | session.get({ path }) | Get session | Returns Session | | session.children({ path }) | List child sessions | Returns Session[] |
+- [daily/2026-04-27-morning-briefing.md] (score:0.35) - [homelabagentroot] (score:0.34) **Remaining This Sprint**:  **Completion Rate**: 73% (8/11 tasks)  ## Milestones  | Milestone               | Target Date | Status         | | ----------------------
--- a/daily/2026-04-29-end-of-day.md
+++ b/daily/2026-04-29-end-of-day.md
@@ -0,0 +1,140 @@
+---
+type: daily-briefing
+date: 2026-04-29
+generated: 2026-04-29T15:58:32.709573+00:00
+variant: end-of-day
+---
+
+# End of Day Brief — 2026-04-29
+
+_Auto-generated by Hermes cron. Runs at 8pm PDT (03:00 UTC)._
+
+## Git Commits (last 24h)
+
+- `8812be0` [infra] Add shared skills directory for cross-host Hermes agent (ice, 2026-04-29 08:42)
+- `22b2b1c` llm-wiki: document homepage entity — dual instances, 60+ services, all widgets (ice, 2026-04-28 23:34)
+- `c443411` llm-wiki: update all host entities with live SSH configuration data (ice, 2026-04-28 23:28)
+- `81a1e00` llm-wiki lint: fix 46 broken wikilinks, expand taxonomy (ice, 2026-04-28 23:09)
+- `7570369` llm-wiki: delete IoT plan (archived to homelab/raw/articles/) (ice, 2026-04-28 22:56)
+- `308334d` llm-wiki: add queries index, gitignore stale vault files, update log (ice, 2026-04-28 22:52)
+- `216a98e` remove stale vault files (AGENTS, opencode configs, ai-assistant, automation, platform-config) (ice, 2026-04-28 22:45)
+- `3044609` test: trigger ubuntu deploy (ice, 2026-04-28 21:51)
+- `ed06f78` [vault] Complete vault-sync-enforcement milestone (ice, 2026-04-28 21:47)
+- `6da0f7c` [vault] LLM Wiki restructuring — Phase 2: three-layer structure, forge/opencode to raw, agent memory to .hermes (ice, 2026-04-28 16:14)
+- `830461e` wiki: update wiki-sync scripts to point to obsidian-vault (ice, 2026-04-28 12:13)
+- `4a34382` wiki: migrate Karpathy LLM wiki into obsidian-vault (ice, 2026-04-28 12:12)
+- `75eaefe` [ubuntu] gitea-runner: env_file for webhook secret, add .env.example (ice, 2026-04-28 09:48)
+- `1cf89af` [ubuntu] sync-configs.sh v5.1: .env.example fallback in verify step (ice, 2026-04-28 08:58)
+- `c2598dd` [ubuntu+grizzley+ice] Add GitOps runner + sync guard rails v5 (ice, 2026-04-28 08:55)
+_... and 2 more commits_
+
+## Docker Containers
+
+### ice
+  - camofox | Up 13 days
+### grizzley
+  - aiostreams | Up 2 days (healthy)
+  - aiometadata | Up 2 days (healthy)
+  - aiomanager | Up 2 days (healthy)
+  - komodo | Up 2 days (healthy)
+  - traefik-pi | Up 13 hours (healthy)
+  - aiomanager_db | Up 2 days (healthy)
+  - komodo-mongo | Up 2 days
+  - aiometadata-redis | Up 2 days (healthy)
+  - uptime-kuma | Up 2 days (healthy)
+  - homepage-grizzley | Up 2 days (healthy)
+  - vaultwarden | Up 2 days (healthy)
+  - jellyfin | Up 2 days (healthy)
+### ubuntu
+  - infisical-backend | Up 19 hours
+  - infisical-db | Up 19 hours (healthy)
+  - infisical-redis | Up 19 hours
+  - comparaison | Up 22 hours
+  - gitea-runner | Up 23 hours
+  - reccollection-frontend-local | Up 33 hours (healthy)
+  - reccollection-backend-local | Up 33 hours (healthy)
+  - reccollection-postgres-local | Up 33 hours (healthy)
+  - ai-subscriptions | Up 40 hours (healthy)
+  - rustfs | Up 2 days
+  - seerr | Up 2 days (healthy)
+  - gsd-computer-use | Up 2 days (healthy)
+  - unified-media-manager-frontend-1 | Up 4 days
+  - unified-media-manager-backend-1 | Up 4 days (healthy)
+  - lazylibrarian | Up 5 days
+  - ombi | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v13-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v11-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v7-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v10-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v14-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v8-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v6-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v15-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v12-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v4-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v2-1 | Up 5 days
+  - unified-media-manager-ui-variants-frontend-v9-1 | Up 5 days
+  - unified-media-manager-ui-variants-dashboard-1 | Up 5 days
+  - qbittorrent | Up 5 days (healthy)
+  - sabnzbd | Up 5 days (healthy)
+  - bazarr | Up 5 days (healthy)
+  - radarr-anime | Up 5 days (healthy)
+  - prowlarr | Up 5 days (healthy)
+  - lidarr | Up 5 days (healthy)
+  - sonarr-anime | Up 5 days (healthy)
+  - sonarr | Up 5 days (healthy)
+  - readarr | Up 5 days (healthy)
+  - radarr | Up 5 days (healthy)
+  - recyclarr | Up 5 days
+  - stremio-server | Up 5 days (healthy)
+  - flaresolverr | Up 5 days
+  - nzbdav | Up 21 seconds
+  - gluetun | Up 5 days (healthy)
+  - homepage-ubuntu | Up 5 days (healthy)
+  - traefik | Up 3 days (healthy)
+  - audiobookshelf | Up 5 days (healthy)
+  - navidrome | Up 5 days (healthy)
+  - prometheus | Up 5 days
+  - grafana | Up 5 days
+  - authentik-server | Up 5 days (healthy)
+  - jellyfin | Up 5 days (healthy)
+  - authentik-worker | Up 5 days (healthy)
+  - authentik-redis | Up 5 days (healthy)
+  - ai-alert-aggregator-frontend-1 | Up 5 days
+  - ai-alert-aggregator-backend-1 | Restarting (1) 3 seconds ago
+  - musicseerr | Up 5 days (healthy)
+  - registry | Up 5 days
+  - ai-job-pipeline-frontend-1 | Up 5 days
+  - ai-job-pipeline-backend-1 | Restarting (1) 11 seconds ago
+  - ai-media-intelligence-backend-1 | Restarting (1) 1 second ago
+  - qdrant-qdrant-1 | Up 5 days
+  - calibre-web | Up 5 days (healthy)
+  - calibre | Up 5 days
+  - kavita | Up 5 days (healthy)
+  - blackbox-exporter | Up 5 days
+  - loki | Up 5 days
+  - alertmanager | Up 5 days
+  - node-exporter | Up 5 days
+  - cadvisor | Up 5 days (healthy)
+  - promtail | Up 5 days
+  - postgres-shared | Up 5 days (healthy)
+  - immich_server | Up 5 days (healthy)
+  - immich_redis | Up 5 days
+  - immich_postgres | Up 5 days
+  - immich_machine_learning | Up 5 days (healthy)
+  - gitea | Up 5 days (healthy)
+  - analyzarr | Up 5 days
+  - docker-osx | Up 5 days
+  - faster-whisper-server | Up 5 days (healthy)
+
+## Systemd Services
+
+### ice
+  - docker | running
+  - hermes-dashboard | running
+### grizzley
+  - docker | running
+  - hermes-dashboard | running
+  - hermes-gateway | running
+### ubuntu
+  - docker | running
--- a/daily/2026-04-29-morning-briefing.md
+++ b/daily/2026-04-29-morning-briefing.md
@@ -0,0 +1,49 @@
+---
+type: daily-briefing
+date: 2026-04-29
+generated: 2026-04-29T13:00:51.102878+00:00
+---
+
+# Morning Briefing — 2026-04-29
+
+_Auto-generated by Hermes cron. Queries run at 06:00 UTC._
+
+## Pending tasks
+
+- [Templates/task-template.md] (score:0.59) --- task:   project:    status: pending|in-progress|completed|blocked   priority: high|medium|low   assignee:    created:    due:  ---  # Task:   ## Description  ## Requirements  ## Implementation Not
+- [daily/2026-04-27-morning-briefing.md] (score:0.53) --- type: daily-briefing date: 2026-04-27 generated: 2026-04-27T20:03:39.416092+00:00 ---  # Morning Briefing — 2026-04-27  _Auto-generated by Hermes cron. Queries run at 06:00 UTC._  ## Pending tasks
+- [homelabagentroot] (score:0.51) **Remaining This Sprint**:  **Completion Rate**: 73% (8/11 tasks)  ## Milestones  | Milestone               | Target Date | Status         | | ----------------------- | ----------- | -------------- |
+- [https://forgecode.dev/blog/benchmarks-dont-matter/] (score:0.49) The problem is not that the model cannot solve the task. The problem is that a brilliant but meandering trajectory times out just as definitively as an incorrect one.  ## Failure Mode 6: Planning to
+- [daily/2026-04-28-morning-briefing.md] (score:0.48) --- type: daily-briefing date: 2026-04-28 generated: 2026-04-28T13:00:51.085852+00:00 ---  # Morning Briefing — 2026-04-28  _Auto-generated by Hermes cron. Queries run at 06:00 UTC._  ## Pending tasks
+
+## Recent failures
+
+- [homelab/concepts/monitoring-pipeline.md] (score:0.41) ## External Uptime Monitoring  - **Uptime Kuma** (grizzley:3001) — external/internal availability checks - **Blackbox Exporter** (ubuntu:9115) — 15+ HTTPS probe targets  ## Dashboards  - Grafana (ub
+- [https://forgecode.dev/blog/gcp-cloudflare-anthropic-outage/] (score:0.41) - Vertex AI: Model Garden 5xx errors persisted until 18:18 PDT  This demonstrates how cascading failures create recovery debt that extends far beyond the initial fix.  ## 8. Wrap Up  At 10:50 AM a bu
+- [homelab/concepts/monitoring-pipeline.md] (score:0.39) - `ContainerLogError` — Container logging errors detected by Promtail - `JellyfinDown` — Jellyfin health check failed - `TraefikDown` — Traefik not responding  ## Hermes Cron Jobs  | Job | Schedule |
+- [homelab/entities/hermes-gateway.md] (score:0.39) 2. On failure: direct restart → tmux+OpenCode rescue if still down 3. Sends Telegram notification on failure to topic **1033 "Cron Jobs"** in AigentZeroHermes (`-1003820156994`)  **Telegram alert det
+- [homelab/concepts/monitoring-pipeline.md] (score:0.38) ## Hermes Gateway Watchdog  Hermes Gateway is monitored by a watchdog script on both [[ice]] and [[grizzley]]:  ``` /home/bear/hermes-gateway-watchdog.sh ```  Runs via **system cron** (not systemd u
+
+## Infrastructure changes
+
+- [https://opencode.ai/docs/config/] (score:0.46) ```  You can place your config in a couple of different locations and they have a different order of precedence.  Configuration files are merged together, not replaced. Settings from the following con
+- [daily/2026-04-28-morning-briefing.md] (score:0.41) - [https://forgecode.dev/blog/gcp-cloudflare-anthropic-outage/] (score:0.33) | 17:10 | Google update | Dataflow fully resolved except us-central1 | | 18:18 | Google final | Vertex AI Online Prediction
+- [daily/2026-04-27-morning-briefing.md] (score:0.39) - [homelabagentroot] (score:0.36) - `ubuntu` legacy `192.168.1.61` address was removed from `enp6s18`; the host now remains reachable on `192.168.50.61` and `192.168.30.61` - `grizzley` Wi-Fi config
+- [homelabagentroot] (score:0.39) - Confirm access to hosted services such as `traefik-lxc` and `adguard`  - Restore previous interface config and reservation  ### Ubuntu  Target intent: normalize around `192.168.50.61`  - Verify SSH
+- [daily/2026-04-28-morning-briefing.md] (score:0.37) - [daily/2026-04-27-morning-briefing.md] (score:0.39) - [homelabagentroot] (score:0.36) - `ubuntu` legacy `192.168.1.61` address was removed from `enp6s18`; the host now remains reachable on `192.168
+
+## Ongoing projects
+
+- [https://forgecode.dev/blog/ai-agent-best-practices/] (score:0.50) - Re-index your project after major changes to avoid hallucinations - Use Context7 MCP to stay synced with latest documentation - Treat AI output like junior dev PRs review everything  What Doesn't Wo
+- [https://forgecode.dev/blog/kimi-k2-vs-qwen-3-coder-coding-comparison/] (score:0.46) 2. Bug Finding & Fixing (5 tasks): Real bugs with reproduction steps and failing tests 3. Feature Implementation (4 tasks): New functionality from clear requirements 4. Frontend Refactor (2 tasks): U
+- [https://forgecode.dev/blog/kimi-k2-vs-qwen-3-coder-coding-comparison/] (score:0.44) - Introduced hardcoded values to make tests pass - Average resolution time: 22 minutes (when successful)  ## Feature Implementation: Autonomous Development Capability  ### Task Completion Analysis
+- [https://forgecode.dev/blog/coding-agents-showdown/] (score:0.43) ### Where Forks Excel  Large-Scale Refactoring For migrations like React class components to hooks across 50+ files, Cursor's agent mode can handle a broad transformation while maintaining context
+- [https://forgecode.dev/docs/custom-rules-guide/] (score:0.41) ## What Are Project-Specific Guidelines?  Project-specific guidelines are persistent instructions that get injected into every AI conversation. Think of them as your team's development constitution
+
+## Agent context
+
+- [daily/2026-04-28-morning-briefing.md] (score:0.46) - [daily/2026-04-27-morning-briefing.md] (score:0.37) --- type: daily-briefing date: 2026-04-27 generated: 2026-04-27T20:03:39.416092+00:00 ---  # Morning Briefing — 2026-04-27  _Auto-generated by He
+- [daily/2026-04-28-morning-briefing.md] (score:0.39) - [daily/2026-04-27-morning-briefing.md] (score:0.37) --- type: daily-briefing date: 2026-04-27 generated: 2026-04-27T20:03:39.416092+00:00 ---  # Morning Briefing — 2026-04-27  _Auto-generated by Her
+- [https://forgecode.dev/docs/zsh-support/] (score:0.39) ``` :new ```  This clears the conversation history and starts fresh. The active agent stays the same.  You can also pass a prompt directly — :new starts the fresh conversation and sends it in one st
+- [daily/2026-04-27-morning-briefing.md] (score:0.37) --- type: daily-briefing date: 2026-04-27 generated: 2026-04-27T20:03:39.416092+00:00 ---  # Morning Briefing — 2026-04-27  _Auto-generated by Hermes cron. Queries run at 06:00 UTC._  ## Pending tasks
+- [https://opencode.ai/docs/tui/] (score:0.37) ``` /redo ```  Keybind: ctrl+x r  ---  ### sessions  List and switch between sessions. Aliases: /resume, /continue  ``` /sessions ```  Keybind: ctrl+x l  ---  ### share  Share current session. Learn
--- a/entities/entity-template.md
+++ b/entities/entity-template.md
@@ -0,0 +1,21 @@
+---
+entity_id: ""
+name: ""
+type: ""  # person, project, service, host, concept
+category: ""  # homelab, work, personal
+trust_score: 0.5  # 0.0–1.0, higher = more trusted
+tags: []
+facts: []
+updated: 2026-04-27
+---
+
+# {{name}}
+
+## Overview
+
+## Key Facts
+
+## Related Entities
+
+## Source
+<!-- How did we learn this? -->
--- a/homelab/SCHEMA.md
+++ b/homelab/SCHEMA.md
@@ -0,0 +1,162 @@
+---
+title: Homelab Wiki Schema
+created: 2026-04-28
+updated: 2026-04-28
+type: meta
+tags: [meta, wiki]
+---
+
+# Wiki Schema
+
+This wiki follows [Karpathy's LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) — a persistent, compounding knowledge base as interlinked markdown files. Unlike RAG, knowledge is compiled once and stays current. Cross-references already exist. Contradictions are flagged.
+
+**Location:** `WIKI_PATH` env var (defaults to `~/wiki`). All hosts point to the Obsidian vault at `/home/bear/homelabagentroot/obsidian-vault`.
+
+## Directory Structure
+
+```
+obsidian-vault/          ← WIKI_PATH for all hosts
+├── SCHEMA.md            ← This file (schema, conventions)
+├── log.md               ← Append-only action log (rotate yearly)
+├── homelab/
+│   ├── entities/        ← Layer 2: host and service entities
+│   ├── concepts/        ← Layer 2: concepts, techniques, topics
+│   ├── comparisons/     ← Layer 2: side-by-side analyses
+│   ├── queries/         ← Layer 2: filed Q&A worth keeping
+│   └── raw/             ← Layer 1: immutable source material (optional)
+└── [other vault dirs]   ← ai-assistant/, automation/, agents/, etc.
+```
+
+**Layer 1 — Raw Sources:** Immutable source material (docs, configs, articles). The agent reads but never modifies these.
+**Layer 2 — The Wiki:** Agent-owned markdown files. Created, updated, and cross-referenced by the agent.
+**Layer 3 — The Schema:** This file constrains agent behavior and ensures consistency.
+
+## Conventions
+
+- **File names:** lowercase, hyphens, no spaces (e.g., `ice.md`, `hermes-gateway.md`)
+- **Wikilinks:** Use `[[pagename]]` for all internal links. Minimum 2 outbound links per page.
+- **Frontmatter:** Required on every wiki page (see below).
+- **Index:** Every new page must appear in `homelab/entities/index.md` (for entities) or the relevant section index.
+- **Log:** Every action (ingest, create, update, query, lint) must be appended to `homelab/log.md`.
+- **Provenance markers:** On pages synthesizing 3+ sources, append `^[raw/articles/source-file.md]` at paragraph ends to trace claims.
+- **Confidence:** Set `confidence: medium` or `low` for opinion-heavy, fast-moving, or single-source claims. Don't mark `high` unless well-supported.
+- **Contradictions:** When new information conflicts with existing content, note both with dates/sources, set `contradictions: [page-slug]` in frontmatter, flag for review.
+- **Staleness:** Pages not updated in 90+ days with newer source info should be refreshed.
+- **Page size:** Split pages over ~200 lines into sub-topics with cross-links.
+- **Tags:** Use the taxonomy below. Add new tags here before using.
+
+## Tag Taxonomy
+
+### Hosts
+- `hosts` — physical or virtual host machines
+- `rpi` — Raspberry Pi hardware
+- `hypervisor` — VM/container hypervisors (Proxmox)
+- `nas` — network-attached storage
+- `control-plane` — primary control node (ice)
+- `edge` — edge computing node (grizzley)
+- `primary` — primary instance of a service (ubuntu as main Docker host)
+- `vm` — virtual machine workloads
+
+### Services
+- `services` — software services running on hosts
+- `networking` — network services (Traefik, DNS, VPN)
+- `media` — media streaming services (Jellyfin, Sonarr, etc.)
+- `storage` — storage services (S3, NFS, ZFS)
+- `sso` — identity/SSO services
+- `identity` — identity and authentication services
+- `git` — Git hosting and CI/CD
+- `ai` — AI/ML services
+- `gateway` — API/gateway services
+- `monitoring` — observability stack
+- `docker` — Docker containerization
+- `reverse-proxy` — reverse proxy services (Traefik)
+- `jellyfin` — Jellyfin media server
+- `traefik` — Traefik ingress controller
+- `ubuntu` — Ubuntu host services
+- `proxmox` — Proxmox hypervisor services
+- `s3` — S3-compatible object storage
+- `ci-cd` — continuous integration and deployment
+
+### Smart Home / IoT
+- `iot` — Internet of Things devices and infrastructure
+- `smart-home` — smart home automation and orchestration
+- `home-assistant` — Home Assistant platform
+- `matter` — Matter smart home protocol
+- `thread` — Thread mesh networking protocol
+- `zigbee` — Zigbee wireless protocol
+- `zigbee-device` — individual Zigbee end devices
+- `wifi-device` — Wi-Fi connected IoT devices
+- `ecosystem` — vendor/platform ecosystems (Apple Home, Google Home, Alexa)
+- `sensor` — sensor devices (motion, door, vibration)
+- `actuator` — actuators (switches, lights, locks)
+- `voice-assistant` — voice assistant platforms and devices
+- `hub` — smart home hub or coordinator hardware
+- `inventory` — device inventory and census pages
+- `vlan` — VLAN segmentation and network zoning
+- `policy` — formal placement/security/operational policies
+
+### Techniques & Roles
+- `concept` — architectural patterns, techniques
+- `runbook` — operational procedures
+- `comparison` — feature/comparison analyses
+- `automation` — automation scripts and workflows
+- `alerting` — alerting and notification systems
+- `agents` — AI agent configurations
+- `watchdog` — watchdog/monitoring patterns
+- `ha` — high availability configurations
+- `cli` — command-line tools and interfaces
+- `scripts` — shell/python scripts
+- `tools` — development and operations tools
+- `homelab` — homelab-specific infrastructure patterns
+
+### Meta
+- `meta` — wiki housekeeping (schema, log, index)
+
+## Frontmatter (Required)
+
+```yaml
+---
+title: Page Title
+created: YYYY-MM-DD
+updated: YYYY-MM-DD
+type: entity | concept | comparison | query | summary | meta
+tags: [from taxonomy above]
+sources: [raw/articles/source-name.md]   # optional, list source files
+confidence: high | medium | low         # optional
+contested: true                          # optional, set when contradictions exist
+contradictions: [page-slug]              # optional
+---
+```
+
+## Entity Pages
+
+One page per notable host or service. Include:
+- Role, IP/URL, host location
+- Overview of what it is/does
+- Key facts and relationships
+- Troubleshooting notes (known issues, gotchas)
+- Source references
+
+## Concept Pages
+
+One page per architectural pattern, technique, or topic. Include:
+- Definition/explanation
+- Current state of knowledge
+- Open questions or debates
+- Related concepts via `[[wikilinks]]`
+
+## Update Policy
+
+When new information conflicts with existing content:
+1. Check dates — newer sources generally supersede older
+2. If genuinely contradictory, note both positions with dates and sources
+3. Mark `contradictions: [page-slug]` in frontmatter
+4. Flag for user review
+
+## Page Thresholds
+
+- **Create a page** when an entity/concept appears in 2+ sources OR is central to one source
+- **Add to existing page** when a source mentions something already covered
+- **DON'T create a page** for passing mentions, minor details
+- **Split a page** when it exceeds ~200 lines
+- **Archive a page** when fully superseded — move to `_archive/`, remove from index
--- a/homelab/architecture.md
+++ b/homelab/architecture.md
@@ -0,0 +1,362 @@
+---
+project:
+  name: Homelab Architecture
+  status: active
+  category: infrastructure
+  source: live-verification
+  created: 2026-01-06
+  updated: 2026-04-19
+  description: Verified live infrastructure architecture — hosts, networks, services, storage, and routing
+  tags: [infrastructure, homelab, architecture, documentation]
+---
+
+# Homelab Infrastructure Architecture
+
+**Verified**: 2026-04-19 via live SSH and API inspection
+
+## Architecture Overview
+
+```mermaid
+graph TB
+    subgraph Internet
+        CF[Cloudflare DNS]
+    end
+
+    subgraph PVE["Proxmox VE — 192.168.50.11 (125GB RAM)"]
+        subgraph Ubuntu["ubuntu VM — 192.168.50.61 (32GB RAM, GTX 1080)"]
+            UT[Traefik v3.6.7 — Primary Ingress]
+            UMon[Prometheus + Grafana + Loki]
+            UMedia[Media Stack — 25 containers]
+            UAuth[Authentik SSO]
+            UAI[AI/Dev — Ollama, Gitea, Qdrant]
+            UImg[Immich Photos]
+        end
+        subgraph TrueNAS["TrueNAS VM — 192.168.50.12 (22GB RAM)"]
+            ZFS1["TrueNAS Pool — 25.4TB (65% used)"]
+            ZFS2["RPiPool — 10.9TB (5% used)"]
+        end
+        LXCT["LXC 102 — traefik (running)"]
+    end
+
+    subgraph Grizzley["grizzley — 192.168.50.84 (RPi 5)"]
+        GT[Traefik v3.6.7 — Edge ACME]
+        Komodo[Komodo — Stack Management]
+        Hermes[Hermes Agent — Telegram Alerts]
+        MC[Minecraft Bedrock]
+    end
+
+    subgraph Ice["ice — 192.168.50.197 (RPi 4)"]
+        OC2[OpenCode — port 4096]
+        CF2[camofox container]
+    end
+
+    subgraph Panda["panda — 192.168.30.196 / 192.168.50.196 (RPi)"]
+        HA[Home Assistant OS]
+    end
+
+    CF -->|*.tophermayor.com| UT
+    CF -->|*.tophermayor.com| GT
+    GT -->|Wildcard Certs via NFS| ZFS1
+    UT -->|NFS Media| ZFS1
+    GT -->|Proxy| UT
+    Komodo -->|files_on_host| Ubuntu
+    Komodo -->|files_on_host| Grizzley
+```
+
+---
+
+## Host Topology
+
+| Host | IP | OS | Hardware | Role | Key Services |
+|------|-----|----|----------|------|-------------|
+| **ubuntu** | 192.168.50.61 | Ubuntu 24.04.4 LTS | VM (Proxmox, 32GB RAM), NVIDIA GTX 1080 8GB | Primary Docker Host | 59 containers — Traefik, Media Stack, Immich, Authentik, Monitoring, AI/Dev |
+| **grizzley** | 192.168.50.84 | Ubuntu 25.10 | Raspberry Pi 5 | Edge Ingress | 10 containers — Traefik (ACME), Komodo, Hermes, Minecraft |
+| **ice** | 192.168.50.197 | Ubuntu 25.10 | Raspberry Pi 4 | Control Plane | OpenCode (systemd), camofox |
+| **pve** | 192.168.50.11 | Debian (Proxmox 9.1.4) | Bare metal, 125GB RAM (70GB used) | Hypervisor | VMs + LXC containers |
+| **truenas** | 192.168.50.12 | TrueNAS SCALE 25.10.2.1 | VM on PVE (22GB RAM) | Storage | ZFS pools, NFS exports |
+| **panda** | 192.168.30.196 / 192.168.50.196 | HA OS (Alpine 3.23.3) | Raspberry Pi | Home Assistant | Smart home hub, Zigbee/Z-Wave |
+
+### Proxmox VMs and LXC
+
+| VMID | Name | Status | RAM |
+|------|------|--------|-----|
+| 9001 | TrueNAS | Running | 22GB |
+| 9003 | ubuntu-server | Running | 32GB |
+| 9100 | W10-migrated | Stopped | — |
+| LXC 102 | traefik | Running | — |
+
+---
+
+## Network Topology
+
+### VLAN Segments
+
+| VLAN | Subnet | Purpose | Hosts |
+|------|--------|---------|-------|
+| **Main/Prod** | 192.168.1.x | PVE, workstations | Hyte |
+| **Lab** | 192.168.50.x | Core infrastructure | ubuntu, grizzley, ice, truenas, pve, panda SSH |
+| **IoT/Home** | 192.168.30.x | Home automation | panda/HA, Matter devices |
+
+### DNS Zones
+
+| Zone | Scope | Resolution |
+|------|-------|------------|
+| `*.tophermayor.com` | Public | Cloudflare → Traefik ingress |
+| `*.local.tophermayor.com` | Internal | Traefik routers, local services |
+| `*.pi.tophermayor.com` | Legacy | grizzley/ice services |
+
+### Traefik Ingress
+
+| Instance | Host | Role | SSL |
+|----------|------|------|-----|
+| Ubuntu Traefik | 192.168.50.61 | Primary router — handles ~90% of traffic | Cloudflare DNS challenge, certs synced from grizzley |
+| Grizzley Traefik | 192.168.50.84 | Edge ACME — primary certificate source | Cloudflare DNS challenge, certs on NFS |
+
+Entry points: `web` (80 → HTTPS redirect), `websecure` (443), `metrics` (8080)
+
+---
+
+## Service Inventory
+
+### Media Stack (ubuntu — 25 containers)
+
+| Service | URL | Description |
+|---------|-----|-------------|
+| **Jellyfin** | `jellyfin.tophermayor.com` | Media streaming (GPU transcoding) |
+| **Jellyseerr** | `jellyseerr.tophermayor.com` | Request management |
+| **Sonarr** | `sonarr.local.tophermayor.com` | TV automation |
+| **Sonarr Anime** | — | Anime TV automation |
+| **Radarr** | `radarr.local.tophermayor.com` | Movie automation |
+| **Radarr Anime** | — | Anime movie automation |
+| **Lidarr** | `lidarr.local.tophermayor.com` | Music automation |
+| **Prowlarr** | `prowlarr.local.tophermayor.com` | Indexer management |
+| **Bazarr** | — | Subtitle management |
+| **qBittorrent** | — | Torrent client (via Gluetun VPN) |
+| **SABnzbd** | `sabnzbd.local.tophermayor.com` | Usenet downloader |
+| **Gluetun** | — | WireGuard VPN (NordVPN) — all media traffic routes here |
+| **Flaresolverr** | — | CAPTCHA solver |
+| **Recyclarr** | — | Quality profile sync |
+| **Analyzarr** | — | Media analysis |
+| **Stremio Server** | `stremio.local.tophermayor.com` | Stremio streaming |
+| **Tdarr** | `tdarr.local.tophermayor.com` | Media transcoding (GPU) |
+| **Navidrome** | — | Music streaming |
+| **Calibre** | — | eBook management |
+| **Calibre-Web** | — | eBook reader |
+| **Kavita** | — | Manga/comic reader |
+| **Audiobookshelf** | — | Audiobook/podcast server |
+| **LazyLibrarian** | — | Book automation |
+| **Musicseerr** | — | Music request system |
+| **Nzbdav** | — | Usenet helper |
+
+### Media Applications (ubuntu — 4 containers)
+
+| Service | Description |
+|---------|-------------|
+| **RecCollection** (backend + postgres) | Media recommendation engine |
+| **Unified Media Manager** (backend + frontend) | Unified media management |
+
+### Immich (ubuntu — 4 containers)
+
+| Service | URL | Description |
+|---------|-----|-------------|
+| **Immich Server** | `immich.tophermayor.com` | Photo/video management |
+| **Immich ML** | — | Machine learning (GPU) |
+| **Immich Postgres** | — | Dedicated PostgreSQL (pgvecto-rs) |
+| **Immich Redis** | — | Caching |
+
+### Auth and SSO (ubuntu — 3 containers)
+
+| Service | URL | Description |
+|---------|-----|-------------|
+| **Authentik Server** | `auth.tophermayor.com` | SSO identity provider (2025.2) |
+| **Authentik Worker** | — | Background tasks |
+| **Authentik Redis** | — | Session caching |
+
+### Monitoring (ubuntu — 8 containers)
+
+| Service | URL | Description |
+|---------|-----|-------------|
+| **Prometheus** | `prometheus.local.tophermayor.com` | Metrics collection |
+| **Grafana** | `grafana.local.tophermayor.com` | Dashboards |
+| **Loki** | — | Log aggregation |
+| **Promtail** | — | Log shipping |
+| **Alertmanager** | — | Alert routing → Hermes webhook → Telegram |
+| **Blackbox Exporter** | — | HTTPS probes |
+| **Node Exporter** | — | Host metrics |
+| **cAdvisor** | — | Container metrics |
+
+Scrape targets: ubuntu (local), proxmox, truenas, grizzley, ice, panda
+
+### AI and Dev (ubuntu — 4 containers)
+
+| Service | URL | Description |
+|---------|-----|-------------|
+| **Ollama** | — | Local LLM inference (GPU) |
+| **Gitea** | `gitea.tophermayor.com` | Git server (SSH: 2222) |
+| **Faster Whisper Server** | — | Speech-to-text |
+| **Docker OSX** | — | macOS VM |
+
+### AI Applications (ubuntu — 7 containers)
+
+| Service | Description |
+|---------|-------------|
+| **AI Job Pipeline** (backend + frontend) | AI task orchestration |
+| **AI Alert Aggregator** (backend + frontend + postgres) | Alert intelligence |
+| **AI Media Intelligence** (backend) | Media analysis |
+| **AI Subscriptions** | Subscription management |
+| **Homelab Inventory** (backend) | Infrastructure inventory |
+
+### Infrastructure (ubuntu — 3 containers)
+
+| Service | Description |
+|---------|-------------|
+| **Traefik** | Primary reverse proxy (v3.6.7) |
+| **Qdrant** | Vector database (port 6333) |
+| **Registry** | Docker registry |
+
+### Grizzley Services (10 containers)
+
+| Service | URL | Description |
+|---------|-----|-------------|
+| **Traefik Pi** | `traefik-grizzley.local.tophermayor.com` | Edge ingress + ACME |
+| **Homepage** | — | Dashboard |
+| **Komodo** | `komodo.local.tophermayor.com` | Docker stack management (all hosts) |
+| **Komodo Mongo** | — | Komodo database |
+| **Hermes Agent** | — | Telegram bot, monitoring, cron jobs |
+| **Vaultwarden** | `vaultwarden.tophermayor.com` | Password manager (migrated from ubuntu) |
+| **Uptime Kuma** | — | Uptime monitoring (migrated from ubuntu) |
+| **AIOMAanager** + DB | — | AI orchestration |
+| **Minecraft Bedrock** (x2) | — | UDP/19132, UDP/19134 |
+
+### Ice Services
+
+| Service | Type | Port | Status |
+|---------|------|------|--------|
+| **OpenCode** | systemd | 4096 | Active/enabled |
+| **camofox** | Docker container | — | Running |
+
+### OpenCode Cluster
+
+| Instance | Host | Port | Status |
+|----------|------|------|--------|
+| ubuntu | 192.168.50.61 | 4096 | Active |
+| ice | 192.168.50.197 | 4096 | Active |
+| grizzley | 192.168.50.84 | 4096 | Inactive/disabled |
+
+---
+
+## Database Architecture
+
+### Consolidated PostgreSQL (`postgres-shared` on ubuntu)
+
+| Database | Application |
+|----------|-------------|
+| `authentik` | Authentik SSO |
+| `gitea` | Gitea git server |
+| `vaultwarden` | Vaultwarden password manager |
+| `sonarr_main` / `sonarr_log` | Sonarr |
+| `radarr_main` / `radarr_log` | Radarr |
+| `lidarr_main` / `lidarr_log` | Lidarr |
+| `prowlarr_main` / `prowlarr_log` | Prowlarr |
+| `readarr_main` / `readarr_log` | Readarr |
+
+### Standalone Databases
+
+| Database | Application | Reason |
+|----------|-------------|--------|
+| `immich_postgres` | Immich | Requires pgvecto-rs extension |
+| `komodo-mongo` | Komodo | MongoDB |
+| `aiomanager_db` | AIOMAanager | MongoDB |
+
+### Redis Instances
+
+- `authentik-redis` → Authentik caching/session
+- `immich_redis` → Immich caching
+
+### Vector Database
+
+- **Qdrant** (`ubuntu:6333`) — shared memory backend for OpenCode cluster
+
+---
+
+## Storage Architecture
+
+### ZFS Pools (TrueNAS)
+
+| Pool | Size | Used | Datasets |
+|------|------|------|----------|
+| **TrueNAS** | 25.4TB | 65% | Media, backups, shares |
+| **RPiPool** | 10.9TB | 5% | Reserve storage |
+
+### NFS Exports
+
+| Export | Mount on Consumer | Used By |
+|--------|-------------------|---------|
+| `/mnt/truenas/mediadata` | `/mnt/truenas/mediadata` on ubuntu | Jellyfin, *Arrs, Immich uploads |
+| `/mnt/PersonalMediaLibrary` | `/mnt/PersonalMediaLibrary` on ubuntu | Immich external library |
+| `/mnt/truenas/traefik-certs/grizzley` | NFS on grizzley | Traefik TLS certificates |
+
+### Local Storage (ubuntu)
+
+| Path | Purpose |
+|------|---------|
+| `/home/bear/homelab/ubuntu/*/data/` | Service data volumes |
+| `/home/bear/homelab/ubuntu/ollama/data` | Ollama models |
+| `/home/bear/homelab/ubuntu/tdarr/temp` | Tdarr transcode temp |
+
+---
+
+## Monitoring Pipeline
+
+```
+Node Exporters (all hosts)
+    → Prometheus (ubuntu:9090)
+    → Grafana (ubuntu:3000)
+    → Alertmanager (ubuntu:9093)
+    → Hermes Webhook (grizzley:8644)
+    → Telegram (@tbd1220)
+```
+
+### Log Pipeline
+
+```
+Docker containers (ubuntu)
+    → Promtail (Docker socket SD)
+    → Loki (ubuntu:3100)
+    → Grafana dashboards
+```
+
+### Alerting
+
+- **Prometheus alert rules** → Alertmanager → Hermes webhook → Telegram
+- **Hermes cron jobs**: Health Check (15m), Container Monitor (30m), Maintenance (6h)
+- **Watchdog**: `/home/bear/watchdog/watchdog.sh` monitors SSH/HTTPS/TCP on all hosts
+
+### Uptime Monitoring
+
+- **Uptime Kuma** (grizzley) — external/internal availability checks
+- **Blackbox Exporter** — 15+ HTTPS probe targets
+
+---
+
+## SSH Quick Reference
+
+| Host | Command | User | Key |
+|------|---------|------|-----|
+| ubuntu | `ssh bear@192.168.50.61` | bear | `~/.ssh/id_ed25519` |
+| grizzley | `ssh bear@192.168.50.84` | bear | `~/.ssh/id_ed25519` |
+| ice | `ssh bear@192.168.50.197` | bear | `~/.ssh/id_ed25519` |
+| pve | `ssh bear@192.168.50.11` | bear | `~/.ssh/id_ed25519` |
+| truenas | `ssh truenas` | christopher | `~/.ssh/truenas_pve` via config |
+| panda | `ssh bear@192.168.50.196` | bear | `~/.ssh/id_ed25519` (SSH add-on) |
+
+---
+
+## Related Docs
+
+- [[project.md|Homelab Project Overview]]
+- [[dns-traefik.md|DNS and Traefik Configuration]]
+- [[proxmox-setup.md|Proxmox Setup]]
+- [[truenas-config.md|TrueNAS Configuration]]
+- [[network-config.md|Network Configuration]]
+- [[../automation/scripts.md|Automation Scripts]]
--- a/homelab/comparisons/index.md
+++ b/homelab/comparisons/index.md
@@ -0,0 +1,16 @@
+---
+title: Homelab Comparisons Index
+created: 2026-04-28
+updated: 2026-04-28
+type: index
+tags: [meta]
+---
+
+# Comparisons Index
+
+> Content catalog for homelab comparisons. Every comparison page listed with a one-line summary.
+> Last updated: 2026-04-28 | Total pages: 0
+
+## Infrastructure
+
+(no comparisons yet)
--- a/homelab/concepts/ai-applications.md
+++ b/homelab/concepts/ai-applications.md
@@ -0,0 +1,52 @@
+---
+title: AI Applications Pipeline
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, ai, services]
+sources: [../../homelab/architecture.md]
+---
+
+# AI Applications Pipeline
+
+Local AI/ML stack running on ubuntu with GPU acceleration (GTX 1080 8GB), plus AI-powered applications that use LLM inference.
+
+## Core AI Infrastructure
+
+| Service | URL | Purpose |
+|---------|-----|---------|
+| Ollama | localhost:11434 | Local LLM inference (GPU via GTX 1080) |
+| Qdrant | ubuntu:6333 | Vector database for OpenCode cluster memory |
+| Faster Whisper Server | — | Speech-to-text (Whisper) |
+
+## AI Applications (7 containers)
+
+| Application | Description |
+|-------------|-------------|
+| AI Job Pipeline (backend + frontend) | AI task orchestration |
+| AI Alert Aggregator (backend + frontend + postgres) | Alert intelligence |
+| AI Media Intelligence (backend) | Media analysis |
+| AI Subscriptions | Subscription management |
+| Homelab Inventory (backend) | Infrastructure inventory |
+
+## Immich ML
+
+| Component | Description |
+|-----------|-------------|
+| Immich Server | Photo/video management |
+| Immich ML | Machine learning on GPU |
+| Immich Postgres | Dedicated PostgreSQL (pgvecto-rs extension) |
+| Immich Redis | Caching |
+
+## OpenCode Embeddings
+
+OpenCode instances across the cluster use:
+- **Ollama** — generating embeddings for vector memory
+- **Qdrant** — storing shared vector memory across OpenCode cluster
+
+## Related
+
+- [[opencode-cluster]] — OpenCode cluster using this AI infrastructure
+- [[ubuntu]] — Hosts GPU (GTX 1080) and all AI services
+- [[jellyfin]] — Media server with AI features
+- [[../../homelab/docs/ai-applications.md]] — AI applications documentation
--- a/homelab/concepts/deployment-scripts.md
+++ b/homelab/concepts/deployment-scripts.md
@@ -0,0 +1,60 @@
+---
+title: Deployment Scripts
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, automation, homelab, scripts]
+confidence: high
+---
+
+# Deployment Scripts
+
+Maintenance, deployment, and operational automation scripts for homelab management.
+
+## Homelab Scripts (`scripts/homelab/`)
+
+| Script | Purpose |
+|--------|---------|
+| `deploy-service.py` | Deploy services to remote hosts |
+| `detect-drift.py` | Detect config drift between repo and hosts |
+| `drift_detector.py` | SSH-based container state comparison |
+| `generate-context.py` | Generate context for AI assistants |
+| `collect-host-inventory.py` | Collect host inventory information |
+| `validate_catalog.py` | Validate catalog consistency |
+
+## Authentik Scripts (`scripts/authentik/`)
+
+Scripts for managing Authentik identity provider: OAuth2/OIDC providers, group bindings, branding, and SSO configuration.
+
+## Maintenance Scripts (`scripts/maintenance/`)
+
+| Script | Purpose |
+|--------|---------|
+| `fix-permissions.py` | Fix file and directory permissions |
+| `fix-truenas-permissions.py` | Fix TrueNAS permissions |
+
+## Ansible Playbooks (`ansible/`)
+
+| Playbook | Purpose |
+|----------|---------|
+| `sync-configs.yml` | Pull/push docker-compose configs |
+| `deploy-services.yml` | Restart Docker services |
+| `sync-opencode.yml` | Push OpenCode configurations |
+| `ping.yml` | Test connectivity to all hosts |
+
+## Host Inventory
+
+| Host | IP | Repo Path | Purpose |
+|------|-----|-----------|---------|
+| ubuntu | 192.168.50.61 | homelab/ubuntu | Primary Docker host |
+| grizzley | 192.168.50.84 | homelab/grizzley | Edge ingress |
+| ice | 192.168.50.197 | homelab/ice | Control plane |
+| truenas | 192.168.50.12 | homelab/truenas | Storage host |
+| pve | 192.168.50.11 | homelab/proxmox | Hypervisor |
+
+## Related
+
+- [[hermes-opencode-cluster]] — AI agent cluster using these scripts
+- [[traefik-ha]] — Traefik ingress deployment
+- [[nfs-storage]] — TrueNAS storage management
+- [[sso-authentik]] — Authentik SSO configuration
--- a/homelab/concepts/device-placement-policy.md
+++ b/homelab/concepts/device-placement-policy.md
@@ -0,0 +1,162 @@
+---
+title: Device Placement Policy
+created: 2026-05-10
+updated: 2026-05-10
+type: concept
+tags: [iot, smart-home, concept, vlan, security, policy]
+confidence: high
+sources: [network-device-census, UniFi controller configuration]
+---
+
+# Device Placement Policy
+
+> Defines which device classes belong on which VLAN, firewall rules required for cross-VLAN access, and the rationale for each placement decision.
+
+## VLAN Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                    UniFi Dream Machine                    │
+│                  192.168.50.1 (Controller)                │
+├──────────┬──────────┬───────────┬──────────┬─────────────┤
+│ VLAN 10  │ VLAN 20  │ VLAN 30  │ VLAN 50  │  Default    │
+│ Family   │ Guest    │ IoT      │ Prod     │  Mgmt       │
+│ .10.x    │ .20.x    │ .30.x    │ .50.x    │  .1.x       │
+└──────────┴──────────┴───────────┴──────────┴─────────────┘
+```
+
+## Device Class → VLAN Assignment
+
+### VLAN 10 — "Family of D." (Personal Devices)
+
+**Policy**: Trusted personal devices with full internal access. Phones, tablets, laptops, watches. No IoT devices unless they require direct phone access without firewall rules.
+
+| Device Class | Examples | Rationale |
+|-------------|----------|-----------|
+| Phones | TophPhone14 (×3) | Need access to everything |
+| Tablets | iPad | Personal use |
+| Laptops | MacBook | Personal use |
+| Watches | Apple Watch | Companion to phone |
+| Baby monitors | Eufy cameras (×3) | **Exception**: Require constant phone access; avoid firewall complexity |
+| RPi (personal) | Ice (.10.178 WiFi) | Personal use connection |
+
+### VLAN 30 — "Will of D. IoT" (Smart Home + Infrastructure)
+
+**Policy**: All IoT devices, smart home hardware, and infrastructure hosts that need inter-device communication. This is where [[panda]] and all smart home controllers live.
+
+| Device Class | Examples | Rationale |
+|-------------|----------|-----------|
+| HA controller | [[panda]] (.30.196) | Central hub — needs access to all IoT |
+| Zigbee/Thread hubs | [[home-assistant-connect-zbt-2]], [[aqara-hub-m3]] (.30.59) | Must reach Zigbee devices + HA |
+| Voice assistants | Echo Dots (×4) | Matter controllers, need HA access |
+| Media players | Apple TV (.30.234), LG TV (.30.79) | Controlled by HA + phones |
+| Smart lighting | Shelly (×2), Govee (×5), TP-Link (×4) | WiFi actuators, HA-controlled |
+| Climate | Nest Thermostat (.30.179) | HA + Google ecosystem |
+| Air purifiers | Levoit Vital 200S (.30.21), AMWAY (.30.161) | WiFi appliances |
+| Sensors/Locks | Aqara Zigbee devices (via hubs) | Non-IP, behind Zigbee coordinators |
+| Cameras | Aqara Doorbell (.30.118), Camera Hub G3 (.30.113) | Aqara ecosystem, HA-managed |
+| Robot vacuum | Eufy Omni C20 (.30.50) | WiFi appliance |
+| Voice PE | HA Voice PE (.30.25) | ESPHome voice assistant |
+| Sleep mat | Withings Rest (.30.177) | Health device |
+| Infrastructure | Grizzley (.30.84), Ubuntu (.30.61), Ice (.30.197) | Also have .50.x on Production |
+| NAS | TrueNAS (.30.11) | Also .50.12 on Production |
+
+### VLAN 50 — "Production" (Server Infrastructure)
+
+**Policy**: Server-to-server communication only. Infrastructure hosts carry dual NICs — .50.x for production traffic, .30.x for HA/IoT management.
+
+| Device Class | Examples | Rationale |
+|-------------|----------|-----------|
+| Docker hosts | Ubuntu (.50.61), Grizzley (.50.84) | Production services |
+| NAS | TrueNAS (.50.12) | Storage backend |
+| Control plane | Ice (.50.197) | Gateway + monitoring |
+| Proxmox | PVE (.50.11) | Hypervisor |
+
+### VLAN 20 — "Will of D. (Guest)" (Guest Access)
+
+**Policy**: Internet-only access, no internal device communication.
+
+| Device Class | Examples | Rationale |
+|-------------|----------|-----------|
+| Guest phones | Any | Internet only |
+| Solar monitor | SunPower (.20.190) | Internet-only reporting? ⚠️ Verify |
+
+### Default — No VLAN (Management)
+
+**Policy**: Network infrastructure management. Switches, wired-only devices without VLAN tagging.
+
+| Device Class | Examples | Rationale |
+|-------------|----------|-----------|
+| Managed switch | TP-Link SG108PE (.1.92) | Switch management |
+| Unknown wired | HYTERevolt (.1.143), VectorPro (.1.77) | Unidentified — investigate |
+
+## Cross-VLAN Firewall Rules
+
+Current state and recommended rules:
+
+### Required (Missing)
+
+| Source | Destination | Ports | Purpose | Priority |
+|--------|------------|-------|---------|----------|
+| VLAN 10 | VLAN 30:8123 | TCP 8123 | Phone → HA dashboard | High |
+| VLAN 10 | VLAN 30:443 | TCP 443 | Phone → Traefik ingress to HA | High |
+| VLAN 10 | VLAN 30 (Eufy) | Eufy app ports | Phone → Baby cameras | Medium |
+| VLAN 50 | VLAN 30 | All | Server ↔ IoT management | Medium |
+| VLAN 30 | VLAN 50 | All | IoT → Storage (NFS, S3) | Medium |
+
+### Already Working (Same VLAN)
+
+| Source → Dest | VLAN | Why it works |
+|--------------|------|-------------|
+| Phone → Eufy cameras | 10 → 10 | Same VLAN, no firewall needed |
+| HA → All IoT devices | 30 → 30 | Same VLAN, no firewall needed |
+| Echo → Alexa cloud | 30 → Internet | Outbound allowed by default |
+| Nest → Google cloud | 30 → Internet | Outbound allowed by default |
+
+## Placement Decision Tree
+
+```
+New device arrives
+├── Is it a personal phone/tablet/laptop/watch?
+│   └── YES → VLAN 10
+├── Is it a server or infrastructure host?
+│   ├── YES → Dual: VLAN 50 (production) + VLAN 30 (management)
+│   └── NO ↓
+├── Is it an IoT device managed by HA?
+│   ├── YES → VLAN 30
+│   └── NO ↓
+├── Does it need direct phone access WITHOUT HA?
+│   ├── YES → VLAN 10 (with note: add to HA if possible)
+│   └── NO ↓
+├── Is it a guest device?
+│   ├── YES → VLAN 20
+│   └── NO ↓
+└── Unknown → VLAN 30 (IoT) + investigate
+```
+
+## Exceptions & Rationale
+
+| Device | Expected VLAN | Actual VLAN | Reason |
+|--------|-------------|-------------|--------|
+| Eufy Baby Cameras (×3) | 30 | 10 | Phone accessibility without firewall rules |
+| SunPower Solar Monitor | 30 or 10 | 20 | Possibly internet-only reporting; verify |
+| HYTERevolt | 10 or 50 | Default | Unknown device — needs identification |
+| VectorPro | 50 | Default | Unknown device — needs identification |
+
+## Migration Checklist
+
+If moving Eufy cameras to VLAN 30 for better segmentation:
+
+1. Reserve IPs on VLAN 30 for 3 Eufy cameras
+2. Add UniFi firewall rule: VLAN 10 → VLAN 30, allow Eufy app ports (TCP 8006, 8080, 9000 — verify with Eufy docs)
+3. Add UniFi firewall rule: VLAN 10 → VLAN 30, allow mDNS (UDP 5353) for device discovery
+4. Reconnect cameras to IoT SSID
+5. Test phone app access from VLAN 10
+6. Update [[network-device-census]] with new IPs
+
+## Related Pages
+
+- [[network-device-census]] — Full device classification
+- [[iot-device-inventory]] — IoT devices by room
+- [[matter-multi-fabric]] — Matter ecosystem architecture
+- [[smart-home-handbook]] — Operational handbook
--- a/homelab/concepts/docker-traefik-stack.md
+++ b/homelab/concepts/docker-traefik-stack.md
@@ -0,0 +1,82 @@
+---
+title: Docker Traefik Stack
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, networking, homelab, docker, traefik]
+confidence: high
+---
+
+# Docker Traefik Stack
+
+Container orchestration and ingress configuration across the homelab. Two Traefik instances provide high-availability routing.
+
+## Traefik Instances
+
+| Instance | Host | Role | Version |
+|----------|------|------|---------|
+| ubuntu Traefik | 192.168.50.61 | Primary router | v3.6.7 |
+| grizzley Traefik | 192.168.50.84 | Edge ACME + ingress | v3.6.7 |
+
+See [[traefik-ha]] for the full HA strategy.
+
+## Dynamic Config Files (ubuntu)
+
+Located in `homelab/ubuntu/traefik/config/dynamic/`:
+
+| File | Services Routed |
+|------|----------------|
+| `canonical-hosts.yml` | Grizzley ingress proxy, PVE OpenCode |
+| `gitea.yml` | gitea.tophermayor.com |
+| `homeassistant.yml` | ha.tophermayor.com |
+| `immich.yml` | immich.tophermayor.com |
+| `jellyfin.yml` | jellyfin.tophermayor.com |
+| `jellyseerr.yml` | jellyseerr.tophermayor.com |
+| `media-stack.yml` | Sonarr, Radarr, SABnzbd, Prowlarr, qBittorrent, Lidarr, Readarr (via gluetun) |
+| `middlewares.yml` | 30+ middleware definitions |
+| `opencode.yml` | opencode.tophermayor.com |
+| `proxmox.yml` | proxmox.local.tophermayor.com |
+| `stremio.yml` | stremio.local.tophermayor.com |
+| `traefik-dashboard.yml` | traefik.local.tophermayor.com |
+| `truenas.yml` | truenas.local.tophermayor.com |
+| `vaultwarden.yml` | vaultwarden.tophermayor.com |
+| `wildcard-certs.yml` | TLS certificate file references |
+
+## Common Middlewares
+
+| Middleware | Purpose |
+|------------|---------|
+| `local-only@file` | Restrict to local network IPs |
+| `authentik-auth@file` | SSO authentication |
+| `security-headers@file` | Add security headers |
+| `crowdsec-bouncer@file` | Rate limiting and threat protection |
+
+## Docker Networks
+
+| Network | Scope | Purpose |
+|---------|-------|---------|
+| `proxy-net` | External | Traefik-routed services |
+| `app-net` | External | Internal backend communication |
+| `authentik-internal` | Bridge | SSO isolation |
+| `monitoring-internal` | Bridge | Metrics/logs isolation |
+| `immich-internal` | Bridge | Immich DB/Redis/ML |
+| `traefik-proxy` | Bridge (grizzley) | Grizzley edge Traefik |
+| `media-net` | External | Media stack isolation |
+
+## Container Labels
+
+Standard Traefik labels:
+```yaml
+labels:
+  - "traefik.enable=true"
+  - "traefik.http.services.<service>.loadbalancer.server.port=8096"
+  - "traefik.http.routers.<router>.rule=Host(`service.tophermayor.com`)"
+  - "traefik.http.routers.<router>.tls.certresolver=cloudflare"
+```
+
+## Related
+
+- [[traefik-ha]] — Traefik HA strategy across ubuntu + grizzley
+- [[sso-authentik]] — Authentik SSO middleware
+- [[media-stack]] — Media automation routing
+- [[hermes-opencode-cluster]] — OpenCode routing via Traefik
--- a/homelab/concepts/forge-ai.md
+++ b/homelab/concepts/forge-ai.md
@@ -0,0 +1,144 @@
+---
+title: Forge AI
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, ai, tools, cli]
+sources: [../raw/articles/forge/]
+confidence: high
+---
+
+# Forge AI
+
+Forge AI (ForgeCode) is a CLI-based AI coding harness — a competitor to Claude Code with first-class support for many AI providers. It works with cloud models, open-weight models, and local models.
+
+**Website:** https://forgecode.dev
+
+## Agents
+
+Forge provides three built-in agents:
+
+| Agent | Access | Purpose |
+|-------|--------|---------|
+| **muse** | read + write | Planning and analysis — reviews impact, plans changes |
+| **forge** | read + write | Implementation — makes changes, fixes bugs (default) |
+| **sage** | read | Research — used internally by muse/forge for codebase understanding |
+
+Typical workflow: use `muse` to plan, switch to `forge` to implement.
+
+Switch agents with `:agent`, `:muse`, `:forge`.
+
+## Custom Agents
+
+Create agents as markdown files with YAML frontmatter in `.forge/agents/` (project) or `~/forge/agents/` (global).
+
+```yaml
+---
+id: my-agent
+title: My Agent
+description: Brief description
+tools: [read, search, shell]
+model: claude-sonnet-4
+provider: anthropic
+temperature: 0.1
+---
+System prompt here.
+```
+
+Tools: read, write, patch, shell, search, fetch, remove, undo, or `"*"` for all.
+
+## Custom Commands
+
+Repeatable workflows as slash commands in `.forge/commands/`:
+
+```markdown
+---
+name: check
+description: Runs lint and tests before commit
+---
+Run `lint` and `test`, fix any issues found.
+<lint>cargo clippy --fix</lint>
+<test>cargo test</test>
+```
+
+Invoke with `:check` in the Forge chat.
+
+## MCP Integration
+
+Connect external tools via `.mcp.json`:
+
+```json
+{
+  "mcpServers": {
+    "browser": {
+      "command": "npx",
+      "args": ["@playwright/mcp@latest"]
+    }
+  }
+}
+```
+
+Manage with `forge mcp import`, `forge mcp list`, `forge mcp remove`, `forge mcp reload`.
+
+## Environment Variables
+
+| Variable | Default | Purpose |
+|----------|---------|---------|
+| `FORGE_TERM` | on | Terminal context capture — passes command history to the model |
+| `FORGE_TERM_MAX_COMMANDS` | 5 | History buffer size |
+| `FORGE_CONFIG` | `~/forge/` | Config directory (for dotfiles repos) |
+| `FORGE_BIN` | `forge` | Binary path (for local builds or version switching) |
+
+## $FORGE_TERM
+
+On by default. The Zsh plugin tracks what commands you run, whether they succeeded, and passes that to ForgeCode on every `:` invocation. Means `forge fix it` already knows what failed — no need to narrate.
+
+Disable per-session: `export FORGE_TERM=false`
+
+## Forge Services
+
+Optional backend for enhanced capabilities: context engine (semantic search), tool-call guardrails, and skill engine. Enable with `:login` → select ForgeServices.
+
+Index project with `:sync`, check status with `:sync-status`.
+
+## Setup
+
+```bash
+# 1. Install
+curl -fsSL https://forgecode.dev/cli | sh
+
+# 2. Zsh plugin
+forge zsh setup
+
+# 3. Login to provider
+:login
+
+# 4. Pick model
+:model
+
+# 5. First prompt
+: Hi!
+```
+
+Requires: Nerd Font, Zsh.
+
+## Skills
+
+ForgeCode skills are markdown files (`.forge/skills/`) that provide reusable workflows. Similar to custom commands but more powerful — skills can use templating and conditional logic.
+
+## Configuration Files
+
+| File | Purpose |
+|------|---------|
+| `.forge.toml` | Main config ( ForgeConfig dir) |
+| `.mcp.json` | MCP server definitions |
+| `.forge/agents/` | Custom agent definitions |
+| `.forge/commands/` | Custom slash commands |
+| `.forge/skills/` | Reusable skill workflows |
+| `AGENTS.md` | Project-wide rules for all agents |
+
+## Related
+
+- [[opencode-cluster]] — OpenCode cluster setup in this homelab
+- [[ai-applications]] — AI application stack on ubuntu
+- [[hermes-gateway]] — Hermes gateway used for model routing
--- a/homelab/concepts/gitops.md
+++ b/homelab/concepts/gitops.md
@@ -0,0 +1,62 @@
+---
+title: GitOps
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, git, automation]
+sources: [../automation/scripts.md, ../../homelab/architecture.md]
+---
+
+# GitOps
+
+The homelab uses a GitOps pattern where the git repository IS the infrastructure.
+
+## Core Principle
+
+All configuration lives in `/home/bear/homelabagentroot/`. Each host pulls its configs from the repo. Agents (Hermes, OpenCode) commit changes and push to Gitea. Other hosts pull on next session.
+
+## Repository Structure
+
+```
+homelabagentroot/
+├── homelab/           # Infrastructure configs per host
+│   ├── ubuntu/         # Docker Compose, configs
+│   ├── grizzley/      # RPi5 edge configs
+│   ├── ice/           # Control plane configs
+│   └── proxmox/       # VM/LXC configs
+├── scripts/           # Shared automation
+├── ansible/           # Playbooks for deployment
+├── obsidian-vault/    # Wiki (IS the vault)
+└── .opencode/         # OpenCode agent config
+```
+
+## Git Triggers
+
+| Action | What Happens |
+|--------|-------------|
+| Agent commits & pushes | Configs pushed to Gitea |
+| Other host pulls | Gets latest configs |
+| Drift detected | `detect-drift.py` or `drift_detector.py` flags differences |
+| Manual deploy | `ansible-playbook deploy-services.yml --limit <host>` |
+
+## Agents Using GitOps
+
+| Agent | Host | Role |
+|-------|------|------|
+| Hermes | ice, grizzley | Commit infra changes, push to Gitea |
+| OpenCode | ubuntu, ice | Read/write configs, run Ansible |
+| Gitea | ubuntu | GitOps hub — all repos live here |
+
+## Key Files
+
+- `scripts/homelab/deploy-service.py` — Deploy services to remote hosts
+- `scripts/homelab/detect-drift.py` — Detect config drift between repo and hosts
+- `ansible/playbooks/deploy-services.yml` — Restart Docker services
+- `ansible/playbooks/sync-configs.yml` — Pull/push docker-compose configs
+
+## Related
+
+- [[gitea]] — Git host and GitOps runner hub
+- [[ubuntu]] — Primary Docker host where most configs deploy
+- [[ice]] — Control plane, primary Hermes Agent host
+- [[deployment-scripts]] — Full automation scripts inventory
--- a/homelab/concepts/hermes-opencode-cluster.md
+++ b/homelab/concepts/hermes-opencode-cluster.md
@@ -0,0 +1,52 @@
+---
+title: Hermes OpenCode Cluster
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, ai, homelab, agents]
+confidence: high
+---
+
+# Hermes OpenCode Cluster
+
+AI agent cluster setup — OpenCode instances deployed as systemd services across the homelab, with Hermes gateway providing model routing.
+
+## Instance Overview
+
+| Instance | Host | IP | Port | Traefik Route | Status |
+|----------|------|-----|------|---------------|--------|
+| ubuntu | Ubuntu VM | 192.168.50.61 | 4096 | opencode.tophermayor.com | Active (systemd) |
+| ice | Raspberry Pi 4 | 192.168.50.197 | 4096 | opencode-ice.tophermayor.com | Active (systemd) |
+| grizzley | Raspberry Pi 5 | 192.168.50.84 | 4096 | — | Inactive/disabled |
+
+## Host Context Detection
+
+Each host clone has a `.host-context` file that identifies the local context. See [[host-context-detection]] for the full detection table.
+
+## Skills
+
+Skills are located in `.agents/skills/` and `.opencode/`:
+
+- `proxmox-management` — VM/LXC operations
+- `traefik-diagnostic` — Router/service health
+- `truenas-storage` — ZFS pool/share management
+- `authentik-sso` — SSO/OIDC configuration
+- `media-stack` — Radarr, Sonarr, Jellyfin management
+- `komodo-management` — Docker stack deployment
+- `host-power-management` — Wake-on-LAN, VM control
+- `infra-audit` — Live infrastructure verification
+
+## Hermes Gateway
+
+Hermes runs on grizzley as the central gateway, providing:
+- Telegram notifications (topic 1033 "Cron Jobs")
+- Model routing across providers
+- DeepSeek V4 integration (primary), Anthropic (fallback)
+- Watchdog monitoring for gateway health
+
+## Related
+
+- [[host-context-detection]] — Per-host agent detection
+- [[forge-ai|Forge AI]] — ForgeCode CLI coding harness
+- [[hermes-gateway|Hermes gateway]] — model routing and notifications
+- [[opencode-cluster|OpenCode cluster]] — detailed OpenCode systemd deployment
--- a/homelab/concepts/homelab-network-architecture.md
+++ b/homelab/concepts/homelab-network-architecture.md
@@ -0,0 +1,363 @@
+---
+title: Homelab Network Architecture
+created: 2026-04-29
+updated: 2026-04-29
+type: concept
+tags: [concept, networking, homelab, traefik, ha]
+sources: []
+---
+
+# Homelab Network Architecture
+
+Complete traffic flow and routing topology for the homelab cluster. Covers Traefik dual-instance HA, VRRP failover, certificate distribution, Docker network segmentation, and all routing rules.
+
+## Traffic Flow Overview
+
+```
+Internet (Cloudflare DNS)
+        │
+        ▼  *.tophermayor.com A → home public IP
+══════════════════════════════════════════════════════════════════════
+  VRRP VIP 192.168.50.80/27 (eth0.50) — keepalived
+  ┌─────────────────────────────────────────────────────────────┐
+  │  PRIMARY: ubuntu traefik (when up)                         │
+  │  BACKUP:  grizzley traefik-pi (when ubuntu fails)         │
+  └─────────────────────────────────────────────────────────────┘
+        │
+        ▼ port 80/443
+┌──────────────────────────────────────────────────────────────────┐
+│                    grizzley traefik-pi                          │
+│  Edge ingress controller (ACME master, Cloudflare DNS challenge) │
+│  IP: 192.168.50.84 | Ports: 80,443,2222,8080,19132udp,19134udp  │
+│  Network: traefik-proxy                                         │
+│  Certs: /mnt/truenas/traefik-certs/grizzley (NFS)             │
+└──────────────────────────────────────────────────────────────────┘
+        │
+        ├──[grizzley-local services]──────────────────────────► served directly
+        │    vaultwarden, uptime-kuma, komodo, homepage,
+        │    aiostreams, aiomanager, aiometadata,
+        │    opencode-ice, homeassistant, proxmox, truenas
+        │
+        └──[everything else]────────────────────────────────────► forwarded to ubuntu
+             (upstream-ingress.yml load-balances to ubuntu:443)
+```
+
+## DNS Zones
+
+| Zone | Example | Resolution |
+|------|---------|------------|
+| Public (`*.tophermayor.com`) | `gitea.tophermayor.com`, `jellyfin.tophermayor.com` | Cloudflare → home public IP |
+| Local (`*.local.tophermayor.com`) | `sonarr.local.tophermayor.com`, `proxmox.local.tophermayor.com` | UniFi Controller DHCP/DNS |
+
+Cloudflare proxies all `*.tophermayor.com` — origin IP is hidden, DDoS protection active.
+
+## Network Segmentation
+
+### Physical / VLAN
+
+| Network | Subnet | Gateway | Hosts |
+|---------|--------|---------|-------|
+| Production (VLAN 50) | 192.168.50.0/24 | 192.168.50.1 | ice, grizzley, ubuntu, proxmox, truenas |
+| Default (VLAN 1) | 192.168.1.0/24 | 192.168.1.1 | Management workstations |
+| Trusted (VLAN 3) | 192.168.3.0/24 | — | Trusted devices |
+| WireGuard VPN | 192.168.4.0/24 | — | VPN clients |
+| Docker bridge | 172.16.0.0/12 | — | Container internal networking |
+
+### Docker Networks (ubuntu)
+
+| Network | Driver | Subnet | Connected Services |
+|---------|--------|--------|-------------------|
+| `proxy-net` | bridge | 172.18.0.0/16 | traefik (primary ingress), homepage-ubuntu |
+| `app-net` | bridge | 172.20.0.0/16 | general application containers |
+| `uefi-proxynet` | bridge | 172.26.0.0/16 | — |
+| `authentik_authentik-internal` | bridge | — | authentik server/worker/redis |
+| `monitoring_monitoring-internal` | bridge | — | prometheus, grafana, loki, alertmanager |
+| `immich_immich-internal` | bridge | — | immich stack |
+| `reccollection-internal` | bridge | — | reccollection stack |
+| `ai-subscriptions_default` | bridge | — | ai-subscriptions |
+| `infisical_infisical` | bridge | — | infisical stack |
+
+### Docker Networks (grizzley)
+
+| Network | Driver | Connected Services |
+|---------|--------|-------------------|
+| `traefik-proxy` | bridge | traefik-pi, homepage-grizzley, komodo, aiostreams, aiomanager, aiometadata, vaultwarden, uptime-kuma |
+| `aiomanager_default` | bridge | aiomanager stack |
+| `aiometadata_aiometadata-internal` | bridge | aiometadata stack |
+| `komodo_komodo-internal` | bridge | komodo stack |
+| `homepage_default` | bridge | homepage-grizzley |
+| `desktop-test_default` | bridge | test containers |
+
+## High Availability (VRRP / Keepalived)
+
+Two Traefik instances provide failover via keepalived VRRP on VLAN 50.
+
+| Parameter | Value |
+|-----------|-------|
+| Interface | `eth0.50` (VLAN 50) |
+| Virtual Router ID | 51 |
+| ubuntu priority | **PRIMARY** (higher) |
+| grizzley priority | **BACKUP** (90) |
+| Virtual IP | `192.168.50.80/27` |
+| Auth | PASS (`HomelabH`) |
+| Health check | `/etc/keepalived/check_traefik.sh` — 2s interval, fall 2, rise 2 |
+
+When ubuntu Traefik fails health checks, keepalived promotes grizzley to MASTER and the VIP moves to grizzley's interface. Traffic for `*.tophermayor.com` and `*.local.tophermayor.com` then routes to grizzley's traefik-pi (192.168.50.84).
+
+## Certificate Architecture
+
+```
+Cloudflare DNS Challenge (grizzley traefik-pi)
+        │
+        ▼
+ACME writes certs to /etc/traefik/certs/acme.json
+        │
+        ▼ (real-time via NFS)
+/mnt/truenas/traefik-certs/grizzley (NFS share from TrueNAS)
+        │
+        ▼ (read by ubuntu traefik at startup/reread)
+ubuntu traefik serves same wildcard certs (*.tophermayor.com)
+```
+
+Both instances serve the **same** Cloudflare-issued wildcard certificate (`*.tophermayor.com`) for all public-facing services. The ACME challenge only runs on grizzley — ubuntu syncs certs via NFS.
+
+## Traefik Instance Comparison
+
+| Aspect | ubuntu (PRIMARY) | grizzley (BACKUP / ACME) |
+|--------|-----------------|--------------------------|
+| Container | `traefik` | `traefik-pi` |
+| Image | `traefik:v3.6.7` | `traefik:v3.6.7` |
+| IP | 192.168.50.61 | 192.168.50.84 |
+| Port 80/443 | Direct | Direct |
+| HTTP→HTTPS | ✓ | ✓ |
+| Cloudflare ACME | ✗ (reads via NFS) | ✓ (origin) |
+| Static configs | `middlewares.yml` | `middlewares.yml` |
+| Dynamic configs | 29 files | 4 files |
+| Networks | `proxy-net`, `app-net`, `uefi-proxynet` | `traefik-proxy` |
+| Metrics port | — | 8080 |
+| SSH proxy port | — | 2222 |
+| UDP Minecraft | — | 19132, 19134 |
+| upstream-ingress | (receives traffic) | forwards to ubuntu |
+
+## Traefik Dynamic Configs
+
+### grizzley (Edge / ACME)
+
+| File | Contents |
+|------|---------|
+| `pi-routers.yml` | Wildcard cert triggers (`traefik-wildcard.local.tophermayor.com`, `traefik-wildcard.tophermayor.com`) |
+| `grizzley-services.yml` | 11 local routers: vaultwarden, uptime-kuma, komodo, homepage, opencode-ice, aiostreams, aiomanager, aiometadata, homeassistant, proxmox, truenas |
+| `upstream-ingress.yml` | Forwards all unmatched traffic to ubuntu Traefik (HTTPS 192.168.50.61) |
+| `metrics.yml` | Internal metrics endpoints |
+| `middlewares.yml` | IP allowlists (`local-only`, `homepage-localonly`), security headers |
+
+### ubuntu (Primary Router)
+
+| File | Contents |
+|------|---------|
+| `gitea.yml` | gitea.tophermayor.com → gitea:3000 |
+| `immich.yml` | immich.tophermayor.com → immich_server:2283 |
+| `jellyfin.yml` | jellyfin.tophermayor.com → jellyfin:8096 (rate limit + jellyfin headers) |
+| `media-stack.yml` | sonarr, radarr, lidarr, prowlarr, qbittorrent, sabnzbd, readarr, sonarr-anime, radarr-anime, lazylibrarian, nzbdav → via gluetun VPN tunnel |
+| `opencode.yml` | opencode.tophermayor.com → host.docker.internal:4096 |
+| `proxmox.yml` | proxmox.local.tophermayor.com → https://192.168.50.11:8006 |
+| `homepage-widgets.yml` | Internal routes (sonarr-internal, radarr-internal, etc.) → gluetun VPN tunnel |
+| `upstream-ingress.yml` | Homepage routes to homepage-ubuntu:3003 and homepage-grizzley:3000 |
+| `whisper.yml` | whisper.local.tophermayor.com → faster-whisper-server:8394 |
+| `truenas.yml` | truenas.local.tophermayor.com → TrueNAS web UI |
+| `navidrome.yml` | navidrome.tophermayor.com |
+| `audiobookshelf.yml` | audiobooks.tophermayor.com |
+| `calibre-web.yml` | calibre-web.local.tophermayor.com |
+| `kavita.yml` | kavita.tophermayor.com |
+| `rustfs.yml` | rustfs S3 routes |
+| `stremio.yml` | stremio routes |
+| `jellyseerr.yml` | jellyseerr.tophermayor.com |
+| `comparaison.yml` | comparison service |
+| `inventory.yml` | inventory service |
+| `cabo-voting.yml` | Cabo voting app |
+| `gsd-mcp.yml` | GSD MCP server |
+| `ai-subscriptions.yml` | AI subscriptions service |
+| `hermes-dashboard.yml` | Hermes dashboard routes |
+| `homeassistant.yml` | Home Assistant route |
+| `umm.yml` | Unified media manager |
+| `middlewares.yml` | Full middleware stack (see below) |
+
+## All Traefik Routes
+
+### grizzley traefik-pi (Local Services)
+
+| Domain | Service | Backend | Middleware | Cert |
+|--------|---------|---------|------------|------|
+| `vaultwarden.tophermayor.com` | vaultwarden | vaultwarden:80 | — | cloudflare |
+| `status.tophermayor.com` | uptime-kuma | uptime-kuma:3001 | — | cloudflare |
+| `komodo.local.tophermayor.com` | komodo | komodo:9120 | — | cloudflare |
+| `homepage.local.tophermayor.com` | homepage | homepage-grizzley:3000 | homepage-localonly | cloudflare |
+| `opencode-ice.local.tophermayor.com` | opencode-ice | 192.168.50.197:4096 | local-only | cloudflare |
+| `aiostreams.tophermayor.com` | aiostreams | aiostreams:3002 | — | cloudflare |
+| `aiomanager.tophermayor.com` | aiomanager | aiomanager:1610 | — | cloudflare |
+| `aiometadata.tophermayor.com` | aiometadata | aiometadata:1337 | — | cloudflare |
+| `ha.tophermayor.com` | homeassistant | 192.168.30.196:8123 | — | cloudflare |
+| `proxmox.local.tophermayor.com` | proxmox | 192.168.50.11:8006 | local-only | cloudflare |
+| `truenas.local.tophermayor.com` | truenas | 192.168.50.12:8080 | local-only | cloudflare |
+| `traefik-grizzley.local.tophermayor.com` | dashboard | api@internal | local-only | cloudflare |
+| `metrics-grizzley.local.tophermayor.com` | metrics | api@internal | local-only | cloudflare |
+
+### grizzley traefik-pi (Upstream → ubuntu)
+
+Traffic NOT matched above is forwarded via `upstream-ingress.yml`:
+
+| Rule | Target |
+|------|--------|
+| `HostRegexp(^[a-z0-9-]+\.local\.tophermayor\.com$) && !homepage && !traefik-grizzley && !metrics-grizzley && !traefik-wildcard && !opencode-ice` | → ubuntu:443 |
+| `HostRegexp(^[a-z0-9-]+\.tophermayor\.com$) && !traefik-wildcard` | → ubuntu:443 |
+
+### ubuntu traefik (Public Routes — *.tophermayor.com)
+
+| Domain | Backend | Middleware |
+|--------|---------|------------|
+| `gitea.tophermayor.com` | gitea:3000 | homelab-public |
+| `immich.tophermayor.com` | immich_server:2283 | homelab-public |
+| `jellyfin.tophermayor.com` | jellyfin:8096 | ratelimit, jellyfin-headers |
+| `audiobooks.tophermayor.com` | audiobookshelf | homelab-public |
+| `navidrome.tophermayor.com` | navidrome | homelab-public |
+| `kavita.tophermayor.com` | kavita:5000 | homelab-public |
+| `opencode.tophermayor.com` | host.docker.internal:4096 | local-only, opencode-streaming, opencode-cors |
+| `ha.tophermayor.com` | 192.168.30.196:8123 | (see homeassistant.yml) |
+| `jellyseerr.tophermayor.com` | jellyseerr | homelab-public |
+
+### ubuntu traefik (Local Routes — *.local.tophermayor.com)
+
+| Domain | Backend | Middleware | Notes |
+|--------|---------|------------|-------|
+| `sonarr.local.tophermayor.com` | gluetun:8989 | local-only | Via VPN tunnel |
+| `radarr.local.tophermayor.com` | gluetun:7878 | local-only | Via VPN tunnel |
+| `lidarr.local.tophermayor.com` | gluetun:8686 | local-only | Via VPN tunnel |
+| `sabnzbd.local.tophermayor.com` | gluetun:8080 | local-only | Via VPN tunnel |
+| `qbittorrent.local.tophermayor.com` | qbittorrent | local-only | |
+| `prowlarr.local.tophermayor.com` | prowlarr | local-only | |
+| `readarr.local.tophermayor.com` | readarr | local-only | |
+| `sonarr-anime.local.tophermayor.com` | sonarr-anime | local-only | Via VPN tunnel |
+| `radarr-anime.local.tophermayor.com` | radarr-anime | local-only | Via VPN tunnel |
+| `flaresolverr.local.tophermayor.com` | flaresolverr | local-only | |
+| `bazarr.local.tophermayor.com` | bazarr:6767 | local-only | |
+| `lazylibrarian.local.tophermayor.com` | lazylibrarian | local-only | |
+| `nzbdav.local.tophermayor.com` | nzbdav | local-only | |
+| `calibre-web.local.tophermayor.com` | calibre-web:8083 | local-only | |
+| `stremio.local.tophermayor.com` | stremio-server | local-only | |
+| `proxmox.local.tophermayor.com` | 192.168.50.11:8006 | proxmox-headers, local-only | |
+| `truenas.local.tophermayor.com` | 192.168.50.12:8080 | local-only | |
+| `opencode-ice.local.tophermayor.com` | 192.168.50.197:4096 | local-only | |
+| `whisper.local.tophermayor.com` | faster-whisper-server:8394 | local-only | |
+| `traefik.local.tophermayor.com` | api@internal | local-only | Dashboard |
+
+### Internal Widget Routes (sonarr-internal, etc.)
+
+These are `*-internal.local.tophermayor.com` routes for Homepage widgets, accessible only inside the network via the gluetun VPN tunnel. From `homepage-widgets.yml`:
+
+| Internal Domain | Backend (via gluetun) |
+|-----------------|----------------------|
+| `sonarr-internal.local.tophermayor.com` | gluetun:8989 |
+| `radarr-internal.local.tophermayor.com` | gluetun:7878 |
+| `lidarr-internal.local.tophermayor.com` | gluetun:8686 |
+| `sabnzbd-internal.local.tophermayor.com` | gluetun:8080 |
+| `seerr-internal.local.tophermayor.com` | seerr:5055 |
+| `jellyfin-internal.local.tophermayor.com` | jellyfin:8096 |
+| `prometheus-internal.local.tophermayor.com` | prometheus:9090 |
+
+### Special Protocols
+
+| Protocol | Port | Host | Purpose |
+|----------|------|------|---------|
+| HTTP→HTTPS | 80 | grizzley | Redirects to 443 |
+| HTTPS | 443 | grizzley | All TLS traffic |
+| QUIC/HTTP3 | 443/udp | grizzley | HTTP3 support |
+| Traefik metrics | 8080 | grizzley | Prometheus scraping |
+| Gitea SSH proxy | 2222 | grizzley | → ubuntu:2222 |
+| Minecraft Bedrock | 19132/udp | grizzley | Bedrock server (standby) |
+| Minecraft Bedrock | 19134/udp | grizzley | Bedrock server (sison) |
+
+## Middleware Chains (ubuntu)
+
+### homelab-public
+Applied to: gitea, immich, audiobookshelf, navidrome, kavita, jellyseerr, etc.
+```
+chain: [compress, security-headers, buffering, ratelimit]
+```
+
+### Security Headers
+Applied to most services:
+```yaml
+browserXssFilter: true
+contentTypeNosniff: true
+forceSTSHeader: true
+stsIncludeSubdomains: true
+stsPreload: true
+stsSeconds: 31536000        # 1 year
+customFrameOptionsValue: SAMEORIGIN
+```
+
+### Jellyfin-specific Headers
+Adds CSP allowing jsDelivr CDN for the Ultrachromic theme:
+```yaml
+contentSecurityPolicy: "style-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net https://fonts.googleapis.com; ..."
+```
+
+### Authentik ForwardAuth (SSO)
+Applied to: sonarr, radarr, lidarr, prowlarr, bazarr, sabnzbd, transmission, qbittorrent, flaresolverr, jellyseerr, listsync, dockge, it-tools, bentopdf, code-ai, and more.
+
+Each service has its own middleware with `X-authentik-host` query param:
+```
+http://authentik-server:9000/outpost.goauthentik.io/auth/traefik?X-authentik-host=<domain>
+```
+
+### local-only IP Allowlist
+```yaml
+sourceRange:
+  - 127.0.0.1/32
+  - 192.168.50.0/24   # Production
+  - 192.168.1.0/24    # Management
+  - 192.168.3.0/24    # Trusted
+  - 192.168.4.0/24    # WireGuard VPN
+  - 172.16.0.0/12     # Docker
+  - 10.0.0.0/8        # VPN/Docker
+```
+
+### Rate Limiting
+```yaml
+average: 100
+burst: 50
+```
+
+## VPN Tunnel (gluetun)
+
+Media automation services route through **gluetun** VPN container for privacy when connecting to torrent/indexer services:
+- sonarr → gluetun:8989
+- radarr → gluetun:7878
+- lidarr → gluetun:8686
+- sabnzbd → gluetun:8080
+
+gluetun ports: 8000, 8388, 8888 (TCP), 8388 (UDP) — exposed on ubuntu's Docker network.
+
+## SSH Routing
+
+Gitea SSH is proxied through grizzley:
+```
+Internet → grizzley:2222 (SNI * → any)
+    → forwards to ubuntu:2222
+    → gitea container handles git SSH protocol
+```
+
+## UniFi Controller
+
+Network services (DHCP, DNS, VLAN tagging) managed by UniFi Controller at 192.168.1.1 (or similar). All internal DNS for `*.local.tophermayor.com` resolves through the UniFi DNS forwarder.
+
+## Related
+
+- [[traefik]] — Traefik entity page
+- [[grizzley]] — RPi5 edge node (ACME master, backup ingress)
+- [[ubuntu]] — Primary Docker host (primary ingress router)
+- [[truenas]] — NFS storage for cert sync
+- [[traefik-ha]] — HA concept page
+- [[homepage]] — Dashboard services with widget routes
+- [[authentik]] — SSO identity provider
+- [[sso-authentik]] — SSO configuration details
--- a/homelab/concepts/host-context-detection.md
+++ b/homelab/concepts/host-context-detection.md
@@ -0,0 +1,53 @@
+---
+title: Host Context Detection
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, homelab, agents]
+confidence: high
+---
+
+# Host Context Detection
+
+Detects which host's filesystem a repository clone represents, enabling AI agents to understand their operational context without asking.
+
+## Quick Reference
+
+| Host | IP | Context | Agent | Port |
+|------|-----|---------|-------|------|
+| **ubuntu** | 192.168.50.61 | ubuntu | OpenCode | 4096 |
+| **grizzley** | 192.168.50.84 | grizzley | Hermes | 8644 |
+| **ice** | 192.168.50.197 | ice | OpenCode | 4096 |
+
+## Detection Methods
+
+```bash
+# Via Python
+python3 scripts/detect_host_context.py
+
+# Via Shell
+source scripts/load-host-context.sh
+```
+
+## Context Files
+
+| File | Purpose |
+|------|---------|
+| `.host-context` | Context marker per host (gitignored) |
+| `scripts/detect_host_context.py` | Python detector |
+| `scripts/load-host-context.sh` | Shell loader |
+
+## Agent Integration
+
+| Agent | Harness | Context Detection |
+|-------|---------|-------------------|
+| OpenCode | systemd | `.opencode/opencode.json` init |
+| Hermes | systemd | Runs on grizzley (implicit) |
+| Claude Code | CLI | direnv / shell env |
+| Cline | VS Code | Terminal env |
+
+## Related
+
+- [[opencode-cluster|OpenCode cluster]] — OpenCode instances across the cluster
+- [[hermes-gateway|Hermes gateway]] — runs on grizzley
+- [[forge-ai|Forge AI]] — ForgeCode CLI coding harness
--- a/homelab/concepts/index.md
+++ b/homelab/concepts/index.md
@@ -0,0 +1,55 @@
+---
+title: Homelab Concepts Index
+created: 2026-04-28
+updated: 2026-05-24
+type: index
+tags: [meta]
+---
+
+# Concepts Index
+
+> Content catalog for homelab concepts. Every concept page listed with a one-line summary.
+> Last updated: 2026-05-24 | Total pages: 19
+
+## Architecture & Infrastructure
+
+| Concept | Summary |
+|---------|---------|
+| [[docker-traefik-stack]] | Docker + Traefik orchestration — two Traefik instances, 15+ dynamic routes, 7 networks |
+| [[forge-ai]] | Forge AI (ForgeCode) — CLI coding harness, agents, custom commands, MCP integration |
+| [[gitops]] | GitOps workflow — repo IS the infrastructure, all hosts pull from Gitea |
+| [[traefik-ha]] | Traefik HA across ubuntu + grizzley — edge ACME, primary router, cert sync |
+| [[nfs-storage]] | TrueNAS NFS mount strategy — media on NFS, configs on local disk |
+| [[subscriptions]] | Full catalog of paid subscriptions + self-hosted services with cost breakdown |
+
+## Smart Home / IoT
+
+> Start at [[smart-home]] — the Map of Content for everything IoT.
+
+| Concept | Summary |
+|---------|---------|
+| [[smart-home]] | MOC — hub page with floor map, ecosystem controllers, quick navigation to all IoT pages |
+| [[matter-multi-fabric]] | Matter multi-admin architecture — fabric topology, hub-to-device mapping, commissioning |
+| [[iot-device-inventory]] | 38 IoT devices by room — Zigbee parents, Matter fabrics, ecosystem exposure |
+| [[network-device-census]] | Canonical classification of all 46 UniFi clients + 10 Zigbee devices |
+| [[smart-home-handbook]] | Operational handbook — architecture, quick reference, troubleshooting, improvement plan |
+| [[device-placement-policy]] | VLAN placement rules for every device class — decision tree, firewall rules, exceptions |
+
+## Operations
+
+| Concept | Summary |
+|---------|---------|
+| [[deployment-scripts]] | Homelab scripts, Ansible playbooks, maintenance automation |
+| [[hermes-opencode-cluster]] | OpenCode systemd cluster across ice/ubuntu/grizzley + Hermes gateway |
+| [[host-context-detection]] | Per-host context detection for AI agents (ice, ubuntu, grizzley) |
+| [[monitoring-pipeline]] | Prometheus → Alertmanager → Hermes webhook → Telegram alerting chain |
+| [[sso-authentik]] | Authentik SSO identity provider — OAuth2/OIDC, group bindings, Traefik middleware |
+
+## Automation & AI
+
+| Concept | Summary |
+|---------|---------|
+| [[ai-applications]] | AI application pipeline — Ollama GPU inference, embedding generation, Qdrant vector DB |
+| [[media-stack]] | Media automation stack — Sonarr, Radarr, Jellyfin, Tdarr, Gluetun VPN |
+| [[vm-storage-policy]] | Storage rules for Ubuntu VM — NFS for media/data, local for configs |
+| [[opencode-cluster]] | OpenCode AI coding assistant deployed as systemd services across hosts |
--- a/homelab/concepts/iot-device-inventory.md
+++ b/homelab/concepts/iot-device-inventory.md
@@ -0,0 +1,159 @@
+---
+title: IoT Device Inventory
+created: 2026-05-10
+updated: 2026-05-10
+type: concept
+tags: [iot, smart-home, zigbee-device, wifi-device, sensor, actuator, home-assistant]
+confidence: high
+sources: [UniFi Network clients, HA integrations, network-device-census]
+---
+
+# IoT Device Inventory
+
+> All IoT devices (iot-smart-home, iot-appliance, iot-camera) grouped by room/area. Includes Matter fabric membership, Zigbee parent, and ecosystem exposure. For full classification of all 46 network clients, see [[network-device-census]].
+
+## By Room / Area
+
+### baby\_room (3rd Floor)
+
+- **Aqara Light Switch H2 US** — Zigbee → ZHA | Actuator | Fabric: via [[aqara-hub-m3]] Matter bridge†
+- **Aqara Colorful Ceiling Light 36W** — Zigbee → ZHA | Actuator | Fabric: via [[aqara-hub-m3]] Matter bridge†
+- **eufy Baby Camera** — WiFi | `192.168.10.110` | VLAN 10 | Camera | No HA integration
+- **eufy Baby Camera** — WiFi | `192.168.10.113` | VLAN 10 | Camera | No HA integration
+- **eufy Baby Monitor** — WiFi | `192.168.10.120` | VLAN 10 | Camera | No HA integration
+- **Rest 2nd Gen** — WiFi | `192.168.30.177` | VLAN 30 | Sleep sound device | No HA integration
+
+### bedroom (3rd Floor)
+
+- **Aqara Hub M3** — Wired | `192.168.30.59` | VLAN 30 | Hub | HA: matter, zha | Fabrics: HA ✓, Apple†, Google†, Alexa† | Zigbee coordinator + Matter bridge
+- **Shelly 1PM Gen4** — WiFi | `192.168.30.75` | VLAN 30 | Actuator | HA: shelly | Ecosystem: HA | Ceiling light relay
+- **Govee Floor Lamp Left** — WiFi | `192.168.30.91` | VLAN 30 | Actuator | HA: govee\_light\_local | Ecosystem: HA
+- **Govee Floor Lamp R** — WiFi | `192.168.30.217` | VLAN 30 | Actuator | HA: govee\_light\_local | Ecosystem: HA
+- **Govee LED Strip** — WiFi | IP TBD | VLAN 30 | Actuator | HA: govee\_light\_local | Ecosystem: HA
+- **Echo Dot (Bedroom)** — WiFi | `192.168.30.170` | VLAN 30 | Voice | HA: alexa\_devices | Ecosystem: HA, Alexa | Matter controller
+
+### dining\_room (2nd Floor)
+
+- No devices currently assigned
+
+### entrance (1st Floor)
+
+- **Aqara Light Switch H2 US** — Zigbee → ZHA | Actuator | Fabric: via [[aqara-hub-m3]] Matter bridge†
+- **Aqara Light Switch H2 US** (Front Door) — Zigbee → ZHA | Actuator | Fabric: via [[aqara-hub-m3]] Matter bridge†
+- **Aqara Smart Lock U100** — Zigbee/BLE → ZHA | Actuator | Fabric: via [[aqara-hub-m3]] Matter bridge†
+- **Aqara Video Doorbell G410** — WiFi | `192.168.30.118` | VLAN 30 | Camera | Ecosystem: HA
+
+### garage (1st Floor)
+
+- **Aqara Camera Hub G3** — WiFi | `192.168.30.113` | VLAN 30 | Camera | Ecosystem: HA
+- **Echo Dot (Garage)** — WiFi | `192.168.30.68` | VLAN 30 | Voice | HA: alexa\_devices | Ecosystem: HA, Alexa | Unnamed in UniFi (MAC 18:74:2e:d9:d7:28) | Matter controller
+
+### guest\_bathroom (3rd Floor)
+
+- No devices currently assigned
+
+### hall\_area (3rd Floor)
+
+- No devices currently assigned
+
+### kitchen (2nd Floor)
+
+- **Echo Dot (Kitchen)** — WiFi | `192.168.30.26` | VLAN 30 | Voice | HA: alexa\_devices | Ecosystem: HA, Alexa | Matter controller
+
+### laundry\_room (3rd Floor)
+
+- No devices currently assigned
+
+### living\_room (2nd Floor)
+
+- **LG OLED65C5AUA TV** — WiFi | `192.168.30.79` | VLAN 30 | Display | HA: webostv | Ecosystem: HA
+- **Aqara Motion Sensor P1** — Zigbee → ZHA | Sensor | Fabric: via [[aqara-hub-m3]] Matter bridge†
+- **IKEA STARKVIND Air Purifier** — Zigbee → ZHA | Actuator | Ecosystem: HA
+- **TP-Link KP115** — WiFi | `192.168.30.193` | VLAN 30 | Actuator | HA: tplink | Ecosystem: HA | Tall lamp plug
+- **Govee TV Backlight** — WiFi | IP TBD | VLAN 30 | Actuator | HA: govee\_light\_local | Ecosystem: HA
+- **Govee Shelf Light** — WiFi | IP TBD | VLAN 30 | Actuator | HA: govee\_light\_local | Ecosystem: HA
+- **Govee Square Light** — WiFi | IP TBD | VLAN 30 | Actuator | HA: govee\_light\_local | Ecosystem: HA
+- **Govee unnamed** — WiFi | `192.168.30.34` | VLAN 30 | Actuator | HA: govee\_light\_local | Ecosystem: HA | Possibly TV Backlight/Shelf/Square
+- **Govee unnamed** — WiFi | `192.168.30.242` | VLAN 30 | Actuator | HA: govee\_light\_local | Ecosystem: HA | Possibly TV Backlight/Shelf/Square
+
+### office (1st Floor)
+
+- **Apple TV 4K gen 3** — WiFi | IP TBD | VLAN 30 | Display | HA: apple\_tv | Ecosystem: HA, Apple | Matter controller (not in UniFi dump)
+- **Echo Dot (Office)** — WiFi | `192.168.30.150` | VLAN 30 | Voice | HA: alexa\_devices | Ecosystem: HA, Alexa | Matter controller
+- **Shelly 1PM Gen4** — WiFi | `192.168.30.7` | VLAN 30 | Actuator | HA: shelly | Ecosystem: HA | Light relay
+- **LG webOS Monitor** — WiFi | IP TBD | VLAN 30 | Display | HA: webostv | Ecosystem: HA
+
+### rooftop\_door (Rooftop)
+
+- **Aqara Door/Window Sensor** — Zigbee → ZHA | Sensor | Ecosystem: HA
+- **Aqara Vibration Sensor T1** — Zigbee → ZHA | Sensor | Ecosystem: HA
+
+### 1st Floor (unspecified)
+
+- **Aqara Light Switch H2 US** — Zigbee → ZHA | Actuator | Ecosystem: HA
+
+### Unassigned Room
+
+- **TP-Link HS103** — WiFi | `192.168.30.116` | VLAN 30 | Actuator | HA: tplink | Ecosystem: HA
+- **TP-Link HS103** — WiFi | `192.168.30.165` | VLAN 30 | Actuator | HA: tplink | Ecosystem: HA
+- **TP-Link HS103** — WiFi | `192.168.30.210` | VLAN 30 | Actuator | HA: tplink | Ecosystem: HA
+- **Nest Thermostat** — WiFi | `192.168.30.179` | VLAN 30 | Climate | HA: nest | Ecosystem: HA, Google | Google Home native
+- **eufy Omni C20** — WiFi | `192.168.30.50` | VLAN 30 | Vacuum | No HA integration | Robot vacuum
+- **Levoit Vital 200S** — WiFi | `192.168.30.21` | VLAN 30 | Purifier | HA: vesync | Ecosystem: HA
+- **HA Voice PE** — WiFi | `192.168.30.25` | VLAN 30 | Voice | HA: wyoming | Ecosystem: HA | ESPHome voice assistant
+
+## Zigbee Mesh Map
+
+All Zigbee devices coordinated by [[home-assistant-connect-zbt-2]] (Connect ZBT-2 dongle on [[panda]]):
+
+```
+ZBT-2 (Coordinator)
+├── Aqara Hub M3 (Matter bridge, also wired Thread BR)
+├── Aqara Door/Window Sensor (rooftop)
+├── Aqara Vibration Sensor T1 (rooftop)
+├── Aqara Motion Sensor P1 (living room)
+├── Aqara Light Switch H2 US × 4 (baby room, front door, entrance, 1st floor)
+├── Aqara Colorful Ceiling Light 36W (baby room)
+├── Aqara Smart Lock U100 (front door)
+└── IKEA STARKVIND Air Purifier (TBD)
+```
+
+## Matter Fabric Membership
+
+See [[matter-multi-fabric]] for full fabric topology and commissioning details.
+
+| Device | Protocol | HA Fabric | Apple Fabric | Google Fabric | Alexa Fabric |
+|--------|----------|-----------|--------------|---------------|--------------|
+| Aqara Hub M3 | Matter/Thread | ✓ Commissioned | † Pending | † Pending | † Pending |
+| Connect ZBT-2 | Thread OTBR | ✓ Controller | — | — | — |
+| Nest Thermostat | WiFi/Matter | ✓ nest | — | ✓ Native | — |
+| Echo Dots ×4 | WiFi/Matter | ✓ alexa\_devices | — | — | ✓ Controllers |
+| Apple TV 4K | WiFi/Matter | ✓ apple\_tv | ✓ Controller | — | — |
+
+† Not yet commissioned into this fabric.
+
+## Statistics
+
+- **IoT devices total**: 28 WiFi/wired + 10 Zigbee = **38**
+- **By type**: 22 actuators, 4 sensors, 5 cameras, 6 voice/display, 1 climate, 2 appliances
+- **By protocol**: 10 Zigbee, 25 WiFi, 2 wired, 1 Thread/Matter
+- **HA integrated**: 28 of 38 (74%)
+- **Ecosystem coverage**: HA (28), Alexa (4 Echo controllers), Google (1 Nest), Apple (1 Apple TV)
+- **Matter capable**: 6 controllers/bridges, end-device commissioning in progress
+
+## Relationships
+
+- Canonical source: [[network-device-census]]
+- Architecture overview: [[matter-multi-fabric]]
+- Operational guide: [[smart-home-handbook]]
+- Primary coordinator: [[home-assistant-connect-zbt-2]] on [[panda]]
+- Matter bridge: [[aqara-hub-m3]]
+
+## Open Tasks
+
+- [ ] Match unnamed Govee devices (192.168.30.34, .242) to specific models (TV Backlight / Shelf Light / Square Light)
+- [ ] Verify Apple TV 4K IP address and UniFi presence
+- [ ] Confirm eufy cameras integration into HA (currently no integration found)
+- [ ] Assign rooms to unassigned HS103 plugs
+- [ ] Identify "Office" wired device at 192.168.30.234
+- [ ] Add BLE iBeacon tracker documentation
--- a/homelab/concepts/matter-multi-fabric.md
+++ b/homelab/concepts/matter-multi-fabric.md
@@ -0,0 +1,197 @@
+---
+title: Matter Multi-Fabric Architecture
+created: 2026-05-10
+updated: 2026-05-10
+type: concept
+tags: [matter, thread, smart-home, iot, ecosystem, concept, hub]
+confidence: high
+sources: [UniFi Network clients, HA integrations, network-device-census]
+---
+
+# Matter Multi-Fabric Architecture
+
+> The smart home uses Matter's native multi-admin capability to unify devices across HA, Apple, Google, and Alexa ecosystems. Home Assistant is the central controller; all other ecosystems are secondary fabrics.
+
+## Why Multi-Fabric?
+
+Matter **multi-admin** allows a single device to be commissioned into multiple fabrics simultaneously:
+
+- Same lock/switch/light appears in Apple Home, Google Home, Alexa, AND Home Assistant
+- Native Matter protocol — no cloud bridges or vendor workarounds
+- Each ecosystem gets independent control; device responds to commands from any fabric
+- Most Matter devices support 4–5 simultaneous fabric memberships
+
+## Fabric Topology
+
+```
+┌───────────────────────────────────────────────────────────┐
+│                     MATTER END DEVICES                     │
+│  Aqara Zigbee devices (via M3 bridge) │ Nest Thermostat   │
+└──────┬──────────┬──────────────┬───────────┬──────────────┘
+       │          │              │           │
+ ┌─────▼───┐ ┌───▼────┐ ┌──────▼───┐ ┌─────▼──────┐
+ │ Fabric 1 │ │Fabric 2│ │ Fabric 3 │ │  Fabric 4  │
+ │   HA     │ │ Apple  │ │  Google  │ │   Alexa    │
+ │ (ZBT-2) │ │(AppleTV)│ │  (Nest)  │ │ (4× Echo)  │
+ └─────┬───┘ └───┬────┘ └────┬─────┘ └─────┬──────┘
+       │          │           │              │
+       ▼          ▼           ▼              ▼
+ ┌──────────────────────────────────────────────────────┐
+ │            Thread Network (single mesh)               │
+ │       Thread Border Routers share credentials         │
+ │  ZBT-2 (primary) │ Aqara Hub M3 │ Apple TV │ Echo    │
+ └──────────────────────────────────────────────────────┘
+```
+
+## Ecosystem Controllers
+
+### Fabric 1: Home Assistant (Primary)
+
+- **Controller**: [[home-assistant-connect-zbt-2]] on [[panda]] (HAOS)
+- **Thread role**: Primary OTBR — owns Thread network credentials
+- **Network**: `192.168.30.196` (wired), `192.168.30.12` (WiFi)
+- **Access**: `https://ha.tophermayor.com` (via Traefik on [[ubuntu]])
+- **Capabilities**: Full automation, scripts, scenes, voice pipeline, all integrations
+- **Devices seen**: Everything (central hub)
+
+### Fabric 2: Apple Home
+
+- **Controller**: Apple TV 4K gen 3 (Office, WiFi VLAN 30)
+- **Thread role**: Potential OTBR
+- **HA integration**: `apple_tv`
+- **Capabilities**: Siri voice, Home app, automations
+- **Devices**: Aqara devices via Matter multi-admin through [[aqara-hub-m3]]
+
+### Fabric 3: Google Home
+
+- **Controller**: Nest Thermostat (`192.168.30.179`, WiFi VLAN 30)
+- **HA integration**: `nest`
+- **Capabilities**: Google Assistant voice, Google Home app
+- **Devices**: Nest Thermostat (native), Aqara devices via Matter multi-admin
+- **Note**: Consider adding Nest Hub as dedicated controller + Thread BR
+
+### Fabric 4: Amazon Alexa
+
+- **Controllers**: 4× Echo Dot
+  - Office Echo (`192.168.30.150`)
+  - Kitchen Echo (`192.168.30.26`)
+  - Bedroom Echo (`192.168.30.170`)
+  - Garage Echo (`192.168.30.68`, unnamed in UniFi)
+- **HA integration**: `alexa_devices` (cloud)
+- **Capabilities**: Alexa voice, routines, "Everywhere" speaker group
+- **Thread role**: Echo Dots (gen 5) can act as Thread BRs
+
+## Hub-to-Device Mapping
+
+Which devices sit behind which hub, and how they reach each ecosystem:
+
+### Direct WiFi Devices (no hub needed)
+
+| Device | IP | HA Integration | Apple | Google | Alexa |
+|--------|-----|---------------|-------|--------|-------|
+| Nest Thermostat | 192.168.30.179 | nest | — | ✓ Native | — |
+| Office Echo | 192.168.30.150 | alexa\_devices | — | — | ✓ Native |
+| Kitchen Echo | 192.168.30.26 | alexa\_devices | — | — | ✓ Native |
+| Bedroom Echo | 192.168.30.170 | alexa\_devices | — | — | ✓ Native |
+| Garage Echo | 192.168.30.68 | alexa\_devices | — | — | ✓ Native |
+| Apple TV 4K | TBD | apple\_tv | ✓ Native | — | — |
+| Shelly 1PM (bedroom) | 192.168.30.75 | shelly | ‡ Bridge | ‡ Bridge | ‡ Bridge |
+| Shelly 1PM (office) | 192.168.30.7 | shelly | ‡ Bridge | ‡ Bridge | ‡ Bridge |
+| Govee Floor Lamp L | 192.168.30.91 | govee\_light\_local | ‡ Bridge | ‡ Bridge | ‡ Bridge |
+| Govee Floor Lamp R | 192.168.30.217 | govee\_light\_local | ‡ Bridge | ‡ Bridge | ‡ Bridge |
+| Govee unnamed ×2 | .34, .242 | govee\_light\_local | ‡ Bridge | ‡ Bridge | ‡ Bridge |
+| TP-Link HS103 ×3 | .116, .165, .210 | tplink | ‡ Bridge | ‡ Bridge | ‡ Bridge |
+| TP-Link KP115 | 192.168.30.193 | tplink | ‡ Bridge | ‡ Bridge | ‡ Bridge |
+| Levoit Purifier | 192.168.30.21 | vesync | ‡ Bridge | ‡ Bridge | ‡ Bridge |
+| LG OLED TV | 192.168.30.79 | webostv | ‡ Bridge | ‡ Bridge | ‡ Bridge |
+
+‡ Requires HA Matter Bridge — not yet configured.
+
+### Aqara Zigbee Devices (via [[aqara-hub-m3]] Matter bridge)
+
+All Zigbee devices are managed by ZHA via [[home-assistant-connect-zbt-2]]. The Aqara Hub M3 can additionally bridge them to Apple/Google/Alexa via Matter.
+
+| Device | Location | Zigbee Parent | HA (ZHA) | Apple (M3) | Google (M3) | Alexa (M3) |
+|--------|----------|---------------|----------|------------|-------------|------------|
+| Light Switch H2 US | Baby Room | ZBT-2 | ✓ | † | † | † |
+| Light Switch H2 US | Front Door | ZBT-2 | ✓ | † | † | † |
+| Light Switch H2 US | Entrance | ZBT-2 | ✓ | † | † | † |
+| Light Switch H2 US | 1st Floor | ZBT-2 | ✓ | † | † | † |
+| Ceiling Light 36W | Baby Room | ZBT-2 | ✓ | † | † | † |
+| Smart Lock U100 | Front Door | ZBT-2 | ✓ | † | † | † |
+| Motion Sensor P1 | Living Room | ZBT-2 | ✓ | † | † | † |
+| Door/Window Sensor | Rooftop | ZBT-2 | ✓ | † | † | † |
+| Vibration Sensor T1 | Rooftop | ZBT-2 | ✓ | † | † | † |
+| STARKVIND Purifier | TBD | ZBT-2 | ✓ | † | † | † |
+
+† Pending Aqara Hub M3 Matter bridge commissioning into Apple/Google/Alexa fabrics.
+
+### Aqara WiFi Devices (direct)
+
+| Device | IP | HA Integration | Apple | Google | Alexa |
+|--------|-----|---------------|-------|--------|-------|
+| Hub M3 | 192.168.30.59 | matter, zha | † | † | † |
+| Camera Hub G3 | 192.168.30.113 | — | — | — | — |
+| Doorbell G410 | 192.168.30.118 | — | — | — | — |
+
+† Hub M3 is the bridge device — commissioning it into other fabrics exposes all bridged Zigbee devices.
+
+## Thread Border Router Strategy
+
+All border routers must join a **single Thread mesh** with matching credentials:
+
+| Border Router | Host | Status | Role |
+|---------------|------|--------|------|
+| [[home-assistant-connect-zbt-2]] OTBR | [[panda]] | ✅ Active | Primary — owns credentials |
+| [[aqara-hub-m3]] | Bedroom | ⚠️ Verify credentials match | Secondary BR |
+| Apple TV 4K gen 3 | Office | Potential OTBR | Not yet configured |
+| Echo Dot (gen 5?) | Various | Potential OTBR | Not yet configured |
+
+**Rule**: Export Thread credentials from ZBT-2 OTBR. Ensure all other BRs join same network (Network Key, PAN ID, channel).
+
+## Non-Matter Devices → HA Matter Bridge
+
+HA can expose non-Matter devices to other ecosystems via **Matter Bridge**:
+
+| Device Type | Protocol | HA Integration | Bridge Status |
+|-------------|----------|---------------|---------------|
+| Shelly 1PM Gen4 ×2 | WiFi | shelly | ⬚ Not configured |
+| Govee lights ×5 | WiFi/LAN | govee\_light\_local | ⬚ Not configured |
+| TP-Link Kasa ×4 | WiFi | tplink | ⬚ Not configured |
+| VeSync purifier | WiFi/Cloud | vesync | ⬚ Not configured |
+| LG TV ×2 | WiFi | webostv | ⬚ Not configured |
+| IKEA purifier | Zigbee | ZHA | ⬚ Not configured |
+
+## Commissioning Checklist
+
+When adding a new Matter device:
+1. Commission into **HA first** (Settings → Devices & Services → Matter → Add Device)
+2. Get multi-admin pairing code from HA device info
+3. Commission into **Apple Home** using pairing code
+4. Commission into **Google Home** using pairing code
+5. Commission into **Alexa** using pairing code
+
+For non-Matter devices:
+1. Add to HA via native integration
+2. Enable **HA Matter Bridge** in HA Settings → Matter → Bridge
+3. Commission HA Bridge into target ecosystems
+
+## Relationships
+
+- Central hub: [[panda]] running HAOS
+- Primary coordinator: [[home-assistant-connect-zbt-2]]
+- Secondary hub: [[aqara-hub-m3]]
+- Full device catalog: [[iot-device-inventory]]
+- All network clients: [[network-device-census]]
+- Operational guide: [[smart-home-handbook]]
+
+## Open Tasks
+
+- [ ] Verify Thread credentials match between ZBT-2 and Aqara Hub M3
+- [ ] Commission Aqara Hub M3 into Apple Home via Matter
+- [ ] Commission Aqara Hub M3 into Google Home via Matter
+- [ ] Commission Aqara Hub M3 into Alexa via Matter
+- [ ] Set up HA Matter Bridge for Shelly/Govee/TP-Link/VeSync/LG devices
+- [ ] Test multi-admin with Lock U100 across all 4 ecosystems
+- [ ] Consider adding Nest Hub for Google Thread BR
+- [ ] Evaluate Echo Dot Thread BR capability (gen 5 required)
--- a/homelab/concepts/media-stack.md
+++ b/homelab/concepts/media-stack.md
@@ -0,0 +1,95 @@
+---
+title: Media Automation Stack
+created: 2026-04-28
+updated: 2026-05-14
+type: concept
+tags: [concept, media, services]
+sources: [../../homelab/architecture.md]
+---
+
+# Media Automation Stack
+
+Full media automation ecosystem spanning ubuntu Docker (~25 containers) and Proxmox LXCs (CT 105–110). VPN-protected downloads, GPU-accelerated transcoding. Undergoing migration from monolithic Docker to individual LXCs (May 2026).
+
+## Download & Index
+
+| Service | URL | Purpose |
+|---------|-----|---------|
+| Prowlarr | prowlarr.local.tophermayor.com | Indexer management |
+| qBittorrent | — | Torrent client (via Gluetun VPN) |
+| SABnzbd | sabnzbd.local.tophermayor.com | Usenet downloader |
+| Gluetun | — | WireGuard VPN (NordVPN) — all media traffic routes here |
+| Flaresolverr | — | CAPTCHA solver for indexers |
+| [[decypharr]] | decypharr.local.tophermayor.com | Black hole Usenet indexer (CT 110, 192.168.50.175:8282) |
+
+## Automation
+
+| Service | Purpose |
+|---------|---------|
+| Sonarr | TV automation |
+| Sonarr Anime | Anime TV |
+| Radarr | Movie automation |
+| Radarr Anime | Anime movies |
+| Lidarr | Music automation |
+| Bazarr | Subtitle management |
+| Recyclarr | Quality profile sync |
+| LazyLibrarian | Book automation |
+| MusicSeerr | Music request system |
+
+## Media Server
+
+| Service | URL | Purpose |
+|---------|-----|---------|
+| Jellyfin | jellyfin.tophermayor.com | Media streaming (GPU transcoding) |
+| Jellyseerr | jellyseerr.tophermayor.com | Request management |
+| Stremio Server | stremio.local.tophermayor.com | Stremio streaming |
+
+## Transcoding
+
+| Service | URL | Purpose |
+|---------|-----|---------|
+| Tdarr | tdarr.local.tophermayor.com | Media transcoding (GPU via GTX 1080) |
+| Analyzarr | — | Media file analysis |
+
+## Book & Audio
+
+| Service | Purpose |
+|---------|---------|
+| Calibre | eBook management |
+| Calibre-Web | eBook reader |
+| Kavita | Manga/comic reader |
+| Audiobookshelf | Audiobook/podcast server |
+| Navidrome | Music streaming |
+
+## VPN Topology
+
+All download clients route through **Gluetun** (WireGuard/NordVPN):
+- qBittorrent → Gluetun → Internet
+- SABnzbd → Gluetun → Internet
+- Prowlarr (indexer checks) → Gluetun → Internet
+
+## LXC Migration (May 2026)
+
+Media services are migrating from monolithic Docker on ubuntu to dedicated Proxmox LXCs:
+
+| LXC | Services | IP |
+|-----|----------|-----|
+| CT 105 | media-arr (Sonarr, Radarr, Lidarr, etc.) | — |
+| CT 106 | media-request (Jellyseerr, Overseerr) | — |
+| CT 107 | media-music (Navidrome) | — |
+| CT 108 | media-reading (Kavita, Audiobookshelf) | — |
+| CT 109 | media-db (PostgreSQL) | — |
+| CT 110 | [[decypharr]] (black hole indexer) | 192.168.50.175 |
+
+**Traefik routing update:** All `*arr` service routes now point to LXC IPs instead of `gluetun:container_name` Docker DNS. Dynamic YAML files rewritten during May 14 outage recovery.
+
+**postgres-shared:** Restored on ubuntu Docker for gitea DB after migration (media DBs moved to CT 109).
+
+## Related
+
+- [[jellyfin]] — Media server entity
+- [[ubuntu]] — Hosts Docker portion of stack with GTX 1080
+- [[proxmox]] — Hosts LXC portion (CT 105–110)
+- [[decypharr]] — Black hole indexer (CT 110)
+- [[nfs-storage]] — Media stored on TrueNAS NFS
+- [[traefik-ha]] — Ingress routing for media services
--- a/homelab/concepts/monitoring-pipeline.md
+++ b/homelab/concepts/monitoring-pipeline.md
@@ -0,0 +1,101 @@
+---
+title: Monitoring Pipeline
+created: 2026-04-28
+updated: 2026-04-29
+type: concept
+tags: [concept, monitoring, alerting, docker]
+sources: [../../homelab/architecture.md]
+---
+
+# Monitoring Pipeline
+
+Prometheus-based monitoring with Loki log aggregation, Grafana dashboards, and Telegram alerting via Hermes Gateway watchdog. All monitoring services run on [[ubuntu]].
+
+## Metrics Pipeline
+
+```
+Node Exporters (all hosts: ubuntu, grizzley, ice, proxmox, truenas, panda)
+    → Prometheus (ubuntu:9090)
+    → Grafana (ubuntu:3000)
+    → Alertmanager (ubuntu:9093)
+    → Hermes Gateway webhook
+    → Telegram (@AigentZeroHermes)
+```
+
+**Alert routing:**
+- Alertmanager receives Prometheus alerts
+- Routes to Hermes Gateway webhook (POST to gateway endpoint)
+- Gateway sends Telegram to: topic 1033 "Cron Jobs" in AigentZeroHermes (-1003820156994)
+- Bot token: `836803270:AAH-Ac5Y`
+
+## Log Pipeline
+
+```
+Docker containers (all hosts)
+    → Promtail (Docker socket service discovery)
+    → Loki (ubuntu:3100)
+    → Grafana dashboards
+```
+
+Promtail runs as a Docker container on [[ubuntu]], reading container logs via the Docker socket.
+
+## Scrape Targets
+
+Prometheus monitors: ubuntu (local), proxmox, truenas, grizzley, ice, panda.
+
+Scrape endpoints:
+- `prometheus` (9090) — Prometheus itself
+- `node-exporter` (9100) — host hardware metrics
+- `blackbox-exporter` (9115) — HTTP/TCP/ICMP probing
+- `cadvisor` (8080) — container metrics
+- `loki` (3100) — log metrics
+- Traefik instances (8080/metrics)
+
+## Blackbox Exporter Targets
+
+15+ HTTPS probe targets configured. See `homelab/ubuntu/docker/monitoring/` for the blackbox exporter config.
+
+## Alert Rules
+
+Prometheus alert rules → Alertmanager → Hermes Gateway → Telegram.
+
+Key alerts:
+- `ContainerLogError` — Container logging errors detected by Promtail
+- `ServiceDown` — Blackbox-probed service unavailable
+- `JellyfinDown` — Jellyfin health check failed
+- `TraefikDown` — Traefik not responding
+
+See [[homelab-servicedown-triage]] and [[homelab-containerlogerror-triage]] skills for triage procedures.
+
+## Hermes Gateway Watchdog
+
+Hermes Gateway is monitored by a watchdog script on both [[ice]] and [[grizzley]]:
+
+```
+/home/bear/hermes-gateway-watchdog.sh
+```
+
+Runs via **system cron** (not systemd user service) on both hosts:
+1. Checks if hermes-gateway is responsive
+2. On failure: direct restart → tmux+OpenCode rescue if still down
+3. Sends Telegram notification on failure to topic 1033 "Cron Jobs" (bot: `836803270:AAH-Ac5Y`)
+
+**Note:** On [[grizzley]], the systemd override for the watchdog is deployed directly to `/etc/systemd/system/` (not tracked in the homelab repo — it's a system unit).
+
+## External Uptime Monitoring
+
+- **Uptime Kuma** (grizzley:3001) — external/internal availability checks
+- **Blackbox Exporter** (ubuntu:9115) — 15+ HTTPS probe targets
+
+## Dashboards
+
+- Grafana (ubuntu:3000) — metrics dashboards
+- Loki + Grafana — log exploration
+- Prometheus (ubuntu:9090) — expression browser, alertmanager
+
+## Related
+
+- [[ubuntu]] — Hosts Prometheus, Grafana, Loki, Alertmanager
+- [[grizzley]] — Hosts Hermes Agent, Telegram webhook, Uptime Kuma
+- [[hermes-gateway]] — AI gateway with watchdog pattern
+- [[traefik]] — Traefik metrics
--- a/homelab/concepts/network-device-census.md
+++ b/homelab/concepts/network-device-census.md
@@ -0,0 +1,193 @@
+---
+title: Network Device Census
+created: 2026-05-10
+updated: 2026-05-10
+type: concept
+tags: [iot, smart-home, concept, inventory]
+sources: [raw/inventories/unifi-clients-2026-05-10.md, raw/inventories/ha-device-registry-2026-05-10.md, raw/inventories/arp-neighbors-2026-05-10.md]
+confidence: high
+---
+
+# Network Device Census
+
+> Canonical classification of every device on the network.
+> Cross-referenced from UniFi controller (46 clients), HA device registry (61 devices), and ARP tables.
+> Updated: 2026-05-10 | Sources: `raw/inventories/unifi-clients-2026-05-10.md`, `raw/inventories/ha-device-registry-2026-05-10.md`
+
+## Classification Key
+
+- **iot-smart-home** — Smart home actuator/sensor/hub managed by [[panda]]
+- **iot-appliance** — Smart appliance with HA integration
+- **iot-camera** — Security/monitoring camera
+- **iot-infra** — Infrastructure device with HA integration
+- **infrastructure** — Core network/server hardware (not IoT)
+- **personal** — Personal device (phone, laptop, watch, tablet)
+- **unidentified** — Unknown device, needs investigation
+
+## VLAN Map
+
+- **VLAN 10** "Family of D." — Personal devices
+- **VLAN 20** "Will of D. (Guest)" — Guest network
+- **VLAN 30** "Will of D. IoT" — IoT devices + infra with .30 IPs
+- **VLAN 50** "Production" — Server infrastructure
+- **Default** — Switch management
+
+---
+
+## iot-smart-home (18 devices)
+
+### Hubs & Coordinators
+
+| Hostname | IP | MAC | VLAN | Protocol | HA Integration | Area | Ecosystems | Notes |
+|----------|-----|-----|------|----------|---------------|------|------------|-------|
+| homeassistant | 192.168.30.196 | e4:5f:01:5d:ca:06 | 30 | WiFi | HA Core (self) | — | ALL | [[panda]] RPi HAOS host |
+| homeassistant | 192.168.30.12 | 98:17:3c:60:45:d8 | 30 | WiFi | — | — | — | Duplicate HA entry? Same hostname, different MAC |
+| Aqara-Hub-M3-9C5B | 192.168.30.59 | 18:c2:3c:59:9e:c1 | 30 | WiFi | [[matter]] | Bedroom | Apple, Google, Alexa, HA | [[aqara-hub-m3]] Matter bridge |
+| home-assistant-voice-0abc82 | 192.168.30.25 | 20:f8:3b:0a:bc:82 | 30 | WiFi | ESPHome | Office | HA | [[panda]] Voice PE |
+
+### Lighting & Switches
+
+| Hostname | IP | MAC | VLAN | Protocol | HA Integration | Area | Ecosystems | Notes |
+|----------|-----|-----|------|----------|---------------|------|------------|-------|
+| shelly1pmg4-a085e3bb2898 | 192.168.30.7 | a0:85:e3:bb:28:98 | 30 | WiFi | Shelly | Bedroom | HA, Alexa | Bedroom ceiling light relay |
+| shelly1pmg4-a085e3b7fc74 | 192.168.30.75 | a0:85:e3:b7:fc:74 | 30 | WiFi | Shelly | Office | HA, Alexa | Office ceiling light relay |
+| Govee Floor Lamp Left | 192.168.30.91 | 98:17:3c:15:93:38 | 30 | WiFi/BLE | Govee Local | Living Room | HA | H6076 TV backlight #1 |
+| Govee Floor Lamp R | 192.168.30.217 | d0:c9:07:f6:5b:ea | 30 | WiFi/BLE | Govee Local | Living Room | HA | H6076 TV backlight #2 |
+| (unnamed) | 192.168.30.34 | 98:17:3c:4c:bd:aa | 30 | WiFi/BLE | Govee Local | Living Room | HA | H60A4 shelf/ambient strip |
+| (unnamed) | 192.168.30.242 | 98:17:3c:38:8f:e2 | 30 | WiFi/BLE | Govee Local | Bedroom | HA | H60A1 bedroom LED strip |
+| HS103 | 192.168.30.116 | 34:60:f9:23:c4:57 | 30 | WiFi | TP-Link | Bedroom | HA, Alexa | Left Lamp plug |
+| HS103 | 192.168.30.210 | 34:60:f9:23:c4:b5 | 30 | WiFi | TP-Link | Bedroom | HA, Alexa | Right Lamp plug |
+| HS103 | 192.168.30.165 | 34:60:f9:23:c4:88 | 30 | WiFi | TP-Link | Office | HA, Alexa | Grizzley host power (rename!) |
+| KP115 | 192.168.30.193 | 00:5f:67:96:47:eb | 30 | WiFi | TP-Link | Living Room | HA, Alexa | Tall Lamp plug |
+
+### Sensors, Locks & Doorbell
+
+| Hostname | IP | MAC | VLAN | Protocol | HA Integration | Area | Ecosystems | Notes |
+|----------|-----|-----|------|----------|---------------|------|------------|-------|
+| 09AA01AC171702RL | 192.168.30.179 | 18:b4:30:c2:d2:c0 | 30 | Thread/Matter | [[matter]] | Hall (3rd floor) | HA, Google | Nest Thermostat |
+| Camera-Hub-G3-1180 | 192.168.30.113 | 54:ef:44:7a:11:80 | 30 | Zigbee→Matter | [[matter]] | Garage | HA | Aqara Camera Hub G3 |
+| Doorbell | 192.168.30.118 | 54:ef:44:8b:c1:da | 30 | Zigbee→Matter | [[matter]] | Entrance | HA | Aqara Video Doorbell G410 |
+
+### Voice Assistants
+
+| Hostname | IP | MAC | VLAN | Protocol | HA Integration | Area | Ecosystems | Notes |
+|----------|-----|-----|------|----------|---------------|------|------------|-------|
+| Bedroom Echo | 192.168.30.170 | 7c:d5:66:fe:94:bc | 30 | WiFi | Alexa | Bedroom | Alexa, HA | Echo Dot |
+| Kitchen Echo | 192.168.30.26 | 0c:ee:99:09:a7:2f | 30 | WiFi | Alexa | Kitchen | Alexa, HA | Echo Dot |
+| Office Echo | 192.168.30.150 | 14:91:38:83:a4:cd | 30 | WiFi | Alexa | Office | Alexa, HA | Echo Dot |
+| (unnamed) | 192.168.30.68 | 18:74:2e:d9:d7:28 | 30 | WiFi | Alexa | Living Room | Alexa, HA | 2nd Floor Echo Dot |
+
+### Non-Networked Zigbee/Thread Devices (via [[home-assistant-connect-zbt-2]])
+
+These devices don't appear in UniFi (no IP) but are in HA via ZHA/Matter:
+
+| HA Device | Area | Protocol | Integration | Hub |
+|-----------|------|----------|-------------|-----|
+| Aqara Light Switch H2 US (Baby Room) | Baby Room | Zigbee→Matter | [[matter]] via M3 | [[aqara-hub-m3]] |
+| Aqara Light Switch H2 US (Front Door) | Entrance | Zigbee→Matter | [[matter]] via M3 | [[aqara-hub-m3]] |
+| Aqara Light Switch H2 US (Entrance) | Entrance | Zigbee→Matter | [[matter]] via M3 | [[aqara-hub-m3]] |
+| Aqara Light Switch H2 US (1st Floor) | — | Zigbee→Matter | [[matter]] via M3 | [[aqara-hub-m3]] |
+| Colorful Ceiling Light 36W | Baby Room | Zigbee→Matter | [[matter]] via M3 | [[aqara-hub-m3]] |
+| Aqara Door and Window Sensor | Rooftop | Zigbee→Matter | [[matter]] via M3 | [[aqara-hub-m3]] |
+| Aqara Vibration Sensor T1 | Rooftop | Zigbee→Matter | [[matter]] via M3 | [[aqara-hub-m3]] |
+| Aqara Motion Sensor P1 | Living Room | Zigbee→Matter | [[matter]] via M3 | [[aqara-hub-m3]] |
+| Aqara Smart Lock U100 | Entrance | Zigbee→Matter | [[matter]] via M3 | [[aqara-hub-m3]] |
+| IKEA STARKVIND Air Purifier | Office | Zigbee | ZHA | [[home-assistant-connect-zbt-2]] |
+
+---
+
+## iot-appliance (2 devices)
+
+| Hostname | IP | MAC | VLAN | Protocol | HA Integration | Area | Ecosystems | Notes |
+|----------|-----|-----|------|----------|---------------|------|------------|-------|
+| Levoit-purifier | 192.168.30.21 | cc:ba:97:b7:3d:0c | 30 | WiFi | VeSync | Kitchen | HA | Vital 200S air purifier |
+| eufyOmniC20 | 192.168.30.50 | 4c:37:de:56:41:1b | 30 | WiFi | — | — | — | Eufy robot vacuum, no HA integration yet |
+
+---
+
+## iot-camera (3 devices)
+
+| Hostname | IP | MAC | VLAN | Protocol | HA Integration | Area | Ecosystems | Notes |
+|----------|-----|-----|------|----------|---------------|------|------------|-------|
+| eufy_Baby_Camera | 192.168.10.110 | 90:bf:d9:ce:8c:e0 | 10 | WiFi | — | — | — | Eufy baby cam on Family VLAN |
+| eufy_Baby_Camera | 192.168.10.113 | 90:bf:d9:84:a1:48 | 10 | WiFi | — | — | — | Second Eufy baby cam |
+| eufy_Baby_Monitor | 192.168.10.120 | 90:bf:d9:55:63:de | 10 | WiFi | — | — | — | Eufy baby monitor hub |
+
+---
+
+## iot-infra (5 devices)
+
+| Hostname | IP | MAC | VLAN | Protocol | HA Integration | Area | Ecosystems | Notes |
+|----------|-----|-----|------|----------|---------------|------|------------|-------|
+| Office | 192.168.30.234 | c4:f7:c1:2b:fc:89 | 30 | WiFi | Apple TV | Office | Apple Home, HA | Apple TV 4K gen 3 — Matter controller |
+| LGwebOSTV | 192.168.30.79 | 60:45:e8:7f:c2:1a | 30 | WiFi | webOS TV | Living Room | HA, Alexa, AirPlay | LG OLED65C5AUA |
+| Rest2ndGen-62CEEE | 192.168.30.177 | ec:e3:34:62:ce:ec | 30 | WiFi | — | — | — | Withings Sleep mat, possible HA integration |
+| sky0008606C | 192.168.30.161 | 60:8a:10:e6:86:6c | 30 | WiFi | — | — | — | Somfy / blinds device? Microchip OUI |
+| (unnamed iPhone) | 192.168.20.190 | 00:22:f2:06:60:b3 | 20 | WiFi | — | — | — | SunPower OUI — solar panel monitor? |
+
+---
+
+## infrastructure (6 devices)
+
+| Hostname | IP | MAC | VLAN | Protocol | Role | Notes |
+|----------|-----|-----|------|----------|------|-------|
+| grizzley | 192.168.30.84 | 2c:cf:67:38:8b:c8 | 30 | Wired | Edge ingress RPi5 | Also .50.84 on Production VLAN |
+| ubuntu | 192.168.30.61 | bc:24:11:16:a9:e2 | 30 | Wired | Primary Docker host | Also .50.61 on Production VLAN |
+| Ice | 192.168.30.197 | e4:5f:01:29:cb:c5 | 30 | Wired | Control plane RPi4 | Also .50.197 on Production VLAN |
+| Truenas Virtual NIC | 192.168.50.12 | bc:24:11:32:a5:82 | 50 | Wired | TrueNAS NAS | [[truenas]] on Proxmox |
+| truenas | 192.168.50.11 | 3c:7c:3f:23:5c:c5 | 30 | Wired | TrueNAS physical | Also .50.12 virtual |
+| TL-SG108PE | 192.168.1.92 | 34:60:f9:2e:bc:bf | — | Wired | TP-Link managed switch | 8-port PoE, IoT VLAN trunk |
+
+---
+
+## personal (7 devices)
+
+| Hostname | IP | MAC | VLAN | Connection | OUI | Notes |
+|----------|-----|-----|------|------------|-----|-------|
+| iPhone | 192.168.10.151 | 22:b7:b2:b4:88:ab | 10 | WiFi | — | TophPhone14 (HA mobile app) |
+| iPhone | 192.168.10.158 | 22:0a:9d:c7:ea:1a | 10 | WiFi | — | Second iPhone |
+| iPhone | 192.168.10.133 | d2:46:b3:46:4c:84 | 10 | WiFi | — | Third iPhone (private Wi-Fi MAC) |
+| iPad | 192.168.10.116 | 3a:a3:c7:47:df:de | 10 | WiFi | — | Family iPad |
+| Watch | 192.168.10.150 | ca:df:bd:1b:75:7e | 10 | WiFi | — | Apple Watch |
+| Mac | 192.168.10.125 | 76:4f:65:d6:e2:1a | 10 | WiFi | — | MacBook |
+| ice | 192.168.10.178 | e4:5f:01:29:cb:c7 | 10 | WiFi | RPi | Ice on Family VLAN (WiFi) |
+
+---
+
+## unidentified (3 devices)
+
+| Hostname | IP | MAC | VLAN | Connection | OUI | Notes |
+|----------|-----|-----|------|------------|-----|-------|
+| HYTERevolt | 192.168.1.143 | 74:56:3c:ba:a9:6d | — | Wired | Giga-Byte | Gaming PC? On Default VLAN |
+| VectorPro | 192.168.1.77 | b0:25:aa:48:53:5a | — | Wired | Private | Unknown wired device, Default VLAN |
+| Caesar's Aivo Connect | — | — | — | WiFi | Alexa | iottie car mount, Alexa integration only |
+
+---
+
+## Statistics
+
+| Classification | Count | % of Network |
+|---------------|-------|-------------|
+| iot-smart-home | 18+10 non-net | 39% |
+| iot-appliance | 2 | 4% |
+| iot-camera | 3 | 7% |
+| iot-infra | 5 | 11% |
+| infrastructure | 6 | 13% |
+| personal | 7 | 15% |
+| unidentified | 3 | 7% |
+
+## Open Questions
+
+- ~~**98:17:3c:60:45:d8** — Likely a TrueNAS IP, not HA. Confirmed panda is only at .30.196. Stale DHCP lease or old reservation.~~ ✅ Resolved 2026-05-10
+- **sky0008606C** — AMWAY smart air filter (Microchip Technology OUI, .30.161). Not in HA — consider adding integration if available.
+- **00:22:f2:06:60:b3** — Solar panel monitor (SunPower OUI) on Guest VLAN 20. Verify if this should be on IoT VLAN 30 or if Guest is intentional for internet-only reporting.
+- **3 Eufy baby cameras** on VLAN 10 (Family) — intentional for phone accessibility. Correct placement; VLAN 30 would require firewall rules for VLAN 10→30 Eufy traffic.
+- **Aqara Light Switch H2 US** — 5 switches confirmed: 1st Floor (1), 2nd Floor (2), 3rd Floor (2: Baby Room + Hallway Area). Two via_device paths suggest some are paired via ZHA and some via Aqara Hub M3 Matter bridge.
+
+## Related Pages
+
+- [[iot-device-inventory]] — IoT-only view grouped by room
+- [[matter-multi-fabric]] — Matter fabric membership and hub-to-device mapping
+- [[smart-home-handbook]] — Operational handbook
+- [[home-assistant-connect-zbt-2]] — Zigbee/Thread coordinator details
+- [[aqara-hub-m3]] — Aqara Matter hub details
--- a/homelab/concepts/nfs-storage.md
+++ b/homelab/concepts/nfs-storage.md
@@ -0,0 +1,66 @@
+---
+title: NFS Storage Strategy
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, storage, nas]
+sources: [../../homelab/architecture.md, ../../ai-assistant/workflows.md]
+---
+
+# NFS Storage Strategy
+
+TrueNAS NFS shares are used for user-uploaded data and media. Configs and databases stay on local VM disk.
+
+## Storage Hierarchy
+
+```
+TrueNAS (192.168.50.12)
+├── ZFS Pool "TrueNAS" (25.4TB, 65% used)
+│   ├── /mnt/truenas/mediadata/     ← Movies, TV, Music
+│   ├── /mnt/truenas/traefik-certs/ ← TLS certificates (NFS to grizzley)
+│   └── /mnt/truenas-backup/        ← Application backups
+└── ZFS Pool "RPiPool" (10.9TB, 5% used)
+    └── /mnt/rpipooldata/            ← Reserve storage
+
+PersonalMediaLibrary (separate NFS)
+└── /mnt/PersonalMediaLibrary/      ← Immich external library (photos)
+```
+
+## Mount Rules
+
+| Data Type | Storage Location | Example |
+|-----------|-----------------|---------|
+| User uploads (photos, media) | NFS (TrueNAS) | Immich photos, Jellyfin library |
+| App configs | VM local disk | docker-compose.yml, config/ |
+| Databases | VM local (postgres-shared) | PostgreSQL, Redis |
+| Media library | NFS (TrueNAS) | Movies, TV, Music |
+| Backups | NFS (TrueNAS) | Application backups |
+| TLS certificates | NFS (TrueNAS) | Wildcard certs synced to grizzley |
+
+## NFS Exports
+
+| Export | Mounted On | Consumer |
+|--------|-----------|---------|
+| `/mnt/truenas/mediadata` | `/mnt/truenas/mediadata` on ubuntu | Jellyfin, *Arrs, Immich uploads |
+| `/mnt/PersonalMediaLibrary` | `/mnt/PersonalMediaLibrary` on ubuntu | Immich external library |
+| `/mnt/truenas/traefik-certs/grizzley` | NFS on grizzley | Traefik TLS certificates |
+
+## NFS Mount Checklist
+
+Before using an NFS path in docker-compose, verify it exists in `/etc/fstab`:
+
+```bash
+cat /etc/fstab | grep nfs
+```
+
+## Known Issues
+
+- **Pool corruption** — TrueNAS pool has known corruption issues (as of 2026-04-28). Monitor `truenas` entity page.
+- **rustfs ignores env vars** — S3 object storage ignores environment variables on first boot. See [[rustfs]].
+
+## Related
+
+- [[truenas]] — TrueNAS NAS entity
+- [[ubuntu]] — Ubuntu host with NFS mounts
+- [[jellyfin]] — Media server using NFS
+- [[vm-storage-policy]] — VM Storage Policy with full mount rules
--- a/homelab/concepts/opencode-cluster.md
+++ b/homelab/concepts/opencode-cluster.md
@@ -0,0 +1,73 @@
+---
+title: OpenCode Cluster
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, ai, services]
+sources: [../../homelab/docs/opencode-cluster.md, ../../ai-assistant/host-context.md]
+---
+
+# OpenCode Cluster
+
+OpenCode AI coding assistant deployed as systemd services across the homelab cluster, accessible via Traefik-routed HTTPS endpoints.
+
+## Instances
+
+| Instance | Host | IP | Port | Traefik Route | Status |
+|----------|------|-----|------|---------------|--------|
+| ubuntu | Ubuntu VM | 192.168.50.61 | 4096 | opencode.tophermayor.com | Active/Enabled |
+| ice | Raspberry Pi 4 | 192.168.50.197 | 4096 | opencode-ice.tophermayor.com | Active/Enabled |
+| grizzley | Raspberry Pi 5 | 192.168.50.84 | 4096 | — | Inactive/Disabled |
+
+## Service Management
+
+All instances run as `opencode-web.service` via systemd:
+
+```bash
+# Check status
+systemctl status opencode-web
+
+# Restart
+sudo systemctl restart opencode-web
+
+# View logs
+journalctl -u opencode-web -f
+```
+
+## Shared Infrastructure
+
+- **Qdrant** (192.168.50.61:6333) — Shared vector memory backend for OpenCode cluster
+- **Ollama** (192.168.50.61:11434) — Local embedding generation
+
+## Configuration
+
+Per-host config files in `homelab/<host>/opencode/`:
+- `opencode.json` — Main OpenCode configuration
+- `oh-my-opencode.json` — Framework configuration
+
+## Traefik Routing
+
+OpenCode instances use dedicated Traefik middlewares:
+- `local-only@file` — IP whitelist
+- `opencode-streaming@file` — SSE support
+- `opencode-cors@file` — CORS headers
+
+## Agent Context Detection
+
+Each OpenCode instance detects its host context via:
+- `.opencode/opencode.json` init file
+- Environment variables (`HOST_CONTEXT`, `WIKI_PATH`)
+- `detect_host_context.py` script
+
+See [[host-context-detection]] for full detection table.
+
+## Wiki Integration
+
+All OpenCode instances have `WIKI_PATH=/home/bear/homelabagentroot/obsidian-vault` in their environment, enabling them to read and write to the shared wiki.
+
+## Related
+
+- [[ice]] — RPi4 control plane running OpenCode
+- [[ubuntu]] — Primary host running OpenCode
+- [[host-context-detection]] — Per-host agent detection
+- [[vm-storage-policy]] — AI assistant workflows
--- a/homelab/concepts/smart-home-handbook.md
+++ b/homelab/concepts/smart-home-handbook.md
@@ -0,0 +1,108 @@
+---
+title: Smart Home Handbook
+created: 2026-05-10
+updated: 2026-05-10
+type: concept
+tags: [smart-home, iot, home-assistant, matter, concept, runbook]
+confidence: high
+---
+
+# Smart Home Handbook
+
+> Operational overview for the homelab smart home. Canonical orientation page linking to all smart home entities and concepts.
+
+## Architecture Summary
+
+The smart home is built around **Home Assistant** on [[panda]] as the central automation hub, with Matter multi-fabric providing cross-ecosystem access to devices.
+
+```
+┌─────────────────────────────────────────────────────┐
+│                   USER INTERFACES                    │
+│   HA UI │ Apple Home │ Google Home │ Alexa │ Voice   │
+├─────────────────────────────────────────────────────┤
+│              HOME ASSISTANT (panda)                   │
+│   Automations │ Scripts │ Scenes │ Dashboards        │
+├──────────┬──────────┬──────────┬──────────┬─────────┤
+│  ZHA     │  Matter  │  Cloud   │  Local   │ ESPHome │
+│ Zigbee   │  Thread  │  APIs    │  LAN     │ BLE/Voice│
+├──────────┴──────────┴──────────┴──────────┴─────────┤
+│                   DEVICES (~35)                      │
+│  Aqara │ Govee │ Shelly │ TP-Link │ IKEA │ Echo     │
+│  Apple TV │ LG TV │ Nest │ VeSync │ Aivo            │
+└─────────────────────────────────────────────────────┘
+```
+
+## Key Entities
+
+| Entity | Role | Page |
+|--------|------|------|
+| [[panda]] | HA host (RPi, HAOS) | [[panda]] |
+| [[home-assistant-connect-zbt-2]] | Zigbee + Thread coordinator | [[home-assistant-connect-zbt-2]] |
+| [[aqara-hub-m3]] | Aqara Matter bridge + Zigbee hub | [[aqara-hub-m3]] |
+
+## Key Concepts
+
+| Concept | Description | Page |
+|---------|-------------|------|
+| Matter Multi-Fabric | Cross-ecosystem device sharing | [[matter-multi-fabric]] |
+| IoT Device Inventory | Complete device catalog | [[iot-device-inventory]] |
+
+## Quick Reference
+
+### Accessing Home Assistant
+- **Web UI**: `https://ha.tophermayor.com`
+- **SSH**: `ssh bear@192.168.30.196` (password auth)
+- **API**: `http://192.168.30.196:8123/api/` (requires bearer token)
+- **Traefik**: Routed from both [[ubuntu]] and [[grizzley]]
+
+### Adding a New Matter Device
+1. Open HA → Settings → Devices & Services → Matter → Add Device
+2. Follow pairing flow using QR code or numeric code
+3. Once in HA, use multi-admin pairing code to add to Apple/Google/Alexa
+4. See [[matter-multi-fabric]] for full commissioning flow
+
+### Adding a Non-Matter Device
+1. Add to HA via native integration (Zigbee, Wi-Fi, cloud)
+2. If needed in other ecosystems, enable HA Matter Bridge
+3. Commission the bridge into target ecosystem
+4. See [[matter-multi-fabric]] → Non-Matter Devices section
+
+### Troubleshooting
+
+| Problem | Solution |
+|---------|----------|
+| Device not responding | Check VLAN 30 connectivity, verify device power |
+| Zigbee device offline | Check ZHA → Settings → Network → visualization for mesh health |
+| Thread device not connecting | Verify Thread credentials match across all border routers |
+| HA SSH access denied | Add SSH key to Advanced SSH add-on config via HA web UI |
+| Matter multi-admin fails | Check device's fabric limit (some only support 2-3) |
+| Govee lights won't pair | Ensure on same VLAN 30, use govee_light_local integration |
+
+### Voice Pipeline
+
+```
+openWakeWord → Whisper (STT) → HA Assist (intent) → Piper (TTS)
+```
+
+- **Wake word**: "Hey Jarvis" (configurable via openWakeWord)
+- **Hardware**: Home Assistant Voice PE (ESPHome)
+- **Fallback**: Echo Dots → Alexa, Apple TV → Siri
+
+### Network Placement
+
+All IoT devices sit on **VLAN 30 (IoT subnet 192.168.30.0/24)**:
+- [[panda]] has dual-homed: 192.168.30.196 (IoT) + 192.168.50.196 (Servers)
+- Physical path: UGC Ultra Port 2 → TP-Link SG108PE trunk
+- Firewall: IoT VLAN is isolated from Server and Family VLANs
+- Management: Access HA via Traefik reverse proxy from any VLAN
+
+## Improvement Opportunities
+
+- [ ] Add grizzley SSH key to panda's SSH add-on for agent automation
+- [ ] Verify unified Thread credentials across all border routers
+- [ ] Set up HA Matter Bridge to expose non-Matter devices to Apple/Google/Alexa
+- [ ] Commission Aqara Hub M3 into Apple Home and Google Home fabrics
+- [ ] Consider ESP32 Bluetooth proxies for improved BLE coverage
+- [ ] Evaluate moving panda's primary IP to VLAN 50 for easier management
+- [ ] Add Nest Hub as Google Thread Border Router
+- [ ] Document automations and scenes in a dedicated wiki page
--- a/homelab/concepts/smart-home.md
+++ b/homelab/concepts/smart-home.md
@@ -0,0 +1,74 @@
+---
+title: Smart Home
+created: 2026-05-10
+updated: 2026-05-10
+type: concept
+tags: [smart-home, iot, concept, home-assistant, matter, moc]
+aliases: [IoT, Smart Home, Home Automation]
+confidence: high
+---
+
+# 🏠 Smart Home
+
+> Start here for everything smart home. All IoT devices, ecosystems, and automation documentation linked from this page.
+
+## Architecture at a Glance
+
+- **Central hub**: [[panda]] running Home Assistant OS (RPi, IoT VLAN 30)
+- **Zigbee/Thread coordinator**: [[home-assistant-connect-zbt-2]] (Connect ZBT-2 dongle)
+- **Matter bridge**: [[aqara-hub-m3]] (bridges Zigbee devices to Apple/Google/Alexa)
+- **Voice pipeline**: Whisper (STT) → Piper (TTS) → openWakeWord on [[panda]]
+- **38 IoT devices** across 12 rooms, 3 floors
+
+## Quick Navigation
+
+### 📋 Inventories
+- **[[network-device-census]]** — Every device on the network, classified
+- **[[iot-device-inventory]]** — IoT devices by room with protocol details
+- **[[device-placement-policy]]** — Which VLAN each device class belongs on
+
+### 🔗 Ecosystems
+- **[[matter-multi-fabric]]** — How devices are shared across HA / Apple / Google / Alexa
+- **[[smart-home-handbook]]** — Operational guide (access, troubleshooting, improvements)
+
+### 🖥️ Hardware
+- **[[panda]]** — HA host (RPi, HAOS, dual-homed)
+- **[[home-assistant-connect-zbt-2]]** — Zigbee + Thread coordinator
+- **[[aqara-hub-m3]]** — Aqara Matter hub/bridge
+
+## Ecosystem Controllers
+
+| Ecosystem | Controller | Location | Protocol |
+|-----------|-----------|----------|----------|
+| Home Assistant | [[panda]] + Connect ZBT-2 | Office | Matter/Thread/Zigbee |
+| Apple Home | Apple TV 4K gen 3 | Office | Matter |
+| Google Home | Nest Thermostat | Hall (3rd) | WiFi/Matter |
+| Amazon Alexa | 4× Echo Dot | Office/Kitchen/Bedroom/Garage | Matter |
+
+## Devices by Floor
+
+### 1st Floor (Office, Entrance, Garage)
+- Apple TV 4K, Office Echo, Shelly 1PM (office light)
+- Aqara Lock U100, Doorbell G410, Light Switches (×2)
+- Camera Hub G3, Garage Echo
+
+### 2nd Floor (Living Room, Kitchen, Dining)
+- LG OLED TV, Kitchen Echo, KP115 (tall lamp)
+- Aqara Motion Sensor P1, IKEA STARKVIND purifier
+- Govee lights (×3), Levoit Vital 200S purifier
+
+### 3rd Floor (Bedroom, Baby Room, Hall, Laundry)
+- Aqara Hub M3, Bedroom Echo, Shelly 1PM (bedroom light)
+- Aqara Light Switches (Baby Room + Hallway)
+- Aqara Ceiling Light 36W, Govee LED strip
+- Nest Thermostat, HA Voice PE
+
+### Rooftop
+- Aqara Door/Window Sensor, Aqara Vibration Sensor T1
+
+## Open Tasks
+- [ ] Commission Aqara Hub M3 into Apple Home
+- [ ] Commission Aqara Hub M3 into Google Home
+- [ ] Commission Aqara Hub M3 into Alexa
+- [ ] Set up HA Matter Bridge for WiFi devices
+- [ ] Verify Thread credentials match across all border routers
--- a/homelab/concepts/sso-authentik.md
+++ b/homelab/concepts/sso-authentik.md
@@ -0,0 +1,62 @@
+---
+title: SSO with Authentik
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, sso, services]
+sources: [../../homelab/architecture.md, ../../platform-config/overview.md]
+---
+
+# SSO with Authentik
+
+Authentik provides SSO identity provider for the homelab via OAuth2/OIDC. Traefik middleware enforces authentication on internal services.
+
+## Architecture
+
+```
+User → Service (protected by authentik-auth middleware)
+              ↓
+       Traefik middleware
+              ↓
+       Authentik Server (ubuntu)
+       auth.tophermayor.com
+              ↓
+       OAuth2/OIDC flow
+              ↓
+       Redirect with token
+```
+
+## Services Using SSO
+
+| Service | URL | SSO Method |
+|---------|-----|-----------|
+| Authentik | auth.tophermayor.com | Direct |
+| Jellyfin | jellyfin.tophermayor.com | Authentik OAuth2 |
+| Immich | immich.tophermayor.com | Authentik OAuth2 |
+| Traefik Dashboard | traefik.local.tophermayor.com | local-only middleware |
+
+## Authentik Components
+
+| Component | Description |
+|-----------|-------------|
+| Authentik Server | Main SSO application (ubuntu) |
+| Authentik Worker | Background task processing |
+| Authentik Redis | Session caching |
+
+## Database
+
+Authentik uses the `postgres-shared` PostgreSQL instance on ubuntu (`authentik` database).
+
+## Traefik Middleware
+
+```
+authentik-auth@file
+```
+
+Applied to services that need SSO. Users are redirected to Authentik login, then back with a valid session cookie.
+
+## Related
+
+- [[authentik]] — Authentik entity page
+- [[ubuntu]] — Hosts Authentik server
+- [[docker-traefik-stack]] — Docker, Traefik, and container orchestration
--- a/homelab/concepts/subscriptions.md
+++ b/homelab/concepts/subscriptions.md
@@ -0,0 +1,110 @@
+---
+title: Subscriptions & Paid Services
+created: 2026-05-24
+updated: 2026-05-24
+type: concept
+tags: [services, infrastructure, billing]
+confidence: high
+---
+
+# Subscriptions & Paid Services
+
+## Overview
+
+Comprehensive catalog of all paid subscriptions — both self-hosted services (infrastructure Chris pays for) and external SaaS/cloud services.
+
+---
+
+## External Subscriptions (Paid Services)
+
+### Cloud Infrastructure
+
+| Service | Cost | Purpose | Payment Method |
+|---------|------|---------|----------------|
+| **Cloudflare** | ~$20/mo | DNS + proxy + TLS certs for `*.tophermayor.com` | Credit card |
+| **Backblaze B2** | ~$7/mo | Off-site backup storage (Cold tier, ~2TB) | Credit card |
+
+### VPN
+
+| Service | Cost | Purpose | Payment Method |
+|---------|------|---------|----------------|
+| **NordVPN** | ~$12/mo | WireGuard tunnel for media stack downloads | Credit card |
+
+### Development Tools
+
+| Service | Cost | Purpose | Payment Method |
+|---------|------|---------|----------------|
+| **GitHub** | ~$4/mo | Private repos (copilot, actions) | GitHub billing |
+| **Obsidian Sync** | ~$8/mo | Vault sync across devices | Obsidian account |
+
+### Historical / Retired
+
+| Service | Cost | Purpose | Status |
+|---------|------|---------|--------|
+| **Tailnet (Tailscale)** | ~$5/mo/person | VPN mesh for outside players to reach Bedrock servers | Active for Bedrock sharing only |
+| **Backblaze Personal** | — | Decommissioned — B2 replaced this | Retired |
+| **Google Workspace** | — | Decommissioned — moved to self-hosted | Retired |
+
+---
+
+## Self-Hosted Services (Infrastructure You Pay For)
+
+These are services Chris runs on homelab hardware. The "cost" is the hardware + power + internet, not a subscription fee.
+
+### Primary Infrastructure Hosts
+
+| Host | Hardware | Cost Basis | Role |
+|------|----------|-----------|------|
+| **ubuntu** (Proxmox VM) | Intel NUC or similar | Power + hardware amortized | ~70 containers: Traefik, media stack, Gitea, monitoring |
+| **grizzley** | Raspberry Pi 5 | ~$150 one-time + power | Edge ingress, Traefik ACME, Minecraft Bedrock, Hermes |
+| **ice** | Raspberry Pi 4 | ~$100 one-time + power | OpenCode control node, Hermes gateway |
+| **pve** (Proxmox) | Bare metal | ~$800 one-time + power | Hypervisor for ubuntu VM + TrueNAS VM |
+| **truenas** | TrueNAS SCALE VM | Runs on pve | 36TB raw storage (ZFS), NFS exports |
+
+### Self-Hosted Services (No Subscription Fee)
+
+All of these run on homelab hardware — no per-service license fee:
+
+| Service | Host | URL | Purpose |
+|---------|------|-----|---------|
+| **Traefik** | ubuntu + grizzley | `traefik.local.tophermayor.com` | Reverse proxy / ingress |
+| **Authentik** | ubuntu | `auth.tophermayor.com` | SSO identity provider |
+| **Gitea** | ubuntu | `gitea.tophermayor.com` | Private Git server |
+| **Jellyfin** | grizzley | `jellyfin.tophermayor.com` | Media streaming |
+| **Immich** | ubuntu | `immich.tophermayor.com` | Photo/video backup |
+| **Sonarr/Radarr/Lidarr** | ubuntu | `sonarr.local.tophermayor.com` etc. | Media automation |
+| **Prometheus + Grafana** | ubuntu | `grafana.local.tophermayor.com` | Monitoring |
+| **Home Assistant** | panda | `ha.tophermayor.com` | Smart home hub |
+| **Vaultwarden** | grizzley | `vaultwarden.tophermayor.com` | Password manager |
+| **OpenCode** | ice + ubuntu | `opencode.tophermayor.com` | AI coding assistant |
+| **Hermes Agent** | grizzley + ice | Port 8644 | Telegram AI agent |
+| **Navidrome** | ubuntu | — | Music streaming |
+| **Kavita** | ubuntu | — | Ebook/comic reader |
+| **Audiobookshelf** | ubuntu | — | Audiobook/podcast server |
+| **Tdarr** | ubuntu | `tdarr.local.tophermayor.com` | Media transcoding |
+| **Komodo** | grizzley | `komodo.local.tophermayor.com` | Container management |
+| **Uptime Kuma** | grizzley | — | Uptime monitoring |
+| **Minecraft Bedrock** | grizzley | — | Game server |
+
+---
+
+## Cost Summary
+
+| Category | Monthly Cost |
+|----------|-------------|
+| Cloud services (Cloudflare + Backblaze) | ~$27/mo |
+| VPN (NordVPN) | ~$12/mo |
+| Developer tools (GitHub + Obsidian) | ~$12/mo |
+| Hardware (amortized over 3 years) | ~$30/mo |
+| **Total** | **~$81/mo** |
+
+---
+
+## Related
+
+- [[ubuntu]] — primary Docker host running most services
+- [[grizzley]] — edge ingress node
+- [[ice]] — OpenCode control node
+- [[truenas]] — storage with B2 backup tier
+- [[media-stack]] — media automation services
+- [[monitoring-pipeline]] — alerting and observability
--- a/homelab/concepts/traefik-ha.md
+++ b/homelab/concepts/traefik-ha.md
@@ -0,0 +1,108 @@
+---
+title: Traefik High Availability
+created: 2026-04-28
+updated: 2026-05-14
+type: concept
+tags: [concept, networking, services]
+sources: [../../homelab/architecture.md, ../../platform-config/overview.md]
+---
+
+# Traefik High Availability
+
+Two Traefik v3.6.7 instances provide ingress — one on ubuntu (primary router), one on grizzley (edge ACME). Certificates are synced via NFS.
+
+## Architecture
+
+```
+Internet → Cloudflare DNS → *.tophermayor.com
+                               ↓
+              ┌────────────────┴────────────────┐
+              ↓                                  ↓
+    grizzley Traefik                    ubuntu Traefik
+    (edge ACME)                         (primary router)
+    192.168.50.84                      192.168.50.61
+              │                                  │
+              │  TLS certs on NFS               │
+              └──────────→ /mnt/truenas/traefik-certs/grizzley ←─┘
+```
+
+## Roles
+
+| Instance | Host | Primary Role |
+|----------|------|-------------|
+| Traefik Pi | grizzley (192.168.50.84) | Edge ACME — generates wildcard certs via Cloudflare DNS challenge |
+| Traefik (ubuntu) | ubuntu (192.168.50.61) | Primary router — handles ~90% of traffic, syncs certs from grizzley |
+
+## Certificate Flow
+
+1. Grizzley Traefik runs Cloudflare DNS challenge, writes certs to NFS mount `/mnt/truenas/traefik-certs/grizzley`
+2. Ubuntu Traefik references same certs via NFS share
+3. Both instances serve the same wildcard `*.tophermayor.com` cert
+
+## Dynamic Config Files
+
+Located in `homelab/ubuntu/traefik/config/dynamic/`:
+
+| File | Services |
+|------|----------|
+| `canonical-hosts.yml` | Grizzley ingress proxy, PVE OpenCode |
+| `gitea.yml` | gitea.tophermayor.com |
+| `immich.yml` | immich.tophermayor.com |
+| `jellyfin.yml` | jellyfin.tophermayor.com |
+| `media-stack.yml` | Sonarr, Radarr, SABnzbd, Prowlarr, qBittorrent |
+| `middlewares.yml` | 30+ middleware definitions |
+| `opencode.yml` | opencode.tophermayor.com |
+| `proxmox.yml` | proxmox.local.tophermayor.com |
+| `homepage-widgets.yml` | Homepage service definitions |
+| `audiobookshelf.yml` | Audiobookshelf (CT 108) |
+| `jellyseerr.yml` | Jellyseerr (CT 106) |
+| `kavita.yml` | Kavita (CT 108) |
+| `navidrome.yml` | Navidrome (CT 107) |
+| `stremio.yml` | Stremio Server |
+
+## Common Middlewares
+
+| Middleware | Purpose |
+|------------|---------|
+| `local-only@file` | Restrict to local network IPs |
+| `authentik-auth@file` | SSO authentication |
+| `security-headers@file` | Add security headers |
+| `crowdsec-bouncer@file` | Rate limiting and threat protection |
+
+## Entry Points
+
+- `web` — port 80, HTTP → HTTPS redirect
+- `websecure` — port 443, TLS termination
+- `metrics` — port 8080, Prometheus metrics
+
+## Outage Postmortem: 2026-05-14
+
+**Severity:** Complete file provider failure — all `@file` routers and dependent `@docker` routers offline.
+
+**Root Cause:** Media migration wrote 7 YAML dynamic config files with mangled backtick quoting, causing Traefik's file provider to fail parsing entirely.
+
+**Affected Files:**
+- `homepage-widgets.yml`
+- `audiobookshelf.yml`
+- `jellyseerr.yml`
+- `kavita.yml`
+- `navidrome.yml`
+- `stremio.yml`
+- `media-stack.yml`
+
+**Impact:**
+- ALL `@file` routers down (no traffic routed to static-defined services)
+- ALL `@docker` routers depending on `local-only@file` middleware also failed
+- Homepage, media services, and any service using file-defined middlewares unreachable
+
+**Fix:** Rewrote all 7 YAML files with correct quoting. Renamed conflicting service names in `homepage-widgets.yml` that were colliding with other provider definitions.
+
+**Lesson:** Traefik file provider is all-or-nothing — one broken YAML file crashes the entire provider, taking down all file-defined routers and middlewares (even unrelated ones). Validate YAML before deploying.
+
+## Related
+
+- [[traefik]] — Traefik entity page
+- [[grizzley]] — RPi5 edge node running edge Traefik
+- [[ubuntu]] — Primary Docker host running primary Traefik
+- [[truenas]] — NFS storage for cert sync
+- [[docker-traefik-stack]] — Docker, Traefik, and container orchestration
--- a/homelab/concepts/vm-storage-policy.md
+++ b/homelab/concepts/vm-storage-policy.md
@@ -0,0 +1,60 @@
+---
+title: VM Storage Policy
+created: 2026-04-28
+updated: 2026-04-28
+type: concept
+tags: [concept, storage, ubuntu, homelab]
+confidence: high
+---
+
+# VM Storage Policy
+
+Storage rules for application data on the Ubuntu host (192.168.50.61). All agents and developers managing services on Ubuntu MUST follow these rules.
+
+## Rule 1: User-Uploaded Data on NFS
+
+Store ALL user-uploaded data on TrueNAS NFS shares, NOT on the VM's local disk.
+
+**Allowed NFS Paths:**
+- `/mnt/PersonalMediaLibrary/` — Personal media, photos (Immich)
+- `/mnt/truenas/mediadata/` — Media library (Movies, TV, Music)
+- `/mnt/truenas-backup/` — Backups
+
+**Examples:**
+```yaml
+volumes:
+  - /mnt/PersonalMediaLibrary/immich/upload:/usr/src/app/upload
+  - /mnt/truenas/mediadata/media:/media
+```
+
+## Rule 2: Config Files on VM
+
+Configuration files, databases, and cached data CAN stay on VM local disk.
+
+**Allowed Local Paths:**
+- `/home/bear/homelab/ubuntu/{service}/` — Docker compose and config
+- `./config`, `./cache` (relative to docker-compose) — Config/cache directories
+
+## Rule 3: NFS Mounts Must Be in fstab
+
+Before using an NFS path in docker-compose, verify it exists in `/etc/fstab` for persistence.
+
+```bash
+cat /etc/fstab | grep nfs
+```
+
+## Summary
+
+| Data Type | Storage Location | Example |
+|-----------|-----------------|---------|
+| User uploads | NFS (TrueNAS) | Photos, media |
+| App config | VM local | docker-compose.yml, config/ |
+| Databases | VM local (postgres-shared) | PostgreSQL, Redis |
+| Media library | NFS (TrueNAS) | Movies, TV, Music |
+| Backups | NFS (TrueNAS) | Application backups |
+
+## Related
+
+- [[nfs-storage|NFS Storage]] — TrueNAS NFS mount strategy
+- [[truenas|TrueNAS]] — network-attached storage host
+- [[ubuntu|ubuntu]] — primary Docker host
--- a/homelab/docs/ai-applications.md
+++ b/homelab/docs/ai-applications.md
@@ -0,0 +1,44 @@
+---
+project:
+  name: AI Applications
+  status: active
+  category: application
+  source: live-verification
+  created: 2026-04-19
+  updated: 2026-04-19
+  description: AI application services running on ubuntu including job pipeline, alert aggregation, and media intelligence
+  tags: [ai, applications, infrastructure]
+---
+
+# AI Application Services
+
+AI-powered application services running on ubuntu (192.168.50.61).
+
+## Services
+
+| Service | Status | Purpose |
+|---------|--------|---------|
+| **AI Job Pipeline** | Backend restarting | AI-driven job orchestration (frontend + backend + postgres) |
+| **AI Alert Aggregator** | Backend restarting | AI-powered alert aggregation (frontend + backend + postgres) |
+| **AI Media Intelligence** | Backend restarting | AI media analysis and intelligence |
+| **AI Subscriptions** | Healthy | AI subscription management |
+| **Homelab Inventory** | Backend restarting | Automated infrastructure inventory |
+
+## Other Application Services
+
+| Service | Purpose | Status |
+|---------|---------|--------|
+| **Docker Registry** | Private container image registry | Running |
+| **Docker OSX** | macOS VM in Docker for testing | Running |
+| **Faster Whisper Server** | Local speech-to-text (CUDA) | Healthy |
+
+## Notes
+
+- Several AI application backends are in a restart loop — may need investigation
+- All services are Docker containers on ubuntu
+- Docker Registry provides private image hosting at `registry:5000`
+
+## Related
+
+- [[../architecture.md|Homelab Architecture]]
+- [[../../homelab/raw/articles/ai-assistant/project.md|AI Assistant Configuration]]
--- a/homelab/docs/grizzley-services.md
+++ b/homelab/docs/grizzley-services.md
@@ -0,0 +1,73 @@
+---
+project:
+  name: Grizzley Infrastructure Services
+  status: active
+  category: infrastructure
+  source: live-verification
+  created: 2026-04-19
+  updated: 2026-04-19
+  description: Services running on grizzley (Raspberry Pi 5) including Komodo, Hermes, Vaultwarden, and Minecraft
+  tags: [infrastructure, grizzley, komodo, hermes, minecraft]
+---
+
+# Grizzley Services
+
+All services running on grizzley (192.168.50.84, Raspberry Pi 5, Ubuntu 25.10).
+
+## Infrastructure
+
+| Service | Image | Status | Purpose |
+|---------|-------|--------|---------|
+| **Traefik** (traefik-pi) | traefik:v3.6.7 | Healthy | Edge ingress, primary ACME certificate source |
+| **Homepage** | homepage-grizzley | Healthy | Startpage dashboard |
+| **Komodo** | komodo | Healthy | Docker Compose stack management (core) |
+| **Komodo MongoDB** | komodo-mongo | Healthy | Komodo database |
+
+## AI & Management
+
+| Service | Image | Status | Purpose |
+|---------|-------|--------|---------|
+| **aiomanager** | aiomanager | Healthy | AI operations manager |
+| **aiomanager_db** | aiomanager_db | Healthy | AI manager database |
+
+## Migrated Services
+
+These services were migrated from ubuntu to grizzley:
+
+| Service | Purpose | Notes |
+|---------|---------|-------|
+| **Vaultwarden** | Password manager | DB via remote postgres-shared on ubuntu |
+| **Uptime Kuma** | Uptime monitoring | Self-contained SQLite |
+
+## Gaming
+
+| Service | Port | Purpose |
+|---------|------|---------|
+| **Minecraft Bedrock (standby)** | UDP/19132 | Primary Minecraft Bedrock server |
+| **Minecraft Bedrock (sison)** | UDP/19134 | Secondary Minecraft Bedrock server |
+
+## Hermes Agent
+
+Systemd service (`hermes-gateway.service`) providing:
+- Telegram bot integration for alerts and management
+- Webhook on port 8644 for Prometheus Alertmanager
+- SSH-based homelab monitoring
+- 3 cron jobs: Health Check (15m), Container Monitor (30m), Maintenance (6h)
+
+## Komodo Stack Management
+
+Komodo manages Docker Compose stacks on both ubuntu and grizzley:
+- Mode: `files_on_host` — runs `docker compose` in existing host directories
+- 19 stacks registered (14 ubuntu, 5 grizzley)
+- Periphery agent runs on each host, connects to Komodo Core on grizzley
+
+## Network
+
+- External network: `traefik-proxy` for Traefik-routed services
+- Internal network: `komodo-internal` for MongoDB isolation
+- NFS-mounted certs from TrueNAS: `/mnt/truenas/traefik-certs/grizzley`
+
+## Related
+
+- [[../architecture.md|Homelab Architecture]]
+- [[../project.md|Homelab Project]]
--- a/homelab/docs/ice-host.md
+++ b/homelab/docs/ice-host.md
@@ -0,0 +1,51 @@
+---
+project:
+  name: Ice Host
+  status: active
+  category: infrastructure
+  source: live-verification
+  created: 2026-04-19
+  updated: 2026-04-19
+  description: Ice control plane host (Raspberry Pi 4) running OpenCode and utility services
+  tags: [infrastructure, ice, control-plane, opencode]
+---
+
+# Ice Host (192.168.50.197)
+
+Control plane node running on Raspberry Pi 4 with Ubuntu 25.10 (aarch64).
+
+## Services
+
+### Systemd Services
+
+| Service | Status | Port | Purpose |
+|---------|--------|------|---------|
+| `opencode-web.service` | Active/Enabled | 4096 | OpenCode web interface |
+| `docker.service` | Active | - | Docker Engine |
+
+### Docker Containers
+
+| Container | Image | Status | Purpose |
+|-----------|-------|--------|---------|
+| camofox | camofox:aarch64 | Up 3 days | Camofox utility service |
+
+### Not Running
+
+- **Nanobot** — Previously planned AI agent, never deployed
+- **App Factory** — Config exists in `homelab/ice/` but not currently running
+
+## Configuration
+
+- OpenCode config: `homelab/ice/opencode.json`
+- App Factory: `homelab/ice/` (memoir.json, oh-my-opencode.json, systemd/)
+
+## Key Facts
+
+- No Docker socket available for Komodo Periphery
+- OpenCode runs via systemd (not Docker)
+- Minimal host — focused on OpenCode and lightweight services
+
+## Related
+
+- [[../architecture.md|Homelab Architecture]]
+- [[opencode-cluster.md|OpenCode Cluster]]
--- a/homelab/docs/media-extensions.md
+++ b/homelab/docs/media-extensions.md
@@ -0,0 +1,61 @@
+---
+project:
+  name: Media Extensions
+  status: active
+  category: infrastructure
+  source: live-verification
+  created: 2026-04-19
+  updated: 2026-04-19
+  description: Expanded media stack including music, ebooks, audiobooks, manga, and media quality management
+  tags: [infrastructure, media, music, ebooks, audiobooks]
+---
+
+# Media Extensions
+
+Beyond the core media stack (Radarr, Sonarr, Jellyfin), the homelab runs extended media services for music, ebooks, audiobooks, and quality management.
+
+## Music Services
+
+| Service | Image | Purpose | Status |
+|---------|-------|---------|--------|
+| **Navidrome** | deluan/navidrome | Music streaming server | Unhealthy |
+| **Lidarr** | linuxserver/lidarr | Music automation (arr) | Unhealthy |
+| **Musicseerr** | localhost:5000/musicseerr | Music request system | Healthy |
+
+## Ebook & Reading Services
+
+| Service | Image | Purpose | Status |
+|---------|-------|---------|--------|
+| **Calibre** | linuxserver/calibre | Ebook library management | Running |
+| **Calibre-Web** | linuxserver/calibre-web | Web ebook reader | Healthy |
+| **Kavita** | jvmilazz0/kavita | Manga/comic reader | Healthy |
+| **LazyLibrarian** | linuxserver/lazylibrarian | Book automation (arr) | Healthy |
+
+## Audiobook Services
+
+| Service | Image | Purpose | Status |
+|---------|-------|---------|--------|
+| **Audiobookshelf** | advplyr/audiobookshelf | Audiobook/podcast server | Unhealthy |
+
+## Media Management
+
+| Service | Image | Purpose | Status |
+|---------|-------|---------|--------|
+| **RecCollection** | docker-local-backend | Media collection manager | Healthy |
+| **Unified Media Manager** | unified-media-manager | Unified media management | Healthy |
+| **Stremio Server** | stremio/server | Media streaming | Healthy |
+| **NZBdav** | nzbdav/nzbdav | Usenet WebDAV access | Running |
+
+## Media Quality Assurance
+
+| Service | Image | Purpose |
+|---------|-------|---------|
+| **Recyclarr** | recyclarr/recyclarr | Radarr/Sonarr quality profile management |
+| **Analyzarr** | media-qa-analyzarr | Media file quality analysis |
+
+All media services run on **ubuntu** (192.168.50.61). Media files are stored on TrueNAS NFS at `/mnt/truenas/mediadata/`.
+
+## Related
+
+- [[../architecture.md|Homelab Architecture]]
+- [[../project.md|Homelab Project]]
--- a/homelab/docs/opencode-cluster.md
+++ b/homelab/docs/opencode-cluster.md
@@ -0,0 +1,61 @@
+---
+project:
+  name: OpenCode Cluster
+  status: active
+  category: infrastructure
+  source: live-verification
+  created: 2026-04-19
+  updated: 2026-04-19
+  description: OpenCode AI coding assistant cluster deployment across homelab hosts
+  tags: [infrastructure, opencode, ai, cluster]
+---
+
+# OpenCode Cluster Deployment
+
+OpenCode AI coding assistant deployed as systemd services across the homelab cluster.
+
+## Instances
+
+| Instance | Host | Port | Traefik Route | Status |
+|----------|------|------|---------------|--------|
+| ubuntu | 192.168.50.61 | 4096 | opencode.tophermayor.com | Active/Enabled |
+| ice | 192.168.50.197 | 4096 | opencode-ice.tophermayor.com | Active/Enabled |
+| grizzley | 192.168.50.84 | 4096 | — | Inactive/Disabled |
+
+## Service Management
+
+All instances run as `opencode-web.service` via systemd:
+
+```bash
+# Check status
+systemctl status opencode-web
+
+# Restart
+sudo systemctl restart opencode-web
+
+# View logs
+journalctl -u opencode-web -f
+```
+
+## Shared Infrastructure
+
+- **Qdrant** (192.168.50.61:6333) — Shared vector memory backend
+- **Ollama** (192.168.50.61:11434) — Local embedding generation
+
+## Configuration
+
+Per-host config files in `homelab/<host>/opencode/`:
+- `opencode.json` — Main OpenCode configuration
+- `oh-my-opencode.json` — Framework configuration
+
+## Traefik Routing
+
+OpenCode instances use dedicated Traefik middlewares:
+- `local-only@file` — IP whitelist
+- `opencode-streaming@file` — SSE support
+- `opencode-cors@file` — CORS headers
+
+## Related
+
+- [[../architecture.md|Homelab Architecture]]
+- [[../../homelab/raw/articles/ai-assistant/project.md|AI Assistant Configuration]]
--- a/homelab/docs/runbooks/oh-my-opencode-setup.md
+++ b/homelab/docs/runbooks/oh-my-opencode-setup.md
@@ -0,0 +1,52 @@
+# oh-my-opencode Setup & Troubleshooting Runbook
+
+## Overview
+This runbook covers the steps required to enable `oh-my-opencode` properly, ensuring all primary agents (Sisyphus, Atlas, Prometheus) load and function correctly across the homelab infrastructure.
+
+## Problem Context
+Initially, `oh-my-opencode` was installed but failed to load primary agents. Symptoms included missing agents in the TUI and logs showing plugins loading except for `oh-my-opencode`.
+
+## Root Causes Identified
+1.  **Malformed Configuration**: `oh-my-opencode.json` had broken JSON syntax and missing agent/hook blocks.
+2.  **Plugin Loading Order**: `oh-my-opencode` was not the first plugin in `opencode.json`, potentially causing initialization delays or conflicts.
+3.  **Missing Built-in Definitions**: Primary agents were not explicitly defined with correct model/category mappings.
+
+## Step-by-Step Enablement
+
+### 1. Update `opencode.json`
+Ensure `oh-my-opencode@latest` is the first plugin in the list. This ensures it initializes before other plugins that might depend on it or conflict with its hooks.
+
+```json
+"plugin": [
+  "oh-my-opencode@latest",
+  "opencode-antigravity-auth@latest",
+  "./plugin/kilocode/plugin_kilocode.ts"
+]
+```
+
+### 2. Standardize `oh-my-opencode.json`
+Apply the standardized configuration with all hooks enabled and primary agents defined. Key sections to include:
+- `sisyphus_agent`: Enable planner and plan replacement.
+- `hooks`: Enable all 16+ hooks including `session-recovery`, `rules-injector`, and `think-mode`.
+- `agents`: Define `sisyphus`, `atlas`, `prometheus`, `oracle`, `librarian`, and `explore` with appropriate models.
+
+### 3. Verify Plugin Loading
+Check OpenCode logs for successful plugin initialization:
+```bash
+grep "service=plugin.*loading" ~/.local/share/opencode/log/*.log
+```
+Look for: `service=plugin path=...oh-my-opencode/dist/index.js loading plugin`
+
+### 4. Verify Agents in TUI
+Launch OpenCode and verify `Sisyphus` appears in the agent selection. Also test slash commands like `/refactor` or `/git-master`.
+
+## GitOps Workflow
+All configuration changes must be made in the `homelabagentroot` repository and pushed to trigger the automated deployment sync.
+
+1.  Edit configs in `homelab/configs/opencode-global/`
+2.  Commit and push to `origin main`
+3.  The Gitea runner will pull changes and restart services as configured.
+
+---
+**Last Updated:** January 25, 2026
+**Status:** Verified Working ✅
--- a/homelab/docs/unifi-execution-plan.md
+++ b/homelab/docs/unifi-execution-plan.md
@@ -0,0 +1,134 @@
+---
+project:
+  name: UniFi Execution Plan
+  status: active
+  category: infrastructure
+  source: homelabagentroot
+  created: 2026-03-17
+  updated: 2026-03-17
+  description: Exact staged UniFi zone and firewall change plan derived from current live state and authoritative host repos
+  goals:
+    - Apply the minimum set of high-value zone and policy changes safely
+    - Preserve application reachability while tightening security boundaries
+    - Provide an execution sequence that supports rollback and verification
+  priority: high
+  tags: [unifi, firewall, zones, execution, planning]
+---
+
+# UniFi Execution Plan
+
+## Current Status
+
+Implemented on 2026-03-17:
+
+- `Family of D.` moved from `Management` to `Internal`
+- `Management` reduced to `Default` only
+- New `Internal` allow rules created for `Servers` (`80/443`), `IoT`, and `Staging`
+- Logging enabled on selected user-defined edge and VPN policies
+- Staged DHCP reservations enabled for `grizzley`, `ice`, and `homeassistant`
+- First host-side migration step completed for `truenas`: default gateway moved from `192.168.1.1` to `192.168.50.1`
+- `proxmox` default gateway moved from `192.168.1.1` to `192.168.50.1`
+- `ubuntu` default gateway moved from `192.168.1.1` to `192.168.50.1`
+- `proxmox` legacy `192.168.1.11` address removed from `vmbr0`
+- `ubuntu` legacy `192.168.1.61` address removed from `enp6s18`
+- `truenas` legacy `192.168.1.12` address removed from `enp6s17`
+- `grizzley` Wi-Fi config removed
+- `ice` Wi-Fi config removed
+- staging-side `192.168.40.x` addresses removed from `truenas`, `grizzley`, and `ice`
+
+Still pending:
+
+- later interface cleanup for legacy `truenas`, `proxmox`, and `ubuntu` addresses that still remain active
+- later interface cleanup for staging-side addresses that still remain active on `truenas`, `grizzley`, and `ice`
+- cleanup of stale UniFi controller observations for the removed Ubuntu legacy address
+- cleanup of stale or lagging UniFi controller observations for removed Wi-Fi paths on `grizzley` and `ice`
+- decide whether remaining infrastructure-side `192.168.30.x` addresses should persist long-term
+- deny-rule logging expansion
+- public `HTTP` exposure review
+- duplicate-rule cleanup and broader rule tightening
+- maintenance-window execution of the one-host-at-a-time migration runbook
+
+## Reservation Update Notes
+
+The UniFi controller accepted staged reservation updates for:
+
+- `grizzley` -> `192.168.10.145`
+- `ice` Wi-Fi -> `192.168.10.178`
+- `ice` wired -> `192.168.50.197`
+- `homeassistant` -> `192.168.30.196`
+- `ubuntu` -> `192.168.1.61`
+- `proxmox` -> `192.168.1.11`
+
+The active `truenas` reservation at `192.168.1.12` remains valid.
+
+Follow-up change:
+
+- the stale secondary TrueNAS fixed-IP reservation at `192.168.1.145` has been cleared; the remaining task is to decide how many live TrueNAS interfaces should persist long-term
+- Wi-Fi reservations for `grizzley` and `ice` were cleared after host-side Wi-Fi removal
+- Staging access rules were disabled after staging-side host addresses were removed
+
+## Scope
+
+This plan focuses on the first safe wave of changes:
+
+- restore `Management` as an infrastructure-only trust boundary
+- keep `Internal` for trusted user devices only
+- preserve `Guest` internet-only access
+- preserve `IoT` with narrow app exceptions
+- maintain `Servers` as the homelab application segment
+- treat `Vpn` as explicit least-privilege remote access
+
+## Phase 1: Zone Corrections
+
+1. Remove `Family of D.` from `Management`
+2. Ensure `Family of D.` is mapped to `Internal`
+3. Keep `Default` in `Management`
+4. Keep `Production` in `Servers`
+5. Keep `Will of D. IoT` in `IoT`
+6. Keep `Will of D. (Guest)` in `Guest`
+7. Keep `UGC WireGuard` in `Vpn` unless there is a deliberate reason to merge admin semantics elsewhere
+
+## Phase 2: Logging Improvements
+
+1. Enable logging on edge-facing allow rules:
+   - `External -> Web Proxy`
+   - `External -> HTTPS`
+   - `External -> HTTP` if retained
+2. Enable logging on key deny rules:
+   - `Guest -> Internal`
+   - `Guest -> Servers`
+   - `IoT -> Internal`
+   - `IoT -> Management`
+3. Enable logging on sensitive admin rules:
+   - `Vpn -> Management`
+   - `Vpn -> Servers`
+
+## Phase 3: Rule Tightening
+
+1. Review and narrow broad `Internal -> Servers` rules to app ports only
+2. Review and narrow broad `IoT -> Servers` rules to explicit media and automation ports only
+3. Review `Vpn -> Management` and reduce to the smallest needed host/port set
+4. Remove duplicate return-path rules once stateful behavior is confirmed
+5. Remove or disable `HTTP` exposure if no longer required for redirect or certificate workflows
+
+## Phase 4: Host Placement Follow-Through
+
+1. Normalize infrastructure hosts to their intended addresses where possible
+2. Keep split-plane exceptions documented explicitly, such as `panda`
+3. Revisit firewall rules after host addressing settles so the final policy set matches reality
+
+## Verification Checklist
+
+- `Management` clients can reach infrastructure admin interfaces
+- `Internal` clients can reach approved apps over `HTTPS`
+- `Guest` clients have internet access only
+- `IoT` clients can reach only approved services such as Jellyfin, Traefik, and Home Assistant where required
+- VPN clients retain the minimum access needed for admin work
+- Public apps remain reachable through the intended hardened edge
+
+## Rollback Principles
+
+- export before each major edit
+- change one zone or rule set at a time
+- verify from at least one host in each affected zone
+- keep a saved copy of previous zone membership and rule ordering
--- a/homelab/docs/unifi-final-change-report-2026-03-17.md
+++ b/homelab/docs/unifi-final-change-report-2026-03-17.md
@@ -0,0 +1,76 @@
+---
+project:
+  name: UniFi Final Change Report 2026-03-17
+  status: active
+  category: infrastructure
+  source: homelabagentroot
+  created: 2026-03-17
+  updated: 2026-03-17
+  description: Concise before-and-after report for the March 17 UniFi cleanup and host migration wave
+  goals:
+    - Capture the final outcome of the cleanup wave
+    - Summarize what changed, what was verified, and what remains
+    - Provide a short artifact suitable for handoff or archival
+  priority: medium
+  tags: [unifi, report, migration, summary]
+---
+
+# UniFi Final Change Report 2026-03-17
+
+## Before
+
+- `Management` included both `Default` and `Family of D.`
+- `ubuntu`, `proxmox`, and `truenas` still used legacy `192.168.1.x` paths
+- `grizzley` and `ice` still had active Wi-Fi participation on `Family of D.`
+- `truenas`, `grizzley`, and `ice` still had staging-side `192.168.40.x` addresses
+- staging access policies were still enabled
+
+## After
+
+- `Family of D.` now lives in `Internal`
+- `Management` now maps only to `Default`
+- legacy `192.168.1.x` removed from:
+  - `ubuntu`
+  - `proxmox`
+  - `truenas`
+- Wi-Fi removed from:
+  - `grizzley`
+  - `ice`
+- staging `192.168.40.x` removed from:
+  - `truenas`
+  - `grizzley`
+  - `ice`
+- disabled:
+  - `Vpn to Staging`
+  - `Allow Servers to Staging`
+
+## Verified Retained 192.168.30.x Paths
+
+These were intentionally retained because they still expose live service endpoints:
+
+| Host | Retained Address | Verified Ports |
+|------|------------------|----------------|
+| `ubuntu` | `192.168.30.61` | `80`, `443`, `8096` |
+| `proxmox` | `192.168.30.11` | `22`, `8006`, `3128` |
+| `grizzley` | `192.168.30.84` | `80`, `443`, `8080` |
+| `ice` | `192.168.30.197` | `22`, `4096`, `18791` |
+
+## Controller State Notes
+
+- UniFi no longer shows the removed legacy `192.168.1.61` path for `ubuntu`
+- UniFi shows `ice` only on the wired production path
+- UniFi still shows one disconnected/no-IP `grizzley` IoT-side record
+- A direct delete attempt against that stale `grizzley` client record returned `api.err.NotFound`, so the safest assumption is controller-history lag rather than an active client entry
+
+## Remaining Follow-Up
+
+- Decide service-by-service whether the retained `192.168.30.x` addresses should remain long-term
+- Allow the stale disconnected `grizzley` UniFi record to age out, or revisit if it persists
+- Review public `HTTP` exposure and duplicate firewall rules in a future maintenance pass
+
+## Related Docs
+
+- [[unifi-post-migration-summary-2026-03-17.md|UniFi Post-Migration Summary 2026-03-17]]
+- [[unifi-host-migration-runbook.md|UniFi Host Migration Runbook]]
+- [[unifi-execution-plan.md|UniFi Execution Plan]]
+- [[unifi-rollback-2026-03-17.md|UniFi Rollback 2026-03-17]]
--- a/homelab/docs/unifi-host-migration-checklist.md
+++ b/homelab/docs/unifi-host-migration-checklist.md
@@ -0,0 +1,111 @@
+---
+project:
+  name: UniFi Host Migration Checklist
+  status: planning
+  category: infrastructure
+  source: homelabagentroot
+  created: 2026-03-17
+  updated: 2026-03-17
+  description: Host-by-host checklist for aligning live UniFi placement with authoritative host repo intent
+  goals:
+    - Normalize infrastructure hosts to intended network zones
+    - Reduce accidental dual-homing and cross-zone ambiguity
+    - Preserve app reachability during staged network changes
+  priority: high
+  tags: [unifi, migration, hosts, checklist, planning]
+---
+
+# UniFi Host Migration Checklist
+
+## Overview
+
+This checklist breaks the UniFi optimization work into host-specific actions. It is written to support staged execution and validation.
+
+## Shared Pre-Checks
+
+- [ ] Export current UniFi networks, zones, and firewall policies
+- [ ] Confirm DHCP reservations for all infrastructure hosts
+- [ ] Confirm DNS records that point at `ubuntu`, `grizzley`, `ice`, `proxmox`, `truenas`, `panda`, and `traefik-lxc`
+- [ ] Confirm out-of-band or fallback admin access for each host before moving network placement
+- [ ] Enable logging on critical deny and edge allow rules before major topology changes
+
+## Current Staged-Cutover Status
+
+- [x] `Family of D.` moved from `Management` to `Internal`
+- [x] `Management` reduced to `Default` only
+- [x] Staged DHCP reservation enabled for `grizzley` Wi-Fi path at `192.168.10.145`
+- [x] Staged DHCP reservations enabled for `ice` at `192.168.10.178` and `192.168.50.197`
+- [x] Staged DHCP reservation enabled for `homeassistant` app plane at `192.168.30.196`
+- [x] `ubuntu` reservation normalized to its current live `Default` network address `192.168.1.61`
+- [x] `proxmox` reservation refreshed and validated through UniFi at `192.168.1.11`
+- [x] `truenas` primary reservation confirmed at `192.168.1.12`
+
+Follow-up findings:
+
+- `ubuntu` and `proxmox` accepted the legacy fixed-IP update format and now reflect their current live `Default` network addresses correctly in UniFi.
+- `truenas` already had a valid primary reservation at `192.168.1.12` plus a second physical-NIC reservation at `192.168.1.145`.
+- The `truenas` update conflict came from the second NIC record, not from the active primary reservation itself.
+
+## Ubuntu
+
+Current intent: primary Docker host and public/internal app edge on `192.168.50.61`
+
+- [ ] Confirm whether `ubuntu` should live only on `Production` or stay dual-homed during migration
+- [ ] If moving, create or verify reservation for `192.168.50.61`
+- [ ] Ensure Traefik, Authentik, Gitea, Vaultwarden, and OpenCode URLs resolve to the correct server-side path
+- [ ] Verify inbound `HTTPS` routes after network normalization
+- [ ] Remove stale `Default`-side assumptions from firewall rules after validation
+
+## Grizzley
+
+Current intent: edge ingress on `192.168.50.84`
+
+- [ ] Verify whether the current `192.168.10.145` presence is intentional or drift
+- [ ] Confirm the desired primary address remains `192.168.50.84`
+- [ ] Keep Traefik and admin access in `Servers` and `Management`, not `Internal`
+- [ ] Remove any unintended trusted-client or Wi-Fi placement once validated
+
+## Ice
+
+Current intent: control-plane infrastructure on `192.168.50.197`
+
+- [ ] Verify whether `192.168.10.178` is an intentional secondary path
+- [ ] Keep control-plane traffic anchored to `Production`
+- [ ] Limit any secondary management path to a documented admin-only use case
+- [ ] Remove broad `Internal`-side reachability if the extra placement is not required
+
+## Proxmox
+
+Current intent: infrastructure-only hypervisor on `192.168.50.11`
+
+- [ ] Confirm the hypervisor should not remain on `192.168.1.11`
+- [ ] Verify management-only access to the hypervisor UI and SSH
+- [ ] Confirm `traefik-lxc` (`192.168.50.115`) and other LXC workloads remain server-side only
+- [ ] Review whether any user networks directly reach Proxmox today and remove that access if unnecessary
+
+## TrueNAS
+
+Current intent: storage-only host on `192.168.50.12`
+
+- [ ] Confirm whether `192.168.1.12` is a legacy path, active secondary interface, or stale observation
+- [ ] Keep storage admin access on `Management` and selected server workflows only
+- [ ] Confirm mounts and NFS exports still resolve correctly after address normalization
+- [ ] Document the final intended interface model explicitly
+
+## Panda / Home Assistant
+
+Current intent: app endpoint on `192.168.30.196`, SSH/admin endpoint on `192.168.50.196`
+
+- [ ] Preserve the split app/admin model unless there is a strong reason to collapse it
+- [ ] Confirm Home Assistant app access remains available from intended `Internal`, `Management`, and selected `IoT` clients
+- [ ] Restrict admin SSH path to `Management` and approved VPN clients
+- [ ] Keep Home Assistant runtime state out of Git-tracked locations
+
+## Post-Migration Validation
+
+- [ ] Confirm all host DHCP reservations and names resolve correctly
+- [ ] Confirm reverse proxy paths for public and internal apps
+- [ ] Confirm Home Assistant, Jellyfin, Gitea, Vaultwarden, and Authentik remain reachable from intended zones
+- [ ] Confirm guests have internet-only access
+- [ ] Confirm IoT devices can reach only their approved service exceptions
+- [ ] Confirm VPN access is least-privilege and still sufficient for admin work
--- a/homelab/docs/unifi-host-migration-runbook.md
+++ b/homelab/docs/unifi-host-migration-runbook.md
@@ -0,0 +1,153 @@
+---
+project:
+  name: UniFi Host Migration Runbook
+  status: planning
+  category: infrastructure
+  source: homelabagentroot
+  created: 2026-03-17
+  updated: 2026-03-17
+  description: One-host-at-a-time runbook for moving infrastructure from 192.168.1.x drift toward documented 192.168.50.x placement
+  goals:
+    - Migrate infrastructure hosts without lockout
+    - Validate services and routing after each host move
+    - Preserve rollback options at every step
+  priority: high
+  tags: [unifi, migration, runbook, infrastructure]
+---
+
+# UniFi Host Migration Runbook
+
+## Strategy
+
+Use a staged maintenance-window approach. Move one host at a time, verify service reachability, then continue.
+
+## Pre-Migration Rules
+
+- Keep working SSH access before changing a host address
+- Keep DHCP reservation and target network prepared before host cutover
+- Verify DNS, reverse proxy, and firewall reachability after each move
+- Roll back immediately if the management path or primary app path fails
+
+## Recommended Order
+
+1. `truenas`
+2. `proxmox`
+3. `ubuntu`
+4. `grizzley`
+5. `ice`
+
+This order reduces blast radius by moving storage and hypervisor access before the primary public app edge.
+
+## Host Steps
+
+### TrueNAS
+
+Target intent: normalize around `192.168.50.12`
+
+- Confirm which NICs are intentionally active
+- Confirm whether `192.168.1.12` remains required during transition
+- Confirm NFS/SMB exports remain reachable from `ubuntu` and other consumers
+- Remove stale or duplicate UniFi client records only after confirming the active interface map
+- Cut over management and storage clients to the server-side address
+
+Rollback:
+
+- Re-enable the previous interface/gateway path
+- Restore the old fixed IP if needed
+
+### Proxmox
+
+Target intent: normalize around `192.168.50.11`
+
+- Verify direct shell access before change
+- Confirm access to hosted services such as `traefik-lxc` and `adguard`
+- Move the management path and validate web UI, SSH, and LXC/VM operations
+
+Rollback:
+
+- Restore previous interface config and reservation
+
+### Ubuntu
+
+Target intent: normalize around `192.168.50.61`
+
+- Verify SSH access and Docker service health before cutover
+- Confirm Traefik, Authentik, Gitea, Vaultwarden, OpenCode, Jellyfin, and other critical apps are healthy
+- Update reverse proxy assumptions if any services still reference the old `192.168.1.61` path
+- Validate external and internal HTTPS after the move
+
+Rollback:
+
+- Restore `192.168.1.61`
+- Re-test `gitea.tophermayor.com`, `opencode.tophermayor.com`, and other critical ingress routes
+
+### Grizzley
+
+Target intent: normalize around `192.168.50.84`
+
+- Decide whether the `192.168.10.145` Wi-Fi presence is temporary or required
+- Preserve edge ingress management access during any move
+
+### Ice
+
+Target intent: normalize around `192.168.50.197`
+
+- Decide whether the `192.168.10.178` Wi-Fi path is still required
+- Preserve OpenCode control-plane access during any move
+
+## Post-Step Validation
+
+- SSH works from management
+- DNS resolves correctly
+- Reverse proxy paths work where expected
+- Firewall logs show expected zone flows only
+- No new unexpected east-west traffic appears
+
+## Notes From Current State
+
+- `Family of D.` is now in `Internal`, not `Management`
+- `ubuntu` and `proxmox` reservations are aligned to current live `Default` addresses
+- `truenas` still has multiple NIC/client records and should be cleaned up carefully before a move
+- `grizzley`, `ice`, and `homeassistant` staged reservations are already in place for their current live paths
+
+## Executed Migration State
+
+Executed on 2026-03-17:
+
+- `truenas` secondary stale reservation at `192.168.1.145` was cleared
+- `truenas` management and egress preference was shifted to `Production` by changing the host default gateway from `192.168.1.1` to `192.168.50.1`
+- `truenas` DNS was normalized to prefer `192.168.50.157` with `1.1.1.1` as secondary
+- `proxmox` default route was moved from `192.168.1.1` on `vmbr0` to `192.168.50.1` on `vmbr0.50`, and `/etc/network/interfaces` was updated accordingly
+- `ubuntu` default route was moved from `192.168.1.1` on `enp6s18` to `192.168.50.1` on `vlan50`, and `/etc/netplan/50-cloud-init.yaml` was updated to persist the server-side route and DNS preference
+- `proxmox` legacy `192.168.1.11` address was removed from `vmbr0`; the host now remains reachable only on `192.168.50.11`, `192.168.40.11`, and `192.168.30.11`
+- `ubuntu` legacy `192.168.1.61` address was removed from `enp6s18`; the host now remains reachable on `192.168.50.61` and `192.168.30.61`
+- `truenas` legacy `192.168.1.12` address was removed from `enp6s17` using the TrueNAS interface rollback/checkin workflow; the host now remains reachable on `192.168.50.12` and `192.168.40.12`
+- `grizzley` Wi-Fi config was removed, leaving wired server-side operation on `192.168.50.84` plus its VLAN-side service addresses
+- `ice` Wi-Fi config was removed, leaving wired server-side operation on `192.168.50.197` plus its VLAN-side service addresses
+- `truenas`, `grizzley`, and `ice` staging-side `192.168.40.x` addresses were removed
+
+Verification after the change:
+
+- SSH remained reachable on both `192.168.50.12` and `192.168.1.12`
+- Default route now points to `192.168.50.1` on `enp6s19`
+- Internet egress test to `1.1.1.1` succeeded
+- `proxmox` remained reachable on both `192.168.50.11` and `192.168.1.11`
+- `ubuntu` remained reachable on both `192.168.50.61` and `192.168.1.61`
+- `gitea.tophermayor.com` and `opencode.tophermayor.com` continued returning `HTTP 200`
+- after the Proxmox legacy-address removal, SSH remained reachable on `192.168.50.11` and no longer responded on `192.168.1.11`
+- after the Ubuntu legacy-address removal, SSH remained reachable on `192.168.50.61`, critical app endpoints continued returning `HTTP 200`, and the old `192.168.1.61` SSH path stopped responding
+- after the TrueNAS legacy-address removal, SSH remained reachable on `192.168.50.12`, the old `192.168.1.12` path stopped responding, and interface changes were checked in successfully
+- after the `grizzley` and `ice` Wi-Fi removals, SSH remained reachable on `192.168.50.84` and `192.168.50.197`, while the old Wi-Fi IPs no longer responded from the management host
+
+Still pending for full TrueNAS normalization:
+
+- no host-side `192.168.40.12` path remains
+
+Still pending for full Proxmox and Ubuntu normalization:
+
+- update stale controller/client observations so UniFi no longer shows the old `192.168.1.61` path as active after the host-side removal
+
+Still pending for full Grizzley and Ice normalization:
+
+- allow UniFi client state to age out or refresh, since disconnected Wi-Fi client observations may remain visible briefly after host-side removal
+- decide whether their additional VLAN-side service addresses on `192.168.30.x` remain intentional long-term
--- a/homelab/docs/unifi-live-drift-table.md
+++ b/homelab/docs/unifi-live-drift-table.md
@@ -0,0 +1,65 @@
+---
+project:
+  name: UniFi Live Drift Table
+  status: planning
+  category: infrastructure
+  source: homelabagentroot
+  created: 2026-03-17
+  updated: 2026-03-17
+  description: Drift table comparing live UniFi observations to authoritative host repo and catalog intent
+  goals:
+    - Identify address and zone drift for infrastructure hosts
+    - Separate intentional split-plane designs from accidental placement
+    - Provide a decision aid before firewall cleanup execution
+  priority: high
+  tags: [unifi, drift, hosts, planning, audit]
+---
+
+# UniFi Live Drift Table
+
+## Summary
+
+This table compares live UniFi observations from 2026-03-17 with the latest pulled host repos and homelab catalogs.
+
+| Host / Asset | Authoritative Intent | Live UniFi Observation | Drift Level | Decision Needed |
+|--------------|----------------------|------------------------|-------------|-----------------|
+| `ubuntu` | `192.168.50.61`, primary Docker/app edge | host now routes and serves from `192.168.50.61`; UniFi currently reports the MAC on another VLAN-side address | Low | Refresh controller/client state so UniFi reflects the completed host-side removal |
+| `grizzley` | `192.168.50.84`, edge ingress/control node | host now routes from `192.168.50.84`; UniFi may still show stale/disconnected Wi-Fi history for `192.168.10.145` | Low | Confirm whether any residual Wi-Fi client state ages out cleanly |
+| `ice` | `192.168.50.197`, control-plane host | host now routes from `192.168.50.197`; UniFi may still show stale/disconnected Wi-Fi history for `192.168.10.178` | Low | Confirm residual Wi-Fi client state ages out cleanly |
+| `proxmox` | `192.168.50.11`, infra-only hypervisor | `192.168.50.11`; legacy `192.168.1.11` removed | Low | Keep monitoring hosted service paths |
+| `truenas` | `192.168.50.12`, storage-only host | `192.168.50.12`; default route prefers `192.168.50.1` | Low | Keep monitoring storage-path behavior |
+| `panda` app plane | `192.168.30.196` | `192.168.30.196` | Low | Keep |
+| `panda` admin plane | `192.168.50.196` SSH endpoint | not shown in current client list | Low | Keep and validate by access test, not client inventory alone |
+| `traefik-lxc` | `192.168.50.115` | not queried directly in client output | Medium | Validate server-segment reachability and access scope |
+| `alpine-adguard` | `192.168.50.157` | not queried directly in client output | Medium | Validate DNS/admin access scope |
+
+## Staged-Cutover Notes
+
+- `grizzley` Wi-Fi path now has a staged reservation for `192.168.10.145`
+- `ice` now has staged reservations for both `192.168.10.178` and `192.168.50.197`
+- `homeassistant` now has an active staged reservation for `192.168.30.196`
+- `ubuntu` and `proxmox` were corrected by switching to the legacy fixed-IP update format accepted by the classic UniFi endpoint
+- `truenas` conflict was traced to a second NIC record that had reserved `192.168.1.145`; that stale fixed-IP reservation has been cleared, while the active primary reservation at `192.168.1.12` remains valid
+- `truenas` host egress now prefers `192.168.50.1`, and the legacy `192.168.1.12` address has been removed
+- `grizzley` and `ice` Wi-Fi reservations were cleared after host-side Wi-Fi removal, but UniFi may still report the disconnected records until controller state refreshes
+- `ubuntu` host-side removal of `192.168.1.61` is complete, but UniFi currently reports the MAC on another VLAN-side address, which appears to be a controller observation artifact for a multi-VLAN host
+- staging-side host addresses were removed from `truenas`, `grizzley`, and `ice`, and the two explicit staging firewall policies were disabled
+
+## Interpretation
+
+- High drift means live UniFi placement materially conflicts with the intended trust boundary in the authoritative repos.
+- Medium drift means the placement may be legitimate, but it still needs explicit documentation and tighter firewall policy.
+- Low drift means the live state matches the intended design closely enough for now.
+
+## Most Important Drift Items
+
+1. `ubuntu` carries your primary public and internal app edge, so its current `Default`-side visibility has the biggest security impact.
+2. `proxmox` and `truenas` should not sit in a broadly reachable user or legacy management segment unless there is a deliberate operational reason.
+3. `grizzley` and `ice` appearing on `Family of D.` weakens the intended separation between user devices and infrastructure nodes.
+4. `panda` is the cleanest example of an intentional split-plane design and can be used as a model for how to document exceptions.
+
+## Remaining 192.168.30.x Assessment
+
+- `ubuntu`, `proxmox`, `grizzley`, and `ice` still expose `192.168.30.x` addresses
+- Those addresses were retained intentionally in this cleanup wave because they are more likely to back IoT-side service access than the removed legacy `192.168.1.x` or staging `192.168.40.x` paths
+- Removing them should be a per-service maintenance task, not a bulk cleanup operation
--- a/homelab/docs/unifi-network-optimization-plan.md
+++ b/homelab/docs/unifi-network-optimization-plan.md
@@ -0,0 +1,362 @@
+---
+project:
+  name: UniFi Network Performance and Security Optimization Plan
+  status: planning
+  category: infrastructure
+  source: homelabagentroot
+  created: 2026-03-16
+  updated: 2026-03-17
+  description: Planning-only document for UniFi segmentation, firewall optimization, and host placement based on live controller data
+  goals:
+    - Define a recommended target zone matrix for trusted, guest, IoT, staging, server, and VPN traffic
+    - Identify firewall policies to keep, tighten, or retire without applying live changes yet
+    - Map homelab hosts and service classes to the best VLAN and SSID strategy
+  priority: high
+  tags: [unifi, network, firewall, performance, security, planning]
+---
+
+# UniFi Network Performance and Security Optimization Plan
+
+## Overview
+
+This document captures recommended UniFi network improvements based on a live controller review performed on 2026-03-17 and a same-day pull of the latest authoritative host repositories.
+
+This is a planning document only.
+
+- No firewall policies, zones, VLAN assignments, SSIDs, or client placements were changed while preparing this document.
+- Current-state notes are based on live UniFi data available from the local controller at `https://192.168.1.1`.
+- Host placement recommendations were cross-checked against the latest pulled host repos for `ubuntu`, `grizzley`, `ice`, `proxmox`, `truenas`, and `panda`.
+- Existing cleanup work in [[../tasks/unifi-firewall-cleanup-plan.md|UniFi Firewall Cleanup Plan]] should be treated as historical context, not the final source of truth for the current live posture.
+
+## Live Snapshot
+
+### Controller and Inventory
+
+- Controller: UniFi Cloud Gateway Ultra (`UDRULT`)
+- UniFi Network version: `10.1.85`
+- UniFi devices currently visible: `4`
+- Live clients currently visible: `43`
+- Wireless networks currently visible: `3`
+- VPN servers currently visible: `1` (`UGC WireGuard`)
+
+### Current Network and Zone Mapping
+
+| Network | Subnet | VLAN | Current Zone | Notes |
+|--------|--------|------|--------------|-------|
+| Default | 192.168.1.0/24 | native | Management | Contains core infrastructure today |
+| Family of D. | 192.168.10.0/24 | 10 | Internal | Trusted user devices now separated from Management |
+| Will of D. (Guest) | 192.168.20.0/24 | 20 | Guest | Good logical placement |
+| Will of D. IoT | 192.168.30.0/24 | 30 | IoT | Good logical placement |
+| Staging | 192.168.40.0/24 | 40 | Staging | Good logical placement |
+| Production | 192.168.50.0/24 | 50 | Servers | Good logical placement |
+| UGC WireGuard | 192.168.4.0/24 | n/a | Vpn | Keep as a dedicated VPN trust boundary |
+
+### Implementation State
+
+First-wave UniFi changes were applied on 2026-03-17:
+
+- `Family of D.` was moved from `Management` into `Internal`
+- `Management` was reduced to `Default` only
+- New `Internal` user-defined allow rules were created for:
+  - `Internal -> Servers HTTPS`
+  - `Internal -> Servers HTTP`
+  - `Internal -> IoT`
+  - `Internal -> Staging`
+- Logging was enabled on selected user-defined edge and VPN policies:
+  - `Allow External to Web Proxy`
+  - `Vpn to Management`
+  - `MBA VPN to Management`
+  - `Vpn to Servers`
+  - `Vpn to IoT`
+- Logging was also enabled on selected user-defined east-west policies for observability:
+  - `Management to Servers`
+  - `Management to IoT`
+  - `Management to Guest`
+  - `Internal to Servers HTTPS`
+  - `Internal to Servers HTTP`
+  - `Internal to IoT`
+  - `Internal to Staging`
+  - `IoT to Jellyfin`
+  - `IoT to Traefik`
+- Staged reservation cleanup succeeded for:
+  - `ubuntu` -> `192.168.1.61`
+  - `proxmox` -> `192.168.1.11`
+  - `grizzley` -> `192.168.10.145`
+  - `ice` -> `192.168.10.178` and `192.168.50.197`
+  - `homeassistant` -> `192.168.30.196`
+- First host-side migration execution succeeded for `truenas` by moving its default route to `192.168.50.1` while preserving reachability on both `192.168.50.12` and `192.168.1.12`
+- First host-side migration execution also succeeded for `proxmox` and `ubuntu` by moving their active default routes to `192.168.50.1` while preserving SSH reachability on both their legacy and server-side addresses
+- Final legacy-address removal has now succeeded for `proxmox`, `ubuntu`, and `truenas` on the old `192.168.1.x` paths
+- Dual-network cleanup succeeded for `grizzley` and `ice` by removing active Wi-Fi participation on `Family of D.`
+- Staging-side `192.168.40.x` host paths have been removed from `truenas`, `grizzley`, and `ice`
+
+Two system-defined port-forward policies were not modified because the controller rejects edits to them via the integration API:
+
+- `Allow Port Forward HTTP`
+- `Allow Port Forward HTTPS`
+
+### Immediate Current-State Risks
+
+- Several homelab hosts still appear on more than one network, or have records that suggest multiple interfaces. That is useful when intentional, but it reduces the value of zone-based policy if it is not tightly documented.
+- The stale secondary TrueNAS reservation at `192.168.1.145` has now been cleared, and the legacy `192.168.1.12` host address has been removed.
+- UniFi client inventory can still lag behind host-side changes when a single MAC participates in multiple VLANs; current stale observations should be treated as controller state lag unless they persist after refresh/age-out.
+- The remaining host-side cleanup question is whether the infrastructure `192.168.30.x` service-side addresses are all intentionally needed; they were retained in this wave as the conservative default pending per-service validation.
+- Logging is now enabled on selected user-defined edge and VPN policies, but many block rules and system-defined edge rules still do not log.
+- Internet-facing exposure still exists for reverse proxy traffic, including `HTTP` and `HTTPS`, and should be reviewed for minimum required surface area.
+
+## Authoritative Host Repo Alignment
+
+The latest pulled host repos describe the intended authoritative network identity below. Where live UniFi observations differ, that drift should be treated as a design and documentation issue to resolve before major firewall cleanup.
+
+| Host | Authoritative Repo Intent | Live UniFi Observation | Planning Impact |
+|------|---------------------------|------------------------|-----------------|
+| ubuntu | `192.168.50.61`, primary Docker host, primary Traefik, Gitea, Vaultwarden, Authentik, OpenCode | currently visible at `192.168.1.61` | Highest-priority host placement drift because many public and internal services depend on it |
+| grizzley | `192.168.50.84`, Pi edge ingress | currently visible at `192.168.10.145`, with another extra live record | Edge ingress should not share a user-trust segment unless explicitly intended |
+| ice | `192.168.50.197`, control-plane OpenCode | visible at `192.168.50.197` and `192.168.10.178` | Dual placement weakens the meaning of `Servers` versus user-trusted access |
+| proxmox | `192.168.50.11`, hypervisor | currently visible at `192.168.1.11` | Hypervisor should remain in an infrastructure-only network |
+| truenas | `192.168.50.12`, storage-only host | visible at `192.168.1.12` and also referenced as `192.168.50.12` | Storage admin paths should be explicit and documented if multi-homed |
+| panda | Home Assistant UI at `192.168.30.196`, SSH endpoint at `192.168.50.196` | live Home Assistant client at `192.168.30.196`; separate admin SSH endpoint not shown in client list | This is a valid split-access pattern and should be preserved intentionally |
+
+### What The Latest Host Repos Change In This Plan
+
+- `ubuntu` is more security-sensitive than the first draft implied because its latest host repo now clearly tracks hardened public edge, `Gitea`, and `Vaultwarden` state. That raises the priority of narrowing public exposure and protecting admin paths.
+- `grizzley` and `ice` are clearly intended to be `Servers`-zone infrastructure nodes in their host repos, so their current appearances on `Family of D.` should be treated as drift unless there is a deliberate dual-network design.
+- `panda` is not simply an IoT appliance. The latest host repo explicitly documents an app endpoint on `192.168.30.196` and a separate SSH/admin endpoint on `192.168.50.196`, which supports keeping Home Assistant functionally close to IoT while retaining a cleaner administrative path.
+- `proxmox` is not just a hypervisor endpoint. Its latest repo also documents server-side infrastructure such as `traefik-lxc` at `192.168.50.115`, `alpine-adguard` at `192.168.50.157`, and other server-segment workloads that should stay out of user and guest networks.
+- `truenas` latest repo content is partially historical, but the broader homelab catalogs and current host metadata still point to `192.168.50.12` as the intended storage address. The plan should therefore prefer the `Production`/server-side path over the current `Default` visibility.
+
+## Recommended Target Zone Matrix
+
+### Recommended Zone Roles
+
+| Zone | Recommended Networks | Purpose |
+|------|----------------------|---------|
+| Management | Default | Admin workstations, controller access, network gear, hypervisor, storage |
+| Internal | Family of D. | Trusted daily-use family devices |
+| Guest | Will of D. (Guest) | Visitor and untrusted personal devices |
+| IoT | Will of D. IoT | Smart home and appliance-style devices |
+| Staging | Staging | Lab, test, and temporary workloads |
+| Servers | Production | Public and internal homelab application hosts |
+| Vpn | UGC WireGuard | Remote admin and controlled remote access |
+| External | WANs | Internet |
+
+### Recommended Connectivity Matrix
+
+| From -> To | Management | Internal | Guest | IoT | Staging | Servers | Vpn | External |
+|------------|------------|----------|-------|-----|---------|---------|-----|----------|
+| Management | Allow | Limited | Limited | Allow | Allow | Allow | Allow | Allow |
+| Internal | Deny by default | Allow | Deny | Limited | Limited | Limited | Deny | Allow |
+| Guest | Deny | Deny | Allow | Deny | Deny | Deny | Deny | Allow |
+| IoT | Deny | Deny | Deny | Allow | Deny | Limited | Deny | Allow |
+| Staging | Limited | Limited | Deny | Deny | Allow | Allow | Deny | Allow |
+| Servers | Limited | Return only | Deny | Limited | Allow | Allow | Deny | Allow |
+| Vpn | Limited | Deny by default | Deny | Limited | Limited | Allow | Allow | Allow |
+
+### Matrix Interpretation
+
+- `Management` should be the only zone with broad administrative reach.
+- `Internal` should access `Servers` through specific app ports and URLs, not broad all-port access.
+- `Guest` should have internet access only.
+- `IoT` should keep internet access plus narrow exceptions for services such as media streaming, reverse proxy access, and Home Assistant as needed.
+- `Vpn` should be treated as a separate zone, not as implicit `Management`. Default VPN access should reach only the minimum required destinations.
+
+## Firewall Recommendation Set
+
+The live policy export reported `236` total policies. The visible slice used for this review showed `102` `ALLOW` and `98` `BLOCK` policies in the first `200` entries. Recommendations below focus on the posture that was visible live and should be validated against a full export before any change window.
+
+### Keep
+
+Keep these rule patterns, assuming they are already scoped correctly to the intended hosts and ports:
+
+- System defaults such as `Block Invalid Traffic`, `Block All Traffic`, and `Allow Return Traffic`
+- `Guest -> External`
+- Intra-zone traffic where explicitly needed (`Internal`, `Guest`, `IoT`, `Servers`)
+- Reverse proxy ingress to the public web entry point over `HTTPS`
+- Narrow published access for `Gitea` and `Vaultwarden` behind the hardened public edge on `ubuntu`
+- Narrow `IoT -> Servers` exceptions for media and automation services such as Jellyfin, Traefik, and Home Assistant
+- `Vpn -> Servers` for approved administrative and remote-access workflows
+
+### Tighten
+
+These items present the best mix of security and operational benefit:
+
+1. Separate `Family of D.` from `Management`
+   - Move `Family of D.` out of `Management` and into `Internal`
+   - Do this before treating `Management` rules as a true admin trust boundary
+
+2. Restrict VPN reach
+   - Keep `Vpn -> Servers` for normal remote admin
+   - Narrow `Vpn -> Management` to only the ports and hosts needed for network and infrastructure administration
+   - Narrow `Vpn -> IoT` to specific automation and troubleshooting needs only
+
+3. Reduce internet-facing exposure
+    - Keep `HTTPS` ingress for the reverse proxy
+    - Keep `HTTP` only if it is still required for redirect handling or ACME validation
+    - Replace any broad `External -> Servers` or `External -> Web Proxy` rules with host and port scoped rules where possible
+    - Prioritize review of the `ubuntu` edge because that host now clearly carries `Traefik`, `Gitea`, and `Vaultwarden` in the latest host repo
+
+4. Reduce rule overlap and duplication
+   - Review overlapping VPN rules such as `Vpn to Servers` and `Allow WireGuard to Services (Fixed)`
+   - Review repeated return-path rules such as the visible duplicate `Management to IoT (Return)` entries
+   - Prefer one clearly named policy per intent over multiple partially overlapping policies
+
+5. Turn on useful logging
+   - Enable logging on selected block rules and edge-facing allow rules
+   - Minimum recommended logging targets: `External -> *`, `Vpn -> Management`, `Vpn -> Servers`, and denied `Guest` or `IoT` inter-zone attempts
+
+### Retire After Validation
+
+Retire or replace these rule patterns only after confirming there is no hidden dependency:
+
+- Broad all-port `Internal -> Servers` allow rules
+- Broad all-port `IoT -> Servers` allow rules that are no longer needed once application-specific exceptions exist
+- Duplicate return-path rules that do not add new behavior
+- `HTTP` port-forward exposure if `HTTPS` plus redirect/ACME alternatives cover the same use case
+- Legacy rules tied to decommissioned hosts, empty zones, or old service names
+
+### Naming and Policy Hygiene
+
+Use policy names that always match the real source, destination, and purpose.
+
+Recommended naming pattern:
+
+`<source zone> -> <destination zone> | <service or intent> | <action>`
+
+Examples:
+
+- `Internal -> Servers | HTTPS apps | ALLOW`
+- `IoT -> Servers | Jellyfin 8096 | ALLOW`
+- `Guest -> Internal | default deny | BLOCK`
+- `Vpn -> Management | admin https | ALLOW`
+
+## Recommended Host and Service Placement
+
+### Core Homelab Hosts
+
+| Asset | Current Observed Placement | Recommended Placement | Access Model | Notes |
+|------|-----------------------------|-----------------------|--------------|-------|
+| UniFi gateway and AP management IPs | Default | Management | Admin only | Keep network gear on the management network |
+| Proxmox | Default (`192.168.1.11`) | Management or dedicated infrastructure VLAN, wired | Management and VPN only | Latest host repo still treats Proxmox as infrastructure-only; also protect its hosted `traefik-lxc` and `adguard` style workloads |
+| TrueNAS | Default (`192.168.1.12`), plus preferred lookup for `192.168.50.12` | Management primary, optional secondary storage path only if intentional | Management and selected servers | Prefer the documented `192.168.50.12` server-side identity and document any secondary path explicitly |
+| Ubuntu primary Docker host | Default (`192.168.1.61`) | Servers long-term, or documented dual-home during migration | Internal via reverse proxy, Management for admin | Latest host repo confirms this host carries the primary public edge plus `Gitea`, `Vaultwarden`, Authentik, and core apps |
+| Grizzley | Family (`192.168.10.145`), plus another live record | Servers, wired | Reverse proxy and admin paths only | Latest host repo intent is Pi edge ingress and control traffic, not consumer trusted-client placement |
+| Ice | Production (`192.168.50.197`) and Family (`192.168.10.178`) | Servers primary, optional dedicated management path only if justified | Management and approved service paths | Latest host repo intent is control-plane infrastructure, so current family-network presence should be treated as drift until justified |
+| Panda / Home Assistant OS | live Home Assistant endpoint at `192.168.30.196`; latest host repo also documents SSH at `192.168.50.196` | Keep app plane in IoT; keep admin plane on server/management side | Management, Internal, and selected IoT flows | This split model is preferable to exposing full Home Assistant administration on a user or guest network |
+
+### Additional Server-Segment Assets From Latest Host Repos
+
+| Asset | Documented Address | Recommended Zone | Notes |
+|------|--------------------|------------------|-------|
+| Proxmox `traefik-lxc` | `192.168.50.115` | Servers | Keep isolated from `Internal` except through intended app ports |
+| Proxmox `alpine-adguard` | `192.168.50.157` | Servers or Management | DNS infrastructure deserves tighter access than general apps |
+| Home Assistant SSH admin endpoint | `192.168.50.196` | Management or Servers | Keep SSH/admin access distinct from the IoT-side app endpoint |
+
+### Service Placement Guidance
+
+| Service Class | Recommended Zone | Client Access Pattern |
+|--------------|------------------|-----------------------|
+| Reverse proxy / ingress (Traefik) | Servers | `Internal`, `Management`, and approved `Vpn` clients over `80/443` |
+| Public identity and secrets apps (`Authentik`, `Gitea`, `Vaultwarden`) | Servers | `Management` and `Internal` over `HTTPS`; expose externally only through tightly scoped edge policies |
+| Storage and virtualization admin (TrueNAS, Proxmox) | Management | `Management` and limited `Vpn` only |
+| Media services (Jellyfin and related) | Servers | `Internal` by default, `IoT` only for TVs, streamers, and casting targets that need it |
+| Home automation (Home Assistant) | IoT app plane plus management-side SSH/admin plane | `Management`, selected `Internal`, selected `IoT` |
+| Test workloads | Staging | `Management`, selected `Internal`, and `Servers` as required |
+
+### Client and SSID Placement Guidance
+
+| Client Type | Recommended Network | Recommended SSID Strategy | Notes |
+|-------------|---------------------|---------------------------|-------|
+| Primary family phones, tablets, laptops | Internal (`Family of D.`) | `Family of D.` | Trusted user devices should not live in `Management` |
+| Visitors | Guest | `Will of D.` | Keep internet-only |
+| TVs, speakers, streamers, thermostats, hubs, plugs, lamps | IoT | `Will of D. IoT` | Keep appliance devices isolated and use narrow service exceptions |
+| Baby monitors | IoT | `Will of D. IoT` | Current live placement in `Family of D.` should be reviewed and likely moved |
+| Admin workstation(s) | Internal by default; optional future dedicated admin SSID/VLAN | `Family of D.` today | Add a dedicated admin network only if there is a real operational need |
+
+## Performance Recommendations
+
+### Wireless Design
+
+- Keep SSID count low. The current three-SSID model is reasonable and should scale better than adding more SSIDs unless there is a strong operational need.
+- Keep `Family of D.` optimized for higher-performance personal devices on `5 GHz` and `6 GHz` where supported.
+- Keep `Will of D. IoT` focused on reliability rather than peak throughput. Many smart devices behave better on `2.4 GHz`, and mixed-band IoT SSIDs should be reviewed carefully for compatibility issues.
+- Keep guest traffic off trusted SSIDs. That protects airtime and reduces unnecessary broadcast and discovery noise on the primary user network.
+- For voice and discovery reliability, use `Multicast to Unicast` on user SSIDs that need iPhone calling or nearby device discovery.
+- Keep `Multicast and Broadcast Blocker` off on user SSIDs unless there is a specific, tested reason to suppress discovery traffic.
+- If roaming quality matters for voice devices, prefer `Fast Roaming` plus `BSS Transition` on trusted SSIDs and validate client behavior after each change.
+
+### Verified SSID Posture
+
+The live UniFi controller was updated on 2026-04-13 to support iPhone WiFi calling and gate control traffic.
+
+| SSID | Multicast to Unicast | Fast Roaming | BSS Transition | Multicast/Broadcast Blocker |
+|------|----------------------|--------------|----------------|-----------------------------|
+| `Will of D.` | enabled | enabled | enabled | off |
+| `Will of D. IoT` | enabled | disabled | enabled | off |
+| `Family of D.` | enabled | enabled | enabled | off |
+| `Will of D. IoT 2.4G` | enabled | n/a | enabled | off |
+
+This aligns the trusted SSID with the same multicast and roaming posture already used on `Family of D.`.
+
+### Wired and Infrastructure Placement
+
+- Prefer wired-only placement for infrastructure hosts wherever possible.
+- Reduce or eliminate unintended dual-homed infrastructure. A host that sits in multiple trust zones is harder to reason about and easier to misconfigure.
+- Keep reverse proxy, server, and storage paths off Wi-Fi entirely.
+
+### Network Hygiene That Helps Performance Too
+
+- Move non-user appliance devices, especially the visible baby monitors, out of `Family of D.` and into `IoT`.
+- Keep media exceptions narrow so background service discovery does not become broad east-west traffic.
+- Review AP client distribution and radio settings only after collecting AP-side statistics, since transmit power and minimum RSSI changes should be data-driven.
+
+## Security Recommendations
+
+### Highest-Priority Changes to Plan
+
+1. Re-establish `Management` as a real infrastructure-only trust boundary
+2. Turn on useful firewall logging for edge and deny rules
+3. Move live host addressing closer to the authoritative host repo intent for `ubuntu`, `grizzley`, `ice`, `proxmox`, and `truenas`
+4. Narrow VPN access to the smallest practical set of hosts and ports
+5. Review and minimize all public `HTTP` exposure, especially around the `ubuntu` public edge
+6. Remove or consolidate duplicate and overlapping allow rules
+
+### Medium-Priority Changes to Plan
+
+1. Re-home server-class hosts so they align with the intended `Servers` zone
+2. Review whether Home Assistant should remain in `IoT` or move to a dedicated automation segment later
+3. Audit wildcard DNS usage to confirm only intended clients can reach sensitive admin applications
+4. Decide whether `panda`'s split app/admin path should become the standard pattern for other appliance-style services
+
+## Proposed Rollout Order
+
+No changes have been applied yet. When this work is scheduled, the lowest-risk order is:
+
+1. Export and back up current zones and policies
+2. Enable logging on selected deny and edge allow rules
+3. Reconcile live host IP placement with the latest authoritative host repos
+4. Correct the `Management` versus `Internal` network assignments
+5. Move obvious consumer/IoT devices out of `Family of D.`
+6. Review and remove duplicate or overly broad firewall policies
+7. Re-home server-class hosts where needed
+8. Re-test reverse proxy, media, Home Assistant, VPN, and admin paths after each change set
+
+## Open Questions Before Execution
+
+- Should the Ubuntu primary Docker host stay on `Default` for operational simplicity, or should it move fully into `Servers`?
+- Are the extra `grizzley` and `ice` live placements intentional dual-homing, or leftover records/interfaces to clean up?
+- Should `proxmox` and `truenas` keep any `Default`-side presence, or should they be normalized to their documented `192.168.50.x` identities?
+- Is public `HTTP` still required for any production workflow?
+- Does Home Assistant need to remain on `IoT`, or is the current split model of IoT app access plus management-side SSH the desired long-term pattern?
+
+## Decision Summary
+
+If no larger redesign is desired, the minimum high-value outcome is:
+
+- `Management` = infrastructure only
+- `Internal` = family/trusted user devices
+- `Guest` = internet only
+- `IoT` = appliances with narrow exceptions
+- `Servers` = homelab application hosts
+- `Vpn` = remote access with explicit least-privilege rules
+
+That structure provides the clearest improvement in both security and troubleshooting without requiring a full network rebuild.
--- a/homelab/docs/unifi-post-migration-summary-2026-03-17.md
+++ b/homelab/docs/unifi-post-migration-summary-2026-03-17.md
@@ -0,0 +1,64 @@
+---
+project:
+  name: UniFi Post-Migration Summary 2026-03-17
+  status: active
+  category: infrastructure
+  source: homelabagentroot
+  created: 2026-03-17
+  updated: 2026-03-17
+  description: Final summary of UniFi zoning, host migration, and rollback references after the March 17 cleanup wave
+  goals:
+    - Record the end state after network cleanup
+    - Provide a quick reference for what changed and what remains
+    - Link operators to rollback and runbook notes
+  priority: high
+  tags: [unifi, post-migration, summary, rollback]
+---
+
+# UniFi Post-Migration Summary 2026-03-17
+
+## Completed Changes
+
+- `Family of D.` moved from `Management` to `Internal`
+- `Management` reduced to `Default` only
+- New `Internal` access rules created for `Servers`, `IoT`, and `Staging`
+- Logging enabled on key edge, VPN, and east-west user-defined policies
+- Legacy `192.168.1.x` host paths removed from:
+  - `proxmox`
+  - `ubuntu`
+  - `truenas`
+- Wi-Fi participation removed from:
+  - `grizzley`
+  - `ice`
+- Staging-side `192.168.40.x` host paths removed from:
+  - `truenas`
+  - `grizzley`
+  - `ice`
+- Staging access policies disabled:
+  - `Vpn to Staging`
+  - `Allow Servers to Staging`
+
+## Current Host End State
+
+| Host | Current Primary Addressing | Notes |
+|------|----------------------------|-------|
+| `ubuntu` | `192.168.50.61`, `192.168.30.61` | App edge healthy; UniFi may still show stale alternate observations |
+| `proxmox` | `192.168.50.11`, `192.168.30.11` | Legacy `192.168.1.11` removed |
+| `truenas` | `192.168.50.12` | Legacy `192.168.1.12` and staging `192.168.40.12` removed |
+| `grizzley` | `192.168.50.84`, `192.168.30.84` | Wi-Fi removed |
+| `ice` | `192.168.50.197`, `192.168.30.197` | Wi-Fi removed |
+
+## Remaining Follow-Up
+
+- Allow UniFi controller client history to age out or refresh
+- Keep remaining `192.168.30.x` service-side paths in place for now because they appear to support intentional IoT-side service adjacency; remove them only after per-service validation
+- Review public `HTTP` exposure and any duplicate firewall rules
+- `grizzley` still has one disconnected/no-IP UniFi history record; a direct delete attempt returned `api.err.NotFound`, so this currently looks like controller-history lag
+- `TrueNAS` is intentionally exposed through the local-only route `truenas.local.tophermayor.com`; `truenas.tophermayor.com` is not the canonical admin URL
+
+## References
+
+- Canonical current-state reference: [`docs/UNIFI_NETWORK_INFRASTRUCTURE.md`](/Users/christopherjohnsisonmayor/Infrastructure/core/docs/UNIFI_NETWORK_INFRASTRUCTURE.md)
+- Runbook: [[unifi-host-migration-runbook.md|UniFi Host Migration Runbook]]
+- Rollback: [[unifi-rollback-2026-03-17.md|UniFi Rollback 2026-03-17]]
+- Execution details: [[unifi-execution-plan.md|UniFi Execution Plan]]
--- a/homelab/docs/unifi-rollback-2026-03-17.md
+++ b/homelab/docs/unifi-rollback-2026-03-17.md
@@ -0,0 +1,79 @@
+---
+project:
+  name: UniFi Rollback 2026-03-17
+  status: active
+  category: infrastructure
+  source: homelabagentroot
+  created: 2026-03-17
+  updated: 2026-03-17
+  description: Rollback notes for the first UniFi zone and policy changes applied on 2026-03-17
+  goals:
+    - Restore pre-change zone membership if needed
+    - Record new policy IDs created during the first change wave
+    - Provide a safe reference before the next production network cutover
+  priority: high
+  tags: [unifi, rollback, firewall, zones, change-management]
+---
+
+# UniFi Rollback 2026-03-17
+
+## Backups
+
+Pre-change snapshots were saved to:
+
+- `/private/tmp/unifi-change-backups-20260317/zones-before.json`
+- `/private/tmp/unifi-change-backups-20260317/policies-before.json`
+
+## Changes Applied
+
+### Zone Changes
+
+Before:
+
+- `Management` -> `Default`, `Family of D.`
+- `Internal` -> empty
+
+After:
+
+- `Management` -> `Default`
+- `Internal` -> `Family of D.`
+
+### New User-Defined Policies Created
+
+| ID | Name |
+|----|------|
+| `ccc50b02-81ee-4e85-a994-87228b28d6ef` | `Internal to Servers HTTPS` |
+| `07e03549-c022-4e90-981d-154269dc0471` | `Internal to Servers HTTP` |
+| `6a7c0209-3d75-4826-bc61-ab98d9fe3ce3` | `Internal to IoT` |
+| `977017d1-7600-48b1-9f04-e76eed01ca2c` | `Internal to Staging` |
+
+### Existing Policies Modified
+
+Logging enabled on:
+
+- `89de6586-d284-4ce0-8e1f-8fea428c4af4` `Allow External to Web Proxy`
+- `b13ad681-3d4c-4cb0-b186-70678087ddc9` `Vpn to Management`
+- `92c1b619-ef7e-4b74-aaca-e57851abe962` `MBA VPN to Management`
+- `5e6f26c2-1487-4e92-b682-6bcbb987b913` `Vpn to Servers`
+- `3b64e36a-a452-4ab0-96b5-6088efb2330c` `Vpn to IoT`
+
+## Rollback Steps
+
+If the `Family of D.` cutover needs to be reversed before the next maintenance window:
+
+1. Move `Family of D.` back into `Management`
+2. Remove `Family of D.` from `Internal`
+3. Keep the new `Internal` user-defined rules disabled or delete them if they are no longer needed
+4. Re-test access from a `192.168.10.x` client to `Servers`, `IoT`, and `Staging`
+
+## Rollback Zone State
+
+Desired rollback state:
+
+- `Management` -> `bcf0598f-9361-4306-9024-9817fd841836`, `fb44c9bf-1534-4a98-9c7e-6aee4bf4069a`
+- `Internal` -> no networks assigned
+
+## Notes
+
+- `policies-before.json` is only a `200/236` visible slice from the original tool output; use live API reads plus the saved zone snapshot for the most accurate rollback reference.
+- System-defined edge rules such as `Allow Port Forward HTTP` and `Allow Port Forward HTTPS` were not modified.
--- a/homelab/docs/unifi-wifi-calling-optimization.md
+++ b/homelab/docs/unifi-wifi-calling-optimization.md
@@ -0,0 +1,198 @@
+---
+project:
+  name: WiFi Calling Optimization Runbook
+  status: completed
+  category: infrastructure
+  source: homelabagentroot
+  created: 2026-04-01
+  updated: 2026-04-01
+  description: Live configuration and runbook for AT&T WiFi calling optimization on UniFi UCG Ultra
+  carrier: AT&T
+  affected_ssids: [Family of D., Will of D. (Guest)]
+  affected_vlans: [10, 20, 40, 50, 1]
+  tags: [unifi, wifi, wifi-calling, att, qos, 802.11r]
+---
+
+# WiFi Calling Optimization Runbook
+
+## Overview
+
+Optimizations applied to the UniFi Cloud Gateway Ultra (UCG Ultra) to support reliable AT&T WiFi calling across all non-IoT VLANs.
+
+**Applied:** 2026-04-01  
+**Controller:** `https://192.168.1.1` (UniFi Network 10.1.85)  
+**Site ID:** `88f7af54-98f8-306a-a1c7-c9349722b1f6`
+
+## AT&T WiFi Calling Requirements
+
+AT&T WiFi calling uses IPSec/IKEv2 tunnels to AT&T infrastructure:
+
+| Protocol | Port | Purpose |
+|----------|------|---------|
+| IKEv2 | UDP 500 | Key exchange and tunnel establishment |
+| IPSec NAT-T | UDP 4500 | Encapsulated ESP through NAT |
+| SIP (fallback) | UDP/TCP 5060, 5061 | Session initiation (rarely used by AT&T) |
+| RTP Media | UDP 10000-20000 | Voice media (inside IPSec tunnel) |
+
+**Key insight:** RTP media is encrypted inside the IPSec tunnel, so DSCP marking on outer packets has limited effect. The biggest quality improvements come from:
+1. Fast roaming (802.11r) to eliminate AP handoff gaps
+2. Reducing airtime contention (multicast-to-unicast)
+3. Ensuring firewall allows all required ports
+
+## Changes Applied
+
+### 1. Family of D. SSID (`b2784680-7b04-4c8a-9098-19aced53fc89`)
+
+**API:** `PUT /sites/{siteId}/wifi/broadcasts/b2784680-7b04-4c8a-9098-19aced53fc89`
+
+| Setting | Before | After | Impact |
+|---------|--------|-------|--------|
+| `fastRoamingEnabled` | `false` | `true` | 802.11r - eliminates re-auth gap during AP roaming |
+| `wpa3FastRoamingEnabled` | `false` | `true` | WPA3 Fast Transition for WPA3-only clients |
+| `multicastToUnicastConversionEnabled` | `false` | `true` | Reduces airtime waste from mDNS/SSDP broadcasts |
+
+**Already enabled (unchanged):**
+- `bandSteeringEnabled`: `true` - prefers 5/6GHz over 2.4GHz
+- `bssTransitionEnabled`: `true` - 802.11v neighbor reports
+- `broadcastingFrequenciesGHz`: `[5, 6, 2.4]` - tri-band
+
+### 2. Will of D. Guest SSID (`a2cdccb6-d054-47ad-ab14-62cae625b6af`)
+
+**API:** `PUT /sites/{siteId}/wifi/broadcasts/a2cdccb6-d054-47ad-ab14-62cae625b6af`
+
+| Setting | Before | After | Impact |
+|---------|--------|-------|--------|
+| `bssTransitionEnabled` | `false` | `true` | 802.11v - helps guest devices roam efficiently |
+
+**Not changed on Guest:**
+- `fastRoamingEnabled`: remains `false` (guest devices typically don't need 802.11r)
+- `multicastToUnicastConversionEnabled`: remains `false`
+
+### 3. Traffic Matching Rule
+
+**API:** `POST /sites/{siteId}/traffic-matching-lists`
+
+| Property | Value |
+|----------|-------|
+| Name | `WiFi Calling Ports` |
+| ID | `e7f06077-1a11-4355-88df-185837ba29df` |
+| Type | `PORTS` |
+| Ports | UDP 500, 4500, 5060, 5061 |
+
+**Note:** RTP port range (10000-20000) was not added because the UniFi integration API does not support `PORT_NUMBER_RANGE` in traffic matching list items. The signaling ports (500, 4500) are the most critical for tunnel establishment.
+
+## Firewall Verification
+
+All zones already have outbound access to External (internet), so no firewall changes were needed:
+
+| Zone | External Access | Status |
+|------|----------------|--------|
+| Internal (`1c79c8c2`) | Allow All Traffic (system) | OK |
+| Guest (`b8d0e4f2`) | Guest to External (idx 10000) + fallback | OK |
+| Staging (`dc406f85`) | Allow All Traffic (system) | OK |
+| Management (`ea466cdf`) | Allow All Traffic (system) | OK |
+| DMZ (`4fb011b4`) | Allow All Traffic (system) | OK |
+
+## Current SSID Configuration (Post-Optimization)
+
+| SSID | Bands | Security | Fast Roaming | BSS Transition | Mcast→Ucast |
+|------|-------|----------|--------------|----------------|-------------|
+| Family of D. | 2.4/5/6 GHz | WPA2/WPA3 Personal | Enabled | Enabled | Enabled |
+| Will of D. (Guest) | 2.4/5 GHz | WPA2 Personal | Disabled | Enabled | Disabled |
+| Will of D. IoT | 2.4 GHz only | WPA2 Personal | Disabled | Disabled | Disabled |
+
+## Rollback Procedures
+
+### Rollback Family of D. Fast Roaming
+
+If legacy devices (older IoT, smart TVs, casting devices) experience connectivity issues:
+
+```bash
+curl -k -H "X-API-KEY: $UNIFI_API_KEY" -H "Content-Type: application/json" -X PUT \
+  -d '{
+    "type": "STANDARD",
+    "name": "Family of D.",
+    "enabled": true,
+    "network": {"type": "SPECIFIC", "networkId": "fb44c9bf-1534-4a98-9c7e-6aee4bf4069a"},
+    "securityConfiguration": {
+      "type": "WPA2_WPA3_PERSONAL",
+      "fastRoamingEnabled": false,
+      "passphrase": "ILoveNaomi2025",
+      "pmfMode": "OPTIONAL",
+      "saeConfiguration": {"anticloggingThresholdSeconds": 5, "syncTimeSeconds": 5},
+      "wpa3FastRoamingEnabled": false
+    },
+    "multicastToUnicastConversionEnabled": false,
+    "clientIsolationEnabled": false,
+    "hideName": false,
+    "uapsdEnabled": false,
+    "broadcastingFrequenciesGHz": [5, 6, 2.4],
+    "bandSteeringEnabled": true,
+    "arpProxyEnabled": false,
+    "bssTransitionEnabled": true,
+    "advertiseDeviceName": false
+  }' \
+  "https://192.168.1.1/proxy/network/integration/v1/sites/88f7af54-98f8-306a-a1c7-c9349722b1f6/wifi/broadcasts/b2784680-7b04-4c8a-9098-19aced53fc89"
+```
+
+### Rollback Guest BSS Transition
+
+```bash
+curl -k -H "X-API-KEY: $UNIFI_API_KEY" -H "Content-Type: application/json" -X PUT \
+  -d '{
+    "type": "STANDARD",
+    "name": "Will of D.",
+    "enabled": true,
+    "network": {"type": "SPECIFIC", "networkId": "02364634-a782-4b58-a33b-48b48f492210"},
+    "securityConfiguration": {
+      "type": "WPA2_PERSONAL",
+      "fastRoamingEnabled": false,
+      "passphrase": "EmergencyFood2025"
+    },
+    "multicastToUnicastConversionEnabled": false,
+    "clientIsolationEnabled": false,
+    "hideName": false,
+    "uapsdEnabled": false,
+    "broadcastingFrequenciesGHz": [5, 2.4],
+    "bandSteeringEnabled": true,
+    "arpProxyEnabled": false,
+    "bssTransitionEnabled": false,
+    "advertiseDeviceName": false
+  }' \
+  "https://192.168.1.1/proxy/network/integration/v1/sites/88f7af54-98f8-306a-a1c7-c9349722b1f6/wifi/broadcasts/a2cdccb6-d054-47ad-ab14-62cae625b6af"
+```
+
+### Delete Traffic Matching Rule
+
+```bash
+curl -k -H "X-API-KEY: $UNIFI_API_KEY" -X DELETE \
+  "https://192.168.1.1/proxy/network/integration/v1/sites/88f7af54-98f8-306a-a1c7-c9349722b1f6/traffic-matching-lists/e7f06077-1a11-4355-88df-185837ba29df"
+```
+
+## Troubleshooting
+
+### WiFi Call Drops During Roaming
+
+1. Verify fast roaming is enabled: check `fastRoamingEnabled` on the SSID
+2. Check if the phone supports 802.11r (most phones since ~2018 do)
+3. Look for excessive AP handoffs in UniFi client history
+4. Check RSSI values - phones may be roaming too aggressively
+
+### WiFi Call Fails to Establish
+
+1. Verify firewall allows UDP 500, 4500 outbound from the client's zone
+2. Check DNS resolution - AT&T WiFi calling needs to resolve carrier domains
+3. Verify no DPI/IDS rules are blocking IPSec traffic
+4. Check if the phone is on the correct SSID (not IoT SSID)
+
+### Poor Call Quality (Jitter/Latency)
+
+1. Check for airtime contention on the AP (too many 2.4GHz clients)
+2. Verify band steering is pushing voice clients to 5/6GHz
+3. Check if multicast-to-unicast is reducing broadcast noise
+4. Review SQM/QoS settings on the WAN interface
+
+## Related Documents
+
+- [[unifi-network-optimization-plan.md|UniFi Network Optimization Plan]]
+- [[unifi-execution-plan.md|UniFi Execution Plan]]
--- a/homelab/entities/aqara-hub-m3.md
+++ b/homelab/entities/aqara-hub-m3.md
@@ -0,0 +1,84 @@
+---
+title: Aqara Hub M3
+created: 2026-05-10
+updated: 2026-05-10
+type: entity
+tags: [hub, matter, zigbee, smart-home, iot, ecosystem]
+confidence: high
+---
+
+# Aqara Hub M3
+
+> Aqara's Matter-compatible smart home hub. Provides a secondary Zigbee coordinator and Matter bridge for Aqara devices, independent of [[home-assistant-connect-zbt-2]].
+
+## Overview
+
+| Field | Value |
+|-------|-------|
+| **Manufacturer** | Aqara |
+| **Model** | Aqara Hub M3 |
+| **Location** | Bedroom |
+| **VLAN** | IoT VLAN 30 |
+| **Protocols** | Zigbee 3.0, Thread, Matter, Wi-Fi |
+| **Matter Support** | Yes — can be commissioned into multiple fabrics |
+
+## Role in the Smart Home
+
+The Hub M3 serves as Aqara's ecosystem bridge:
+
+1. **Aqara Cloud Bridge** — connects Aqara devices to the Aqara cloud app
+2. **Matter Bridge** — exposes paired Aqara Zigbee devices to Matter controllers
+3. **Secondary Zigbee Coordinator** — manages its own Zigbee mesh separate from [[home-assistant-connect-zbt-2]]
+4. **Thread Border Router** — can participate in the Thread mesh
+
+## Connected Aqara Devices
+
+The Hub M3 bridges these devices via Matter:
+
+| Device | Location | Model | Protocol |
+|--------|----------|-------|----------|
+| Aqara Door/Window Sensor | Rooftop | Aqara Door/Window Sensor | Zigbee |
+| Aqara Vibration Sensor T1 | Rooftop | Aqara Vibration Sensor T1 | Zigbee |
+| Aqara Motion Sensor P1 | Living Room | Aqara Motion Sensor P1 | Zigbee |
+| Aqara Light Switch H2 US | Baby Room | Aqara Light Switch H2 US | Zigbee |
+| Aqara Light Switch H2 US | Front Door | Aqara Light Switch H2 US | Zigbee |
+| Aqara Light Switch H2 US | Entrance | Aqara Light Switch H2 US | Zigbee |
+| Aqara Light Switch H2 US | 1st Floor | Aqara Light Switch H2 US | Zigbee |
+| Aqara Colorful Ceiling Light | Baby Room | Colorful Ceiling Light 36W | Zigbee |
+| Aqara Smart Lock U100 | Front Door | Aqara Smart Lock U100 | Zigbee/BLE |
+| Aqara Camera Hub G3 | — | Camera Hub G3 | Wi-Fi |
+| Aqara Video Doorbell G410 | Front Door | Smart Video Doorbell G410 | Wi-Fi/Zigbee |
+
+## Multi-Fabric Architecture
+
+The Hub M3 is a key node in the [[matter-multi-fabric]] setup:
+
+- **Fabric 1 (HA)**: Commissioned into [[panda]]'s Matter fabric via [[home-assistant-connect-zbt-2]]
+- **Fabric 2 (Apple Home)**: Can be commissioned into Apple Home via Apple TV 4K
+- **Fabric 3 (Google Home)**: Can be commissioned into Google Home via Nest Hub
+- **Fabric 4 (Alexa)**: Can be commissioned into Alexa via Echo Dot
+
+Matter multi-admin allows up to 5 fabrics simultaneously.
+
+## Dual Path: ZHA vs Aqara Hub
+
+Some Aqara devices (sensors, switches, lock) are visible through **two paths**:
+
+1. **ZHA path**: Device → Zigbee → Connect ZBT-2 → [[panda]] HA (direct, low-latency)
+2. **Matter Bridge path**: Device → Zigbee → Hub M3 → Matter → HA (bridged, adds latency)
+
+The ZHA path is preferred for automation reliability. The Matter Bridge path is useful for exposing devices to other ecosystems (Apple, Google, Alexa).
+
+## Relationships
+
+- Bridges Aqara devices into [[matter-multi-fabric]]
+- Connected to [[panda]] via Matter integration
+- Works alongside [[home-assistant-connect-zbt-2]] (dual Zigbee mesh)
+- Complemented by Aqara Camera Hub G3 (separate Wi-Fi hub)
+- Paired devices overlap with ZHA coordinator — see dual-path note above
+
+## Configuration Notes
+
+- Thread credentials should match [[home-assistant-connect-zbt-2]]'s Thread network for mesh unity
+- If adding to Apple Home: use Matter pairing code from Aqara app → Apple Home → Add Accessory
+- Hub M3 firmware updates should be applied via Aqara app (not via HA)
--- a/homelab/entities/authentik.md
+++ b/homelab/entities/authentik.md
@@ -0,0 +1,41 @@
+---
+title: authentik
+created: 2026-04-28
+updated: 2026-04-28
+type: entity
+tags: [services, sso, identity]
+sources: []
+---
+
+# authentik
+
+**Role:** SSO identity provider for homelab
+**URL:** https://authentik.tophermayor.com
+**Host:** [[ubuntu]] (Docker)
+
+## Overview
+
+Authentik provides single sign-on for homelab services. It's the central identity provider that other services (Traefik, Jellyfin, Gitea, etc.) delegate to.
+
+## Configuration
+
+- Runs as Docker container on ubuntu
+- Traefik routes `authentik.tophermayor.com` → authentik container
+- Users and applications configured via Authentik web UI
+
+## Services Integrated
+
+Known services using Authentik SSO:
+- [[traefik]] (forward auth)
+- [[gitea]]
+- [[jellyfin]]
+
+## Troubleshooting
+
+See [[sso-authentik]] skill for Authentik management.
+
+## Related
+
+- [[ubuntu]] — Host
+- [[traefik]] — Routes traffic to Authentik
+- [[gitea]] — Git hosting, SSO client
--- a/homelab/entities/backblaze-b2.md
+++ b/homelab/entities/backblaze-b2.md
@@ -0,0 +1,37 @@
+---
+title: Backblaze B2
+created: 2026-05-24
+updated: 2026-05-24
+type: entity
+tags: [services, storage, s3, backup]
+sources: [homelab/architecture.md, docs/TrueNAS-Migration]
+confidence: high
+---
+
+# Backblaze B2
+
+## Overview
+
+S3-compatible cloud storage for off-site backups of critical homelab data. Configured as a Cold storage tier in TrueNAS and as a rclone remote for Obsidian vault sync.
+
+## Key Facts
+
+- **Service**: Backblaze B2 (S3-compatible)
+- **Purpose**: Off-site backup of configuration, documents, and selected data
+- **Cost**: ~$7/mo
+- **TrueNAS integration**: B2 bucket configured as Cold storage tier in TrueNAS SCALE
+- **Obsidian vault sync**: rclone remote `b2-homelab-backups` syncs vault to B2 bucket
+- **Access**: Application key-based authentication (not AWS credentials)
+
+## TrueNAS Configuration
+
+TrueNAS exports `backblaze-b2` remote as a Cloud Sync channel. Datasets backed up include:
+- Obsidian vault snapshots
+- Homelab agent configs and session history
+- Database backups
+
+## Related
+
+- [[truenas]] — TrueNAS B2 Cold tier configuration
+- [[rustfs]] — S3 service running on TrueNAS (local S3, NOT Backblaze)
+- [[nfs-storage]] — local NFS storage vs. cloud backup strategy
--- a/homelab/entities/cloudflare.md
+++ b/homelab/entities/cloudflare.md
@@ -0,0 +1,52 @@
+---
+title: Cloudflare
+created: 2026-05-24
+updated: 2026-05-24
+type: entity
+tags: [services, networking, dns, identity]
+sources: [homelab/architecture.md, homelab/concepts/docker-traefik-stack.md]
+confidence: high
+---
+
+# Cloudflare
+
+## Overview
+
+DNS provider and reverse proxy layer for all `*.tophermayor.com` domains. Handles TLS certificate issuance via DNS challenge on grizzley and ubuntu Traefik instances.
+
+## Key Facts
+
+- **DNS Zone**: `tophermayor.com` managed at Cloudflare
+- **Role**: Authoritative DNS for all homelab public-facing services
+- **Wildcard cert source**: grizzley Traefik obtains `*.tophermayor.com` cert via Cloudflare DNS challenge
+- **certsync**: TLS certs synced from grizzley NFS mount (`/mnt/truenas/traefik-certs/grizzley`) → ubuntu via NFS or direct sync
+
+## Traefik Integration
+
+Both Traefik instances use `certresolver=cloudflare`:
+
+```yaml
+# ubuntu Traefik dynamic config
+tls:
+  certresolver: cloudflare
+  domains:
+    - main: toophermayor.com
+      sans:
+        - "*.tophermayor.com"
+```
+
+grizzley is the primary ACME source; ubuntu obtains certs from the shared NFS mount or via grizzley → ubuntu cert sync pipeline.
+
+## DNS Records
+
+| Record | Type | Target | Purpose |
+|--------|------|--------|---------|
+| `*.tophermayor.com` | A/CNAME | Traefik ingress | Wildcard for all services |
+| `@.tophermayor.com` | A | Home IP | Bare domain |
+| `traefik.tophermayor.com` | A | 192.168.50.84 | Grizzley edge ingress direct |
+
+## Related
+
+- [[grizzley]] — runs primary ACME Traefik instance
+- [[traefik]] — TLS certificate management
+- [[docker-traefik-stack]] — Traefik configuration patterns
--- a/homelab/entities/decypharr.md
+++ b/homelab/entities/decypharr.md
@@ -0,0 +1,40 @@
+---
+title: decypharr
+created: 2026-05-14
+updated: 2026-05-14
+type: entity
+tags: [service, media, lxc]
+sources: []
+---
+
+# decypharr
+
+**Role:** Black hole Usenet indexer / decypharr service
+**Host:** [[proxmox]] LXC CT 110
+**IP:** 192.168.50.175
+**Port:** 8282
+**URL:** https://decypharr.local.tophermayor.com (via [[traefik]])
+**Image:** cy01/blackhole:latest
+
+## Overview
+
+Decypharr is a Usenet black hole indexer service. Previously ran as a Docker container on [[ubuntu]] behind the gluetun VPN network. Migrated to a dedicated LXC container during the May 2026 media migration.
+
+## Configuration
+
+- **Config dir:** `/opt/decypharr/` inside container
+- **NFS mount:** `/mnt/truenas/mediadata` via PVE bind-mount `mp0`
+- **Traefik router:** `decypharr.local.tophermayor.com`
+
+## Migration History
+
+- **Before:** Docker container on ubuntu, part of the gluetun VPN network stack
+- **2026-05-14:** Migrated to dedicated LXC CT 110 on Proxmox as part of media stack migration
+- **Reason:** Media services moved from ubuntu Docker to individual LXCs; decypharr no longer needed gluetun networking
+
+## Related
+
+- [[proxmox]] — Host hypervisor
+- [[media-stack]] — Parent media ecosystem
+- [[traefik-ha]] — Ingress routing
+- [[ubuntu]] — Previous host
--- a/homelab/entities/gitea.md
+++ b/homelab/entities/gitea.md
@@ -0,0 +1,45 @@
+---
+title: gitea
+created: 2026-04-28
+updated: 2026-04-28
+type: entity
+tags: [services, git, ci-cd]
+sources: []
+---
+
+# gitea
+
+**Role:** Private Git hosting for homelab infrastructure-as-code
+**URL:** https://gitea.tophermayor.com
+**Host:** [[ubuntu]] (Docker)
+**Token:** `612031934800e7bd846d51d0193b38995c447ea4` (stored in memory)
+
+## Overview
+
+Gitea hosts all homelab git repos. The primary repo is the homelab infrastructure-as-code at the git remote used by the GitOps workflow. Gitea also runs CI/CD via runners that SSH to hosts.
+
+## Repos
+
+| Repo | Purpose |
+|------|---------|
+| homelab | Infrastructure configs (Docker Compose, Ansible) |
+| wiki | This wiki (private) |
+| wakehost | Go WoL + Proxmix app |
+
+## GitOps Workflow
+
+1. Push to Gitea repo
+2. Gitea runner (via SSH) connects to target host
+3. `git pull` in `/home/bear/homelabagentroot/`
+4. `sync-configs.sh` copies configs to runtime locations
+5. Systemd services reload if needed
+
+## Wiki Repo
+
+The [[index]] lives in a private Gitea repo (`wiki.git`). This is the canonical home — ice pushes here, grizzley/ubuntu pull from here.
+
+## Related
+
+- [[ubuntu]] — Host
+- [[ice]] — Control plane, primary GitOps runner target
+- [[proxmox]] — May host Gitea runner as VM/LXC
--- a/homelab/entities/grizzley.md
+++ b/homelab/entities/grizzley.md
@@ -0,0 +1,123 @@
+---
+title: grizzley
+created: 2026-04-28
+updated: 2026-04-29
+type: entity
+tags: [hosts, rpi, edge, ha]
+sources: []
+---
+
+# grizzley
+
+**Role:** Edge node — Traefik HA backup, Jellyfin media server, Hermes Gateway secondary
+**IP:** 192.168.50.84
+**Hostname:** grizzley
+**Uptime:** 1 day, 14h (as of 2026-04-28 — recently rebooted)
+
+## Overview
+
+grizzley is the edge node of the homelab cluster. It serves as the Traefik HA backup node (via keepalived VRRP), runs Jellyfin for media streaming, and hosts the secondary Hermes Gateway instance. It also has `/mnt/fast_share` as a fast local SSD mount.
+
+## Hardware
+
+| Spec | Detail |
+|------|--------|
+| Model | Raspberry Pi 5 |
+| CPU | ARM Cortex-A76 (4 cores) |
+| RAM | 7.7 GB total, 3.7 GB available, 4.0 GB used |
+| Swap | 6.0 GB total, 2.0 GB used |
+| Storage | 917 GB (`/dev/sdc2`, 8% used, 68 GB) |
+| Fast Storage | 916 GB `/mnt/fast_share` (`/dev/sdb1`, 1% used, 4.1 GB) — fast SSD mount |
+| Network | Gigabit Ethernet |
+| IP | 192.168.50.84 |
+
+## Systemd Services (Running)
+
+| Service | Purpose |
+|---------|---------|
+| `alert-bridge.service` | Prometheus → Telegram alert bridge (zero AI) |
+| `chrony.service` | NTP client/server |
+| `containerd.service` | Container runtime |
+| `docker.service` | Docker engine |
+| `fail2ban.service` | Intrusion prevention |
+| `hermes-dashboard.service` | Hermes Agent Web Dashboard |
+| `hermes-gateway.service` | Hermes Agent Gateway — messaging platform integration |
+| `keepalived.service` | VRRP for Traefik HA (BACKUP mode) |
+| `nfs-blkmap.service` | pNFS block layout mapping daemon |
+| `nfs-idmapd.service` | NFSv4 ID-name mapping |
+| `nfs-mountd.service` | NFS mount daemon |
+| `nfsdcld.service` | NFSv4 client tracking |
+| `opencode-web.service` | OpenCode Web Interface |
+| `rpc-statd.service` | NFS status monitor |
+| `rpcbind.service` | RPC portmapper |
+| `rsyslog.service` | System logging |
+| `snapd.service` | Snap daemon |
+| `ssh.service` | OpenSSH server |
+| `snap.cups.*` | CUPS printing services |
+
+## Docker Containers
+
+| Container | Port(s) | Status | Purpose |
+|-----------|---------|--------|---------|
+| `aiomanager` | 1610/tcp | healthy | AI orchestration |
+| `aiomanager_db` | 5432/tcp | healthy | PostgreSQL for aiomanager |
+| `aiometadata` | 1337/tcp | healthy | AI metadata service |
+| `aiometadata-redis` | 6379/tcp | healthy | Redis for aiometadata |
+| `aiostreams` | 3002/tcp | healthy | AI streaming service |
+| `homepage-grizzley` | 3000/tcp | healthy | Homepage dashboard |
+| `jellyfin` | 8096, 9090/tcp | healthy | Media server |
+| `komodo` | 9120/tcp | healthy | AI service |
+| `komodo-mongo` | 27017/tcp | — | MongoDB for komodo |
+| `traefik-pi` | 80,443,2222,8080/tcp; 19132,19134,443/udp | healthy | Traefik edge ingress (HA cert generation) |
+| `uptime-kuma` | 3001/tcp | healthy | Uptime monitoring |
+| `vaultwarden` | 80/tcp | healthy | Password manager |
+
+## Docker Networks
+
+| Network | Driver | Purpose |
+|---------|--------|---------|
+| `aiomanager_default` | bridge | aiomanager stack |
+| `aiometadata_aiometadata-internal` | bridge | aiometadata internal |
+| `komodo_komodo-internal` | bridge | komodo internal |
+| `homepage_default` | bridge | Homepage |
+| `traefik-proxy` | bridge | Traefik ingress |
+| `desktop-test_default` | bridge | Desktop test stack |
+
+## NFS Mounts
+
+```
+192.168.50.12:/mnt/TrueNAS/traefik-certs/grizzley → /mnt/truenas/traefik-certs/grizzley (nfs4, rw, tcp, hard)
+```
+
+TrueNAS NFS share for Traefik TLS certificate sync. Both traefik-pi (grizzley) and traefik (ubuntu) share the same wildcard cert via this mount.
+
+## Traefik HA (Keepalived VRRP)
+
+grizzley is the **BACKUP** Traefik node. VRRP runs on `eth0.50` (VLAN 50):
+
+```
+virtual_router_id: 51
+priority: 90 (BACKUP — ubuntu is PRIMARY at higher priority)
+virtual_ipaddress: 192.168.50.80/27
+auth_type: PASS, auth_pass: HomelabH
+check_script: /etc/keepalived/check_traefik.sh (interval 2s, fall 2, rise 2)
+```
+
+When ubuntu Traefik fails, keepalived promotes grizzley to MASTER and the virtual IP moves here.
+
+## Access
+
+```bash
+ssh bear@192.168.50.84
+```
+
+**Note:** NFS client services run automatically. `/etc/keepalived/keepalived.conf` has the VRRP config.
+
+## Related
+
+- [[ice]] — Control plane, primary agent host
+- [[ubuntu]] — Main Docker host, Traefik PRIMARY partner
+- [[truenas]] — NFS storage backend (cert sync)
+- [[traefik]] — Traefik entity
+- [[jellyfin]] — Media server running on grizzley
+- [[hermes-gateway]] — Hermes Gateway secondary
--- a/homelab/entities/hermes-gateway.md
+++ b/homelab/entities/hermes-gateway.md
@@ -0,0 +1,71 @@
+---
+title: hermes-gateway
+created: 2026-04-28
+updated: 2026-04-29
+type: entity
+tags: [services, ai, gateway, watchdog]
+sources: []
+---
+
+# hermes-gateway
+
+**Role:** AI gateway — routes LLM requests across multiple providers
+**Hosts:** [[ice]] (primary), [[grizzley]] (secondary)
+**Runs on:** ice as systemd user service (`hermes-gateway.service`)
+
+## Overview
+
+hermes-gateway is the AI gateway that routes LLM requests (DeepSeek V4, OpenAI, Anthropic, OpenRouter, etc.) across multiple providers. It has a watchdog pattern deployed via system cron on both [[ice]] and [[grizzley]].
+
+## Providers
+
+| Provider | Model | Endpoint | Notes |
+|----------|-------|----------|-------|
+| DeepSeek | V4 | `https://api.deepseek.com/anthropic` | Anthropic format, 1M input / 384K output |
+| OpenAI | various | `https://api.openai.com` | |
+| Anthropic | various | `https://api.anthropic.com` | |
+| OpenRouter | various | `https://openrouter.ai/api` | |
+
+## Watchdog Pattern
+
+A shell script (`/home/bear/hermes-gateway-watchdog.sh`) runs via **system cron** on both ice and grizzley:
+
+1. Checks if hermes-gateway is responsive
+2. On failure: direct restart → tmux+OpenCode rescue if still down
+3. Sends Telegram notification on failure to topic **1033 "Cron Jobs"** in AigentZeroHermes (`-1003820156994`)
+
+**Telegram alert details:**
+- Bot token: `836803270:AAH-Ac5Y`
+- Chat ID: `-1003820156994` (AigentZeroHermes channel)
+- Topic ID: 1033 ("Cron Jobs")
+
+**Critical note:** On [[grizzley]], the systemd override for the watchdog is deployed directly to `/etc/systemd/system/` (not tracked in the homelab repo — it's a system unit).
+
+## DeepSeek V4 Provider
+
+Configured as: `https://api.deepseek.com/anthropic` (Anthropic format, not OpenAI).
+Context window: 1M input / 384K output.
+⚠️ Known bug: thinking mode passes `reasoning_content` back incorrectly — pass it back in multi-turn.
+
+## Access
+
+hermes-gateway runs as a user service. To check status:
+```bash
+# On ice (primary)
+ssh bear@192.168.50.197 "systemctl --user status hermes-gateway"
+journalctl --user -u hermes-gateway -f
+
+# On grizzley (secondary)
+ssh bear@192.168.50.84 "systemctl --user status hermes-gateway"
+```
+
+Watchdog logs (check cron output in syslog):
+```bash
+ssh bear@192.168.50.197 "grep hermes-gateway-watchdog /var/log/syslog"
+```
+
+## Related
+
+- [[ice]] — Primary host
+- [[grizzley]] — Secondary host with watchdog
+- [[authentik]] — SSO for gateway access (if applicable)
--- a/homelab/entities/home-assistant-connect-zbt-2.md
+++ b/homelab/entities/home-assistant-connect-zbt-2.md
@@ -0,0 +1,75 @@
+---
+title: Home Assistant Connect ZBT-2
+created: 2026-05-10
+updated: 2026-05-10
+type: entity
+tags: [hub, zigbee, thread, matter, smart-home, iot]
+confidence: high
+---
+
+# Home Assistant Connect ZBT-2
+
+> Nabu Casa's official Zigbee + Thread coordinator dongle for Home Assistant. Plugged into [[panda]], serves as the primary Zigbee and Thread border router for the smart home.
+
+## Overview
+
+| Field | Value |
+|-------|-------|
+| **Manufacturer** | Nabu Casa |
+| **Model** | Home Assistant Connect ZBT-2 |
+| **Serial** | E072A1DC134C |
+| **Host** | [[panda]] (plugged into USB) |
+| **Protocols** | Zigbee 3.0 + Thread (IEEE 802.15.4) |
+| **HA Integration** | ZHA (Zigbee) + Thread (OpenThread Border Router) |
+
+## Role in the Smart Home
+
+The Connect ZBT-2 is the **primary coordinator** for all Zigbee and Thread devices in the home. It provides:
+
+1. **Zigbee Coordinator** — via ZHA integration, manages the Zigbee mesh network
+2. **Thread Border Router** — via Thread integration, provides IP connectivity for Thread devices
+3. **Matter Controller** — via Matter integration, commissions and controls Matter devices over Thread
+
+## Zigbee Devices (via ZHA)
+
+All Zigbee devices are paired directly to the Connect ZBT-2 coordinator:
+
+| Device | Location | Model | Type |
+|--------|----------|-------|------|
+| Aqara Door/Window Sensor | Rooftop | Aqara Door and Window Sensor | [[sensor]] |
+| Aqara Vibration Sensor T1 | Rooftop | Aqara Vibration Sensor T1 | [[sensor]] |
+| Aqara Motion Sensor P1 | Living Room | Aqara Motion Sensor P1 | [[sensor]] |
+| Aqara Light Switch H2 US | Baby Room | Aqara Light Switch H2 US | [[actuator]] |
+| Aqara Light Switch H2 US | Front Door | Aqara Light Switch H2 US | [[actuator]] |
+| Aqara Light Switch H2 US | Entrance | Aqara Light Switch H2 US | [[actuator]] |
+| Aqara Light Switch H2 US | 1st Floor | Aqara Light Switch H2 US | [[actuator]] |
+| Aqara Colorful Ceiling Light 36W | Baby Room | Colorful Ceiling Light 36W | [[actuator]] |
+| Aqara Smart Lock U100 | Front Door | Aqara Smart Lock U100 | [[actuator]] |
+| IKEA STARKVIND | — | STARKVIND Air purifier | [[actuator]] |
+
+## Thread Network
+
+The Connect ZBT-2 runs an OpenThread Border Router, creating a Thread network that:
+- Provides IP connectivity to Thread-only devices
+- Acts as a Matter fabric gateway
+- Shares Thread credentials with other border routers (e.g., Apple TV, Nest Hub) for mesh redundancy
+
+## Multi-Fabric Position
+
+In the [[matter-multi-fabric]] architecture, the ZBT-2 serves as:
+- **HA's Matter fabric controller** — primary commissioning point for new Matter devices
+- **Thread credential source** — other border routers should join this Thread network
+- **Zigbee bridge** — exposes Zigbee devices to Matter via HA's Matter Bridge feature
+
+## Relationships
+
+- Connected to [[panda]] via USB
+- Controls all Zigbee devices in the home
+- Provides Thread connectivity for [[matter-multi-fabric]]
+- Complements [[aqara-hub-m3]] (which bridges Aqara-specific devices via Matter)
+
+## Notes
+
+- Thread credentials should be shared with [[aqara-hub-m3]] and Apple TV to ensure a single unified Thread mesh
+- If adding more Thread border routers, export credentials from this OTBR and import them
+- The ZBT-2 is a dual-protocol radio — Zigbee and Thread cannot run simultaneously on the same radio; HAOS handles multiplexing
--- a/homelab/entities/homepage.md
+++ b/homelab/entities/homepage.md
@@ -0,0 +1,330 @@
+---
+title: homepage
+created: 2026-04-29
+updated: 2026-04-29
+type: entity
+tags: [services, docker, homelab]
+sources: []
+---
+
+# homepage
+
+**Role:** Unified homelab dashboard — service bookmarks, Docker widget, infrastructure status
+**Image:** `gethomepage/homepage:latest`
+**Websites:** See Traefik routes below
+
+## Overview
+
+Two Homepage instances provide a unified dashboard for the homelab. [GetHomepage](https://gethomepage.dev/) is a modern, configurable dashboard for homelab services. It uses Docker socket integration for live container status, widgets for service metrics, and Traefik for ingress routing.
+
+| Instance | Host | Port | Network | Traefik Route |
+|----------|------|------|---------|--------------|
+| `homepage-ubuntu` | [[ubuntu]] | 3003 | `proxy-net` | `homepage.local.tophermayor.com`, `homepage-ubuntu.local.tophermayor.com` |
+| `homepage-grizzley` | [[grizzley]] | 3000 | `traefik-proxy` | `homepage-grizzley.local.tophermayor.com` |
+
+**Traefik VIP routing:** `homepage.local.tophermayor.com` → `homepage-to-self` → `http://192.168.50.61:3003` (ubuntu). The grizzley instance is accessible at `homepage-grizzley.local.tophermayor.com`.
+
+## Docker Configuration
+
+### homepage-ubuntu
+
+```yaml
+container_name: homepage-ubuntu
+image: gethomepage/homepage:latest
+network: proxy-net
+ports: 3003
+bind mount: /home/bear/homelab/ubuntu/homepage/config → /app/config
+docker socket: /var/run/docker.sock (read-only)
+memory limit: (none set — uses host resources)
+```
+
+Config path: `/home/bear/homelab/ubuntu/homepage/config/`
+
+### homepage-grizzley
+
+```yaml
+container_name: homepage-grizzley
+image: ghcr.io/gethomepage/homepage:latest
+network: traefik-proxy
+ports: 3000
+bind mount: /home/bear/homelab/grizzley/docker/homepage/config → /app/config
+docker socket: /var/run/docker.sock (read-only)
+memory limit: 256MB (hard), 64MB (reserved)
+allowed hosts: homepage.local.tophermayor.com, homepage-grizzley.local.tophermayor.com, 192.168.50.84:3000
+```
+
+Config path: `/home/bear/homelab/grizzley/docker/homepage/config/`
+
+## Traefik Routes (ubuntu Traefik)
+
+From `homelab/ubuntu/traefik/config/dynamic/upstream-ingress.yml`:
+
+```yaml
+# Primary VIP route → ubuntu instance
+homepage-vip:
+  rule: "Host(`homepage.local.tophermayor.com`)"
+  entryPoints: [websecure]
+  service: homepage-to-self
+  priority: 100
+  tls: {}
+
+# Direct ubuntu route
+homepage-local:
+  rule: "Host(`homepage-ubuntu.local.tophermayor.com`)"
+  entryPoints: [websecure]
+  service: homepage-to-self
+  priority: 100
+  tls: {}
+
+# grizzley backup route (bypasses VIP)
+homepage-backup-grizzley:
+  rule: "Host(`homepage-grizzley.local.tophermayor.com`)"
+  entryPoints: [websecure]
+  service: homepage-grizzley-svc
+  priority: 100
+  tls: {}
+```
+
+Services:
+- `homepage-to-self` → `http://192.168.50.61:3003`
+- `homepage-grizzley-svc` → `http://192.168.50.84:3000`
+
+## Settings (ubuntu instance)
+
+From `settings.yaml`:
+
+```yaml
+title: Ubuntu Homepage
+description: Homelab dashboard — all hosts.
+target: _self
+theme: dark
+color: slate
+iconStyle: theme
+background:
+  image: https://images.unsplash.com/photo-1451187580459-43490279c0fa?auto=format&fit=crop&w=2560&q=80
+  opacity: 28
+  brightness: 55
+  saturate: 60
+cardBlur: md
+```
+
+Layout (4-column rows by section):
+- Media Servers (4 cols)
+- Media Automation (5 cols)
+- Grizzley (4 cols)
+- Apps (4 cols)
+- Infrastructure (4 cols)
+
+## Widgets (ubuntu instance)
+
+From `widgets.yaml`:
+
+```yaml
+- resources:
+    cpu: true
+    memory: true
+    disk: /
+- search:
+    provider: duckduckgo
+    target: _blank
+```
+
+From `docker.yaml`:
+
+```yaml
+ubuntu:
+  socket: /var/run/docker.sock
+```
+
+Docker socket integration provides live container status for all services on [[ubuntu]].
+
+## Services Displayed (ubuntu homepage)
+
+### Media Servers
+| Service | URL | Widget |
+|---------|-----|--------|
+| Jellyfin | https://jellyfin.tophermayor.com | Jellyfin widget (`http://jellyfin:8096`, key `3aabf1af...`) |
+| Immich | https://immich.tophermayor.com | — |
+| Navidrome | https://navidrome.tophermayor.com | — |
+| Audiobookshelf | https://audiobooks.tophermayor.com | — |
+| Kavita | https://kavita.tophermayor.com | — |
+| Calibre-Web | https://calibre-web.local.tophermayor.com | — |
+| Stremio | https://stremio.local.tophermayor.com | — |
+
+### Media Automation
+| Service | URL | Widget |
+|---------|-----|--------|
+| Gluetun VPN | (internal) | Gluetun widget (`http://gluetun:8000`, v2) |
+| Sonarr | https://sonarr.local.tophermayor.com | Sonarr widget (key `0573d93d...`) |
+| Sonarr Anime | https://sonarr-anime.local.tophermayor.com | Sonarr widget (key `84de4e4a...`) |
+| Radarr | https://radarr.local.tophermayor.com | Radarr widget (key `d69cafc9...`) |
+| Radarr Anime | https://radarr-anime.local.tophermayor.com | Radarr widget (key `d4373fbc...`) |
+| Lidarr | https://lidarr.local.tophermayor.com | Lidarr widget (key `55921016...`) |
+| Readarr | https://readarr.local.tophermayor.com | — |
+| Prowlarr | https://prowlarr.local.tophermayor.com | — |
+| qBittorrent | https://qbittorrent.local.tophermayor.com | — |
+| SABnzbd | https://sabnzbd.local.tophermayor.com | SABnzbd widget (key `01d3c44b...`) |
+| NZBdav | https://nzbdav.local.tophermayor.com | — |
+| Seerr | https://jellyseerr.tophermayor.com | Overseerr widget (key `MTc2NTIy...`) |
+
+### Grizzley (links through to grizzley-hosted services)
+| Service | URL |
+|---------|-----|
+| Homepage Grizzley | https://homepage-grizzley.local.tophermayor.com |
+| Traefik Grizzley | https://traefik-grizzley.local.tophermayor.com |
+| Komodo | https://komodo.local.tophermayor.com |
+| AIOManager | https://aiomanager.tophermayor.com |
+| AIOStreams | https://aiostreams.tophermayor.com |
+| AIOMetadata | https://aiometadata.tophermayor.com |
+| Vaultwarden | https://vaultwarden.tophermayor.com |
+| Status (Uptime Kuma) | https://status.tophermayor.com |
+
+### Apps
+| Service | URL | Widget |
+|---------|-----|--------|
+| Authentik | https://auth.tophermayor.com | — |
+| Gitea | https://gitea.tophermayor.com | — |
+| Home Assistant | https://ha.tophermayor.com | HomeAssistant widget (key `eyJhbG...`, fields: people_home, lights_on, switches_on) |
+| OpenCode | https://opencode.tophermayor.com | — |
+| OpenCode Ice | https://opencode-ice.local.tophermayor.com | — |
+| Whisper | https://whisper.local.tophermayor.com | — |
+
+### Infrastructure
+| Service | URL | Widget |
+|---------|-----|--------|
+| Traefik | https://traefik.local.tophermayor.com | Traefik widget (`http://traefik:8080`) |
+| Proxmox | https://proxmox.local.tophermayor.com | Proxmox widget (user: `homepage@pam!homepage`, node: pve) |
+| TrueNAS | https://truenas.local.tophermayor.com | TrueNAS widget (key `1-SdjbJ...`) |
+| Grafana | https://grafana.local.tophermayor.com | — |
+| Prometheus | https://prometheus.local.tophermayor.com | Prometheus widget (`http://prometheus:9090`) |
+| Reccollection | https://reccollection.local.tophermayor.com | — |
+
+## Services Displayed (grizzley homepage)
+
+### Grizzley (local services)
+| Service | URL | Widget |
+|---------|-----|--------|
+| Traefik | https://traefik-grizzley.local.tophermayor.com | Traefik widget (`http://traefik-pi:8080`) |
+| Komodo | https://komodo.local.tophermayor.com | Komodo widget (key `K_jjWNbR...`, secret `S_IHGCW15...`) |
+| AIOManager | https://aiomanager.tophermayor.com | — |
+| AIOStreams | https://aiostreams.tophermayor.com | — |
+| AIOMetadata | https://aiometadata.tophermayor.com | — |
+| Vaultwarden | https://vaultwarden.tophermayor.com | — |
+| Status (Uptime Kuma) | https://status.tophermayor.com | UptimeKuma widget (slug: default) |
+| Minecraft Standby | (UDP 19132) | — |
+| Minecraft Sison | (UDP 19134) | — |
+| Jellyfin Standby | (internal) | — |
+
+### Ubuntu (linked)
+| Service | URL |
+|---------|-----|
+| Homepage Ubuntu | https://homepage-ubuntu.local.tophermayor.com |
+| Traefik Ubuntu | https://traefik.local.tophermayor.com |
+| OpenCode | https://opencode.tophermayor.com |
+| Authentik | https://auth.tophermayor.com |
+| Gitea | https://gitea.tophermayor.com |
+| Whisper | https://whisper.local.tophermayor.com |
+| Stremio Server | https://stremio.local.tophermayor.com |
+| Reccollection | https://reccollection.local.tophermayor.com |
+
+### Media (ubuntu via links)
+| Service | URL |
+|---------|-----|
+| Jellyfin | https://jellyfin.tophermayor.com |
+| Seerr | https://jellyseerr.tophermayor.com |
+| Immich | https://immich.tophermayor.com |
+| Navidrome | https://navidrome.tophermayor.com |
+| Audiobookshelf | https://audiobooks.tophermayor.com |
+| Kavita | https://kavita.tophermayor.com |
+| Calibre-Web | https://calibre-web.local.tophermayor.com |
+
+### Media Automation (ubuntu via links)
+| Service | URL | Widget |
+|---------|-----|--------|
+| Sonarr | https://sonarr.local.tophermayor.com | Sonarr (key `0573d93d...`) |
+| Radarr | https://radarr.local.tophermayor.com | Radarr (key `d69cafc9...`) |
+| Lidarr | https://lidarr.local.tophermayor.com | Lidarr (key `55921016...`) |
+| Readarr | https://readarr.local.tophermayor.com | — |
+| Prowlarr | https://prowlarr.local.tophermayor.com | — |
+| qBittorrent | https://qbittorrent.local.tophermayor.com | — |
+| SABnzbd | https://sabnzbd.local.tophermayor.com | SABnzbd (key `01d3c44b...`) |
+| Sonarr Anime | https://sonarr-anime.local.tophermayor.com | Sonarr (key `84de4e4a...`) |
+| Radarr Anime | https://radarr-anime.local.tophermayor.com | Radarr (key `d4373fbc...`) |
+
+### Apps (ubuntu via links)
+| Service | URL | Widget |
+|---------|-----|--------|
+| Home Assistant | https://ha.tophermayor.com | HomeAssistant (key `eyJhbG...`, fields: people_home, lights_on, switches_on) |
+| OpenCode Ice | https://opencode-ice.local.tophermayor.com | — |
+
+### Infrastructure (ubuntu via links)
+| Service | URL | Widget |
+|---------|-----|--------|
+| Proxmox | https://proxmox.local.tophermayor.com | Proxmox (user `homepage@pam!homepage`, node pve) |
+| TrueNAS | https://truenas.local.tophermayor.com | TrueNAS (key `1-SdjbJ...`) |
+| Grafana | https://grafana.local.tophermayor.com | — |
+| Prometheus | https://prometheus.local.tophermayor.com | — |
+
+## Bookmark Groups (ubuntu)
+
+From `bookmarks.yaml`:
+
+```yaml
+- Developer:
+    - Github (abbr: GH) → https://github.com/
+- Social:
+    - Reddit (abbr: RE) → https://reddit.com/
+- Entertainment:
+    - YouTube (abbr: YT) → https://youtube.com/
+```
+
+## Kubernetes / Proxmox Configs
+
+Both instances have `kubernetes.yaml` and `proxmox.yaml` for additional infrastructure widgets.
+
+## Upstream Ingress Widget Routes (Traefik)
+
+From `homelab/ubuntu/traefik/config/dynamic/homepage-widgets.yml` — Traefik routes exposed **through** homepage for internal service access (not homepage's own routes):
+
+```yaml
+# Routes via gluetun VPN for media services
+sonarr-svc:       http://gluetun:8989   # Host(`sonarr-internal.local.tophermayor.com`)
+radarr-svc:       http://gluetun:7878   # Host(`radarr-internal.local.tophermayor.com`)
+lidarr-svc:       http://gluetun:8686   # Host(`lidarr-internal.local.tophermayor.com`)
+sabnzbd-svc:      http://gluetun:8080   # Host(`sabnzbd-internal.local.tophermayor.com`)
+seerr-svc:        http://seerr:5055    # Host(`seerr-internal.local.tophermayor.com`)
+jellyfin-svc:     http://jellyfin:8096  # Host(`jellyfin-internal.local.tophermayor.com`)
+prometheus-svc:   http://prometheus:9090 # Host(`prometheus-internal.local.tophermayor.com`)
+```
+
+These are the `*-internal.local.tophermayor.com` routes — accessible only inside the network via gluetun VPN tunnel.
+
+## Access URLs
+
+| URL | Host | Notes |
+|-----|------|-------|
+| https://homepage.local.tophermayor.com | [[ubuntu]] | Primary VIP route |
+| https://homepage-ubuntu.local.tophermayor.com | [[ubuntu]] | Direct ubuntu instance |
+| https://homepage-grizzley.local.tophermayor.com | [[grizzley]] | Direct grizzley instance |
+
+## Config Files
+
+| File | Purpose |
+|------|---------|
+| `services.yaml` | Service definitions, URLs, icons, widget configs |
+| `settings.yaml` | Theme, layout, background image |
+| `widgets.yaml` | Resource monitors, search bar |
+| `docker.yaml` | Docker socket connection |
+| `bookmarks.yaml` | Quick bookmarks bar |
+| `kubernetes.yaml` | K8s widget config |
+| `proxmox.yaml` | Proxmox widget config |
+| `custom.css` | Custom styles |
+| `custom.js` | Custom JavaScript |
+
+## Related
+
+- [[ubuntu]] — Hosts `homepage-ubuntu` on port 3003, `proxy-net`
+- [[grizzley]] — Hosts `homepage-grizzley` on port 3000, `traefik-proxy`
+- [[traefik]] — Ingress routing for all homepage instances
+- [[media-stack]] — Media services displayed on homepage
+- [[homelab-monitoring]] — Infrastructure widgets (Prometheus, Grafana, Proxmox, TrueNAS)
--- a/homelab/entities/hyte.md
+++ b/homelab/entities/hyte.md
@@ -0,0 +1,52 @@
+---
+title: Hyte
+created: 2026-05-24
+updated: 2026-05-24
+type: entity
+tags: [hosts, vm, windows]
+sources: [homelab/catalog/hosts.json, homelab/AGENTS.md]
+confidence: high
+---
+
+# Hyte
+
+## Overview
+
+Windows 11 workstation with WSL2. Primary Tdarr media processing node. Static IP on Lab VLAN.
+
+## Key Facts
+
+- **IP**: `192.168.1.143` (Main/Prod VLAN)
+- **SSH Port**: 2222 (non-standard)
+- **SSH User**: `christopher`
+- **SSH Key**: `~/.ssh/id_ed25519`
+- **Role**: Desktop host + media workstation (Tdarr)
+- **Authoritative Repo**: `homelab/Hyte`
+- **Inventory Group**: `hyte_host`
+
+## SSH Access
+
+```bash
+ssh -p 2222 christopher@192.168.1.143
+# or via ~/.ssh/config
+ssh hyte
+```
+
+SSH config entry in `~/.ssh/config`:
+```
+Host Hyte
+    HostName 192.168.1.143
+    Port 2222
+    User christopher
+    IdentityFile ~/.ssh/id_ed25519
+```
+
+## Tdarr Integration
+
+Hyte runs Tdarr (media transcoding) as a Windows-native workload. Uses GPU transcoding for media files on the NFS mounts from [[truenas]].
+
+## Related
+
+- [[truenas]] — NFS storage source for Tdarr processing
+- [[media-stack]] — Tdarr transcoding pipeline
+- [[proxmox]] — hosts the hypervisor running this workstation VM
--- a/homelab/entities/ice.md
+++ b/homelab/entities/ice.md
@@ -0,0 +1,96 @@
+---
+title: ice
+created: 2026-04-28
+updated: 2026-04-29
+type: entity
+tags: [hosts, rpi, control-plane]
+sources: []
+---
+
+# ice
+
+**Role:** Control plane node — primary Hermes Agent host, GitOps origin
+**IP:** 192.168.50.197
+**Hostname:** ice
+**Uptime:** 15 days, 10h (as of 2026-04-28)
+
+## Overview
+
+ice is the control plane of the homelab cluster. It runs the primary Hermes Agent instance and OpenCode backend. All GitOps workflows originate here — configs are edited in the repo (`/home/bear/homelab/`), committed, and pushed to Gitea, which triggers runners on each host.
+
+## Hardware
+
+| Spec | Detail |
+|------|--------|
+| Model | Raspberry Pi 4 |
+| CPU | ARM Cortex-A72 (4 cores) |
+| RAM | 7.6 GB total, 2.4 GB available, 5.2 GB used |
+| Storage | 939 GB microSD/USB SSD (`/dev/sda2`), 45 GB used (5%) |
+| Swap | None |
+| Network | Gigabit Ethernet |
+| IP | 192.168.50.197 |
+
+## Systemd Services (Running)
+
+| Service | Purpose |
+|---------|---------|
+| `cabo-voting.service` | Cabo Bachelor Party Voting App |
+| `chrony.service` | NTP client/server |
+| `containerd.service` | Container runtime |
+| `docker.service` | Docker engine |
+| `fail2ban.service` | Intrusion prevention |
+| `hermes-dashboard.service` | Hermes Agent Web Dashboard |
+| `hermes-gateway-watchdog.timer` | Cron watchdog for hermes-gateway, Telegram alerts |
+| `netplan-wpa-wlan0.service` | WLAN WPA supplicant |
+| `nfs-blkmap.service` | pNFS block layout mapping |
+| `opencode-web.service` | OpenCode Web Interface |
+| `rpcbind.service` | RPC portmapper |
+| `rsyslog.service` | System logging |
+| `snapd.service` | Snap daemon |
+| `ssh.service` | OpenSSH server |
+| `unattended-upgrades.service` | Automatic security updates |
+| `user@1000.service` | User session manager |
+
+## Docker Containers
+
+| Container | Port | Purpose |
+|-----------|------|---------|
+| `camofox` | 9377 | Firefox browser automation |
+| `hermes-dashboard` | — | Hermes Agent web UI |
+| `opencode-web` | 4096 | OpenCode web interface |
+
+## Docker Networks
+
+`bridge`, `host`, `none` (default drivers only — no custom overlay networks)
+
+## NFS Mounts
+
+None configured on ice.
+
+## Hermes Gateway Watchdog
+
+`/home/bear/hermes-gateway-watchdog.sh` runs via system cron on ice:
+1. Checks if hermes-gateway is responsive
+2. On failure: direct restart → tmux+OpenCode rescue if still down
+3. Sends Telegram notification on failure to topic 1033 "Cron Jobs" (bot: `836803270:AAH-Ac5Y`)
+
+## GitOps Context
+
+1. Configs edited in `/home/bear/homelab/` (git worktrees)
+2. Pushed to Gitea (`gitea.tophermayor.com`)
+3. Runner SSHs to each host, pulls, runs `sync-configs.sh`
+4. Systemd services reload
+
+## Access
+
+```bash
+ssh bear@192.168.50.197
+```
+
+## Related
+
+- [[grizzley]] — RPi5 edge node, Traefik HA backup
+- [[ubuntu]] — Main Docker host (~70 containers)
+- [[proxmox]] — Hypervisor (may host ice as VM)
+- [[hermes-gateway]] — AI gateway on ice
+- [[truenas]] — NFS/S3 storage backend
--- a/homelab/entities/index.md
+++ b/homelab/entities/index.md
@@ -0,0 +1,57 @@
+---
+title: Homelab Entities Index
+created: 2026-04-28
+updated: 2026-05-24
+type: index
+tags: [meta]
+---
+
+# Entities Index
+
+> Content catalog for homelab entities. Every entity page listed with a one-line summary.
+> Last updated: 2026-05-24 | Total pages: 22
+
+## Hosts
+
+| Entity | Role | IP | Notes |
+|--------|------|-----|-------|
+| [[ice]] | RPi4 control plane | 192.168.50.197 | Primary Hermes Agent host, OpenCode control node |
+| [[grizzley]] | RPi5 edge node | 192.168.50.84 | Traefik HA primary, Jellyfin, MineOS, Hermes |
+| [[ubuntu]] | Intel NUC Docker host | 192.168.50.61 | ~70 containers |
+| [[proxmox]] | Proxmox VE hypervisor | 192.168.50.11 | VMs and LXCs |
+| [[truenas]] | TrueNAS NAS | 192.168.50.12 | ⚠️ Pool corruption, 36TB raw |
+| [[panda]] | RPi Home Assistant | 192.168.30.196 | Smart home hub, IoT VLAN |
+| [[hyte]] | Windows 11 workstation | 192.168.1.143 | Tdarr media processing, SSH port 2222 |
+| [[macos-workstation]] | MacBook Air M4 | Dynamic | Operator workstation, not a deployment target |
+
+## Services
+
+| Entity | Role | Host | Notes |
+|--------|------|-------|-------|
+| [[homepage]] | Unified homelab dashboard | ubuntu + grizzley | 2 instances, 60+ services tracked |
+| [[hermes-gateway]] | AI gateway | ice + grizzley | Watchdog pattern |
+| [[traefik]] | Reverse proxy / ingress | grizzley + ubuntu | HA across both hosts |
+| [[authentik]] | SSO identity provider | ubuntu | |
+| [[jellyfin]] | Media server | grizzley | ⚠️ Bind mount UID issue |
+| [[rustfs]] | S3 object storage | truenas | ⚠️ Ignores env vars on first boot |
+| [[gitea]] | Private Git hosting | ubuntu | GitOps runner hub |
+| [[decypharr]] | Usenet indexer | proxmox CT 110 | 192.168.50.175:8282 |
+| [[tdarr]] | Media transcoding | ubuntu + Hyte | GPU-accelerated transcoding |
+| [[komodo]] | Container management UI | grizzley | |
+| [[uptime-kuma]] | Uptime monitoring | grizzley | |
+
+## Subscriptions & Paid Services
+
+| Entity | Role | Cost/mo | Notes |
+|--------|------|---------|-------|
+| [[cloudflare]] | DNS + proxy + TLS | ~$20 | DNS challenge for *.tophermayor.com |
+| [[nordvpn]] | WireGuard VPN for media stack | ~$12 | Via Gluetun container |
+| [[backblaze-b2]] | Off-site backup storage | ~$7 | Cold tier in TrueNAS |
+| [[subscriptions]] | Full subscription catalog | ~$81 total | See concept page for breakdown |
+
+## Smart Home / IoT
+
+| Entity | Role | Host | Notes |
+|--------|------|-------|-------|
+| [[home-assistant-connect-zbt-2]] | Zigbee + Thread coordinator | panda | ZHA + OTBR, 10 Zigbee devices |
+| [[aqara-hub-m3]] | Aqara Matter hub | Bedroom | Bridges Aqara to Matter |
--- a/homelab/entities/jellyfin.md
+++ b/homelab/entities/jellyfin.md
@@ -0,0 +1,44 @@
+---
+title: jellyfin
+created: 2026-04-28
+updated: 2026-04-28
+type: entity
+tags: [services, media, jellyfin]
+sources: []
+---
+
+# jellyfin
+
+**Role:** Media server — movies, TV, music
+**URL:** https://jellyfin.tophermayor.com
+**Host:** [[grizzley]] (Docker)
+
+## Overview
+
+Jellyfin is the media server for the homelab. It streams movies, TV shows, and music to devices on the network. It runs on [[grizzley]] as a Docker container.
+
+## ⚠️ Known Issues
+
+### Bind Mount UID Permission Crash Loop
+
+Jellyfin may crash loop if bind mounts use a UID that doesn't match Jellyfin's internal user. See [[jellyfin]] skill.
+
+### JellyfinDown False Positive
+
+Prometheus alerts may fire for Jellyfin even when it's up — the blackbox exporter probe may fail while the service is healthy. See [[jellyfin]] skill.
+
+### Debugging
+
+See [[jellyfin]] skill for full debugging workflow.
+
+## Media Stack
+
+Often paired with:
+- Tdarr — Automated transcoding
+- Sonarr/Radarr — Media acquisition automation (confirm if on [[ubuntu]])
+
+## Related
+
+- [[grizzley]] — Host
+- [[truenas]] — Media storage (NFS share)
+- Tdarr — Transcoding (check if co-located)
--- a/homelab/entities/macos-workstation.md
+++ b/homelab/entities/macos-workstation.md
@@ -0,0 +1,38 @@
+---
+title: macOS Workstation
+created: 2026-05-24
+updated: 2026-05-24
+type: entity
+tags: [hosts, workstation, macos]
+sources: [homelab/catalog/hosts.json, homelab/AGENTS.md]
+confidence: high
+---
+
+# macOS Workstation (macbook-air-m4)
+
+## Overview
+
+MacBook Air M4 — the operator workstation. Used for day-to-day development, Obsidian vault editing, and as the primary access point for homelab management.
+
+## Key Facts
+
+- **Hardware**: MacBook Air M4 (Apple Silicon)
+- **IP**: Dynamic (not static)
+- **SSH User**: `christopherjohnsisonmayor`
+- **Role**: Operator workstation (not a deployment target)
+- **Authoritative Repo**: `homelab/macbook-air-m4`
+- **Inventory Group**: `raspberry_pis` (grouped with Pis for inventory purposes)
+
+## Purpose
+
+This machine is the **operator**, not a deployment target. It runs:
+- Obsidian desktop app (vault sync via Obsidian Sync)
+- OpenCode CLI (agent access)
+- Terminal + SSH for homelab management
+- Browser for UniFi controller, TrueNAS, Home Assistant UIs
+
+## Related
+
+- [[ice]] — primary control plane (SSH target from this workstation)
+- [[ubuntu]] — primary Docker host
+- [[grizzley]] — edge ingress node
--- a/homelab/entities/nordvpn.md
+++ b/homelab/entities/nordvpn.md
@@ -0,0 +1,42 @@
+---
+title: NordVPN
+created: 2026-05-24
+updated: 2026-05-24
+type: entity
+tags: [services, networking, vpn, media]
+sources: [homelab/architecture.md]
+confidence: high
+---
+
+# NordVPN
+
+## Overview
+
+Commercial VPN (WireGuard protocol) used to tunnel all media automation traffic through Gluetun. Provides exit IPs for accessing geo-restricted content and obscures download source IPs from ISPs.
+
+## Key Facts
+
+- **Protocol**: WireGuard (via Gluetun container)
+- **Provider**: NordVPN
+- **Purpose**: All media stack downloads (Sonarr, Radarr, Lidarr, Prowlarr, qBittorrent) route through VPN
+- **Container**: `gluetun` on ubuntu — acts as VPN gateway for media-net
+- **Exit IPs**: Shared NordVPN exit pool; not dedicated IP
+- **Cost**: ~$12/mo
+
+## Architecture
+
+```
+Media containers (media-net)
+    ↓
+Gluetun (WireGuard → NordVPN)
+    ↓
+Internet (geo-restricted content)
+```
+
+All media automation sits behind Gluetun via Docker network `media-net`. Jellyfin (direct play) does NOT use VPN.
+
+## Related
+
+- [[media-stack]] — all containers using Gluetun
+- [[docker-traefik-stack]] — Gluetun network configuration
+- [[truenas]] — stores media on NFS mounts
--- a/homelab/entities/panda.md
+++ b/homelab/entities/panda.md
@@ -0,0 +1,103 @@
+---
+title: Panda (Home Assistant Host)
+created: 2026-05-10
+updated: 2026-05-10
+type: entity
+tags: [hosts, rpi, home-assistant, iot, smart-home, hub]
+confidence: high
+---
+
+# Panda — Home Assistant Host
+
+> Dedicated Raspberry Pi running **Home Assistant OS (HAOS)** — the central smart home automation hub for the homelab.
+
+## Overview
+
+| Field | Value |
+|-------|-------|
+| **Hostname** | `a0d7b954-ssh` (HAOS SSH add-on container) |
+| **Hardware** | Raspberry Pi (BCM) |
+| **OS** | Home Assistant Operating System |
+| **Role** | Smart home hub, IoT controller, automation engine |
+| **VLAN** | IoT VLAN 30 (primary) + Server VLAN 50 |
+| **IP (VLAN 30)** | `192.168.30.196` |
+| **IP (VLAN 50)** | `192.168.50.196` (currently unreachable via .50) |
+| **Domain** | `ha.tophermayor.com` |
+| **Port** | 8123 (HTTP) |
+| **Physical Path** | UGC Ultra Port 2 → SG108PE trunk |
+
+## Network
+
+- **Primary IP**: `192.168.30.196` on IoT VLAN 30 — directly on the IoT subnet for device discovery
+- **Secondary IP**: `192.168.50.196` on Server VLAN 50 — for management access from server network
+- **Traefik Proxy**: Both [[ubuntu]] and [[grizzley]] Traefik instances route `ha.tophermayor.com` → `192.168.30.196:8123`
+- **DNS**: Cloudflare `*.tophermayor.com` → Traefik
+
+### Network Reconfiguration History
+
+A planned reconfiguration exists at `scripts/homelab/HOMEASSISTANT-NETWORK-RECONFIGURE.md` to swap the primary interface:
+- Target: `end0` on VLAN 50 (192.168.50.196) as primary, `end0.30` on VLAN 30 (192.168.30.196) as secondary
+- This would improve management access while keeping IoT discovery on VLAN 30
+
+## SSH Access
+
+- **Port 22**: Requires password auth (`bear` user, password-protected)
+- **Port 22222**: Connection refused (Advanced SSH add-on not listening here)
+- **SSH add-on**: "Advanced SSH & Web Terminal" is installed and configured with multiple authorized keys
+- **Note**: Grizzley's SSH key (`bear@grizzley`) needs to be added to the add-on's authorized_keys for agent access
+
+## Active Integrations
+
+### Controllers & Hubs
+- **Matter** — Built-in Matter controller via [[home-assistant-connect-zbt-2]]
+- **Thread** — Thread Border Router via [[home-assistant-connect-zbt-2]]
+- **ZHA** — Zigbee Home Automation via [[home-assistant-connect-zbt-2]]
+- **Apple TV** — Office Apple TV 4K gen 3
+- **Nest** — Google Nest Thermostat (Glendora)
+- **Alexa** — Amazon Echo devices via `alexa_devices` integration
+- **Shelly** — 2× Shelly 1PM Gen4 (local Wi-Fi)
+- **Govee** — 4× Govee lights (local LAN API)
+- **TP-Link** — 4× Kasa devices (cloud + LAN)
+- **webOS** — LG OLED65C5AUA TV
+- **VeSync** — Vital 200S air purifier
+- **ESPHome** — Home Assistant Voice PE
+- **Wyoming** — Whisper (STT), Piper (TTS), openWakeWord
+
+### External Hubs
+- **[[aqara-hub-m3]]** — Aqara Hub M3 (Matter-compatible, bridges Aqara devices)
+- **Aqara Camera Hub G3** — Camera + Aqara hub
+
+## Installed Add-ons
+
+- Advanced SSH & Web Terminal
+- File Editor
+- HACS (Home Assistant Community Store)
+- ESPHome
+- Whisper (STT)
+- Piper (TTS)
+- openWakeWord
+- go2rtc
+
+## Automations & Voice
+
+- **Voice Pipeline**: openWakeWord → Whisper (STT) → HA Assist → Piper (TTS)
+- **Voice Hardware**: Home Assistant Voice PE (ESPHome)
+- **iBeacon Tracker**: BLE presence detection
+
+## Storage
+
+- **TrueNAS mount**: Configured via Home Assistant Mount integration for backups/media
+
+## Relationships
+
+- Managed by [[ubuntu]] and [[grizzley]] Traefik via reverse proxy
+- Integrates with [[aqara-hub-m3]] for Aqara device bridging
+- Uses [[home-assistant-connect-zbt-2]] as Zigbee/Thread coordinator
+- Connects to [[ubuntu]] mounted storage via NFS
+- Part of the [[matter-multi-fabric]] architecture
+
+## Troubleshooting
+
+- **SSH access**: Must use password auth until grizzley key is added to SSH add-on config
+- **VLAN 50 IP unreachable**: The `.50.196` address doesn't respond to ping. Only `.30.196` works. Check if VLAN trunk is properly configured on the switch port.
+- **HA CLI**: `ha` commands require supervisor token — accessible only from within HAOS supervisor context, not from SSH add-on shell without proper auth
--- a/homelab/entities/proxmox.md
+++ b/homelab/entities/proxmox.md
@@ -0,0 +1,92 @@
+---
+title: proxmox
+created: 2026-04-28
+updated: 2026-05-14
+type: entity
+tags: [hosts, hypervisor, vm]
+sources: []
+---
+
+# proxmox
+
+**Role:** Proxmox VE hypervisor — VM and LXC container host
+**IP:** 192.168.50.11
+**Web UI:** https://proxmox.tophermayor.com (via [[traefik]])
+**Uptime:** 15 days, 14h (as of 2026-04-28)
+**CPU Load:** 6.83 (elevated — investigate if persistent)
+
+## Overview
+
+Proxmox VE is the hypervisor layer for the homelab. It runs VMs and LXC containers including TrueNAS, ubuntu-server, and 8 LXCs (media stack, traefik, test, hermes, decypharr). It is the physical foundation of the cluster — the Raspberry Pis (ice, grizzley) may run on Proxmox as VMs/LXCs or as bare metal.
+
+**Note:** `qm` and `pct` commands fail via SSH as the `bear` user because `/etc/pve` is a FUSE mount. Run them via `ssh bear@proxmox sudo qm list` or directly on the host console.
+
+## Hardware
+
+| Spec | Detail |
+|------|--------|
+| Model | Generic x86_64 server hardware |
+| CPU | Multi-core x86_64 |
+| RAM | 32–64 GB (see PVE web UI for exact) |
+| Storage | See ZFS pools below |
+| Network | Gigabit Ethernet |
+| IP | 192.168.50.11 |
+
+## VMs
+
+| VMID | Name | Status | RAM | Boot Disk | Notes |
+|------|------|--------|-----|-----------|-------|
+| 9001 | TrueNAS | **running** | 22.9 GB | 32 GB | NAS, ZFS storage, S3 via rustfs |
+| 9003 | ubuntu-server | **running** | 49 GB | 500 GB | Ubuntu server VM |
+| 9100 | W10-migrated | stopped | 16 GB | — | Windows 10 (inactive) |
+
+## LXCs
+
+| LXC ID | Name | Status | Notes |
+|--------|------|--------|-------|
+| 102 | traefik | offline | Traefik LXC (offline) |
+| 103 | gsd-test | running | General test LXC |
+| 104 | hermes-pve | running | Hermes agent on PVE |
+| 105 | media-arr | running | Sonarr, Radarr, Lidarr, etc. |
+| 106 | media-request | running | Jellyseerr, Overseerr |
+| 107 | media-music | running | Navidrome, music services |
+| 108 | media-reading | running | Kavita, Audiobookshelf |
+| 109 | media-db | running | PostgreSQL for media services |
+| 110 | [[decypharr]] | running | Black hole indexer (192.168.50.175:8282) |
+
+## Storage Pools
+
+| Pool | Type | Status | Total | Used | Available | % Used |
+|------|------|--------|-------|------|-----------|--------|
+| `CT1000` | zfspool | active | 942 GB | 31.5 GB | 911 GB | **3.34%** |
+| `SHGS31` | zfspool | active | 942 GB | 439 GB | 504 GB | **46.57%** (~460 GB used) |
+| `backups` | dir | active | 13.7 TB | 4.26 TB | 9.4 TB | **31.18%** (~4.2 TB used) |
+| `local` | dir | active | 847 GB | 5.3 GB | 842 GB | **0.62%** |
+| `local-zfs` | zfspool | active | 906 GB | 64 GB | 842 GB | **7.11%** |
+| `Evo860` | zfspool | inactive | — | — | — | 0% |
+
+Notable: `SHGS31` pool is ~47% full. `backups` pool has 4.2 TB used.
+
+## Wake-on-LAN
+
+Proxmox can wake hosts via WoL. [[https://github.com/TopherMayor/wakehost|wakehost]] integrates Proxmox VMs with Wake-on-LAN for homelab automation.
+
+## DNS / Network
+
+After UniFi network controller changes, Proxmox's `systemd-resolved` may lose DNS. See [[nfs-storage]] skill for the fix.
+
+## Access
+
+```bash
+ssh bear@192.168.50.11
+sudo qm list        # list VMs
+sudo pct list       # list LXCs
+sudo pvesm status   # storage pools
+```
+
+## Related
+
+- [[truenas]] — NAS storage (VM 9001 on Proxmox)
+- [[ubuntu]] — Docker host (VM 9003 on Proxmox)
+- [[ice]] — Control plane (may be VM or bare metal)
+- [[grizzley]] — Edge node (may be VM or bare metal)
--- a/homelab/entities/rustfs.md
+++ b/homelab/entities/rustfs.md
@@ -0,0 +1,41 @@
+---
+title: rustfs
+created: 2026-04-28
+updated: 2026-04-28
+type: entity
+tags: [services, storage, s3]
+sources: []
+confidence: medium
+---
+
+# rustfs
+
+**Role:** S3-compatible object storage
+**Host:** [[truenas]] (Docker with bind mount)
+**Data dir:** `/mnt/TrueNAS/rustfs/`
+
+## Overview
+
+rustfs provides S3-compatible object storage backed by [[truenas]] ZFS pool. It runs as a Docker container on the host that has access to the TrueNAS NFS share.
+
+## ⚠️ Critical Gotcha
+
+rustfs **ignores** `RUSTFS_S3_ACCESS_KEY` and `RUSTFS_S3_SECRET_KEY` environment variables on first boot — it uses hardcoded defaults:
+- Access key: `rustfsadmin`
+- Secret key: `rustfsadmin`
+
+This means whatever's passed via env vars is silently discarded on first start.
+
+## Reset Procedure
+
+If you need to reset rustfs (change credentials, recover from misconfiguration):
+1. Stop the rustfs container
+2. Wipe the data directory: `rm -rf /mnt/TrueNAS/rustfs/*`
+3. Restart the container
+4. rustfs re-initializes with the env vars now taking effect
+
+**Wiping the data dir is required** — just stopping the container is not enough.
+
+## Related
+
+- [[truenas]] — Storage backend
--- a/homelab/entities/traefik.md
+++ b/homelab/entities/traefik.md
@@ -0,0 +1,127 @@
+---
+title: traefik
+created: 2026-04-28
+updated: 2026-04-29
+type: entity
+tags: [services, networking, reverse-proxy, ha, docker]
+sources: []
+---
+
+# traefik
+
+**Role:** Reverse proxy / ingress controller — HA across grizzley + ubuntu
+**Instances:** 2 (ubuntu = PRIMARY, grizzley = BACKUP)
+**Ports:** 80 (HTTP), 443 (HTTPS), 2222 (SSH proxy), 8080 (metrics)
+**Dashboard:** traefik dashboard on each instance
+
+## Overview
+
+Traefik is the reverse proxy for the homelab. It runs in HA mode across [[grizzley]] and [[ubuntu]], handling TLS termination for all incoming traffic. Cloudflare routes DNS to Traefik. Two separate Docker Compose stacks manage each instance independently.
+
+## Instances
+
+| Instance | Host | Role | Ports | Cert Source |
+|----------|------|------|-------|-------------|
+| `traefik` (ubuntu) | ubuntu (192.168.50.61) | **PRIMARY** — handles majority of traffic | 80, 443 | Syncs from grizzley via NFS |
+| `traefik-pi` (grizzley) | grizzley (192.168.50.84) | **BACKUP** + ACME cert generation | 80, 443, 2222, 8080 | Cloudflare DNS challenge |
+
+### Ubuntu (Primary)
+
+Docker Compose: `homelab/ubuntu/traefik/`
+- Network: `proxy-net` (bridge)
+- Reads TLS certs from NFS mount at `/mnt/truenas/traefik-certs/`
+- Prometheus metrics: port 8080
+- Connects via `authentik_authentik-internal` for SSO middleware
+
+### Grizzley (Backup + ACME)
+
+Docker Compose: `homelab/grizzley/traefik-pi/`
+- Network: `traefik-proxy` (bridge)
+- Generates wildcard certs via Cloudflare DNS challenge
+- Writes certs to NFS mount `/mnt/truenas/traefik-certs/grizzley`
+- Prometheus metrics: port 8080
+
+## HA Configuration (Keepalived VRRP)
+
+| Parameter | Value |
+|-----------|-------|
+| Interface | `eth0.50` (VLAN 50) |
+| Virtual Router ID | 51 |
+| grizzley State | BACKUP (priority 90) |
+| ubuntu State | PRIMARY (higher priority) |
+| Virtual IP | 192.168.50.80/27 |
+| Auth | PASS (`HomelabH`) |
+| Check Script | `/etc/keepalived/check_traefik.sh` (2s interval, fall 2, rise 2) |
+
+When ubuntu Traefik fails health checks, keepalived promotes grizzley to MASTER and traffic to 192.168.50.80 fails over automatically.
+
+## Certificate Flow
+
+```
+Cloudflare DNS Challenge
+        ↓
+traefik-pi on grizzley (ACME DNS challenge)
+        ↓
+Writes certs to /mnt/TrueNAS/traefik-certs/grizzley (NFS)
+        ↓
+traefik on ubuntu reads same certs from NFS mount
+        ↓
+Both serve *.tophermayor.com wildcard cert
+```
+
+## Routes (Known)
+
+| Service | URL | Host |
+|---------|-----|------|
+| Authentik | authentik.tophermayor.com | ubuntu |
+| Gitea | gitea.tophermayor.com | ubuntu |
+| OpenCode (ice) | opencode-ice.tophermayor.com | ubuntu → ice:4096 |
+| Jellyfin | jellyfin.tophermayor.com | grizzley |
+| Proxmox | proxmox.tophermayor.com | ubuntu → proxmox |
+| Immich | immich.tophermayor.com | ubuntu |
+| Homepage | home.tophermayor.com | ubuntu |
+
+Dynamic config files in `homelab/ubuntu/traefik/config/dynamic/`:
+
+| File | Services |
+|------|---------|
+| `canonical-hosts.yml` | Grizzley ingress proxy, PVE OpenCode |
+| `gitea.yml` | gitea.tophermayor.com |
+| `immich.yml` | immich.tophermayor.com |
+| `jellyfin.yml` | jellyfin.tophermayor.com |
+| `media-stack.yml` | Sonarr, Radarr, SABnzbd, Prowlarr, qBittorrent |
+| `middlewares.yml` | 30+ middleware definitions |
+| `opencode.yml` | opencode.tophermayor.com |
+| `proxmox.yml` | proxmox.local.tophermayor.com |
+
+## Middlewares
+
+| Middleware | Purpose |
+|------------|---------|
+| `local-only@file` | Restrict to local network IPs |
+| `authentik-auth@file` | SSO authentication |
+| `security-headers@file` | Add security headers |
+| `crowdsec-bouncer@file` | Rate limiting and threat protection |
+
+## Prometheus Monitoring
+
+Both Traefik instances expose Prometheus metrics at `:8080/metrics`. The monitoring stack scrapes:
+- Request rates
+- Error rates
+- Backend health
+
+## Troubleshooting
+
+- ServiceDown alerts: see [[homelab-servicedown-triage]] skill
+- DNS issues: see [[homelab-systemd-resolved-dns]] skill
+- VRRP failover: check `systemctl status keepalived` on grizzley
+- Certificate issues: check NFS mount `/mnt/truenas/traefik-certs/` on both hosts
+- traefik-pi not starting: check `docker logs traefik-pi` on grizzley
+
+## Related
+
+- [[ubuntu]] — Primary Traefik node
+- [[grizzley]] — Backup Traefik node + ACME generation
+- [[truenas]] — NFS storage for cert sync
+- [[authentik]] — SSO behind Traefik
+- [[traefik-ha]] — Full HA concept page
--- a/homelab/entities/truenas.md
+++ b/homelab/entities/truenas.md
@@ -0,0 +1,91 @@
+---
+title: truenas
+created: 2026-04-28
+updated: 2026-04-29
+type: entity
+tags: [hosts, nas, storage, s3]
+sources: []
+confidence: medium
+---
+
+# truenas
+
+**Role:** NAS — ZFS storage, NFS shares, S3 via [[rustfs]]
+**IP:** 192.168.50.12
+**Hostname:** TrueNAS
+**Running on:** Proxmox VM 9001 (22.9 GB RAM, 32 GB boot disk, **running**)
+**Web UI:** TrueNAS web interface (via browser)
+
+## Overview
+
+TrueNAS provides network storage for the homelab. It serves NFS shares to proxmox and the cluster nodes, and runs [[rustfs]] for S3-compatible object storage. It runs as VM 9001 on [[proxmox]].
+
+## ⚠️ Pool Corruption
+
+**Status:** Pool has known corruption issues. Monitor pool health via TrueNAS web UI.
+
+Monitor for:
+- Pool import failures on boot
+- Checksum errors on disk
+- NFS share timeouts
+
+If the pool becomes unavailable, data on `SHGS31` (47% full, ~460 GB used) and `backups` (31% full, ~4.2 TB used) is at risk.
+
+See [[nfs-storage]] skill for ZFS troubleshooting.
+
+## SSH Access
+
+⚠️ SSH access as `bear` user is **blocked** (Permission denied, publickey). The `bear` user's SSH key is not authorized on TrueNAS.
+
+Options:
+- Use the TrueNAS web UI for management
+- Add `bear`'s SSH key to TrueNAS via the web UI
+- Use `admin` or `root` account if keys are configured
+
+## ZFS Pools
+
+| Pool | Purpose | % Used | Notes |
+|------|---------|--------|-------|
+| `SHGS31` | General storage | 47% (~460 GB) | Main data pool |
+| `backups` | Backup storage | 31% (~4.2 TB) | Large backup volume |
+| `CT1000` | (unknown) | 3% | Smaller pool |
+
+TrueNAS runs with these pools visible in the web UI under Storage.
+
+## Shares
+
+Known NFS exports:
+- `/mnt/TrueNAS/traefik-certs/grizzley` — mounted by [[grizzley]] at `/mnt/truenas/traefik-certs/grizzley` (nfs4, rw)
+
+Other shares to confirm via TrueNAS web UI:
+- `/mnt/TrueNAS/` — main pool mount point
+- May serve to: proxmox, ubuntu, ice
+
+## rustfs (S3)
+
+[[rustfs]] runs on TrueNAS via Docker (on TrueNAS itself or via bind mount) or on [[ubuntu]] as a Docker container connecting to TrueNAS storage.
+
+**Current config on ubuntu:** rustfs Docker container on ubuntu binds to TrueNAS storage path for S3 bucket `obsidian-vault`:
+- Endpoint: `http://192.168.50.12:9000`
+- Access Key: `rustfsadmin`
+- Secret Key: (stored in env or .env file)
+- Bucket: `obsidian-vault`
+
+On first boot, rustfs ignores env vars `RUSTFS_S3_ACCESS_KEY` and `RUSTFS_S3_SECRET_KEY` — uses hardcoded defaults (`rustfsadmin/rustfsadmin`). To reset: stop container, wipe data dir, restart.
+
+## Access
+
+```bash
+# ⚠️ bear user SSH fails — use web UI or fix SSH keys
+ssh admin@192.168.50.12  # may not work
+ssh root@192.168.50.12   # may not work
+# Best: use TrueNAS web UI
+```
+
+## Related
+
+- [[proxmox]] — Proxmox hypervisor (hosts TrueNAS as VM 9001)
+- [[rustfs]] — S3 storage layer
+- [[grizzley]] — NFS client (traefik certs)
+- [[ubuntu]] — NFS client, rustfs container
+- [[ice]] — May NFS mount TrueNAS
--- a/homelab/entities/ubuntu.md
+++ b/homelab/entities/ubuntu.md
@@ -0,0 +1,168 @@
+---
+title: ubuntu
+created: 2026-04-28
+updated: 2026-04-29
+type: entity
+tags: [hosts, docker, primary]
+sources: []
+---
+
+# ubuntu
+
+**Role:** Primary Docker host — runs ~70 containers for the homelab
+**IP:** 192.168.50.61
+**Hostname:** ubuntu
+**Uptime:** 5 days, 11h (as of 2026-04-28)
+**CPU Load:** 7.44 (elevated — investigate if persistent)
+
+## Overview
+
+ubuntu is the workhorse of the homelab — a beefy Intel NUC or server-class machine running Ubuntu with Docker. It hosts approximately 70 containers including authentik SSO, the full monitoring stack, media automation (Sonarr/Radarr/Prowlarr), AI services (whisper, qdrant, reccollection), and the primary Traefik reverse proxy.
+
+## Hardware
+
+| Spec | Detail |
+|------|--------|
+| Model | Intel NUC or server-class x86_64 |
+| CPU | Multi-core x86_64 |
+| RAM | 47 GB total, 31 GB available |
+| Storage | NVMe/SSD (check `df -h` for details) |
+| Network | Gigabit Ethernet |
+| IP | 192.168.50.61 |
+
+## Docker Containers (Live)
+
+### Git & CI/CD
+
+| Container | Port(s) | Status | Purpose |
+|-----------|---------|--------|---------|
+| `gitea` | 2222, 3000/tcp | healthy | Git hosting at gitea.tophermayor.com |
+| `gitea-runner` | 3010/tcp | healthy | Gitea Actions self-hosted runner |
+| `registry` | 5000/tcp | healthy | Private Docker registry |
+
+### Identity & SSO
+
+| Container | Port(s) | Status | Purpose |
+|-----------|---------|--------|---------|
+| `authentik-server` | — | healthy | SSO identity provider |
+| `authentik-worker` | — | healthy | Background worker |
+| `authentik-redis` | 6379/tcp | healthy | Redis for authentik |
+| `postgres-shared` | 5432/tcp (127.0.0.1 + 192.168.50.61) | healthy | Shared PostgreSQL |
+
+### Media Stack
+
+| Container | Port(s) | Status | Purpose |
+|-----------|---------|--------|---------|
+| `jellyfin` | 8096/tcp | healthy | Media server |
+| `sonarr` | — | healthy | TV management |
+| `sonarr-anime` | — | healthy | Anime TV management |
+| `radarr` | — | healthy | Movie management |
+| `radarr-anime` | — | healthy | Anime movie management |
+| `prowlarr` | — | healthy | Indexer aggregation |
+| `lidarr` | — | healthy | Music management |
+| `readarr` | — | healthy | E-book management |
+| `bazarr` | 6767/tcp | healthy | Subtitles |
+| `ombi` | 3579/tcp | healthy | Media request UI |
+| `lazylibrarian` | 5299/tcp | healthy | eBook downloader |
+| `flaresolverr` | 8191-8192/tcp | healthy | Proxy forflare solver |
+| `sabnzbd` | — | healthy | Usenet downloader |
+| `qbittorrent` | — | healthy | BitTorrent downloader |
+| `gluetun` | 8000,8388,8888/tcp; 8388/udp | healthy | VPN (WireGuard/OpenVPN) |
+| `stremio-server` | 11470, 12470/tcp | healthy | Streaming server |
+| `navidrome` | 4533/tcp | healthy | Music streaming |
+| `audiobookshelf` | 80/tcp | healthy | Audiobook streaming |
+| `kavita` | 5000/tcp | healthy | Comic/ebook reader |
+| `calibre` | 3000-3001/tcp | healthy | eBook management |
+| `calibre-web` | 8083/tcp | healthy | Calibre web UI |
+
+### AI & ML Services
+
+| Container | Port(s) | Status | Purpose |
+|-----------|---------|--------|---------|
+| `faster-whisper-server` | 8394/tcp | healthy | Whisper speech-to-text |
+| `qdrant-qdrant-1` | 6333-6334/tcp | healthy | Vector database |
+| `ai-subscriptions` | 8020/tcp | healthy | AI subscription management |
+| `ai-alert-aggregator-frontend-1` | 3002/tcp | healthy | Alert aggregator UI |
+| `ai-alert-aggregator-backend-1` | — | restarting | Alert aggregator backend |
+| `ai-job-pipeline-frontend-1` | 3000/tcp | healthy | Job pipeline UI |
+| `ai-job-pipeline-backend-1` | — | restarting | Job pipeline backend |
+| `ai-media-intelligence-backend-1` | — | restarting | Media AI backend |
+| `reccollection-backend-local` | 3001/tcp | healthy | Recommendation collection backend |
+| `reccollection-frontend-local` | 8081/tcp | healthy | Recommendation collection frontend |
+| `reccollection-postgres-local` | 5432/tcp | healthy | reccollection PostgreSQL |
+| `comparaison` | 3000/tcp | healthy | Comparison service |
+
+### Monitoring Stack
+
+| Container | Port(s) | Status | Purpose |
+|-----------|---------|--------|---------|
+| `prometheus` | 9090/tcp | healthy | Metrics database |
+| `grafana` | 3000/tcp | healthy | Dashboards |
+| `loki` | 3100/tcp | healthy | Log aggregation |
+| `alertmanager` | 9093/tcp | healthy | Alert routing |
+| `blackbox-exporter` | 9115/tcp | healthy | Blackbox probing |
+| `node-exporter` | 9100/tcp | healthy | Host metrics |
+| `cadvisor` | 8080/tcp | healthy | Container metrics |
+| `promtail` | — | healthy | Log scraping |
+
+### Infrastructure & Utility
+
+| Container | Port(s) | Status | Purpose |
+|-----------|---------|--------|---------|
+| `traefik` | 80,443/tcp | healthy | Primary reverse proxy (HA primary) |
+| `homepage-ubuntu` | 3003/tcp | healthy | Homepage dashboard |
+| `rustfs` | 9000-9001/tcp | healthy | S3-compatible storage (TrueNAS backend) |
+| `infisical-backend` | 8080,443/tcp | — | Secrets management |
+| `infisical-db` | 5432/tcp | healthy | Infisical PostgreSQL |
+| `infisical-redis` | 6379/tcp | — | Infisical Redis |
+| `docker-osx` | 5901,50922/tcp | healthy | macOS VM in Docker |
+| `immich_server` | 2283/tcp | healthy | Photo/video backup |
+| `immich_redis` | 6379/tcp | healthy | Immich Redis |
+| `immich_postgres` | 5432/tcp | healthy | Immich PostgreSQL |
+| `immich_machine_learning` | — | healthy | ML for photos |
+| `analyzarr` | 4310/tcp | healthy | Media analysis |
+| `recyclarr` | — | — | Automated arr config sync |
+| `musicseerr` | 8688/tcp | healthy | Music request server |
+| `seerr` | 5055/tcp | healthy | Media request server |
+| `open-computer-use` | 8080/tcp | healthy | Computer use agent (OpenComputerUse) |
+| `unified-media-manager-*` | 80,3000/tcp | healthy | Multi-variant media manager UI |
+
+**Note:** `ai-alert-aggregator-backend-1`, `ai-job-pipeline-backend-1`, `ai-media-intelligence-backend-1` are in a restart loop — investigate.
+
+## Docker Networks
+
+| Network | Driver | Connected services |
+|---------|--------|-------------------|
+| `proxy-net` | bridge | traefik (primary ingress) |
+| `app-net` | bridge | general app containers |
+| `uefi-proxynet` | bridge | — |
+| `authentik_authentik-internal` | bridge | authentik stack |
+| `monitoring_monitoring-internal` | bridge | prometheus, grafana, loki, etc. |
+| `immich_immich-internal` | bridge | immich stack |
+| `reccollection-internal` | bridge | reccollection stack |
+| `ai-subscriptions_default` | bridge | ai-subscriptions |
+| `calibre-web_default` | bridge | calibre-web |
+| `faster-whisper-service_default` | bridge | faster-whisper |
+| `homepage_default` | bridge | homepage |
+| `comparaison_default` | bridge | comparaison |
+| `infisical_infisical` | bridge | infisical stack |
+| `reccollection_default` | bridge | reccollection |
+
+## Traefik Role
+
+ubuntu runs the **primary** Traefik instance (HA mode). It handles the majority of ingress traffic. Certificate sync via NFS from grizzley's traefik-pi. See [[traefik-ha]] for full architecture.
+
+## Access
+
+```bash
+ssh bear@192.168.50.61
+```
+
+## Related
+
+- [[ice]] — Control plane
+- [[grizzley]] — Edge node, Traefik HA backup
+- [[authentik]] — SSO running on ubuntu
+- [[traefik]] — Traefik entity
+- [[proxmox]] — Hosts ubuntu as a VM (VMID 9003)
+- [[truenas]] — NFS/S3 storage backend
--- a/homelab/log.md
+++ b/homelab/log.md
@@ -0,0 +1,133 @@
+---
+title: Homelab Wiki Log
+created: 2026-04-28
+updated: 2026-05-14
+type: log
+tags: [meta]
+---
+
+# Wiki Log
+
+> Chronological record of all wiki actions. Append-only.
+> Format: `## [YYYY-MM-DD] action | subject`
+> Actions: ingest, update, query, lint, create, archive, delete
+> When this file exceeds 500 entries, rotate: rename to `log-YYYY.md`, start fresh.
+
+## [2026-04-28] create | Wiki initialized
+- Domain: Homelab infrastructure (ice, grizzley, ubuntu, proxmox, truenas)
+- Structure created with SCHEMA.md, index.md, log.md
+- Owner: ice (control plane)
+
+## [2026-04-28] migrate | Migrated from ~/wiki to obsidian-vault
+- Merged 11 entity pages from `~/wiki/entities/` into `homelab/entities/`
+- Pages: authentik, gitea, grizzley, hermes-gateway, ice, jellyfin, proxmox, rustfs, traefik, truenas, ubuntu
+- Created SCHEMA.md with Karpathy LLM Wiki conventions
+- Created entities index
+- WIKI_PATH now set to `/home/bear/homelabagentroot/obsidian-vault` on all hosts
+- ~/wiki retired — content unified into Obsidian vault
+
+## [2026-04-28] lint | Vault audit — 103 duplicate/noise files identified
+- agents/forge/ was full duplicate of homelab/raw/articles/forge/
+- 77 blog-tag index files were noise, no wiki value
+- 2 docs files (ai-applications, opencode-cluster) superseded by concept versions
+
+## [2026-04-28] restructure | Phase 1 — forge content deduplication
+- DELETED 101 files from agents/forge/: 23 blog duplicates + 78 blog-tag noise files
+- DELETED 2 superseded docs: homelab/docs/ai-applications.md, homelab/docs/opencode-cluster.md
+- ARCHIVED 38 forge product reference docs to homelab/raw/articles/forge/reference/
+- CREATED homelab/concepts/forge-ai.md — consolidated concept page (agents, commands, MCP, config)
+- Net: 103 files removed, 1 new concept page, 0 duplication
+- Vault: 353 → 249 .md files
+
+## [2026-04-28] restructure | Phase 2 — non-wiki content removed, 5 new concepts
+- Agent memory files → repo .hermes/agents/ (ubuntu-memory/, grizzley-memory/)
+- OpenCode product docs (35 files) → homelab/raw/articles/opencode/docs/
+- ai-assistant/ → 3 concept pages: hermes-opencode-cluster, host-context-detection, vm-storage-policy
+- automation/scripts.md → homelab/concepts/deployment-scripts.md
+- platform-config/overview.md → homelab/concepts/docker-traefik-stack.md
+- Archived 4 old project wrappers to homelab/raw/articles/{ai-assistant,automation,platform-config}/
+- Archived IoT Device Reorganization Plan to homelab/raw/articles/
+- DELETED 6 outdated root docs: vault-readme, repo-readme, opencode-home, opencode-obsidian-integration, AGENTS.md, infrastructure-config
+- Cleaned empty dirs: agents/, ai-assistant/, automation/, platform-config/
+- Updated concepts/index.md (now 14 pages) and root index.md
+- Vault: 249 → 240 .md files
+
+## [2026-04-29] restructure | Phase 3 — break S3 sync cycle, finalize wiki structure
+- CREATED homelab/queries/index.md (was missing)
+- DELETED stale root-level files: AGENTS.md, repo-readme.md, vault-readme.md, opencode-*.md, infrastructure-config.md, IoT Device Reorganization Plan.md
+- DELETED legacy dirs: ai-assistant/, automation/, platform-config/ (content archived to homelab/raw/articles/)
+- ADDED stale files to .gitignore to prevent re-sync from S3 (bidirectional sync was pulling them back)
+- Vault structure now fully aligns with three-layer LLM Wiki schema
+
+## [2026-04-29] lint | Full vault audit — fixed 46 broken wikilinks, updated taxonomy
+- Ran comprehensive lint across layer2 wiki (entities/, concepts/, comparisons/, queries/)
+- Fixed 46 broken wikilinks: .md extensions, relative paths to deleted dirs (ai-assistant/, automation/, platform-config/), homelab/ prefixed skill links
+- Fixed 13 files: authentik, gitea, gitops, jellyfin, media-stack, monitoring-pipeline, nfs-storage, opencode-cluster, proxmox, sso-authentik, traefik, traefik-ha, truenas
+- Updated SCHEMA.md taxonomy: added 10 new tags (vm, identity, docker, reverse-proxy, jellyfin, traefik, ubuntu, proxmox, s3, ci-cd, homelab, control-plane, edge, primary, agents, watchdog, ha, cli, scripts, tools, alerting, automation)
+- All wikilinks now clean (0 broken), 0 orphans, 0 frontmatter issues, 0 stale pages, 0 large pages
+
+## [2026-04-29] update | Host entity pages updated with live configuration data
+- SSH'd to all hosts to capture current state (docker ps, systemctl, df, free, pvesh)
+- Updated entities: ice.md, grizzley.md, ubuntu.md, proxmox.md, truenas.md, traefik.md, hermes-gateway.md
+- Updated concepts: monitoring-pipeline.md (corrected alerting chain to topic 1033 in AigentZeroHermes)
+- Key corrections:
+  - ice: RAM 7.6GB, full systemd service list, no NFS mounts, Docker containers (camofox, hermes-dashboard, opencode-web)
+  - grizzley: RAM 7.7GB + /mnt/fast_share 916GB, VRRP keepalived BACKUP priority 90, NFS mount from truenas, all Docker containers listed
+  - ubuntu: RAM 47GB, full ~70 container list with ports/status, all Docker networks, high CPU load noted (7.44)
+  - proxmox: VMID 9001 TrueNAS running, VMID 9003 ubuntu-server running, PCT 102 traefik, PCT 103 gsd-test; storage pools CT1000/SHGS31/backups/local-zfs
+  - truenas: bear SSH access blocked (Permission denied), pool corruption noted, SHGS31 47% full, backups 31% full
+  - traefik: dual-instance (ubuntu PRIMARY + grizzley BACKUP), keepalived VRRP VI_1 virtual IP 192.168.50.80
+  - hermes-gateway: watchdog via system cron on both ice+grizzley, Telegram topic 1033 in AigentZeroHermes
+
+
+## [2026-04-29] create | homepage entity documented — dual instances, Traefik routes, all widgets
+- Created homelab/entities/homepage.md (12.5KB)
+- Documented both instances: homepage-ubuntu (port 3003, proxy-net) and homepage-grizzley (port 3000, traefik-proxy)
+- All Traefik routes documented: homepage.local.tophermayor.com → ubuntu:3003, homepage-grizzley.local.tophermayor.com → grizzley:3000
+- All 60+ services across both instances catalogued with URLs, icons, and widget configs
+- Widgets documented: Jellyfin, Gluetun, Sonarr (x2), Radarr (x2), Lidarr, SABnzbd, Overseerr, Traefik (x2), Proxmox, TrueNAS, Prometheus, HomeAssistant, UptimeKuma, Komodo
+- Settings (dark theme, Unsplash bg, 4-col layout), bookmarks, docker socket config
+- upstream-ingress.yml gluetun tunnel routes (sonarr-internal, radarr-internal, etc.) documented
+- Updated entities/index.md (total: 11 → 12)
+
+## [2026-05-10] create | Smart home / IoT wiki pages — initial batch
+- CREATED homelab/entities/panda.md — HA host (RPi HAOS, dual-homed, IoT VLAN)
+- CREATED homelab/entities/home-assistant-connect-zbt-2.md — ZBT-2 coordinator (Zigbee + Thread)
+- CREATED homelab/entities/aqara-hub-m3.md — Aqara Matter hub/bridge
+- CREATED homelab/concepts/matter-multi-fabric.md — Multi-admin fabric architecture
+- CREATED homelab/concepts/iot-device-inventory.md — Device inventory by room
+- CREATED homelab/concepts/smart-home-handbook.md — Operational handbook
+- Updated SCHEMA.md with 14 new IoT/smart-home tags
+- Updated entities index (12 → 15) and concepts index (14 → 17)
+- Added SSH key auth to panda for Hermes agent access
+
+## [2026-05-10] ingest | Network device census — Layer 1 raw sources collected
+- INGESTED UniFi controller clients: 46 active devices across 4 VLANs
+  - Source: https://192.168.50.1/proxy/network/api/s/default/stat/sta
+  - Auth: cookie-based (TOKEN), credentials stored
+  - Written to raw/inventories/unifi-clients-2026-05-10.md
+- INGESTED HA device registry: 61 active + 12 deleted devices
+  - Source: http://192.168.30.196:8123 (core.device_registry, core.entity_registry, core.config_entries)
+  - 39 config entries across 26 integration domains
+  - Written to raw/inventories/ha-device-registry-2026-05-10.md
+- INGESTED ARP neighbor tables from grizzley + ubuntu
+  - Written to raw/inventories/arp-neighbors-2026-05-10.md
+- DNS/hosts: No local DHCP server — UniFi controller handles DHCP. Ubuntu has loopback overrides for auth+gitea domains.
+
+## [2026-05-10] create | Network device census — Layer 2 canonical classification
+- CREATED homelab/concepts/network-device-census.md — THE source of truth for all 46+ network devices
+- Classification system: iot-smart-home (28), iot-appliance (2), iot-camera (3), iot-infra (5), infrastructure (6), personal (7), unidentified (3)
+- Cross-referenced UniFi clients with HA device registry and config entries
+- Identified 5 open questions (duplicate HA hostname, unidentified Govee/Somfy devices, Eufy VLAN placement)
+- Updated iot-device-inventory.md with reconciled UniFi↔HA data, Zigbee mesh map, Matter fabric membership table
+- Updated matter-multi-fabric.md with hub-to-device mapping, Thread BR strategy, Matter Bridge plan
+- Updated SCHEMA.md: added `inventory` and `vlan` tags
+- Updated concepts index (17 → 19 pages)
+
+## [2026-05-14] update | Infrastructure recovery + decypharr LXC deployment
+- Traefik outage: 7 broken YAML files fixed (homepage-widgets, audiobookshelf, jellyseerr, kavita, navidrome, stremio, media-stack)
+- postgres-shared container restored on ubuntu for gitea
+- CT 110 decypharr deployed (192.168.50.175:8282, cy01/blackhole)
+- New entity: [[decypharr]]
+- Updated: [[proxmox]] (CT 110 + all LXCs), [[media-stack]] (LXC routing, migration section), [[traefik-ha]] (outage postmortem)
+- Media migration milestone: all *arr services route to LXC IPs, decypharr moved from ubuntu Docker/gluetun to dedicated LXC
--- a/homelab/project.md
+++ b/homelab/project.md
@@ -0,0 +1,73 @@
+---
+project:
+  name: Homelab Infrastructure
+  status: active
+  category: infrastructure
+  source: live-verification
+  created: 2026-01-06
+  updated: 2026-04-23
+  description: Core homelab configuration including DNS, Traefik, Authentik SSO, Proxmox, and container orchestration
+  tags: [infrastructure, homelab, documentation]
+---
+
+# Homelab Infrastructure
+
+## Overview
+
+Multi-host homelab cluster managed via GitOps. 8 hosts across 3 VLANs running ~70 containers and systemd services.
+
+## Architecture
+
+- [[architecture.md|Full Architecture]] — Comprehensive infrastructure documentation with diagrams
+- [[proxmox-setup.md|Proxmox]] — Hypervisor and VM management
+- [[truenas-config.md|TrueNAS]] — ZFS storage configuration
+
+## Hosts
+
+| Host | IP | Role | Services |
+|------|-----|------|----------|
+| [[entities/ice|ubuntu]] | 192.168.50.61 | Primary Docker | ~70 containers, Authentik, Traefik, Gitea, monitoring |
+| [[entities/grizzley|grizzley]] | 192.168.50.84 | Edge Ingress | 14 containers, Traefik HA, Jellyfin, hermes-dashboard |
+| [[entities/ice|ice]] | 192.168.50.197 | Control Plane | Hermes Agent primary, OpenCode backend |
+| [[entities/proxmox|proxmox]] | 192.168.50.11 | Hypervisor | ⚠️ OFFLINE |
+| [[entities/truenas|truenas]] | 192.168.50.12 | NAS | ⚠️ POOL CORRUPTION |
+
+**Full entity docs:** [[entities/index|homelab/entities/]] — detailed host and service pages with runbooks, gotchas, and cross-references.
+
+
+## Services by Category
+
+### Media
+Jellyfin, Radarr, Sonarr, Lidarr, Prowlarr, Jellyseerr, qBittorrent, SABnzbd, Bazarr, Navidrome, Calibre, Kavita, Audiobookshelf, Lazylibrarian, Musicseerr, RecCollection, Unified Media Manager, Tdarr, Stremio
+
+### Auth & SSO
+Authentik (server + worker + redis)
+
+### Monitoring
+Prometheus, Grafana, Loki, Promtail, Alertmanager, Node Exporter, cAdvisor, Blackbox Exporter
+
+### AI/Dev
+Ollama, Gitea, Faster Whisper Server, Docker OSX, Qdrant, Registry
+
+### AI Applications
+AI Job Pipeline, AI Alert Aggregator, AI Media Intelligence, AI Subscriptions, Homelab Inventory
+
+### Infrastructure
+Traefik (ubuntu + grizzley), Gluetun VPN, CrowdSec
+
+### Grizzley Services
+Komodo (stack management), Hermes (Telegram agent), aiomanager, Vaultwarden, Uptime Kuma, Homepage, Minecraft Bedrock
+
+## Related
+
+- [[../automation/|Automation Scripts]]
+- [[../platform-config/|Platform Config]]
+- [[../ai-assistant/|AI Assistant]]
+
+## Tasks
+```dataview
+TASK
+FROM "homelab/tasks"
+WHERE !completed
+SORT file.name ASC
+```
--- a/homelab/proxmox-setup.md
+++ b/homelab/proxmox-setup.md
@@ -0,0 +1,143 @@
+---
+project:
+  name: Proxmox VE Setup
+  status: active
+  category: infrastructure
+  source: infra-config
+  created: 2026-01-06
+  updated: 2026-04-19
+  description: Proxmox VE 9.1.4 hypervisor configuration — VMs, LXC containers, GPU passthrough, and storage
+  priority: high
+  tags: [infrastructure, proxmox, virtualization, vm, lxc]
+---
+
+# Proxmox Virtual Environment
+
+Single-node hypervisor hosting all homelab VMs and LXC containers. Verified live state 2026-04-19 via SSH.
+
+## Host Configuration
+
+| Property | Value |
+|----------|-------|
+| **IP** | 192.168.50.11 |
+| **Version** | Proxmox VE 9.1.4 |
+| **RAM** | 125 GB total, ~70 GB used |
+| **Web UI** | https://proxmox.local.tophermayor.com |
+| **Direct** | https://192.168.50.11:8006 |
+| **SSH** | `ssh bear@192.168.50.11` |
+| **Auth** | SSH key (`~/.ssh/id_ed25519`) |
+
+## Virtual Machines
+
+| VMID | Name | Status | RAM | IP | Purpose |
+|------|------|--------|-----|----|---------|
+| 9001 | TrueNAS | Running | 22 GB | 192.168.50.12 | TrueNAS SCALE 25.10.2.1 — ZFS storage, NFS/SMB shares |
+| 9003 | ubuntu-server | Running | 32 GB | 192.168.50.61 | Primary Docker host — 59 containers, NVIDIA GTX 1080 passthrough |
+| 9100 | W10-migrated | Stopped | 16 GB | — | Windows VM (offline) |
+
+### VM Architecture
+
+```mermaid
+graph TD
+    PVE["Proxmox VE 9.1.4<br/>192.168.50.11<br/>125 GB RAM"]
+
+    PVE --> TN["VM 9001: TrueNAS<br/>Running · 22 GB<br/>192.168.50.12"]
+    PVE --> UB["VM 9003: ubuntu-server<br/>Running · 32 GB<br/>192.168.50.61"]
+    PVE --> W10["VM 9100: W10-migrated<br/>Stopped · 16 GB"]
+    PVE --> LX["LXC 102: traefik<br/>Running"]
+
+    TN --> ZFS1["TrueNAS Pool<br/>25.4 TB · 65% used"]
+    TN --> ZFS2["RPiPool<br/>10.9 TB · 5% used"]
+    TN --> NFS["NFS Exports<br/>mediadata, traefik-certs"]
+
+    UB --> GPU["NVIDIA GTX 1080<br/>8 GB VRAM<br/>Driver 535 · CUDA 12.2"]
+    UB --> DOCKER["Docker Engine<br/>59 containers"]
+    DOCKER --> MEDIA["Media Stack"]
+    DOCKER --> IMMICH["Immich"]
+    DOCKER --> AUTH["Authentik SSO"]
+    DOCKER --> MON["Monitoring"]
+    DOCKER --> AI["Ollama / Qdrant"]
+
+    LX --> TRAEFIK["Traefik Reverse Proxy<br/>192.168.50.115"]
+
+    style PVE fill:#e63946,color:#fff
+    style TN fill:#457b9d,color:#fff
+    style UB fill:#2a9d8f,color:#fff
+    style W10 fill:#6c757d,color:#ccc
+    style LX fill:#e9c46a,color:#000
+```
+
+## LXC Containers
+
+| VMID | Name | Status | IP | Purpose |
+|------|------|--------|----|---------|
+| 102 | traefik | Running | 192.168.50.115 | Traefik reverse proxy (LXC) |
+
+## GPU Passthrough
+
+NVIDIA GTX 1080 (8 GB VRAM) passed through to **ubuntu-server** (VM 9003) via VFIO/IOMMU:
+
+| Use Case | Service | Driver Capabilities |
+|----------|---------|---------------------|
+| Video transcoding | Jellyfin | gpu, video, compute, utility |
+| AI inference | Ollama | gpu, compute, utility |
+| ML processing | Immich ML | gpu, video, compute |
+| Media transcoding | Tdarr | gpu, video, compute |
+
+- **Driver**: NVIDIA 535.274.02
+- **CUDA**: 12.2
+
+## Network Configuration
+
+| VLAN | Subnet | Purpose | Hosts |
+|------|--------|---------|-------|
+| Prod | 192.168.1.x | Main network | PVE management, Hyte workstation |
+| Lab | 192.168.50.x | Infrastructure | ubuntu, grizzley, ice, truenas, pve, panda SSH |
+| IoT | 192.168.30.x | Home automation | panda/HA |
+
+- VMs are bridged to the lab VLAN (`vmbr0`)
+- DNS managed via UniFi — `*.tophermayor.com` resolves internally
+- Traefik routes on both ubuntu (VM) and LXC 102
+
+## Storage
+
+| Storage | Type | Purpose |
+|---------|------|---------|
+| `local-zfs` | ZFS pool (Proxmox) | VM disks — thin provisioned |
+| TrueNAS NFS | NFS export (VM 9001) | Media, traefik certs, backups |
+
+TrueNAS provides centralized storage via NFS mounts to ubuntu:
+- `/mnt/truenas/mediadata` — media library (mounted on ubuntu)
+- `/mnt/truenas/traefik-certs/grizzley` — TLS certs (mounted on grizzley)
+
+## Management Commands
+
+```bash
+# List all VMs and containers
+qm list
+pct list
+
+# VM lifecycle
+qm start 9001
+qm shutdown 9003
+qm reboot 9100
+
+# LXC lifecycle
+pct start 102
+pct stop 102
+pct enter 102         # Shell into container
+
+# Snapshots
+qm snapshot 9003 pre-update
+qm listsnapshot 9003
+
+# Status check
+qm status 9001
+pct status 102
+```
+
+## Related Docs
+
+- [[truenas-config.md|TrueNAS Configuration]]
+- [[architecture.md|Homelab Architecture]]
+- [[project.md|Homelab Project]]
--- a/homelab/queries/index.md
+++ b/homelab/queries/index.md
@@ -0,0 +1,16 @@
+---
+title: Homelab Queries Index
+created: 2026-04-29
+updated: 2026-04-29
+type: index
+tags: [meta]
+---
+
+# Queries Index
+
+> Filed Q&A — answers to homelab questions worth keeping. Each entry is a synthesis from compiled wiki knowledge.
+> Last updated: 2026-04-29 | Total pages: 0
+
+## Infrastructure
+
+(no queries yet)
--- a/homelab/raw/articles/ai-assistant/project.md
+++ b/homelab/raw/articles/ai-assistant/project.md
@@ -0,0 +1,61 @@
+---
+project:
+  name: AI Assistant Configuration
+  status: active
+  category: configuration
+  source: live-verification
+  created: 2026-01-06
+  updated: 2026-04-23
+  description: OpenCode agent configuration, skills, and storage workflows
+  tags: [ai, assistant, configuration, opencode]
+---
+
+# AI Assistant Configuration
+
+## OpenCode Cluster
+
+| Instance | Host | Port | Status | Updated |
+|----------|------|------|--------|---------|
+| ubuntu | 192.168.50.61 | 4096 | Active (systemd) | 2026-04-23 |
+| ice | 192.168.50.197 | 4096 | Active (systemd) | 2026-04-23 |
+| grizzley | 192.168.50.84 | 4096 | Inactive/disabled | 2026-04-23 |
+
+## Host Context Detection
+
+Each host clone has a `.host-context` file that identifies the local context.
+
+```bash
+python3 scripts/detect_host_context.py
+```
+
+See [[host-context.md|Host Context Detection]] for details.
+
+## Skills
+
+Skills are located in `.agents/skills/` and `.opencode/`:
+
+- `proxmox-management` — VM/LXC operations
+- `traefik-diagnostic` — Router/service health
+- `truenas-storage` — ZFS pool/share management
+- `authentik-sso` — SSO/OIDC configuration
+- `media-stack` — Radarr, Sonarr, Jellyfin management
+- `komodo-management` — Docker stack deployment
+- `host-power-management` — Wake-on-LAN, VM control
+- `infra-audit` — Live infrastructure verification
+
+## Workflows
+
+- [[workflows.md|VM Storage Policy]] — Storage rules for application data on Ubuntu host
+
+## Related
+
+- [[../automation/|Automation Scripts]]
+- [[../platform-config/|Platform Config]]
+
+## Tasks
+```dataview
+TASK
+FROM "ai-assistant/tasks"
+WHERE !completed
+SORT file.name ASC
+```
--- a/homelab/raw/articles/automation/project.md
+++ b/homelab/raw/articles/automation/project.md
@@ -0,0 +1,34 @@
+---
+project:
+  name: Automation Scripts
+  status: active
+  category: automation
+  source: live-verification
+  created: 2026-01-06
+  updated: 2026-04-19
+  description: Maintenance, deployment, and operational automation scripts
+  tags: [automation, scripts, homelab]
+---
+
+# Automation Scripts
+
+## Overview
+
+Maintenance, deployment, and operational automation scripts for homelab management.
+
+## Components
+
+- [[scripts.md|Scripts Documentation]] — Complete scripts overview
+
+## Related Projects
+
+- [[../homelab/|Homelab Infrastructure]] — Target for automation
+- [[../platform-config/|Platform Config]] — Deployment target
+
+## Tasks
+```dataview
+TASK
+FROM "automation/tasks"
+WHERE !completed
+SORT file.name ASC
+```
--- a/homelab/raw/articles/forge/blog-ai-agent-best-practices.md
+++ b/homelab/raw/articles/forge/blog-ai-agent-best-practices.md
@@ -0,0 +1,254 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/ai-agent-best-practices/
+scraped: 2026-04-28T19:04:57.678110+00:00
+content_hash: c602bf97
+---
+# AI Agent Best Practices: 12 Lessons from AI Pair Programming for Developers
+
+![Cover Image for AI Agent Best Practices: 12 Lessons from AI Pair Programming for Developers](https://forgecode.dev/images/blog/ai-pair-programmer.png)
+
+After 6 months of daily AI pair programming across multiple codebases, here's what actually moves the needle. Skip the hype this is what works in practice.
+
+## TL;DR
+
+Planning & Process:
+
+- Write a plan first, let AI critique it before coding
+- Use edit-test loops: write failing test → AI fixes → repeat
+- Commit small, frequent changes for readable diffs
+
+Prompt Engineering:
+
+- Keep prompts short and specific context bloat kills accuracy
+- Ask for step-by-step reasoning before code
+- Use file references (@path/file.rs:42-88) not code dumps
+
+Context Management:
+
+- Re-index your project after major changes to avoid hallucinations
+- Use tools like gitingest.com for codebase summaries
+- Use Context7 MCP to stay synced with latest documentation
+- Treat AI output like junior dev PRs review everything
+
+What Doesn't Work:
+
+- Dumping entire codebases into prompts
+- Expecting AI to understand implicit requirements
+- Trusting AI with security-critical code without review
+
+---
+
+## 1. Start With a Written Plan (Seriously, Do This First)
+
+Ask your AI to draft a Markdown plan of the feature you're building. Then make it better:
+
+1. Ask clarifying questions about edge cases
+2. Have it critique its own plan for gaps
+3. Regenerate an improved version
+
+Save the final plan as instructions.md and reference it in every prompt. This single step eliminates 80% of "the AI got confused halfway through" moments.
+
+Real example:
+
+```
+Write a plan for adding rate limiting to our API. Include:- Which endpoints need protection- Storage mechanism for rate data- Error responses and status codes- Integration points with existing middlewareNow critique this plan. What did you miss?
+```
+
+---
+
+## 2. Master the Edit-Test Loop
+
+This is TDD but with an AI doing the implementation:
+
+1. Ask AI to write a failing test that captures exactly what you want
+2. Review the test yourself - make sure it tests the right behavior
+3. Then tell the AI: "Make this test pass"
+4. Let the AI iterate - it can run tests and fix failures automatically
+
+The key is reviewing the test before implementation. A bad test will lead to code that passes the wrong requirements.
+
+---
+
+## 3. Demand Step-by-Step Reasoning
+
+Add this to your prompts:
+
+```
+Explain your approach step-by-step before writing any code.
+```
+
+You'll catch wrong assumptions before they become wrong code. AI models that think out loud make fewer stupid mistakes.
+
+---
+
+## 4. Stop Dumping Context, Start Curating It
+
+Large projects break AI attention. Here's how to fix it:
+
+### Use gitingest.com for Codebase Summaries
+
+1. Go to gitingest.com
+2. Enter your repo URL (or replace "github.com" with "gitingest.com" in any GitHub URL)
+3. Download the generated text summary
+4. Reference this instead of copy-pasting files
+
+Instead of: Pasting 10 files into your prompt Do this: "See attached codebase_summary.txt for project structure"
+
+### For Documentation: Use Context7 MCP or Alternatives for Live Docs
+
+Context7 MCP keeps AI synced with the latest documentation by presenting the "Most Current Page" of your docs.
+
+When to use: When your docs change frequently, reference the MCP connection rather than pasting outdated snippets each time.
+
+---
+
+## 5. Version Control Is Your Safety Net
+
+- Commit granularly with git add -p so diffs stay readable
+- Never let uncommitted changes pile up: clean git state makes it easier to isolate AI-introduced bugs and rollback cleanly
+- Use meaningful commit messages: they help AI understand change context
+
+---
+
+## 6. Keep Prompts Laser-Focused
+
+Bad: "Here's my entire codebase. Why doesn't authentication work?"
+
+Good: "@src/auth.rs line 85 panics on None when JWT is malformed. Fix this and add proper error handling."
+
+Specific problems get specific solutions. Vague problems get hallucinations.
+
+Use your code’s terminology in prompts: reference the exact identifiers from your codebase, not generic business terms. For example, call createOrder() and processRefund() instead of 'place order' or 'issue refund', or use UserEntity rather than 'account'. This precision helps the AI apply the correct abstractions and avoids mismatches between your domain language and code.
+
+---
+
+## 7. Re-Index After Big Changes
+
+If you're using AI tools with project indexing, rebuild the index after major refactors. Out-of-date indexes are why AI "can't find" functions that definitely exist.
+
+Most tools auto-index, but force a refresh when things seem off.
+
+---
+
+## 8. Use File References, Not Copy-Paste
+
+Most AI editors support references like @src/database.rs. Use them instead of pasting code blocks.
+
+Benefits:
+
+- AI sees the current file state, not a stale snapshot
+- Smaller token usage = better accuracy
+- Less prompt clutter
+
+Note: Syntax varies by tool (ForgeCode uses @, some use #, etc.)
+
+---
+
+## 9. Let AI Write Tests, But You Write the Specs
+
+Tell the AI exactly what to test:
+
+```
+For the new `validate_email` function, write tests for:- Valid email formats (basic cases)- Invalid formats (no @, multiple @, empty string)- Edge cases (very long domains, unicode characters)- Return value format (should be Result<(), ValidationError>)
+```
+
+AI is good at generating test boilerplate once you specify the cases.
+
+---
+
+## 10. Debug with Diagnostic Reports
+
+When stuck, ask for a systematic breakdown:
+
+```
+Generate a diagnostic report:1. List all files modified in our last session2. Explain the role of each file in the current feature3. Identify why the current error is occurring4. Propose 3 different debugging approaches
+```
+
+This forces the AI to think systematically instead of guess-and-check.
+
+---
+
+## 11. Set Clear Style Guidelines
+
+Give your AI a brief system prompt:
+
+```
+Code style rules:- Use explicit error handling, no unwraps in production code- Include docstrings for public functions- Prefer composition over inheritance- Keep functions under 50 lines- Use `pretty_assertions` in test- Be explicit about lifetimes in Rust- Use `anyhow::Result` for error handling in services and repositories.- Create domain errors using `thiserror`.- Never implement `From` for converting domain errors, manually convert them
+```
+
+Consistent rules = consistent code quality.
+
+---
+
+## 12. Review Everything Like a Senior Engineer
+
+Treat every AI change like a junior developer's PR:
+
+Security Review:
+
+- Check for injection vulnerabilities
+- Verify input validation
+- Look for hardcoded secrets
+
+Performance Review:
+
+- Watch for N+1 queries
+- Check algorithm complexity
+- Look for unnecessary allocations
+
+Correctness Review:
+
+- Test edge cases manually
+- Verify error handling
+- Check for off-by-one errors
+
+The AI is smart but not wise. Your experience matters.
+
+---
+
+## What Doesn't Work (Learn From My Mistakes)
+
+### The "Magic Prompt" Fallacy
+
+There's no perfect prompt that makes AI never make mistakes. Better workflows beat better prompts.
+
+### Expecting Mind-Reading
+
+AI can't infer requirements you haven't stated. "Make it production-ready" means nothing without specifics.
+
+### Trusting AI with Architecture Decisions
+
+AI is great at implementing your design but terrible at high-level system design. You architect, AI implements.
+
+### Ignoring Domain-Specific Context
+
+AI doesn't know your business logic, deployment constraints, or team conventions unless you tell it.
+
+---
+
+## Controversial Take: AI Pair Programming Is Better Than Human Pair Programming
+
+For most implementation tasks.
+
+AI doesn't get tired, doesn't have ego, doesn't argue about code style, and doesn't judge your googling habits. It's like having a junior developer with infinite patience and perfect memory.
+
+But it also doesn't catch logic errors, doesn't understand business context, and doesn't push back on bad ideas. You still need humans for the hard stuff.
+
+---
+
+## Final Reality Check
+
+AI coding tools can significantly boost productivity, but only if you use them systematically. The engineers seeing massive gains aren't using magic prompts they're using disciplined workflows.
+
+Plan first, test everything, review like your production system depends on it (because it does), and remember: the AI is your intern, not your architect.
+
+The future of coding isn't human vs AI it's humans with AI vs humans without it. Choose your side wisely.
+
+## Related Articles
+
+- Claude 4 Opus vs Grok 4: AI Model Comparison for Complex Coding Tasks
+- Claude Sonnet 4 vs Gemini 2.5 Pro Preview: AI Coding Assistant Comparison
+- ForgeCode Performance RCA: Root Cause Analysis of Quality Degradation on July 12, 2025
+- MCP Security Prevention: Practical Strategies for AI Development - Part 2
--- a/homelab/raw/articles/forge/blog-archive.md
+++ b/homelab/raw/articles/forge/blog-archive.md
@@ -0,0 +1,37 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/archive/
+scraped: 2026-04-28T19:05:08.736510+00:00
+content_hash: d317e68a
+---
+# Archive
+
+### 2026
+
+- March 3 - Benchmarks Don't Matter — Until They Do (Part 1)
+- March 16 - Benchmarks Don't Matter — Until They Do (Part 2)
+- March 28 - How to Use Novita AI in ForgeCode: Quick Guide
+
+### 2025
+
+- May 23 - Claude 4 Initial Impressions: A Developer's Review of Anthropic's AI Coding Breakthrough
+- May 26 - Claude Sonnet 4 vs Gemini 2.5 Pro Preview: AI Coding Assistant Comparison
+- May 30 - DeepSeek-R1-0528: A Detailed Review of its AI Coding Performance & Latency
+- June 1 - AI Agent Best Practices: 12 Lessons from AI Pair Programming for Developers
+- June 3 - AI Code Agents: Indexed vs. Non-Indexed Performance for Real-Time Development
+- June 12 - When Google Sneezes, the Whole World Catches a Cold
+- June 17 - MCP Security Prevention: Practical Strategies for AI Development - Part 2
+- June 17 - MCP Security Crisis: Uncovering Vulnerabilities and Attack Vectors - Part 1
+- June 27 - Simple Over Easy: Architectural Constraints for Maintainable AI-Generated Code
+- July 1 - MCP 2025-06-18 Spec Update: AI Security, Structured Output, and User Elicitation for LLMs
+- July 7 - ForgeCode v0.98.0: Integrated Authentication and Developer Experience Improvements
+- July 10 - Claude 4 Opus vs Grok 4: Which Model Dominates Complex Coding Tasks?
+- July 17 - Grok 4 Initial Impressions: Is xAI's New LLM the Most Intelligent AI Model Yet?
+- July 18 - ForgeCode Performance RCA: Root Cause Analysis of Quality Degradation on July 12, 2025
+- July 23 - Kimi K2 vs Qwen-3 Coder: Testing Two AI Models on Coding Tasks
+- July 26 - Kimi K2 vs Grok 4: Which AI Model Codes Better?
+- July 27 - Graduating from Early Access: New Pricing Tiers Now Available
+- August 10 - Claude Sonnet 4 vs Kimi K2 vs Gemini 2.5 Pro: Which AI actually ships production code?
+- August 12 - Coding Agents Showdown: VSCode Forks vs. IDE Extensions vs. CLI Agents
+- August 13 - ForgeCode v0.106.0 Release: Plan Progress Tracking and Reliability Improvements
--- a/homelab/raw/articles/forge/blog-authors.md
+++ b/homelab/raw/articles/forge/blog-authors.md
@@ -0,0 +1,20 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/authors/
+scraped: 2026-04-28T19:04:48.642799+00:00
+content_hash: b36be1e6
+---
+# Authors
+
+ForgeCode ranks #1 on TermBench with 81.8% accuracy.Learn more →
+
+# Authors
+
+- ForgeCode Team8
+- Tushar9
+- Anmol1
+- Arindam Majumder1
+- Amit Singh2
+- Shrijal Acharya1
+- Amitesh Anand1
--- a/homelab/raw/articles/forge/blog-benchmarks-dont-matter.md
+++ b/homelab/raw/articles/forge/blog-benchmarks-dont-matter.md
@@ -0,0 +1,183 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/benchmarks-dont-matter/
+scraped: 2026-04-28T19:04:58.892485+00:00
+content_hash: c953a3ca
+---
+# Benchmarks Don't Matter — Until They Do (Part 1)
+
+![Cover Image for Benchmarks Don't Matter — Until They Do (Part 1)](https://forgecode.dev/images/blog/benchmarks-cover.svg)
+
+We started this project convinced we were in good shape.
+
+ForgeCode is an open-source coding agent. Engineers on X were posting about how good Claude Code felt. We felt the same about ForgeCode in daily usage — fast, capable, generally reliable. We assumed our production agent would translate directly into strong benchmark performance. We were using the same model everyone else was raving about.
+
+So we ran TermBench 2.0 with one engineer dedicated to the exercise. TermBench is a realistic evaluation suite: agents receive coding tasks in a sandboxed terminal environment and must complete them autonomously under strict time constraints. It tests what actually matters — can the agent navigate an unfamiliar codebase, decompose a problem, call tools correctly, and finish the task before context and budget collapse?
+
+We passed 25% of tests.
+
+This post is about how we diagnosed seven distinct failure modes, fixed them systematically, and reached 78.4% SOTA with gemini-3.1-pro-preview — and why those fixes generalized across models instead of overfitting to a single provider.
+
+## Failure Mode 1: Same model, very different performance
+
+Our agent was built for interactive use. It asks clarifying questions when requirements are ambiguous, confirms architectural decisions before proceeding, and checks in with the user when it is uncertain about scope. This is exactly the right behavior in a chat interface.
+
+In a benchmark environment, it is catastrophic.
+
+TermBench tasks are graded on completion. There is no user to answer clarification requests. Every turn spent asking a question is a turn not spent solving the problem. Our agent was failing tasks not because it lacked the intelligence to solve them, but because it was waiting for a human who was never coming.
+
+Fix: We introduced a strict Non-Interactive Mode — a separate runtime profile activated during evaluation:
+
+- System prompt rewritten to prohibit conversational branching and clarification requests
+- Tool behavior changed so the agent assumes reasonable defaults and proceeds
+- Completion logic tightened so the agent commits to an answer rather than hedging
+
+The model was identical. The runtime configuration changed everything.
+
+## Failure Mode 2: Tool descriptions do not guarantee tool correctness
+
+Our assumption: write clear tool descriptions, and models will call them reliably.
+
+Reality: tool misuse was one of the top two failure classes in our initial runs. The failures broke down into three distinct categories:
+
+- Wrong tool selected — agent uses shell to apply a code edit instead of the structured edit tool
+- Correct tool, wrong argument names — field names close but not matching the schema
+- Correct tool, correct arguments, wrong sequencing — tool called before its preconditions are met
+
+These failure classes mix together in aggregate pass rate, which makes them nearly invisible without targeted micro-evals. We had to build separate, single-purpose evaluations that isolate each class per tool, per model. Aggregate scoring alone will not catch this.
+
+## Failure Mode 3: Tool and argument naming is a reliability variable, not an aesthetic choice
+
+This one surprised us most.
+
+Models have strong priors from training about what tool calls look like. When your tool names conflict with those priors or your argument names fall outside the patterns the model has seen, error rates climb — not because the model can't understand the description, but because it pattern-matches against training data first.
+
+Concrete example: our file edit tool had generic internal argument names. We renamed them to old_string and new_string — names that appear frequently in training data for this kind of operation. Tool-call error rate on that tool dropped measurably in the same evaluation pass, same model, same prompt.
+
+This is not a small effect. If you are seeing persistent tool-call errors and attribute them entirely to model capability, check your naming first. We address this at the runtime layer — more on that in the ForgeCode Services section below.
+
+## Failure Mode 4: Context size is a multiplier on the right entry point, not a substitute for it
+
+The conventional wisdom is that more context means better performance. The nuanced reality is that context only helps once the agent is oriented correctly.
+
+In TermBench tasks, the agent has to explore an unfamiliar codebase. If it finds the right entry point early — the relevant file, function, or module where the actual problem lives — more context helps it reason more deeply from that point. If it never finds the right entry point, more context just means it explores more of the wrong area more thoroughly.
+
+The real bottleneck is entry-point discovery latency, not token count. We built a semantic analysis layer specifically for this — described in the ForgeCode Services section below.
+
+## Failure Mode 5: Time limits punish trajectories, not just wrong answers
+
+The common belief: if the model is smart enough, it will eventually solve the problem.
+
+TermBench is a constrained system. Each task has a strict wall-clock time budget — run out of time and the task is marked failed, same as a wrong answer. Each failed tool call, each exploratory dead end, and each redundant read burns real seconds. Agents that drift — spending time on exploration when they should be executing — exhaust their budget without completing the task.
+
+The problem is not that the model cannot solve the task. The problem is that a brilliant but meandering trajectory times out just as definitively as an incorrect one.
+
+## Failure Mode 6: Planning tools only work if you enforce them
+
+We had a todo_write tool available from the beginning. It lets the agent maintain an explicit task list — creating items, marking them in-progress, marking them complete. We documented it. We mentioned it in the system prompt. We assumed the agent would use it when appropriate.
+
+It did not use it consistently. The agent would begin multi-step tasks, complete some sub-tasks, lose track of others, and then either repeat work or skip steps entirely — all while the task list sat empty.
+
+The issue is not model capability. It is that optional tools get deprioritized under pressure. When an agent is inside a complex problem, it takes the path of least resistance: the next tool call that seems relevant, not the one that maintains long-term planning state.
+
+Fix: We made todo_write non-optional for decomposed tasks by building low-level evals that assert it:
+
+- todo_write must be called to create items when a multi-step task is identified
+- Each item must be updated as the agent progresses
+- Completion must be explicitly marked
+
+We treated failure to call todo_write as a reliability failure class in our eval suite, not just a stylistic miss. Tasks that decompose correctly but lack tracking state are graded as at-risk.
+
+After integrating this enforcement layer: 38% → 66% pass rate.
+
+## Failure Mode 7: TermBench is more about speed than intelligence
+
+This is the one that changed our architecture most significantly.
+
+A very intelligent agent with a slow reasoning trajectory still fails TermBench tasks because the benchmark imposes a strict wall-clock time limit per task — timeout is failure. An agent that slowly deep-reasons its way to the perfect solution loses to one that finds a good-enough solution fast enough to finish within budget.
+
+This forced two structural changes:
+
+Subagent parallelization for low-complexity work. We split tasks by difficulty. Easier, parallelizable subtasks — file reads, pattern searches, routine edits — are delegated to subagents running with low/minimal thinking budget. This keeps the main agent's latency low on work that does not need deep reasoning.
+
+Progressive thinking policy on the main agent. Rather than running full thinking budget throughout, we applied a tiered policy:
+
+1. First 10 assistant messages: very high thinking — this is where the agent forms its plan, identifies the problem structure, and selects its approach. Getting this right is worth the latency.
+2. Messages 11 onward: low thinking by default — execution phase. The plan is set; the agent should act, not re-deliberate.
+3. If a verification skill is called: switch back to high thinking — verification is a decision point where wrong answers cascade.
+
+The threshold of 10 messages was calibrated against task complexity distributions in TermBench. Most tasks show the critical decision-making concentrated in early messages; later messages are primarily mechanical execution.
+
+## Performance Trajectory
+
+| Phase | Change | Pass Rate |
+|---|---|---|
+| Baseline | Interactive-first runtime, no planning enforcement | ~25% |
+| Stabilization | Non-Interactive mode + tool-call naming + micro-evals | ~38% |
+| Planning control | todo_write enforcement via low-level evals | 66% |
+| Speed architecture | Subagent parallelization + progressive thinking + skill routing | 78.4% (SOTA) |
+
+Each phase was a targeted intervention against a specific failure class, not a general quality improvement. That specificity is what makes the result reproducible.
+
+An open-source agent. No proprietary model fine-tuning. The #1 position on TermBench 2.0 came from runtime engineering, not scale.
+
+To put that in context: Google reports gemini-3.1-pro-preview scoring 68.5% on TermBench — that is the number the model gets running as Google ships it. We ran the same model and scored 78.4%. The delta is not a better model. It is better harness. Same weights, 10 percentage points higher.
+
+## What ForgeCode Services does under the hood
+
+The failure modes above demanded capabilities that go beyond what the open-source agent handles alone. That work became ForgeCode Services — a proprietary runtime layer that sits on top of the open-source ForgeCode agent. It is currently available for free.
+
+1. Semantic entry-point discovery. Before the agent begins exploring, a lightweight semantic pass identifies the most likely starting files and functions based on task description. This converts random codebase exploration into directed traversal.
+
+2. Dynamic skill loading. Skills — specialized instruction sets for particular task types — are loaded only when the task profile requires them. A task involving test-writing loads the testing skill. A task involving debugging does not. This keeps context lean and relevant.
+
+3. Tool-call correction layer. A heuristic + static analysis layer runs before each tool call is dispatched. It checks argument validity, catches common error patterns, and applies corrections where possible. Errors that would fail silently are caught at the dispatch boundary.
+
+4. todo_write enforcement. Task decomposition triggers mandatory planning state updates. The agent is not trusted to remember to update its task list; the runtime asserts it.
+
+5. Reasoning budget control. The progressive thinking policy is applied automatically based on turn count and skill invocation signals. The agent does not manage its own reasoning budget explicitly.
+
+The result generalizes across models because none of these five components depend on model-specific behavior. They are constraints and scaffolding applied at the runtime layer, below the model.
+
+## Using benchmarks without fooling yourself
+
+The 78.4% is a result, not the goal. Run TermBench to answer operational questions about your agent system:
+
+- Is your context engine actually efficient under pressure, or does it bloat and stall?
+- Are your tools named and described in a way that aligns with model priors across providers?
+- Are tools being called when they should be, not just when the model feels like it?
+- Does your caching behave correctly under the access patterns a benchmark generates?
+
+TermBench will not answer all of your reliability questions. What it will do is surface failure modes that are invisible in interactive usage, where a patient user compensates for agent drift and tool errors.
+
+The real value is downstream: each TermBench failure class becomes a smaller, cheaper eval that you can run in CI/CD continuously. We now have evals in our pipeline that gate releases on:
+
+- Tool-call correctness rates per tool, per model
+- todo_write compliance for decomposed tasks
+- Entry-point discovery precision
+- Skill routing accuracy
+
+These run in minutes. They are not TermBench. But they exist because TermBench showed us exactly where to look.
+
+If your skill engine routes to the wrong skill, the model fails regardless of raw capability. Refining skill selection is one of the highest-leverage improvements available in an agent system that uses skill-based context loading.
+
+## What comes next
+
+We are expanding measurement across dimensions that aggregate pass rate obscures:
+
+- Per-tool reliability score by model — different models have different weak tools
+- Entry-point discovery latency distribution — not just whether the agent gets there, but how much time it costs
+- Recovery rate after the first tool-call error in a trajectory
+- Time-efficiency curves under tight budgets — does the agent spend its time wisely or drift?
+- Cross-model variance on the same task slices — where do models diverge, and why?
+
+The headline is 78.4% SOTA with gemini-3.1-pro-preview — the #1 result on TermBench 2.0, built by a team of three on an open-source agent. The actual output of this work is an agent runtime that holds up under structured pressure and a diagnostic system that tells us specifically what to fix when it does not.
+
+If you're building agents: don't run a benchmark to get a number. Run it to find out which part of your system is lying to you in production.
+
+The ForgeCode agent is open-source at github.com/antinomyhq/forge. ForgeCode Services — the runtime layer that powered the 78.4% result — is proprietary (for now) but currently available for free.
+
+---
+
+Continue reading: Benchmarks Don't Matter — Until They Do (Part 2) — how we reached 81.8% with both GPT 5.4 and Opus 4.6, and what we had to change in the agent to get there.
--- a/homelab/raw/articles/forge/blog-claude-4-initial-impressions-anthropic-ai-coding-breakthrough.md
+++ b/homelab/raw/articles/forge/blog-claude-4-initial-impressions-anthropic-ai-coding-breakthrough.md
@@ -0,0 +1,125 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/claude-4-initial-impressions-anthropic-ai-coding-breakthrough/
+scraped: 2026-04-28T19:05:01.965576+00:00
+content_hash: 3c96a980
+---
+# Claude 4 Initial Impressions: A Developer's Review of Anthropic's AI Coding Breakthrough
+
+Claude 4 achieved a groundbreaking 72.7% on SWE-bench Verified, surpassing OpenAI's latest models and setting a new standard for AI-assisted development. After 24 hours of intensive testing with challenging refactoring scenarios, I can confirm these benchmarks translate to remarkable real-world capabilities.
+
+Anthropic unveiled Claude 4 at their inaugural developer conference on May 22, 2025, introducing both Claude Opus 4 and Claude Sonnet 4. As someone actively building coding assistants and evaluating AI models for development workflows, I immediately dove into extensive testing to validate whether these models deliver on their ambitious promises.
+
+## What Sets Claude 4 Apart
+
+Claude 4 represents more than an incremental improvement—it's Anthropic's strategic push toward "autonomous workflows" for software engineering. Founded by former OpenAI researchers, Anthropic has been methodically building toward this moment, focusing specifically on the systematic thinking that defines professional development practices.
+
+The key differentiator lies in what Anthropic calls "reduced reward hacking"—the tendency for AI models to exploit shortcuts rather than solve problems properly. In my testing, Claude 4 consistently chose approaches aligned with software engineering best practices, even when easier workarounds were available.
+
+## Benchmark Performance Analysis
+
+The SWE-bench Verified results tell a compelling story about real-world coding capabilities:
+
+Figure 1: SWE-bench Verified performance comparison showing Claude 4's leading position in practical software engineering tasks
+
+- Claude Sonnet 4: 72.7%
+- Claude Opus 4: 72.5%
+- OpenAI Codex 1: 72.1%
+- OpenAI o3: 69.1%
+- Google Gemini 2.5 Pro Preview: 63.2%
+
+### Methodology Transparency
+
+Some developers have raised questions about Anthropic's "parallel test-time compute" methodology and data handling practices. While transparency remains important, my hands-on testing suggests these numbers reflect authentic capabilities rather than benchmark gaming.
+
+## Real-World Testing: Advanced Refactoring Scenarios
+
+I focused my initial evaluation on scenarios that typically expose AI coding limitations: intricate, multi-faceted problems requiring deep codebase understanding and architectural awareness.
+
+### The Ultimate Test: Resolving Interconnected Test Failures
+
+My most revealing challenge involved a test suite with 10+ unit tests where 3 consistently failed during refactoring work on a complex Rust-based project. These weren't simple bugs—they represented interconnected issues requiring understanding of:
+
+- Data validation logic architecture
+- Asynchronous processing workflows
+- Edge case handling in parsing systems
+- Cross-component interaction patterns
+
+After hitting limitations with Claude Sonnet 3.7, I switched to Claude Opus 4 for the same challenge. The results were transformative.
+
+### Performance Comparison Across Models
+
+The following table illustrates the dramatic difference in capability:
+
+| Model | Time Required | Cost | Success Rate | Solution Quality | Iterations |
+|---|---|---|---|---|---|
+| Claude Opus 4 | 9 minutes | $3.99 | ✅ Complete fix | Comprehensive, maintainable | 1 |
+| Claude Sonnet 4 | 6m 13s | $1.03 | ✅ Complete fix | Excellent + documentation | 1 |
+| Claude Sonnet 3.7 | 17m 16s | $3.35 | ❌ Failed | Modified tests instead of code | 4 |
+
+Figure 2: Comparative analysis showing Claude 4's superior efficiency and accuracy in resolving multi-faceted coding challenges
+
+### Key Observations
+
+Single-Iteration Resolution: Both Claude 4 variants resolved all three failing tests in one comprehensive pass, modifying 15+ of lines across multiple files with zero hallucinations.
+
+Architectural Understanding: Rather than patching symptoms, the models demonstrated genuine comprehension of system architecture and implemented solutions that strengthened overall design patterns.
+
+> Engineering Discipline: Most critically, both models adhered to my instruction not to modify tests—a principle Claude Sonnet 3.7 eventually abandoned under pressure.
+
+## Revolutionary Capabilities
+
+### System-Level Reasoning
+
+Claude 4 excels at maintaining awareness of broader architectural concerns while implementing localized fixes. This system-level thinking enables it to anticipate downstream effects and implement solutions that enhance long-term maintainability.
+
+### Precision Under Pressure
+
+The models consistently chose methodical, systematic approaches over quick fixes. This reliability becomes crucial in production environments where shortcuts can introduce technical debt or system instabilities.
+
+### Agentic Development Integration
+
+Claude 4 demonstrates particular strength in agentic coding environments like ForgeCode, maintaining context across multi-file operations while executing comprehensive modifications. This suggests optimization specifically for sophisticated development workflows.
+
+## Pricing and Availability
+
+### Cost Structure
+
+| Model | Input (per 1M tokens) | Output (per 1M tokens) |
+|---|---|---|
+| Opus 4 | $15 | $75 |
+| Sonnet 4 | $3 | $15 |
+
+### Platform Access
+
+Claude 4 is available through:
+
+- Amazon Bedrock
+- Google Cloud's Vertex AI
+- OpenRouter
+- Anthropic API
+
+## Initial Assessment: A Paradigm Shift
+
+After intensive testing, Claude 4 represents a qualitative leap in AI coding capabilities. The combination of benchmark excellence and real-world performance suggests we're witnessing the emergence of truly agentic coding assistance.
+
+### What Makes This Different
+
+- Reliability: Consistent adherence to engineering principles under pressure
+- Precision: Single-iteration resolution of multi-faceted problems
+- Integration: Seamless operation within sophisticated development environments
+- Scalability: Maintained performance across varying problem dimensions
+
+### Looking Forward
+
+The true test will be whether Claude 4 maintains these capabilities under extended use while proving reliable for mission-critical development work. Based on initial evidence, we may be witnessing the beginning of a new era in AI-assisted software engineering.
+
+Claude 4 delivers on its ambitious promises with measurable impact on development productivity and code quality. For teams serious about AI-assisted development, this release warrants immediate evaluation.
+
+## Related Articles
+
+- Claude 4 Opus vs. Grok 4 Comparison: A Deep Dive into AI Coding Capabilities
+- Grok 4 Initial Impression: AI Coding Assistant for Developers
+- AI Agent Best Practices: Maximizing Productivity with ForgeCode
+- Deepseek R1 0528 Coding Experience: Enhancing AI-Assisted Development
--- a/homelab/raw/articles/forge/blog-claude-4-opus-vs-grok-4-comparison-full.md
+++ b/homelab/raw/articles/forge/blog-claude-4-opus-vs-grok-4-comparison-full.md
@@ -0,0 +1,119 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/claude-4-opus-vs-grok-4-comparison-full/
+scraped: 2026-04-28T19:04:58.440214+00:00
+content_hash: d4e256ae
+---
+# Claude 4 Opus vs Grok 4: Which Model Dominates Complex Coding Tasks?
+
+I've been knee-deep in AI-assisted coding for months, and when Grok 4 dropped, I couldn't resist throwing it into the ring with Claude 4 Opus. Using the same 15 complex tasks involving race conditions, deadlocks, and multi-file refactors in a Rust codebase of about ~28k lines of code, I put them head-to-head.
+
+The bottom line? Grok 4 is a powerhouse for identifying complicated, hard-to-find bugs like deadlocks in a complex tokio based async Rust project. It's significantly cheaper per task but can occasionally ignore custom instructions. Claude 4 Opus, while more expensive, is more obedient and reliable, especially when you need it to follow specific rules.
+
+Grok comes with frustratingly low rate limits.
+
+## Testing Methodology and Technical Setup
+
+I threw both models at actual Rust projects I've been working on, focusing on the stuff that actually matters to me: finding bugs, cleaning up code, and using tools properly. Same prompts for both to keep things fair.
+
+### Test Environment Specifications
+
+Hardware Configuration:
+
+- MacBook Pro M2 Pro, 16GB RAM
+- Network: 500Mbps connection
+- Development Environment: VS Code, with ForgeCode running on integrated Terminal for AI interactions
+
+API Configuration:
+
+- Claude 4 Opus: Anthropic API
+- Grok 4: xAI API
+- Request timeout: 120 seconds
+- Max retries: 3
+
+Task Specifications:
+
+- 15 tasks involving concurrency issues, code refactors, and fixes
+- Mix of small (under 128k tokens) and larger contexts upto 200k tokens
+- Custom rules for Design patterns, Library usage and Like using Pretty assertions in tests etc.
+
+Claude 4 Opus
+
+- Context Window: 200,000 tokens
+- Input Cost: ~$15/1M tokens
+- Output Cost: ~$75/1M tokens
+- Tool Calling: Native support
+
+Grok 4
+
+- Context Window: 128,000 tokens (effective, with doubling cost beyond)
+- Input Cost: ~$3/1M tokens (doubles after 128k)
+- Output Cost: ~$15/1M tokens (doubles after 128k)
+- Tool Calling: Native support
+
+Figure 1: Speed and cost comparison across 15 tasks
+
+## Performance Analysis: Quantified Results
+
+### Execution Metrics
+
+| Metric | Claude 4 Opus | Grok 4 | Notes |
+|---|---|---|---|
+| Avg Response Time | 13-24s | 9-15s | Grok 2x faster per request |
+| Single-Prompt Success | 8/15 | 9/15 | Both reached 15/15 with follow-ups |
+| Avg Cost per Task | $13 USD | $4.5 USD | Grok cheaper for small contexts |
+| Tool Calling Accuracy | ~99% (1614/1630) | ~99% (1785/1803) | Near-perfect for both |
+| XML Tool Calling Accuracy | 83% | 78% | Opus slightly better |
+| Bug Detection | Missed race conditions/deadlocks | Detected all | Grok stronger in concurrency |
+| Rule Adherence | Excellent | Good (ignored in 2/15) | Opus followed custom rules better |
+
+Test Sample: 15 tasks, repeated 3 times for consistency Confidence Level: High, based on manual verification
+
+## Speed and Efficiency: Grok's Edge with a Catch
+
+Grok 4 was consistently faster, 9-15 seconds versus Opus's 13-24 seconds. This made quick iterations feel way snappier. But then I kept slamming into xAI's rate limits every few requests. It turned what should've been a quick test session into a stop-and-wait nightmare. I couldn't even get clean timing data because I was constantly throttled.
+
+## Cost Breakdown: Savings That Scale...
+
+Grok 4 cost me $4.50 per task on average while Opus hit $13. That's a big win for smaller jobs. But Grok's pricing doubles after 128k tokens. Opus pricing stays flat.
+
+Here's what Grok's pricing structure looks like in practice:
+
+Figure 3: Grok 4 standard pricing for contexts under 128k tokens
+
+When you enable "higher context pricing" (which kicks in automatically for larger contexts), the costs double:
+
+Figure 4: Grok 4 pricing for contexts over 128k tokens - notice the doubled rates
+
+## Accuracy and Capabilities: Where Grok Shines (and Slips)
+
+Grok 4 impressed me by spotting a deadlock in a tokio::RwLock-based setup that Opus completely missed. In one task, Grok identified a subtle thread drop that prevented the panic hook from executing in a Rust async block. Something Opus glossed over.
+
+Both nailed tool calling at 99% accuracy, picking the right tools with valid args nearly every time. Switching to an XML-based setup dropped that: Opus hit 83%, Grok 78%. Solid, but not flawless.
+
+Rule-following was where things got interesting. My custom rules (tuned over months using Anthropic's eval console) worked perfectly with Opus. Grok ignored them twice out of 15 tasks. Could be because I optimized these rules specifically for Claude models, but it still broke my flow when it happened.
+
+For single-prompt completions, Grok edged out with 9/15 versus Opus's 8/15. With follow-up instructions, both aced everything, showing they're both capable but Grok might "get it" faster out of the gate.
+
+## Frustrations and Real-World Implications
+
+The rate limiting on Grok was incredibly frustrating. I'd send a request, get a good response, then hit a wall for the next few minutes. It completely killed my testing momentum.
+
+In terms of model behavior, Opus felt more "obedient," sticking to rules without deviation. Grok was bolder, sometimes ignoring constraints for what it thought was a better approach. That creativity helped with bug hunting but could lead to scope creep in team settings.
+
+## Conclusion
+
+After all this, I'm leaning toward Grok 4 for complex tasks purely for the cost savings and speed, plus that eagle-eye for complex bugs. It completed more tasks on the first try and ran cheaper, even if the rate limits drove me nuts. Opus is reliable and follows rules consistently, making it the safer choice when you need predictable results and can't afford surprises.
+
+Ultimately, Grok 4's value won me over for my specific needs, but definitely test both yourself. Each has clear strengths depending on what you're building.
+
+## Try Grok 4 on ForgeCode
+
+We've enabled Grok 4 on ForgeCode! If you're curious to experience the speed and bug-hunting capabilities we discussed, sign up for ForgeCode and give it a shot. You can compare it directly with Claude 4 Opus and see which model works better for your specific coding tasks.
+
+## Related posts
+
+1. Deepseek R1-0528 Coding experience
+2. Claude Sonnet 4 vs Gemini 2.5 Pro
+3. Claude 4 initial Impression
--- a/homelab/raw/articles/forge/blog-claude-sonnet-4-vs-gemini-2-5-pro-preview-coding-comparison.md
+++ b/homelab/raw/articles/forge/blog-claude-sonnet-4-vs-gemini-2-5-pro-preview-coding-comparison.md
@@ -0,0 +1,238 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/claude-sonnet-4-vs-gemini-2-5-pro-preview-coding-comparison/
+scraped: 2026-04-28T19:04:54.606187+00:00
+content_hash: 2250ad78
+---
+# Claude Sonnet 4 vs Gemini 2.5 Pro Preview: AI Coding Assistant Comparison
+
+After conducting extensive head-to-head testing between Claude Sonnet 4 and Gemini 2.5 Pro Preview using identical coding challenges, I've uncovered significant performance disparities that every developer should understand. My findings reveal critical differences in execution speed, cost efficiency, and most importantly, the ability to follow instructions precisely.
+
+## Testing Methodology and Technical Setup
+
+I designed my comparison around real-world coding scenarios that test both models' capabilities in practical development contexts. The evaluation focused on a complex Rust project refactor task requiring understanding of existing code architecture, implementing changes across multiple files, and maintaining backward compatibility.
+
+### Test Environment Specifications
+
+Hardware Configuration:
+
+- MacBook Pro M2 Max, 16GB RAM
+- Network: 1Gbps fiber connection
+- Development Environment: VS Code with Rust Analyzer
+
+API Configuration:
+
+- Claude Sonnet 4: OpenRouter
+- Gemini 2.5 Pro Preview: OpenRouter
+- Request timeout: 60 seconds
+- Max retries: 3 with exponential backoff
+
+Project Specifications:
+
+- Rust 1.75.0 stable toolchain
+- 135000+ lines of code across 15+ modules
+- Complex async/await patterns with tokio runtime
+
+### Technical Specifications
+
+Claude Sonnet 4
+
+- Context Window: 200,000 tokens
+- Input Cost: $3/1M tokens
+- Output Cost: $15/1M tokens
+- Response Formatting: Structured JSON with tool calls
+- Function calling: Native support with schema validation
+
+Gemini 2.5 Pro Preview
+
+- Context Window: 2,000,000 tokens
+- Input Cost: $1.25/1M tokens
+- Output Cost: $10/1M tokens
+- Response Formatting: Native function calling
+
+Figure 1: Execution time and cost comparison between Claude Sonnet 4 and Gemini 2.5 Pro Preview
+
+## Performance Analysis: Quantified Results
+
+### Execution Metrics
+
+| Metric | Claude Sonnet 4 | Gemini 2.5 Pro Preview | Performance Ratio |
+|---|---|---|---|
+| Execution Time | 6m 5s | 17m 1s | 2.8x faster |
+| Total Cost | $5.849 | $2.299 | 2.5x more expensive |
+| Task Completion | 100% | 65% | 1.54x completion rate |
+| User Interventions | 1 | 3+ | 63% fewer interventions |
+| Files Modified | 2 (as requested) | 4 (scope creep) | 50% better scope adherence |
+
+Test Sample: 15 identical refactor tasks across different Rust codebases Confidence Level: 95% for all timing and completion metrics Inter-rater Reliability: Code review by senior developers
+
+Figure 2: Technical capabilities comparison across key development metrics
+
+## Instruction Adherence: A Critical Analysis
+
+The most significant differentiator emerged in instruction following behavior, which directly impacts development workflow reliability.
+
+### Scope Adherence Analysis
+
+Claude Sonnet 4 Behavior:
+
+- Strict adherence to specified file modifications
+- Preserved existing function signatures exactly
+- Implemented only requested functionality
+- Required minimal course correction
+
+Gemini 2.5 Pro Preview Pattern:
+
+```
+User: "Only modify x.rs and y.rs"Gemini: [Modifies x.rs, y.rs, tests/x_tests.rs, Cargo.toml]User: "Please stick to the specified files only"Gemini: [Reverts some changes but adds new modifications to z.rs]
+```
+
+This pattern repeated across multiple test iterations, suggesting fundamental differences in instruction processing architecture.
+
+## Cost-Effectiveness Analysis
+
+While Gemini 2.5 Pro Preview appears more cost-effective superficially, comprehensive analysis reveals different dynamics:
+
+### True Cost Calculation
+
+Claude Sonnet 4:
+
+- Direct API Cost: $5.849
+- Developer Time: 6 minutes
+- Completion Rate: 100%
+- Effective Cost per Completed Task: $5.849
+
+Gemini 2.5 Pro Preview:
+
+- Direct API Cost: $2.299
+- Developer Time: 17+ minutes
+- Completion Rate: 65%
+- Additional completion cost: ~$1.50 (estimated)
+- Effective Cost per Completed Task: $5.83
+
+When factoring in developer time at $100k/year ($48/hour):
+
+- Claude total cost: $10.70 ($5.85 + $4.85 time)
+- Gemini total cost: $16.48 ($3.80 + $12.68 time)
+
+## Model Behavior Analysis
+
+### Instruction Processing Mechanisms
+
+The observed differences stem from distinct architectural approaches to instruction following:
+
+Claude Sonnet 4's Constitutional AI Approach:
+
+- Explicit constraint checking before code generation
+- Multi-step reasoning with constraint validation
+- Conservative estimation of scope boundaries
+- Error recovery through constraint re-evaluation
+
+Gemini 2.5 Pro Preview's Multi-Objective Training:
+
+- Simultaneous optimization for multiple objectives
+- Creative problem-solving prioritized over constraint adherence
+- Broader interpretation of improvement opportunities
+- Less explicit constraint boundary recognition
+
+### Error Pattern Documentation
+
+Common Gemini 2.5 Pro Preview Deviations:
+
+1. Scope Creep: 78% of tests involved unspecified file modifications
+2. Feature Addition: 45% included unrequested functionality
+3. Breaking Changes: 23% introduced API incompatibilities
+4. Incomplete Termination: 34% claimed completion without finishing core requirements
+
+Claude Sonnet 4 Consistency:
+
+1. Scope Adherence: 96% compliance with specified constraints
+2. Feature Discipline: 12% minor additions (all beneficial and documented)
+3. API Stability: 0% breaking changes introduced
+4. Completion Accuracy: 94% accurate completion assessment
+
+### Scalability Considerations
+
+Enterprise Integration:
+
+- Claude: Better instruction adherence reduces review overhead
+- Gemini: Lower cost per request but higher total cost due to iterations
+
+Team Development:
+
+- Claude: Predictable behavior reduces coordination complexity
+- Gemini: Requires more experienced oversight for optimal results
+
+## Benchmark vs Reality Gap
+
+While Gemini 2.5 Pro Preview achieves impressive scores on standardized benchmarks (63.2% on SWE-bench Verified), real-world performance reveals the limitations of benchmark-driven evaluation:
+
+Benchmark Optimization vs. Practical Utility:
+
+- Benchmarks reward correct solutions regardless of constraint violations
+- Real development prioritizes maintainability and team coordination
+- Instruction adherence isn't measured in most coding benchmarks
+- Production environments require predictable, controllable behavior
+
+## Advanced Technical Insights
+
+### Memory Architecture Implications
+
+The 2M token context window advantage of Gemini 2.5 Pro Preview provides significant benefits for:
+
+- Large codebase analysis
+- Multi-file refactoring with extensive context
+- Documentation generation across entire projects
+
+However, this advantage is offset by:
+
+- Increased tendency toward scope creep with more context
+- Higher computational overhead leading to slower responses
+- Difficulty in maintaining constraint focus across large contexts
+
+### Model Alignment Differences
+
+Observed behavior patterns suggest different training objectives:
+
+Claude Sonnet 4: Optimized for helpful, harmless, and honest responses with strong emphasis on following explicit instructions
+
+Gemini 2.5 Pro Preview: Optimized for comprehensive problem-solving with creative enhancement, sometimes at the expense of constraint adherence
+
+## Conclusion
+
+After extensive technical evaluation, Claude Sonnet 4 demonstrates superior reliability for production development workflows requiring precise instruction adherence and predictable behavior. While Gemini 2.5 Pro Preview offers compelling cost advantages and creative capabilities, its tendency toward scope expansion makes it better suited for exploratory rather than production development contexts.
+
+### Recommendation Matrix
+
+Choose Claude Sonnet 4 when:
+
+- Working in production environments with strict requirements
+- Coordinating with teams where predictable behavior is critical
+- Time-to-completion is prioritized over per-request cost
+- Instruction adherence and constraint compliance are essential
+- Code review overhead needs to be minimized
+
+Choose Gemini 2.5 Pro Preview when:
+
+- Conducting exploratory development or research phases
+- Working with large codebases requiring extensive context analysis
+- Direct API costs are the primary budget constraint
+- Creative problem-solving approaches are valued over strict adherence
+- Experienced oversight is available to guide model behavior
+
+### Technical Decision Framework
+
+For enterprise development teams, the 2.8x execution speed advantage and superior instruction adherence of Claude Sonnet 4 typically justify the cost premium through reduced development cycle overhead. The 63% reduction in required user interventions translates to measurable productivity gains in collaborative environments.
+
+Gemini 2.5 Pro Preview's creative capabilities and extensive context window make it valuable for specific use cases, but its tendency toward scope expansion requires careful consideration in production workflows where predictability and constraint adherence are paramount.
+
+The choice ultimately depends on whether your development context prioritizes creative exploration or reliable execution within defined parameters.
+
+## Related Articles
+
+- Claude 4 Initial Impressions: A Developer's Review of Anthropic's AI Coding Breakthrough
+- Grok 4 Initial Impression: AI Coding Assistant for Developers
+- Claude 4 Opus vs Grok 4: AI Model Comparison for Complex Coding Tasks
+- Deepseek R1-0528 Coding Experience: Enhancing AI-Assisted Development
+- AI Agent Best Practices: Maximizing Productivity with ForgeCode
--- a/homelab/raw/articles/forge/blog-coding-agents-showdown.md
+++ b/homelab/raw/articles/forge/blog-coding-agents-showdown.md
@@ -0,0 +1,307 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/coding-agents-showdown/
+scraped: 2026-04-28T19:04:53.676795+00:00
+content_hash: 4664295a
+---
+# Coding Agents Showdown: VSCode Forks vs. IDE Extensions vs. CLI Agents
+
+The AI coding assistant market is splitting into three distinct ways for integrating AI into your development workflow. What started as a race to build "better autocomplete" has evolved into competing visions for how developers will work with AI.
+
+VSCode forks like Cursor are betting developers will switch editors for AI-first environments. IDE extensions focus on tight integration with existing workflows. CLI agents target power users who want AI automation in terminal environments.
+
+Each approach has real strengths and clear limitations. Let me break down what I've learned testing all three.
+
+## The Three AI Integration Approaches
+
+These aren't just different UIs; they reflect different constraints, capabilities, and security models.
+
+VSCode Forks modify the editor's core to integrate AI more deeply, but require developers to switch development environments.
+
+IDE Extensions work within existing plugin frameworks, providing familiar integration but operating under security boundaries.
+
+CLI Agents run as separate processes with user-level system access, enabling powerful automation but requiring different interaction patterns.
+
+These integration differences explain why the market hasn't converged on a single approach.
+
+---
+
+## VSCode Forks: Deep Integration, High Switching Costs
+
+### How They Work
+
+Cursor forked parts of VSCode to rebuild core editor functions around AI workflows. This enables editor-level integrations that are difficult to achieve inside a plugin:
+
+- Direct access to editor internals and file system watchers
+- Custom UI elements integrated into the editor chrome
+- Persistent conversation context across editing sessions
+- Atomic operations across multiple files
+
+Example workflow (simplified):
+
+```
+Request: "Add user authentication to this React app"Cursor's Process:1. Analyzes existing project structure and patterns2. Identifies routing, state management, and component architecture3. Generates multiple components simultaneously:   - AuthProvider context   - Login/logout components   - Protected route wrapper   - API integration logic4. Updates configuration files and dependencies5. Creates tests and documentation
+```
+
+Cursor can do this when it has deeper control over the editor stack.
+
+### The Migration Challenge
+
+A substantial barrier is not technical so much as the switching cost for teams. Migrating from VSCode to Cursor means:
+
+- Rebuilding custom keybindings and workspace configurations
+- Finding alternatives for favorite extensions (many aren't available)
+- Retraining muscle memory and workflows
+- Convincing team members to make the same switch
+
+Microsoft's extension marketplace restrictions create additional friction. Popular tools like GitLens, advanced debuggers, or specialized language servers often require workarounds.
+
+### Where Forks Excel
+
+Large-Scale Refactoring For migrations like React class components to hooks across 50+ files, Cursor's agent mode can handle a broad transformation while maintaining context about prop drilling and state dependencies.
+
+Greenfield AI-First Development Teams starting new projects can benefit from scaffolding entire applications with proper TypeScript types, test configurations, and deployment scripts.
+
+Mobile Development Limitations VSCode forks struggle in mobile development where specialized IDEs dominate. iOS developers rely on Xcode's integrated simulator and Interface Builder; Android developers rely on Android Studio's debugging tools and layout editors. Replicating those platform-specific features in a VSCode fork is impractical in many cases.
+
+---
+
+## IDE Extensions: Familiar Integration, Architectural Constraints
+
+### The Plugin Security Model
+
+IDE extensions operate within strict security boundaries by design. When GitHub Copilot suggests code, it cannot:
+
+- Execute that code automatically
+- Run tests or shell commands
+- Save files without explicit user action
+- Access system-level resources
+
+Extensions communicate through well-defined APIs that allow them to:
+
+- Read workspace files and project structure
+- Suggest text insertions and modifications
+- Display UI panels and contextual information
+- Make HTTP requests (with user permission)
+
+This keeps extensions safe and portable but places clear limits on automation and autonomy.
+
+### The Microsoft Network Effect
+
+Microsoft wasn't just building good AI; it was building it inside the world's most popular editor. Making Copilot feel native to VSCode created strong adoption dynamics.
+
+This keystroke-level integration feels immediate because the AI understands your current context - function signatures, variables in scope, imports, and coding patterns.
+
+### The Orchestration Problem
+
+Extensions encounter limits with complex, multi-step tasks. Adding user authentication typically requires:
+
+1. Writing login components (extension can help)
+2. Updating routing configuration (separate conversation)
+3. Modifying API middleware (separate file, manual context)
+4. Adding database migrations (different tool entirely)
+5. Updating deployment scripts (outside IDE scope)
+
+Each step requires manual coordination. Extensions may lack holistic visibility across multi-repo, cross-file tasks.
+
+### Where Extensions Dominate
+
+Daily Coding Productivity For individual functions, syntax fixes, and boilerplate generation, extensions are especially effective. GitHub reported productivity improvements in their studies;
+
+Learning and Discovery Extensions excel at suggesting correct usage patterns for unfamiliar APIs. The training data includes countless examples of correct implementations.
+
+Universal Editor Support Extensions work across VSCode, JetBrains IDEs, Vim, and other editors. Developers don't need to switch tools. However, most popular extensions remain VSCode-specific, which limits portability.
+
+---
+
+## CLI Agents: System-Level Power, Steeper Learning Curves
+
+### Full System Access Architecture
+
+CLI agents operate as separate processes with the same permissions as the user. Example internal execution (simplified):
+
+```
+$ aider --message "Add JWT auth to Express API"Internal execution:1. git status                       # Check working directory state2. find . -name "*.js" | head -20   # Map project structure3. grep -r "express\|app\|server" . # Understand current setup4. Read package.json, main files    # Build context5. Generate implementation plan     # Show user before proceeding6. Edit multiple files simultaneously7. npm install jsonwebtoken bcrypt           # Install dependencies8. npm test                                  # Verify changes work9. git add . && git commit -m "Add JWT auth" # Commit atomically
+```
+
+Some CLI agents are not sandboxed and can execute shell commands with the same permissions as the user; behavior varies by tool and configuration.
+
+### Cross-Repository Coordination
+
+CLI agents can work across multiple repositories simultaneously, which other approaches cannot easily replicate.
+
+Microservices Example:
+
+```
+$ forge -p "Add user preferences across frontend, backend, and shared-types repos"Execution across three repositories:1. shared-types/: Create TypeScript interfaces2. backend/: Implement API endpoints and database schema3. frontend/: Build UI components consuming the API4. Run tests in each repository5. Update documentation across all three6. Create coordinated pull requests(  In an informal run, this flow completed in about 15 minutes  actual times vary by repo size and CI setup.)
+```
+
+### Parallel Execution Capabilities
+
+Some CLI agents can spawn multiple instances for complex tasks:
+
+```
+$ claude "Optimize application performance"Parallel agent spawning:- Agent A: Frontend bundle analysis and code splitting- Agent B: Backend API profiling and database optimization- Agent C: CI/CD pipeline parallelization- Agent D: Dependency audit and cleanupAgents coordinate through git commits and shared context when configured to do so.
+```
+
+### Production Environment Integration
+
+CLI agents work in environments where GUI applications aren't practical:
+
+```
+# Production container debugging$ docker exec -it api-server /bin/bash$ forge -p "Memory usage growing, investigate and fix"# Remote server troubleshooting$ ssh production-server$ forge -p "Deployment failing at step 3, debug and resolve"# CI/CD automation$ # In GitHub Actions workflow$ forge -p "Check security vulnerabilities in pull request"
+```
+
+### The Learning Investment
+
+CLI agents require significant terminal comfort. Typical adoption curve:
+
+- Week 1-2: Frustration with command-line interfaces and missing GUI conveniences
+- Month 1: Starting to see power but still preferring extensions for quick edits
+- Month 2-3: Developing hybrid workflows - CLI for complex tasks, extensions for immediate feedback
+- Month 3+: Building custom automations and preferring CLI for most development tasks
+
+The learning curve is steep, but capabilities compound over time.
+
+### Security and Trust Considerations
+
+CLI agents' system access is both a strength and a risk:
+
+Potential Issues:
+
+- Accidental deletion of files or directories
+- Unintended execution of dangerous commands
+- Security vulnerabilities if an agent is compromised
+- Need for careful prompt engineering to avoid mistakes
+
+Mitigation Strategies:
+
+- Review changes before applying
+- Use git for atomic commits and easy rollbacks
+- Run agents in containerized or sandboxed environments for critical work
+- Implement approval workflows for destructive operations
+
+---
+
+## Market Forces and Adoption Patterns
+
+### Enterprise Integration Demands
+
+Large organizations want AI in their automation pipelines, not just in individual developer editors. CLI agents fit naturally into:
+
+- CI/CD systems (Jenkins, GitHub Actions, GitLab CI)
+- Code review automation
+- Incident response workflows
+- Infrastructure management
+
+Extensions cannot run in headless environments, which limits their enterprise automation potential.
+
+### Multi-Repository Development Reality
+
+Modern software increasingly spans multiple repositories:
+
+- Microservices architectures
+- Frontend/backend/mobile app coordination
+- Shared libraries and tooling
+- Infrastructure as code
+
+CLI agents can coordinate changes across these boundaries more naturally than editor-bound tools.
+
+### Cloud-Native Development Trends
+
+As development moves to cloud environments, containers, and remote codespaces, CLI tools become more practical than GUI applications. A CLI agent works identically whether you're on a laptop or in a Kubernetes pod.
+
+---
+
+## Technical Integration Comparison
+
+### Memory and Context Management
+
+IDE Extensions:
+
+- Context: Workspace files and project structure
+- Memory: Managed by IDE process, shared with editor
+- Limitations: Single project scope, limited cross-repository awareness
+
+VSCode Forks:
+
+- Context: Full project when loaded, deep editor integration
+- Memory: Shared with editor process, risk of bloat with large projects
+- Limitations: Still primarily single-project focused
+
+CLI Agents:
+
+- Context: Dynamically loaded based on task, can span multiple repositories
+- Memory: Separate process space, can be optimized per task
+- Limitations: Requires explicit context loading for each session
+
+### Execution Capabilities
+
+| Capability | IDE Extensions | VSCode Forks | CLI Agents |
+|---|---|---|---|
+| File modification | ✅ (with approval) | ✅ | ✅ |
+| Shell command execution | Limited | Limited | ✅ |
+| Multi-repository coordination | ❌ | ❌ | ✅ |
+| CI/CD integration | ❌ | ❌ | ✅ |
+| System-level operations | ❌ | ❌ | ✅ |
+| Real-time suggestions | ✅ | ✅ | ❌ |
+| GUI integration | ✅ | ✅ | ❌ |
+
+---
+
+## When to Choose Each Approach
+
+### Choose IDE Extensions When:
+
+- You're happy with your current editor setup
+- You primarily work within single repositories
+- You want real-time coding assistance and autocomplete
+- You prefer familiar, low-friction integration
+- You're working in teams with diverse tooling preferences
+
+### Choose VSCode Forks When:
+
+- You're starting new projects or can coordinate team migration
+- You want deeply integrated editor automation
+- You can invest time in rebuilding your development environment
+- You want earlier access to advanced AI features before they reach extensions
+
+### Choose CLI Agents When:
+
+- You're comfortable with terminal-based workflows
+- You frequently work across multiple repositories
+- You need AI in CI/CD pipelines or automation
+- You work in production/remote/containerized environments
+- You want more extensive system access and flexibility
+- You're willing to invest in learning new interaction patterns
+
+---
+
+## The Future: Likely Convergence
+
+The current fragmentation may be temporary. We are probably heading toward convergence where:
+
+Editors become lighter clients focused on UI, syntax highlighting, and immediate feedback AI agents become separate services that editors communicate with via standardized protocols Terminal integration becomes standard for complex, multi-step development tasks
+
+Evidence:
+
+- Cursor and Augment adding CLI modes alongside their editor and extension offerings
+- Microsoft exploring agent architectures for Copilot
+- New protocols enabling agent interoperability (MCP, A2A)
+
+---
+
+## What This Means for You
+
+This isn't about which tool is "best"; it's about picking what works for your specific workflow and constraints.
+
+IDE Extensions are proven for daily coding productivity with minimal disruption.
+
+VSCode Forks offer deeper editor-level automation but require significant switching costs.
+
+CLI Agents provide greater system integration and flexibility but demand investment in new interaction patterns.
+
+The market is splitting because different developers have different needs. A mobile developer, a DevOps engineer, and a frontend developer working in a large team all have different optimal choices.
+
+Where we're probably heading: Your favorite editor (VSCode, Vim, IntelliJ) plus a powerful CLI agent for complex tasks. The agent handles orchestration while the editor handles immediate interaction. Don't expect one approach to dominate - it's which combination of approaches will become the standard toolkit for AI-assisted development.
--- a/homelab/raw/articles/forge/blog-deepseek-r1-0528-coding-experience-review.md
+++ b/homelab/raw/articles/forge/blog-deepseek-r1-0528-coding-experience-review.md
@@ -0,0 +1,157 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/deepseek-r1-0528-coding-experience-review/
+scraped: 2026-04-28T19:05:10.687166+00:00
+content_hash: cd729071
+---
+# DeepSeek-R1-0528: A Detailed Review of its AI Coding Performance & Latency
+
+![Cover Image for DeepSeek-R1-0528: A Detailed Review of its AI Coding Performance & Latency](https://forgecode.dev/images/blog/deepseek-r1-0528-cover.svg)
+
+## TL;DR
+
+- DeepSeek-R1-0528: Latest open source reasoning model with MIT license
+- Major breakthrough: Significantly improved performance over previous version (87.5% vs 70% on AIME 2025)
+- Architecture: 671B total parameters, ~37B active per token via Mixture-of-Experts
+- Major limitation: 15-30s latency via OpenRouter API vs ~1s for other models
+- Best for: Complex reasoning, architectural planning, vendor independence
+- Poor for: Real-time coding, rapid iteration, interactive development
+- Bottom line: Impressive reasoning capabilities, but latency challenges practical use
+
+## The Promise vs. My 8-Hour Reality Check
+
+> From @deepseek_ai: DeepSeek-R1-0528 is now available! This latest reasoning model shows substantial improvements across benchmarks while maintaining MIT licensing for complete open-source access.
+> Source: https://x.com/deepseek_ai/status/1928061589107900779
+
+My response: Hold my coffee while I test this "breakthrough"...
+
+SPOILER: It's brilliant... if you can wait 30 seconds for every response. And it keeps increasing as your context grows
+
+I was 47 minutes into debugging a Rust async runtime when DeepSeek-R1-0528 (via my favorite coding agent) finally responded with the perfect solution. By then, I'd already fixed the bug myself, grabbed coffee, and started questioning my life choices.
+
+Here's what 8 hours of testing taught me about the latest "open source breakthrough."
+
+## Reality Check: Hype vs. My Actual Experience
+
+DeepSeek's announcement promises groundbreaking performance with practical accessibility. After intensive testing, here's how those claims stack up:
+
+| DeepSeek's Claim | My Reality | Verdict |
+|---|---|---|
+| "Matches GPT/Claude performance" | Often exceeds it on reasoning | TRUE |
+| "MIT licensed open source" | Completely open, no restrictions | TRUE |
+| "Substantial improvements" | Major benchmark gains confirmed | TRUE |
+
+The breakthrough is real. The daily usability is... challenging.
+
+Before diving into why those response times matter so much, let's understand what makes this model technically impressive enough that I kept coming back despite the frustration.
+
+## The Tech Behind the Magic (And Why It's So Slow)
+
+### Key Architecture Stats
+
+- 671B total parameters (685B with extras)
+- ~37B active per token via Mixture-of-Experts routing
+- 128K context window
+- MIT license (completely open source)
+- Cost: $0.50 input / $2.18 output per 1M tokens
+
+### Why the Innovation Matters
+
+R1-0528 achieves GPT-4 level reasoning at ~5.5% parameter activation cost through:
+
+1. Reinforcement Learning Training: Pure RL without supervised fine-tuning initially
+2. Chain-of-Thought Architecture: Multi-step reasoning for every response
+3. Expert Routing: Different specialists activate for different coding patterns
+
+### Why It's Painfully Slow
+
+Every response requires:
+
+- Thinking tokens: Internal reasoning in <think>...</think> blocks (hundreds-thousands of tokens)
+- Expert selection: Dynamic routing across 671B parameters
+- Multi-step verification: Problem analysis → solution → verification
+
+When R1-0528 generates a 2000-token reasoning trace for a 100-token answer, you pay computational cost for all 2100 tokens.
+
+## The Benchmarks Don't Lie (But They Don't Code Either)
+
+The performance improvements are legitimate:
+
+### Key Wins
+
+| Benchmark | Previous | R1-0528 | Improvement |
+|---|---|---|---|
+| AIME 2025 | 70.0% | 87.5% | +17.5% |
+| Coding (LiveCodeBench) | 63.5% | 73.3% | +9.8% |
+| Codeforces Rating | 1530 | 1930 | +400 points |
+| SWE Verified (Resolved) | 49.2% | 57.6% | Notable progress |
+| Aider-Polyglot | 53.3% | 71.6% | Major improvement |
+
+But here's the thing: Benchmarks run with infinite patience. Real development doesn't.
+
+### The Latency Reality
+
+| Model Type | Response Time | Developer Experience |
+|---|---|---|
+| Claude/GPT-4 | 0.8-1.0s | Smooth iteration |
+| DeepSeek-R1-0528 | 15-30s | Productivity killer |
+
+## When R1-0528 Actually Shines
+
+Despite my latency complaints, there are genuine scenarios where waiting pays off:
+
+### Perfect Use Cases
+
+- Large codebase analysis (20,000+ lines) - leverages 128K context beautifully
+- Architectural planning - deep reasoning justifies wait time
+- Precise instruction following - delivers exactly what you ask for
+- Vendor independence - MIT license enables self-hosting
+
+### Frustrating Use Cases
+
+- Real-time debugging - by the time it responds, you've fixed it
+- Rapid prototyping - kills the iterative flow
+- Learning/exploration - waiting breaks the learning momentum
+
+### Reasoning Transparency
+
+The "thinking" process is genuinely impressive:
+
+1. Problem analysis and approach planning
+2. Edge case consideration
+3. Solution verification
+4. Output polishing
+
+Different experts activate for different patterns (API design vs systems programming vs unsafe code).
+
+## My Honest Take: Historic Achievement, Practical Challenges
+
+### The Historic Achievement
+
+- First truly competitive open reasoning model
+- MIT license = complete vendor independence
+- Proves open source can match closed systems
+
+### The Daily Reality
+
+Remember that 47-minute debugging session? It perfectly captures the R1-0528 experience: technically brilliant, practically challenging.
+
+The question isn't whether R1-0528 is impressive - it absolutely is.
+
+The question is whether you can build your workflow around waiting for genius to arrive.
+
+## Community Discussion
+
+Drop your experiences below:
+
+- Have you tested R1-0528 for coding? What's your patience threshold?
+- Found ways to work around the latency?
+
+## The Bottom Line
+
+DeepSeek's announcement wasn't wrong about capabilities - the benchmark improvements are real, reasoning quality is impressive, and the MIT license is genuinely game-changing.
+
+For architectural planning where you can afford to wait? Absolutely worth it.
+
+For rapid iteration? Not quite there yet.
--- a/homelab/raw/articles/forge/blog-forge-incident-12-july-2025-rca-2.md
+++ b/homelab/raw/articles/forge/blog-forge-incident-12-july-2025-rca-2.md
@@ -0,0 +1,57 @@
+---
+type: agent-doc
+agent: ForgeCode
+source: https://forgecode.dev/blog/forge-incident-12-july-2025-rca-2/
+scraped: 2026-04-28T19:04:46.110139+00:00
+content_hash: 171aad9b
+---
+# ForgeCode Performance RCA: Root Cause Analysis of Quality Degradation on July 12, 2025
+
+## What Happened
+
+On July 12, 2025, we released v0.99.0, which included PR #1068 introducing aggressive conversation compaction to reduce LLM costs. While successful at cutting costs by 40-50%, it significantly degraded response quality by removing crucial conversation context.
+
+Users reported quality issues within 2 days. After internal testing confirmed the problem, we immediately released v0.100.0 on July 14 with the compaction feature reverted.
+
+## Root Cause
+
+Our evaluation system only tested single prompts, missing multi-turn conversation quality.
+
+The compaction feature triggered after every user message (on_turn_end: true), stripping context that our models needed for quality responses. In multi-turn scenarios (where users provide additional feedback after the agent completes work), the conversation context was getting compacted away, leading to poor quality responses.
+
+Our evals never caught this because they focused on single prompts and judged the results of the agent loop, not ongoing conversations where users give feedback in the same conversation and context accumulation is critical.
+
+## Why We Did This
+
+Higher than expected early access signups created cost pressure. Rather than implementing waitlists, we chose aggressive optimization to keep the service open to all users. The feature worked perfectly for its intended purpose, just at the cost of quality we didn't anticipate.
+
+## What We've Done
+
+- Immediate: Reverted the feature in v0.100.0 (2 days after user reports)
+- Long-term: Building multi-turn evaluation system to catch these issues before deployment
+
+## What We're Changing
+
+1. Multi-turn evals - Testing conversation quality across 3-5 message exchanges, not just single responses
+2. Quality gates - Conversation quality scores must pass thresholds before any context affecting feature ships
+3. Gradual rollouts - Canary releases for any feature touching core conversation logic
+
+## Known Issues
+
+- Bash terminal still has issues on windows, but we are working on it.
+
+## Our Ask
+
+We messed up by prioritizing cost optimization over quality validation. The latest ForgeCode version (v0.100.5) has the issue fixed plus significant stability improvements.
+
+Please give ForgeCode another shot. We've learned our lesson about shipping features that affect conversation quality without proper testing coverage.
+
+---
+
+Questions? Reach out through our community channels. We're committed to transparency about what went wrong and how we're fixing it.
+
+## Related Articles
+
+- ForgeCode v0.98.0 Release Article: Major Performance and Feature Updates
+- AI Agent Best Practices: Maximizing Productivity with ForgeCode
+- MCP Security Prevention: Practical Strategies for AI Development - Part 2
--- a/Show More
+++ b/Show More