codex-local

What 9B can actually do.

The reason I’m bullish on local models: codex-local — my fork of the Codex CLI, tuned to babysit a local model through a structured task list — drives multi-turn, multi-file, multi-task coding sessions to completion using just Qwen3.5-9B-IQ4_XS.

The trick isn’t the model; it’s the scaffolding.

Most local-model coding agents stall past trivial single-edit tasks. With the right scaffolding in place, 9B handles real work. The frontier model is no longer the moat; the scaffolding is.

The fork itself reworks compaction behavior, model-routing interception, and exec-log inspection so a long unattended run can survive overnight without losing the thread.
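To make the compaction idea concrete, here is a minimal sketch of the kind of transcript compaction a long unattended run needs. All names are hypothetical; this is not codex-local’s actual API, just the shape of the technique.

```python
def compact(transcript, budget, approx_tokens=lambda m: len(m["content"]) // 4):
    """Keep the system prompt plus the most recent turns within `budget`
    approximate tokens; collapse everything older into one summary stub."""
    system, rest = transcript[0], transcript[1:]
    kept, used = [], approx_tokens(system)
    for msg in reversed(rest):            # walk newest-first
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    dropped = len(rest) - len(kept)
    if dropped:
        stub = {"role": "user",
                "content": f"[context compacted: {dropped} earlier turns summarized]"}
        return [system, stub] + kept
    return [system] + kept
```

The point is the discipline, not the heuristic: a small model only survives overnight if something keeps its window from silently overflowing.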

The rig

Dual-GPU local-first AI infra.

OpenClaw bridges

3D scenes by voice.

Built so my third son could direct his Roblox / Blender / Daz scenes through natural language (“make Torus.1 50% thinner on the Z-axis”) and watch the live environment update. An AI helper bridge runs through OpenClaw, with the language layer driving live scene state. It sits at the intersection of three threads: my AI work, my long-running 3DCG interest, and a kid who’s already designing in Blender, Roblox Studio, and Suno.
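A sketch of what the language layer hands to the scene: one spoken command becomes one structured edit. The real bridge would use a model for parsing; the regex and the op format here are hypothetical, shown only to illustrate the shape of the operation.

```python
import re

# Matches commands like "make Torus.1 50% thinner on the Z-axis".
CMD = re.compile(
    r"make (?P<obj>\S+) (?P<pct>\d+)% (?P<dir>thinner|thicker)"
    r" on the (?P<axis>[XYZ])-axis",
    re.IGNORECASE,
)

def parse_scale_command(text):
    """Turn a natural-language resize command into a scene-edit op, or None."""
    m = CMD.match(text)
    if not m:
        return None
    pct = int(m.group("pct")) / 100
    factor = 1 - pct if m.group("dir").lower() == "thinner" else 1 + pct
    return {"op": "scale", "object": m.group("obj"),
            "axis": m.group("axis").upper(), "factor": factor}
```

Downstream, the op is just data, so the same edit can drive Blender, Roblox Studio, or Daz through whatever adapter is listening.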

coding-agent-router

Multi-agent CLI routing.

Docs-first multi-agent CLI router. Routes prompts across specialized workers with shared context, built on the assumption that no single model is the right choice for every step in a long task. Pairs with codex-local for the local side and with cloud providers for harder work.
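The routing assumption above can be sketched in a few lines: classify each step, dispatch to a worker tier, and thread shared context through every call. The worker names and the keyword classifier are placeholders, not coding-agent-router’s real configuration.

```python
# Hypothetical two-tier setup: a local worker for routine steps, a cloud
# worker for harder ones. Workers share one mutable context dict.
WORKERS = {
    "local": lambda prompt, ctx: f"[local-9b] {prompt}",
    "cloud": lambda prompt, ctx: f"[cloud-frontier] {prompt}",
}

HARD_HINTS = ("refactor", "redesign", "debug across")

def route(prompt, ctx):
    """Send steps that look hard to the cloud worker, everything else local."""
    tier = "cloud" if any(h in prompt.lower() for h in HARD_HINTS) else "local"
    ctx.setdefault("history", []).append((tier, prompt))
    return WORKERS[tier](prompt, ctx)
```

The shared `ctx` is what keeps a long task coherent when no single model sees every step.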

K.O.R.A.

Same scaffolding, in production.

Everything above runs in production at Kora Labs as K.O.R.A. — the autonomous Discord ops agent. Multi-stage pipeline, multi-model routing with graceful fallback, three-tier test battery, set-me-loose-all-night execution mode. The personal tools and the production agent share the same operating doctrine. See /work for the platform context.
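Graceful fallback, the property that lets an agent run unattended all night, reduces to one small pattern: try each model in order and only fail when the whole chain is exhausted. A minimal sketch, with placeholder model names:

```python
def call_with_fallback(prompt, models, call):
    """Try each model in order; return (model, result) for the first success,
    raise only after every model in the chain has failed."""
    last_err = None
    for model in models:
        try:
            return model, call(model, prompt)
        except Exception as err:
            last_err = err            # remember the failure, fall through
    raise RuntimeError("all models failed") from last_err
```

In production the chain would also log each failure and adjust routing, but the doctrine is the same: no single model is load-bearing.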

Operating doctrine

How I actually work with AI.