Reasoning without a model in the loop.
HD-native endpoints that answer knowledge-graph queries, analogies, causal interventions, and plans in tens of milliseconds -- deterministically, over HTTP, for 550x to 2,900x less than calling a reasoning LLM.
Four operations. One substrate. No prompts.
Every endpoint is a function on bound HD vectors at D=8192 -- bind, unbind, bundle, cosine cleanup. The codebook is deterministic in a seed. The math is integer multiply, sign, and dot product. There is no model in the call path.
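A minimal sketch of those four primitives under the standard bipolar (+1/-1) encoding -- our illustration, not Neruva's internals:

```python
import numpy as np

D = 8192

def codebook(symbols, seed):
    # Deterministic in the seed: the same seed always yields the same vectors.
    rng = np.random.default_rng(seed)
    return {s: rng.choice([-1, 1], size=D) for s in symbols}

def bind(x, y):
    # Elementwise multiply; for bipolar vectors, bind is its own inverse.
    return x * y

unbind = bind  # unbinding is the same operation in this encoding

def bundle(*vs):
    # Superpose by summing, then snap back toward +/-1 with the sign.
    return np.sign(np.sum(vs, axis=0))

def cleanup(v, book):
    # Nearest codebook entry by cosine similarity.
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    best = max(book, key=lambda s: cos(v, book[s]))
    return best, cos(v, book[best])
```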
POST /v1/hd/kg/people/facts
{"facts":[
{"subject":"alice","relation":"lives_in","object":"toronto"},
{"subject":"alice","relation":"works_at","object":"acme"}
]}
POST /v1/hd/kg/people/query
{"subject":"alice","relation":"lives_in"}
-> {"object":"toronto","confidence":0.71}POST /v1/hd/analogy
{"n_feat":6, "a":0, "b":1, "c":2, "seed":4301}
-> {"candidate":3, "cosine":0.999, "confidence":0.17}POST /v1/hd/causal/scm1/query
{"query_type":"observation",
"condition_var":1, "condition_value":1,
"query_var":2, "query_value":1}
POST /v1/hd/causal/scm1/query
{"query_type":"intervention", ...}POST /v1/hd/plan
{"V":100, "n_actions":8,
"init_state":[0,1,2],
"goal_attrs":[10,11,12],
"depth":4, "n_candidates":20, "seed":6001}
-> {"best_plan":[6,2,6,0], "kl_divergence":33.5}Same workload. Same questions. Different physics.
Same workload. Same questions. Different physics.
We posed identical synthetic-ground-truth workloads to the substrate and to deepseek-reasoner (DeepSeek-R1) over the same HTTP timing harness. Reasoning LLMs are not built for this shape of work.
| Family | Substrate accuracy | R1 accuracy | Substrate p50 | R1 p50 | Cost ratio (R1 / substrate) |
|---|---|---|---|---|---|
| kg | 100% | 100% | 51 ms | 1,349 ms | 697x |
| analogy | 100% | 100% | 51 ms | 3,358 ms | 549x |
| causal | 100% | 85% | 90 ms | 27,914 ms | 2,904x |
| plan | 100% | 30% | 160 ms | 60,640 ms | 1,612x |
Methodology: n=20 queries per family, identical synthetic ground truth, single-tenant Cloud Run replica, DeepSeek pricing $0.55/1M in + $2.19/1M out (incl. reasoning tokens). Full probe + raw artifact at probes/probe_bench_substrate_vs_deepseek.py.
The reasoning-on-top-of-LLM stack is the wrong default.
Reasoning calls dominate your model bill.
Today: Every agent decision -- 'should I take action A given world X?' -- runs through an LLM. Reasoning tokens stack. Bills compound.
With Neruva: The substrate answers from algebra. Pricing is $1-5 per million ops, not per million tokens.
LLM reasoning is non-deterministic by default.
Today: The same prompt at temperature=0 still drifts across deployments, model versions, and reasoning-trace depth. You can't reproduce a decision a week later.
With Neruva: Bit-identical reruns. Audit a year later, get the same answer. The codebook is deterministic in a seed you control.
Causal vs observational is hand-wavy in chat.
Today: Ask an LLM 'what happens if I force X=1?' vs 'given that X=1, what happens?' The answer is whatever the trace decides. There is no principled distinction.
With Neruva: Two different endpoints. P(Y|X=x) and P(Y|do(X=x)) are arithmetically distinct queries on the same logged worlds.
Plan generation eats minutes per call.
Today: Asking a reasoning model for a 4-step plan over a 100-attribute world costs 60 seconds of p50 wall time and hits 30% accuracy on our bench.
With Neruva: The EFE planner returns 4-step plans at 160 ms p50 with 100% accuracy, deterministically. State is a sparse attribute set, not a paragraph.
You can't put a model in a hot loop.
Today: 500 ms to 30 s of round-trip latency rules out using the LLM as the inner loop of anything real-time -- robotics, games, live tools.
With Neruva: Sub-100 ms p50 for KG and analogy queries. The substrate fits in the inner loop.
Reasoning traces are not the same as decisions.
Today: LLM 'reasoning' tokens look like thought but produce no formal artifact. There's no proof object. Nothing to inspect post-hoc.
With Neruva: Every response carries a confidence number: KL divergence for plans, cosine ambiguity for analogy, marginal probability for causal queries.
Where teams reach for the substrate.
Live agent decisioning at scale
Scenario: Customer-service agents that need to recall 'this user prefers refunds over store credit; their last 3 interactions were about returns' before each LLM call.
With substrate: A KG of (user, preference, value) facts. Query per turn. 51 ms p50. Zero LLM tokens to keep the agent grounded.
Counterfactual safety in agentic systems
Scenario: Before executing an action, the agent needs to know: 'if I push this commit, what's the historical conditional probability of a rollback?'
With substrate: Build an SCM over (action, context, outcome) logged worlds. Query observation vs intervention at decision time. 90 ms p50.
Robotics and game-loop planning
Scenario: A discrete-action planner for a 16 ms frame budget -- no chance to call a reasoning LLM in-frame, and classical search blows up exponentially.
With substrate: The EFE plan endpoint returns a 4-step plan over a discrete factored state at 160 ms p50 -- re-plan several times a second off the frame path while each frame executes the current plan.
Knowledge-graph-grounded chatbots
Scenario: A chatbot needs to answer 'what does Alice know about Bob?' from a freshly ingested CRM dump, then write the answer in natural language via the existing LLM.
With substrate: The substrate handles the (subject, relation) recall in 51 ms; the LLM only formats the answer (see the sketch after these scenarios). 100x cheaper than asking the LLM to recall and format.
Concept-drift-resistant retrieval
Scenario: Embedding-based retrieval breaks when the model version changes. You re-embed everything and ranks shift in production.
With substrate: The HD codebook is deterministic in a seed. Re-encode tomorrow with the same seed and the vectors are bit-identical. Migration is free.
Auditable AI for regulated industries
Scenario: Finance, healthcare, legal -- all need to prove a decision came from a specific reasoning path, reproducible months later for an audit.
With substrate: Every substrate response is reproducible byte-for-byte from input + seed. Determinism is a compliance feature, not a footnote.
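For the chatbot scenario above, the recall-then-format split looks roughly like this. The base URL and the format_with_llm stub are placeholders; only the /v1/hd/kg/people/query path comes from the examples at the top of the page:

```python
import requests

SUBSTRATE = "https://substrate.example.com"  # placeholder base URL

def format_with_llm(fact_sentence: str) -> str:
    # Stand-in for the existing LLM formatting call in your stack.
    return f"Based on our records, {fact_sentence}."

def recall(subject: str, relation: str) -> dict:
    # One substrate round-trip (~51 ms p50 per the benchmark above).
    r = requests.post(f"{SUBSTRATE}/v1/hd/kg/people/query",
                      json={"subject": subject, "relation": relation},
                      timeout=2)
    r.raise_for_status()
    return r.json()  # {"object": ..., "confidence": ...}

def answer(subject: str, relation: str) -> str:
    fact = recall(subject, relation)
    # The LLM never does the recall; it only phrases the resolved fact.
    return format_with_llm(f"{subject} {relation} {fact['object']} "
                           f"(confidence {fact['confidence']:.2f})")
```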
Built from the ground up for the agent loop.
Existing agent stacks bolted reasoning onto a chat completion API. We built the reasoning surface natively, on a substrate where the bind operator and the do-operator are the same primitive.
| Capability | Old stack (LLM + tools) | Neruva substrate |
|---|---|---|
| Reasoning unit | Reasoning-token trace through a transformer | HD vector algebra (bind, unbind, cleanup) |
| Latency | 1.3s -- 60s p50 per decision | 51 -- 160ms p50 |
| Cost | $2.19 / 1M output tokens (incl. reasoning) | $1 -- $5 / 1M ops |
| Determinism | Drifts across deployments and model versions | Bit-identical reruns forever |
| Causal vs observational | Hand-wavy, prompt-dependent | Two distinct endpoints, arithmetically separated |
| Auditability | Reasoning trace, no formal artifact | Confidence number per call, reproducible |
| Hot-loop friendly | No -- too slow for game/robotics inner loops | Yes -- 51-160 ms p50, fast enough for real-time control loops |
| Hardware | GPU per request; cold-start tax on autoscale | CPU-only; warm-start is microseconds |
Stop paying for tokens to think.
Pay for the answer.
$5 in credits on signup. Substrate ops are $1-5 per million. At the low end, that's five million KG queries before you spend a dollar of your own.