/v1/hd/* · the substrate that thinks

Reasoning without a model in the loop.

HD-native endpoints that answer knowledge-graph queries, analogies, causal interventions, and plans in milliseconds -- deterministically, over HTTP, for 600x to 2900x less than calling a reasoning LLM.

Accuracy
100%
across knowledge graph, analogy, causal, and plan -- on synthetic ground truth at n=100. DeepSeek-R1 on the same workload: 30% on plan, 85% on causal.
Latency
51–160ms p50
end-to-end from Cloud Run to client per substrate call. DeepSeek-R1 on the same prompts: 1.3s to 60s p50.
Cost
600–2900x
cheaper than the reasoning LLM path. Pricing per million ops, not per million tokens-plus-reasoning-tokens.
Determinism
5/5
bit-identical reruns of the same query. No temperature, no sampling, no stochasticity. Audit it and you can prove it.
The surface

Four operations. One substrate. No prompts.

Every endpoint is a function on bound HD vectors at D=8192 -- bind, unbind, bundle, cosine cleanup. The codebook is deterministic in a seed. The math is integer multiply, sign, and dot product. There is no model in the call path.
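
A minimal sketch of that algebra in NumPy, assuming bipolar {-1, +1} vectors and a seed-derived codebook; names like bind, bundle, and cleanup are illustrative, not the API surface.

import numpy as np

D = 8192

def codebook(symbols, seed):
    # Deterministic codebook: the same seed always yields the same vectors.
    rng = np.random.default_rng(seed)
    return {s: rng.choice((-1, 1), size=D) for s in symbols}

def bind(a, b):
    return a * b                               # elementwise multiply; self-inverse, so unbind == bind

def bundle(vectors):
    return np.sign(np.sum(vectors, axis=0))    # superpose many bound triples into one graph vector

def cleanup(noisy, book):
    # Cosine nearest neighbour against the codebook.
    best = max(book, key=lambda s: np.dot(noisy, book[s]))
    cos = np.dot(noisy, book[best]) / (np.linalg.norm(noisy) * np.linalg.norm(book[best]))
    return best, float(cos)

book = codebook(["alice", "lives_in", "toronto", "works_at", "acme"], seed=4301)
graph = bundle([bind(bind(book["alice"], book["lives_in"]), book["toronto"]),
                bind(bind(book["alice"], book["works_at"]), book["acme"])])

# Query (alice, lives_in): unbind the pair, then clean up to the nearest symbol.
print(cleanup(bind(graph, bind(book["alice"], book["lives_in"])), book))   # -> ('toronto', ~0.7)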

/v1/hd/kg/*
Knowledge graphs
Bind (subject, relation, object) triples into one compact vector per graph. Query by (subject, relation) -- unbind returns the answer with calibrated confidence. Sharded for capacity; thousands of facts per graph.
POST /v1/hd/kg/people/facts
{"facts":[
  {"subject":"alice","relation":"lives_in","object":"toronto"},
  {"subject":"alice","relation":"works_at","object":"acme"}
]}

POST /v1/hd/kg/people/query
{"subject":"alice","relation":"lives_in"}
-> {"object":"toronto","confidence":0.71}
/v1/hd/analogy
Analogy by algebra
A:B::C:? Parallelogram completion over factored items. The substrate finds D by XOR-style algebra, not by prompting a model. Returns candidate, cosine, and an ambiguity gap you can threshold on.
POST /v1/hd/analogy
{"n_feat":6, "a":0, "b":1, "c":2, "seed":4301}
-> {"candidate":3, "cosine":0.999, "confidence":0.17}
/v1/hd/causal/*
Pearl's do-operator
Distinguish observation from intervention. P(Y|X=1) is what you observed; P(Y|do(X=1)) is what would happen if you forced it. Same logged worlds, two arithmetically distinct queries. No other vector DB exposes this.
POST /v1/hd/causal/scm1/query
{"query_type":"observation",
 "condition_var":1, "condition_value":1,
 "query_var":2,     "query_value":1}

POST /v1/hd/causal/scm1/query
{"query_type":"intervention", ...}
/v1/hd/plan
Active-Inference planning
Expected Free Energy planner over discrete actions. Give it (state, goal, action space, depth). Returns the optimal action sequence plus a KL-to-goal confidence. No prompt, no model, no temperature.
POST /v1/hd/plan
{"V":100, "n_actions":8,
 "init_state":[0,1,2],
 "goal_attrs":[10,11,12],
 "depth":4, "n_candidates":20, "seed":6001}
-> {"best_plan":[6,2,6,0], "kl_divergence":33.5}
Head-to-head vs DeepSeek-R1

Same workload. Same questions. Different physics.

We posed identical synthetic-ground-truth workloads to the substrate and to deepseek-reasoner (DeepSeek-R1) over the same HTTP timing harness. Reasoning LLMs are not built for this shape of work.

Family    Substrate acc   R1 acc   Substrate p50   R1 p50      Cost ratio
kg        100%            100%     51 ms           1,349 ms    697x
analogy   100%            100%     51 ms           3,358 ms    549x
causal    100%            85%      90 ms           27,914 ms   2,904x
plan      100%            30%      160 ms          60,640 ms   1,612x

Methodology: n=20 queries per family, identical synthetic ground truth, single-tenant Cloud Run replica, DeepSeek pricing $0.55/1M in + $2.19/1M out (incl. reasoning tokens). Full probe + raw artifact at probes/probe_bench_substrate_vs_deepseek.py.

What we solve

The reasoning-on-top-of-LLM stack is the wrong default.

Reasoning calls cost more than your model bill.

Today: Every agent decision -- 'should I take action A given world X?' -- runs through an LLM. Reasoning tokens stack. Bills compound.

With Neruva: Substrate answers from algebra. Pricing is $1-5 per million ops, not per million tokens.

LLM reasoning is non-deterministic by default.

Today: The same prompt at temperature=0 still drifts across deployments, model versions, and reasoning trace depth. You can't reproduce a decision a week later.

With Neruva: Bit-identical reruns. Audit a year later, get the same answer. The codebook is deterministic in a seed you control.

Causal vs observational is hand-wavy in chat.

Today: Ask an LLM 'what happens if I force X=1?' vs 'given that X=1, what happens?' The answer is whatever the trace decides. There is no principled distinction.

With Neruva: Two different endpoints. P(Y|X=x) and P(Y|do(X=x)) are arithmetically distinct queries on the same logged worlds.

Plan generation eats minutes per call.

Today: Asking a reasoning model for a 4-step plan over a 100-attribute world means 60-second p50 wall time and 30% accuracy on our bench.

With Neruva: The EFE planner returns 4-step plans in 160ms p50, 100% accuracy, deterministic. State is a sparse attribute set, not a paragraph.

You can't put a model in a hot loop.

Today: 500ms-30s round-trip latency rules out using the LLM as the inner loop of anything real-time -- robotics, games, live tools.

With Neruva: Sub-100ms p50 for KG and analogy queries. The substrate fits in the inner loop.

Reasoning traces are not the same as decisions.

Today: LLM 'reasoning' tokens look like thought but produce no formal artifact. There's no proof object. Nothing to inspect post-hoc.

With Neruva: Every response carries a confidence number. KL-divergence for plans, cosine ambiguity for analogy, marginal probability for causal.

Real-world use cases

Where teams reach for the substrate.

Live agent decisioning at scale

Scenario: Customer-service agents that need to recall 'this user prefers refunds over store credit; their last 3 interactions were about returns' before each LLM call.

With substrate: KG of (user, preference, value). Query per turn. 51ms p50. Zero LLM tokens to keep the agent grounded.

Counterfactual safety in agentic systems

Scenario: Before executing an action, the agent needs to know: 'if I push this commit, what's the historical conditional probability of a rollback?'

With substrate: Build an SCM over (action, context, outcome) logged worlds. Query observation vs intervention at decision time. 90ms p50.

Robotics and game-loop planning

Scenario: A discrete-action planner that runs inside a 16ms frame budget -- no chance to call a reasoning LLM, but classical search blows up exponentially.

With substrate: EFE plan endpoint. 4-step plan over a discrete factored state in 160ms p50. Fits the inner loop with budget to spare.

Knowledge-graph-grounded chatbots

Scenario: Chatbot needs to answer 'what does Alice know about Bob?' from a freshly ingested CRM dump, then write the answer in natural language via the existing LLM.

With substrate: Substrate handles the (subject, relation) recall in 51ms; the LLM only formats the answer. 100x cheaper than asking the LLM to recall and format.

Concept-drift-resistant retrieval

Scenario: Embedding-based retrieval breaks when the model version changes. You re-embed everything and ranks shift in production.

With substrate: The HD codebook is deterministic in a seed. Re-encode tomorrow with the same seed; vectors are bit-identical. Migration is free.
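
A quick check of that claim in NumPy; the per-symbol seeding scheme here is a hypothetical stand-in, not the substrate's codebook layout.

import numpy as np

def encode(symbol_id, seed=4301, dim=8192):
    rng = np.random.default_rng(seed + symbol_id)   # hypothetical seeding scheme
    return rng.choice((-1, 1), size=dim)

assert np.array_equal(encode(42), encode(42))       # re-encode tomorrow: bit-identical, ranks never shift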

Auditable AI for regulated industries

Scenario: Finance, healthcare, legal -- need to prove a decision came from a specific reasoning path, reproducible months later for an audit.

With substrate: Every substrate response is reproducible byte-for-byte from input + seed. Determinism is a compliance feature, not a footnote.

Old vs new agent infra

Built from the ground up for the agent loop.

Existing agent stacks bolted reasoning onto a chat completion API. We built the reasoning surface natively, on a substrate where the bind operator and the do-operator are the same primitive.

Capability                Old stack (LLM + tools)                         Neruva substrate
Reasoning unit            Reasoning-token trace through a transformer     HD vector algebra (bind, unbind, cleanup)
Latency                   1.3s -- 60s p50 per decision                    51 -- 160ms p50
Cost                      $2.19 / 1M output tokens (incl. reasoning)      $1 -- $5 / 1M ops
Determinism               Drifts across deployments and model versions    Bit-identical reruns forever
Causal vs observational   Hand-wavy, prompt-dependent                     Two distinct endpoints, arithmetically separated
Auditability              Reasoning trace, no formal artifact             Confidence number per call, reproducible
Hot-loop friendly         No -- too slow for game/robotics inner loops    Yes -- fits a 16ms frame budget
Hardware                  GPU per request; cold-start tax on autoscale    CPU-only; warm-start is microseconds

Stop paying for tokens to think.
Pay for the answer.

$5 in credits on signup. Substrate ops are $1-5 per million. That's up to roughly five million KG queries before you spend a dollar of your own.