Reasoning without a model in the loop.
HD-native endpoints that answer knowledge-graph queries, analogies, causal interventions, and plans in tens of milliseconds -- deterministically, over HTTP, for 550x to 2,900x less than calling a reasoning LLM.
Four operations. One substrate. No prompts.
Every endpoint is a function on bound HD vectors at D=8192 -- bind, unbind, bundle, cosine cleanup. The codebook is deterministic in a seed. The math is integer multiply, sign, and dot product. There is no model in the call path.
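A minimal sketch of those four primitives under the standard bipolar (+1/-1) encoding -- our illustration, not Neruva's internals:

```python
import numpy as np

D = 8192

def codebook(symbols, seed):
    # Deterministic in the seed: the same seed always yields the same vectors.
    rng = np.random.default_rng(seed)
    return {s: rng.choice([-1, 1], size=D) for s in symbols}

def bind(x, y):
    # Elementwise multiply; for bipolar vectors, bind is its own inverse.
    return x * y

unbind = bind  # unbinding is the same operation in this encoding

def bundle(*vs):
    # Superpose by summing, then snap back toward +/-1 with the sign.
    return np.sign(np.sum(vs, axis=0))

def cleanup(v, book):
    # Nearest codebook entry by cosine similarity.
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    best = max(book, key=lambda s: cos(v, book[s]))
    return best, cos(v, book[best])
```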
POST /v1/hd/kg/people/facts
{"facts":[
{"subject":"alice","relation":"lives_in","object":"toronto"},
{"subject":"alice","relation":"works_at","object":"acme"}
]}
POST /v1/hd/kg/people/query
{"subject":"alice","relation":"lives_in"}
-> {"object":"toronto","confidence":0.71}POST /v1/hd/analogy
{"n_feat":6, "a":0, "b":1, "c":2, "seed":4301}
-> {"candidate":3, "cosine":0.999, "confidence":0.17}POST /v1/hd/causal/scm1/query
{"query_type":"observation",
"condition_var":1, "condition_value":1,
"query_var":2, "query_value":1}
POST /v1/hd/causal/scm1/query
{"query_type":"intervention", ...}POST /v1/hd/plan
{"V":100, "n_actions":8,
"init_state":[0,1,2],
"goal_attrs":[10,11,12],
"depth":4, "n_candidates":20, "seed":6001}
-> {"best_plan":[6,2,6,0], "kl_divergence":33.5}Same workload. Same questions. Different physics.
Same workload. Same questions. Different physics.
We posed identical synthetic-ground-truth workloads to the substrate and to deepseek-reasoner (DeepSeek-R1) over the same HTTP timing harness. Reasoning LLMs are not built for this shape of work.
| Family | Substrate accuracy | R1 accuracy | Substrate p50 | R1 p50 | Cost ratio (R1 / substrate) |
|---|---|---|---|---|---|
| kg | 100% | 100% | 51 ms | 1,349 ms | 697x |
| analogy | 100% | 100% | 51 ms | 3,358 ms | 549x |
| causal | 100% | 85% | 90 ms | 27,914 ms | 2,904x |
| plan | 100% | 30% | 160 ms | 60,640 ms | 1,612x |
Methodology: n=20 queries per family, identical synthetic ground truth, single-tenant Cloud Run replica, DeepSeek pricing $0.55/1M in + $2.19/1M out (incl. reasoning tokens). Full probe + raw artifact at probes/probe_bench_substrate_vs_deepseek.py.
The reasoning-on-top-of-LLM stack is the wrong default.
Reasoning calls dominate your model bill.
Today: Every agent decision -- 'should I take action A given world X?' -- runs through an LLM. Reasoning tokens stack. Bills compound.
With Neruva: The substrate answers from algebra. Pricing is $1-5 per million ops, not per million tokens.
LLM reasoning is non-deterministic by default.
Today: The same prompt at temperature=0 still drifts across deployments, model versions, and reasoning-trace depth. You can't reproduce a decision a week later.
With Neruva: Bit-identical reruns. Audit a year later, get the same answer. The codebook is deterministic in a seed you control.
Causal vs observational is hand-wavy in chat.
Today: Ask an LLM 'what happens if I force X=1?' vs 'given that X=1, what happens?' The answer is whatever the trace decides. There is no principled distinction.
With Neruva: Two different endpoints. P(Y|X=x) and P(Y|do(X=x)) are arithmetically distinct queries on the same logged worlds.
Plan generation eats minutes per call.
Today: Asking a reasoning model for a 4-step plan over a 100-attribute world costs 60 seconds of p50 wall time and hits 30% accuracy on our bench.
With Neruva: The EFE planner returns 4-step plans at 160 ms p50 with 100% accuracy, deterministically. State is a sparse attribute set, not a paragraph.
You can't put a model in a hot loop.
Today: 500 ms to 30 s of round-trip latency rules out using the LLM as the inner loop of anything real-time -- robotics, games, live tools.
With Neruva: Sub-100 ms p50 for KG and analogy queries. The substrate fits in the inner loop.
Reasoning traces are not the same as decisions.
Today: LLM 'reasoning' tokens look like thought but produce no formal artifact. There's no proof object. Nothing to inspect post-hoc.
With Neruva: Every response carries a confidence number: KL divergence for plans, cosine ambiguity for analogy, marginal probability for causal queries.
Where teams reach for the substrate.
Live agent decisioning at scale
Scenario: Customer-service agents that need to recall 'this user prefers refunds over store credit; their last 3 interactions were about returns' before each LLM call.
With substrate: A KG of (user, preference, value) facts. Query per turn. 51 ms p50. Zero LLM tokens to keep the agent grounded.
Counterfactual safety in agentic systems
Scenario: Before executing an action, the agent needs to know: 'if I push this commit, what's the historical conditional probability of a rollback?'
With substrate: Build an SCM over (action, context, outcome) logged worlds. Query observation vs intervention at decision time. 90 ms p50.
Robotics and game-loop planning
Scenario: A discrete-action planner for a 16 ms frame budget -- no chance to call a reasoning LLM in-frame, and classical search blows up exponentially.
With substrate: The EFE plan endpoint returns a 4-step plan over a discrete factored state at 160 ms p50 -- re-plan several times a second off the frame path while each frame executes the current plan.
Knowledge-graph-grounded chatbots
Scenario: A chatbot needs to answer 'what does Alice know about Bob?' from a freshly ingested CRM dump, then write the answer in natural language via the existing LLM.
With substrate: The substrate handles the (subject, relation) recall in 51 ms; the LLM only formats the answer (see the sketch after these scenarios). 100x cheaper than asking the LLM to recall and format.
Concept-drift-resistant retrieval
Scenario: Embedding-based retrieval breaks when the model version changes. You re-embed everything and ranks shift in production.
With substrate: The HD codebook is deterministic in a seed. Re-encode tomorrow with the same seed and the vectors are bit-identical. Migration is free.
Auditable AI for regulated industries
Scenario: Finance, healthcare, legal -- all need to prove a decision came from a specific reasoning path, reproducible months later for an audit.
With substrate: Every substrate response is reproducible byte-for-byte from input + seed. Determinism is a compliance feature, not a footnote.
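For the chatbot scenario above, the recall-then-format split looks roughly like this. The base URL and the format_with_llm stub are placeholders; only the /v1/hd/kg/people/query path comes from the examples at the top of the page:

```python
import requests

SUBSTRATE = "https://substrate.example.com"  # placeholder base URL

def format_with_llm(fact_sentence: str) -> str:
    # Stand-in for the existing LLM formatting call in your stack.
    return f"Based on our records, {fact_sentence}."

def recall(subject: str, relation: str) -> dict:
    # One substrate round-trip (~51 ms p50 per the benchmark above).
    r = requests.post(f"{SUBSTRATE}/v1/hd/kg/people/query",
                      json={"subject": subject, "relation": relation},
                      timeout=2)
    r.raise_for_status()
    return r.json()  # {"object": ..., "confidence": ...}

def answer(subject: str, relation: str) -> str:
    fact = recall(subject, relation)
    # The LLM never does the recall; it only phrases the resolved fact.
    return format_with_llm(f"{subject} {relation} {fact['object']} "
                           f"(confidence {fact['confidence']:.2f})")
```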
Built from the ground up for the agent loop.
Existing agent stacks bolted reasoning onto a chat completion API. We built the reasoning surface natively, on a substrate where the bind operator and the do-operator are the same primitive.
| Capability | Old stack (LLM + tools) | Neruva substrate |
|---|---|---|
| Reasoning unit | Reasoning-token trace through a transformer | HD vector algebra (bind, unbind, cleanup) |
| Latency | 1.3s -- 60s p50 per decision | 51 -- 160ms p50 |
| Cost | $2.19 / 1M output tokens (incl. reasoning) | $1 -- $5 / 1M ops |
| Determinism | Drifts across deployments and model versions | Bit-identical reruns forever |
| Causal vs observational | Hand-wavy, prompt-dependent | Two distinct endpoints, arithmetically separated |
| Auditability | Reasoning trace, no formal artifact | Confidence number per call, reproducible |
| Hot-loop friendly | No -- too slow for game/robotics inner loops | Yes -- 51-160 ms p50, fast enough for real-time control loops |
| Hardware | GPU per request; cold-start tax on autoscale | CPU-only; warm-start is microseconds |
Stop paying for tokens to think.
Pay for the answer.
$5 in credits on signup. Substrate ops are $1-5 per million. At the low end, that's five million KG queries before you spend a dollar of your own.