Numbers, not claims.
All measurements below run against the live api.neruva.io from a client-side harness over public TLS -- no servers-side shortcuts, no localhost, no preferential paths. Raw JSON is embedded at the bottom of this page.
Sub-second end-to-end. Sub-100ms substrate.
End-to-end client-measured latency including TLS, Cloud Run routing, server-side compute, and the round-trip from the caller's machine. Substrate ops (HD KG, analogy, causal) are typically a fraction of network round-trip; the rest is packet flight time.
Numbers will appear here after the next benchmark run against the live API. The methodology section below describes exactly what will be measured.
Same seed, same answer. Every time.
The substrate is deterministic from a seed -- a property no model-in-the-loop architecture can claim. We verify by issuing identical analogy queries 20 times against the live API and comparing outputs.
Numbers will appear here after the next benchmark run against the live API. The methodology section below describes exactly what will be measured.
Calibrated confidence at scale.
Sharded HD knowledge-graph queries return the bound object plus a calibrated confidence. We seed N facts of shape (person, born_in, city), then query each subject and check the returned object matches the originally-bound one.
Numbers will appear here after the next benchmark run against the live API. The methodology section below describes exactly what will be measured.
3,125× cheaper per recall.
Many agent stacks "remember" by re-prepending recall context to every LLM call. That recall slice gets billed at frontier-model input rates per turn. Replacing it with a single records_query shifts the unit cost from per-token to per-call.
records_query with typed filtersOpus 4.7 list pricing $5/M input. Other models differ.
How we measure.
- All measurements run against
https://api.neruva.ioover public TLS. No localhost, no preferential routing, no server-side shortcuts. - Each operation is warmed up 10 times to absorb cold-start cost (Cloud Run scale-to-zero adds a one- time penalty on the first call after idle), then measured 100 times.
- Latency is measured client-side via
time.perf_counter()around the HTTP call -- the number includes TLS handshake, round-trip from caller location tous-central1, and server-side compute. - Determinism is verified by comparing 20 reruns of the same analogy query and checking outputs are bit-identical.
- KG accuracy seeds N synthetic facts and queries each subject, comparing the returned object against the originally-bound one.
- Cost ratio uses Opus 4.7 list pricing ($5/M input) for the stuff-into-prompt baseline and our published per-op rate ($2/M for
records_query). Other models / cheaper tiers shift the absolute numbers but preserve the order of magnitude.
Reproduce these numbers yourself: clone the repo and run python probes/bench_substrate.py with your own NERUVA_API_KEY. The script is ~250 lines and has no dependencies beyond httpx.
Bring your own analysis.
The full benchmarks JSON is below -- copy it, ingest it, plot the histogram yourself. We update this file every time we run the harness; the timestamp at the top of the page is the last-measured-at.
{
"base_url": "https://api.neruva.io",
"namespace": "bench-pending",
"ts": 0,
"warmup_n": 10,
"measure_n": 100,
"ops": [],
"side_checks": {
"determinism": null,
"kg_accuracy": null,
"cost_vs_opus47": {
"records_query_usd_per_call": 0.000002,
"context_stuffing_opus47_5kb_per_turn_usd": 0.00625,
"ratio": 3125,
"notes": "Opus 4.7 list pricing $5/M input. Other models differ."
}
},
"_status": "pending -- numbers will appear after the next benchmark run; see methodology below"
}Sub-100ms substrate. Provable, not promised.
All numbers above measured from a client against the live public API. Reproduce them with one Python file and your own key.