Benchmarks

Numbers, not claims.

All measurements below run against the live api.neruva.io from a client-side harness over public TLS -- no server-side shortcuts, no localhost, no preferential paths. Raw JSON is embedded at the bottom of this page.

Last measured: pending
Warmup / measure: 10 warmup + 100 measured calls per op
Region: us-central1 (Cloud Run)
Transport: HTTPS, TLS 1.2+
Latency

Sub-second end-to-end. Sub-100ms substrate.

We report end-to-end, client-measured latency, including TLS, Cloud Run routing, server-side compute, and the round trip from the caller's machine. Substrate ops (HD KG, analogy, causal) typically account for a fraction of that round-trip; the rest is packet flight time.
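A minimal sketch of that timing loop, assuming bearer-token auth and a generic POST op -- the endpoint path, payload shape, and auth header below are placeholders, not the real API surface; only the warmup-then-measure bookkeeping matters:

import os
import statistics
import time

import httpx

BASE_URL = "https://api.neruva.io"
API_KEY = os.environ["NERUVA_API_KEY"]

def measure_op(path: str, payload: dict, warmup: int = 10, n: int = 100) -> dict:
    # Time one op end-to-end from the client: TLS, routing, compute, and packet flight all included.
    headers = {"Authorization": f"Bearer {API_KEY}"}      # assumed auth scheme; follow the API docs
    samples_ms = []
    with httpx.Client(base_url=BASE_URL, headers=headers, timeout=30.0) as client:
        for _ in range(warmup):                           # warm connections and caches; timings discarded
            client.post(path, json=payload).raise_for_status()
        for _ in range(n):                                # measured calls
            t0 = time.perf_counter()
            client.post(path, json=payload).raise_for_status()
            samples_ms.append((time.perf_counter() - t0) * 1000.0)
    samples_ms.sort()
    return {
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[int(0.95 * len(samples_ms)) - 1],
        "p99_ms": samples_ms[int(0.99 * len(samples_ms)) - 1],
    }

Percentiles rather than means, because tail latency is what an agent loop actually feels.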

Measurement pending

Numbers will appear here after the next benchmark run against the live API. The methodology section below describes exactly what will be measured.

Determinism

Same seed, same answer. Every time.

The substrate is deterministic from a seed -- a property no model-in-the-loop architecture can claim. We verify by issuing identical analogy queries 20 times against the live API and comparing outputs.
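A minimal version of that check, assuming a seeded analogy op at /v1/analogy and a top-level result field -- both stand-ins for whatever the real request and response shapes are:

import httpx

def check_determinism(client: httpx.Client, runs: int = 20) -> bool:
    # Fire the identical seeded analogy query repeatedly; pass only if every answer matches exactly.
    payload = {
        "seed": 42,                                    # fixed seed: the property under test
        "a": "paris", "b": "france", "c": "tokyo",     # "paris is to france as tokyo is to ?"
    }
    answers = set()
    for _ in range(runs):
        resp = client.post("/v1/analogy", json=payload)    # placeholder path
        resp.raise_for_status()
        answers.add(str(resp.json().get("result")))        # assumed response field
    return len(answers) == 1

Twenty identical runs, one distinct answer: that is the bar, and it presumably feeds the side_checks.determinism field in the raw JSON below.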

Measurement pending

Numbers will appear here after the next benchmark run against the live API. The methodology section below describes exactly what will be measured.

Knowledge graph accuracy

Calibrated confidence at scale.

Sharded HD knowledge-graph queries return the bound object plus a calibrated confidence. We seed N facts of shape (person, born_in, city), then query each subject and check the returned object matches the originally-bound one.
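Sketched in code, with /v1/kg/assert and /v1/kg/query standing in for the real write and read endpoints and an assumed {"object", "confidence"} response shape:

import httpx

def kg_accuracy(client: httpx.Client, facts: dict[str, str]) -> float:
    # Seed (person, born_in, city) facts, then query each subject and score exact-match recall.
    for person, city in facts.items():
        client.post("/v1/kg/assert", json={
            "subject": person, "predicate": "born_in", "object": city,
        }).raise_for_status()
    hits = 0
    for person, city in facts.items():
        resp = client.post("/v1/kg/query", json={"subject": person, "predicate": "born_in"})
        resp.raise_for_status()
        answer = resp.json()                           # assumed shape: {"object": ..., "confidence": ...}
        if answer.get("object") == city:
            hits += 1
    return hits / len(facts)                           # fraction of originally-bound objects recovered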

Measurement pending

Numbers will appear here after the next benchmark run against the live API. The methodology section below describes exactly what will be measured.

Cost vs LLM token-stuffing

3,125× cheaper per recall.

Many agent stacks "remember" by re-prepending recall context to every LLM call. That recall slice gets billed at frontier-model input rates per turn. Replacing it with a single records_query shifts the unit cost from per-token to per-call.

Stuff-into-prompt: 5 KB of recall context on every Opus 4.7 turn
~1.25k input tokens × $5/M input = $0.00625 per turn

Neruva: one records_query with typed filters
$2 per 1M calls = $0.0000020 per call

Ratio: 3,125× cheaper per recall on the same payload size

Opus 4.7 list pricing $5/M input. Other models differ.
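The ratio is plain arithmetic; using the page's own figures:

# Context stuffing: ~1.25k input tokens of recall context (about 5 KB at ~4 bytes/token) every turn.
TOKENS_PER_TURN = 1_250
OPUS_INPUT_USD_PER_TOKEN = 5 / 1_000_000               # $5 per million input tokens, list price
stuffing_usd_per_turn = TOKENS_PER_TURN * OPUS_INPUT_USD_PER_TOKEN    # 0.00625

# Neruva: one records_query per recall, billed per call.
RECORDS_QUERY_USD_PER_CALL = 2 / 1_000_000             # $2 per million calls

print(f"stuffing:       ${stuffing_usd_per_turn:.5f} per turn")
print(f"records_query:  ${RECORDS_QUERY_USD_PER_CALL:.7f} per call")
print(f"ratio:          {stuffing_usd_per_turn / RECORDS_QUERY_USD_PER_CALL:,.0f}x")    # 3,125x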

Methodology

How we measure.

Reproduce these numbers yourself: clone the repo and run python probes/bench_substrate.py with your own NERUVA_API_KEY. The script is ~250 lines and has no dependencies beyond httpx.
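If you adapt the harness rather than run it as-is, the file it writes has the same shape as the raw JSON below. A stripped-down sketch of that final assembly step -- field names taken from that JSON, everything else illustrative:

import json
import time

def write_results(ops_results: list[dict], determinism_ok: bool | None, kg_acc: float | None) -> None:
    # Mirror the schema of the raw benchmarks JSON embedded at the bottom of this page.
    doc = {
        "base_url": "https://api.neruva.io",
        "namespace": "bench-pending",
        "ts": int(time.time()),
        "warmup_n": 10,
        "measure_n": 100,
        "ops": ops_results,                  # e.g. one percentile dict per op from measure_op above
        "side_checks": {
            "determinism": determinism_ok,
            "kg_accuracy": kg_acc,
        },
    }
    with open("benchmarks.json", "w") as f:
        json.dump(doc, f, indent=2)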

Raw measurements

Bring your own analysis.

The full benchmarks JSON is below -- copy it, ingest it, plot the histogram yourself. We update this file every time we run the harness; the timestamp at the top of the page records when it was last measured.

{
  "base_url": "https://api.neruva.io",
  "namespace": "bench-pending",
  "ts": 0,
  "warmup_n": 10,
  "measure_n": 100,
  "ops": [],
  "side_checks": {
    "determinism": null,
    "kg_accuracy": null,
    "cost_vs_opus47": {
      "records_query_usd_per_call": 0.000002,
      "context_stuffing_opus47_5kb_per_turn_usd": 0.00625,
      "ratio": 3125,
      "notes": "Opus 4.7 list pricing $5/M input. Other models differ."
    }
  },
  "_status": "pending -- numbers will appear after the next benchmark run; see methodology below"
}
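As a starting point for that analysis, a few lines that load the file and recompute the cost ratio from the side_checks block -- the per-op entry format is whatever the harness emits, so the loop simply prints what it finds:

import json

with open("benchmarks.json") as f:
    bench = json.load(f)

cost = bench["side_checks"]["cost_vs_opus47"]
per_call = cost["records_query_usd_per_call"]
per_turn = cost["context_stuffing_opus47_5kb_per_turn_usd"]
print(f"recall cost ratio: {per_turn / per_call:,.0f}x (file says {cost['ratio']}x)")

for op in bench["ops"]:                     # empty until the next run
    print(op)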

Sub-100ms substrate. Provable, not promised.

All numbers above measured from a client against the live public API. Reproduce them with one Python file and your own key.