Benchmarks

Numbers, not claims.

All measurements below run against the live api.neruva.io from a client-side harness over public TLS -- no servers-side shortcuts, no localhost, no preferential paths. Raw JSON is embedded at the bottom of this page.

Last measured:pendingWarmup / measure:10 + 100 per opRegion:us-central1 (Cloud Run)Transport:HTTPS, TLS 1.2+

Latency

Sub-second end-to-end. Sub-100ms substrate.

End-to-end client-measured latency including TLS, Cloud Run routing, server-side compute, and the round-trip from the caller's machine. Substrate ops (HD KG, analogy, causal) are typically a fraction of network round-trip; the rest is packet flight time.

Measurement pending

Numbers will appear here after the next benchmark run against the live API. The methodology section below describes exactly what will be measured.

Determinism

Same seed, same answer. Every time.

The substrate is deterministic from a seed -- a property no model-in-the-loop architecture can claim. We verify by issuing identical analogy queries 20 times against the live API and comparing outputs.

Measurement pending

Numbers will appear here after the next benchmark run against the live API. The methodology section below describes exactly what will be measured.

Knowledge graph accuracy

Calibrated confidence at scale.

Sharded HD knowledge-graph queries return the bound object plus a calibrated confidence. We seed N facts of shape (person, born_in, city), then query each subject and check the returned object matches the originally-bound one.

Measurement pending

Numbers will appear here after the next benchmark run against the live API. The methodology section below describes exactly what will be measured.

Cost vs LLM token-stuffing

3,125× cheaper per recall.

Many agent stacks "remember" by re-prepending recall context to every LLM call. That recall slice gets billed at frontier-model input rates per turn. Replacing it with a single records_query shifts the unit cost from per-token to per-call.

Stuff-into-prompt

5 KB context every Opus 4.7 turn

~1.25k input tokens × $5/M = $0.00625 / turn

Neruva

One records_query with typed filters

$2 / 1M = $0.0000020 / call

Ratio

3,125×

cheaper per recall on the same payload size

Opus 4.7 list pricing $5/M input. Other models differ.

Methodology

How we measure.

All measurements run against https://api.neruva.io over public TLS. No localhost, no preferential routing, no server-side shortcuts.
Each operation is warmed up 10 times to absorb cold-start cost (Cloud Run scale-to-zero adds a one- time penalty on the first call after idle), then measured 100 times.
Latency is measured client-side via time.perf_counter() around the HTTP call -- the number includes TLS handshake, round-trip from caller location to us-central1, and server-side compute.
Determinism is verified by comparing 20 reruns of the same analogy query and checking outputs are bit-identical.
KG accuracy seeds N synthetic facts and queries each subject, comparing the returned object against the originally-bound one.
Cost ratio uses Opus 4.7 list pricing ($5/M input) for the stuff-into-prompt baseline and our published per-op rate ($2/M for records_query). Other models / cheaper tiers shift the absolute numbers but preserve the order of magnitude.

Reproduce these numbers yourself: clone the repo and run python probes/bench_substrate.py with your own NERUVA_API_KEY. The script is ~250 lines and has no dependencies beyond httpx.

Raw measurements

Bring your own analysis.

The full benchmarks JSON is below -- copy it, ingest it, plot the histogram yourself. We update this file every time we run the harness; the timestamp at the top of the page is the last-measured-at.

{
  "base_url": "https://api.neruva.io",
  "namespace": "bench-pending",
  "ts": 0,
  "warmup_n": 10,
  "measure_n": 100,
  "ops": [],
  "side_checks": {
    "determinism": null,
    "kg_accuracy": null,
    "cost_vs_opus47": {
      "records_query_usd_per_call": 0.000002,
      "context_stuffing_opus47_5kb_per_turn_usd": 0.00625,
      "ratio": 3125,
      "notes": "Opus 4.7 list pricing $5/M input. Other models differ."
    }
  },
  "_status": "pending -- numbers will appear after the next benchmark run; see methodology below"
}

Sub-100ms substrate. Provable, not promised.

All numbers above measured from a client against the live public API. Reproduce them with one Python file and your own key.

Get an API key Read the docs