api.neruva.io · now in early access

Memory that reasons. Not memory you reason over.

Pinecone-shaped agent memory at the speed of a function call -- plus a substrate that answers knowledge-graph queries, analogies, causal interventions, and planning problems itself. No model in the loop. Deterministic. Sub-millisecond.

The substrate
1-bit prefilter cosine.
Vectors are packed to bipolar sign bits for the hot-path cosine prefilter, with a float rescore for accuracy. The fast part of the index is 32x smaller than a float-only ANN.
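The prefilter idea can be sketched in a few lines of numpy (my reconstruction, not the production index; dimensions and shortlist size are illustrative): pack sign bits, screen candidates by Hamming distance, then rescore the survivors in float.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 256                      # vector dimension (illustrative)
corpus = rng.normal(size=(1000, D)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# pack: one sign bit per dimension -> D/8 bytes instead of 4*D (32x smaller)
packed = np.packbits(corpus > 0, axis=1)

def prefilter_query(q, k=10, shortlist=50):
    q = q / np.linalg.norm(q)
    q_bits = np.packbits(q > 0)
    # Hamming distance via XOR approximates angular distance on sign bits
    hamming = np.unpackbits(packed ^ q_bits, axis=1).sum(axis=1)
    cand = np.argsort(hamming)[:shortlist]   # cheap 1-bit screen
    exact = corpus[cand] @ q                 # float rescore for accuracy
    return cand[np.argsort(-exact)[:k]]

top = prefilter_query(rng.normal(size=D).astype(np.float32))
```

The XOR-plus-popcount screen touches 32x less data than float cosine, which is what makes the hot path cheap; accuracy comes back in the small float rescore.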
The speed
No model in the loop.
Writes are pure numpy: normalize, pack to sign bits, append. No embedder, no rerank LLM, no async reindex queue between you and your agent's next turn.
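The write path above can be sketched as follows (assumed semantics; `write_memory`, the log layout, and the 256-dim size are illustrative, not the real on-disk format):

```python
import io
import numpy as np

log = io.BytesIO()           # stands in for an append-only segment file

def write_memory(vec: np.ndarray) -> int:
    vec = vec / np.linalg.norm(vec)   # normalize
    bits = np.packbits(vec > 0)       # pack to sign bits
    offset = log.tell()
    log.write(bits.tobytes())         # append; nothing blocks on an index
    return offset

off0 = write_memory(np.random.default_rng(1).normal(size=256))
off1 = write_memory(np.random.default_rng(2).normal(size=256))
```

Each 256-dim write lands as 32 bytes at a known offset; there is no embedder call or reindex queue between the append and the return.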
The scale
Namespace per agent. Zero per-namespace fee.
Each agent, user, or thread gets its own isolated namespace. We don't charge for them. Pricing is purely per operation.
The ownership
Your memory is a file.
.nmm -- a portable, inspectable memory file. Export your index, diff it, migrate it, or read it offline. The agent's mind, on disk.
Migrate in one line

Already on Pinecone? Change one import.

Same client. Same JSON. Same upsert / query / delete. Move your codebase across in the time it takes to commit a diff.

agent.py
# before
from pinecone import Pinecone

# after
from neruva import Pinecone
Get started

Wire into the tool you already use.

One MCP server, every modern coding agent. Pick your client and paste the line.

claude mcp add neruva \
  --env NERUVA_API_KEY=$NERUVA_API_KEY \
  -- npx -y @neruva/mcp
The wedge -- HD-native substrate

Operations on memory, not just storage of it.

Every other vector DB is a dumb cabinet. You store, you retrieve by cosine, then you pay an LLM to think. Neruva's HD-native API lets the substrate itself answer the question -- knowledge-graph queries, analogies, causal inference, planning -- all over HTTP, all deterministic, none of them burning model tokens.

/v1/hd/kg/*
Knowledge graphs
Bind (subject, relation, object) triples into one ~32KB vector. Query by (subject, relation) -- unbind returns the answer with calibrated confidence. Thousands of facts per shard. Sub-millisecond.
POST /v1/hd/kg/people/facts
{"facts":[
  {"subject":"alice","relation":"lives_in","object":"toronto"},
  {"subject":"alice","relation":"works_at","object":"acme"}
]}

POST /v1/hd/kg/people/query
{"subject":"alice","relation":"lives_in"}
-> {"object":"toronto","confidence":0.71}
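The bind/unbind mechanics behind this endpoint can be illustrated with classic bipolar hypervector algebra (a toy reconstruction; Neruva's actual encoding, dimensionality, and confidence calibration may differ):

```python
import numpy as np

D = 8192
rng = np.random.default_rng(42)

def hv():
    # random bipolar hypervector
    return rng.choice([-1, 1], size=D).astype(np.int32)

codebook = {name: hv() for name in
            ["alice", "lives_in", "toronto", "works_at", "acme"]}

# bind each (subject, relation, object) by elementwise multiply,
# then bundle all facts into ONE vector by summing
memory = (codebook["alice"] * codebook["lives_in"] * codebook["toronto"]
        + codebook["alice"] * codebook["works_at"] * codebook["acme"])

def query(subject, relation):
    # binding is self-inverse for bipolar vectors: x * x = 1,
    # so multiplying by (subject * relation) peels the object back out
    noisy = memory * codebook[subject] * codebook[relation]
    scores = {name: float(noisy @ v) / D for name, v in codebook.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]   # cleanup answer + cosine-like confidence

obj, conf = query("alice", "lives_in")
```

The other bundled facts show up only as near-zero noise in the dot product, which is why thousands of triples fit in one vector and why the confidence score is meaningful.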
/v1/hd/analogy
Analogy by algebra
A:B::C:? Parallelogram completion over factored items. The substrate finds D by XOR-style algebra, not by prompting a model. Returns candidate, cosine, and ambiguity gap.
POST /v1/hd/analogy
{"n_feat":6, "a":0, "b":1, "c":2, "seed":4301}
-> {"candidate":3, "cosine":0.999, "confidence":0.17}
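Parallelogram completion can be sketched with bipolar vectors, where elementwise multiplication plays the role of XOR (an illustration only; the item factorization and gap calibration here are toy assumptions):

```python
import numpy as np

D = 8192
rng = np.random.default_rng(4301)
items = rng.choice([-1, 1], size=(4, D)).astype(np.int32)
a, b, c = items[0], items[1], items[2]
# construct item 3 so that a:b :: c:d share the same binding "offset"
items[3] = c * (a * b)

d_hat = c * (a * b)                 # complete the parallelogram by algebra
cosines = (items @ d_hat) / D       # compare against every stored item
candidate = int(np.argmax(cosines))
srt = np.sort(cosines)
gap = float(srt[-1] - srt[-2])      # ambiguity gap between top two
```

No model is prompted at any point: the answer is the item nearest the algebraically completed vector, and the gap says how unambiguous that nearest match is.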
/v1/hd/causal/*
Pearl's do-operator
Distinguish observation from intervention. P(Y|X=1) is what you observed; P(Y|do(X=1)) is what would happen if you forced it. Same logged worlds, two arithmetically distinct queries. No other vector DB exposes this.
POST /v1/hd/causal/scm1/query
{"query_type":"observation",
 "condition_var":1, "condition_value":1,
 "query_var":2,     "query_value":1}

POST /v1/hd/causal/scm1/query
{"query_type":"intervention", ...}
/v1/hd/plan
Active-Inference planning
Expected Free Energy planner over discrete actions. Give it (state, goal, action space, depth). Returns the optimal action sequence plus a KL-to-goal confidence. No prompt, no model, no temperature.
POST /v1/hd/plan
{"V":100, "n_actions":8,
 "init_state":[0,1,2],
 "goal_attrs":[10,11,12],
 "depth":4, "n_candidates":20, "seed":6001}
-> {"best_plan":[6,2,6,0], "kl_divergence":33.5}
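A simplified sketch of what divergence-minimizing planning looks like (here the Expected Free Energy collapses to a KL-to-goal term over a toy deterministic world; every name, size, and number is illustrative, not the service's model):

```python
import numpy as np

rng = np.random.default_rng(6001)
V, n_actions, depth, n_candidates = 16, 4, 4, 200
# each action is a permutation of the V states: a toy deterministic world
transitions = np.array([rng.permutation(V) for _ in range(n_actions)])

goal = np.full(V, 1e-3); goal[7] = 1.0; goal /= goal.sum()  # prefer state 7

def rollout(state, plan):
    for a in plan:
        state = transitions[a][state]
    return state

def kl_to_goal(state):
    belief = np.full(V, 1e-3); belief[state] = 1.0; belief /= belief.sum()
    return float(np.sum(belief * np.log(belief / goal)))

# score candidate action sequences; lowest divergence to the goal wins
plans = rng.integers(0, n_actions, size=(n_candidates, depth))
scores = [kl_to_goal(rollout(0, p)) for p in plans]
best_plan = plans[int(np.argmin(scores))]
```

Fixed seed, fixed arithmetic: the same request always returns the same plan and the same divergence, with no temperature anywhere.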
Cost
No tokens burned for substrate-answerable queries.
Latency
Sub-millisecond HD ops vs ~500ms model round-trips.
Determinism
Same input -> same output. Auditable. No temperature.
Safety
Causal vs observational queries are arithmetically distinct -- principled refusal.
What we solve

Every agent team hits the same wall.

We've heard it from teams shipping agents into production. The shape of pain is the same every time.

Your agents forget between sessions.

Today: Context resets the moment the loop ends. Users restart and the agent has no idea who they are.

With Neruva: Append-only memory keyed by agent + user. Recall in a single call, no LLM round-trip.

The memory bill scales with every user.

Today: Vector DBs charge per index, per pod, per read unit. Spin up 10,000 user namespaces and you're funding the wrong startup.

With Neruva: One mmap segment per namespace. Marginal cost per agent rounds to zero.

LLM-as-memory hallucinates -- and bills you for it.

Today: Calling a model to summarize, extract, and rerank on every write adds seconds of latency and per-token cost that compounds.

With Neruva: No model in the retrieval path. Deterministic, auditable, cheap by construction.

Write-heavy workloads break read-optimized indexes.

Today: HNSW rebuilds choke when agents write thousands of memories per minute. Throughput collapses; tail latency explodes.

With Neruva: Append-only write-ahead log with async indexing. Writes don't wait for the index.

Nobody can audit what your agent remembers.

Today: Compliance asks 'show me what this agent knows about user X' and the answer is a black-box embedding.

With Neruva: Every memory is a first-class row with metadata, timestamp, and a content-addressable id. Inspect, export, delete.

Surgical forget is an afterthought.

Today: A user revokes consent. Now you have to find their fingerprint scattered across embeddings, rebuilds, and caches.

With Neruva: Soft-delete by id, by filter, or by predicate. Tombstones flush on the next compaction.

How it feels in production

Designed for the workload you actually run.

Writes
Hot-path safe.
Agents write memories inline without blocking. The index catches up in the background.
Tenancy
Namespace per agent, per user, per anything.
Spin up a million namespaces. We don't charge for them individually.
Filters
The operators you expect.
$eq, $ne, $in, $nin, $gt, $gte, $lt, $lte.
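The operator semantics can be sketched as a tiny evaluator (illustrative only; the hosted filter engine is the reference implementation):

```python
OPS = {
    "$eq":  lambda v, arg: v == arg,
    "$ne":  lambda v, arg: v != arg,
    "$in":  lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$gt":  lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$lt":  lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
}

def matches(metadata: dict, flt: dict) -> bool:
    # every field clause must hold; a bare value is shorthand for $eq
    for field, clause in flt.items():
        value = metadata.get(field)
        if not isinstance(clause, dict):
            clause = {"$eq": clause}
        if not all(OPS[op](value, arg) for op, arg in clause.items()):
            return False
    return True

ok = matches({"user": "alice", "score": 7},
             {"user": {"$in": ["alice", "bob"]}, "score": {"$gte": 5}})
```

Clauses on different fields AND together, exactly as in the Pinecone-style filter dialect the client already speaks.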
Recency
Time-aware retrieval, first-class.
Decay weights and time-window filters built into the index, not bolted on with metadata.
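One plausible shape for decay-weighted scoring (the `half_life` and blend `weight` here are hypothetical parameters for illustration, not documented defaults):

```python
def recency_score(similarity: float, age_seconds: float,
                  half_life: float = 86_400.0, weight: float = 0.3) -> float:
    # exponential decay: the recency term halves every half_life seconds
    decay = 0.5 ** (age_seconds / half_life)
    return (1 - weight) * similarity + weight * decay

fresh = recency_score(0.80, age_seconds=0)           # just written
stale = recency_score(0.80, age_seconds=7 * 86_400)  # a week old
```

Same cosine similarity, different ages: the week-old memory ranks below the fresh one without any metadata filter gymnastics.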
Compliance
Surgical forget. Full audit.
Delete by id, by user, by predicate. Every memory is inspectable and removable.
Billing
Wallet model. No surprises.
Top up via PayPal. Operations deduct in real time. No subscription, no minimum, no overage trap.

Stop renting search infra.
Start owning agent memory.

$5 in credits on signup. No card. No subscription. No demo call. Wire it in and decide.