Memory that reasons. Not memory you reason over.
Pinecone-shaped agent memory at the speed of a function call -- plus a substrate that answers knowledge-graph queries, analogies, and causal interventions itself. No LLM in the retrieval path. Deterministic responses, replayable from a seed. Sub-millisecond.
Already on Pinecone? Change one import.
Same client. Same JSON. Same upsert / query / delete. Move your codebase across in the time it takes to commit a diff.
# before
from pinecone import Pinecone

# after
from neruva import Pinecone
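If your calls already use the standard Pinecone v3 client shape, nothing else moves. A minimal sketch under that assumption -- the index name, id, and 3-dim vector are illustrative:

from neruva import Pinecone

pc = Pinecone(api_key="YOUR_NERUVA_API_KEY")
index = pc.Index("brain")

# same upsert you already ship
index.upsert(
    vectors=[{"id": "mem-1", "values": [0.1, 0.2, 0.3],
              "metadata": {"kind": "handoff"}}],
    namespace="handoffs",
)

# same query you already ship
index.query(vector=[0.1, 0.2, 0.3], top_k=3,
            namespace="handoffs", include_metadata=True)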
Wire into the tool you already use.
One MCP server, every modern coding agent. Pick your client and paste the line.
claude mcp add neruva \
  --env NERUVA_API_KEY=$NERUVA_API_KEY \
  -- npx -y @neruva/mcp
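For clients configured through an mcpServers JSON file (Cursor and similar), the equivalent entry looks roughly like this sketch; the file location and exact key names depend on your client:

{
  "mcpServers": {
    "neruva": {
      "command": "npx",
      "args": ["-y", "@neruva/mcp"],
      "env": { "NERUVA_API_KEY": "your-key" }
    }
  }
}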
Operations on memory, not just storage of it.
Every other vector DB is a dumb cabinet. You store, you retrieve by cosine, then you pay an LLM to think. Neruva's HD-native API lets the substrate itself answer the question -- knowledge-graph queries, analogies, causal inference -- all over HTTP, all deterministic, none of them burning model tokens.
POST /v1/hd/kg/people/facts
{"facts":[
{"subject":"alice","relation":"lives_in","object":"toronto"},
{"subject":"alice","relation":"works_at","object":"acme"}
]}
POST /v1/hd/kg/people/query
{"subject":"alice","relation":"lives_in"}
-> {"object":"toronto","confidence":0.71}POST /v1/hd/analogy
{"n_feat":6, "a":0, "b":1, "c":2, "seed":4301}
-> {"candidate":3, "cosine":0.999, "confidence":0.17}POST /v1/hd/causal/scm1/query
{"query_type":"observation",
"condition_var":1, "condition_value":1,
"query_var":2, "query_value":1}
POST /v1/hd/causal/scm1/query
{"query_type":"intervention", ...}Four primitives. One mental model.
Four primitives. One mental model.
Stop stuffing context windows with everything the agent might recall. Stop asking the LLM to do what a vector op can do for free.
# on every turn
hits = memory_query_text(
    index="brain",
    namespace="handoffs",
    text=task_description,
    topK=3,
)
# pass the 3 hits to the LLM, not the whole history
# refactor an old function
hd_kg_delete_fact(kg="code", subject="myfunc",
                  relation="defined_in", object="file_a")
hd_kg_add_fact(kg="code", subject="myfunc",
               relation="defined_in", object="file_b")
# CI debug: is my PR the cause, or noise?
p = hd_causal_query(scm="ci", query_type="intervention",
                    condition_var=my_change, condition_value=1,
                    query_var=test_failed, query_value=1)
# p > 0.6 means yes, my change broke it
# experimental, n_feat up to 20 (1M items)
ans = hd_analogy(n_feat=10, a=king, b=queen, c=man)
# returns: woman with cosine 1.0
Every offloaded recall is tokens you don't pay for.
An agent that stuffs 5 KB of "memory" context into every turn burns input tokens at LLM rates. The same recall via Neruva is sub-100 ms and zero LLM tokens. KG queries replace whole paragraphs of "here's what you should know about project X". Causal queries replace multi-step prompts that ask the model to weigh interventions.
Example numbers use Claude Opus 4.7 list pricing ($5/M input, $25/M output). Other models vary -- Sonnet is cheaper, Haiku cheaper still, and self-hosted is its own math. The shape of the savings is what matters: every recall via cosine is a recall you don't do via a 6 KB context-stuffing prompt.
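A rough back-of-envelope at those list prices, assuming ~4 bytes per token:

# what 5 KB of stuffed memory context costs per turn at $5/M input tokens
context_bytes = 5 * 1024
tokens_per_turn = context_bytes / 4            # ~4 bytes/token, rough
cost_per_turn = tokens_per_turn * 5 / 1_000_000
print(f"{tokens_per_turn:.0f} tokens, ${cost_per_turn:.4f}/turn")
# -> 1280 tokens, $0.0064/turn
print(f"${cost_per_turn * 1_000_000:,.0f} per million turns")
# -> $6,400 per million turns; the Neruva recall spends zero LLM tokens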
Adopt the optional { kind: "decision" | "mistake" | "session_end" | "handoff" } metadata convention and any Neruva-aware harness (Claude Code, Cursor, Codex, custom agents) can recall and render your records the same way. Every query also accepts tags: ["foo"] sugar for filter-by-tag.
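A sketch of the convention in practice, under the same Pinecone-shaped client assumption as above; the id, 3-dim vector, and tag are illustrative:

from neruva import Pinecone

pc = Pinecone(api_key="YOUR_NERUVA_API_KEY")
index = pc.Index("brain")

# record a handoff under the kind convention, tagged for later filtering
index.upsert(
    vectors=[{
        "id": "handoff-001",
        "values": [0.1, 0.2, 0.3],
        "metadata": {"kind": "handoff", "tags": ["project-x"]},
    }],
    namespace="handoffs",
)

# recall only handoffs tagged project-x
index.query(
    vector=[0.1, 0.2, 0.3],
    top_k=3,
    namespace="handoffs",
    filter={"kind": {"$eq": "handoff"}, "tags": {"$in": ["project-x"]}},
    include_metadata=True,
)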
Every agent team hits the same wall.
We've heard it from teams shipping agents into production. The shape of pain is the same every time.
Your agents forget between sessions.
Today: Context resets the moment the loop ends. Users restart and the agent has no idea who they are.
With Neruva: Append-only memory keyed by agent + user. Recall in a single call, no LLM round-trip.
The memory bill scales with every user.
Today: Vector DBs charge per index, per pod, per read unit. Spin up 10,000 user namespaces and you're funding the wrong startup.
With Neruva: One mmap segment per namespace. Marginal cost per agent rounds to zero.
LLM-as-memory hallucinates -- and bills you for it.
Today: Calling a model to summarize, extract, and rerank on every write adds seconds of latency and per-token cost that compounds.
With Neruva: No model in the retrieval path. Deterministic, auditable, cheap by construction.
Write-heavy workloads break read-optimized indexes.
Today: HNSW rebuilds choke when agents write thousands of memories per minute. Throughput collapses; tail latency explodes.
With Neruva: Append-only write-ahead log with async indexing. Writes don't wait for the index.
Nobody can audit what your agent remembers.
Today: Compliance asks 'show me what this agent knows about user X' and the answer is a black-box embedding.
With Neruva: Every memory is a first-class row with metadata, timestamp, and a content-addressable id. Inspect, export, delete.
Surgical forget is an afterthought.
Today: A user revokes consent. Now you have to find their fingerprint scattered across embeddings, rebuilds, and caches.
With Neruva: Soft-delete by id, by filter, or by predicate (sketched below). Tombstones flush on the next compaction.
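Under the Pinecone-shaped client from the earlier sketches, a surgical forget might look like this; the id and the user_id field are illustrative:

# forget one memory by id
index.delete(ids=["handoff-001"], namespace="handoffs")

# forget everything recorded about a user who revoked consent
index.delete(filter={"user_id": {"$eq": "u-123"}}, namespace="handoffs")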
Designed for the workload you actually run.
Pinecone-style metadata filters on every query: $eq, $ne, $in, $nin, $gt, $gte, $lt, $lte.
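For instance, a recall scoped to a time window and away from known mistakes might combine them like this sketch; field names are illustrative and index is the client handle from the earlier examples:

query_filter = {
    "kind": {"$nin": ["mistake"]},
    "ts": {"$gte": 1_700_000_000, "$lt": 1_800_000_000},  # unix-seconds window
}
index.query(vector=[0.1, 0.2, 0.3], top_k=5, filter=query_filter)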
Stop renting search infra.
Start owning agent memory.
$5 in credits on signup. No card. No subscription. No demo call. Wire it in and decide.