api.neruva.io · Pinecone-compatible memory for agents

Agent memory at the speed of a function call.

A Pinecone-shaped vector store with namespace-per-agent built in, sub-millisecond writes, and a portable .nmm memory file you actually own. Drop-in for the Pinecone client; change one import.

Writes
<1ms
Pure numpy: normalize, pack to sign bits, append. No embedder, no rerank LLM, no async reindex queue between you and your agent's next turn.
Index size
32x smaller
1-bit prefilter cosine -- vectors are packed to bipolar sign bits for the hot path. Float rescore for accuracy on top-k. Fits in RAM.
Namespaces
$0/each
One mmap segment per namespace. Spin up a million; we don't charge for them individually. Per-op pricing only.
Ownership
.nmm
Your memory is a file. Portable, inspectable, exportable. Export your index, diff it, migrate it, read it offline.
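The write path and the 1-bit hot path described above fit in a few lines of numpy. The sketch below is an illustrative reconstruction under names of our own choosing, not Neruva's internals: normalize, pack sign bits (each 32-bit float lane becomes 1 bit), prefilter with XOR + popcount Hamming distance, then rescore the surviving candidates with exact float cosine.

```python
import numpy as np

def pack_signs(vectors):
    """Normalize float vectors and pack their sign bits: 32x smaller than float32."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return np.packbits(v > 0, axis=1), v  # uint8 bit codes + normalized floats

def query(codes, floats, q, top_k=5, prefilter=50):
    """1-bit popcount prefilter, then exact float cosine rescore on the survivors."""
    q = q / np.linalg.norm(q)
    q_code = np.packbits(q > 0)
    # Hamming distance on sign bits approximates cosine order
    dists = np.unpackbits(codes ^ q_code, axis=1).sum(axis=1)
    cand = np.argsort(dists)[:prefilter]
    scores = floats[cand] @ q                  # exact cosine on candidates only
    return cand[np.argsort(-scores)[:top_k]]
```

The trick is that for normalized vectors, the fraction of agreeing sign bits tracks the angle between them, so the cheap popcount pass discards most of the corpus before any float math runs.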
Migrate in one line

Already on Pinecone? Change one import.

Same client. Same JSON. Same upsert / query / delete. Move your codebase across in the time it takes to commit a diff.

agent.py
# before
from pinecone import Pinecone

# after
from neruva import Pinecone
What we solve

Every agent team hits the same wall.

We've heard it from teams shipping agents into production. The shape of pain is always the same.

Your agents forget between sessions.

Today: Context resets the moment the loop ends. Users restart and the agent has no idea who they are.

With Neruva: Append-only memory keyed by agent + user. Recall in a single call, no LLM round-trip.

The memory bill scales with every user.

Today: Vector DBs charge per index, per pod, per read unit. Spin up 10,000 user namespaces and you're funding the wrong startup.

With Neruva: One mmap segment per namespace. Marginal cost per agent rounds to zero.

LLM-as-memory hallucinates -- and bills you for it.

Today: Calling a model to summarize, extract, and rerank on every write adds seconds of latency and per-token cost that compounds.

With Neruva: No model in the retrieval path. Deterministic, auditable, cheap by construction.

Write-heavy workloads break read-optimized indexes.

Today: HNSW rebuilds choke when agents write thousands of memories per minute. Throughput collapses; tail latency explodes.

With Neruva: Append-only write-ahead log with async indexing. Writes don't wait for the index.

Nobody can audit what your agent remembers.

Today: Compliance asks 'show me what this agent knows about user X' and the answer is a black-box embedding.

With Neruva: Every memory is a first-class row with metadata, timestamp, and a content-addressable id. Inspect, export, delete.

Surgical forget is an afterthought.

Today: A user revokes consent. Now you have to find their fingerprint scattered across embeddings, rebuilds, and caches.

With Neruva: Soft-delete by id, by filter, or by predicate. Tombstones flush on the next compaction.
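The remedies above share one data model: an append-only log of rows keyed by namespace, with content-addressed ids and tombstone deletes. Here is a toy sketch of that shape; the class and method names are ours, not the Neruva API.

```python
import hashlib, time

class AgentMemory:
    """Toy append-only memory keyed by (agent, user), with content-addressed
    ids, tombstone soft-deletes, and compaction. Illustrative only."""
    def __init__(self):
        self.log = []           # append-only write-ahead log
        self.tombstones = set()

    def remember(self, agent, user, text):
        mem_id = hashlib.sha256(text.encode()).hexdigest()[:16]  # content-addressable id
        self.log.append({"id": mem_id, "ns": (agent, user),
                         "text": text, "ts": time.time()})
        return mem_id

    def recall(self, agent, user):
        return [m for m in self.log
                if m["ns"] == (agent, user) and m["id"] not in self.tombstones]

    def forget(self, predicate):
        """Soft-delete every memory matching the predicate."""
        self.tombstones |= {m["id"] for m in self.log if predicate(m)}

    def compact(self):
        """Flush tombstones: rewrite the log without deleted rows."""
        self.log = [m for m in self.log if m["id"] not in self.tombstones]
        self.tombstones.clear()
```

Because writes only append and deletes only add tombstones, a 'right to be forgotten' request is a single predicate over the log, and the audit trail is the log itself.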

Real-world use cases

Where teams reach for Neruva memory.

Multi-user customer-support agents

Scenario: A SaaS support assistant serves 50,000 customers. Each customer needs an isolated memory: past tickets, preferences, sentiment. Pinecone bills per namespace and per pod.

With Neruva: Open a Neruva namespace per customer. Marginal cost rounds to zero. Cosine-recall the last 20 tickets in under a millisecond per turn.

LLM-agent personal-memory layer

Scenario: A consumer chat product where the agent remembers user preferences across months. Writes are constant, reads are bursty, recall has to feel instant.

With Neruva: Hot-path-safe append-only writes. Sub-ms recall. Time-decay filters first-class -- no metadata kludge.

Compliance-grade agent audit trail

Scenario: A financial-services chatbot must prove which memory drove which response, retain records for 7 years, and support 'right to be forgotten' requests.

With Neruva: Every memory is an inspectable row with timestamp + content hash. Surgical delete by id, by user, or by predicate. Export as .nmm to cold storage.

Write-heavy ingest pipelines

Scenario: A real-time observability or log-analysis pipeline where thousands of events per minute become memories. HNSW indexes choke; tail latency explodes.

With Neruva: Append-only WAL with async indexing. Writes don't block on the index. Throughput stays flat as the corpus grows.

On-prem / edge agent deployments

Scenario: An agent that runs in a customer datacenter or on a developer's laptop. Pinecone-style managed indexes are off the table.

With Neruva: Export the index as .nmm and ship it. The file is the deployment. Read it offline with the same client.

Migration from a legacy vector DB

Scenario: The team is on Pinecone or Weaviate, costs are growing, but the codebase has Pinecone-shaped calls everywhere.

With Neruva: Change one import. The drop-in client speaks the same dialect: upsert, query, delete, fetch, update, describe_index_stats.

Cost savings

Per operation. Not per pod, per index, per RU.

Wallet model: top up via PayPal, ops deduct as you use them. No subscription. No minimum. No overage trap.

Workload | Managed peer (estimated) | Neruva | You keep
10,000 user namespaces, idle | ~$700/mo (per-namespace fees) | $0/mo | ~$8,400/yr
1M upserts / month | ~$50/mo (RU + bandwidth) | ~$1/mo | ~$588/yr
10M queries / month | ~$100/mo | ~$10/mo | ~$1,080/yr
Re-embed corpus (vendor migration) | $$$ to re-embed + rebuild index | Re-pack bits in place | Engineer-weeks

Peer pricing is modeled on public list rates for serverless Pinecone indexes at small-to-medium scale. Your real bill will vary by region, metric, and embedding dimension.
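The "You keep" column is nothing more exotic than the monthly delta annualized, which you can check yourself (peer figures are the estimates from the table, not measured bills):

```python
# (estimated peer $/mo, Neruva $/mo) per workload row
rows = {
    "10,000 idle namespaces": (700, 0),
    "1M upserts / month":     (50, 1),
    "10M queries / month":    (100, 10),
}
for name, (peer, neruva) in rows.items():
    # annualize the monthly difference
    print(f"{name}: keep ~${(peer - neruva) * 12:,}/yr")
```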

How it feels in production

Designed for the workload you actually run.

Writes
Hot-path safe.
Agents write memories inline without blocking. The index catches up in the background.
Tenancy
Namespace per agent, per user, per anything.
Spin up a million namespaces. We don't charge for them individually.
Filters
The operators you expect.
$eq, $ne, $in, $nin, $gt, $gte, $lt, $lte.
Recency
Time-aware retrieval, first-class.
Decay weights and time-window filters built into the index, not bolted on with metadata.
Compliance
Surgical forget. Full audit.
Delete by id, by user, by predicate. Every memory is inspectable and removable.
Billing
Wallet model. No surprises.
Top up via PayPal. Operations deduct in real time. No subscription, no minimum, no overage trap.
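The filter operators and time-decay above follow the familiar Mongo-style filter document: top-level fields are ANDed, a bare value is shorthand for $eq. A minimal sketch of those semantics, assuming that dialect (our own helper names, not the client API):

```python
OPS = {"$eq": lambda v, a: v == a, "$ne": lambda v, a: v != a,
       "$in": lambda v, a: v in a, "$nin": lambda v, a: v not in a,
       "$gt": lambda v, a: v > a,  "$gte": lambda v, a: v >= a,
       "$lt": lambda v, a: v < a,  "$lte": lambda v, a: v <= a}

def matches(meta, flt):
    """True if a metadata dict satisfies a Mongo-style filter document."""
    for field, cond in flt.items():
        if not isinstance(cond, dict):
            cond = {"$eq": cond}           # bare value is shorthand for $eq
        for op, arg in cond.items():
            if field not in meta or not OPS[op](meta[field], arg):
                return False
    return True

def decayed(score, ts, now, half_life=86400.0):
    """Blend similarity with recency: halve the weight every half_life seconds."""
    return score * 0.5 ** ((now - ts) / half_life)
```

For example, `matches(meta, {"user": "alice", "priority": {"$gte": 2}})` keeps only Alice's rows at priority 2 or above, and `decayed` reweights a cosine score so that a day-old memory counts half as much as a fresh one.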
Old vs new agent memory

Built from the ground up for agent loops.

Managed vector DBs were designed for retrieval-augmented chat in 2022. Agent workloads write more, isolate harder, and need audit trails that don't leak across tenants.

Concern | Old managed vector DB | Neruva memory
Tenancy unit | Index (heavy, billed) | Namespace (free, lightweight)
Per-tenant cost | Per-pod or per-RU minimum | $0 marginal; you pay per op
Write latency | Async indexing tail; HNSW rebuilds | Sub-ms append; index catches up
Hot-path cosine | Float-only ANN; 32-bit lanes | 1-bit popcount prefilter; 32x smaller
Portability | Vendor-locked storage format | .nmm file you export and own
Surgical forget | Best-effort; tombstone scattering | First-class by id, predicate, or user
Drop-in API | Pinecone-shaped (if you're already there) | Pinecone-shaped; same client, one-import swap
Substrate add-on | None -- you call an LLM to reason | Same key opens /v1/hd/* reasoning surface

Stop renting search infra.
Start owning agent memory.

$5 in credits on signup. No card. No subscription. No demo call. Wire it in and decide.