Measured • Reproducible • Bounded claims

Proof, not positioning.

This page publishes the current in-repo WorkMemory regression package: the benchmark artifact path, current aggregate score, recall latency, benchmark subscores, comparison artifacts, and the exact local command used to reproduce it.

-- Aggregate score
-- Recall p50
-- Recall p95
-- Total queries

Benchmark breakdown

Waiting for proof payload…
-- Long-term retention

Ability to retrieve older facts after filler turns and topic drift.

-- Cross-session recall

Ability to preserve facts across separate sessions and later queries.

-- Fact extraction

Accuracy of turning raw inputs into durable, searchable memories.

-- Temporal reasoning

Point-in-time and event-time correctness under explicit temporal questions.

-- Knowledge updates

Correct handling of supersession and newer facts overriding older ones.

-- Abstention

Graceful refusal when no matching memory should be returned.
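
The temporal-reasoning, knowledge-update, and abstention categories above can be illustrated with a toy in-memory model. This is a minimal sketch, not WorkMemory's implementation: the `Fact` type, the `recall_as_of` helper, and the integer timestamps are all hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    key: str         # what the fact is about (hypothetical field)
    value: str       # the remembered value
    valid_from: int  # event time as a plain integer, for illustration only


def recall_as_of(facts: list[Fact], key: str, as_of: int) -> Optional[str]:
    """Return the value in effect for `key` at time `as_of`, or None to abstain."""
    # Temporal reasoning: only facts already in effect at `as_of` are eligible.
    candidates = [f for f in facts if f.key == key and f.valid_from <= as_of]
    if not candidates:
        # Abstention: nothing matches, so return nothing rather than guess.
        return None
    # Knowledge update: the most recent eligible fact supersedes older ones.
    return max(candidates, key=lambda f: f.valid_from).value


facts = [
    Fact("home_city", "Lisbon", valid_from=100),
    Fact("home_city", "Berlin", valid_from=200),
]
```

With these two facts, a query at time 150 resolves to the older value, a query at time 250 resolves to the superseding value, and a query before time 100 or for an unknown key returns `None`.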

Latency tables

Derived from the committed primary regression artifact.
Query class | Count | P50 | P95 | P99 | Range
Waiting for proof payload…
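
For readers checking the P50, P95, and P99 columns against raw samples, here is one way such a summary could be computed. It is a sketch that assumes the nearest-rank percentile definition; the page does not state which method the regression harness uses, and `nearest_rank` and `summarize` are hypothetical helper names.

```python
import math


def nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples):
    """Build one latency-table row for a list of per-query latencies."""
    return {
        "count": len(samples),
        "p50": nearest_rank(samples, 50),
        "p95": nearest_rank(samples, 95),
        "p99": nearest_rank(samples, 99),
        "range": (min(samples), max(samples)),
    }


# Made-up latencies in milliseconds, used only to exercise the helpers.
example_ms = [12.0, 15.0, 11.0, 40.0, 13.0, 14.0, 90.0, 16.0, 17.0, 18.0]
row = summarize(example_ms)
```

With only ten samples the P95 and P99 both land on the slowest query, which is why the Count column matters when reading tail latencies.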

Broader surface latencies

Repo-local proof-pack artifact not loaded yet.
Surface | Case | Route | Coverage | Count | P50 | P95 | P99 | Outcome
Waiting for proof payload…

SDK parity checks

Canonical HTTP to SDK parity checks not loaded yet.
Surface | HTTP | SDK | Parity
Waiting for proof payload…

Usage reference

Current canonical metering formulas and settled usage signals.
Surface | Route | Signal | Formula | Notes
Waiting for proof payload…

Optional LLM metering

Representative query-expansion and reranking estimate rows not loaded yet.
Surface | Workload | Model | Prompt | Completion | Reasoning | Run estimate | Metered estimate | Notes
Waiting for proof payload…

Reference matrix

Secondary comparison artifacts on the same harness.
Loading comparison artifacts…

Waiting for secondary benchmark runs from the proof payload.

Official-dataset publication

Official-dataset publication not loaded yet.
-- Questions attempted
-- Quick evaluator
-- Reasoning judge
-- Total estimate
-- Total / question
Run details
  • Waiting for proof payload…
Question-type accuracy
Question type | Count | Accuracy
Waiting for proof payload…

Current developer surfaces

Shipped today, not roadmap fiction
Canonical REST

`/memory/v1/remember`, `/recall`, `/forget`, `/profile`, sessions, and staged uploads are the primary contract.
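
A small sketch of how a client might resolve these routes. It assumes that the shorter names (`/recall`, `/forget`, `/profile`) live under the same `/memory/v1` prefix as `/remember`, which the text implies but does not spell out. The base URL and the `route_url` helper are hypothetical, and request bodies are omitted because the payload schema is not given here.

```python
BASE_URL = "http://localhost:8000"  # hypothetical host, replace with your deployment
NAMESPACE = "/memory/v1"            # prefix taken from `/memory/v1/remember`
ROUTES = ("remember", "recall", "forget", "profile")


def route_url(name, base_url=BASE_URL):
    """Return the full URL for a named route, assuming a shared namespace prefix."""
    if name not in ROUTES:
        raise ValueError(f"unknown route: {name!r}")
    return f"{base_url.rstrip('/')}{NAMESPACE}/{name}"
```

For example, `route_url("recall")` yields `http://localhost:8000/memory/v1/recall` under these assumptions.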

Standalone MCP

`workmemory-mcp --stdio` is the installable bridge entrypoint, and `python -m src.mcp --stdio` remains the repo-local server path. HTTP MCP can run with tenant-header mode or explicit API-key enforcement.
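
As a rough illustration of what talking to a stdio MCP server involves: the MCP stdio transport exchanges newline-delimited JSON-RPC 2.0 messages, and a session starts with an `initialize` request. The sketch below only builds that first message. The protocol version string, the client name, and the assumption that this particular server accepts them are illustrative, not confirmed by this page.

```python
import json


def initialize_line(request_id=1):
    """Build a newline-terminated JSON-RPC 2.0 `initialize` request for an MCP stdio server."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed version, adjust to what the server supports
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical client
        },
    }
    # json.dumps emits no raw newlines by default, so one message is exactly one line.
    return json.dumps(message) + "\n"


# Usage sketch (not executed here): spawn the bridge and send the first message.
#   import subprocess
#   proc = subprocess.Popen(["workmemory-mcp", "--stdio"],
#                           stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
#   proc.stdin.write(initialize_line()); proc.stdin.flush()
#   print(proc.stdout.readline())
```

In practice an MCP client library would handle this handshake; the raw form is shown only to make the transport concrete.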

SDK path

TypeScript, Python, OpenAI-compatible, LangChain, and Vercel wrappers all target the same canonical shared-host namespace.

Reproduce locally

Same command used to generate the committed artifact
scripts/evals/run-workmemory-regression.sh
Artifact path
  • Waiting for proof payload…
Claim boundaries
  • Loading proof boundaries…