Start with install notes or jump straight into the API.

API

Query & Fetch

Query request

POST /v2/namespaces/products/query
Content-Type: application/json

{
  "vector": [0.0012, -0.043],
  "top_k": 10,
  "filters": ["category", "Eq", "Electronics"],
  "include_attributes": ["title", "category"]
}
{
  "results": [
    {"id": "asin-B08N5WRWNW", "dist": 0.42, "attributes": {"title": "..."}}
  ],
  "stable_as_of": 1715600400000
}

Strong-consistent reads

Turbopuffer indexes upserts asynchronously, so a naive query right after an upsert can return partial results or 429 entirely under streaming-write pressure. Layer sidesteps both:

  1. Queries run at consistency=eventual upstream, so they never block on indexing.
  2. A background loop polls each registered namespace’s index.status and records the latest status plus, when stable, a watermark equal to poll_start - safety_margin.
  3. Per-query decision:
    • Updating → inject a hidden _hevlayer_upserted_at <= watermark predicate so the read never sees partially-indexed rows.
    • Stable or Unknown → run without the predicate. The upstream index is caught up (or no contrary evidence exists).
  4. On a 429 to an unfiltered query, Layer retries once with the watermark filter forced on.

Responses always report stable_as_of (epoch ms) — the most recent watermark the watcher has recorded. Omitted on a cold-start gateway that has not yet observed a stable poll.

Filter shape

["category", "Eq", "Electronics"]                # leaf
["And", [["category", "Eq", "Electronics"],
         ["price", "Lte", 200]]]                 # conjunction
["Or",  [...]]                                   # disjunction

Filter shape follows Turbopuffer array syntax. Layer combines the caller’s filter with the watermark predicate using a 2-element And automatically — callers never see _hevlayer_upserted_at in their request or response.

Tunables

VariableDefaultPurpose
CONSISTENCY_POLL_INTERVAL_MS1000How often the watcher polls each namespace.
CONSISTENCY_SAFETY_MARGIN_MS500Cushion between poll time and watermark to cover in-flight upserts.

Explain query

POST /v2/namespaces/products/explain_query

explain_query is proxied to Turbopuffer verbatim — Layer adds nothing and applies no watermark filter. Use it to inspect upstream query planning; see the upstream docs for the request and response shape.

Fetch

Fetch is a Layer-only surface — there is no upstream equivalent. The NVMe cache is checked first; on miss or error the gateway falls through to Turbopuffer and backfills the cache best-effort.

Single fetch

GET /v2/namespaces/products/documents/asin-B08N5WRWNW?include_attributes=title,category
OutcomeStatusHeader
Cached hit200x-layer-cache: hit
Cache miss, upstream hit, cache backfilled200x-layer-cache: miss
Cache unavailable, upstream hit200x-layer-cache: miss-on-error
Missing from both layers404

Batch fetch

POST /v2/namespaces/products/documents
Content-Type: application/json

{
  "ids": ["asin-1", "asin-2", "asin-3"],
  "include_attributes": ["title"]
}
{
  "documents": [
    {"id": "asin-1", "attributes": {"title": "..."}},
    {"id": "asin-3", "attributes": {"title": "..."}}
  ],
  "missing": ["asin-2"]
}

Batch fetch returns found documents and missing ids inline instead of a partial 404. documents preserves request order; ids the gateway could not find anywhere land in missing.

Behavior matrix

Cache stateSingle fetchBatch fetch
Hitcachecache
Miss, upstream presentupstream + backfillupstream + backfill
Miss, upstream absent404inline missing
Cache unavailableupstream, miss-on-errorupstream, miss-on-error
esc