API

Query & Fetch

Query request

POST /v2/namespaces/products/query
Content-Type: application/json

{
  "vector": [0.0012, -0.043],
  "top_k": 10,
  "filters": ["category", "Eq", "Electronics"],
  "include_attributes": ["title", "category"]
}

{
  "results": [
    {"id": "asin-B08N5WRWNW", "dist": 0.42, "attributes": {"title": "..."}}
  ],
  "stable_as_of": 1715600400000
}

Strong-consistent reads

Turbopuffer indexes upserts asynchronously, so a naive query right after an upsert can return partial results or 429 entirely under streaming-write pressure. Layer sidesteps both:

Queries run at consistency=eventual upstream, so they never block on indexing.
A background loop polls each registered namespace’s index.status and records the latest status plus, when stable, a watermark equal to poll_start - safety_margin.
Per-query decision:
- Updating → inject a hidden _hevlayer_upserted_at <= watermark predicate so the read never sees partially-indexed rows.
- Stable or Unknown → run without the predicate. The upstream index is caught up (or no contrary evidence exists).
On a 429 to an unfiltered query, Layer retries once with the watermark filter forced on.

Responses always report stable_as_of (epoch ms) — the most recent watermark the watcher has recorded. Omitted on a cold-start gateway that has not yet observed a stable poll.

Filter shape

["category", "Eq", "Electronics"]                # leaf
["And", [["category", "Eq", "Electronics"],
         ["price", "Lte", 200]]]                 # conjunction
["Or",  [...]]                                   # disjunction

Filter shape follows Turbopuffer array syntax. Layer combines the caller’s filter with the watermark predicate using a 2-element And automatically — callers never see _hevlayer_upserted_at in their request or response.

Tunables

Variable	Default	Purpose
`CONSISTENCY_POLL_INTERVAL_MS`	1000	How often the watcher polls each namespace.
`CONSISTENCY_SAFETY_MARGIN_MS`	500	Cushion between poll time and watermark to cover in-flight upserts.

Explain query

POST /v2/namespaces/products/explain_query

explain_query is proxied to Turbopuffer verbatim — Layer adds nothing and applies no watermark filter. Use it to inspect upstream query planning; see the upstream docs for the request and response shape.

Fetch

Fetch is a Layer-only surface — there is no upstream equivalent. The NVMe cache is checked first; on miss or error the gateway falls through to Turbopuffer and backfills the cache best-effort.

Single fetch

GET /v2/namespaces/products/documents/asin-B08N5WRWNW?include_attributes=title,category

Outcome	Status	Header
Cached hit	200	`x-layer-cache: hit`
Cache miss, upstream hit, cache backfilled	200	`x-layer-cache: miss`
Cache unavailable, upstream hit	200	`x-layer-cache: miss-on-error`
Missing from both layers	404	—

Batch fetch

POST /v2/namespaces/products/documents
Content-Type: application/json

{
  "ids": ["asin-1", "asin-2", "asin-3"],
  "include_attributes": ["title"]
}

{
  "documents": [
    {"id": "asin-1", "attributes": {"title": "..."}},
    {"id": "asin-3", "attributes": {"title": "..."}}
  ],
  "missing": ["asin-2"]
}

Batch fetch returns found documents and missing ids inline instead of a partial 404. documents preserves request order; ids the gateway could not find anywhere land in missing.

Behavior matrix

Cache state	Single fetch	Batch fetch
Hit	cache	cache
Miss, upstream present	upstream + backfill	upstream + backfill
Miss, upstream absent	404	inline `missing`
Cache unavailable	upstream, `miss-on-error`	upstream, `miss-on-error`