# hev layer — full docs

> Concatenated docs surface. Index at https://hevlayer.com/llms.txt.

---

## Search knowledge graph

Source: https://hevlayer.com/docs/search-knowledge-graph
Version: 2
Generated: 2026-06-05T21:31:25.527Z
Content hash: 4260e62c2a0a88b601f4e8e3c0afc368168b80f47c95ecb6610662420ede6458

Context:
## Layer (hev layer)

Layer is a **gateway and function runtime for retrieval systems**: a Rust proxy (the *gateway*) that fronts **Turbopuffer**, plus a Kubernetes *operator*, both running in your own cluster. The gateway is wire-compatible with the Turbopuffer client API — existing clients keep working when pointed at it — and Layer documents only what it *adds* on top of upstream routes, exposing Layer-only features under `/v2/`.
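
Because the gateway keeps the upstream wire contract, pointing a client at it mostly comes down to changing the base URL. The sketch below only builds URLs for routes named in this document's overview (`/v2/namespaces/{ns}/query`, `/v2/namespaces/{ns}/metadata`, `/v1/namespaces/{ns}/hint_cache_warm`). The host name, the `namespace_route` helper, and the `products` namespace are assumptions made for illustration, not part of Layer.

```python
from urllib.parse import quote, urljoin

# Hypothetical in-cluster address; substitute your own gateway URL.
GATEWAY_BASE = "http://layer-gateway.example.internal/"


def namespace_route(namespace: str, suffix: str, version: str = "v2") -> str:
    """Build a namespace-scoped gateway URL such as /v2/namespaces/{ns}/query."""
    # Percent-encode the namespace so unusual names cannot alter the path.
    path = f"{version}/namespaces/{quote(namespace, safe='')}/{suffix}"
    return urljoin(GATEWAY_BASE, path)


# Routes named in the docs overview; "products" is a made-up namespace.
query_url = namespace_route("products", "query")        # POST
metadata_url = namespace_route("products", "metadata")  # GET
warm_url = namespace_route("products", "hint_cache_warm", version="v1")  # GET

print(query_url)
```

Request and response bodies are deliberately left out: the docs defer to upstream for those shapes.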

### Core building blocks
- **Gateway** — transparent Turbopuffer proxy adding fetch, scans, result count, facet snapshots, a pull-through document cache, write-path stamping, query consistency, query/clickstream history, warm jobs, pipelines, and a UDF runtime.
- **Operator** — reconciles four CRDs (`Index`, `InfraRules`, `Pipeline`, `Function`). Decoupled from the gateway, which only ever *reads* CRD status.
- **Backing services** (all open source): **Aerospike** (NVMe document cache, ephemeral), **PostgreSQL** (pipeline/indexing-state queue only), **VictoriaMetrics** (metrics), **Karpenter** (node autoscaling), **KEDA** (pod autoscaling to zero). Durable state lives only in **S3** — Layer processes are stateless and elastic.

### Key concepts users ask about
- **Stable watermark / strong-consistent reads** — a background watcher records an epoch-ms watermark when a namespace's Turbopuffer index status is up-to-date; while the index is updating, queries are filtered to fully indexed rows so reads never see partial writes. Surfaced via `stableasof`/`isstable`.
- **Reserved `hevlayer` attributes** — server-stamped write watermark and shard key; users must not write them.
- **Pull-through cache** — Aerospike checked first; misses fall through to Turbopuffer/S3 and backfill. Cache failures are soft (never block reads); upstream failures are hard. Hit/miss reported per response.
- **Snapshots & facets** — content-addressed S3 facet histograms written when a namespace is stable.
- **Scans & result count** — filter-shaped questions: scans return matching IDs or a row count; result count reports how many rows match a ranked FTS/vector query.
- **Pipelines vs UDFs** — pipelines stage CPU-extracted chunks and GPU-embed them (row count changes); UDFs run a stateless function per row to compute a derived attribute (row count preserved). Both scale via KEDA off queue depth, pinned to compute pools in `InfraRules`.
- **Dashboard** — read-mostly operator GUI reading the same gateway API.
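
The pull-through behavior in the list above (cache first, fall through on a miss, backfill, soft cache failures, hard upstream failures, hit/miss reported per response) can be sketched as a single read function. This is a minimal in-memory illustration, not the gateway's implementation: `pull_through_read`, `ReadResult`, `UpstreamError`, and the callable parameters are hypothetical names, and plain dicts stand in for Aerospike and the origin.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class UpstreamError(Exception):
    """Raised when the origin read fails; this is the hard failure."""


@dataclass
class ReadResult:
    document: dict
    cache_hit: bool  # reported back with every response


def pull_through_read(
    doc_id: str,
    cache_get: Callable[[str], Optional[dict]],
    cache_put: Callable[[str, dict], None],
    origin_get: Callable[[str], dict],
) -> ReadResult:
    # 1. Check the cache first. A cache error is soft: treat it as a miss.
    try:
        cached = cache_get(doc_id)
    except Exception:
        cached = None
    if cached is not None:
        return ReadResult(cached, cache_hit=True)

    # 2. Miss: fall through to origin. Failures here reach the caller.
    try:
        document = origin_get(doc_id)
    except Exception as exc:
        raise UpstreamError(f"origin read failed for {doc_id!r}") from exc

    # 3. Backfill the cache; a failed backfill must not fail the read.
    try:
        cache_put(doc_id, document)
    except Exception:
        pass
    return ReadResult(document, cache_hit=False)


# Plain dicts stand in for the cache and the origin store.
origin = {"doc-1": {"id": "doc-1", "title": "hello"}}
cache: dict = {}

first = pull_through_read("doc-1", cache.get, cache.__setitem__, origin.__getitem__)
second = pull_through_read("doc-1", cache.get, cache.__setitem__, origin.__getitem__)
print(first.cache_hit, second.cache_hit)  # False True
```

The two `try` blocks around the cache calls are what make the cache "never a hard dependency": only the origin read is allowed to raise.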

### How users talk about it
Users say "the gateway," "drop-in Turbopuffer client," "warm the cache," "strongly consistent query," "snapshot," "facet counts," "scan a filter," "stage/claim/embed," "UDF/function," "compute pool," and "scale to zero." Install is two-stage: **Terraform** (AWS resources) then **Helm** (gateway/operator/cache).

Glossary:
- Gateway: The Rust transparent proxy in front of Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime. Aliases: layer-gateway, the proxy, rust gateway.
- stable watermark: Epoch-ms cut tracked by the consistency watcher when the upstream index is up-to-date, used to inject a hidden filter for strong-consistent reads. Aliases: watermark, stableasof, consistency watermark.
- pull-through cache: NVMe-backed read accelerator that serves document reads and falls through to origin on miss, never a hard dependency. Aliases: document cache, nvme cache, aerospike.
- UDF: A stateless worker that computes one derived attribute per row of an index, without changing row count. Aliases: user-defined function, function, udfs.
- pipeline: A PostgreSQL-backed staged-work state machine (CPU extract, GPU embed) whose row count can change between input and output. Aliases: pipelines, indexing pipeline.
- operator: The Kubernetes operator that reconciles Layer's CRDs (Index, InfraRules, Pipeline, Function) into worker and scaling resources. Aliases: layer-operator, k8s operator, kubernetes operator.
- CRD: Kubernetes-native resources the operator reconciles to express desired state for indexes, functions, pipelines, and infra rules. Aliases: custom resource definition, index crd, function crd, pipeline crd, infrarules.
- snapshot: A content-addressed S3 facet histogram (listings and counts) written after a namespace is observed stable. Aliases: snapshots, facet snapshot, facet histogram.
- scan: A filter-shaped query that returns matching IDs asynchronously or a matching row count synchronously. Aliases: scans, filter scan.
- ask CLI: Keyless command-line tool that searches, reads, and cites the hev layer docs so a coding agent can answer grounded questions without scraping or an API key. Aliases: ask, hevlayer-docs skill.
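
The stable watermark entry above describes a hidden filter that restricts reads to fully indexed rows while the index is catching up. A minimal sketch of that idea follows, under stated assumptions: the `written_at_ms` field name, the `<=` comparison, and the `NamespaceState` / `visible_rows` names are hypothetical, and in-memory filtering stands in for whatever filter the gateway actually injects into the upstream query.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NamespaceState:
    stable_as_of_ms: int  # epoch-ms recorded when the index was last up-to-date
    is_stable: bool       # True while the upstream index reports up-to-date


def visible_rows(rows: Iterable[dict], state: NamespaceState,
                 stamp_field: str = "written_at_ms") -> List[dict]:
    """Return the rows a strong-consistent read would be allowed to see."""
    if state.is_stable:
        # Index is caught up: no extra filter is needed.
        return list(rows)
    # Index is updating: keep only rows stamped at or before the watermark.
    return [row for row in rows if row[stamp_field] <= state.stable_as_of_ms]


rows = [
    {"id": "a", "written_at_ms": 1_000},
    {"id": "b", "written_at_ms": 2_000},
    {"id": "c", "written_at_ms": 3_000},  # written after the watermark
]
updating = NamespaceState(stable_as_of_ms=2_000, is_stable=False)
print([row["id"] for row in visible_rows(rows, updating)])  # ['a', 'b']
```

Row `c` becomes visible once the watcher observes the index as up-to-date again and advances the watermark.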

Raw JSON:
```json
{
  "version": 2,
  "generatedAt": "2026-06-05T21:31:25.527Z",
  "contentHash": "4260e62c2a0a88b601f4e8e3c0afc368168b80f47c95ecb6610662420ede6458",
  "context": "## Layer (hev layer)\n\nLayer is a **gateway and function runtime for retrieval systems**: a Rust proxy (the *gateway*) that fronts **Turbopuffer**, plus a Kubernetes *operator*, both running in your own cluster. The gateway is wire-compatible with the Turbopuffer client API — existing clients keep working when pointed at it — and Layer documents only what it *adds* on top of upstream routes, exposing Layer-only features under `/v2/`.\n\n### Core building blocks\n- **Gateway** — transparent Turbopuffer proxy adding fetch, scans, result count, facet snapshots, a pull-through document cache, write-path stamping, query consistency, query/clickstream history, warm jobs, pipelines, and a UDF runtime.\n- **Operator** — reconciles four CRDs (`Index`, `InfraRules`, `Pipeline`, `Function`). Decoupled from the gateway, which only ever *reads* CRD status.\n- **Backing services** (all open source): **Aerospike** (NVMe document cache, ephemeral), **PostgreSQL** (pipeline/indexing-state queue only), **VictoriaMetrics** (metrics), **Karpenter** (node autoscaling), **KEDA** (pod autoscaling to zero). Durable state lives only in **S3** — Layer processes are stateless and elastic.\n\n### Key concepts users ask about\n- **Stable watermark / strong-consistent reads** — a background watcher records an epoch-ms watermark when a namespace's Turbopuffer index status is up-to-date; while updating, queries filter to fully-indexed rows so reads never see partial writes. Surfaced via `stableasof`/`isstable`.\n- **Reserved `hevlayer` attributes** — server-stamped write watermark and shard key; users must not write them.\n- **Pull-through cache** — Aerospike checked first; misses fall through to Turbopuffer/S3 and backfill. Cache failures are soft (never block reads); upstream failures are hard. Hit/miss reported per response.\n- **Snapshots & facets** — content-addressed S3 facet histograms written when a namespace is stable.\n- **Scans & result count** — filter-shaped questions: scans return IDs or counts; result count answers ranked FTS/vector match counts.\n- **Pipelines vs UDFs** — pipelines stage CPU-extracted chunks and GPU-embed them (row count changes); UDFs run a stateless function per row to compute a derived attribute (row count preserved). Both scale via KEDA off queue depth, pinned to compute pools in `InfraRules`.\n- **Dashboard** — read-mostly operator GUI reading the same gateway API.\n\n### How users talk about it\nUsers say \"the gateway,\" \"drop-in Turbopuffer client,\" \"warm the cache,\" \"strongly consistent query,\" \"snapshot,\" \"facet counts,\" \"scan a filter,\" \"stage/claim/embed,\" \"UDF/function,\" \"compute pool,\" and \"scale to zero.\" Install is two-stage: **Terraform** (AWS resources) then **Helm** (gateway/operator/cache).",
  "glossary": [
    {
      "term": "Gateway",
      "aliases": [
        "layer-gateway",
        "the proxy",
        "rust gateway"
      ],
      "definition": "The Rust transparent proxy in front of Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime."
    },
    {
      "term": "stable watermark",
      "aliases": [
        "watermark",
        "stableasof",
        "consistency watermark"
      ],
      "definition": "Epoch-ms cut tracked by the consistency watcher when the upstream index is up-to-date, used to inject a hidden filter for strong-consistent reads."
    },
    {
      "term": "pull-through cache",
      "aliases": [
        "document cache",
        "nvme cache",
        "aerospike"
      ],
      "definition": "NVMe-backed read accelerator that serves document reads and falls through to origin on miss, never a hard dependency."
    },
    {
      "term": "UDF",
      "aliases": [
        "user-defined function",
        "function",
        "udfs"
      ],
      "definition": "A stateless worker that computes one derived attribute per row of an index, without changing row count."
    },
    {
      "term": "pipeline",
      "aliases": [
        "pipelines",
        "indexing pipeline"
      ],
      "definition": "A PostgreSQL-backed staged-work state machine (CPU extract, GPU embed) whose row count can change between input and output."
    },
    {
      "term": "operator",
      "aliases": [
        "layer-operator",
        "k8s operator",
        "kubernetes operator"
      ],
      "definition": "The Kubernetes operator that reconciles Layer's CRDs (Index, InfraRules, Pipeline, Function) into worker and scaling resources."
    },
    {
      "term": "CRD",
      "aliases": [
        "custom resource definition",
        "index crd",
        "function crd",
        "pipeline crd",
        "infrarules"
      ],
      "definition": "Kubernetes-native resources the operator reconciles to express desired state for indexes, functions, pipelines, and infra rules."
    },
    {
      "term": "snapshot",
      "aliases": [
        "snapshots",
        "facet snapshot",
        "facet histogram"
      ],
      "definition": "A content-addressed S3 facet histogram (listings and counts) written after a namespace is observed stable."
    },
    {
      "term": "scan",
      "aliases": [
        "scans",
        "filter scan"
      ],
      "definition": "A filter-shaped query that returns matching IDs asynchronously or a matching row count synchronously."
    },
    {
      "term": "ask CLI",
      "aliases": [
        "ask",
        "hevlayer-docs skill"
      ],
      "definition": "Keyless command-line tool that searches, reads, and cites the hev layer docs so a coding agent can answer grounded questions without scraping or an API key."
    }
  ],
  "overview": "## API\n- Introduction — `api/introduction`\n- Cache warm hint — GET /v1/namespaces/{ns}/hint_cache_warm — `api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm`\n- Client fall-through — `api/introduction#client-fall-through`\n- Compatibility posture — `api/introduction#compatibility-posture`\n- Cross-cutting conventions — `api/introduction#cross-cutting-conventions`\n- Enhancements to upstream routes — `api/introduction#enhancements-to-upstream-routes`\n- Install — `api/introduction#install`\n- Metadata — GET /v2/namespaces/{ns}/metadata — `api/introduction#metadata--get-v2namespacesnsmetadata`\n- Query — POST /v2/namespaces/{ns}/query — `api/introduction#query--post-v2namespacesnsquery`\n- Write — POST /v2/namespaces/{ns} and PATCH /v2/namespaces/{ns} — `api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns`\n- Metrics API — `api/metrics`\n- Health — `api/metrics#health`\n- Metrics catalog — `api/metrics#metrics-catalog`\n- PromQL passthrough — `api/metrics#promql-passthrough`\n- Routes — `api/metrics#routes`\n- Namespace metadata — `api/namespace-metadata`\n- List namespaces — `api/namespace-metadata#list-namespaces`\n- Request — `api/namespace-metadata#request`\n- The layer block — `api/namespace-metadata#the-layer-block`\n- Query & Fetch — `api/query`\n- Batch fetch — `api/query#batch-fetch`\n- Behavior matrix — `api/query#behavior-matrix`\n- Explain query — `api/query#explain-query`\n- Fetch — `api/query#fetch`\n- Filter shape — `api/query#filter-shape`\n- Query request — `api/query#query-request`\n- Single fetch — `api/query#single-fetch`\n- Strong-consistent reads — `api/query#strong-consistent-reads`\n- Tunables — `api/query#tunables`\n- Result Count — `api/result-count`\n- Scan — `api/scans`\n- Auto-Mode Policy — `api/scans#auto-mode-policy`\n- Count Mode — `api/scans#count-mode`\n- ID Mode — `api/scans#id-mode`\n- Routes — `api/scans#routes`\n- Query History — `api/search-history`\n- Clickstream entry — `api/search-history#clickstream-entry`\n- Query parameters — `api/search-history#query-parameters`\n- Routes — `api/search-history#routes`\n- Search history entry — `api/search-history#search-history-entry`\n- Storage — `api/search-history#storage`\n- Tag contract — `api/search-history#tag-contract`\n- Writing metadata — `api/search-history#writing-metadata`\n- Snapshot History — `api/snapshots`\n- Activity — `api/snapshots#activity`\n- Configure watched fields — `api/snapshots#configure-watched-fields`\n- Create a snapshot job — `api/snapshots#create-a-snapshot-job`\n- History — `api/snapshots#history`\n- Routes — `api/snapshots#routes`\n- Snapshot body — `api/snapshots#snapshot-body`\n- Warm cache — `api/warm-cache`\n- Cache-cold behavior — `api/warm-cache#cache-cold-behavior`\n- Hint-cache warm — `api/warm-cache#hint-cache-warm`\n- Layer warm — `api/warm-cache#layer-warm`\n- Write & Stage — `api/write`\n- Patch — `api/write#patch`\n- Pipeline stage — `api/write#pipeline-stage`\n- Side effects — `api/write#side-effects`\n- Upsert and delete — `api/write#upsert-and-delete`\n## Guides\n- Dashboard — `dashboard`\n- Console — `dashboard#console`\n- Cost — `dashboard#cost`\n- Data — `dashboard#data`\n- Layout — `dashboard#layout`\n- Observe — `dashboard#observe`\n- Operational notes — `dashboard#operational-notes`\n- Read — `dashboard#read`\n- Write — `dashboard#write`\n- hev-shop — `hev-shop`\n- Reference starter kit — `hev-shop#reference-starter-kit`\n- What hev-shop is — `hev-shop#what-hev-shop-is`\n- Why it matters — `hev-shop#why-it-matters`\n- Pipelines — `pipelines`\n- Autoscaling — `pipelines#autoscaling`\n- Claim, heartbeat, stage — `pipelines#claim-heartbeat-stage`\n- CPU workers — scale on input source — `pipelines#cpu-workers--scale-on-input-source`\n- Create a pipeline — `pipelines#create-a-pipeline`\n- Document lifecycle — `pipelines#document-lifecycle`\n- Failure model — `pipelines#failure-model`\n- Gateway API — `pipelines#gateway-api`\n- Get pipeline status (KEDA polling) — `pipelines#get-pipeline-status-keda-polling`\n- Pipeline CRD — `pipelines#pipeline-crd`\n- Pipeline flow — `pipelines#pipeline-flow`\n- Prerequisites — `pipelines#prerequisites`\n- Read chunks and write vectors (GPU worker) — `pipelines#read-chunks-and-write-vectors-gpu-worker`\n- Stage a document (CPU worker) — `pipelines#stage-a-document-cpu-worker`\n- Scans — `scans`\n- Count scans — `scans#count-scans`\n- Filters — `scans#filters`\n- ID scans — `scans#id-scans`\n- Operational notes — `scans#operational-notes`\n- Sources — `scans#sources`\n- Search Knowledge Graph — `search-knowledge-graph`\n- Current graph — `search-knowledge-graph#current-graph`\n- UDFs — `udfs`\n- Author a worker — `udfs#author-a-worker`\n- Declare the function — `udfs#declare-the-function`\n- Gateway API — `udfs#gateway-api`\n- Lifecycle — `udfs#lifecycle`\n- Lifecycle routes — `udfs#lifecycle-routes`\n- Not in 0.1 — `udfs#not-in-01`\n- Scaling and placement — `udfs#scaling-and-placement`\n- Spec routes — `udfs#spec-routes`\n- Tuning knobs — `udfs#tuning-knobs`\n- Version markers — `udfs#version-markers`\n- Worker coordination routes — `udfs#worker-coordination-routes`\n- Writeback and discovery — `udfs#writeback-and-discovery`\n## Operations\n- Failure Modes — `failure-modes`\n- Read — `failure-modes#read`\n- Write — `failure-modes#write`\n- Install — `install`\n- What ships in 0.1 — `install#what-ships-in-01`\n- Helm Install — `install/helm`\n- Install — `install/helm#install`\n- Required values — `install/helm#required-values`\n- What gets installed — `install/helm#what-gets-installed`\n- Terraform — `install/terraform`\n- Cluster: recommended — `install/terraform#cluster-recommended`\n- Cost notes — `install/terraform#cost-notes`\n- Outputs — `install/terraform#outputs`\n- What it sets up — `install/terraform#what-it-sets-up`\n- Function CRD — `kubernetes/function-crd`\n- Output — `kubernetes/function-crd#output`\n- Scaling — `kubernetes/function-crd#scaling`\n- Selection — `kubernetes/function-crd#selection`\n- Worker — `kubernetes/function-crd#worker`\n- Index CRD — `kubernetes/index-crd`\n- Backend — `kubernetes/index-crd#backend`\n- Cache policy — `kubernetes/index-crd#cache-policy`\n- Snapshot policy — `kubernetes/index-crd#snapshot-policy`\n- Status — `kubernetes/index-crd#status`\n- Operator Overview — `kubernetes/operator`\n- CRDs — `kubernetes/operator#crds`\n- Relationship to the gateway — `kubernetes/operator#relationship-to-the-gateway`\n- Scheduling and node pools — `kubernetes/operator#scheduling-and-node-pools`\n- Pipeline CRD — `kubernetes/pipeline-crd`\n- Scaling — `kubernetes/pipeline-crd#scaling`\n- Source — `kubernetes/pipeline-crd#source`\n- Status — `kubernetes/pipeline-crd#status`\n- Target — `kubernetes/pipeline-crd#target`\n- Worker — `kubernetes/pipeline-crd#worker`\n- InfraRules CRD — `kubernetes/scaling-crd`\n- Compute pools — `kubernetes/scaling-crd#compute-pools`\n- Document cache rules — `kubernetes/scaling-crd#document-cache-rules`\n- InfraRules — `kubernetes/scaling-crd#infrarules`\n- Workload scaling — `kubernetes/scaling-crd#workload-scaling`\n## Overview\n- Agents — `agents`\n- 1. Install the CLI — `agents#1-install-the-cli`\n- 2. Add the skill — `agents#2-add-the-skill`\n- 3. Ask — `agents#3-ask`\n- The verbs — `agents#the-verbs`\n- Why answers stay grounded — `agents#why-answers-stay-grounded`\n- Concepts — `concepts`\n- Control loops — `concepts#control-loops`\n- Gateway enhancements — `concepts#gateway-enhancements`\n- Glossary — `concepts#glossary`\n- Kubernetes autoscaling — `concepts#kubernetes-autoscaling`\n- Observability as code — `concepts#observability-as-code`\n- Pull-through cache — `concepts#pull-through-cache`\n- Scatter/gather — `concepts#scattergather`\n- Document model — `document-model`\n- No Guarantees — `guarantees`\n- Commitments — `guarantees#commitments`\n- Introduction — `index`\n- Limits — `limits`\n- No limits — `limits#no-limits`\n- Roadmap & Changelog — `roadmap`\n- 0.1 Release (UAT) — `roadmap#01-release-uat`\n- API hardening — `roadmap#api-hardening`\n- Later — `roadmap#later`\n- Lifecycle and operability — `roadmap#lifecycle-and-operability`\n- Search — `roadmap#search`\n- Surfaces — `roadmap#surfaces`\n- Up Next — `roadmap#up-next`\n- Tradeoffs — `tradeoffs`",
  "suggestions": [
    "How do I get strongly consistent reads after a write?",
    "What's the difference between a pipeline and a UDF?",
    "What happens when the document cache is down?",
    "How do I install Layer into my cluster?",
    "Can my coding agent query these docs?"
  ],
  "nodes": [
    {
      "id": "agents",
      "kind": "section",
      "title": "Agents",
      "heading": null,
      "group": "Overview",
      "url": "/docs/agents",
      "summary": "Coding agents can query the Layer docs from the command line using the ask CLI, the same search engine behind the site overlay, getting grounded answers with citations and no scraping, MCP server, or API key. Two commands wire it up.",
      "facts": [
        {
          "kind": "code",
          "literal": "⌘K",
          "chunkId": "agents"
        },
        {
          "kind": "value",
          "literal": "Callout.astro",
          "chunkId": "agents"
        }
      ],
      "sources": [
        {
          "chunkId": "agents",
          "url": "/docs/agents",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "coding",
        "agents",
        "query",
        "layer",
        "docs",
        "command",
        "line",
        "same",
        "search",
        "engine",
        "behind",
        "site",
        "overlay",
        "getting",
        "grounded",
        "answers",
        "citations",
        "scraping",
        "server",
        "commands",
        "wire",
        "callout",
        "astro",
        "agent",
        "install",
        "file",
        "skill",
        "these",
        "queryable",
        "ships",
        "read",
        "cite",
        "directly"
      ]
    },
    {
      "id": "agents#1-install-the-cli",
      "kind": "section",
      "title": "Agents",
      "heading": "1. Install the CLI",
      "group": "Overview",
      "url": "/docs/agents#1-install-the-cli",
      "summary": "Install the self-contained ask CLI binary via go install; any agent harness that can run a shell command can then use it.",
      "facts": [
        {
          "kind": "code",
          "literal": "go install github.com/hev/ask/cmd/ask@latest",
          "chunkId": "agents#1-install-the-cli"
        }
      ],
      "sources": [
        {
          "chunkId": "agents#1-install-the-cli",
          "url": "/docs/agents#1-install-the-cli",
          "anchor": "1-install-the-cli"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "install",
        "self",
        "contained",
        "binary",
        "agent",
        "harness",
        "shell",
        "command",
        "github",
        "latest"
      ]
    },
    {
      "id": "agents#2-add-the-skill",
      "kind": "section",
      "title": "Agents",
      "heading": "2. Add the skill",
      "group": "Overview",
      "url": "/docs/agents#2-add-the-skill",
      "summary": "Add a one-file skill so an agent answers Layer questions from the docs rather than memory: for Claude Code drop a SKILL.md that points the keyless ask verbs at the public endpoint, and for other harnesses paste the same instructions into AGENTS.md. Section ids look like api/query#strong-consistent-reads and answers should cite the returned url.",
      "facts": [
        {
          "kind": "code",
          "literal": "AGENTS.md",
          "chunkId": "agents#2-add-the-skill"
        }
      ],
      "sources": [
        {
          "chunkId": "agents#2-add-the-skill",
          "url": "/docs/agents#2-add-the-skill",
          "anchor": "2-add-the-skill"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "skill",
        "file",
        "agent",
        "answers",
        "layer",
        "questions",
        "docs",
        "rather",
        "memory",
        "claude",
        "code",
        "drop",
        "points",
        "keyless",
        "verbs",
        "public",
        "endpoint",
        "other",
        "harnesses",
        "paste",
        "same",
        "instructions",
        "agents",
        "section",
        "look",
        "like",
        "query",
        "strong",
        "consistent",
        "reads",
        "should",
        "cite",
        "returned",
        "once",
        "mkdir",
        "skills",
        "hevlayer",
        "name",
        "description",
        "user"
      ]
    },
    {
      "id": "agents#3-ask",
      "kind": "section",
      "title": "Agents",
      "heading": "3. Ask",
      "group": "Overview",
      "url": "/docs/agents#3-ask",
      "summary": "Running the search verb against the endpoint returns ranked sections with titles, headings, deep-link URLs, and snippets; the agent typically then fetches the winning section and answers with its citation.",
      "facts": [
        {
          "kind": "code",
          "literal": "ask --endpoint https://hevlayer.com/api/ask search \"cache is down\"",
          "chunkId": "agents#3-ask"
        },
        {
          "kind": "code",
          "literal": "{\n  \"results\": [\n    {\n      \"title\": \"Concepts\",\n      \"heading\": \"Pull-through cache\",\n      \"url\": \"/docs/concepts#pull-through-cache\",\n      \"group\": \"Overview\",\n      \"snippet\": \"Document reads are served by a pull-through cache: the gateway checks...\"\n    }\n  ]\n}",
          "chunkId": "agents#3-ask"
        },
        {
          "kind": "code",
          "literal": "section get",
          "chunkId": "agents#3-ask"
        }
      ],
      "sources": [
        {
          "chunkId": "agents#3-ask",
          "url": "/docs/agents#3-ask",
          "anchor": "3-ask"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "running",
        "search",
        "verb",
        "against",
        "endpoint",
        "returns",
        "ranked",
        "sections",
        "titles",
        "headings",
        "deep",
        "link",
        "urls",
        "snippets",
        "agent",
        "typically",
        "fetches",
        "winning",
        "section",
        "answers",
        "citation",
        "https",
        "hevlayer",
        "cache",
        "down",
        "results",
        "title",
        "concepts",
        "heading",
        "pull",
        "through",
        "docs",
        "group",
        "overview",
        "snippet",
        "document",
        "reads",
        "served",
        "gateway",
        "checks"
      ]
    },
    {
      "id": "agents#the-verbs",
      "kind": "section",
      "title": "Agents",
      "heading": "The verbs",
      "group": "Overview",
      "url": "/docs/agents#the-verbs",
      "summary": "The CLI exposes four read verbs: an orientation/section-map overview, a ranked search with snippets and deep links, a single-section detail fetch, and a glossary lookup that resolves a product term through its aliases.",
      "facts": [
        {
          "kind": "code",
          "literal": "overview",
          "chunkId": "agents#the-verbs"
        },
        {
          "kind": "code",
          "literal": "search \"<query>\"",
          "chunkId": "agents#the-verbs"
        },
        {
          "kind": "code",
          "literal": "section get \"<id>\"",
          "chunkId": "agents#the-verbs"
        },
        {
          "kind": "code",
          "literal": "glossary get \"<term>\"",
          "chunkId": "agents#the-verbs"
        },
        {
          "kind": "code",
          "literal": "watermark",
          "chunkId": "agents#the-verbs"
        }
      ],
      "sources": [
        {
          "chunkId": "agents#the-verbs",
          "url": "/docs/agents#the-verbs",
          "anchor": "the-verbs"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "verbs",
        "exposes",
        "four",
        "read",
        "orientation",
        "section",
        "overview",
        "ranked",
        "search",
        "snippets",
        "deep",
        "links",
        "single",
        "detail",
        "fetch",
        "glossary",
        "lookup",
        "resolves",
        "product",
        "term",
        "through",
        "aliases",
        "query",
        "watermark",
        "verb",
        "returns",
        "context",
        "plus",
        "full",
        "stable",
        "sections",
        "summary",
        "exact",
        "identifiers",
        "source",
        "resolved"
      ]
    },
    {
      "id": "agents#why-answers-stay-grounded",
      "kind": "section",
      "title": "Agents",
      "heading": "Why answers stay grounded",
      "group": "Overview",
      "url": "/docs/agents#why-answers-stay-grounded",
      "summary": "Search runs over a committed, reviewable digest of the docs whose anchors are CI-verified against rendered pages so cited deep links always resolve, and the digest is rebuilt when docs change. Every verb is a keyless read; the docs are also available as plain-text llms files, though the CLI is the cheaper, better path for agents that can run commands.",
      "facts": [
        {
          "kind": "value",
          "literal": "llms.txt",
          "chunkId": "agents#why-answers-stay-grounded"
        },
        {
          "kind": "value",
          "literal": "llms-full.txt",
          "chunkId": "agents#why-answers-stay-grounded"
        }
      ],
      "sources": [
        {
          "chunkId": "agents#why-answers-stay-grounded",
          "url": "/docs/agents#why-answers-stay-grounded",
          "anchor": "why-answers-stay-grounded"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "answers",
        "stay",
        "grounded",
        "search",
        "runs",
        "committed",
        "reviewable",
        "digest",
        "docs",
        "whose",
        "anchors",
        "verified",
        "against",
        "rendered",
        "pages",
        "cited",
        "deep",
        "links",
        "always",
        "resolve",
        "rebuilt",
        "change",
        "every",
        "verb",
        "keyless",
        "read",
        "also",
        "available",
        "plain",
        "text",
        "llms",
        "files",
        "though",
        "cheaper",
        "better",
        "path",
        "agents",
        "commands",
        "full",
        "these"
      ]
    },
    {
      "id": "api/introduction",
      "kind": "section",
      "title": "Introduction",
      "heading": null,
      "group": "API",
      "url": "/docs/api/introduction",
      "summary": "Layer matches the Turbopuffer wire contract so existing clients keep working when pointed at the gateway, and the docs describe only what Layer adds on top of each route, linking out to upstream for the underlying request/response shapes.",
      "facts": [
        {
          "kind": "value",
          "literal": "Upstream.astro",
          "chunkId": "api/introduction"
        }
      ],
      "sources": [
        {
          "chunkId": "api/introduction",
          "url": "/docs/api/introduction",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "layer",
        "matches",
        "turbopuffer",
        "wire",
        "contract",
        "existing",
        "clients",
        "keep",
        "working",
        "pointed",
        "gateway",
        "docs",
        "describe",
        "only",
        "adds",
        "route",
        "linking",
        "upstream",
        "underlying",
        "request",
        "response",
        "shapes",
        "astro",
        "point",
        "client",
        "equivalent",
        "site",
        "documents",
        "behavior",
        "itself",
        "follow",
        "link",
        "page",
        "shape"
      ]
    },
    {
      "id": "api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm",
      "kind": "section",
      "title": "Introduction",
      "heading": "Cache warm hint — GET /v1/namespaces/{ns}/hint_cache_warm",
      "group": "API",
      "url": "/docs/api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm",
      "summary": "The cache warm hint route forwards the hint upstream and then runs Layer-side warm steps: a warm job to backfill the NVMe cache from origin and a mirror of the latest snapshot body into NVMe, with each step independently toggleable per request.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v1/namespaces/{ns}/hint_cache_warm",
          "chunkId": "api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm"
        },
        {
          "kind": "value",
          "literal": "turbopuffer.com",
          "chunkId": "api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm"
        }
      ],
      "sources": [
        {
          "chunkId": "api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm",
          "url": "/docs/api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm",
          "anchor": "cache-warm-hint--get-v1namespacesnshint_cache_warm"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "cache",
        "warm",
        "hint",
        "namespaces",
        "route",
        "forwards",
        "upstream",
        "runs",
        "layer",
        "side",
        "steps",
        "backfill",
        "nvme",
        "origin",
        "mirror",
        "latest",
        "snapshot",
        "body",
        "step",
        "independently",
        "toggleable",
        "request",
        "turbopuffer",
        "hintcachewarm",
        "contract",
        "plus",
        "page"
      ]
    },
    {
      "id": "api/introduction#client-fall-through",
      "kind": "section",
      "title": "Introduction",
      "heading": "Client fall-through",
      "group": "API",
      "url": "/docs/api/introduction#client-fall-through",
      "summary": "The Python SDK can fall through to Turbopuffer directly when the gateway is unreachable, but only for calls satisfiable without Layer state such as simple vector queries and raw Turbopuffer-compatible methods; Layer-only workflows like fetches, warm jobs, pipelines, UDFs, and search-by-id still fail fast because they depend on gateway-owned state. The fallback emits a warning, can be disabled, and is reported in the perf object.",
      "facts": [
        {
          "kind": "code",
          "literal": "write_namespace",
          "chunkId": "api/introduction#client-fall-through"
        },
        {
          "kind": "code",
          "literal": "query_turbopuffer_namespace",
          "chunkId": "api/introduction#client-fall-through"
        },
        {
          "kind": "code",
          "literal": "LayerPerf.fallback",
          "chunkId": "api/introduction#client-fall-through"
        },
        {
          "kind": "code",
          "literal": "turbopuffer_direct",
          "chunkId": "api/introduction#client-fall-through"
        },
        {
          "kind": "code",
          "literal": "with_perf=True",
          "chunkId": "api/introduction#client-fall-through"
        },
        {
          "kind": "code",
          "literal": "nearest_to_id",
          "chunkId": "api/introduction#client-fall-through"
        },
        {
          "kind": "code",
          "literal": "fallback_to_turbopuffer=False",
          "chunkId": "api/introduction#client-fall-through"
        },
        {
          "kind": "code",
          "literal": "AsyncHevlayer",
          "chunkId": "api/introduction#client-fall-through"
        }
      ],
      "sources": [
        {
          "chunkId": "api/introduction#client-fall-through",
          "url": "/docs/api/introduction#client-fall-through",
          "anchor": "client-fall-through"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "client",
        "fall",
        "through",
        "python",
        "turbopuffer",
        "directly",
        "gateway",
        "unreachable",
        "only",
        "calls",
        "satisfiable",
        "without",
        "layer",
        "state",
        "such",
        "simple",
        "vector",
        "queries",
        "compatible",
        "methods",
        "workflows",
        "like",
        "fetches",
        "warm",
        "jobs",
        "pipelines",
        "udfs",
        "search",
        "still",
        "fail",
        "fast",
        "because",
        "depend",
        "owned",
        "fallback",
        "emits",
        "warning",
        "disabled",
        "reported",
        "perf"
      ]
    },
    {
      "id": "api/introduction#compatibility-posture",
      "kind": "section",
      "title": "Introduction",
      "heading": "Compatibility posture",
      "group": "API",
      "url": "/docs/api/introduction#compatibility-posture",
      "summary": "Layer aims to be a drop-in for existing Turbopuffer clients; routes the upstream does not implement are namespaced separately so they never shadow upstream behavior, and a request to a route Layer does not proxy returns a 404 rather than silently re-routing.",
      "facts": [
        {
          "kind": "code",
          "literal": "/v2/",
          "chunkId": "api/introduction#compatibility-posture"
        }
      ],
      "sources": [
        {
          "chunkId": "api/introduction#compatibility-posture",
          "url": "/docs/api/introduction#compatibility-posture",
          "anchor": "compatibility-posture"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "compatibility",
        "posture",
        "layer",
        "aims",
        "drop",
        "existing",
        "turbopuffer",
        "clients",
        "routes",
        "upstream",
        "does",
        "implement",
        "namespaced",
        "separately",
        "never",
        "shadow",
        "behavior",
        "request",
        "route",
        "proxy",
        "returns",
        "rather",
        "silently",
        "routing",
        "under",
        "client",
        "sends",
        "doesn",
        "gateway",
        "might",
        "handle",
        "differently"
      ]
    },
    {
      "id": "api/introduction#cross-cutting-conventions",
      "kind": "section",
      "title": "Introduction",
      "heading": "Cross-cutting conventions",
      "group": "API",
      "url": "/docs/api/introduction#cross-cutting-conventions",
      "summary": "Conventions apply to every proxied route: every write is server-stamped with an epoch-ms watermark attribute, the reserved attribute prefix is read-only to callers, Turbopuffer write/query failures are hard 5xx while cache failures are soft and never block, a cache header distinguishes hit/miss/miss-on-error, and reads through the watermark path report their freshness cut.",
      "facts": [
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at",
          "chunkId": "api/introduction#cross-cutting-conventions"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_*",
          "chunkId": "api/introduction#cross-cutting-conventions"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_",
          "chunkId": "api/introduction#cross-cutting-conventions"
        },
        {
          "kind": "code",
          "literal": "x-layer-cache",
          "chunkId": "api/introduction#cross-cutting-conventions"
        },
        {
          "kind": "code",
          "literal": "hit",
          "chunkId": "api/introduction#cross-cutting-conventions"
        },
        {
          "kind": "code",
          "literal": "miss",
          "chunkId": "api/introduction#cross-cutting-conventions"
        },
        {
          "kind": "code",
          "literal": "miss-on-error",
          "chunkId": "api/introduction#cross-cutting-conventions"
        },
        {
          "kind": "code",
          "literal": "stable_as_of",
          "chunkId": "api/introduction#cross-cutting-conventions"
        }
      ],
      "sources": [
        {
          "chunkId": "api/introduction#cross-cutting-conventions",
          "url": "/docs/api/introduction#cross-cutting-conventions",
          "anchor": "cross-cutting-conventions"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "cross",
        "cutting",
        "conventions",
        "apply",
        "every",
        "proxied",
        "route",
        "write",
        "server",
        "stamped",
        "epoch",
        "watermark",
        "attribute",
        "reserved",
        "prefix",
        "read",
        "only",
        "callers",
        "turbopuffer",
        "query",
        "failures",
        "hard",
        "while",
        "cache",
        "soft",
        "never",
        "block",
        "header",
        "distinguishes",
        "miss",
        "error",
        "reads",
        "through",
        "path",
        "report",
        "their",
        "freshness",
        "hevlayer",
        "upserted",
        "layer"
      ]
    },
    {
      "id": "api/introduction#enhancements-to-upstream-routes",
      "kind": "section",
      "title": "Introduction",
      "heading": "Enhancements to upstream routes",
      "group": "API",
      "url": "/docs/api/introduction#enhancements-to-upstream-routes",
      "summary": "Introduces the section listing the upstream-compatible routes whose bodies describe only the Layer overlay on top of each.",
      "facts": [],
      "sources": [
        {
          "chunkId": "api/introduction#enhancements-to-upstream-routes",
          "url": "/docs/api/introduction#enhancements-to-upstream-routes",
          "anchor": "enhancements-to-upstream-routes"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "enhancements",
        "upstream",
        "routes",
        "introduces",
        "section",
        "listing",
        "compatible",
        "whose",
        "bodies",
        "describe",
        "only",
        "layer",
        "overlay",
        "below",
        "wire",
        "turbopuffer",
        "body",
        "describes",
        "overlays"
      ]
    },
    {
      "id": "api/introduction#install",
      "kind": "section",
      "title": "Introduction",
      "heading": "Install",
      "group": "API",
      "url": "/docs/api/introduction#install",
      "summary": "The Python SDK is generated from the gateway OpenAPI spec, ships a typed async client, requires a recent Python, and reads gateway URL/key plus optional direct-fallback Turbopuffer connection info from environment variables. Other languages are generated on demand.",
      "facts": [
        {
          "kind": "code",
          "literal": "pip install hevlayer",
          "chunkId": "api/introduction#install"
        },
        {
          "kind": "code",
          "literal": "apps/layer-gateway/openapi.yaml",
          "chunkId": "api/introduction#install"
        },
        {
          "kind": "code",
          "literal": "AsyncHevlayer",
          "chunkId": "api/introduction#install"
        },
        {
          "kind": "code",
          "literal": "LAYER_GATEWAY_URL",
          "chunkId": "api/introduction#install"
        },
        {
          "kind": "code",
          "literal": "LAYER_GATEWAY_API_KEY",
          "chunkId": "api/introduction#install"
        },
        {
          "kind": "code",
          "literal": "TURBOPUFFER_API_KEY",
          "chunkId": "api/introduction#install"
        },
        {
          "kind": "code",
          "literal": "TURBOPUFFER_API_URL",
          "chunkId": "api/introduction#install"
        },
        {
          "kind": "code",
          "literal": "https://aws-us-east-1.turbopuffer.com",
          "chunkId": "api/introduction#install"
        },
        {
          "kind": "value",
          "literal": "3.11",
          "chunkId": "api/introduction#install"
        }
      ],
      "sources": [
        {
          "chunkId": "api/introduction#install",
          "url": "/docs/api/introduction#install",
          "anchor": "install"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "install",
        "python",
        "generated",
        "gateway",
        "openapi",
        "spec",
        "ships",
        "typed",
        "async",
        "client",
        "requires",
        "recent",
        "reads",
        "plus",
        "optional",
        "direct",
        "fallback",
        "turbopuffer",
        "connection",
        "info",
        "environment",
        "variables",
        "other",
        "languages",
        "demand",
        "hevlayer",
        "apps",
        "layer",
        "yaml",
        "asynchevlayer",
        "https",
        "east",
        "variable",
        "purpose",
        "layergatewayurl",
        "base",
        "layergatewayapikey",
        "sent",
        "every",
        "request"
      ]
    },
    {
      "id": "api/introduction#metadata--get-v2namespacesnsmetadata",
      "kind": "section",
      "title": "Introduction",
      "heading": "Metadata — GET /v2/namespaces/{ns}/metadata",
      "group": "API",
      "url": "/docs/api/introduction#metadata--get-v2namespacesnsmetadata",
      "summary": "The namespace metadata route proxies the upstream schema, row count, index status, and timestamps verbatim, then enriches the response with a layer block carrying the freshness watermark and stability flag.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/metadata",
          "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata"
        },
        {
          "kind": "code",
          "literal": "layer",
          "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata"
        },
        {
          "kind": "code",
          "literal": "stable_as_of",
          "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata"
        },
        {
          "kind": "code",
          "literal": "is_stable",
          "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata"
        },
        {
          "kind": "value",
          "literal": "turbopuffer.com",
          "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata"
        }
      ],
      "sources": [
        {
          "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata",
          "url": "/docs/api/introduction#metadata--get-v2namespacesnsmetadata",
          "anchor": "metadata--get-v2namespacesnsmetadata"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "metadata",
        "namespaces",
        "namespace",
        "route",
        "proxies",
        "upstream",
        "schema",
        "count",
        "index",
        "status",
        "timestamps",
        "verbatim",
        "enriches",
        "response",
        "layer",
        "block",
        "carrying",
        "freshness",
        "watermark",
        "stability",
        "flag",
        "stable",
        "turbopuffer",
        "contract",
        "proxied",
        "enriched",
        "containing",
        "stableasof",
        "isstable",
        "page"
      ]
    },
    {
      "id": "api/introduction#query--post-v2namespacesnsquery",
      "kind": "section",
      "title": "Introduction",
      "heading": "Query — POST /v2/namespaces/{ns}/query",
      "group": "API",
      "url": "/docs/api/introduction#query--post-v2namespacesnsquery",
      "summary": "The query route is upstream-compatible and adds strong-consistent reads via an injected watermark predicate while the index is updating, a one-shot retry with the filter forced on for queries racing a write storm, and a freshness timestamp echoed on every response.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/{ns}/query",
          "chunkId": "api/introduction#query--post-v2namespacesnsquery"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at <= watermark",
          "chunkId": "api/introduction#query--post-v2namespacesnsquery"
        },
        {
          "kind": "code",
          "literal": "updating",
          "chunkId": "api/introduction#query--post-v2namespacesnsquery"
        },
        {
          "kind": "code",
          "literal": "stable_as_of",
          "chunkId": "api/introduction#query--post-v2namespacesnsquery"
        },
        {
          "kind": "value",
          "literal": "turbopuffer.com",
          "chunkId": "api/introduction#query--post-v2namespacesnsquery"
        }
      ],
      "sources": [
        {
          "chunkId": "api/introduction#query--post-v2namespacesnsquery",
          "url": "/docs/api/introduction#query--post-v2namespacesnsquery",
          "anchor": "query--post-v2namespacesnsquery"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "query",
        "post",
        "namespaces",
        "route",
        "upstream",
        "compatible",
        "adds",
        "strong",
        "consistent",
        "reads",
        "injected",
        "watermark",
        "predicate",
        "while",
        "index",
        "updating",
        "shot",
        "retry",
        "filter",
        "forced",
        "queries",
        "racing",
        "write",
        "storm",
        "freshness",
        "timestamp",
        "echoed",
        "every",
        "response",
        "hevlayer",
        "upserted",
        "stable",
        "turbopuffer",
        "contract",
        "vector",
        "request",
        "shape",
        "ranking",
        "filters",
        "attribute"
      ]
    },
    {
      "id": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns",
      "kind": "section",
      "title": "Introduction",
      "heading": "Write — POST /v2/namespaces/{ns} and PATCH /v2/namespaces/{ns}",
      "group": "API",
      "url": "/docs/api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns",
      "summary": "The write and patch routes add a best-effort NVMe cache mirror before the upstream write, a server-stamped watermark attribute on every upsert and patch that powers query consistency, and rejection of writes to the reserved attribute prefix.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/{ns}",
          "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns"
        },
        {
          "kind": "code",
          "literal": "PATCH /v2/namespaces/{ns}",
          "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns"
        },
        {
          "kind": "code",
          "literal": "patch_rows",
          "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at",
          "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_*",
          "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns"
        },
        {
          "kind": "value",
          "literal": "turbopuffer.com",
          "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns"
        }
      ],
      "sources": [
        {
          "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns",
          "url": "/docs/api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns",
          "anchor": "write--post-v2namespacesns-and-patch-v2namespacesns"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "write",
        "post",
        "namespaces",
        "patch",
        "routes",
        "best",
        "effort",
        "nvme",
        "cache",
        "mirror",
        "before",
        "upstream",
        "server",
        "stamped",
        "watermark",
        "attribute",
        "every",
        "upsert",
        "powers",
        "query",
        "consistency",
        "rejection",
        "writes",
        "reserved",
        "prefix",
        "rows",
        "hevlayer",
        "upserted",
        "turbopuffer",
        "contract",
        "delete",
        "patchrows",
        "hevlayerupsertedat",
        "path",
        "attributes",
        "rejected",
        "page"
      ]
    },
    {
      "id": "api/metrics",
      "kind": "section",
      "title": "Metrics API",
      "heading": null,
      "group": "API",
      "url": "/docs/api/metrics",
      "summary": "The gateway exposes a Prometheus-shaped metrics surface plus passthrough routes to a bundled VictoriaMetrics so callers can run PromQL without a separate scraper, and a self-describing catalog of every emitted metric backs both the dashboard's observe tab and external automation.",
      "facts": [
        {
          "kind": "code",
          "literal": "vmsingle",
          "chunkId": "api/metrics"
        }
      ],
      "sources": [
        {
          "chunkId": "api/metrics",
          "url": "/docs/api/metrics",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "gateway",
        "exposes",
        "prometheus",
        "shaped",
        "metrics",
        "surface",
        "plus",
        "passthrough",
        "routes",
        "bundled",
        "victoriametrics",
        "callers",
        "promql",
        "without",
        "separate",
        "scraper",
        "self",
        "describing",
        "catalog",
        "every",
        "emitted",
        "metric",
        "backs",
        "both",
        "dashboard",
        "observe",
        "external",
        "automation",
        "vmsingle",
        "exposition",
        "endpoint",
        "instance",
        "emits",
        "definitions",
        "label",
        "conventions",
        "example",
        "live",
        "below"
      ]
    },
    {
      "id": "api/metrics#health",
      "kind": "section",
      "title": "Metrics API",
      "heading": "Health",
      "group": "API",
      "url": "/docs/api/metrics#health",
      "summary": "The health route always returns 200 while the process is up and reports version, cache backing connection state, and per-namespace cache state, which the dashboard reads for degradation signals like a cold or disconnected cache after a restart.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /health",
          "chunkId": "api/metrics#health"
        },
        {
          "kind": "code",
          "literal": "{\n  \"status\": \"ok\",\n  \"version\": \"0.1.0\",\n  \"aerospike\": {\n    \"connected\": true,\n    \"generation\": 3\n  },\n  \"cache_state\": [\n    {\"namespace\": \"products\", \"state\": \"warm\", \"warmed_through\": 1747300000123, \"warm_inflight\": false}\n  ]\n}",
          "chunkId": "api/metrics#health"
        },
        {
          "kind": "code",
          "literal": "200",
          "chunkId": "api/metrics#health"
        },
        {
          "kind": "code",
          "literal": "aerospike.connected",
          "chunkId": "api/metrics#health"
        },
        {
          "kind": "code",
          "literal": "cache_state[].state",
          "chunkId": "api/metrics#health"
        }
      ],
      "sources": [
        {
          "chunkId": "api/metrics#health",
          "url": "/docs/api/metrics#health",
          "anchor": "health"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "health",
        "route",
        "always",
        "returns",
        "while",
        "process",
        "reports",
        "version",
        "cache",
        "backing",
        "connection",
        "state",
        "namespace",
        "dashboard",
        "reads",
        "degradation",
        "signals",
        "like",
        "cold",
        "disconnected",
        "after",
        "restart",
        "status",
        "aerospike",
        "connected",
        "true",
        "generation",
        "products",
        "warm",
        "warmed",
        "through",
        "1747300000123",
        "inflight",
        "false",
        "cachestate",
        "warmedthrough",
        "warminflight",
        "responds",
        "glance",
        "cards"
      ]
    },
    {
      "id": "api/metrics#metrics-catalog",
      "kind": "section",
      "title": "Metrics API",
      "heading": "Metrics catalog",
      "group": "API",
      "url": "/docs/api/metrics#metrics-catalog",
      "summary": "The metrics catalog is an operator-facing manifest of every emitted metric, each entry carrying name, kind, family, labels, description, example PromQL, and any alert shape it backs, with a version that bumps on incompatible shape changes. The dashboard groups entries by family, and the same content is exportable from the repo.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v2/metrics/catalog",
          "chunkId": "api/metrics#metrics-catalog"
        },
        {
          "kind": "code",
          "literal": "version",
          "chunkId": "api/metrics#metrics-catalog"
        },
        {
          "kind": "code",
          "literal": "cargo run -p metrics-catalog --bin export",
          "chunkId": "api/metrics#metrics-catalog"
        }
      ],
      "sources": [
        {
          "chunkId": "api/metrics#metrics-catalog",
          "url": "/docs/api/metrics#metrics-catalog",
          "anchor": "metrics-catalog"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "metrics",
        "catalog",
        "operator",
        "facing",
        "manifest",
        "every",
        "emitted",
        "metric",
        "entry",
        "carrying",
        "name",
        "kind",
        "family",
        "labels",
        "description",
        "example",
        "promql",
        "alert",
        "shape",
        "backs",
        "version",
        "bumps",
        "incompatible",
        "changes",
        "dashboard",
        "groups",
        "entries",
        "same",
        "content",
        "exportable",
        "repo",
        "cargo",
        "export",
        "gateway",
        "emits",
        "carries",
        "histogram",
        "counter",
        "gauge",
        "applicable"
      ]
    },
    {
      "id": "api/metrics#promql-passthrough",
      "kind": "section",
      "title": "Metrics API",
      "heading": "PromQL passthrough",
      "group": "API",
      "url": "/docs/api/metrics#promql-passthrough",
      "summary": "The metrics query routes are thin, non-rewriting passthroughs to VictoriaMetrics whose response bodies match Prometheus's HTTP API one-for-one, with short-form aliases for ergonomic terminal use; auth happens at the gateway edge and the upstream metrics instance is never customer-reachable.",
      "facts": [
        {
          "kind": "code",
          "literal": "curl -sG \"$LAYER_GATEWAY_URL/v2/metrics/query\" \\\n  --data-urlencode 'query=sum(layer_pipeline_stage_count{stage=\"pending\"})'",
          "chunkId": "api/metrics#promql-passthrough"
        },
        {
          "kind": "code",
          "literal": "/v2/metrics/api/v1/query",
          "chunkId": "api/metrics#promql-passthrough"
        },
        {
          "kind": "code",
          "literal": "query_range",
          "chunkId": "api/metrics#promql-passthrough"
        },
        {
          "kind": "code",
          "literal": "/v2/metrics/query",
          "chunkId": "api/metrics#promql-passthrough"
        }
      ],
      "sources": [
        {
          "chunkId": "api/metrics#promql-passthrough",
          "url": "/docs/api/metrics#promql-passthrough",
          "anchor": "promql-passthrough"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "promql",
        "passthrough",
        "metrics",
        "query",
        "routes",
        "thin",
        "rewriting",
        "passthroughs",
        "victoriametrics",
        "whose",
        "response",
        "bodies",
        "match",
        "prometheus",
        "http",
        "short",
        "form",
        "aliases",
        "ergonomic",
        "terminal",
        "auth",
        "happens",
        "gateway",
        "edge",
        "upstream",
        "instance",
        "never",
        "customer",
        "reachable",
        "curl",
        "layer",
        "data",
        "urlencode",
        "pipeline",
        "stage",
        "count",
        "pending",
        "range",
        "queryrange",
        "shape"
      ]
    },
    {
      "id": "api/metrics#routes",
      "kind": "section",
      "title": "Metrics API",
      "heading": "Routes",
      "group": "API",
      "url": "/docs/api/metrics#routes",
      "summary": "Lists the metrics routes: Prometheus exposition, health, instant and range PromQL proxies (full and short-form), the catalog listing, and single catalog-entry fetch.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /metrics",
          "chunkId": "api/metrics#routes"
        },
        {
          "kind": "code",
          "literal": "GET /health",
          "chunkId": "api/metrics#routes"
        },
        {
          "kind": "code",
          "literal": "GET\\|POST /v2/metrics/api/v1/query",
          "chunkId": "api/metrics#routes"
        },
        {
          "kind": "code",
          "literal": "GET\\|POST /v2/metrics/query",
          "chunkId": "api/metrics#routes"
        },
        {
          "kind": "code",
          "literal": "GET\\|POST /v2/metrics/api/v1/query_range",
          "chunkId": "api/metrics#routes"
        },
        {
          "kind": "code",
          "literal": "GET\\|POST /v2/metrics/query_range",
          "chunkId": "api/metrics#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/metrics/catalog",
          "chunkId": "api/metrics#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/metrics/catalog/{name}",
          "chunkId": "api/metrics#routes"
        }
      ],
      "sources": [
        {
          "chunkId": "api/metrics#routes",
          "url": "/docs/api/metrics#routes",
          "anchor": "routes"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "routes",
        "lists",
        "metrics",
        "prometheus",
        "exposition",
        "health",
        "instant",
        "range",
        "promql",
        "proxies",
        "full",
        "short",
        "form",
        "catalog",
        "listing",
        "single",
        "entry",
        "fetch",
        "post",
        "query",
        "name",
        "route",
        "behavior",
        "gateway",
        "liveness",
        "nvme",
        "cache",
        "connection",
        "state",
        "namespace",
        "proxy",
        "queryrange",
        "list",
        "every",
        "metric",
        "emits",
        "including",
        "labels",
        "example"
      ]
    },
    {
      "id": "api/namespace-metadata",
      "kind": "section",
      "title": "Namespace metadata",
      "heading": null,
      "group": "API",
      "url": "/docs/api/namespace-metadata",
      "summary": "Namespace metadata is proxied verbatim from the upstream endpoint for schema, row counts, index status, and timestamps, with Layer adding a single freshness sub-object on top.",
      "facts": [
        {
          "kind": "code",
          "literal": "/v2/namespaces/{ns}/metadata",
          "chunkId": "api/namespace-metadata"
        },
        {
          "kind": "value",
          "literal": "Upstream.astro",
          "chunkId": "api/namespace-metadata"
        },
        {
          "kind": "value",
          "literal": "turbopuffer.com",
          "chunkId": "api/namespace-metadata"
        }
      ],
      "sources": [
        {
          "chunkId": "api/namespace-metadata",
          "url": "/docs/api/namespace-metadata",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "namespace",
        "metadata",
        "proxied",
        "verbatim",
        "upstream",
        "endpoint",
        "schema",
        "counts",
        "index",
        "status",
        "timestamps",
        "layer",
        "adding",
        "single",
        "freshness",
        "object",
        "namespaces",
        "astro",
        "turbopuffer",
        "read",
        "enriched",
        "signals",
        "payload",
        "follow",
        "contract",
        "adds"
      ]
    },
    {
      "id": "api/namespace-metadata#list-namespaces",
      "kind": "section",
      "title": "Namespace metadata",
      "heading": "List namespaces",
      "group": "API",
      "url": "/docs/api/namespace-metadata#list-namespaces",
      "summary": "Listing namespaces is a Layer-only augmented, paged listing that enriches each row with freshness and cache signals and backs the dashboard inventory; a per-row metadata failure degrades to an error marker rather than dropping the namespace, and responses come from a short-TTL cache so dashboard polling does not fan out a call per namespace.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v2/namespaces?prefix=prod&page_size=100",
          "chunkId": "api/namespace-metadata#list-namespaces"
        },
        {
          "kind": "code",
          "literal": "{\n  \"namespaces\": [\n    {\n      \"name\": \"products\",\n      \"row_count\": 12500,\n      \"size_bytes\": 48800000,\n      \"stable_as_of_ms\": 1715600400000,\n      \"is_stable\": true,\n      \"cache_state\": {\"state\": \"warm\", \"warm_inflight\": false},\n      \"last_write_ms\": 1715600399000,\n      \"shadow\": false,\n      \"labels\": {}\n    }\n  ],\n  \"next_cursor\": \"...\"\n}",
          "chunkId": "api/namespace-metadata#list-namespaces"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces",
          "chunkId": "api/namespace-metadata#list-namespaces"
        },
        {
          "kind": "code",
          "literal": "prefix",
          "chunkId": "api/namespace-metadata#list-namespaces"
        },
        {
          "kind": "code",
          "literal": "cursor",
          "chunkId": "api/namespace-metadata#list-namespaces"
        },
        {
          "kind": "code",
          "literal": "next_cursor",
          "chunkId": "api/namespace-metadata#list-namespaces"
        },
        {
          "kind": "code",
          "literal": "page_size",
          "chunkId": "api/namespace-metadata#list-namespaces"
        },
        {
          "kind": "code",
          "literal": "metadata_error",
          "chunkId": "api/namespace-metadata#list-namespaces"
        },
        {
          "kind": "code",
          "literal": "NAMESPACE_LIST_CACHE_TTL_MS",
          "chunkId": "api/namespace-metadata#list-namespaces"
        },
        {
          "kind": "code",
          "literal": "10000",
          "chunkId": "api/namespace-metadata#list-namespaces"
        }
      ],
      "sources": [
        {
          "chunkId": "api/namespace-metadata#list-namespaces",
          "url": "/docs/api/namespace-metadata#list-namespaces",
          "anchor": "list-namespaces"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "list",
        "namespaces",
        "listing",
        "layer",
        "only",
        "augmented",
        "paged",
        "enriches",
        "freshness",
        "cache",
        "signals",
        "backs",
        "dashboard",
        "inventory",
        "metadata",
        "failure",
        "degrades",
        "error",
        "marker",
        "rather",
        "dropping",
        "namespace",
        "responses",
        "come",
        "short",
        "polling",
        "does",
        "call",
        "prefix",
        "prod",
        "page",
        "size",
        "name",
        "products",
        "count",
        "12500",
        "bytes",
        "48800000",
        "stable",
        "1715600400000"
      ]
    },
    {
      "id": "api/namespace-metadata#request",
      "kind": "section",
      "title": "Namespace metadata",
      "heading": "Request",
      "group": "API",
      "url": "/docs/api/namespace-metadata#request",
      "summary": "A metadata request returns the upstream payload (schema, approximate counts, timestamps, index status) plus a Layer enhancement block carrying the freshness watermark and stability flag.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/products/metadata",
          "chunkId": "api/namespace-metadata#request"
        },
        {
          "kind": "code",
          "literal": "{\n  // Proxied from Turbopuffer verbatim\n  \"schema\": { },\n  \"approx_row_count\": 12500,\n  \"approx_logical_bytes\": 48800000,\n  \"created_at\": \"2026-03-15T10:30:45Z\",\n  \"updated_at\": \"2026-05-12T18:49:00Z\",\n  \"last_write_at\": \"2026-05-12T18:48:30Z\",\n  \"index\": { \"status\": \"up-to-date\" },\n\n  // Layer enhancement\n  \"layer\": {\n    \"stable_as_of\": 1715600400000,\n    \"is_stable\": true\n  }\n}",
          "chunkId": "api/namespace-metadata#request"
        }
      ],
      "sources": [
        {
          "chunkId": "api/namespace-metadata#request",
          "url": "/docs/api/namespace-metadata#request",
          "anchor": "request"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "request",
        "metadata",
        "returns",
        "upstream",
        "payload",
        "schema",
        "approximate",
        "counts",
        "timestamps",
        "index",
        "status",
        "plus",
        "layer",
        "enhancement",
        "block",
        "carrying",
        "freshness",
        "watermark",
        "stability",
        "flag",
        "namespaces",
        "products",
        "proxied",
        "turbopuffer",
        "verbatim",
        "approx",
        "count",
        "12500",
        "logical",
        "bytes",
        "48800000",
        "created",
        "2026",
        "15t10",
        "updated",
        "12t18",
        "last",
        "write",
        "date",
        "stable"
      ]
    },
    {
      "id": "api/namespace-metadata#the-layer-block",
      "kind": "section",
      "title": "Namespace metadata",
      "heading": "The layer block",
      "group": "API",
      "url": "/docs/api/namespace-metadata#the-layer-block",
      "summary": "The layer block exposes the epoch-ms watermark from the most recent stable poll and a boolean for whether that poll observed the index up-to-date; the boolean is the current signal driving the per-query filter-skip decision while the watermark is the historical cut a filtered query would apply. Both are null/false on cold start.",
      "facts": [
        {
          "kind": "code",
          "literal": "layer",
          "chunkId": "api/namespace-metadata#the-layer-block"
        },
        {
          "kind": "code",
          "literal": "stable_as_of",
          "chunkId": "api/namespace-metadata#the-layer-block"
        },
        {
          "kind": "code",
          "literal": "is_stable",
          "chunkId": "api/namespace-metadata#the-layer-block"
        },
        {
          "kind": "code",
          "literal": "index.status == \"up-to-date\"",
          "chunkId": "api/namespace-metadata#the-layer-block"
        }
      ],
      "sources": [
        {
          "chunkId": "api/namespace-metadata#the-layer-block",
          "url": "/docs/api/namespace-metadata#the-layer-block",
          "anchor": "the-layer-block"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "layer",
        "block",
        "exposes",
        "epoch",
        "watermark",
        "most",
        "recent",
        "stable",
        "poll",
        "boolean",
        "whether",
        "observed",
        "index",
        "date",
        "current",
        "signal",
        "driving",
        "query",
        "filter",
        "skip",
        "decision",
        "while",
        "historical",
        "filtered",
        "would",
        "apply",
        "both",
        "null",
        "false",
        "cold",
        "start",
        "status",
        "field",
        "meaning",
        "stableasof",
        "before",
        "watcher",
        "namespace",
        "isstable",
        "true"
      ]
    },
    {
      "id": "api/query",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": null,
      "group": "API",
      "url": "/docs/api/query",
      "summary": "Query is wire-compatible with the upstream query endpoint for vector and full-text search, with the documented shape covering what Layer adds on top, alongside a pull-through document fetch by id.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/{ns}/query",
          "chunkId": "api/query"
        },
        {
          "kind": "value",
          "literal": "Upstream.astro",
          "chunkId": "api/query"
        },
        {
          "kind": "value",
          "literal": "turbopuffer.com",
          "chunkId": "api/query"
        }
      ],
      "sources": [
        {
          "chunkId": "api/query",
          "url": "/docs/api/query",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "query",
        "wire",
        "compatible",
        "upstream",
        "endpoint",
        "vector",
        "full",
        "text",
        "search",
        "documented",
        "shape",
        "covering",
        "layer",
        "adds",
        "alongside",
        "pull",
        "through",
        "document",
        "fetch",
        "post",
        "namespaces",
        "astro",
        "turbopuffer",
        "similarity",
        "strong",
        "consistent",
        "watermark",
        "handling",
        "plus",
        "request",
        "schema",
        "filters",
        "ranking",
        "attribute",
        "selection",
        "below"
      ]
    },
    {
      "id": "api/query#batch-fetch",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": "Batch fetch",
      "group": "API",
      "url": "/docs/api/query#batch-fetch",
      "summary": "Batch fetch takes a list of ids and returns found documents and missing ids inline rather than a partial 404, preserving request order for the found documents and collecting unfound ids separately.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/products/documents\nContent-Type: application/json\n\n{\n  \"ids\": [\"asin-1\", \"asin-2\", \"asin-3\"],\n  \"include_attributes\": [\"title\"]\n}",
          "chunkId": "api/query#batch-fetch"
        },
        {
          "kind": "code",
          "literal": "{\n  \"documents\": [\n    {\"id\": \"asin-1\", \"attributes\": {\"title\": \"...\"}},\n    {\"id\": \"asin-3\", \"attributes\": {\"title\": \"...\"}}\n  ],\n  \"missing\": [\"asin-2\"]\n}",
          "chunkId": "api/query#batch-fetch"
        },
        {
          "kind": "code",
          "literal": "documents",
          "chunkId": "api/query#batch-fetch"
        },
        {
          "kind": "code",
          "literal": "missing",
          "chunkId": "api/query#batch-fetch"
        }
      ],
      "sources": [
        {
          "chunkId": "api/query#batch-fetch",
          "url": "/docs/api/query#batch-fetch",
          "anchor": "batch-fetch"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "batch",
        "fetch",
        "takes",
        "list",
        "returns",
        "found",
        "documents",
        "missing",
        "inline",
        "rather",
        "partial",
        "preserving",
        "request",
        "order",
        "collecting",
        "unfound",
        "separately",
        "post",
        "namespaces",
        "products",
        "content",
        "type",
        "application",
        "json",
        "asin",
        "include",
        "attributes",
        "title",
        "includeattributes",
        "instead",
        "preserves",
        "gateway",
        "could",
        "find",
        "anywhere",
        "land"
      ]
    },
    {
      "id": "api/query#behavior-matrix",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": "Behavior matrix",
      "group": "API",
      "url": "/docs/api/query#behavior-matrix",
      "summary": "A matrix maps single- and batch-fetch outcomes by cache state: a hit serves cache, a miss with the document present upstream serves upstream and backfills, a miss with no upstream document is a 404 (single) or inline-missing (batch), and an unavailable cache serves upstream with a miss-on-error marker.",
      "facts": [
        {
          "kind": "code",
          "literal": "missing",
          "chunkId": "api/query#behavior-matrix"
        },
        {
          "kind": "code",
          "literal": "miss-on-error",
          "chunkId": "api/query#behavior-matrix"
        }
      ],
      "sources": [
        {
          "chunkId": "api/query#behavior-matrix",
          "url": "/docs/api/query#behavior-matrix",
          "anchor": "behavior-matrix"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "behavior",
        "matrix",
        "maps",
        "single",
        "batch",
        "fetch",
        "outcomes",
        "cache",
        "state",
        "serves",
        "miss",
        "document",
        "present",
        "upstream",
        "backfills",
        "inline",
        "missing",
        "unavailable",
        "error",
        "marker",
        "backfill",
        "absent"
      ]
    },
    {
      "id": "api/query#explain-query",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": "Explain query",
      "group": "API",
      "url": "/docs/api/query#explain-query",
      "summary": "Explain query is proxied verbatim with no Layer overlay and no watermark filter, useful for inspecting upstream query planning per the upstream docs.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/products/explain_query",
          "chunkId": "api/query#explain-query"
        },
        {
          "kind": "code",
          "literal": "explain_query",
          "chunkId": "api/query#explain-query"
        },
        {
          "kind": "value",
          "literal": "turbopuffer.com",
          "chunkId": "api/query#explain-query"
        }
      ],
      "sources": [
        {
          "chunkId": "api/query#explain-query",
          "url": "/docs/api/query#explain-query",
          "anchor": "explain-query"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "explain",
        "query",
        "proxied",
        "verbatim",
        "layer",
        "overlay",
        "watermark",
        "filter",
        "useful",
        "inspecting",
        "upstream",
        "planning",
        "docs",
        "post",
        "namespaces",
        "products",
        "turbopuffer",
        "explainquery",
        "adds",
        "nothing",
        "applies",
        "inspect",
        "request",
        "response",
        "shape"
      ]
    },
    {
      "id": "api/query#fetch",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": "Fetch",
      "group": "API",
      "url": "/docs/api/query#fetch",
      "summary": "Fetch is a Layer-only surface with no upstream equivalent: the NVMe cache is checked first, and on miss or error the gateway falls through to Turbopuffer and backfills the cache best-effort.",
      "facts": [],
      "sources": [
        {
          "chunkId": "api/query#fetch",
          "url": "/docs/api/query#fetch",
          "anchor": "fetch"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "fetch",
        "layer",
        "only",
        "surface",
        "upstream",
        "equivalent",
        "nvme",
        "cache",
        "checked",
        "first",
        "miss",
        "error",
        "gateway",
        "falls",
        "through",
        "turbopuffer",
        "backfills",
        "best",
        "effort",
        "there"
      ]
    },
    {
      "id": "api/query#filter-shape",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": "Filter shape",
      "group": "API",
      "url": "/docs/api/query#filter-shape",
      "summary": "Filters follow the upstream array syntax with leaf, conjunction, and disjunction forms, and Layer automatically combines the caller's filter with the watermark predicate so callers never see the reserved upsert-time attribute in their request or response.",
      "facts": [
        {
          "kind": "code",
          "literal": "[\"category\", \"Eq\", \"Electronics\"]                # leaf\n[\"And\", [[\"category\", \"Eq\", \"Electronics\"],\n         [\"price\", \"Lte\", 200]]]                 # conjunction\n[\"Or\",  [...]]                                   # disjunction",
          "chunkId": "api/query#filter-shape"
        },
        {
          "kind": "code",
          "literal": "And",
          "chunkId": "api/query#filter-shape"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at",
          "chunkId": "api/query#filter-shape"
        }
      ],
      "sources": [
        {
          "chunkId": "api/query#filter-shape",
          "url": "/docs/api/query#filter-shape",
          "anchor": "filter-shape"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "filter",
        "shape",
        "filters",
        "follow",
        "upstream",
        "array",
        "syntax",
        "leaf",
        "conjunction",
        "disjunction",
        "forms",
        "layer",
        "automatically",
        "combines",
        "caller",
        "watermark",
        "predicate",
        "callers",
        "never",
        "reserved",
        "upsert",
        "time",
        "attribute",
        "their",
        "request",
        "response",
        "category",
        "electronics",
        "price",
        "hevlayer",
        "upserted",
        "follows",
        "turbopuffer",
        "element",
        "hevlayerupsertedat"
      ]
    },
    {
      "id": "api/query#query-request",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": "Query request",
      "group": "API",
      "url": "/docs/api/query#query-request",
      "summary": "A query request posts a vector, top-k, filters, and selected attributes and returns ranked results with id, distance, and attributes, plus the freshness timestamp of the served response.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/products/query\nContent-Type: application/json\n\n{\n  \"vector\": [0.0012, -0.043],\n  \"top_k\": 10,\n  \"filters\": [\"category\", \"Eq\", \"Electronics\"],\n  \"include_attributes\": [\"title\", \"category\"]\n}",
          "chunkId": "api/query#query-request"
        },
        {
          "kind": "code",
          "literal": "{\n  \"results\": [\n    {\"id\": \"asin-B08N5WRWNW\", \"dist\": 0.42, \"attributes\": {\"title\": \"...\"}}\n  ],\n  \"stable_as_of\": 1715600400000\n}",
          "chunkId": "api/query#query-request"
        }
      ],
      "sources": [
        {
          "chunkId": "api/query#query-request",
          "url": "/docs/api/query#query-request",
          "anchor": "query-request"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "query",
        "request",
        "posts",
        "vector",
        "filters",
        "selected",
        "attributes",
        "returns",
        "ranked",
        "results",
        "distance",
        "plus",
        "freshness",
        "timestamp",
        "served",
        "response",
        "post",
        "namespaces",
        "products",
        "content",
        "type",
        "application",
        "json",
        "0012",
        "category",
        "electronics",
        "include",
        "title",
        "asin",
        "b08n5wrwnw",
        "dist",
        "stable",
        "1715600400000",
        "topk",
        "includeattributes",
        "stableasof"
      ]
    },
    {
      "id": "api/query#single-fetch",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": "Single fetch",
      "group": "API",
      "url": "/docs/api/query#single-fetch",
      "summary": "Single fetch returns 200 with a cache header indicating hit, miss-with-backfill, or miss-on-error depending on cache and upstream state, and a 404 only when the document is absent from both layers.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/products/documents/asin-B08N5WRWNW?include_attributes=title,category",
          "chunkId": "api/query#single-fetch"
        },
        {
          "kind": "code",
          "literal": "x-layer-cache: hit",
          "chunkId": "api/query#single-fetch"
        },
        {
          "kind": "code",
          "literal": "x-layer-cache: miss",
          "chunkId": "api/query#single-fetch"
        },
        {
          "kind": "code",
          "literal": "x-layer-cache: miss-on-error",
          "chunkId": "api/query#single-fetch"
        }
      ],
      "sources": [
        {
          "chunkId": "api/query#single-fetch",
          "url": "/docs/api/query#single-fetch",
          "anchor": "single-fetch"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "single",
        "fetch",
        "returns",
        "cache",
        "header",
        "indicating",
        "miss",
        "backfill",
        "error",
        "depending",
        "upstream",
        "state",
        "only",
        "document",
        "absent",
        "both",
        "layers",
        "namespaces",
        "products",
        "documents",
        "asin",
        "b08n5wrwnw",
        "include",
        "attributes",
        "title",
        "category",
        "layer",
        "includeattributes",
        "outcome",
        "status",
        "cached",
        "backfilled",
        "unavailable",
        "missing"
      ]
    },
    {
      "id": "api/query#strong-consistent-reads",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": "Strong-consistent reads",
      "group": "API",
      "url": "/docs/api/query#strong-consistent-reads",
      "summary": "Because the upstream indexes upserts asynchronously, a naive read after an upsert can be partial or rate-limited under write pressure; Layer runs queries at eventual consistency upstream, polls each namespace's index status to record a watermark, and per query injects a hidden upsert-time-bounded predicate only while the index is updating, retrying once with the filter forced on after a rate-limit. Every response reports the most recent watermark, omitted only on a cold-start gateway.",
      "facts": [
        {
          "kind": "code",
          "literal": "consistency=eventual",
          "chunkId": "api/query#strong-consistent-reads"
        },
        {
          "kind": "code",
          "literal": "index.status",
          "chunkId": "api/query#strong-consistent-reads"
        },
        {
          "kind": "code",
          "literal": "poll_start - safety_margin",
          "chunkId": "api/query#strong-consistent-reads"
        },
        {
          "kind": "code",
          "literal": "Updating",
          "chunkId": "api/query#strong-consistent-reads"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at <= watermark",
          "chunkId": "api/query#strong-consistent-reads"
        },
        {
          "kind": "code",
          "literal": "Stable",
          "chunkId": "api/query#strong-consistent-reads"
        },
        {
          "kind": "code",
          "literal": "Unknown",
          "chunkId": "api/query#strong-consistent-reads"
        },
        {
          "kind": "code",
          "literal": "stable_as_of",
          "chunkId": "api/query#strong-consistent-reads"
        }
      ],
      "sources": [
        {
          "chunkId": "api/query#strong-consistent-reads",
          "url": "/docs/api/query#strong-consistent-reads",
          "anchor": "strong-consistent-reads"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "strong",
        "consistent",
        "reads",
        "because",
        "upstream",
        "indexes",
        "upserts",
        "asynchronously",
        "naive",
        "read",
        "after",
        "upsert",
        "partial",
        "rate",
        "limited",
        "under",
        "write",
        "pressure",
        "layer",
        "runs",
        "queries",
        "eventual",
        "consistency",
        "polls",
        "namespace",
        "index",
        "status",
        "record",
        "watermark",
        "query",
        "injects",
        "hidden",
        "time",
        "bounded",
        "predicate",
        "only",
        "while",
        "updating",
        "retrying",
        "once"
      ]
    },
    {
      "id": "api/query#tunables",
      "kind": "section",
      "title": "Query & Fetch",
      "heading": "Tunables",
      "group": "API",
      "url": "/docs/api/query#tunables",
      "summary": "Two environment tunables control consistency: how often the watcher polls each namespace, and the cushion between poll time and the recorded watermark to cover in-flight upserts.",
      "facts": [
        {
          "kind": "code",
          "literal": "CONSISTENCY_POLL_INTERVAL_MS",
          "chunkId": "api/query#tunables"
        },
        {
          "kind": "code",
          "literal": "CONSISTENCY_SAFETY_MARGIN_MS",
          "chunkId": "api/query#tunables"
        }
      ],
      "sources": [
        {
          "chunkId": "api/query#tunables",
          "url": "/docs/api/query#tunables",
          "anchor": "tunables"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "tunables",
        "environment",
        "control",
        "consistency",
        "often",
        "watcher",
        "polls",
        "namespace",
        "cushion",
        "between",
        "poll",
        "time",
        "recorded",
        "watermark",
        "cover",
        "flight",
        "upserts",
        "interval",
        "safety",
        "margin",
        "variable",
        "default",
        "purpose",
        "consistencypollintervalms",
        "1000",
        "consistencysafetymarginms"
      ]
    },
    {
      "id": "api/result-count",
      "kind": "section",
      "title": "Result Count",
      "heading": null,
      "group": "API",
      "url": "/docs/api/result-count",
      "summary": "Result count answers how many rows match a ranked FTS or vector query, distinct from scan count which counts rows matching a plain filter. It supports a bounded single-pass mode and an exhaustive recursive mode, carries a request deadline with a server-side maximum, and on timeout returns the partial count flagged as bounded and timed out.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/products/result-count\nContent-Type: application/json\n\n{\n  \"query\": {\"field\": \"title\", \"fts\": \"wireless headphones\"},\n  \"filters\": [\"category\", \"Eq\", \"Electronics\"],\n  \"mode\": \"bounded\",\n  \"timeout_seconds\": 30\n}",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "{\n  \"count\": 4210,\n  \"bounded\": false,\n  \"timed_out\": false,\n  \"shards_saturated\": 0,\n  \"shards_total\": 1,\n  \"elapsed_ms\": 42\n}",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "field",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "fts",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "vector",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "max_distance",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "bounded",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "top_k",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "exhaustive",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "bounded: true",
          "chunkId": "api/result-count"
        },
        {
          "kind": "code",
          "literal": "timed_out: true",
          "chunkId": "api/result-count"
        }
      ],
      "sources": [
        {
          "chunkId": "api/result-count",
          "url": "/docs/api/result-count",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "result",
        "count",
        "answers",
        "many",
        "rows",
        "match",
        "ranked",
        "vector",
        "query",
        "distinct",
        "scan",
        "counts",
        "matching",
        "plain",
        "filter",
        "supports",
        "bounded",
        "single",
        "pass",
        "mode",
        "exhaustive",
        "recursive",
        "carries",
        "request",
        "deadline",
        "server",
        "side",
        "maximum",
        "timeout",
        "returns",
        "partial",
        "flagged",
        "timed",
        "post",
        "namespaces",
        "products",
        "content",
        "type",
        "application",
        "json"
      ]
    },
    {
      "id": "api/scans",
      "kind": "section",
      "title": "Scan",
      "heading": null,
      "group": "API",
      "url": "/docs/api/scans",
      "summary": "Scans iterate a namespace by filter: ID mode creates an asynchronous job that returns matching IDs through a results route, while count mode returns a single number synchronously.",
      "facts": [
        {
          "kind": "code",
          "literal": "mode: ids",
          "chunkId": "api/scans"
        },
        {
          "kind": "code",
          "literal": "mode: count",
          "chunkId": "api/scans"
        }
      ],
      "sources": [
        {
          "chunkId": "api/scans",
          "url": "/docs/api/scans",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "scans",
        "iterate",
        "namespace",
        "filter",
        "mode",
        "creates",
        "asynchronous",
        "returns",
        "matching",
        "through",
        "results",
        "route",
        "while",
        "count",
        "single",
        "number",
        "synchronously"
      ]
    },
    {
      "id": "api/scans#auto-mode-policy",
      "kind": "section",
      "title": "Scan",
      "heading": "Auto-Mode Policy",
      "group": "API",
      "url": "/docs/api/scans#auto-mode-policy",
      "summary": "Auto mode ties cache freshness to the consistency watermark by tracking a per-namespace warmed-through marker; depending on whether the cache is empty, populated and fresh, or populated but stale, the gateway runs origin, serves cache, or serves cache while starting a background warm. When cache is used it adds a warmed-through upper-bound predicate so the scan is a stable warmed view.",
      "facts": [
        {
          "kind": "code",
          "literal": "cache_warmed_through",
          "chunkId": "api/scans#auto-mode-policy"
        },
        {
          "kind": "code",
          "literal": "cache_warmed_through >= watermark",
          "chunkId": "api/scans#auto-mode-policy"
        },
        {
          "kind": "code",
          "literal": "cache_warmed_through < watermark",
          "chunkId": "api/scans#auto-mode-policy"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at <= cache_warmed_through",
          "chunkId": "api/scans#auto-mode-policy"
        }
      ],
      "sources": [
        {
          "chunkId": "api/scans#auto-mode-policy",
          "url": "/docs/api/scans#auto-mode-policy",
          "anchor": "auto-mode-policy"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "auto",
        "mode",
        "policy",
        "ties",
        "cache",
        "freshness",
        "consistency",
        "watermark",
        "tracking",
        "namespace",
        "warmed",
        "through",
        "marker",
        "depending",
        "whether",
        "empty",
        "populated",
        "fresh",
        "stale",
        "gateway",
        "runs",
        "origin",
        "serves",
        "while",
        "starting",
        "background",
        "warm",
        "adds",
        "upper",
        "bound",
        "predicate",
        "scan",
        "stable",
        "view",
        "hevlayer",
        "upserted",
        "same",
        "strong",
        "consistent",
        "queries"
      ]
    },
    {
      "id": "api/scans#count-mode",
      "kind": "section",
      "title": "Scan",
      "heading": "Count Mode",
      "group": "API",
      "url": "/docs/api/scans#count-mode",
      "summary": "Count mode posts a filter and a source and returns a single count with the serving source and timing; snapshot reads are eligible only for a single leaf equality/membership filter on a field present in the latest snapshot, and unsupported filters fall through under auto or fail with a precondition error under an explicit snapshot source. Live count responses add bounded, timed-out, and shard fields.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/products/scans\nContent-Type: application/json\n\n{\n  \"mode\": \"count\",\n  \"source\": \"auto\",\n  \"filters\": [\"category\", \"Eq\", \"Electronics\"],\n  \"timeout_seconds\": 30\n}",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "{\n  \"count\": 4210,\n  \"served_by\": \"snapshot\",\n  \"snapshot_sha\": \"3f9e8b21\",\n  \"watermark_ms\": 1747300000123,\n  \"elapsed_ms\": 3\n}",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "{\n  \"count\": 4210,\n  \"served_by\": \"origin\",\n  \"bounded\": false,\n  \"timed_out\": false,\n  \"shards_saturated\": 0,\n  \"shards_total\": 1,\n  \"elapsed_ms\": 42\n}",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "auto",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "snapshot",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "cache",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "origin",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "Eq",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "In",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "fields[]",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "And",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "Or",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "Not",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "412 precondition_failed",
          "chunkId": "api/scans#count-mode"
        },
        {
          "kind": "code",
          "literal": "source: snapshot",
          "chunkId": "api/scans#count-mode"
        }
      ],
      "sources": [
        {
          "chunkId": "api/scans#count-mode",
          "url": "/docs/api/scans#count-mode",
          "anchor": "count-mode"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "count",
        "mode",
        "posts",
        "filter",
        "source",
        "returns",
        "single",
        "serving",
        "timing",
        "snapshot",
        "reads",
        "eligible",
        "only",
        "leaf",
        "equality",
        "membership",
        "field",
        "present",
        "latest",
        "unsupported",
        "filters",
        "fall",
        "through",
        "under",
        "auto",
        "fail",
        "precondition",
        "error",
        "explicit",
        "live",
        "responses",
        "bounded",
        "timed",
        "shard",
        "fields",
        "post",
        "namespaces",
        "products",
        "scans",
        "content"
      ]
    },
    {
      "id": "api/scans#id-mode",
      "kind": "section",
      "title": "Scan",
      "heading": "ID Mode",
      "group": "API",
      "url": "/docs/api/scans#id-mode",
      "summary": "ID mode posts a filter and returns an accepted job; once the job reports completed, the matching IDs are read paginated from a results route. Valid sources are auto, cache, and origin.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/products/scans\nContent-Type: application/json\n\n{\n  \"source\": \"auto\",\n  \"mode\": \"ids\",\n  \"filters\": [\"category\", \"Eq\", \"Electronics\"],\n  \"page_size\": 1000\n}",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "{\n  \"id\": \"scan-uuid\",\n  \"namespace\": \"products\",\n  \"source\": \"auto\",\n  \"effective_source\": \"origin\",\n  \"status\": \"running\",\n  \"progress\": 0,\n  \"documents_scanned\": 0,\n  \"created_at\": \"2026-05-26T10:00:00Z\"\n}",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/products/scans/scan-uuid/results?limit=1000&offset=0",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "{\n  \"ids\": [\"doc-1\", \"doc-2\"],\n  \"total\": 2\n}",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "mode",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "ids",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "auto",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "cache",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "origin",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "202 Accepted",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "status",
          "chunkId": "api/scans#id-mode"
        },
        {
          "kind": "code",
          "literal": "completed",
          "chunkId": "api/scans#id-mode"
        }
      ],
      "sources": [
        {
          "chunkId": "api/scans#id-mode",
          "url": "/docs/api/scans#id-mode",
          "anchor": "id-mode"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "mode",
        "posts",
        "filter",
        "returns",
        "accepted",
        "once",
        "reports",
        "completed",
        "matching",
        "read",
        "paginated",
        "results",
        "route",
        "valid",
        "sources",
        "auto",
        "cache",
        "origin",
        "post",
        "namespaces",
        "products",
        "scans",
        "content",
        "type",
        "application",
        "json",
        "source",
        "filters",
        "category",
        "electronics",
        "page",
        "size",
        "1000",
        "scan",
        "uuid",
        "namespace",
        "effective",
        "status",
        "running",
        "progress"
      ]
    },
    {
      "id": "api/scans#routes",
      "kind": "section",
      "title": "Scan",
      "heading": "Routes",
      "group": "API",
      "url": "/docs/api/scans#routes",
      "summary": "Lists the scan routes: create an ID job or return a count, list jobs, read one job, read completed results, and drop the in-memory job.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/{ns}/scans",
          "chunkId": "api/scans#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/scans",
          "chunkId": "api/scans#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/scans/{id}",
          "chunkId": "api/scans#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/scans/{id}/results",
          "chunkId": "api/scans#routes"
        },
        {
          "kind": "code",
          "literal": "DELETE /v2/namespaces/{ns}/scans/{id}",
          "chunkId": "api/scans#routes"
        }
      ],
      "sources": [
        {
          "chunkId": "api/scans#routes",
          "url": "/docs/api/scans#routes",
          "anchor": "routes"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "routes",
        "lists",
        "scan",
        "create",
        "return",
        "count",
        "list",
        "jobs",
        "read",
        "completed",
        "results",
        "drop",
        "memory",
        "post",
        "namespaces",
        "scans",
        "delete",
        "route",
        "method",
        "behavior",
        "namespace"
      ]
    },
    {
      "id": "api/search-history",
      "kind": "section",
      "title": "Query History",
      "heading": null,
      "group": "API",
      "url": "/docs/api/search-history",
      "summary": "Layer logs every served query to a durable per-namespace JSONL trail in S3, mirrored to NVMe for fast reads of recent entries. A sibling clickstream feed records fetch events that consumers tag back to a query, which makes a search session reconstructable for relevance tuning, A/B comparison, or incident review. Both surfaces are Layer-only.",
      "facts": [],
      "sources": [
        {
          "chunkId": "api/search-history",
          "url": "/docs/api/search-history",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "layer",
        "logs",
        "every",
        "served",
        "query",
        "durable",
        "namespace",
        "jsonl",
        "trail",
        "mirrored",
        "nvme",
        "fast",
        "recent",
        "reads",
        "records",
        "fetch",
        "events",
        "consumers",
        "back",
        "sibling",
        "clickstream",
        "feed",
        "making",
        "search",
        "session",
        "reconstructable",
        "relevance",
        "tuning",
        "comparison",
        "incident",
        "review",
        "both",
        "surfaces",
        "only",
        "history",
        "backed",
        "gateway",
        "serves",
        "cache",
        "downstream"
      ]
    },
    {
      "id": "api/search-history#clickstream-entry",
      "kind": "section",
      "title": "Query History",
      "heading": "Clickstream entry",
      "group": "API",
      "url": "/docs/api/search-history#clickstream-entry",
      "summary": "A clickstream entry records timestamps, a trace id joining it to the originating search-history entry, namespace, document id, tags, source, and whether the result was served from cache or an upstream fetch; the trace id is queryable to pull every event for a session.",
      "facts": [
        {
          "kind": "code",
          "literal": "{\n  \"events\": [\n    {\n      \"timestamp\": \"2026-05-22T08:00:02.143Z\",\n      \"timestamp_nanos\": 1747900802143000000,\n      \"trace_id\": \"f81d4fae-7dec-11d0-a765-00a0c91e6bf6\",\n      \"namespace\": \"products\",\n      \"doc_id\": \"asin-B08N5WRWNW\",\n      \"tags\": [\"session:abc123\"],\n      \"source\": \"fetch\",\n      \"served_from\": \"cache\"\n    }\n  ],\n  \"next_cursor\": \"1747900802142000000\"\n}",
          "chunkId": "api/search-history#clickstream-entry"
        },
        {
          "kind": "code",
          "literal": "trace_id",
          "chunkId": "api/search-history#clickstream-entry"
        },
        {
          "kind": "code",
          "literal": "served_from",
          "chunkId": "api/search-history#clickstream-entry"
        }
      ],
      "sources": [
        {
          "chunkId": "api/search-history#clickstream-entry",
          "url": "/docs/api/search-history#clickstream-entry",
          "anchor": "clickstream-entry"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "clickstream",
        "entry",
        "records",
        "timestamps",
        "trace",
        "joining",
        "originating",
        "search",
        "history",
        "namespace",
        "document",
        "tags",
        "source",
        "whether",
        "result",
        "served",
        "cache",
        "upstream",
        "fetch",
        "queryable",
        "pull",
        "every",
        "event",
        "session",
        "events",
        "timestamp",
        "2026",
        "22t08",
        "143z",
        "nanos",
        "1747900802143000000",
        "f81d4fae",
        "7dec",
        "11d0",
        "a765",
        "00a0c91e6bf6",
        "products",
        "asin",
        "b08n5wrwnw",
        "abc123"
      ]
    },
    {
      "id": "api/search-history#query-parameters",
      "kind": "section",
      "title": "Query History",
      "heading": "Query parameters",
      "group": "API",
      "url": "/docs/api/search-history#query-parameters",
      "summary": "History list calls accept a comma-separated tag filter with AND semantics, RFC3339 from/to time bounds, a pagination cursor returning entries strictly older than a given timestamp, and a capped limit.",
      "facts": [
        {
          "kind": "code",
          "literal": "tag",
          "chunkId": "api/search-history#query-parameters"
        },
        {
          "kind": "code",
          "literal": "from",
          "chunkId": "api/search-history#query-parameters"
        },
        {
          "kind": "code",
          "literal": "to",
          "chunkId": "api/search-history#query-parameters"
        },
        {
          "kind": "code",
          "literal": "before",
          "chunkId": "api/search-history#query-parameters"
        },
        {
          "kind": "code",
          "literal": "timestamp_nanos",
          "chunkId": "api/search-history#query-parameters"
        },
        {
          "kind": "code",
          "literal": "limit",
          "chunkId": "api/search-history#query-parameters"
        }
      ],
      "sources": [
        {
          "chunkId": "api/search-history#query-parameters",
          "url": "/docs/api/search-history#query-parameters",
          "anchor": "query-parameters"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "query",
        "parameters",
        "history",
        "list",
        "calls",
        "accept",
        "comma",
        "separated",
        "filter",
        "semantics",
        "rfc3339",
        "time",
        "bounds",
        "pagination",
        "cursor",
        "returning",
        "entries",
        "strictly",
        "older",
        "given",
        "timestamp",
        "capped",
        "limit",
        "before",
        "nanos",
        "param",
        "purpose",
        "every",
        "must",
        "match",
        "return",
        "timestampnanos",
        "default"
      ]
    },
    {
      "id": "api/search-history#routes",
      "kind": "section",
      "title": "Query History",
      "heading": "Routes",
      "group": "API",
      "url": "/docs/api/search-history#routes",
      "summary": "Two routes return the per-namespace query log and the correlated clickstream feed, both newest-first; versioned aliases are kept for client compatibility.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/search-history",
          "chunkId": "api/search-history#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/clickstream",
          "chunkId": "api/search-history#routes"
        },
        {
          "kind": "code",
          "literal": "/v1/",
          "chunkId": "api/search-history#routes"
        }
      ],
      "sources": [
        {
          "chunkId": "api/search-history#routes",
          "url": "/docs/api/search-history#routes",
          "anchor": "routes"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "routes",
        "return",
        "namespace",
        "query",
        "correlated",
        "clickstream",
        "feed",
        "both",
        "newest",
        "first",
        "versioned",
        "aliases",
        "held",
        "client",
        "compatibility",
        "namespaces",
        "search",
        "history",
        "route",
        "behavior",
        "fetch",
        "events",
        "versions",
        "identical"
      ]
    },
    {
      "id": "api/search-history#search-history-entry",
      "kind": "section",
      "title": "Query History",
      "heading": "Search history entry",
      "group": "API",
      "url": "/docs/api/search-history#search-history-entry",
      "summary": "A search-history entry records wall-clock and nanosecond timestamps (the cursor), a trace id joining to the clickstream, the caller-supplied raw query string, the freshness watermark used, a structured query summary, the top result ids in rank order, and caller-supplied segmentation tags.",
      "facts": [
        {
          "kind": "code",
          "literal": "timestamp",
          "chunkId": "api/search-history#search-history-entry"
        },
        {
          "kind": "code",
          "literal": "timestamp_nanos",
          "chunkId": "api/search-history#search-history-entry"
        },
        {
          "kind": "code",
          "literal": "trace_id",
          "chunkId": "api/search-history#search-history-entry"
        },
        {
          "kind": "code",
          "literal": "raw_query",
          "chunkId": "api/search-history#search-history-entry"
        },
        {
          "kind": "code",
          "literal": "x-hevlayer-search-query",
          "chunkId": "api/search-history#search-history-entry"
        },
        {
          "kind": "code",
          "literal": "stable_as_of",
          "chunkId": "api/search-history#search-history-entry"
        },
        {
          "kind": "code",
          "literal": "query",
          "chunkId": "api/search-history#search-history-entry"
        },
        {
          "kind": "code",
          "literal": "top_result_ids",
          "chunkId": "api/search-history#search-history-entry"
        },
        {
          "kind": "code",
          "literal": "tags",
          "chunkId": "api/search-history#search-history-entry"
        }
      ],
      "sources": [
        {
          "chunkId": "api/search-history#search-history-entry",
          "url": "/docs/api/search-history#search-history-entry",
          "anchor": "search-history-entry"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "search",
        "history",
        "entry",
        "records",
        "wall",
        "clock",
        "nanosecond",
        "timestamps",
        "cursor",
        "trace",
        "joining",
        "clickstream",
        "caller",
        "supplied",
        "query",
        "string",
        "freshness",
        "watermark",
        "structured",
        "summary",
        "result",
        "rank",
        "order",
        "segmentation",
        "tags",
        "timestamp",
        "nanos",
        "hevlayer",
        "stable",
        "entries",
        "2026",
        "22t08",
        "000z",
        "timestampnanos",
        "1747900800000000000",
        "namespace",
        "products",
        "traceid",
        "f81d4fae",
        "7dec"
      ]
    },
    {
      "id": "api/search-history#storage",
      "kind": "section",
      "title": "Query History",
      "heading": "Storage",
      "group": "API",
      "url": "/docs/api/search-history#storage",
      "summary": "History is stored as date-partitioned JSONL keyed by nanosecond timestamp; writes are best-effort and never block the query response, with the cache holding a recent window for fast reads and S3 as the durable store, so a cache outage degrades read latency but not durability.",
      "facts": [
        {
          "kind": "code",
          "literal": "search-history/{namespace}/{YYYY-MM-DD}/{timestamp_nanos}.jsonl",
          "chunkId": "api/search-history#storage"
        }
      ],
      "sources": [
        {
          "chunkId": "api/search-history#storage",
          "url": "/docs/api/search-history#storage",
          "anchor": "storage"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "storage",
        "history",
        "stored",
        "date",
        "partitioned",
        "jsonl",
        "keyed",
        "nanosecond",
        "timestamp",
        "writes",
        "best",
        "effort",
        "never",
        "block",
        "query",
        "response",
        "cache",
        "holding",
        "recent",
        "window",
        "fast",
        "reads",
        "durable",
        "store",
        "outage",
        "degrades",
        "read",
        "latency",
        "durability",
        "search",
        "namespace",
        "yyyy",
        "nanos",
        "timestampnanos",
        "aerospike",
        "holds",
        "list",
        "calls",
        "walk",
        "prefix"
      ]
    },
    {
      "id": "api/search-history#tag-contract",
      "kind": "section",
      "title": "Query History",
      "heading": "Tag contract",
      "group": "API",
      "url": "/docs/api/search-history#tag-contract",
      "summary": "Layer splits, trims, sorts, and dedupes tags from a header and query param before storing or matching them; commas always act as separators and cannot be escaped, and there are limits on tag count, tag length, and allowed characters. List filtering uses AND semantics, so all requested tags must match.",
      "facts": [
        {
          "kind": "code",
          "literal": "x-hevlayer-tags",
          "chunkId": "api/search-history#tag-contract"
        },
        {
          "kind": "code",
          "literal": "?tag=",
          "chunkId": "api/search-history#tag-contract"
        },
        {
          "kind": "code",
          "literal": "?tag=a,b",
          "chunkId": "api/search-history#tag-contract"
        }
      ],
      "sources": [
        {
          "chunkId": "api/search-history#tag-contract",
          "url": "/docs/api/search-history#tag-contract",
          "anchor": "tag-contract"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "contract",
        "layer",
        "splits",
        "trims",
        "sorts",
        "dedupes",
        "tags",
        "header",
        "query",
        "param",
        "before",
        "storing",
        "matching",
        "commas",
        "unescapable",
        "separators",
        "there",
        "caps",
        "count",
        "length",
        "allowed",
        "characters",
        "list",
        "filtering",
        "uses",
        "semantics",
        "requested",
        "must",
        "match",
        "hevlayer",
        "whitespace",
        "drops",
        "empty",
        "values",
        "cannot",
        "escaped",
        "limits",
        "limit",
        "value",
        "unique"
      ]
    },
    {
      "id": "api/search-history#writing-metadata",
      "kind": "section",
      "title": "Query History",
      "heading": "Writing metadata",
      "group": "API",
      "url": "/docs/api/search-history#writing-metadata",
      "summary": "Callers set a header to capture the human query input and another header for comma-separated segmentation tags, both exposed by the Python SDK on the query and history-list calls; the guidance is to keep the query text in the raw-query field and use tags only for segmentation.",
      "facts": [
        {
          "kind": "code",
          "literal": "query = await client.query_namespace(\n    \"products\",\n    {\"vector\": embedding, \"top_k\": 10, \"include_attributes\": [\"title\"]},\n    raw_query=\"wireless headphones\",\n    tags=[\"app:hev-shop\", \"surface:storefront\", \"route:search\", \"page:first\"],\n)\n\nhistory = await client.list_search_history(\n    \"products\",\n    tags=[\"app:hev-shop\", \"route:search\", \"page:first\"],\n    limit=20,\n)",
          "chunkId": "api/search-history#writing-metadata"
        },
        {
          "kind": "code",
          "literal": "x-hevlayer-search-query",
          "chunkId": "api/search-history#writing-metadata"
        },
        {
          "kind": "code",
          "literal": "x-hevlayer-tags",
          "chunkId": "api/search-history#writing-metadata"
        },
        {
          "kind": "code",
          "literal": "raw_query",
          "chunkId": "api/search-history#writing-metadata"
        },
        {
          "kind": "code",
          "literal": "tags",
          "chunkId": "api/search-history#writing-metadata"
        }
      ],
      "sources": [
        {
          "chunkId": "api/search-history#writing-metadata",
          "url": "/docs/api/search-history#writing-metadata",
          "anchor": "writing-metadata"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "writing",
        "metadata",
        "callers",
        "header",
        "capture",
        "human",
        "query",
        "input",
        "another",
        "comma",
        "separated",
        "segmentation",
        "tags",
        "both",
        "exposed",
        "python",
        "history",
        "list",
        "calls",
        "guidance",
        "keep",
        "text",
        "field",
        "only",
        "await",
        "client",
        "namespace",
        "products",
        "vector",
        "embedding",
        "include",
        "attributes",
        "title",
        "wireless",
        "headphones",
        "shop",
        "surface",
        "storefront",
        "route",
        "search"
      ]
    },
    {
      "id": "api/snapshots",
      "kind": "section",
      "title": "Snapshot History",
      "heading": null,
      "group": "API",
      "url": "/docs/api/snapshots",
      "summary": "Snapshots are materialized facet histograms for a namespace, carrying facet listings and counts; they are stored durably in S3, with the latest body mirrored into the cache. One route materializes a field on demand, and the history and body routes read the durable chronology written by the consistency watcher.",
      "facts": [
        {
          "kind": "code",
          "literal": "values[].v",
          "chunkId": "api/snapshots"
        },
        {
          "kind": "code",
          "literal": "values[].n",
          "chunkId": "api/snapshots"
        },
        {
          "kind": "code",
          "literal": "POST /snapshots",
          "chunkId": "api/snapshots"
        }
      ],
      "sources": [
        {
          "chunkId": "api/snapshots",
          "url": "/docs/api/snapshots",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "snapshots",
        "materialized",
        "facet",
        "histograms",
        "namespace",
        "carrying",
        "listings",
        "counts",
        "stored",
        "durably",
        "mirrored",
        "cache",
        "latest",
        "body",
        "route",
        "materializes",
        "field",
        "demand",
        "history",
        "routes",
        "read",
        "durable",
        "chronology",
        "written",
        "consistency",
        "watcher",
        "values",
        "post",
        "snapshot",
        "jobs",
        "bodies",
        "activity",
        "streams",
        "carry",
        "aerospike",
        "materialize"
      ]
    },
    {
      "id": "api/snapshots#activity",
      "kind": "section",
      "title": "Snapshot History",
      "heading": "Activity",
      "group": "API",
      "url": "/docs/api/snapshots#activity",
      "summary": "The snapshot activity stream returns snapshot lifecycle events filtered by a required epoch-ms lower bound, with optional limit, namespace filter, and pagination cursor; it covers snapshots only, as search history and clickstream have separate feeds.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v2/activity/snapshots?since=1747200000000&limit=50",
          "chunkId": "api/snapshots#activity"
        },
        {
          "kind": "code",
          "literal": "since",
          "chunkId": "api/snapshots#activity"
        },
        {
          "kind": "code",
          "literal": "ts_ms",
          "chunkId": "api/snapshots#activity"
        },
        {
          "kind": "code",
          "literal": "limit",
          "chunkId": "api/snapshots#activity"
        },
        {
          "kind": "code",
          "literal": "namespace",
          "chunkId": "api/snapshots#activity"
        },
        {
          "kind": "code",
          "literal": "cursor",
          "chunkId": "api/snapshots#activity"
        },
        {
          "kind": "code",
          "literal": "next_cursor",
          "chunkId": "api/snapshots#activity"
        }
      ],
      "sources": [
        {
          "chunkId": "api/snapshots#activity",
          "url": "/docs/api/snapshots#activity",
          "anchor": "activity"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "activity",
        "snapshot",
        "stream",
        "returns",
        "lifecycle",
        "events",
        "filtered",
        "required",
        "epoch",
        "lower",
        "bound",
        "optional",
        "limit",
        "namespace",
        "filter",
        "pagination",
        "cursor",
        "covers",
        "snapshots",
        "only",
        "search",
        "history",
        "clickstream",
        "separate",
        "feeds",
        "since",
        "1747200000000",
        "next",
        "query",
        "param",
        "purpose",
        "tsms",
        "default",
        "exact",
        "nextcursor"
      ]
    },
    {
      "id": "api/snapshots#configure-watched-fields",
      "kind": "section",
      "title": "Snapshot History",
      "heading": "Configure watched fields",
      "group": "API",
      "url": "/docs/api/snapshots#configure-watched-fields",
      "summary": "The consistency watcher materializes snapshots only for the facet fields it is told to watch, configured via an environment variable that maps each namespace to its facet fields (also available as a Helm value); the default is empty, which disables the snapshot writer. Auto-discovered namespaces are registered, but only listed fields are materialized, and a minimum-interval setting puts a floor on the time between writes.",
      "facts": [
        {
          "kind": "code",
          "literal": "export LAYER_FACET_FIELDS='{\n  \"products\": [\"category\", \"brand\"],\n  \"reviews\": [\"sentiment\", \"language\"]\n}'",
          "chunkId": "api/snapshots#configure-watched-fields"
        },
        {
          "kind": "code",
          "literal": "LAYER_FACET_FIELDS",
          "chunkId": "api/snapshots#configure-watched-fields"
        },
        {
          "kind": "code",
          "literal": "gateway.facetFields",
          "chunkId": "api/snapshots#configure-watched-fields"
        },
        {
          "kind": "code",
          "literal": "source: stored",
          "chunkId": "api/snapshots#configure-watched-fields"
        },
        {
          "kind": "code",
          "literal": "source: auto",
          "chunkId": "api/snapshots#configure-watched-fields"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces",
          "chunkId": "api/snapshots#configure-watched-fields"
        },
        {
          "kind": "code",
          "literal": "LAYER_SNAPSHOT_MIN_INTERVAL_MS",
          "chunkId": "api/snapshots#configure-watched-fields"
        },
        {
          "kind": "code",
          "literal": "300000",
          "chunkId": "api/snapshots#configure-watched-fields"
        }
      ],
      "sources": [
        {
          "chunkId": "api/snapshots#configure-watched-fields",
          "url": "/docs/api/snapshots#configure-watched-fields",
          "anchor": "configure-watched-fields"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "configure",
        "watched",
        "fields",
        "consistency",
        "watcher",
        "only",
        "materializes",
        "snapshots",
        "facet",
        "told",
        "watch",
        "configured",
        "environment",
        "variable",
        "mapping",
        "namespace",
        "also",
        "helm",
        "value",
        "default",
        "empty",
        "disables",
        "snapshot",
        "writer",
        "auto",
        "discovered",
        "namespaces",
        "registered",
        "listed",
        "materialized",
        "minimum",
        "interval",
        "setting",
        "floors",
        "time",
        "between",
        "writes",
        "export",
        "layer",
        "products"
      ]
    },
    {
      "id": "api/snapshots#create-a-snapshot-job",
      "kind": "section",
      "title": "Snapshot History",
      "heading": "Create a snapshot job",
      "group": "API",
      "url": "/docs/api/snapshots#create-a-snapshot-job",
      "summary": "Creating a snapshot job posts a field, source, and optional filter and returns an accepted job to poll; valid sources are auto, stored, cache, and origin, where stored is fastest for configured fields, cache supports filters it can evaluate, and origin is authoritative and persists the computed body to S3. Completed jobs include a content SHA when a body was materialized.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/products/snapshots\nContent-Type: application/json\n\n{\n  \"field\": \"category\",\n  \"source\": \"auto\",\n  \"filters\": [\"brand\", \"Eq\", \"Acme\"],\n  \"page_size\": 1000\n}",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        },
        {
          "kind": "code",
          "literal": "{\n  \"id\": \"snapshot-job-uuid\",\n  \"namespace\": \"products\",\n  \"field\": \"category\",\n  \"source\": \"auto\",\n  \"status\": \"running\",\n  \"progress\": 0,\n  \"documents_scanned\": 0,\n  \"created_at\": \"2026-05-26T10:00:00Z\"\n}",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/products/snapshot-jobs/snapshot-job-uuid",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        },
        {
          "kind": "code",
          "literal": "{\n  \"id\": \"snapshot-job-uuid\",\n  \"namespace\": \"products\",\n  \"field\": \"category\",\n  \"source\": \"origin\",\n  \"status\": \"completed\",\n  \"documents_scanned\": 12844,\n  \"sha\": \"3f9e8b21\",\n  \"stable_as_of\": 1747300000123\n}",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        },
        {
          "kind": "code",
          "literal": "auto",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        },
        {
          "kind": "code",
          "literal": "stored",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        },
        {
          "kind": "code",
          "literal": "cache",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        },
        {
          "kind": "code",
          "literal": "origin",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        },
        {
          "kind": "code",
          "literal": "202 Accepted",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        },
        {
          "kind": "code",
          "literal": "sha",
          "chunkId": "api/snapshots#create-a-snapshot-job"
        }
      ],
      "sources": [
        {
          "chunkId": "api/snapshots#create-a-snapshot-job",
          "url": "/docs/api/snapshots#create-a-snapshot-job",
          "anchor": "create-a-snapshot-job"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "create",
        "snapshot",
        "creating",
        "posts",
        "field",
        "source",
        "optional",
        "filter",
        "returns",
        "accepted",
        "poll",
        "valid",
        "sources",
        "auto",
        "stored",
        "cache",
        "origin",
        "fastest",
        "configured",
        "fields",
        "supports",
        "filters",
        "evaluate",
        "authoritative",
        "persists",
        "computed",
        "body",
        "completed",
        "jobs",
        "include",
        "content",
        "materialized",
        "post",
        "namespaces",
        "products",
        "snapshots",
        "type",
        "application",
        "json",
        "category"
      ]
    },
    {
      "id": "api/snapshots#history",
      "kind": "section",
      "title": "Snapshot History",
      "heading": "History",
      "group": "API",
      "url": "/docs/api/snapshots#history",
      "summary": "The history route lists durable snapshots newest-first as watermark/SHA pairs, accepting a capped limit and a before-cursor that takes 7-character SHA prefixes; it lists S3 keys only and does not read snapshot bodies.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/products/history?limit=20",
          "chunkId": "api/snapshots#history"
        },
        {
          "kind": "code",
          "literal": "[\n  {\"watermark_ms\": 1747300000123, \"sha\": \"3f9e8b21...\"},\n  {\"watermark_ms\": 1747299600045, \"sha\": \"a1c5b09f...\"}\n]",
          "chunkId": "api/snapshots#history"
        },
        {
          "kind": "code",
          "literal": "limit",
          "chunkId": "api/snapshots#history"
        },
        {
          "kind": "code",
          "literal": "before",
          "chunkId": "api/snapshots#history"
        }
      ],
      "sources": [
        {
          "chunkId": "api/snapshots#history",
          "url": "/docs/api/snapshots#history",
          "anchor": "history"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "history",
        "route",
        "lists",
        "durable",
        "snapshots",
        "newest",
        "first",
        "watermark",
        "pairs",
        "accepting",
        "capped",
        "limit",
        "before",
        "cursor",
        "takes",
        "character",
        "prefixes",
        "keys",
        "only",
        "does",
        "read",
        "snapshot",
        "bodies",
        "namespaces",
        "products",
        "1747300000123",
        "3f9e8b21",
        "1747299600045",
        "a1c5b09f",
        "watermarkms",
        "query",
        "param",
        "default",
        "purpose",
        "maximum",
        "entries",
        "returned",
        "none",
        "return",
        "older"
      ]
    },
    {
      "id": "api/snapshots#routes",
      "kind": "section",
      "title": "Snapshot History",
      "heading": "Routes",
      "group": "API",
      "url": "/docs/api/snapshots#routes",
      "summary": "Lists the snapshot routes: create an on-demand job for one field, list and read jobs, read durable history, fetch a full body by SHA or prefix, and read the cross-namespace snapshot-write activity stream.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/{ns}/snapshots",
          "chunkId": "api/snapshots#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/snapshot-jobs",
          "chunkId": "api/snapshots#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/snapshot-jobs/{id}",
          "chunkId": "api/snapshots#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/history",
          "chunkId": "api/snapshots#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/{ns}/snapshots/{sha}",
          "chunkId": "api/snapshots#routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/activity/snapshots",
          "chunkId": "api/snapshots#routes"
        }
      ],
      "sources": [
        {
          "chunkId": "api/snapshots#routes",
          "url": "/docs/api/snapshots#routes",
          "anchor": "routes"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "routes",
        "lists",
        "snapshot",
        "create",
        "demand",
        "field",
        "list",
        "read",
        "jobs",
        "durable",
        "history",
        "fetch",
        "full",
        "body",
        "prefix",
        "cross",
        "namespace",
        "write",
        "activity",
        "stream",
        "post",
        "namespaces",
        "snapshots",
        "route",
        "method",
        "behavior",
        "memory",
        "newest",
        "first",
        "char"
      ]
    },
    {
      "id": "api/snapshots#snapshot-body",
      "kind": "section",
      "title": "Snapshot History",
      "heading": "Snapshot body",
      "group": "API",
      "url": "/docs/api/snapshots#snapshot-body",
      "summary": "A snapshot body returns the namespace, watermark, SHA, and per-field facet listings with their values and counts, plus a skipped-fields section for fields above the distinct-value cap; fields present in the listings are complete, while over-cap fields are reported as skipped rather than partially materialized.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/products/snapshots/3f9e8b2",
          "chunkId": "api/snapshots#snapshot-body"
        },
        {
          "kind": "code",
          "literal": "{\n  \"namespace\": \"products\",\n  \"watermark_ms\": 1747300000123,\n  \"sha\": \"3f9e8b21\",\n  \"fields\": [\n    {\n      \"name\": \"category\",\n      \"values\": [\n        {\"v\": \"books\", \"n\": 1240},\n        {\"v\": \"electronics\", \"n\": 873}\n      ]\n    }\n  ],\n  \"fields_skipped\": [\n    {\n      \"name\": \"tags\",\n      \"reason\": \"exceeded_cap\",\n      \"distinct_observed\": 247000,\n      \"cap\": 10000\n    }\n  ]\n}",
          "chunkId": "api/snapshots#snapshot-body"
        },
        {
          "kind": "code",
          "literal": "fields[].values[].v",
          "chunkId": "api/snapshots#snapshot-body"
        },
        {
          "kind": "code",
          "literal": "fields[].values[].n",
          "chunkId": "api/snapshots#snapshot-body"
        },
        {
          "kind": "code",
          "literal": "fields[]",
          "chunkId": "api/snapshots#snapshot-body"
        },
        {
          "kind": "code",
          "literal": "fields_skipped[]",
          "chunkId": "api/snapshots#snapshot-body"
        }
      ],
      "sources": [
        {
          "chunkId": "api/snapshots#snapshot-body",
          "url": "/docs/api/snapshots#snapshot-body",
          "anchor": "snapshot-body"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "snapshot",
        "body",
        "returns",
        "namespace",
        "watermark",
        "field",
        "facet",
        "listings",
        "their",
        "values",
        "counts",
        "plus",
        "skipped",
        "fields",
        "section",
        "above",
        "distinct",
        "value",
        "present",
        "complete",
        "while",
        "reported",
        "rather",
        "partially",
        "materialized",
        "namespaces",
        "products",
        "snapshots",
        "3f9e8b2",
        "1747300000123",
        "3f9e8b21",
        "name",
        "category",
        "books",
        "1240",
        "electronics",
        "tags",
        "reason",
        "exceeded",
        "observed"
      ]
    },
    {
      "id": "api/warm-cache",
      "kind": "section",
      "title": "Warm cache",
      "heading": null,
      "group": "API",
      "url": "/docs/api/warm-cache",
      "summary": "Layer exposes two warm surfaces: a Turbopuffer-compatible warm hint that advises the upstream index to preload and additionally runs Layer-side warm steps, and a Layer-only shortcut that creates a gateway warm job.",
      "facts": [
        {
          "kind": "code",
          "literal": "hint_cache_warm",
          "chunkId": "api/warm-cache"
        },
        {
          "kind": "code",
          "literal": "warm",
          "chunkId": "api/warm-cache"
        },
        {
          "kind": "code",
          "literal": "GET /v1/namespaces/{ns}/hint_cache_warm",
          "chunkId": "api/warm-cache"
        },
        {
          "kind": "value",
          "literal": "Upstream.astro",
          "chunkId": "api/warm-cache"
        },
        {
          "kind": "value",
          "literal": "Callout.astro",
          "chunkId": "api/warm-cache"
        },
        {
          "kind": "value",
          "literal": "turbopuffer.com",
          "chunkId": "api/warm-cache"
        }
      ],
      "sources": [
        {
          "chunkId": "api/warm-cache",
          "url": "/docs/api/warm-cache",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "layer",
        "exposes",
        "warm",
        "surfaces",
        "turbopuffer",
        "compatible",
        "hint",
        "advises",
        "upstream",
        "index",
        "preload",
        "additionally",
        "runs",
        "side",
        "steps",
        "only",
        "shortcut",
        "creates",
        "gateway",
        "cache",
        "namespaces",
        "astro",
        "callout",
        "namespace",
        "nvme",
        "snapshot",
        "mirror",
        "hintcachewarm",
        "matches",
        "call",
        "load"
      ]
    },
    {
      "id": "api/warm-cache#cache-cold-behavior",
      "kind": "section",
      "title": "Warm cache",
      "heading": "Cache-cold behavior",
      "group": "API",
      "url": "/docs/api/warm-cache#cache-cold-behavior",
      "summary": "Warm jobs, cache scans, cache snapshot jobs, and pipeline chunk reads return a cache-cold error when the NVMe cache is unavailable, while fetch falls through to upstream with a miss-on-error marker; the split is deliberate because fetch is correctness-first and warming on a cold cache would be wasted work.",
      "facts": [
        {
          "kind": "code",
          "literal": "cache_cold",
          "chunkId": "api/warm-cache#cache-cold-behavior"
        },
        {
          "kind": "code",
          "literal": "x-layer-cache: miss-on-error",
          "chunkId": "api/warm-cache#cache-cold-behavior"
        }
      ],
      "sources": [
        {
          "chunkId": "api/warm-cache#cache-cold-behavior",
          "url": "/docs/api/warm-cache#cache-cold-behavior",
          "anchor": "cache-cold-behavior"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "cache",
        "cold",
        "behavior",
        "warm",
        "jobs",
        "scans",
        "snapshot",
        "pipeline",
        "chunk",
        "reads",
        "return",
        "error",
        "nvme",
        "unavailable",
        "while",
        "fetch",
        "falls",
        "through",
        "upstream",
        "miss",
        "marker",
        "split",
        "deliberate",
        "because",
        "correctness",
        "first",
        "warming",
        "would",
        "wasted",
        "work",
        "layer",
        "cachecold",
        "many",
        "fall",
        "turbopuffer",
        "instead",
        "outage",
        "must",
        "turn",
        "missing"
      ]
    },
    {
      "id": "api/warm-cache#hint-cache-warm",
      "kind": "section",
      "title": "Warm cache",
      "heading": "Hint-cache warm",
      "group": "API",
      "url": "/docs/api/warm-cache#hint-cache-warm",
      "summary": "The warm-hint route runs three default-on Layer steps (forward the hint upstream, start an origin warm job to backfill the cache, and mirror the latest snapshot body into NVMe), each independently disableable via query params; the response reports per-step status and includes a pollable warm job when the documents step is enabled.",
      "facts": [
        {
          "kind": "code",
          "literal": "GET /v1/namespaces/products/hint_cache_warm",
          "chunkId": "api/warm-cache#hint-cache-warm"
        },
        {
          "kind": "code",
          "literal": "GET /v1/namespaces/products/hint_cache_warm?turbopuffer=false&documents=false&snapshots=true",
          "chunkId": "api/warm-cache#hint-cache-warm"
        },
        {
          "kind": "code",
          "literal": "turbopuffer=true",
          "chunkId": "api/warm-cache#hint-cache-warm"
        },
        {
          "kind": "code",
          "literal": "documents=true",
          "chunkId": "api/warm-cache#hint-cache-warm"
        },
        {
          "kind": "code",
          "literal": "snapshots=true",
          "chunkId": "api/warm-cache#hint-cache-warm"
        },
        {
          "kind": "code",
          "literal": "documents",
          "chunkId": "api/warm-cache#hint-cache-warm"
        },
        {
          "kind": "code",
          "literal": "/warm-jobs/{id}",
          "chunkId": "api/warm-cache#hint-cache-warm"
        }
      ],
      "sources": [
        {
          "chunkId": "api/warm-cache#hint-cache-warm",
          "url": "/docs/api/warm-cache#hint-cache-warm",
          "anchor": "hint-cache-warm"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "hint",
        "cache",
        "warm",
        "route",
        "runs",
        "three",
        "default",
        "layer",
        "steps",
        "forward",
        "upstream",
        "start",
        "origin",
        "backfill",
        "mirror",
        "latest",
        "snapshot",
        "body",
        "nvme",
        "independently",
        "disableable",
        "query",
        "params",
        "response",
        "reports",
        "step",
        "status",
        "includes",
        "pollable",
        "documents",
        "enabled",
        "namespaces",
        "products",
        "turbopuffer",
        "false",
        "snapshots",
        "true",
        "jobs",
        "hintcachewarm",
        "side"
      ]
    },
    {
      "id": "api/warm-cache#layer-warm",
      "kind": "section",
      "title": "Warm cache",
      "heading": "Layer warm",
      "group": "API",
      "url": "/docs/api/warm-cache#layer-warm",
      "summary": "The Layer warm route creates an asynchronous job that pages through Turbopuffer, backfills the cache, and refreshes the warmed-through marker, intended for bootstrapping a namespace whose data was written outside the gateway; it returns an accepted warm job to poll.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/products/warm?page_size=1000",
          "chunkId": "api/warm-cache#layer-warm"
        },
        {
          "kind": "code",
          "literal": "{\n  \"id\": \"warm-job-uuid\",\n  \"namespace\": \"products\",\n  \"status\": \"running\",\n  \"progress\": 0,\n  \"documents_scanned\": 0,\n  \"created_at\": \"2026-05-26T10:00:00Z\"\n}",
          "chunkId": "api/warm-cache#layer-warm"
        },
        {
          "kind": "code",
          "literal": "GET /v2/namespaces/products/warm-jobs/warm-job-uuid",
          "chunkId": "api/warm-cache#layer-warm"
        },
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/{ns}/warm",
          "chunkId": "api/warm-cache#layer-warm"
        },
        {
          "kind": "code",
          "literal": "cache_warmed_through",
          "chunkId": "api/warm-cache#layer-warm"
        },
        {
          "kind": "code",
          "literal": "202 Accepted",
          "chunkId": "api/warm-cache#layer-warm"
        }
      ],
      "sources": [
        {
          "chunkId": "api/warm-cache#layer-warm",
          "url": "/docs/api/warm-cache#layer-warm",
          "anchor": "layer-warm"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "layer",
        "warm",
        "route",
        "creates",
        "asynchronous",
        "pages",
        "through",
        "turbopuffer",
        "backfills",
        "cache",
        "refreshes",
        "warmed",
        "marker",
        "intended",
        "bootstrapping",
        "namespace",
        "whose",
        "data",
        "written",
        "outside",
        "gateway",
        "returns",
        "accepted",
        "poll",
        "post",
        "namespaces",
        "products",
        "page",
        "size",
        "1000",
        "uuid",
        "status",
        "running",
        "progress",
        "documents",
        "scanned",
        "created",
        "2026",
        "26t10",
        "jobs"
      ]
    },
    {
      "id": "api/write",
      "kind": "section",
      "title": "Write & Stage",
      "heading": null,
      "group": "API",
      "url": "/docs/api/write",
      "summary": "The write path is wire-compatible with the upstream write endpoint, with the documented shape showing only what Layer adds and the upstream docs covering the full request schema.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/{ns}",
          "chunkId": "api/write"
        },
        {
          "kind": "value",
          "literal": "Upstream.astro",
          "chunkId": "api/write"
        },
        {
          "kind": "value",
          "literal": "turbopuffer.com",
          "chunkId": "api/write"
        }
      ],
      "sources": [
        {
          "chunkId": "api/write",
          "url": "/docs/api/write",
          "anchor": null
        }
      ],
      "mode": "source-primary",
      "terms": [
        "write",
        "path",
        "wire",
        "compatible",
        "upstream",
        "endpoint",
        "documented",
        "shape",
        "showing",
        "only",
        "layer",
        "adds",
        "docs",
        "covering",
        "full",
        "request",
        "schema",
        "post",
        "namespaces",
        "astro",
        "turbopuffer",
        "upsert",
        "delete",
        "patch",
        "stage",
        "rows",
        "namespace",
        "below",
        "shows"
      ]
    },
    {
      "id": "api/write#patch",
      "kind": "section",
      "title": "Write & Stage",
      "heading": "Patch",
      "group": "API",
      "url": "/docs/api/write#patch",
      "summary": "Patch preserves unspecified attributes and maps to the upstream patch-rows operation, but vectors cannot be patched (re-upsert the full document instead), and the upsert-time stamp is bumped on every patch so watermark-filtered reads see the patched row only after it is indexed.",
      "facts": [
        {
          "kind": "code",
          "literal": "PATCH /v2/namespaces/products\nContent-Type: application/json\n\n{\n  \"patches\": [\n    {\"id\": \"asin-B08N5WRWNW\", \"attributes\": {\"category\": \"Audio\"}}\n  ]\n}",
          "chunkId": "api/write#patch"
        },
        {
          "kind": "code",
          "literal": "patch_rows",
          "chunkId": "api/write#patch"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at",
          "chunkId": "api/write#patch"
        }
      ],
      "sources": [
        {
          "chunkId": "api/write#patch",
          "url": "/docs/api/write#patch",
          "anchor": "patch"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "patch",
        "preserves",
        "unspecified",
        "attributes",
        "maps",
        "upstream",
        "rows",
        "operation",
        "vectors",
        "cannot",
        "patched",
        "upsert",
        "full",
        "document",
        "instead",
        "time",
        "stamp",
        "bumped",
        "every",
        "watermark",
        "filtered",
        "reads",
        "only",
        "after",
        "indexed",
        "namespaces",
        "products",
        "content",
        "type",
        "application",
        "json",
        "patches",
        "asin",
        "b08n5wrwnw",
        "category",
        "audio",
        "hevlayer",
        "upserted",
        "turbopuffer",
        "patchrows"
      ]
    },
    {
      "id": "api/write#pipeline-stage",
      "kind": "section",
      "title": "Write & Stage",
      "heading": "Pipeline stage",
      "group": "API",
      "url": "/docs/api/write#pipeline-stage",
      "summary": "When a document is part of a pipeline the writer does not talk to the namespace directly; a CPU worker hands chunks to the pipeline, a GPU worker writes vectors back, and the gateway performs the namespace upsert. Staging stores chunks in the cache and marks the document pending, and re-staging the same id replaces the chunks and resets state.",
      "facts": [
        {
          "kind": "code",
          "literal": "PUT /v2/pipelines/product-images/documents/asin-B08N5WRWNW\nContent-Type: application/json\n\n{\n  \"chunks\": [\n    {\"id\": \"asin-B08N5WRWNW-0\", \"text\": \"Wireless noise-cancelling headphones\"},\n    {\"id\": \"asin-B08N5WRWNW-1\", \"text\": \"40-hour battery life\", \"metadata\": {\"page\": 2}}\n  ]\n}",
          "chunkId": "api/write#pipeline-stage"
        },
        {
          "kind": "code",
          "literal": "pending",
          "chunkId": "api/write#pipeline-stage"
        }
      ],
      "sources": [
        {
          "chunkId": "api/write#pipeline-stage",
          "url": "/docs/api/write#pipeline-stage",
          "anchor": "pipeline-stage"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "pipeline",
        "stage",
        "document",
        "part",
        "writer",
        "does",
        "talk",
        "namespace",
        "directly",
        "worker",
        "hands",
        "chunks",
        "writes",
        "vectors",
        "back",
        "gateway",
        "performs",
        "upsert",
        "staging",
        "stores",
        "cache",
        "marks",
        "pending",
        "same",
        "replaces",
        "resets",
        "state",
        "pipelines",
        "product",
        "images",
        "documents",
        "asin",
        "b08n5wrwnw",
        "content",
        "type",
        "application",
        "json",
        "text",
        "wireless",
        "noise"
      ]
    },
    {
      "id": "api/write#side-effects",
      "kind": "section",
      "title": "Write & Stage",
      "heading": "Side effects",
      "group": "API",
      "url": "/docs/api/write#side-effects",
      "summary": "Writes have two side effects: a best-effort NVMe cache mirror written before the upstream call that does not roll back on failure (resolved by re-sending the upsert), and the snapshot watcher re-evaluating freshness on its next poll and materializing a new snapshot if the histogram shape changed.",
      "facts": [],
      "sources": [
        {
          "chunkId": "api/write#side-effects",
          "url": "/docs/api/write#side-effects",
          "anchor": "side-effects"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "side",
        "effects",
        "writes",
        "best",
        "effort",
        "nvme",
        "cache",
        "mirror",
        "written",
        "before",
        "upstream",
        "call",
        "does",
        "roll",
        "back",
        "failure",
        "resolved",
        "sending",
        "upsert",
        "snapshot",
        "watcher",
        "evaluating",
        "freshness",
        "next",
        "poll",
        "materializing",
        "histogram",
        "shape",
        "changed",
        "effect",
        "behavior",
        "here",
        "doesn",
        "gateway",
        "briefly",
        "didn",
        "reach",
        "index",
        "resolves",
        "evaluates"
      ]
    },
    {
      "id": "api/write#upsert-and-delete",
      "kind": "section",
      "title": "Write & Stage",
      "heading": "Upsert and delete",
      "group": "API",
      "url": "/docs/api/write#upsert-and-delete",
      "summary": "Upsert and delete post lists of documents to upsert and ids to delete, returning success once the upstream write succeeds, an error when both lists are empty, and an upstream-failure error otherwise; NVMe cache writes happen first as a non-blocking best-effort side effect, and every upsert is server-stamped with a hidden upsert-time attribute that powers query consistency.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/products\nContent-Type: application/json\n\n{\n  \"upserts\": [\n    {\n      \"id\": \"asin-B08N5WRWNW\",\n      \"vector\": [0.0012, -0.043],\n      \"attributes\": {\"title\": \"Wireless headphones\", \"category\": \"Electronics\"}\n    }\n  ],\n  \"deletes\": [\"asin-old-001\"]\n}",
          "chunkId": "api/write#upsert-and-delete"
        },
        {
          "kind": "code",
          "literal": "upserts",
          "chunkId": "api/write#upsert-and-delete"
        },
        {
          "kind": "code",
          "literal": "deletes",
          "chunkId": "api/write#upsert-and-delete"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at",
          "chunkId": "api/write#upsert-and-delete"
        }
      ],
      "sources": [
        {
          "chunkId": "api/write#upsert-and-delete",
          "url": "/docs/api/write#upsert-and-delete",
          "anchor": "upsert-and-delete"
        }
      ],
      "mode": "source-primary",
      "terms": [
        "upsert",
        "delete",
        "post",
        "lists",
        "documents",
        "returning",
        "success",
        "once",
        "upstream",
        "write",
        "succeeds",
        "error",
        "both",
        "empty",
        "failure",
        "otherwise",
        "nvme",
        "cache",
        "writes",
        "happen",
        "first",
        "blocking",
        "best",
        "effort",
        "side",
        "effect",
        "every",
        "server",
        "stamped",
        "hidden",
        "time",
        "attribute",
        "powers",
        "query",
        "consistency",
        "namespaces",
        "products",
        "content",
        "type",
        "application"
      ]
    },
    {
      "id": "concepts",
      "kind": "section",
      "title": "Concepts",
      "heading": null,
      "group": "Overview",
      "url": "/docs/concepts",
      "summary": "Introduces how the gateway composes Turbopuffer, the NVMe cache, PostgreSQL, S3, and metrics, and the core nouns the reader will work with.",
      "facts": [],
      "sources": [
        {
          "chunkId": "concepts",
          "url": "/docs/concepts",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "introduces",
        "gateway",
        "composes",
        "turbopuffer",
        "nvme",
        "cache",
        "postgresql",
        "metrics",
        "core",
        "nouns",
        "reader",
        "work"
      ]
    },
    {
      "id": "concepts#control-loops",
      "kind": "section",
      "title": "Concepts",
      "heading": "Control loops",
      "group": "Overview",
      "url": "/docs/concepts#control-loops",
      "summary": "Layer uses a control loop as a core primitive that reconciles index state against metrics from the search system, which is how it applies row-level transformations and keeps an index's stable view current; related concepts are UDFs, snapshots, and the stable watermark.",
      "facts": [],
      "sources": [
        {
          "chunkId": "concepts#control-loops",
          "url": "/docs/concepts#control-loops",
          "anchor": "control-loops"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "control",
        "loops",
        "layer",
        "uses",
        "loop",
        "core",
        "primitive",
        "reconciles",
        "index",
        "state",
        "against",
        "metrics",
        "search",
        "system",
        "applies",
        "level",
        "transformations",
        "keeps",
        "stable",
        "view",
        "current",
        "related",
        "concepts",
        "udfs",
        "snapshots",
        "watermark",
        "managing",
        "indexes",
        "emitted"
      ]
    },
    {
      "id": "concepts#gateway-enhancements",
      "kind": "section",
      "title": "Concepts",
      "heading": "Gateway enhancements",
      "group": "Overview",
      "url": "/docs/concepts#gateway-enhancements",
      "summary": "The gateway extends the search system with common query and filtering primitives using reserved attributes, and exposes everything through a single client so applications route every call through the gateway; Layer works best when traffic flows through it consistently, and schema changes on reserved attributes degrade gracefully rather than breaking outright.",
      "facts": [
        {
          "kind": "code",
          "literal": "_hevlayer_*",
          "chunkId": "concepts#gateway-enhancements"
        }
      ],
      "sources": [
        {
          "chunkId": "concepts#gateway-enhancements",
          "url": "/docs/concepts#gateway-enhancements",
          "anchor": "gateway-enhancements"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "gateway",
        "enhancements",
        "extends",
        "search",
        "system",
        "common",
        "query",
        "filtering",
        "primitives",
        "reserved",
        "attributes",
        "exposes",
        "everything",
        "through",
        "single",
        "client",
        "applications",
        "route",
        "every",
        "call",
        "layer",
        "works",
        "best",
        "traffic",
        "flows",
        "consistently",
        "schema",
        "changes",
        "degrade",
        "gracefully",
        "rather",
        "breaking",
        "outright",
        "hevlayer",
        "helpful",
        "patterns",
        "changing",
        "those",
        "breaks",
        "guarantees"
      ]
    },
    {
      "id": "concepts#glossary",
      "kind": "section",
      "title": "Concepts",
      "heading": "Glossary",
      "group": "Overview",
      "url": "/docs/concepts#glossary",
      "summary": "Defines Layer's core nouns: namespace, document, cache, stable watermark, pipeline, snapshot, facet listing and count, result count, scan, UDF, gateway, operator, shard, CRD, and PromQL, with a one-line current meaning for each.",
      "facts": [
        {
          "kind": "code",
          "literal": "/v2/namespaces/{namespace}",
          "chunkId": "concepts#glossary"
        },
        {
          "kind": "code",
          "literal": "fields[].values[].v",
          "chunkId": "concepts#glossary"
        },
        {
          "kind": "code",
          "literal": "fields[].values[].n",
          "chunkId": "concepts#glossary"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_shard",
          "chunkId": "concepts#glossary"
        }
      ],
      "sources": [
        {
          "chunkId": "concepts#glossary",
          "url": "/docs/concepts#glossary",
          "anchor": "glossary"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "glossary",
        "defines",
        "layer",
        "core",
        "nouns",
        "namespace",
        "document",
        "cache",
        "stable",
        "watermark",
        "pipeline",
        "snapshot",
        "facet",
        "listing",
        "count",
        "result",
        "scan",
        "gateway",
        "operator",
        "shard",
        "promql",
        "line",
        "current",
        "meaning",
        "namespaces",
        "fields",
        "values",
        "hevlayer",
        "concept",
        "turbopuffer",
        "addressed",
        "through",
        "plus",
        "attributes",
        "optionally",
        "vector",
        "writing",
        "searching",
        "nvme",
        "backed"
      ]
    },
    {
      "id": "concepts#kubernetes-autoscaling",
      "kind": "section",
      "title": "Concepts",
      "heading": "Kubernetes autoscaling",
      "group": "Overview",
      "url": "/docs/concepts#kubernetes-autoscaling",
      "summary": "Because Layer is stateless, every tier autoscales independently: a node autoscaler handles node-level scaling, and a pod autoscaler scales against signals from an embedded PostgreSQL queue whose data is used only for scaling decisions and holds no state that cannot be recovered.",
      "facts": [],
      "sources": [
        {
          "chunkId": "concepts#kubernetes-autoscaling",
          "url": "/docs/concepts#kubernetes-autoscaling",
          "anchor": "kubernetes-autoscaling"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "kubernetes",
        "autoscaling",
        "because",
        "layer",
        "stateless",
        "every",
        "tier",
        "autoscales",
        "independently",
        "node",
        "autoscaler",
        "handles",
        "level",
        "scaling",
        "scales",
        "against",
        "signals",
        "embedded",
        "postgresql",
        "queue",
        "whose",
        "data",
        "decisions",
        "only",
        "carries",
        "recoverable",
        "state",
        "autoscale",
        "karpenter",
        "keda",
        "pods",
        "system"
      ]
    },
    {
      "id": "concepts#observability-as-code",
      "kind": "section",
      "title": "Concepts",
      "heading": "Observability as code",
      "group": "Overview",
      "url": "/docs/concepts#observability-as-code",
      "summary": "Layer's observability contract lives in the service itself: the gateway emits a self-describing catalog of every metric (names, labels, example PromQL) so the metric surface is code rather than hand-maintained config, and the bundled dashboard, external automation, and an embedded Prometheus-compatible metrics store all read from it.",
      "facts": [],
      "sources": [
        {
          "chunkId": "concepts#observability-as-code",
          "url": "/docs/concepts#observability-as-code",
          "anchor": "observability-as-code"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "observability",
        "code",
        "layer",
        "contract",
        "lives",
        "service",
        "itself",
        "gateway",
        "emits",
        "self",
        "describing",
        "catalog",
        "every",
        "metric",
        "names",
        "labels",
        "example",
        "promql",
        "surface",
        "rather",
        "hand",
        "maintained",
        "config",
        "bundled",
        "dashboard",
        "external",
        "automation",
        "embedded",
        "prometheus",
        "compatible",
        "metrics",
        "store",
        "read",
        "defined",
        "exports",
        "victoriametrics",
        "instance",
        "lets",
        "against",
        "series"
      ]
    },
    {
      "id": "concepts#pull-through-cache",
      "kind": "section",
      "title": "Concepts",
      "heading": "Pull-through cache",
      "group": "Overview",
      "url": "/docs/concepts#pull-through-cache",
      "summary": "Document reads are served by a pull-through cache that checks the NVMe-backed cache first and, on a miss, reads through to origin (or S3 for snapshots), returns the row, and backfills best-effort. The cache is a read accelerator, not a hard dependency, so reads still succeed if it is unavailable. One logical cache serves every read path, with paths separated by set.",
      "facts": [
        {
          "kind": "code",
          "literal": "set",
          "chunkId": "concepts#pull-through-cache"
        }
      ],
      "sources": [
        {
          "chunkId": "concepts#pull-through-cache",
          "url": "/docs/concepts#pull-through-cache",
          "anchor": "pull-through-cache"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "pull",
        "through",
        "cache",
        "document",
        "reads",
        "served",
        "checks",
        "nvme",
        "backed",
        "first",
        "miss",
        "origin",
        "snapshots",
        "returns",
        "backfills",
        "best",
        "effort",
        "read",
        "accelerator",
        "hard",
        "dependency",
        "still",
        "succeed",
        "unavailable",
        "logical",
        "serves",
        "every",
        "path",
        "separated",
        "gateway",
        "aerospike",
        "turbopuffer",
        "fall",
        "different",
        "uses",
        "fetch",
        "snapshot",
        "field",
        "values"
      ]
    },
    {
      "id": "concepts#scattergather",
      "kind": "section",
      "title": "Concepts",
      "heading": "Scatter/gather",
      "group": "Overview",
      "url": "/docs/concepts#scattergather",
      "summary": "Layer can partition a single namespace into hash-bucket shards by assigning each row a reserved shard attribute, then scatters a query to every bucket in parallel and gathers, merges, and re-ranks the results down to the requested top-k; sharding is invisible to the client and the same path backs result count, scans, and UDF discovery scans.",
      "facts": [
        {
          "kind": "code",
          "literal": "_hevlayer_shard",
          "chunkId": "concepts#scattergather"
        },
        {
          "kind": "code",
          "literal": "top_k",
          "chunkId": "concepts#scattergather"
        },
        {
          "kind": "flag",
          "literal": "-filtered",
          "chunkId": "concepts#scattergather"
        }
      ],
      "sources": [
        {
          "chunkId": "concepts#scattergather",
          "url": "/docs/concepts#scattergather",
          "anchor": "scattergather"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "scatter",
        "gather",
        "layer",
        "partition",
        "single",
        "namespace",
        "hash",
        "bucket",
        "shards",
        "assigning",
        "reserved",
        "shard",
        "attribute",
        "scatters",
        "query",
        "every",
        "parallel",
        "gathers",
        "merges",
        "ranks",
        "results",
        "down",
        "requested",
        "sharding",
        "invisible",
        "client",
        "same",
        "path",
        "backs",
        "result",
        "count",
        "scans",
        "discovery",
        "hevlayer",
        "filtered",
        "buckets",
        "hevlayershard",
        "xxh64",
        "modulo",
        "gateway"
      ]
    },
    {
      "id": "dashboard",
      "kind": "section",
      "title": "Dashboard",
      "heading": null,
      "group": "Guides",
      "url": "/docs/dashboard",
      "summary": "The Layer dashboard is the in-cluster operator surface that reads only from the same gateway API customers use, surfacing the views that justify Layer's role between an application and its vector store; managed deployments reach it at a hosted URL and self-hosted installs expose it via a Service.",
      "facts": [
        {
          "kind": "code",
          "literal": "https://dashboard.hevlayer.com",
          "chunkId": "dashboard"
        },
        {
          "kind": "code",
          "literal": "layer-dashboard",
          "chunkId": "dashboard"
        },
        {
          "kind": "value",
          "literal": "Callout.astro",
          "chunkId": "dashboard"
        }
      ],
      "sources": [
        {
          "chunkId": "dashboard",
          "url": "/docs/dashboard",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "layer",
        "dashboard",
        "cluster",
        "operator",
        "surface",
        "reads",
        "only",
        "same",
        "gateway",
        "customers",
        "surfacing",
        "views",
        "justify",
        "role",
        "between",
        "application",
        "vector",
        "store",
        "managed",
        "deployments",
        "reach",
        "hosted",
        "self",
        "installs",
        "expose",
        "service",
        "https",
        "hevlayer",
        "callout",
        "astro",
        "pipeline",
        "worker",
        "scaling",
        "read",
        "write",
        "health",
        "cost",
        "observability",
        "operators",
        "ships"
      ]
    },
    {
      "id": "dashboard#console",
      "kind": "section",
      "title": "Dashboard",
      "heading": "Console",
      "group": "Guides",
      "url": "/docs/dashboard#console",
      "summary": "The console is the first operator view, with an at-a-glance stripe of single-number cards (queries/s, indexed rows/s, fetch latency, cache hit ratio, error budget burn) that link into matching panels, and a newest-first activity log backed by the snapshot-activity and search-history feeds, with URL-persisted filters.",
      "facts": [
        {
          "kind": "code",
          "literal": "/v2/activity/snapshots",
          "chunkId": "dashboard#console"
        }
      ],
      "sources": [
        {
          "chunkId": "dashboard#console",
          "url": "/docs/dashboard#console",
          "anchor": "console"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "console",
        "first",
        "operator",
        "view",
        "glance",
        "stripe",
        "single",
        "number",
        "cards",
        "queries",
        "indexed",
        "rows",
        "fetch",
        "latency",
        "cache",
        "ratio",
        "error",
        "budget",
        "burn",
        "link",
        "matching",
        "panels",
        "newest",
        "activity",
        "backed",
        "snapshot",
        "search",
        "history",
        "feeds",
        "persisted",
        "filters",
        "snapshots",
        "opens",
        "stripes",
        "card",
        "links",
        "read",
        "write",
        "observe",
        "panel"
      ]
    },
    {
      "id": "dashboard#cost",
      "kind": "section",
      "title": "Dashboard",
      "heading": "Cost",
      "group": "Guides",
      "url": "/docs/dashboard#cost",
      "summary": "The cost view is a stacked-area chart driven by cost endpoints that splits spend across AWS infrastructure lines (from CloudWatch and the AWS Pricing API) and Turbopuffer lines (from usage metrics times a code-resident rate card), with an instance picker projecting the impact of changing instance types; per-namespace attribution is intentionally not modeled.",
      "facts": [
        {
          "kind": "code",
          "literal": "/v2/cost",
          "chunkId": "dashboard#cost"
        },
        {
          "kind": "code",
          "literal": "/v2/cost/timeseries",
          "chunkId": "dashboard#cost"
        },
        {
          "kind": "code",
          "literal": "/v2/cost/rate-card",
          "chunkId": "dashboard#cost"
        }
      ],
      "sources": [
        {
          "chunkId": "dashboard#cost",
          "url": "/docs/dashboard#cost",
          "anchor": "cost"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "cost",
        "view",
        "stacked",
        "area",
        "chart",
        "driven",
        "endpoints",
        "splits",
        "spend",
        "across",
        "infrastructure",
        "lines",
        "cloudwatch",
        "pricing",
        "turbopuffer",
        "usage",
        "metrics",
        "times",
        "code",
        "resident",
        "rate",
        "card",
        "instance",
        "picker",
        "projecting",
        "impact",
        "changing",
        "types",
        "namespace",
        "attribution",
        "intentionally",
        "modeled",
        "timeseries",
        "compute",
        "computed",
        "storage",
        "writes",
        "queries",
        "uses",
        "endpoint"
      ]
    },
    {
      "id": "dashboard#data",
      "kind": "section",
      "title": "Dashboard",
      "heading": "Data",
      "group": "Guides",
      "url": "/docs/dashboard#data",
      "summary": "The data view is the namespace inventory; drilling into a namespace shows schema and approximate row count, recent snapshot SHAs with histograms and skipped-field markers, current freshness signals, the governing Index policy fields, and a unified jobs panel. Two operator actions live here: trigger an on-demand snapshot and delete the namespace behind a confirm dialog.",
      "facts": [
        {
          "kind": "code",
          "literal": "stable_as_of",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "is_stable",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "distanceMetric",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "cache.warming.threads",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "Index",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "POST /v2/namespaces/{ns}/snapshots",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "origin",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "auto",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "stored",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "cache",
          "chunkId": "dashboard#data"
        },
        {
          "kind": "code",
          "literal": "DELETE /v2/namespaces/{ns}",
          "chunkId": "dashboard#data"
        }
      ],
      "sources": [
        {
          "chunkId": "dashboard#data",
          "url": "/docs/dashboard#data",
          "anchor": "data"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "data",
        "view",
        "namespace",
        "inventory",
        "drilling",
        "shows",
        "schema",
        "approximate",
        "count",
        "recent",
        "snapshot",
        "shas",
        "histograms",
        "skipped",
        "field",
        "markers",
        "current",
        "freshness",
        "signals",
        "governing",
        "index",
        "policy",
        "fields",
        "unified",
        "jobs",
        "panel",
        "operator",
        "actions",
        "live",
        "here",
        "trigger",
        "demand",
        "delete",
        "behind",
        "confirm",
        "dialog",
        "stable",
        "distancemetric",
        "cache",
        "warming"
      ]
    },
    {
      "id": "dashboard#layout",
      "kind": "section",
      "title": "Dashboard",
      "heading": "Layout",
      "group": "Guides",
      "url": "/docs/dashboard#layout",
      "summary": "The dashboard groups everything into six tabs: console (what is happening now), data (what is in the indexes), read (query health), write (write flow and pipelines), cost (spend over time), and observe (the metrics catalog by family).",
      "facts": [],
      "sources": [
        {
          "chunkId": "dashboard#layout",
          "url": "/docs/dashboard#layout",
          "anchor": "layout"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "layout",
        "dashboard",
        "groups",
        "everything",
        "tabs",
        "console",
        "happening",
        "data",
        "indexes",
        "read",
        "query",
        "health",
        "write",
        "flow",
        "pipelines",
        "cost",
        "spend",
        "time",
        "observe",
        "metrics",
        "catalog",
        "family",
        "operators",
        "care",
        "about",
        "answers",
        "right",
        "glance",
        "gauges",
        "activity",
        "namespace",
        "inventory",
        "snapshot",
        "history",
        "schema",
        "queries",
        "healthy",
        "latency",
        "overhead",
        "aerospike"
      ]
    },
    {
      "id": "dashboard#observe",
      "kind": "section",
      "title": "Dashboard",
      "heading": "Observe",
      "group": "Guides",
      "url": "/docs/dashboard#observe",
      "summary": "The observe view shows the full metrics catalog grouped by family, with each metric expanding into a sparkline that runs its PromQL through the range-query passthrough, used to confirm a behavior hypothesis without leaving the dashboard for an external tool.",
      "facts": [
        {
          "kind": "code",
          "literal": "/v2/metrics/api/v1/query_range",
          "chunkId": "dashboard#observe"
        }
      ],
      "sources": [
        {
          "chunkId": "dashboard#observe",
          "url": "/docs/dashboard#observe",
          "anchor": "observe"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "observe",
        "view",
        "shows",
        "full",
        "metrics",
        "catalog",
        "grouped",
        "family",
        "metric",
        "expanding",
        "sparkline",
        "runs",
        "promql",
        "through",
        "range",
        "query",
        "passthrough",
        "confirm",
        "behavior",
        "hypothesis",
        "without",
        "leaving",
        "dashboard",
        "external",
        "tool",
        "turbopuffer",
        "cache",
        "fetch",
        "pipeline",
        "progress",
        "resource",
        "saturation",
        "expands",
        "corresponding",
        "queryrange",
        "surface",
        "operators",
        "need",
        "about",
        "grafana"
      ]
    },
    {
      "id": "dashboard#operational-notes",
      "kind": "section",
      "title": "Dashboard",
      "heading": "Operational notes",
      "group": "Guides",
      "url": "/docs/dashboard#operational-notes",
      "summary": "Pipeline status is cached in memory in the gateway to protect PostgreSQL during repeated dashboard or autoscaler polling. The dashboard treats a recoverable cache-cold state and a non-recoverable upstream failure as separate operator states, its URL is never handed to customer workloads, and it is intentionally read-mostly, with mutations gated behind CRD apply or confirm dialogs.",
      "facts": [
        {
          "kind": "code",
          "literal": "PIPELINE_STATUS_CACHE_TTL_MS",
          "chunkId": "dashboard#operational-notes"
        },
        {
          "kind": "code",
          "literal": "cache_cold",
          "chunkId": "dashboard#operational-notes"
        }
      ],
      "sources": [
        {
          "chunkId": "dashboard#operational-notes",
          "url": "/docs/dashboard#operational-notes",
          "anchor": "operational-notes"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "operational",
        "notes",
        "pipeline",
        "status",
        "cached",
        "memory",
        "gateway",
        "protect",
        "postgresql",
        "during",
        "repeated",
        "dashboard",
        "autoscaler",
        "polling",
        "treats",
        "recoverable",
        "cache",
        "cold",
        "state",
        "upstream",
        "failure",
        "separate",
        "operator",
        "states",
        "never",
        "receives",
        "customer",
        "workloads",
        "intentionally",
        "read",
        "mostly",
        "mutations",
        "gated",
        "behind",
        "apply",
        "confirm",
        "dialogs",
        "keda",
        "pipelinestatuscachettlms",
        "defaults"
      ]
    },
    {
      "id": "dashboard#read",
      "kind": "section",
      "title": "Dashboard",
      "heading": "Read",
      "group": "Guides",
      "url": "/docs/dashboard#read",
      "summary": "The read view answers whether queries are healthy, pulling from the query and cache metric families to show query latency percentiles; Layer-side overhead, so operators can tell upstream slowness from local slowness; per-namespace cache hit ratio; and cache pool depth, node state, and stop-writes as a silent-failure surface.",
      "facts": [
        {
          "kind": "code",
          "literal": "layer_query_*",
          "chunkId": "dashboard#read"
        },
        {
          "kind": "code",
          "literal": "query_overhead_seconds",
          "chunkId": "dashboard#read"
        },
        {
          "kind": "code",
          "literal": "layer_cache_lookups_total",
          "chunkId": "dashboard#read"
        },
        {
          "kind": "code",
          "literal": "layer_aerospike_op_duration_seconds{status=\"aerospike_stop_writes\"}",
          "chunkId": "dashboard#read"
        }
      ],
      "sources": [
        {
          "chunkId": "dashboard#read",
          "url": "/docs/dashboard#read",
          "anchor": "read"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "read",
        "view",
        "answers",
        "whether",
        "queries",
        "healthy",
        "pulling",
        "query",
        "cache",
        "metric",
        "families",
        "show",
        "latency",
        "percentiles",
        "layer",
        "side",
        "overhead",
        "operators",
        "tell",
        "upstream",
        "local",
        "slowness",
        "namespace",
        "ratio",
        "pool",
        "depth",
        "node",
        "state",
        "stop",
        "writes",
        "silent",
        "failure",
        "surface",
        "seconds",
        "lookups",
        "total",
        "aerospike",
        "duration",
        "status",
        "operator"
      ]
    },
    {
      "id": "dashboard#write",
      "kind": "section",
      "title": "Dashboard",
      "heading": "Write",
      "group": "Guides",
      "url": "/docs/dashboard#write",
      "summary": "The write view is the pipeline operator surface. It shows pending/in-flight/failed counts per pipeline and UDF (the same numbers the autoscaler uses), per-stage counts, active claims with lease and heartbeat state, embed pool size, and reset/pause/resume controls. An infra sub-view leads with the logical compute pools above the node pools. The write view is also the first stop for PostgreSQL pressure, pointing operators to the failure-mode runbook before they resize queue state.",
      "facts": [
        {
          "kind": "code",
          "literal": "pending",
          "chunkId": "dashboard#write"
        },
        {
          "kind": "code",
          "literal": "embedding",
          "chunkId": "dashboard#write"
        },
        {
          "kind": "code",
          "literal": "indexed",
          "chunkId": "dashboard#write"
        },
        {
          "kind": "code",
          "literal": "failed",
          "chunkId": "dashboard#write"
        },
        {
          "kind": "code",
          "literal": "worker_id",
          "chunkId": "dashboard#write"
        },
        {
          "kind": "code",
          "literal": "/v2/udfs/{id}/{pause,resume,reset-failed}",
          "chunkId": "dashboard#write"
        },
        {
          "kind": "code",
          "literal": "InfraRules/default",
          "chunkId": "dashboard#write"
        },
        {
          "kind": "code",
          "literal": "maxReplicasPerWorkload",
          "chunkId": "dashboard#write"
        },
        {
          "kind": "code",
          "literal": "spec.scaling.pool",
          "chunkId": "dashboard#write"
        },
        {
          "kind": "code",
          "literal": "layer_pg_query_duration_seconds{status=\"pg_error\"}",
          "chunkId": "dashboard#write"
        }
      ],
      "sources": [
        {
          "chunkId": "dashboard#write",
          "url": "/docs/dashboard#write",
          "anchor": "write"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "write",
        "view",
        "pipeline",
        "operator",
        "surface",
        "showing",
        "pending",
        "flight",
        "failed",
        "counts",
        "same",
        "numbers",
        "autoscaler",
        "uses",
        "stage",
        "active",
        "claims",
        "lease",
        "heartbeat",
        "state",
        "embed",
        "pool",
        "size",
        "reset",
        "pause",
        "resume",
        "controls",
        "infra",
        "leads",
        "logical",
        "compute",
        "pools",
        "above",
        "node",
        "first",
        "stop",
        "postgresql",
        "pressure",
        "pointing",
        "operators"
      ]
    },
    {
      "id": "document-model",
      "kind": "section",
      "title": "Document model",
      "heading": null,
      "group": "Overview",
      "url": "/docs/document-model",
      "summary": "A Layer document is a Turbopuffer row (id, attributes, optional vector) read and written through the pull-through cache, with Layer reserving an attribute prefix for its own bookkeeping that callers and UDFs must not set; the gateway manages an upsert-time stamp that holds the read-consistency cut and a shard attribute for scatter/gather, and editing reserved attributes directly breaks guarantees but degrades gracefully.",
      "facts": [
        {
          "kind": "code",
          "literal": "_hevlayer_*",
          "chunkId": "document-model"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at",
          "chunkId": "document-model"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at <= watermark",
          "chunkId": "document-model"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_shard",
          "chunkId": "document-model"
        },
        {
          "kind": "code",
          "literal": "xxh64(id) % shard_count",
          "chunkId": "document-model"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_",
          "chunkId": "document-model"
        }
      ],
      "sources": [
        {
          "chunkId": "document-model",
          "url": "/docs/document-model",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "layer",
        "document",
        "turbopuffer",
        "attributes",
        "optional",
        "vector",
        "read",
        "written",
        "through",
        "pull",
        "cache",
        "reserving",
        "attribute",
        "prefix",
        "bookkeeping",
        "callers",
        "udfs",
        "must",
        "gateway",
        "manages",
        "upsert",
        "time",
        "stamp",
        "holds",
        "consistency",
        "shard",
        "scatter",
        "gather",
        "editing",
        "reserved",
        "directly",
        "breaks",
        "guarantees",
        "degrades",
        "gracefully",
        "hevlayer",
        "upserted",
        "watermark",
        "xxh64",
        "count"
      ]
    },
    {
      "id": "failure-modes",
      "kind": "section",
      "title": "Failure Modes",
      "heading": null,
      "group": "Operations",
      "url": "/docs/failure-modes",
      "summary": "Introduces how reads and writes degrade when the gateway, cache, or pipeline runs into trouble.",
      "facts": [],
      "sources": [
        {
          "chunkId": "failure-modes",
          "url": "/docs/failure-modes",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "introduces",
        "reads",
        "writes",
        "degrade",
        "gateway",
        "cache",
        "pipeline",
        "runs",
        "trouble"
      ]
    },
    {
      "id": "failure-modes#read",
      "kind": "section",
      "title": "Failure Modes",
      "heading": "Read",
      "group": "Operations",
      "url": "/docs/failure-modes#read",
      "summary": "If the gateway is down, queries are down; the document cache is stateless and can scale to zero with no disruption, and no other component sits on the read path.",
      "facts": [],
      "sources": [
        {
          "chunkId": "failure-modes#read",
          "url": "/docs/failure-modes#read",
          "anchor": "read"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "read",
        "gateway",
        "down",
        "queries",
        "document",
        "cache",
        "stateless",
        "scale",
        "zero",
        "disruption",
        "other",
        "component",
        "sits",
        "path",
        "components",
        "impact"
      ]
    },
    {
      "id": "failure-modes#write",
      "kind": "section",
      "title": "Failure Modes",
      "heading": "Write",
      "group": "Operations",
      "url": "/docs/failure-modes#write",
      "summary": "The primary write failure is a cache stop-writes during a multi-stage pipeline job: staged documents stay warm but lack vectors, and exceeding the cache drive allocation halts writes and degrades the pipeline to S3-backed chunk reads. Recovery works because chunk bodies are durable in S3 and pending state is in PostgreSQL, so workers resume after the cache refills; the Helm cache restarts on stop-writes and clears its backing file on start, making a pod restart a valid recovery action, with S3 and PostgreSQL as the durable recovery boundary.",
      "facts": [
        {
          "kind": "code",
          "literal": "documentCache.autoRestartOnStopWrites: true",
          "chunkId": "failure-modes#write"
        },
        {
          "kind": "code",
          "literal": "documentCache.storage.resetOnStart: true",
          "chunkId": "failure-modes#write"
        }
      ],
      "sources": [
        {
          "chunkId": "failure-modes#write",
          "url": "/docs/failure-modes#write",
          "anchor": "write"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "write",
        "primary",
        "failure",
        "cache",
        "stop",
        "writes",
        "during",
        "multi",
        "stage",
        "pipeline",
        "staged",
        "documents",
        "stay",
        "warm",
        "lack",
        "vectors",
        "exceeding",
        "drive",
        "allocation",
        "halts",
        "degrades",
        "backed",
        "chunk",
        "reads",
        "recovery",
        "works",
        "because",
        "bodies",
        "durable",
        "pending",
        "state",
        "postgresql",
        "workers",
        "resume",
        "after",
        "refills",
        "helm",
        "restarts",
        "clears",
        "backing"
      ]
    },
    {
      "id": "guarantees",
      "kind": "section",
      "title": "No Guarantees",
      "heading": null,
      "group": "Overview",
      "url": "/docs/guarantees",
      "summary": "Layer does not offer hard guarantees; instead it makes a set of design, security, and distribution promises intended to make the software easy to use and durable, and this page tracks the status of those promises for infrastructure the customer is ultimately responsible for.",
      "facts": [
        {
          "kind": "value",
          "literal": "Callout.astro",
          "chunkId": "guarantees"
        }
      ],
      "sources": [
        {
          "chunkId": "guarantees",
          "url": "/docs/guarantees",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "layer",
        "does",
        "offer",
        "hard",
        "guarantees",
        "instead",
        "makes",
        "design",
        "security",
        "distribution",
        "promises",
        "intended",
        "make",
        "software",
        "easy",
        "durable",
        "page",
        "tracks",
        "status",
        "those",
        "infrastructure",
        "customer",
        "ultimately",
        "responsible",
        "callout",
        "astro",
        "here",
        "commit",
        "best",
        "provide",
        "secure",
        "hands",
        "while",
        "distribute",
        "believe",
        "stand",
        "test",
        "time",
        "covers",
        "specific"
      ]
    },
    {
      "id": "guarantees#commitments",
      "kind": "section",
      "title": "No Guarantees",
      "heading": "Commitments",
      "group": "Overview",
      "url": "/docs/guarantees#commitments",
      "summary": "The commitments are: the search index stays in the customer's own search system (Layer will not reimplement indexing), history is backed up to a customer-specified S3 bucket (format may change before v1.0), customer document and chunk data is served from NVMe, the docs are accurate (inaccuracy is a bug to report), observability is documented as code and tested, the gateway degrades gracefully, and Layer stays client-compatible except where divergence is a deliberate improvement. It notes Layer was built by one person orchestrating agentic coding tools.",
      "facts": [
        {
          "kind": "value",
          "literal": "v1.0",
          "chunkId": "guarantees#commitments"
        }
      ],
      "sources": [
        {
          "chunkId": "guarantees#commitments",
          "url": "/docs/guarantees#commitments",
          "anchor": "commitments"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "commitments",
        "search",
        "index",
        "stays",
        "customer",
        "system",
        "layer",
        "reimplement",
        "indexing",
        "history",
        "backed",
        "specified",
        "bucket",
        "format",
        "change",
        "before",
        "document",
        "chunk",
        "data",
        "served",
        "nvme",
        "docs",
        "accurate",
        "inaccuracy",
        "report",
        "observability",
        "documented",
        "code",
        "tested",
        "gateway",
        "degrades",
        "gracefully",
        "client",
        "compatible",
        "except",
        "divergence",
        "deliberate",
        "improvement",
        "notes",
        "built"
      ]
    },
    {
      "id": "hev-shop",
      "kind": "section",
      "title": "hev-shop",
      "heading": null,
      "group": "Guides",
      "url": "/docs/hev-shop",
      "summary": "hev-shop is a reference semantic-search application built on Layer, with source included for design-preview participants.",
      "facts": [
        {
          "kind": "value",
          "literal": "LinkGrid.astro",
          "chunkId": "hev-shop"
        }
      ],
      "sources": [
        {
          "chunkId": "hev-shop",
          "url": "/docs/hev-shop",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "shop",
        "reference",
        "semantic",
        "search",
        "application",
        "built",
        "layer",
        "source",
        "included",
        "design",
        "preview",
        "participants",
        "linkgrid",
        "astro"
      ]
    },
    {
      "id": "hev-shop#reference-starter-kit",
      "kind": "section",
      "title": "hev-shop",
      "heading": "Reference starter kit",
      "group": "Guides",
      "url": "/docs/hev-shop#reference-starter-kit",
      "summary": "Design-preview participants get private repo access and fork hev-shop as a starting point; the pieces worth knowing are the single HTTP client path to the gateway, the claim/heartbeat/stage/completion pipeline lifecycle, the search route preserving the freshness timestamp, and the Helm chart with pipeline-metric scaling and optional CPU/GPU node pools.",
      "facts": [
        {
          "kind": "value",
          "literal": "pipeline.py",
          "chunkId": "hev-shop#reference-starter-kit"
        },
        {
          "kind": "value",
          "literal": "route.ts",
          "chunkId": "hev-shop#reference-starter-kit"
        },
        {
          "kind": "value",
          "literal": "backend.ts",
          "chunkId": "hev-shop#reference-starter-kit"
        }
      ],
      "sources": [
        {
          "chunkId": "hev-shop#reference-starter-kit",
          "url": "/docs/hev-shop#reference-starter-kit",
          "anchor": "reference-starter-kit"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "reference",
        "starter",
        "design",
        "preview",
        "participants",
        "private",
        "repo",
        "access",
        "fork",
        "shop",
        "starting",
        "point",
        "pieces",
        "worth",
        "knowing",
        "single",
        "http",
        "client",
        "path",
        "gateway",
        "claim",
        "heartbeat",
        "stage",
        "completion",
        "pipeline",
        "lifecycle",
        "search",
        "route",
        "preserving",
        "freshness",
        "timestamp",
        "helm",
        "chart",
        "metric",
        "scaling",
        "optional",
        "node",
        "pools",
        "backend",
        "their"
      ]
    },
    {
      "id": "hev-shop#what-hev-shop-is",
      "kind": "section",
      "title": "hev-shop",
      "heading": "What hev-shop is",
      "group": "Guides",
      "url": "/docs/hev-shop#what-hev-shop-is",
      "summary": "hev-shop is a live semantic shopping app built on the gateway that turns a public product/review dataset into vectors written through Layer into Turbopuffer and serves search, filters, product pages, and review-derived tags; the storefront is public but the source ships only as a reference starter kit to design-preview participants, not as open source.",
      "facts": [
        {
          "kind": "value",
          "literal": "hev-shop.com",
          "chunkId": "hev-shop#what-hev-shop-is"
        }
      ],
      "sources": [
        {
          "chunkId": "hev-shop#what-hev-shop-is",
          "url": "/docs/hev-shop#what-hev-shop-is",
          "anchor": "what-hev-shop-is"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "shop",
        "live",
        "semantic",
        "shopping",
        "built",
        "gateway",
        "turns",
        "public",
        "product",
        "review",
        "dataset",
        "vectors",
        "written",
        "through",
        "layer",
        "turbopuffer",
        "serves",
        "search",
        "filters",
        "pages",
        "derived",
        "tags",
        "storefront",
        "source",
        "ships",
        "only",
        "reference",
        "starter",
        "design",
        "preview",
        "participants",
        "open",
        "amazon",
        "reviews",
        "2023",
        "data",
        "writes",
        "running",
        "backed",
        "workload"
      ]
    },
    {
      "id": "hev-shop#why-it-matters",
      "kind": "section",
      "title": "hev-shop",
      "heading": "Why it matters",
      "group": "Guides",
      "url": "/docs/hev-shop#why-it-matters",
      "summary": "The repo is not a generic ecommerce starter but a concrete application contract (stage, claim, embed, write vectors, query with freshness signals, let the gateway own the Turbopuffer edge) so teams start from a working pattern rather than a blank slate.",
      "facts": [],
      "sources": [
        {
          "chunkId": "hev-shop#why-it-matters",
          "url": "/docs/hev-shop#why-it-matters",
          "anchor": "why-it-matters"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "matters",
        "repo",
        "generic",
        "ecommerce",
        "starter",
        "concrete",
        "application",
        "contract",
        "stage",
        "claim",
        "embed",
        "write",
        "vectors",
        "query",
        "freshness",
        "signals",
        "gateway",
        "turbopuffer",
        "edge",
        "teams",
        "start",
        "working",
        "pattern",
        "rather",
        "blank",
        "slate",
        "makes",
        "work",
        "team",
        "starts"
      ]
    },
    {
      "id": "index",
      "kind": "section",
      "title": "Introduction",
      "heading": null,
      "group": "Overview",
      "url": "/docs",
      "summary": "Layer is a gateway and function runtime for retrieval systems that scales compute over multi-stage indexing pipelines and runs functions across every row of an index, with durable state in object storage. The customer runs two server components in their cluster: a Rust gateway that transparently proxies Turbopuffer (adding fetch, scans, snapshots, result count, and cache/write/pipeline semantics, and driving the function runtime) and a Kubernetes operator. The stateless compute tier is fully elastic, an optional dashboard manages config through CRDs, a node autoscaler provisions nodes for bursty GPU work, and the backing services (document cache, indexing-state store, metrics store) are all open source.",
      "facts": [
        {
          "kind": "value",
          "literal": "Apache-2",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "AGPL-3",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "Diagram.astro",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "karpenter.sh",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "Apache-2.0",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "aerospike.com",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "AGPL-3.0",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "www.postgresql.org",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "victoriametrics.com",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "2.0",
          "chunkId": "index"
        },
        {
          "kind": "value",
          "literal": "3.0",
          "chunkId": "index"
        }
      ],
      "sources": [
        {
          "chunkId": "index",
          "url": "/docs",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "layer",
        "gateway",
        "function",
        "runtime",
        "retrieval",
        "systems",
        "scales",
        "compute",
        "multi",
        "stage",
        "indexing",
        "pipelines",
        "runs",
        "functions",
        "across",
        "every",
        "index",
        "durable",
        "state",
        "object",
        "storage",
        "customer",
        "server",
        "components",
        "their",
        "cluster",
        "rust",
        "transparently",
        "proxies",
        "turbopuffer",
        "adding",
        "fetch",
        "scans",
        "snapshots",
        "result",
        "count",
        "cache",
        "write",
        "pipeline",
        "semantics"
      ]
    },
    {
      "id": "install",
      "kind": "section",
      "title": "Install",
      "heading": null,
      "group": "Operations",
      "url": "/docs/install",
      "summary": "A Layer install has two stages: Terraform provisions the required AWS resources (IAM, S3, ECR, networking, cost-read roles, and optionally a fresh cluster), and Helm installs the gateway, operator, and document cache into that cluster wired to those resources. Terraform can be skipped if the AWS resources already exist, at minimum an S3 bucket and gateway IAM role for snapshots and history.",
      "facts": [
        {
          "kind": "value",
          "literal": "LinkGrid.astro",
          "chunkId": "install"
        }
      ],
      "sources": [
        {
          "chunkId": "install",
          "url": "/docs/install",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "layer",
        "install",
        "stages",
        "terraform",
        "provisions",
        "required",
        "resources",
        "networking",
        "cost",
        "read",
        "roles",
        "optionally",
        "fresh",
        "cluster",
        "helm",
        "installs",
        "gateway",
        "operator",
        "document",
        "cache",
        "wired",
        "those",
        "skipped",
        "already",
        "exist",
        "minimum",
        "bucket",
        "role",
        "snapshots",
        "history",
        "linkgrid",
        "astro",
        "bring",
        "environment",
        "runtime",
        "recommended",
        "path",
        "wires",
        "produced",
        "skip"
      ]
    },
    {
      "id": "install#what-ships-in-01",
      "kind": "section",
      "title": "Install",
      "heading": "What ships in 0.1",
      "group": "Operations",
      "url": "/docs/install#what-ships-in-01",
      "summary": "The 0.1 install is single-tenant: one Helm release per environment, one Turbopuffer credential per release, and one S3 bucket for snapshot and history data, with multi-tenant gateway scoping on the later roadmap and not yet exposed.",
      "facts": [
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "install#what-ships-in-01"
        },
        {
          "kind": "value",
          "literal": "0.2",
          "chunkId": "install#what-ships-in-01"
        }
      ],
      "sources": [
        {
          "chunkId": "install#what-ships-in-01",
          "url": "/docs/install#what-ships-in-01",
          "anchor": "what-ships-in-01"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "ships",
        "install",
        "single",
        "tenant",
        "helm",
        "release",
        "environment",
        "turbopuffer",
        "credential",
        "bucket",
        "snapshot",
        "history",
        "data",
        "multi",
        "gateway",
        "scoping",
        "later",
        "roadmap",
        "exposed",
        "layer"
      ]
    },
    {
      "id": "install/helm",
      "kind": "section",
      "title": "Helm Install",
      "heading": null,
      "group": "Operations",
      "url": "/docs/install/helm",
      "summary": "The Helm chart installs the gateway, operator, and document cache into a cluster that already has the AWS resources from Terraform or equivalent customer-managed resources.",
      "facts": [
        {
          "kind": "code",
          "literal": "infra/helm/layer/",
          "chunkId": "install/helm"
        },
        {
          "kind": "value",
          "literal": "Callout.astro",
          "chunkId": "install/helm"
        }
      ],
      "sources": [
        {
          "chunkId": "install/helm",
          "url": "/docs/install/helm",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "helm",
        "chart",
        "installs",
        "gateway",
        "operator",
        "document",
        "cache",
        "cluster",
        "already",
        "resources",
        "terraform",
        "equivalent",
        "customer",
        "managed",
        "infra",
        "layer",
        "callout",
        "astro",
        "install",
        "kubernetes",
        "manage"
      ]
    },
    {
      "id": "install/helm#install",
      "kind": "section",
      "title": "Helm Install",
      "heading": "Install",
      "group": "Operations",
      "url": "/docs/install/helm#install",
      "summary": "Install with a Helm upgrade-install into a dedicated namespace using a customer values file; the chart is not published to a public Helm repo in 0.1, so it is installed from the source path or an artifact provided during onboarding.",
      "facts": [
        {
          "kind": "code",
          "literal": "helm upgrade --install layer ./infra/helm/layer \\\n  --namespace layer --create-namespace \\\n  -f values.customer.yaml",
          "chunkId": "install/helm#install"
        },
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "install/helm#install"
        }
      ],
      "sources": [
        {
          "chunkId": "install/helm#install",
          "url": "/docs/install/helm#install",
          "anchor": "install"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "install",
        "helm",
        "upgrade",
        "dedicated",
        "namespace",
        "customer",
        "values",
        "file",
        "chart",
        "published",
        "public",
        "repo",
        "installed",
        "source",
        "path",
        "artifact",
        "provided",
        "during",
        "onboarding",
        "layer",
        "infra",
        "create",
        "yaml",
        "repository"
      ]
    },
    {
      "id": "install/helm#required-values",
      "kind": "section",
      "title": "Helm Install",
      "heading": "Required values",
      "group": "Operations",
      "url": "/docs/install/helm#required-values",
      "summary": "Most of the chart is opinionated defaults; the values that must be brought from outside are the Turbopuffer API key (the one credential Layer cannot generate), the gateway image URL, the client bearer token, the S3 bucket, and the gateway IAM role ARN, with optional values for index GC, the dashboard cost role, and public ingress.",
      "facts": [
        {
          "kind": "code",
          "literal": "turbopuffer.apiKey",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "gateway.image",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "gateway.apiKey",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "Authorization: Bearer …",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "s3.bucket",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "serviceAccount.roleArn",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "gateway.indexGc.enabled",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "Index",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "gateway.indexGc.indexNamespace",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "operator.discovery.indexNamespace",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "dashboard.serviceAccount.roleArn",
          "chunkId": "install/helm#required-values"
        },
        {
          "kind": "code",
          "literal": "ingress.host",
          "chunkId": "install/helm#required-values"
        }
      ],
      "sources": [
        {
          "chunkId": "install/helm#required-values",
          "url": "/docs/install/helm#required-values",
          "anchor": "required-values"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "required",
        "values",
        "most",
        "chart",
        "opinionated",
        "defaults",
        "must",
        "brought",
        "outside",
        "turbopuffer",
        "credential",
        "layer",
        "cannot",
        "generate",
        "gateway",
        "image",
        "client",
        "bearer",
        "token",
        "bucket",
        "role",
        "optional",
        "index",
        "dashboard",
        "cost",
        "public",
        "ingress",
        "apikey",
        "authorization",
        "serviceaccount",
        "rolearn",
        "indexgc",
        "enabled",
        "indexnamespace",
        "operator",
        "discovery",
        "host",
        "typical",
        "install",
        "only"
      ]
    },
    {
      "id": "install/helm#what-gets-installed",
      "kind": "section",
      "title": "Helm Install",
      "heading": "What gets installed",
      "group": "Operations",
      "url": "/docs/install/helm#what-gets-installed",
      "summary": "Helm installs the Rust gateway for compatible routes plus Layer extensions, the operator that reconciles the four CRDs, and the cache (scale-to-zero by default), along with supporting service accounts, IAM bindings, ingress, and the CRDs.",
      "facts": [
        {
          "kind": "code",
          "literal": "layer-gateway",
          "chunkId": "install/helm#what-gets-installed"
        },
        {
          "kind": "code",
          "literal": "layer-operator",
          "chunkId": "install/helm#what-gets-installed"
        },
        {
          "kind": "code",
          "literal": "layer-document-cache",
          "chunkId": "install/helm#what-gets-installed"
        }
      ],
      "sources": [
        {
          "chunkId": "install/helm#what-gets-installed",
          "url": "/docs/install/helm#what-gets-installed",
          "anchor": "what-gets-installed"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "gets",
        "installed",
        "helm",
        "installs",
        "rust",
        "gateway",
        "compatible",
        "routes",
        "plus",
        "layer",
        "extensions",
        "operator",
        "reconciles",
        "four",
        "crds",
        "cache",
        "scale",
        "zero",
        "default",
        "along",
        "supporting",
        "service",
        "accounts",
        "bindings",
        "ingress",
        "document",
        "turbopuffer",
        "fetch",
        "scans",
        "snapshots",
        "warm",
        "jobs",
        "pipeline",
        "state",
        "reconciler",
        "index",
        "infrarules",
        "function",
        "documented",
        "kubernetes"
      ]
    },
    {
      "id": "install/terraform",
      "kind": "section",
      "title": "Terraform",
      "heading": null,
      "group": "Operations",
      "url": "/docs/install/terraform",
      "summary": "The Terraform configuration provisions the AWS resources the gateway and operator need, being opinionated about what Layer requires and conservative about surrounding resources; DNS zones and TLS certificates are opt-in since most installs bring existing DNS and TLS.",
      "facts": [
        {
          "kind": "code",
          "literal": "infra/terraform/",
          "chunkId": "install/terraform"
        },
        {
          "kind": "value",
          "literal": "Callout.astro",
          "chunkId": "install/terraform"
        }
      ],
      "sources": [
        {
          "chunkId": "install/terraform",
          "url": "/docs/install/terraform",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "terraform",
        "configuration",
        "provisions",
        "resources",
        "gateway",
        "operator",
        "need",
        "being",
        "opinionated",
        "about",
        "layer",
        "requires",
        "conservative",
        "surrounding",
        "zones",
        "certificates",
        "since",
        "most",
        "installs",
        "bring",
        "existing",
        "infra",
        "callout",
        "astro",
        "leaves",
        "needs",
        "behave",
        "correctly",
        "around",
        "route53",
        "hosted"
      ]
    },
    {
      "id": "install/terraform#cluster-recommended",
      "kind": "section",
      "title": "Terraform",
      "heading": "Cluster: recommended",
      "group": "Operations",
      "url": "/docs/install/terraform#cluster-recommended",
      "summary": "Design-partner installs should use a fresh cluster unless there is a specific reason not to; the cluster path provisions the VPC, control plane and node groups, a node autoscaler, a load balancer controller, and shared persistent storage. Installs reusing an existing cluster must supply the functional prerequisites themselves (S3 bucket, gateway and dashboard IAM, registry access, node autoscaling, and a load balancer controller for public ingress).",
      "facts": [
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "install/terraform#cluster-recommended"
        }
      ],
      "sources": [
        {
          "chunkId": "install/terraform#cluster-recommended",
          "url": "/docs/install/terraform#cluster-recommended",
          "anchor": "cluster-recommended"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "cluster",
        "recommended",
        "design",
        "partner",
        "installs",
        "should",
        "fresh",
        "unless",
        "there",
        "specific",
        "reason",
        "path",
        "provisions",
        "control",
        "plane",
        "node",
        "groups",
        "autoscaler",
        "load",
        "balancer",
        "controller",
        "shared",
        "persistent",
        "storage",
        "reusing",
        "existing",
        "must",
        "supply",
        "functional",
        "prerequisites",
        "themselves",
        "bucket",
        "gateway",
        "dashboard",
        "registry",
        "access",
        "autoscaling",
        "public",
        "ingress",
        "bind"
      ]
    },
    {
      "id": "install/terraform#cost-notes",
      "kind": "section",
      "title": "Terraform",
      "heading": "Cost notes",
      "group": "Operations",
      "url": "/docs/install/terraform#cost-notes",
      "summary": "The Terraform deploys a cost-efficient footprint with autoscaling for on-demand indexing; at-rest fixed costs are mostly the cluster, NAT, and small storage, indexing bursts scale worker nodes up and back down, and heavier read use cases may need more read-side infrastructure with sizing help available from the vendor.",
      "facts": [],
      "sources": [
        {
          "chunkId": "install/terraform#cost-notes",
          "url": "/docs/install/terraform#cost-notes",
          "anchor": "cost-notes"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "cost",
        "notes",
        "terraform",
        "deploys",
        "efficient",
        "footprint",
        "autoscaling",
        "demand",
        "indexing",
        "rest",
        "fixed",
        "costs",
        "mostly",
        "cluster",
        "small",
        "storage",
        "bursts",
        "scale",
        "worker",
        "nodes",
        "back",
        "down",
        "heavier",
        "read",
        "cases",
        "need",
        "more",
        "side",
        "infrastructure",
        "sizing",
        "help",
        "available",
        "vendor",
        "designed",
        "deploy",
        "work",
        "private",
        "workers",
        "third",
        "party"
      ]
    },
    {
      "id": "install/terraform#outputs",
      "kind": "section",
      "title": "Terraform",
      "heading": "Outputs",
      "group": "Operations",
      "url": "/docs/install/terraform#outputs",
      "summary": "Terraform emits the values the Helm chart needs (S3 bucket name, gateway and dashboard IAM role ARNs, image URLs, and cluster metadata) to be passed into the Helm values file.",
      "facts": [],
      "sources": [
        {
          "chunkId": "install/terraform#outputs",
          "url": "/docs/install/terraform#outputs",
          "anchor": "outputs"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "outputs",
        "terraform",
        "emits",
        "values",
        "helm",
        "chart",
        "needs",
        "bucket",
        "name",
        "gateway",
        "dashboard",
        "role",
        "arns",
        "image",
        "urls",
        "cluster",
        "metadata",
        "passed",
        "file",
        "install",
        "irsa",
        "cost",
        "read",
        "pass",
        "these",
        "described"
      ]
    },
    {
      "id": "install/terraform#what-it-sets-up",
      "kind": "section",
      "title": "Terraform",
      "heading": "What it sets up",
      "group": "Operations",
      "url": "/docs/install/terraform#what-it-sets-up",
      "summary": "Terraform sets up an S3 bucket for durable snapshot/history/clickstream storage, IAM roles and policies for gateway/dashboard/worker access, image repositories for the gateway/operator/customer function images, an optional fresh cluster with VPC and node pools, and optional DNS zones and certificates.",
      "facts": [
        {
          "kind": "code",
          "literal": "manage_public_dns=true",
          "chunkId": "install/terraform#what-it-sets-up"
        }
      ],
      "sources": [
        {
          "chunkId": "install/terraform#what-it-sets-up",
          "url": "/docs/install/terraform#what-it-sets-up",
          "anchor": "what-it-sets-up"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "sets",
        "terraform",
        "bucket",
        "durable",
        "snapshot",
        "history",
        "clickstream",
        "storage",
        "roles",
        "policies",
        "gateway",
        "dashboard",
        "worker",
        "access",
        "image",
        "repositories",
        "operator",
        "customer",
        "function",
        "images",
        "optional",
        "fresh",
        "cluster",
        "node",
        "pools",
        "zones",
        "certificates",
        "manage",
        "public",
        "true",
        "resource",
        "purpose",
        "namespace",
        "snapshots",
        "search",
        "events",
        "irsa",
        "cost",
        "read",
        "registry"
      ]
    },
    {
      "id": "kubernetes/function-crd",
      "kind": "section",
      "title": "Function CRD",
      "heading": null,
      "group": "Operations",
      "url": "/docs/kubernetes/function-crd",
      "summary": "The Function CRD declares row-preserving compute over an index; the operator creates worker resources while the gateway owns discovery, queueing, retries, leases, and writeback. The spec names target namespaces, input columns, the output attribute and kind, a discovery filter, the worker image and dispatch settings, a discovery/lease schedule, retry policy, triggers, and inline scaling.",
      "facts": [
        {
          "kind": "code",
          "literal": "Function",
          "chunkId": "kubernetes/function-crd"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/function-crd",
          "url": "/docs/kubernetes/function-crd",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "function",
        "declares",
        "preserving",
        "compute",
        "index",
        "operator",
        "creates",
        "worker",
        "resources",
        "while",
        "gateway",
        "owns",
        "discovery",
        "queueing",
        "retries",
        "leases",
        "writeback",
        "spec",
        "names",
        "target",
        "namespaces",
        "input",
        "columns",
        "output",
        "attribute",
        "kind",
        "filter",
        "image",
        "dispatch",
        "settings",
        "lease",
        "schedule",
        "retry",
        "policy",
        "triggers",
        "inline",
        "scaling",
        "stateless",
        "user",
        "defined"
      ]
    },
    {
      "id": "kubernetes/function-crd#output",
      "kind": "section",
      "title": "Function CRD",
      "heading": "Output",
      "group": "Operations",
      "url": "/docs/kubernetes/function-crd#output",
      "summary": "An embedding-kind output should declare its dimension so consumers can validate vector shape; outputs are patched onto the target row through the gateway, and deleting a Function garbage-collects operator-managed resources but does not delete already-written attributes.",
      "facts": [
        {
          "kind": "code",
          "literal": "output.kind: embedding",
          "chunkId": "kubernetes/function-crd#output"
        },
        {
          "kind": "code",
          "literal": "output.dim",
          "chunkId": "kubernetes/function-crd#output"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/function-crd#output",
          "url": "/docs/kubernetes/function-crd#output",
          "anchor": "output"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "output",
        "embedding",
        "kind",
        "should",
        "declare",
        "dimension",
        "consumers",
        "validate",
        "vector",
        "shape",
        "outputs",
        "patched",
        "onto",
        "target",
        "through",
        "gateway",
        "deleting",
        "function",
        "garbage",
        "collects",
        "operator",
        "managed",
        "resources",
        "does",
        "delete",
        "already",
        "written",
        "attributes",
        "kubernetes"
      ]
    },
    {
      "id": "kubernetes/function-crd#scaling",
      "kind": "section",
      "title": "Function CRD",
      "heading": "Scaling",
      "group": "Operations",
      "url": "/docs/kubernetes/function-crd#scaling",
      "summary": "Function scaling is inline under the spec; in autoscale mode the operator emits a scaling object triggered by UDF queue depth, the named pool must exist in the cluster infra rules, and a replica maximum above the pool's per-workload ceiling is rejected in status.",
      "facts": [
        {
          "kind": "code",
          "literal": "spec.scaling",
          "chunkId": "kubernetes/function-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "ScaledObject",
          "chunkId": "kubernetes/function-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "mode: autoscale",
          "chunkId": "kubernetes/function-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "layer_udf_queue_depth",
          "chunkId": "kubernetes/function-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "InfraRules/default",
          "chunkId": "kubernetes/function-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "maxReplicasPerWorkload",
          "chunkId": "kubernetes/function-crd#scaling"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/function-crd#scaling",
          "url": "/docs/kubernetes/function-crd#scaling",
          "anchor": "scaling"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "scaling",
        "function",
        "inline",
        "under",
        "spec",
        "autoscale",
        "mode",
        "operator",
        "emits",
        "object",
        "triggered",
        "queue",
        "depth",
        "named",
        "pool",
        "must",
        "exist",
        "cluster",
        "infra",
        "rules",
        "replica",
        "maximum",
        "above",
        "workload",
        "ceiling",
        "rejected",
        "status",
        "scaledobject",
        "layer",
        "infrarules",
        "default",
        "maxreplicasperworkload",
        "keda",
        "layerudfqueuedepth",
        "trigger",
        "selected",
        "maxima"
      ]
    },
    {
      "id": "kubernetes/function-crd#selection",
      "kind": "section",
      "title": "Function CRD",
      "heading": "Selection",
      "group": "Operations",
      "url": "/docs/kubernetes/function-crd#selection",
      "summary": "Functions select namespaces either explicitly by target list or by label selector on Index resources, and the filter preserves arbitrary JSON including array-form upstream filters, stored as-is by the operator and evaluated by the gateway during discovery.",
      "facts": [
        {
          "kind": "code",
          "literal": "targetNamespaces",
          "chunkId": "kubernetes/function-crd#selection"
        },
        {
          "kind": "code",
          "literal": "indexSelector",
          "chunkId": "kubernetes/function-crd#selection"
        },
        {
          "kind": "code",
          "literal": "Index",
          "chunkId": "kubernetes/function-crd#selection"
        },
        {
          "kind": "code",
          "literal": "filter",
          "chunkId": "kubernetes/function-crd#selection"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/function-crd#selection",
          "url": "/docs/kubernetes/function-crd#selection",
          "anchor": "selection"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "selection",
        "functions",
        "select",
        "namespaces",
        "either",
        "explicitly",
        "target",
        "list",
        "label",
        "selector",
        "index",
        "resources",
        "filter",
        "preserves",
        "arbitrary",
        "json",
        "including",
        "array",
        "form",
        "upstream",
        "filters",
        "stored",
        "operator",
        "evaluated",
        "gateway",
        "during",
        "discovery",
        "targetnamespaces",
        "indexselector",
        "explicit",
        "labels",
        "should",
        "choose",
        "turbopuffer",
        "stores",
        "shape",
        "evaluates"
      ]
    },
    {
      "id": "kubernetes/function-crd#worker",
      "kind": "section",
      "title": "Function CRD",
      "heading": "Worker",
      "group": "Operations",
      "url": "/docs/kubernetes/function-crd#worker",
      "summary": "The worker block sets the image, dispatch mode (pull for SDK claim/poll workers, push for HTTP workers), push port, batch size, call timeout, and an optional pod-level merge patch; pull dispatch creates a Deployment while push dispatch also creates a Service and readiness probe.",
      "facts": [
        {
          "kind": "code",
          "literal": "image",
          "chunkId": "kubernetes/function-crd#worker"
        },
        {
          "kind": "code",
          "literal": "dispatch",
          "chunkId": "kubernetes/function-crd#worker"
        },
        {
          "kind": "code",
          "literal": "pull",
          "chunkId": "kubernetes/function-crd#worker"
        },
        {
          "kind": "code",
          "literal": "push",
          "chunkId": "kubernetes/function-crd#worker"
        },
        {
          "kind": "code",
          "literal": "/run",
          "chunkId": "kubernetes/function-crd#worker"
        },
        {
          "kind": "code",
          "literal": "port",
          "chunkId": "kubernetes/function-crd#worker"
        },
        {
          "kind": "code",
          "literal": "batchSize",
          "chunkId": "kubernetes/function-crd#worker"
        },
        {
          "kind": "code",
          "literal": "timeoutSeconds",
          "chunkId": "kubernetes/function-crd#worker"
        },
        {
          "kind": "code",
          "literal": "podSpec",
          "chunkId": "kubernetes/function-crd#worker"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/function-crd#worker",
          "url": "/docs/kubernetes/function-crd#worker",
          "anchor": "worker"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "worker",
        "block",
        "sets",
        "image",
        "dispatch",
        "mode",
        "pull",
        "claim",
        "poll",
        "workers",
        "push",
        "http",
        "port",
        "batch",
        "size",
        "call",
        "timeout",
        "optional",
        "level",
        "merge",
        "patch",
        "creates",
        "deployment",
        "while",
        "also",
        "service",
        "readiness",
        "probe",
        "batchsize",
        "timeoutseconds",
        "podspec",
        "field",
        "purpose",
        "rows"
      ]
    },
    {
      "id": "kubernetes/index-crd",
      "kind": "section",
      "title": "Index CRD",
      "heading": null,
      "group": "Operations",
      "url": "/docs/kubernetes/index-crd",
      "summary": "An Index represents one namespace exposed through the gateway, declaring the backend, snapshot policy, cache posture, consistency mode, and access metadata.",
      "facts": [
        {
          "kind": "code",
          "literal": "Index",
          "chunkId": "kubernetes/index-crd"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/index-crd",
          "url": "/docs/kubernetes/index-crd",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "index",
        "represents",
        "namespace",
        "exposed",
        "through",
        "gateway",
        "declaring",
        "backend",
        "snapshot",
        "policy",
        "cache",
        "posture",
        "consistency",
        "mode",
        "access",
        "metadata",
        "declarative",
        "representation",
        "managed",
        "layer",
        "declares",
        "apiversion",
        "hevlayer",
        "kind",
        "name",
        "products",
        "spec",
        "turbopuffer",
        "region",
        "east",
        "distancemetric",
        "cosinedistance",
        "labels",
        "shop",
        "tags",
        "catalog",
        "interval",
        "retention",
        "never",
        "facetfields"
      ]
    },
    {
      "id": "kubernetes/index-crd#backend",
      "kind": "section",
      "title": "Index CRD",
      "heading": "Backend",
      "group": "Operations",
      "url": "/docs/kubernetes/index-crd#backend",
      "summary": "The backend block sets the backend kind (Turbopuffer in 0.1), the region, an optional upstream namespace override defaulting to the Index name, and the vector distance metric.",
      "facts": [
        {
          "kind": "code",
          "literal": "backend.kind",
          "chunkId": "kubernetes/index-crd#backend"
        },
        {
          "kind": "code",
          "literal": "turbopuffer",
          "chunkId": "kubernetes/index-crd#backend"
        },
        {
          "kind": "code",
          "literal": "backend.region",
          "chunkId": "kubernetes/index-crd#backend"
        },
        {
          "kind": "code",
          "literal": "backend.namespace",
          "chunkId": "kubernetes/index-crd#backend"
        },
        {
          "kind": "code",
          "literal": "backend.distanceMetric",
          "chunkId": "kubernetes/index-crd#backend"
        },
        {
          "kind": "code",
          "literal": "cosine_distance",
          "chunkId": "kubernetes/index-crd#backend"
        },
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "kubernetes/index-crd#backend"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/index-crd#backend",
          "url": "/docs/kubernetes/index-crd#backend",
          "anchor": "backend"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "backend",
        "block",
        "sets",
        "kind",
        "turbopuffer",
        "region",
        "optional",
        "upstream",
        "namespace",
        "override",
        "defaulting",
        "index",
        "name",
        "vector",
        "distance",
        "metric",
        "distancemetric",
        "cosine",
        "field",
        "purpose",
        "runtime",
        "identifier",
        "defaults",
        "default",
        "cosinedistance"
      ]
    },
    {
      "id": "kubernetes/index-crd#cache-policy",
      "kind": "section",
      "title": "Index CRD",
      "heading": "Cache policy",
      "group": "Operations",
      "url": "/docs/kubernetes/index-crd#cache-policy",
      "summary": "The cache policy sets a warming thread count default while the cache remains ephemeral and durable snapshot history stays in S3.",
      "facts": [
        {
          "kind": "code",
          "literal": "cache.warming.threads",
          "chunkId": "kubernetes/index-crd#cache-policy"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/index-crd#cache-policy",
          "url": "/docs/kubernetes/index-crd#cache-policy",
          "anchor": "cache-policy"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "cache",
        "policy",
        "sets",
        "warming",
        "thread",
        "count",
        "default",
        "while",
        "remains",
        "ephemeral",
        "durable",
        "snapshot",
        "history",
        "stays",
        "threads",
        "defaults",
        "aerospike"
      ]
    },
    {
      "id": "kubernetes/index-crd#snapshot-policy",
      "kind": "section",
      "title": "Index CRD",
      "heading": "Snapshot policy",
      "group": "Operations",
      "url": "/docs/kubernetes/index-crd#snapshot-policy",
      "summary": "The snapshot policy's facet-fields list is the user-facing source of fields the gateway materializes into durable snapshots, with retention defaulting to never in 0.1 because automatic snapshot garbage collection has not shipped.",
      "facts": [
        {
          "kind": "code",
          "literal": "snapshot.facetFields",
          "chunkId": "kubernetes/index-crd#snapshot-policy"
        },
        {
          "kind": "code",
          "literal": "retention",
          "chunkId": "kubernetes/index-crd#snapshot-policy"
        },
        {
          "kind": "code",
          "literal": "never",
          "chunkId": "kubernetes/index-crd#snapshot-policy"
        },
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "kubernetes/index-crd#snapshot-policy"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/index-crd#snapshot-policy",
          "url": "/docs/kubernetes/index-crd#snapshot-policy",
          "anchor": "snapshot-policy"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "snapshot",
        "policy",
        "facet",
        "fields",
        "list",
        "user",
        "facing",
        "source",
        "gateway",
        "materializes",
        "durable",
        "snapshots",
        "retention",
        "defaulting",
        "never",
        "because",
        "automatic",
        "garbage",
        "collection",
        "shipped",
        "facetfields",
        "defaults"
      ]
    },
    {
      "id": "kubernetes/index-crd#status",
      "kind": "section",
      "title": "Index CRD",
      "heading": "Status",
      "group": "Operations",
      "url": "/docs/kubernetes/index-crd#status",
      "summary": "The operator reports observed generation, snapshot scheduling metadata, metadata sync state, and conditions on the Index status.",
      "facts": [],
      "sources": [
        {
          "chunkId": "kubernetes/index-crd#status",
          "url": "/docs/kubernetes/index-crd#status",
          "anchor": "status"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "status",
        "operator",
        "reports",
        "observed",
        "generation",
        "snapshot",
        "scheduling",
        "metadata",
        "sync",
        "state",
        "conditions",
        "index"
      ]
    },
    {
      "id": "kubernetes/operator",
      "kind": "section",
      "title": "Operator Overview",
      "heading": null,
      "group": "Operations",
      "url": "/docs/kubernetes/operator",
      "summary": "The operator manages declarative state for a deployment, monitoring index changes and managing scaling through custom resource definitions; the gateway owns the read/write path while the operator owns everything expressed as desired cluster state, such as which indexes exist, how worker pools scale, and which functions run against which indexes.",
      "facts": [
        {
          "kind": "code",
          "literal": "layer-operator",
          "chunkId": "kubernetes/operator"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/operator",
          "url": "/docs/kubernetes/operator",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "operator",
        "manages",
        "declarative",
        "state",
        "deployment",
        "monitoring",
        "index",
        "changes",
        "managing",
        "scaling",
        "through",
        "custom",
        "resource",
        "definitions",
        "gateway",
        "owns",
        "read",
        "write",
        "path",
        "while",
        "everything",
        "expressed",
        "desired",
        "cluster",
        "such",
        "indexes",
        "exist",
        "worker",
        "pools",
        "scale",
        "functions",
        "against",
        "layer",
        "reconciles",
        "relates",
        "serves",
        "crucial",
        "does",
        "abstractions",
        "known"
      ]
    },
    {
      "id": "kubernetes/operator#crds",
      "kind": "section",
      "title": "Operator Overview",
      "heading": "CRDs",
      "group": "Operations",
      "url": "/docs/kubernetes/operator#crds",
      "summary": "The operator reconciles four resource kinds, each on its own page: Index (one per managed namespace), InfraRules (cluster-wide compute pools, cache rules, and shared scaling policy), Pipeline (staged work that changes row count), and Function (stateless functions that read and write attributes).",
      "facts": [],
      "sources": [
        {
          "chunkId": "kubernetes/operator#crds",
          "url": "/docs/kubernetes/operator#crds",
          "anchor": "crds"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "crds",
        "operator",
        "reconciles",
        "four",
        "resource",
        "kinds",
        "page",
        "index",
        "managed",
        "namespace",
        "infrarules",
        "cluster",
        "wide",
        "compute",
        "pools",
        "cache",
        "rules",
        "shared",
        "scaling",
        "policy",
        "pipeline",
        "staged",
        "work",
        "changes",
        "count",
        "function",
        "stateless",
        "functions",
        "read",
        "write",
        "attributes",
        "documented",
        "turbopuffer",
        "gateway",
        "should",
        "manage",
        "document",
        "user",
        "defined"
      ]
    },
    {
      "id": "kubernetes/operator#relationship-to-the-gateway",
      "kind": "section",
      "title": "Operator Overview",
      "heading": "Relationship to the gateway",
      "group": "Operations",
      "url": "/docs/kubernetes/operator#relationship-to-the-gateway",
      "summary": "The gateway and operator are decoupled and neither sits in the other's hot path, so the gateway keeps serving even if the operator restarts or lags; the link is one-directional and read-only, with the gateway reading CRD status to inform what it serves but never writing CRDs, which are authored by the customer and reconciled by the operator.",
      "facts": [],
      "sources": [
        {
          "chunkId": "kubernetes/operator#relationship-to-the-gateway",
          "url": "/docs/kubernetes/operator#relationship-to-the-gateway",
          "anchor": "relationship-to-the-gateway"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "relationship",
        "gateway",
        "operator",
        "decoupled",
        "neither",
        "sits",
        "other",
        "path",
        "keeps",
        "serving",
        "even",
        "restarts",
        "lags",
        "link",
        "directional",
        "read",
        "only",
        "reading",
        "status",
        "inform",
        "serves",
        "never",
        "writing",
        "crds",
        "authored",
        "customer",
        "reconciled",
        "reconciles",
        "declarative",
        "state",
        "write",
        "restarted",
        "lagging",
        "between",
        "some",
        "features",
        "reads",
        "indexes",
        "exist",
        "worker"
      ]
    },
    {
      "id": "kubernetes/operator#scheduling-and-node-pools",
      "kind": "section",
      "title": "Operator Overview",
      "heading": "Scheduling and node pools",
      "group": "Operations",
      "url": "/docs/kubernetes/operator#scheduling-and-node-pools",
      "summary": "The operator does not schedule pipeline and function pods onto general capacity; each compute pool pins to a dedicated labeled node pool via node selectors and tolerations so CPU and GPU work lands on the right isolated nodes. The shipped defaults assume the node autoscaler's pool label but any labeled pool works, and this is configured once on the cluster infra-rules object rather than per workload.",
      "facts": [
        {
          "kind": "code",
          "literal": "nodeSelector",
          "chunkId": "kubernetes/operator#scheduling-and-node-pools"
        },
        {
          "kind": "code",
          "literal": "tolerations",
          "chunkId": "kubernetes/operator#scheduling-and-node-pools"
        },
        {
          "kind": "code",
          "literal": "karpenter.sh/nodepool",
          "chunkId": "kubernetes/operator#scheduling-and-node-pools"
        },
        {
          "kind": "code",
          "literal": "InfraRules/default",
          "chunkId": "kubernetes/operator#scheduling-and-node-pools"
        },
        {
          "kind": "value",
          "literal": "karpenter.sh",
          "chunkId": "kubernetes/operator#scheduling-and-node-pools"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/operator#scheduling-and-node-pools",
          "url": "/docs/kubernetes/operator#scheduling-and-node-pools",
          "anchor": "scheduling-and-node-pools"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "scheduling",
        "node",
        "pools",
        "operator",
        "does",
        "schedule",
        "pipeline",
        "function",
        "pods",
        "onto",
        "general",
        "capacity",
        "compute",
        "pool",
        "pins",
        "dedicated",
        "labeled",
        "selectors",
        "tolerations",
        "work",
        "lands",
        "right",
        "isolated",
        "nodes",
        "shipped",
        "defaults",
        "assume",
        "autoscaler",
        "label",
        "works",
        "configured",
        "once",
        "cluster",
        "infra",
        "rules",
        "object",
        "rather",
        "workload",
        "nodeselector",
        "karpenter"
      ]
    },
    {
      "id": "kubernetes/pipeline-crd",
      "kind": "section",
      "title": "Pipeline CRD",
      "heading": null,
      "group": "Operations",
      "url": "/docs/kubernetes/pipeline-crd",
      "summary": "The Pipeline CRD declares worker-owned indexing work whose row count can change between input and output (ingestion, chunking, fan-out), as opposed to a Function used when existing rows gain a derived attribute without changing count; Pipelines and Functions share the same worker and scaling envelopes, with the cluster infra-rules object owning placement and pool limits and each workload choosing a pool. The spec names a target namespace, an open source reference, the worker, and inline scaling.",
      "facts": [
        {
          "kind": "code",
          "literal": "Pipeline",
          "chunkId": "kubernetes/pipeline-crd"
        },
        {
          "kind": "code",
          "literal": "spec.worker",
          "chunkId": "kubernetes/pipeline-crd"
        },
        {
          "kind": "code",
          "literal": "spec.scaling",
          "chunkId": "kubernetes/pipeline-crd"
        },
        {
          "kind": "code",
          "literal": "InfraRules/default",
          "chunkId": "kubernetes/pipeline-crd"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/pipeline-crd",
          "url": "/docs/kubernetes/pipeline-crd",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "pipeline",
        "declares",
        "worker",
        "owned",
        "indexing",
        "work",
        "whose",
        "count",
        "change",
        "between",
        "input",
        "output",
        "ingestion",
        "chunking",
        "opposed",
        "function",
        "existing",
        "rows",
        "gain",
        "derived",
        "attribute",
        "without",
        "changing",
        "pipelines",
        "functions",
        "share",
        "same",
        "scaling",
        "envelopes",
        "cluster",
        "infra",
        "rules",
        "object",
        "owning",
        "placement",
        "pool",
        "limits",
        "workload",
        "choosing",
        "spec"
      ]
    },
    {
      "id": "kubernetes/pipeline-crd#scaling",
      "kind": "section",
      "title": "Pipeline CRD",
      "heading": "Scaling",
      "group": "Operations",
      "url": "/docs/kubernetes/pipeline-crd#scaling",
      "summary": "Pipeline scaling sets a pool that must exist in the cluster infra rules and a mode: autoscale creates a scaling object backed by pipeline queue depth, fixed pins the deployment to the minimum, and disabled (or pausing) scales it to zero.",
      "facts": [
        {
          "kind": "code",
          "literal": "scaling:\n  pool: cpu\n  mode: autoscale\n  replicas:\n    min: 0\n    max: 8",
          "chunkId": "kubernetes/pipeline-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "spec.scaling.pool",
          "chunkId": "kubernetes/pipeline-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "InfraRules/default",
          "chunkId": "kubernetes/pipeline-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "mode: autoscale",
          "chunkId": "kubernetes/pipeline-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "ScaledObject",
          "chunkId": "kubernetes/pipeline-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "mode: fixed",
          "chunkId": "kubernetes/pipeline-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "replicas.min",
          "chunkId": "kubernetes/pipeline-crd#scaling"
        },
        {
          "kind": "code",
          "literal": "spec.paused: true",
          "chunkId": "kubernetes/pipeline-crd#scaling"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/pipeline-crd#scaling",
          "url": "/docs/kubernetes/pipeline-crd#scaling",
          "anchor": "scaling"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "scaling",
        "pipeline",
        "sets",
        "pool",
        "must",
        "exist",
        "cluster",
        "infra",
        "rules",
        "mode",
        "autoscale",
        "creates",
        "object",
        "backed",
        "queue",
        "depth",
        "fixed",
        "pins",
        "deployment",
        "minimum",
        "disabled",
        "pausing",
        "scales",
        "zero",
        "replicas",
        "spec",
        "infrarules",
        "default",
        "scaledobject",
        "paused",
        "true",
        "name",
        "keda",
        "also",
        "worker"
      ]
    },
    {
      "id": "kubernetes/pipeline-crd#source",
      "kind": "section",
      "title": "Pipeline CRD",
      "heading": "Source",
      "group": "Operations",
      "url": "/docs/kubernetes/pipeline-crd#source",
      "summary": "The source reference is intentionally open JSON so operators can record the external feed (queue, stream, object events, partner API, or migration); the operator passes it through as declarative metadata and the worker image owns source-specific behavior.",
      "facts": [
        {
          "kind": "code",
          "literal": "spec.sourceRef",
          "chunkId": "kubernetes/pipeline-crd#source"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/pipeline-crd#source",
          "url": "/docs/kubernetes/pipeline-crd#source",
          "anchor": "source"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "source",
        "reference",
        "intentionally",
        "open",
        "json",
        "operators",
        "record",
        "external",
        "feed",
        "queue",
        "stream",
        "object",
        "events",
        "partner",
        "migration",
        "operator",
        "passes",
        "through",
        "declarative",
        "metadata",
        "worker",
        "image",
        "owns",
        "specific",
        "behavior",
        "spec",
        "sourceref",
        "feeds",
        "kafka"
      ]
    },
    {
      "id": "kubernetes/pipeline-crd#status",
      "kind": "section",
      "title": "Pipeline CRD",
      "heading": "Status",
      "group": "Operations",
      "url": "/docs/kubernetes/pipeline-crd#status",
      "summary": "The operator reports managed object references and readiness conditions on the Pipeline status, while queue counts and worker progress are served by the gateway pipeline status API.",
      "facts": [],
      "sources": [
        {
          "chunkId": "kubernetes/pipeline-crd#status",
          "url": "/docs/kubernetes/pipeline-crd#status",
          "anchor": "status"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "status",
        "operator",
        "reports",
        "managed",
        "object",
        "references",
        "readiness",
        "conditions",
        "pipeline",
        "while",
        "queue",
        "counts",
        "worker",
        "progress",
        "served",
        "gateway"
      ]
    },
    {
      "id": "kubernetes/pipeline-crd#target",
      "kind": "section",
      "title": "Pipeline CRD",
      "heading": "Target",
      "group": "Operations",
      "url": "/docs/kubernetes/pipeline-crd#target",
      "summary": "The target namespace is the namespace the pipeline writes, and the gateway pipeline API owns document state, chunks, and vector writes for that target namespace.",
      "facts": [
        {
          "kind": "code",
          "literal": "spec.target.namespace",
          "chunkId": "kubernetes/pipeline-crd#target"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/pipeline-crd#target",
          "url": "/docs/kubernetes/pipeline-crd#target",
          "anchor": "target"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "target",
        "namespace",
        "pipeline",
        "writes",
        "gateway",
        "owns",
        "document",
        "state",
        "chunks",
        "vector",
        "spec",
        "turbopuffer"
      ]
    },
    {
      "id": "kubernetes/pipeline-crd#worker",
      "kind": "section",
      "title": "Pipeline CRD",
      "heading": "Worker",
      "group": "Operations",
      "url": "/docs/kubernetes/pipeline-crd#worker",
      "summary": "The Pipeline worker block sets the image, batch size, call timeout, and an optional pod-level merge patch, with the operator creating one Deployment per Pipeline.",
      "facts": [
        {
          "kind": "code",
          "literal": "image",
          "chunkId": "kubernetes/pipeline-crd#worker"
        },
        {
          "kind": "code",
          "literal": "batchSize",
          "chunkId": "kubernetes/pipeline-crd#worker"
        },
        {
          "kind": "code",
          "literal": "timeoutSeconds",
          "chunkId": "kubernetes/pipeline-crd#worker"
        },
        {
          "kind": "code",
          "literal": "podSpec",
          "chunkId": "kubernetes/pipeline-crd#worker"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/pipeline-crd#worker",
          "url": "/docs/kubernetes/pipeline-crd#worker",
          "anchor": "worker"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "worker",
        "pipeline",
        "block",
        "sets",
        "image",
        "batch",
        "size",
        "call",
        "timeout",
        "optional",
        "level",
        "merge",
        "patch",
        "operator",
        "creating",
        "deployment",
        "batchsize",
        "timeoutseconds",
        "podspec",
        "field",
        "purpose",
        "work",
        "items",
        "creates"
      ]
    },
    {
      "id": "kubernetes/scaling-crd",
      "kind": "section",
      "title": "InfraRules CRD",
      "heading": null,
      "group": "Operations",
      "url": "/docs/kubernetes/scaling-crd",
      "summary": "InfraRules is the cluster-scoped policy object for Layer-managed runtime infrastructure, with exactly one object in 0.1; Pipelines and Functions do not reference a separate autoscaling resource but set inline scaling and choose a compute pool defined here.",
      "facts": [
        {
          "kind": "code",
          "literal": "InfraRules",
          "chunkId": "kubernetes/scaling-crd"
        },
        {
          "kind": "code",
          "literal": "InfraRules/default",
          "chunkId": "kubernetes/scaling-crd"
        },
        {
          "kind": "code",
          "literal": "spec.scaling",
          "chunkId": "kubernetes/scaling-crd"
        },
        {
          "kind": "code",
          "literal": "InfraRules/default.spec.computePools",
          "chunkId": "kubernetes/scaling-crd"
        },
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "kubernetes/scaling-crd"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/scaling-crd",
          "url": "/docs/kubernetes/scaling-crd",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "infrarules",
        "cluster",
        "scoped",
        "policy",
        "object",
        "layer",
        "managed",
        "runtime",
        "infrastructure",
        "exactly",
        "pipelines",
        "functions",
        "reference",
        "separate",
        "autoscaling",
        "resource",
        "inline",
        "scaling",
        "choose",
        "compute",
        "pool",
        "defined",
        "here",
        "default",
        "spec",
        "computepools",
        "wide",
        "pools",
        "document",
        "cache",
        "rules",
        "workload",
        "surface"
      ]
    },
    {
      "id": "kubernetes/scaling-crd#compute-pools",
      "kind": "section",
      "title": "InfraRules CRD",
      "heading": "Compute pools",
      "group": "Operations",
      "url": "/docs/kubernetes/scaling-crd#compute-pools",
      "summary": "Each compute pool declares a name referenced by workloads, a class label, an optional GPU type, node selectors and tolerations applied to chosen worker pods, container resources, and a hard per-workload replica ceiling; a workload naming an unknown pool or exceeding the ceiling is left unready with a status condition.",
      "facts": [
        {
          "kind": "code",
          "literal": "name",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        },
        {
          "kind": "code",
          "literal": "spec.scaling.pool",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        },
        {
          "kind": "code",
          "literal": "kind",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        },
        {
          "kind": "code",
          "literal": "cpu",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        },
        {
          "kind": "code",
          "literal": "gpu",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        },
        {
          "kind": "code",
          "literal": "gpuType",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        },
        {
          "kind": "code",
          "literal": "nodeSelector",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        },
        {
          "kind": "code",
          "literal": "tolerations",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        },
        {
          "kind": "code",
          "literal": "resources",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        },
        {
          "kind": "code",
          "literal": "maxReplicasPerWorkload",
          "chunkId": "kubernetes/scaling-crd#compute-pools"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/scaling-crd#compute-pools",
          "url": "/docs/kubernetes/scaling-crd#compute-pools",
          "anchor": "compute-pools"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "compute",
        "pools",
        "pool",
        "declares",
        "name",
        "referenced",
        "workloads",
        "class",
        "label",
        "optional",
        "type",
        "node",
        "selectors",
        "tolerations",
        "applied",
        "chosen",
        "worker",
        "pods",
        "container",
        "resources",
        "hard",
        "workload",
        "replica",
        "ceiling",
        "naming",
        "unknown",
        "exceeding",
        "left",
        "unready",
        "status",
        "condition",
        "spec",
        "scaling",
        "kind",
        "gputype",
        "nodeselector",
        "maxreplicasperworkload",
        "field",
        "purpose",
        "pipeline"
      ]
    },
    {
      "id": "kubernetes/scaling-crd#document-cache-rules",
      "kind": "section",
      "title": "InfraRules CRD",
      "heading": "Document cache rules",
      "group": "Operations",
      "url": "/docs/kubernetes/scaling-crd#document-cache-rules",
      "summary": "The document-cache block captures the operator-owned cache envelope (capacity, replication factor, node count); in 0.1 Helm still renders the cache scaling object directly while this section is the declared policy shape the operator reports and validates against.",
      "facts": [
        {
          "kind": "code",
          "literal": "documentCache",
          "chunkId": "kubernetes/scaling-crd#document-cache-rules"
        },
        {
          "kind": "code",
          "literal": "InfraRules",
          "chunkId": "kubernetes/scaling-crd#document-cache-rules"
        },
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "kubernetes/scaling-crd#document-cache-rules"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/scaling-crd#document-cache-rules",
          "url": "/docs/kubernetes/scaling-crd#document-cache-rules",
          "anchor": "document-cache-rules"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "document",
        "cache",
        "rules",
        "block",
        "captures",
        "operator",
        "owned",
        "envelope",
        "capacity",
        "replication",
        "factor",
        "node",
        "count",
        "helm",
        "still",
        "renders",
        "scaling",
        "object",
        "directly",
        "while",
        "section",
        "declared",
        "policy",
        "shape",
        "reports",
        "validates",
        "against",
        "documentcache",
        "infrarules",
        "keda"
      ]
    },
    {
      "id": "kubernetes/scaling-crd#infrarules",
      "kind": "section",
      "title": "InfraRules CRD",
      "heading": "InfraRules",
      "group": "Operations",
      "url": "/docs/kubernetes/scaling-crd#infrarules",
      "summary": "The InfraRules object (which must be named default and can be rendered by Helm) declares compute pools, the document-cache envelope, and node scaling for the cluster's Layer infrastructure.",
      "facts": [
        {
          "kind": "code",
          "literal": "default",
          "chunkId": "kubernetes/scaling-crd#infrarules"
        },
        {
          "kind": "code",
          "literal": "operator.infraRules.create=true",
          "chunkId": "kubernetes/scaling-crd#infrarules"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/scaling-crd#infrarules",
          "url": "/docs/kubernetes/scaling-crd#infrarules",
          "anchor": "infrarules"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "infrarules",
        "object",
        "must",
        "named",
        "default",
        "rendered",
        "helm",
        "declares",
        "compute",
        "pools",
        "document",
        "cache",
        "envelope",
        "node",
        "scaling",
        "cluster",
        "layer",
        "infrastructure",
        "operator",
        "create",
        "true",
        "apiversion",
        "hevlayer",
        "v1alpha1",
        "kind",
        "metadata",
        "name",
        "spec",
        "computepools",
        "maxreplicasperworkload",
        "nodeselector",
        "karpenter",
        "nodepool",
        "tolerations",
        "resources",
        "requests",
        "500m",
        "memory",
        "512mi",
        "limits"
      ]
    },
    {
      "id": "kubernetes/scaling-crd#workload-scaling",
      "kind": "section",
      "title": "InfraRules CRD",
      "heading": "Workload scaling",
      "group": "Operations",
      "url": "/docs/kubernetes/scaling-crd#workload-scaling",
      "summary": "Workload scaling chooses a pool and a mode: autoscale emits a scaling object letting queue depth scale the deployment between min and max, fixed sets replicas to the minimum with no scaling object, and disabled (or pausing) scales to zero; keep a cold-start-heavy worker warm by autoscaling with a minimum of one.",
      "facts": [
        {
          "kind": "code",
          "literal": "scaling:\n  pool: cpu\n  mode: autoscale\n  replicas:\n    min: 0\n    max: 4",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        },
        {
          "kind": "code",
          "literal": "autoscale",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        },
        {
          "kind": "code",
          "literal": "ScaledObject",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        },
        {
          "kind": "code",
          "literal": "min",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        },
        {
          "kind": "code",
          "literal": "max",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        },
        {
          "kind": "code",
          "literal": "fixed",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        },
        {
          "kind": "code",
          "literal": "replicas.min",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        },
        {
          "kind": "code",
          "literal": "disabled",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        },
        {
          "kind": "code",
          "literal": "mode: autoscale",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        },
        {
          "kind": "code",
          "literal": "replicas.min: 1",
          "chunkId": "kubernetes/scaling-crd#workload-scaling"
        }
      ],
      "sources": [
        {
          "chunkId": "kubernetes/scaling-crd#workload-scaling",
          "url": "/docs/kubernetes/scaling-crd#workload-scaling",
          "anchor": "workload-scaling"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "workload",
        "scaling",
        "chooses",
        "pool",
        "mode",
        "autoscale",
        "emits",
        "object",
        "letting",
        "queue",
        "depth",
        "scale",
        "deployment",
        "between",
        "fixed",
        "sets",
        "replicas",
        "minimum",
        "disabled",
        "pausing",
        "scales",
        "zero",
        "keep",
        "cold",
        "start",
        "heavy",
        "worker",
        "warm",
        "autoscaling",
        "scaledobject",
        "behavior",
        "emit",
        "keda",
        "emitted",
        "paused",
        "workloads",
        "also"
      ]
    },
    {
      "id": "limits",
      "kind": "section",
      "title": "Limits",
      "heading": null,
      "group": "Overview",
      "url": "/docs/limits",
      "summary": "Layer inherits ceilings from its bundled components, and these ceilings will lift as demand grows: a single-node cache, a maximum number of Turbopuffer namespaces, a maximum cache size, and a distinct-value cap per scan facet field (fields over the cap are reported as skipped rather than partially materialized, so emitted fields are always complete).",
      "facts": [
        {
          "kind": "code",
          "literal": "fields_skipped[]",
          "chunkId": "limits"
        },
        {
          "kind": "code",
          "literal": "fields[]",
          "chunkId": "limits"
        }
      ],
      "sources": [
        {
          "chunkId": "limits",
          "url": "/docs/limits",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "layer",
        "inherits",
        "ceilings",
        "bundled",
        "components",
        "lift",
        "demand",
        "grows",
        "single",
        "node",
        "cache",
        "maximum",
        "number",
        "turbopuffer",
        "namespaces",
        "size",
        "distinct",
        "value",
        "scan",
        "facet",
        "field",
        "fields",
        "reported",
        "skipped",
        "rather",
        "partially",
        "materialized",
        "emitted",
        "always",
        "complete",
        "current",
        "inherited",
        "ship",
        "limited",
        "certain",
        "constraints",
        "underlying",
        "these",
        "increases",
        "aerospike"
      ]
    },
    {
      "id": "limits#no-limits",
      "kind": "section",
      "title": "Limits",
      "heading": "No limits",
      "group": "Overview",
      "url": "/docs/limits#no-limits",
      "summary": "Several things have no enforced ceiling but do have practical limits under load: CRD instance counts (bounded by cluster throughput), snapshot/search/clickstream history (durable in S3 with no automatic expiry, bounded by storage cost), UDF concurrency (bounded by cluster capacity), pipeline queue depth (kept compact via S3 manifests), and document size and attribute count (bounded by the underlying stores, not by Layer).",
      "facts": [
        {
          "kind": "code",
          "literal": "Index",
          "chunkId": "limits#no-limits"
        },
        {
          "kind": "code",
          "literal": "Function",
          "chunkId": "limits#no-limits"
        },
        {
          "kind": "code",
          "literal": "Pipeline",
          "chunkId": "limits#no-limits"
        },
        {
          "kind": "code",
          "literal": "Scaling",
          "chunkId": "limits#no-limits"
        }
      ],
      "sources": [
        {
          "chunkId": "limits#no-limits",
          "url": "/docs/limits#no-limits",
          "anchor": "no-limits"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "limits",
        "several",
        "things",
        "enforced",
        "ceiling",
        "practical",
        "under",
        "load",
        "instance",
        "counts",
        "bounded",
        "cluster",
        "throughput",
        "snapshot",
        "search",
        "clickstream",
        "history",
        "durable",
        "automatic",
        "expiry",
        "storage",
        "cost",
        "concurrency",
        "capacity",
        "pipeline",
        "queue",
        "depth",
        "kept",
        "compact",
        "manifests",
        "document",
        "size",
        "attribute",
        "count",
        "underlying",
        "stores",
        "layer",
        "index",
        "function",
        "scaling"
      ]
    },
    {
      "id": "pipelines",
      "kind": "section",
      "title": "Pipelines",
      "heading": null,
      "group": "Guides",
      "url": "/docs/pipelines",
      "summary": "A pipeline indexes documents through staged work whose row count changes, commonly CPU extract then GPU embed; the gateway tracks document state in PostgreSQL and exports queue depth so the operator can autoscale workers, and once vectors land in Turbopuffer they are queried and fetched through the namespace API.",
      "facts": [],
      "sources": [
        {
          "chunkId": "pipelines",
          "url": "/docs/pipelines",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "pipeline",
        "indexes",
        "documents",
        "through",
        "staged",
        "work",
        "whose",
        "count",
        "changes",
        "commonly",
        "extract",
        "embed",
        "gateway",
        "tracks",
        "document",
        "state",
        "postgresql",
        "exports",
        "queue",
        "depth",
        "operator",
        "autoscale",
        "workers",
        "once",
        "vectors",
        "land",
        "turbopuffer",
        "queried",
        "fetched",
        "namespace",
        "extraction",
        "embedding",
        "chunk",
        "handoff",
        "keda",
        "scaling",
        "signals",
        "common"
      ]
    },
    {
      "id": "pipelines#autoscaling",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Autoscaling",
      "group": "Guides",
      "url": "/docs/pipelines#autoscaling",
      "summary": "The operator emits the scaling object directly from a Pipeline's scaling spec; manual workers not represented by a Pipeline CR can use the same Prometheus pending-count signal via a scaling object so autoscaling stays close to the same source of truth Layer uses for claims while keeping PostgreSQL private to the gateway pod.",
      "facts": [
        {
          "kind": "code",
          "literal": "Pipeline.spec.scaling",
          "chunkId": "pipelines#autoscaling"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#autoscaling",
          "url": "/docs/pipelines#autoscaling",
          "anchor": "autoscaling"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "autoscaling",
        "operator",
        "emits",
        "scaling",
        "object",
        "directly",
        "pipeline",
        "spec",
        "manual",
        "workers",
        "represented",
        "same",
        "prometheus",
        "pending",
        "count",
        "signal",
        "stays",
        "close",
        "source",
        "truth",
        "layer",
        "uses",
        "claims",
        "while",
        "keeping",
        "postgresql",
        "private",
        "gateway",
        "keda",
        "apiversion",
        "v1alpha1",
        "kind",
        "scaledobject",
        "metadata",
        "name",
        "embed",
        "worker",
        "scaletargetref",
        "minreplicacount",
        "maxreplicacount"
      ]
    },
    {
      "id": "pipelines#claim-heartbeat-stage",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Claim, heartbeat, stage",
      "group": "Guides",
      "url": "/docs/pipelines#claim-heartbeat-stage",
      "summary": "Workers claim staged documents through the gateway rather than mutating PostgreSQL directly; the gateway records claim ownership and time, moves rows to the requested stage, recovers stale claims past their lease, and uses skip-locked semantics for concurrent claims, with heartbeat and stage routes to extend leases and move documents to final stages. Pipeline queues are segmented: document and chunk id lists go into compressed S3 manifests while only segment leases and counters live in PostgreSQL, so queues scale by segment count, with manifests treated as queue state rather than durable history and cleaned up as segments split, complete, or the pipeline is deleted.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/pipelines/product-images/claim\n{\n  \"stage\": \"pending\",\n  \"claim_stage\": \"embedding\",\n  \"limit\": 2000,\n  \"worker_id\": \"gpu-worker-0\",\n  \"lease_seconds\": 900\n}",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "POST /v2/pipelines/product-images/documents/heartbeat\n{\n  \"document_ids\": [\"B07XYZ123\"],\n  \"stage\": \"embedding\",\n  \"worker_id\": \"gpu-worker-0\"\n}",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "POST /v2/pipelines/product-images/documents/stage\n{\n  \"document_ids\": [\"B07XYZ123\"],\n  \"stage\": \"indexed\",\n  \"from_stage\": \"embedding\",\n  \"worker_id\": \"gpu-worker-0\"\n}",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "claimed_by",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "claimed_at",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "FOR UPDATE SKIP LOCKED",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "stage: \"pending\"",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "stage: \"failed\"",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "create_missing: true",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "from_stage",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "worker_id",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "PIPELINE_SEGMENT_SIZE",
          "chunkId": "pipelines#claim-heartbeat-stage"
        },
        {
          "kind": "code",
          "literal": "indexed",
          "chunkId": "pipelines#claim-heartbeat-stage"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#claim-heartbeat-stage",
          "url": "/docs/pipelines#claim-heartbeat-stage",
          "anchor": "claim-heartbeat-stage"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "claim",
        "heartbeat",
        "stage",
        "workers",
        "staged",
        "documents",
        "through",
        "gateway",
        "rather",
        "mutating",
        "postgresql",
        "directly",
        "records",
        "ownership",
        "time",
        "moves",
        "rows",
        "requested",
        "recovers",
        "stale",
        "claims",
        "past",
        "their",
        "lease",
        "uses",
        "skip",
        "locked",
        "semantics",
        "concurrent",
        "routes",
        "extend",
        "leases",
        "move",
        "final",
        "stages",
        "pipeline",
        "queues",
        "segmented",
        "document",
        "chunk"
      ]
    },
    {
      "id": "pipelines#cpu-workers--scale-on-input-source",
      "kind": "section",
      "title": "Pipelines",
      "heading": "CPU workers — scale on input source",
      "group": "Guides",
      "url": "/docs/pipelines#cpu-workers--scale-on-input-source",
      "summary": "CPU workers scale on whatever feeds them (queue depth, consumer lag, object event notifications), independent of the pipeline API, via a scaling object triggered on the input source.",
      "facts": [
        {
          "kind": "code",
          "literal": "apiVersion: keda.sh/v1alpha1\nkind: ScaledObject\nmetadata:\n  name: cpu-extract-worker\nspec:\n  triggers:\n    - type: aws-sqs-queue\n      metadata:\n        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/product-images\n        queueLength: \"10\"\n        awsRegion: us-east-1",
          "chunkId": "pipelines#cpu-workers--scale-on-input-source"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#cpu-workers--scale-on-input-source",
          "url": "/docs/pipelines#cpu-workers--scale-on-input-source",
          "anchor": "cpu-workers--scale-on-input-source"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "workers",
        "scale",
        "input",
        "source",
        "whatever",
        "feeds",
        "queue",
        "depth",
        "consumer",
        "object",
        "event",
        "notifications",
        "independent",
        "pipeline",
        "scaling",
        "triggered",
        "apiversion",
        "keda",
        "v1alpha1",
        "kind",
        "scaledobject",
        "metadata",
        "name",
        "extract",
        "worker",
        "spec",
        "triggers",
        "type",
        "queueurl",
        "https",
        "east",
        "amazonaws",
        "123456789",
        "product",
        "images",
        "queuelength",
        "awsregion",
        "kafka"
      ]
    },
    {
      "id": "pipelines#create-a-pipeline",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Create a pipeline",
      "group": "Guides",
      "url": "/docs/pipelines#create-a-pipeline",
      "summary": "A pipeline is created by posting an id, a target namespace, and a distance metric (distance_metric, which defaults to cosine_distance); the call returns a conflict error if the pipeline already exists.",
      "facts": [
        {
          "kind": "code",
          "literal": "curl -X POST http://gateway:8080/v2/pipelines \\\n  -H 'content-type: application/json' \\\n  -d '{\n    \"id\": \"product-images\",\n    \"target_namespace\": \"products\",\n    \"distance_metric\": \"cosine_distance\"\n  }'",
          "chunkId": "pipelines#create-a-pipeline"
        },
        {
          "kind": "code",
          "literal": "distance_metric",
          "chunkId": "pipelines#create-a-pipeline"
        },
        {
          "kind": "code",
          "literal": "cosine_distance",
          "chunkId": "pipelines#create-a-pipeline"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#create-a-pipeline",
          "url": "/docs/pipelines#create-a-pipeline",
          "anchor": "create-a-pipeline"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "create",
        "pipeline",
        "created",
        "posting",
        "target",
        "namespace",
        "distance",
        "metric",
        "defaulting",
        "cosine",
        "call",
        "conflicts",
        "already",
        "exists",
        "curl",
        "post",
        "http",
        "gateway",
        "8080",
        "pipelines",
        "content",
        "type",
        "application",
        "json",
        "product",
        "images",
        "products",
        "targetnamespace",
        "distancemetric",
        "cosinedistance",
        "defaults",
        "returns"
      ]
    },
    {
      "id": "pipelines#document-lifecycle",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Document lifecycle",
      "group": "Guides",
      "url": "/docs/pipelines#document-lifecycle",
      "summary": "A staged document moves from pending (chunks stored in the cache awaiting embedding) to indexed (vectors written to Turbopuffer); re-staging a document idempotently resets it to pending with new chunks, useful for reprocessing after source data changes.",
      "facts": [
        {
          "kind": "code",
          "literal": "stage_document()           write_vectors()\n  (new doc) ──────────────────► pending ──────────────────► indexed\n                                  ▲\n                                  │ re-stage (idempotent)",
          "chunkId": "pipelines#document-lifecycle"
        },
        {
          "kind": "code",
          "literal": "pending",
          "chunkId": "pipelines#document-lifecycle"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#document-lifecycle",
          "url": "/docs/pipelines#document-lifecycle",
          "anchor": "document-lifecycle"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "document",
        "lifecycle",
        "staged",
        "moves",
        "pending",
        "chunks",
        "stored",
        "cache",
        "awaiting",
        "embedding",
        "indexed",
        "vectors",
        "written",
        "turbopuffer",
        "staging",
        "idempotently",
        "resets",
        "useful",
        "reprocessing",
        "after",
        "source",
        "data",
        "changes",
        "stage",
        "write",
        "idempotent",
        "stagedocument",
        "writevectors",
        "aerospike",
        "waiting"
      ]
    },
    {
      "id": "pipelines#failure-model",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Failure model",
      "group": "Guides",
      "url": "/docs/pipelines#failure-model",
      "summary": "Upstream write failures are hard: the vectors route returns an error and the document stays in the embedding stage for re-claim. Cache failures do not block chunk reads when S3 backing is present. PostgreSQL connectivity failures surface as a retryable error. Lease expiry is handled server-side, so when a worker crashes mid-embedding its documents are recovered on the next claim sweep.",
      "facts": [
        {
          "kind": "code",
          "literal": "embedding",
          "chunkId": "pipelines#failure-model"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#failure-model",
          "url": "/docs/pipelines#failure-model",
          "anchor": "failure-model"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "failure",
        "model",
        "upstream",
        "write",
        "failures",
        "hard",
        "vectors",
        "route",
        "errors",
        "document",
        "stays",
        "embedding",
        "stage",
        "claim",
        "cache",
        "block",
        "chunk",
        "reads",
        "backing",
        "present",
        "postgresql",
        "connectivity",
        "surfaces",
        "retryable",
        "error",
        "lease",
        "expiry",
        "handled",
        "server",
        "side",
        "worker",
        "crashing",
        "documents",
        "recovered",
        "next",
        "sweep",
        "turbopuffer",
        "returns",
        "aerospike",
        "should"
      ]
    },
    {
      "id": "pipelines#gateway-api",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Gateway API",
      "group": "Guides",
      "url": "/docs/pipelines#gateway-api",
      "summary": "Section header introducing the gateway pipeline API.",
      "facts": [],
      "sources": [
        {
          "chunkId": "pipelines#gateway-api",
          "url": "/docs/pipelines#gateway-api",
          "anchor": "gateway-api"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "gateway",
        "section",
        "header",
        "introducing",
        "pipeline"
      ]
    },
    {
      "id": "pipelines#get-pipeline-status-keda-polling",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Get pipeline status (KEDA polling)",
      "group": "Guides",
      "url": "/docs/pipelines#get-pipeline-status-keda-polling",
      "summary": "The pipeline status route returns per-stage counts and a pending_count field that the autoscaler watches; when pending_count reaches zero, GPU workers scale to zero.",
      "facts": [
        {
          "kind": "code",
          "literal": "curl http://gateway:8080/v2/pipelines/product-images/status",
          "chunkId": "pipelines#get-pipeline-status-keda-polling"
        },
        {
          "kind": "code",
          "literal": "{\n  \"pipeline_id\": \"product-images\",\n  \"counts\": {\"pending\": 142, \"indexed\": 8530},\n  \"pending_count\": 142\n}",
          "chunkId": "pipelines#get-pipeline-status-keda-polling"
        },
        {
          "kind": "code",
          "literal": "pending_count",
          "chunkId": "pipelines#get-pipeline-status-keda-polling"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#get-pipeline-status-keda-polling",
          "url": "/docs/pipelines#get-pipeline-status-keda-polling",
          "anchor": "get-pipeline-status-keda-polling"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "pipeline",
        "status",
        "keda",
        "polling",
        "route",
        "returns",
        "stage",
        "counts",
        "pending",
        "count",
        "field",
        "autoscaler",
        "watches",
        "reaches",
        "zero",
        "workers",
        "scale",
        "curl",
        "http",
        "gateway",
        "8080",
        "pipelines",
        "product",
        "images",
        "indexed",
        "8530",
        "pipelineid",
        "pendingcount",
        "hits"
      ]
    },
    {
      "id": "pipelines#pipeline-crd",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Pipeline CRD",
      "group": "Guides",
      "url": "/docs/pipelines#pipeline-crd",
      "summary": "Declare a Pipeline CRD when the operator should own the worker Deployment and scaling object. The spec names a target namespace, the worker, and inline scaling; spec.scaling.pool must exist in InfraRules/default. With mode: fixed, replicas are pinned to replicas.min; mode: disabled or spec.paused: true scales to zero. Full detail is on the Pipeline CRD page.",
      "facts": [
        {
          "kind": "code",
          "literal": "apiVersion: hevlayer.com/v1alpha1\nkind: Pipeline\nmetadata:\n  name: product-images\n  namespace: layer\nspec:\n  target:\n    namespace: products\n  worker:\n    image: ghcr.io/hev/product-image-worker:latest\n    batchSize: 64\n    timeoutSeconds: 60\n  scaling:\n    pool: cpu\n    mode: autoscale\n    replicas:\n      min: 0\n      max: 8",
          "chunkId": "pipelines#pipeline-crd"
        },
        {
          "kind": "code",
          "literal": "spec.scaling.pool",
          "chunkId": "pipelines#pipeline-crd"
        },
        {
          "kind": "code",
          "literal": "InfraRules/default",
          "chunkId": "pipelines#pipeline-crd"
        },
        {
          "kind": "code",
          "literal": "mode: fixed",
          "chunkId": "pipelines#pipeline-crd"
        },
        {
          "kind": "code",
          "literal": "replicas.min",
          "chunkId": "pipelines#pipeline-crd"
        },
        {
          "kind": "code",
          "literal": "mode: disabled",
          "chunkId": "pipelines#pipeline-crd"
        },
        {
          "kind": "code",
          "literal": "spec.paused: true",
          "chunkId": "pipelines#pipeline-crd"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#pipeline-crd",
          "url": "/docs/pipelines#pipeline-crd",
          "anchor": "pipeline-crd"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "pipeline",
        "declare",
        "operator",
        "should",
        "worker",
        "deployment",
        "scaling",
        "object",
        "naming",
        "target",
        "namespace",
        "inline",
        "whose",
        "pool",
        "must",
        "exist",
        "cluster",
        "infra",
        "rules",
        "fixed",
        "mode",
        "pins",
        "minimum",
        "disabled",
        "paused",
        "scales",
        "zero",
        "full",
        "detail",
        "page",
        "apiversion",
        "hevlayer",
        "v1alpha1",
        "kind",
        "metadata",
        "name",
        "product",
        "images",
        "layer",
        "spec"
      ]
    },
    {
      "id": "pipelines#pipeline-flow",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Pipeline flow",
      "group": "Guides",
      "url": "/docs/pipelines#pipeline-flow",
      "summary": "A CPU worker reads source data, extracts and chunks it, and calls the stage endpoint; it scales on its input queue. A GPU worker polls pipeline status for pending work (pending_count > 0), fetches chunks, embeds them, and calls the vectors endpoint; it scales on pending_count. The gateway handles chunk storage, vector upsert, and state tracking, so workers stay stateless and never connect to gateway-internal stores.",
      "facts": [
        {
          "kind": "code",
          "literal": "pending_count > 0",
          "chunkId": "pipelines#pipeline-flow"
        },
        {
          "kind": "code",
          "literal": "pending_count",
          "chunkId": "pipelines#pipeline-flow"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#pipeline-flow",
          "url": "/docs/pipelines#pipeline-flow",
          "anchor": "pipeline-flow"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "pipeline",
        "flow",
        "worker",
        "reads",
        "source",
        "data",
        "extracts",
        "chunks",
        "calls",
        "stage",
        "endpoint",
        "scaling",
        "input",
        "queue",
        "while",
        "polls",
        "status",
        "pending",
        "work",
        "fetches",
        "embeds",
        "vectors",
        "count",
        "gateway",
        "handles",
        "chunk",
        "storage",
        "vector",
        "upsert",
        "state",
        "tracking",
        "workers",
        "stay",
        "stateless",
        "never",
        "connect",
        "internal",
        "stores",
        "post",
        "pipelines"
      ]
    },
    {
      "id": "pipelines#prerequisites",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Prerequisites",
      "group": "Guides",
      "url": "/docs/pipelines#prerequisites",
      "summary": "Pipeline routes are registered only when the database connection is configured; the Helm chart points it at the gateway pod's loopback PostgreSQL sidecar and the migration runs automatically on startup.",
      "facts": [
        {
          "kind": "code",
          "literal": "export DATABASE_URL=postgres://hevlayer:hevlayer@localhost:5432/hevlayer",
          "chunkId": "pipelines#prerequisites"
        },
        {
          "kind": "code",
          "literal": "DATABASE_URL",
          "chunkId": "pipelines#prerequisites"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#prerequisites",
          "url": "/docs/pipelines#prerequisites",
          "anchor": "prerequisites"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "prerequisites",
        "pipeline",
        "routes",
        "registered",
        "only",
        "database",
        "connection",
        "configured",
        "helm",
        "chart",
        "points",
        "gateway",
        "loopback",
        "postgresql",
        "sidecar",
        "migration",
        "runs",
        "automatically",
        "startup",
        "export",
        "postgres",
        "hevlayer",
        "localhost",
        "5432",
        "databaseurl",
        "sets"
      ]
    },
    {
      "id": "pipelines#read-chunks-and-write-vectors-gpu-worker",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Read chunks and write vectors (GPU worker)",
      "group": "Guides",
      "url": "/docs/pipelines#read-chunks-and-write-vectors-gpu-worker",
      "summary": "A GPU worker reads a document's chunks from the gateway, then after embedding writes vectors back through a route that upserts to Turbopuffer and marks the document indexed.",
      "facts": [
        {
          "kind": "code",
          "literal": "curl http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW/chunks",
          "chunkId": "pipelines#read-chunks-and-write-vectors-gpu-worker"
        },
        {
          "kind": "code",
          "literal": "curl -X PUT http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW/vectors \\\n  -H 'content-type: application/json' \\\n  -d '{\n    \"vectors\": [\n      {\"id\": \"asin-B08N5WRWNW-0\", \"vector\": [0.0012, -0.043], \"attributes\": {\"text\": \"...\"}}\n    ]\n  }'",
          "chunkId": "pipelines#read-chunks-and-write-vectors-gpu-worker"
        },
        {
          "kind": "code",
          "literal": "indexed",
          "chunkId": "pipelines#read-chunks-and-write-vectors-gpu-worker"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#read-chunks-and-write-vectors-gpu-worker",
          "url": "/docs/pipelines#read-chunks-and-write-vectors-gpu-worker",
          "anchor": "read-chunks-and-write-vectors-gpu-worker"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "read",
        "chunks",
        "write",
        "vectors",
        "worker",
        "reads",
        "document",
        "gateway",
        "after",
        "embedding",
        "writes",
        "back",
        "through",
        "route",
        "upserts",
        "turbopuffer",
        "marks",
        "indexed",
        "curl",
        "http",
        "8080",
        "pipelines",
        "product",
        "images",
        "documents",
        "asin",
        "b08n5wrwnw",
        "content",
        "type",
        "application",
        "json",
        "vector",
        "0012",
        "attributes",
        "text"
      ]
    },
    {
      "id": "pipelines#stage-a-document-cpu-worker",
      "kind": "section",
      "title": "Pipelines",
      "heading": "Stage a document (CPU worker)",
      "group": "Guides",
      "url": "/docs/pipelines#stage-a-document-cpu-worker",
      "summary": "A CPU worker stages a document by putting its chunks to the gateway. Each chunk is stored durably in S3 and cached, and the document is marked pending. Re-staging the same id replaces the previous chunk backing and resets the document to pending.",
      "facts": [
        {
          "kind": "code",
          "literal": "curl -X PUT http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW \\\n  -H 'content-type: application/json' \\\n  -d '{\n    \"chunks\": [\n      {\"id\": \"asin-B08N5WRWNW-0\", \"text\": \"Wireless noise-cancelling headphones\"},\n      {\"id\": \"asin-B08N5WRWNW-1\", \"text\": \"40-hour battery life\", \"metadata\": {\"page\": 2}}\n    ]\n  }'",
          "chunkId": "pipelines#stage-a-document-cpu-worker"
        },
        {
          "kind": "code",
          "literal": "pipe_{target_namespace}",
          "chunkId": "pipelines#stage-a-document-cpu-worker"
        },
        {
          "kind": "code",
          "literal": "pending",
          "chunkId": "pipelines#stage-a-document-cpu-worker"
        }
      ],
      "sources": [
        {
          "chunkId": "pipelines#stage-a-document-cpu-worker",
          "url": "/docs/pipelines#stage-a-document-cpu-worker",
          "anchor": "stage-a-document-cpu-worker"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "stage",
        "document",
        "worker",
        "stages",
        "putting",
        "chunks",
        "gateway",
        "chunk",
        "stored",
        "durably",
        "cached",
        "marked",
        "pending",
        "staging",
        "same",
        "replaces",
        "previous",
        "backing",
        "resets",
        "curl",
        "http",
        "8080",
        "pipelines",
        "product",
        "images",
        "documents",
        "asin",
        "b08n5wrwnw",
        "content",
        "type",
        "application",
        "json",
        "text",
        "wireless",
        "noise",
        "cancelling",
        "headphones",
        "hour",
        "battery",
        "life"
      ]
    },
    {
      "id": "roadmap",
      "kind": "section",
      "title": "Roadmap & Changelog",
      "heading": null,
      "group": "Overview",
      "url": "/docs/roadmap",
      "summary": "Introduces where hev layer is headed next and what has already shipped.",
      "facts": [],
      "sources": [
        {
          "chunkId": "roadmap",
          "url": "/docs/roadmap",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "introduces",
        "layer",
        "headed",
        "next",
        "already",
        "shipped"
      ]
    },
    {
      "id": "roadmap#01-release-uat",
      "kind": "section",
      "title": "Roadmap & Changelog",
      "heading": "0.1 Release (UAT)",
      "group": "Overview",
      "url": "/docs/roadmap#01-release-uat",
      "summary": "Header marking the 0.1 release acceptance-testing milestone.",
      "facts": [
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "roadmap#01-release-uat"
        }
      ],
      "sources": [
        {
          "chunkId": "roadmap#01-release-uat",
          "url": "/docs/roadmap#01-release-uat",
          "anchor": "01-release-uat"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "release",
        "header",
        "marking",
        "acceptance",
        "testing",
        "milestone"
      ]
    },
    {
      "id": "roadmap#api-hardening",
      "kind": "section",
      "title": "Roadmap & Changelog",
      "heading": "API hardening",
      "group": "Overview",
      "url": "/docs/roadmap#api-hardening",
      "summary": "Planned API hardening items: consolidating the scaling CRDs, redesigning the Index CRD, snapshot-scan naming conventions, and removing unused APIs, several tracked by RFCs.",
      "facts": [
        {
          "kind": "code",
          "literal": "Pipeline",
          "chunkId": "roadmap#api-hardening"
        },
        {
          "kind": "code",
          "literal": "UDF",
          "chunkId": "roadmap#api-hardening"
        },
        {
          "kind": "code",
          "literal": "InfraRules",
          "chunkId": "roadmap#api-hardening"
        },
        {
          "kind": "code",
          "literal": "Index",
          "chunkId": "roadmap#api-hardening"
        },
        {
          "kind": "value",
          "literal": "github.com",
          "chunkId": "roadmap#api-hardening"
        },
        {
          "kind": "value",
          "literal": "0012-crd-scaling-consolidation.md",
          "chunkId": "roadmap#api-hardening"
        },
        {
          "kind": "value",
          "literal": "0013-index-policy-surface.md",
          "chunkId": "roadmap#api-hardening"
        },
        {
          "kind": "value",
          "literal": "0014-snapshot-noun-scan-verb.md",
          "chunkId": "roadmap#api-hardening"
        }
      ],
      "sources": [
        {
          "chunkId": "roadmap#api-hardening",
          "url": "/docs/roadmap#api-hardening",
          "anchor": "api-hardening"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "hardening",
        "planned",
        "items",
        "consolidating",
        "scaling",
        "crds",
        "redesigning",
        "index",
        "snapshot",
        "scan",
        "naming",
        "conventions",
        "removing",
        "unused",
        "apis",
        "several",
        "tracked",
        "rfcs",
        "pipeline",
        "infrarules",
        "github",
        "0012",
        "consolidation",
        "0013",
        "policy",
        "surface",
        "0014",
        "noun",
        "verb",
        "redesign",
        "remove"
      ]
    },
    {
      "id": "roadmap#later",
      "kind": "section",
      "title": "Roadmap & Changelog",
      "heading": "Later",
      "group": "Overview",
      "url": "/docs/roadmap#later",
      "summary": "Longer-horizon roadmap items: scoped API keys and entitlements, soft delete with restore, hybrid fuzzy text fusion, typeahead, temporal as_of queries, branching, an exact kNN result cache, A/B variant indexes, per-query LLM-judged quality, pipeline crash recovery, dead-letter listing, a layer push developer experience for Python UDFs, and a cost API.",
      "facts": [
        {
          "kind": "code",
          "literal": "as_of",
          "chunkId": "roadmap#later"
        },
        {
          "kind": "code",
          "literal": "/query",
          "chunkId": "roadmap#later"
        },
        {
          "kind": "code",
          "literal": "/scans",
          "chunkId": "roadmap#later"
        },
        {
          "kind": "code",
          "literal": "/fetch",
          "chunkId": "roadmap#later"
        },
        {
          "kind": "code",
          "literal": "/snapshots",
          "chunkId": "roadmap#later"
        },
        {
          "kind": "code",
          "literal": "copy_from_with_filter",
          "chunkId": "roadmap#later"
        },
        {
          "kind": "code",
          "literal": "layer push",
          "chunkId": "roadmap#later"
        },
        {
          "kind": "value",
          "literal": "github.com",
          "chunkId": "roadmap#later"
        },
        {
          "kind": "value",
          "literal": "0022-hybrid-text-fusion.md",
          "chunkId": "roadmap#later"
        },
        {
          "kind": "value",
          "literal": "0020-temporal-queries.md",
          "chunkId": "roadmap#later"
        }
      ],
      "sources": [
        {
          "chunkId": "roadmap#later",
          "url": "/docs/roadmap#later",
          "anchor": "later"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "later",
        "longer",
        "horizon",
        "roadmap",
        "items",
        "including",
        "scoped",
        "keys",
        "entitlements",
        "soft",
        "delete",
        "restore",
        "hybrid",
        "fuzzy",
        "text",
        "fusion",
        "typeahead",
        "temporal",
        "queries",
        "branching",
        "exact",
        "result",
        "cache",
        "variant",
        "indexes",
        "query",
        "judged",
        "quality",
        "pipeline",
        "crash",
        "recovery",
        "dead",
        "letter",
        "listing",
        "python",
        "push",
        "experience",
        "cost",
        "scans",
        "fetch"
      ]
    },
    {
      "id": "roadmap#lifecycle-and-operability",
      "kind": "section",
      "title": "Roadmap & Changelog",
      "heading": "Lifecycle and operability",
      "group": "Overview",
      "url": "/docs/roadmap#lifecycle-and-operability",
      "summary": "Shipped lifecycle and operability items: autoscaling compute for pipelines and UDFs, a document-cache endpoint for multi-stage pipelines, index snapshot history, coordinated delete, and the Helm and Terraform install scripts.",
      "facts": [],
      "sources": [
        {
          "chunkId": "roadmap#lifecycle-and-operability",
          "url": "/docs/roadmap#lifecycle-and-operability",
          "anchor": "lifecycle-and-operability"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "lifecycle",
        "operability",
        "shipped",
        "items",
        "autoscaling",
        "compute",
        "pipelines",
        "udfs",
        "document",
        "cache",
        "endpoint",
        "multi",
        "stage",
        "index",
        "snapshot",
        "history",
        "coordinated",
        "delete",
        "helm",
        "terraform",
        "install",
        "scripts",
        "building"
      ]
    },
    {
      "id": "roadmap#search",
      "kind": "section",
      "title": "Roadmap & Changelog",
      "heading": "Search",
      "group": "Overview",
      "url": "/docs/roadmap#search",
      "summary": "Shipped search features: strongly consistent queries during heavy writes, result count over ranked queries via scatter/gather, precomputed facet listings and counts in snapshots, scans for filter IDs and counts, search-by-id, search history saved to S3, and enhanced namespace metadata.",
      "facts": [
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at",
          "chunkId": "roadmap#search"
        }
      ],
      "sources": [
        {
          "chunkId": "roadmap#search",
          "url": "/docs/roadmap#search",
          "anchor": "search"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "search",
        "shipped",
        "features",
        "strongly",
        "consistent",
        "queries",
        "during",
        "heavy",
        "writes",
        "result",
        "count",
        "ranked",
        "scatter",
        "gather",
        "precomputed",
        "facet",
        "listings",
        "counts",
        "snapshots",
        "scans",
        "filter",
        "history",
        "saved",
        "enhanced",
        "namespace",
        "metadata",
        "hevlayer",
        "upserted",
        "hevlayerupsertedat",
        "vector",
        "available",
        "snapshot",
        "document",
        "cached"
      ]
    },
    {
      "id": "roadmap#surfaces",
      "kind": "section",
      "title": "Roadmap & Changelog",
      "heading": "Surfaces",
      "group": "Overview",
      "url": "/docs/roadmap#surfaces",
      "summary": "Shipped surfaces: a dashboard MVP with basic CRD management and observability, and an official Python SDK.",
      "facts": [],
      "sources": [
        {
          "chunkId": "roadmap#surfaces",
          "url": "/docs/roadmap#surfaces",
          "anchor": "surfaces"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "surfaces",
        "shipped",
        "dashboard",
        "basic",
        "management",
        "observability",
        "official",
        "python"
      ]
    },
    {
      "id": "roadmap#up-next",
      "kind": "section",
      "title": "Roadmap & Changelog",
      "heading": "Up Next",
      "group": "Overview",
      "url": "/docs/roadmap#up-next",
      "summary": "Near-term items: count and scan primitives and route renames, an indexing failure-mode runbook, embedding UDF writeback via re-upsert, a namespace-init UDF, a snapshot-aware ready signal, a full dashboard redesign, and a kube-style CLI over the gateway REST API.",
      "facts": [
        {
          "kind": "code",
          "literal": "layer.is_stable",
          "chunkId": "roadmap#up-next"
        },
        {
          "kind": "code",
          "literal": "layer",
          "chunkId": "roadmap#up-next"
        },
        {
          "kind": "value",
          "literal": "github.com",
          "chunkId": "roadmap#up-next"
        },
        {
          "kind": "value",
          "literal": "0019-count-and-scan-primitives.md",
          "chunkId": "roadmap#up-next"
        }
      ],
      "sources": [
        {
          "chunkId": "roadmap#up-next",
          "url": "/docs/roadmap#up-next",
          "anchor": "up-next"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "next",
        "near",
        "term",
        "items",
        "count",
        "scan",
        "primitives",
        "route",
        "renames",
        "indexing",
        "failure",
        "mode",
        "runbook",
        "embedding",
        "writeback",
        "upsert",
        "namespace",
        "init",
        "snapshot",
        "aware",
        "ready",
        "signal",
        "full",
        "dashboard",
        "redesign",
        "kube",
        "style",
        "gateway",
        "rest",
        "layer",
        "stable",
        "github",
        "0019",
        "filter",
        "truncation",
        "removal",
        "aerospike",
        "stop",
        "writes",
        "postgres"
      ]
    },
    {
      "id": "scans",
      "kind": "section",
      "title": "Scans",
      "heading": null,
      "group": "Guides",
      "url": "/docs/scans",
      "summary": "Scans answer ad hoc filter questions about a namespace: ID mode creates an async job returning matching ids, and count mode returns one number synchronously using the latest snapshot when the filter is covered; uses include bulk exports, manual inspection, UDF discovery debugging, cache/origin consistency checks, and exact filter row counts.",
      "facts": [],
      "sources": [
        {
          "chunkId": "scans",
          "url": "/docs/scans",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "scans",
        "answer",
        "filter",
        "questions",
        "about",
        "namespace",
        "mode",
        "creates",
        "async",
        "returning",
        "matching",
        "count",
        "returns",
        "number",
        "synchronously",
        "latest",
        "snapshot",
        "covered",
        "uses",
        "include",
        "bulk",
        "exports",
        "manual",
        "inspection",
        "discovery",
        "debugging",
        "cache",
        "origin",
        "consistency",
        "checks",
        "exact",
        "counts",
        "shaped",
        "jobs",
        "synchronous",
        "asynchronous",
        "document"
      ]
    },
    {
      "id": "scans#count-scans",
      "kind": "section",
      "title": "Scans",
      "heading": "Count scans",
      "group": "Guides",
      "url": "/docs/scans#count-scans",
      "summary": "A count scan posts a filter and source and returns a count with the serving source; auto checks the latest snapshot first for single-field equality and membership filters and falls through to cache or origin otherwise, while an explicit snapshot source requires a supported filter and fails with a precondition error if unsupported.",
      "facts": [
        {
          "kind": "code",
          "literal": "curl -X POST http://gateway:8080/v2/namespaces/products/scans \\\n  -H 'content-type: application/json' \\\n  -d '{\"mode\": \"count\", \"source\": \"auto\", \"filters\": [\"category\", \"Eq\", \"Electronics\"]}'",
          "chunkId": "scans#count-scans"
        },
        {
          "kind": "code",
          "literal": "{\n  \"count\": 4210,\n  \"served_by\": \"snapshot\",\n  \"snapshot_sha\": \"3f9e8b21\",\n  \"watermark_ms\": 1747300000123,\n  \"elapsed_ms\": 3\n}",
          "chunkId": "scans#count-scans"
        },
        {
          "kind": "code",
          "literal": "source: auto",
          "chunkId": "scans#count-scans"
        },
        {
          "kind": "code",
          "literal": "Eq",
          "chunkId": "scans#count-scans"
        },
        {
          "kind": "code",
          "literal": "In",
          "chunkId": "scans#count-scans"
        },
        {
          "kind": "code",
          "literal": "snapshot",
          "chunkId": "scans#count-scans"
        },
        {
          "kind": "code",
          "literal": "source: snapshot",
          "chunkId": "scans#count-scans"
        },
        {
          "kind": "code",
          "literal": "412 precondition_failed",
          "chunkId": "scans#count-scans"
        }
      ],
      "sources": [
        {
          "chunkId": "scans#count-scans",
          "url": "/docs/scans#count-scans",
          "anchor": "count-scans"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "count",
        "scans",
        "scan",
        "posts",
        "filter",
        "source",
        "returns",
        "serving",
        "auto",
        "checks",
        "latest",
        "snapshot",
        "first",
        "single",
        "field",
        "equality",
        "membership",
        "filters",
        "falls",
        "through",
        "cache",
        "origin",
        "otherwise",
        "while",
        "explicit",
        "requires",
        "supported",
        "fails",
        "precondition",
        "error",
        "unsupported",
        "curl",
        "post",
        "http",
        "gateway",
        "8080",
        "namespaces",
        "products",
        "content",
        "type"
      ]
    },
    {
      "id": "scans#filters",
      "kind": "section",
      "title": "Scans",
      "heading": "Filters",
      "group": "Guides",
      "url": "/docs/scans#filters",
      "summary": "Scans accept the same filter array as query; on origin scans the filter is pushed to Turbopuffer and on cache scans the gateway evaluates a supported set of operators against cached attributes. Auto uses origin when the cache cannot evaluate a filter, while an explicit cache source with an unsupported filter fails rather than returning partial results.",
      "facts": [
        {
          "kind": "code",
          "literal": "Eq",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "NotEq",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "Gt",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "Gte",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "Lt",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "Lte",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "In",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "NotIn",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "And",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "Or",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "Not",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "auto",
          "chunkId": "scans#filters"
        },
        {
          "kind": "code",
          "literal": "source: cache",
          "chunkId": "scans#filters"
        }
      ],
      "sources": [
        {
          "chunkId": "scans#filters",
          "url": "/docs/scans#filters",
          "anchor": "filters"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "filters",
        "scans",
        "accept",
        "same",
        "filter",
        "array",
        "query",
        "origin",
        "pushed",
        "turbopuffer",
        "cache",
        "gateway",
        "evaluates",
        "supported",
        "operators",
        "against",
        "cached",
        "attributes",
        "auto",
        "uses",
        "cannot",
        "evaluate",
        "while",
        "explicit",
        "source",
        "unsupported",
        "fails",
        "rather",
        "returning",
        "partial",
        "results",
        "noteq",
        "notin",
        "document",
        "sees"
      ]
    },
    {
      "id": "scans#id-scans",
      "kind": "section",
      "title": "Scans",
      "heading": "ID scans",
      "group": "Guides",
      "url": "/docs/scans#id-scans",
      "summary": "An ID scan posts a filter and source and returns an accepted job; the caller polls the job and then reads the matching ids paginated from a results route.",
      "facts": [
        {
          "kind": "code",
          "literal": "curl -X POST http://gateway:8080/v2/namespaces/products/scans \\\n  -H 'content-type: application/json' \\\n  -d '{\"mode\": \"ids\", \"source\": \"auto\", \"filters\": [\"category\", \"Eq\", \"Electronics\"]}'",
          "chunkId": "scans#id-scans"
        },
        {
          "kind": "code",
          "literal": "{\n  \"id\": \"scan-uuid\",\n  \"namespace\": \"products\",\n  \"source\": \"auto\",\n  \"status\": \"running\",\n  \"progress\": 0,\n  \"documents_scanned\": 0,\n  \"created_at\": \"2026-05-26T10:00:00Z\"\n}",
          "chunkId": "scans#id-scans"
        },
        {
          "kind": "code",
          "literal": "curl http://gateway:8080/v2/namespaces/products/scans/scan-uuid\ncurl 'http://gateway:8080/v2/namespaces/products/scans/scan-uuid/results?limit=1000'",
          "chunkId": "scans#id-scans"
        },
        {
          "kind": "code",
          "literal": "202 Accepted",
          "chunkId": "scans#id-scans"
        }
      ],
      "sources": [
        {
          "chunkId": "scans#id-scans",
          "url": "/docs/scans#id-scans",
          "anchor": "id-scans"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "scans",
        "scan",
        "posts",
        "filter",
        "source",
        "returns",
        "accepted",
        "caller",
        "polls",
        "reads",
        "matching",
        "paginated",
        "results",
        "route",
        "curl",
        "post",
        "http",
        "gateway",
        "8080",
        "namespaces",
        "products",
        "content",
        "type",
        "application",
        "json",
        "mode",
        "auto",
        "filters",
        "category",
        "electronics",
        "uuid",
        "namespace",
        "status",
        "running",
        "progress",
        "documents",
        "scanned",
        "created",
        "2026",
        "26t10"
      ]
    },
    {
      "id": "scans#operational-notes",
      "kind": "section",
      "title": "Scans",
      "heading": "Operational notes",
      "group": "Guides",
      "url": "/docs/scans#operational-notes",
      "summary": "ID scan state is in-memory and resets on gateway restart, count scans carry a deadline with a server-side maximum, snapshot-served counts are exact at the snapshot watermark, and live counts include bounded, timed-out, and shard fields.",
      "facts": [
        {
          "kind": "code",
          "literal": "watermark_ms",
          "chunkId": "scans#operational-notes"
        },
        {
          "kind": "code",
          "literal": "bounded",
          "chunkId": "scans#operational-notes"
        },
        {
          "kind": "code",
          "literal": "timed_out",
          "chunkId": "scans#operational-notes"
        }
      ],
      "sources": [
        {
          "chunkId": "scans#operational-notes",
          "url": "/docs/scans#operational-notes",
          "anchor": "operational-notes"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "operational",
        "notes",
        "scan",
        "state",
        "memory",
        "resets",
        "gateway",
        "restart",
        "count",
        "scans",
        "carry",
        "deadline",
        "server",
        "side",
        "maximum",
        "snapshot",
        "served",
        "counts",
        "exact",
        "watermark",
        "live",
        "include",
        "bounded",
        "timed",
        "shard",
        "fields",
        "ephemeral",
        "default",
        "300s",
        "watermarkms",
        "timedout"
      ]
    },
    {
      "id": "scans#sources",
      "kind": "section",
      "title": "Scans",
      "heading": "Sources",
      "group": "Guides",
      "url": "/docs/scans#sources",
      "summary": "Lists the scan sources per mode: auto (cache when fresh, otherwise origin; for counts, the snapshot first, then cache or origin), snapshot (count only, requiring an eligible equality or membership filter), cache (cache only), and origin (paginated upstream scan). When auto resolves to cache, the gateway adds a warmed-through upper bound before the user filter so the scan is a stable warmed view.",
      "facts": [
        {
          "kind": "code",
          "literal": "auto",
          "chunkId": "scans#sources"
        },
        {
          "kind": "code",
          "literal": "snapshot",
          "chunkId": "scans#sources"
        },
        {
          "kind": "code",
          "literal": "Eq",
          "chunkId": "scans#sources"
        },
        {
          "kind": "code",
          "literal": "In",
          "chunkId": "scans#sources"
        },
        {
          "kind": "code",
          "literal": "cache",
          "chunkId": "scans#sources"
        },
        {
          "kind": "code",
          "literal": "origin",
          "chunkId": "scans#sources"
        },
        {
          "kind": "code",
          "literal": "_hevlayer_upserted_at <= cache_warmed_through",
          "chunkId": "scans#sources"
        }
      ],
      "sources": [
        {
          "chunkId": "scans#sources",
          "url": "/docs/scans#sources",
          "anchor": "sources"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "sources",
        "lists",
        "scan",
        "mode",
        "auto",
        "cache",
        "fresh",
        "else",
        "origin",
        "snapshot",
        "first",
        "counts",
        "count",
        "only",
        "requiring",
        "eligible",
        "equality",
        "membership",
        "paginated",
        "upstream",
        "resolves",
        "gateway",
        "adds",
        "warmed",
        "through",
        "upper",
        "bound",
        "before",
        "user",
        "filter",
        "stable",
        "view",
        "hevlayer",
        "upserted",
        "source",
        "enough",
        "otherwise",
        "supported",
        "latest",
        "requires"
      ]
    },
    {
      "id": "search-knowledge-graph",
      "kind": "section",
      "title": "Search Knowledge Graph",
      "heading": null,
      "group": "Guides",
      "url": "/docs/search-knowledge-graph",
      "summary": "This page documents the generated knowledge graph the docs search bundles to expand domain terms before ranking pages, including query context, canonical terms, aliases, and the raw JSON artifact rendered from the committed site build.",
      "facts": [
        {
          "kind": "value",
          "literal": "KnowledgeGraphView.astro",
          "chunkId": "search-knowledge-graph"
        }
      ],
      "sources": [
        {
          "chunkId": "search-knowledge-graph",
          "url": "/docs/search-knowledge-graph",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "page",
        "documents",
        "generated",
        "knowledge",
        "graph",
        "docs",
        "search",
        "bundles",
        "expand",
        "domain",
        "terms",
        "before",
        "ranking",
        "pages",
        "including",
        "query",
        "context",
        "canonical",
        "aliases",
        "json",
        "artifact",
        "rendered",
        "committed",
        "site",
        "build",
        "knowledgegraphview",
        "astro",
        "currently",
        "bundled",
        "layer",
        "endpoint",
        "uses",
        "candidate",
        "renders",
        "exact"
      ]
    },
    {
      "id": "search-knowledge-graph#current-graph",
      "kind": "section",
      "title": "Search Knowledge Graph",
      "heading": "Current graph",
      "group": "Guides",
      "url": "/docs/search-knowledge-graph#current-graph",
      "summary": "Header introducing the rendering of the current committed knowledge-graph artifact.",
      "facts": [],
      "sources": [
        {
          "chunkId": "search-knowledge-graph#current-graph",
          "url": "/docs/search-knowledge-graph#current-graph",
          "anchor": "current-graph"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "current",
        "graph",
        "header",
        "introducing",
        "rendering",
        "committed",
        "knowledge",
        "artifact"
      ]
    },
    {
      "id": "tradeoffs",
      "kind": "section",
      "title": "Tradeoffs",
      "heading": null,
      "group": "Overview",
      "url": "/docs/tradeoffs",
      "summary": "This page makes Layer's design tradeoffs explicit, with configuration offered where possible: it adds query-path latency through an extra network hop (not configurable) and an optionally configurable strong-consistency query plan, and increases index storage through secondary indexing for upsert-time filtering and for scatter/gather sharding (both not configurable).",
      "facts": [
        {
          "kind": "flag",
          "literal": "--muted",
          "chunkId": "tradeoffs"
        },
        {
          "kind": "flag",
          "literal": "--signal",
          "chunkId": "tradeoffs"
        }
      ],
      "sources": [
        {
          "chunkId": "tradeoffs",
          "url": "/docs/tradeoffs",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "page",
        "makes",
        "layer",
        "design",
        "tradeoffs",
        "explicit",
        "configuration",
        "offered",
        "possible",
        "adds",
        "query",
        "path",
        "latency",
        "through",
        "extra",
        "network",
        "configurable",
        "optionally",
        "strong",
        "consistency",
        "plan",
        "increases",
        "index",
        "storage",
        "secondary",
        "indexing",
        "upsert",
        "time",
        "filtering",
        "scatter",
        "gather",
        "sharding",
        "both",
        "muted",
        "signal",
        "current",
        "product",
        "posture",
        "cases",
        "trying"
      ]
    },
    {
      "id": "udfs",
      "kind": "section",
      "title": "UDFs",
      "heading": null,
      "group": "Guides",
      "url": "/docs/udfs",
      "summary": "A UDF is a stateless worker that preserves row count, producing one derived attribute per input row, used for embeddings, classifications, tags, and backfills; use a pipeline when external data becomes rows or one row fans out into many, and a UDF when existing rows acquire derived attributes. The gateway runs an ID-scan discovery, enqueues rows, leases them to a worker via claim/complete, and writes results back to Turbopuffer.",
      "facts": [
        {
          "kind": "value",
          "literal": "Diagram.astro",
          "chunkId": "udfs"
        },
        {
          "kind": "value",
          "literal": "spec.filter",
          "chunkId": "udfs"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs",
          "url": "/docs/udfs",
          "anchor": null
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "stateless",
        "worker",
        "preserves",
        "count",
        "producing",
        "derived",
        "attribute",
        "input",
        "embeddings",
        "classifications",
        "tags",
        "backfills",
        "pipeline",
        "external",
        "data",
        "becomes",
        "rows",
        "fans",
        "many",
        "existing",
        "acquire",
        "attributes",
        "gateway",
        "runs",
        "scan",
        "discovery",
        "enqueues",
        "leases",
        "claim",
        "complete",
        "writes",
        "results",
        "back",
        "turbopuffer",
        "diagram",
        "astro",
        "spec",
        "filter",
        "user",
        "defined"
      ]
    },
    {
      "id": "udfs#author-a-worker",
      "kind": "section",
      "title": "UDFs",
      "heading": "Author a worker",
      "group": "Guides",
      "url": "/docs/udfs#author-a-worker",
      "summary": "The Python SDK turns a normal function into the claim/process/complete loop via a decorator declaring inputs, output attribute, and kind; function parameters are keyword-only and named to match the inputs, and the author raises a transient error for retryable work and a permanent error for unrecoverable input.",
      "facts": [
        {
          "kind": "code",
          "literal": "inputs",
          "chunkId": "udfs#author-a-worker"
        },
        {
          "kind": "code",
          "literal": "TransientError",
          "chunkId": "udfs#author-a-worker"
        },
        {
          "kind": "code",
          "literal": "PermanentError",
          "chunkId": "udfs#author-a-worker"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#author-a-worker",
          "url": "/docs/udfs#author-a-worker",
          "anchor": "author-a-worker"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "author",
        "worker",
        "python",
        "turns",
        "normal",
        "function",
        "claim",
        "process",
        "complete",
        "loop",
        "decorator",
        "declaring",
        "inputs",
        "output",
        "attribute",
        "kind",
        "parameters",
        "keyword",
        "only",
        "named",
        "match",
        "raises",
        "transient",
        "error",
        "retryable",
        "work",
        "permanent",
        "unrecoverable",
        "input",
        "transienterror",
        "permanenterror",
        "hevlayer",
        "import",
        "runudfworker",
        "title",
        "description",
        "tags",
        "tagproduct",
        "none",
        "list"
      ]
    },
    {
      "id": "udfs#declare-the-function",
      "kind": "section",
      "title": "UDFs",
      "heading": "Declare the function",
      "group": "Guides",
      "url": "/docs/udfs#declare-the-function",
      "summary": "A UDF is declared by applying a Function CRD. From it the operator emits a worker Deployment, an optional push Service, and a scaling object, and the gateway registers the UDF queue and discovery policy. The filter uses the same tuple syntax as upstream queries, and the worker pod receives Layer environment variables. The CRD is the source of truth, so the runtime routes are only for registration without the operator or for manual recovery.",
      "facts": [
        {
          "kind": "code",
          "literal": "Function",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "Deployment",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "Service",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "ScaledObject",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "spec.scaling",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "spec.filter",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "HEVLAYER_UDF_ID",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "HEVLAYER_BASE_URL",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "HEVLAYER_UDF_BATCH_SIZE",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "HEVLAYER_UDF_TIMEOUT_SECONDS",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "HEVLAYER_UDF_LEASE_SECONDS",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "LAYER_GATEWAY_API_KEY",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "POST /v2/udfs/{id}/discover",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "claim",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "code",
          "literal": "complete",
          "chunkId": "udfs#declare-the-function"
        },
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "udfs#declare-the-function"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#declare-the-function",
          "url": "/docs/udfs#declare-the-function",
          "anchor": "declare-the-function"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "declare",
        "function",
        "declared",
        "applying",
        "operator",
        "emits",
        "worker",
        "deployment",
        "optional",
        "push",
        "service",
        "scaling",
        "object",
        "gateway",
        "registers",
        "queue",
        "discovery",
        "policy",
        "filter",
        "uses",
        "same",
        "tuple",
        "syntax",
        "upstream",
        "queries",
        "receives",
        "layer",
        "environment",
        "variables",
        "source",
        "truth",
        "runtime",
        "routes",
        "only",
        "registration",
        "without",
        "manual",
        "recovery",
        "scaledobject",
        "spec"
      ]
    },
    {
      "id": "udfs#gateway-api",
      "kind": "section",
      "title": "UDFs",
      "heading": "Gateway API",
      "group": "Guides",
      "url": "/docs/udfs#gateway-api",
      "summary": "In Kubernetes installs the Function CRD is the source of truth and the runtime API is registered from it; these routes are the same surface the Python SDK drives and the path for registering a UDF without the operator or coordinating and recovering workers by hand.",
      "facts": [],
      "sources": [
        {
          "chunkId": "udfs#gateway-api",
          "url": "/docs/udfs#gateway-api",
          "anchor": "gateway-api"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "gateway",
        "kubernetes",
        "installs",
        "function",
        "source",
        "truth",
        "runtime",
        "registered",
        "these",
        "routes",
        "same",
        "surface",
        "python",
        "drives",
        "path",
        "registering",
        "without",
        "operator",
        "coordinating",
        "recovering",
        "workers",
        "hand",
        "below",
        "reach",
        "register",
        "coordinate",
        "recover"
      ]
    },
    {
      "id": "udfs#lifecycle",
      "kind": "section",
      "title": "UDFs",
      "heading": "Lifecycle",
      "group": "Guides",
      "url": "/docs/udfs#lifecycle",
      "summary": "UDF lifecycle is managed via kubectl on the Function resource and gateway routes: read status, pause and resume by patching the spec, reset failed rows, and delete; deletion garbage-collects the operator-managed Deployment, Service, and scaling object but does not delete written outputs.",
      "facts": [],
      "sources": [
        {
          "chunkId": "udfs#lifecycle",
          "url": "/docs/udfs#lifecycle",
          "anchor": "lifecycle"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "lifecycle",
        "managed",
        "kubectl",
        "function",
        "resource",
        "gateway",
        "routes",
        "read",
        "status",
        "pause",
        "resume",
        "patching",
        "spec",
        "reset",
        "failed",
        "rows",
        "delete",
        "deletion",
        "garbage",
        "collects",
        "operator",
        "deployment",
        "service",
        "scaling",
        "object",
        "does",
        "written",
        "outputs",
        "product",
        "tags",
        "describe",
        "curl",
        "authorization",
        "bearer",
        "layergatewayapikey",
        "layergatewayurl",
        "udfs",
        "patch",
        "type",
        "merge"
      ]
    },
    {
      "id": "udfs#lifecycle-routes",
      "kind": "section",
      "title": "UDFs",
      "heading": "Lifecycle routes",
      "group": "Guides",
      "url": "/docs/udfs#lifecycle-routes",
      "summary": "Lists the lifecycle routes: pause (stop discovery and dispatch, draining in-flight), resume, reset-failed (move failed rows back to pending), and discover (trigger an immediate sweep); reset-failed is the recovery path after a transient incident, while permanent issues need fixing the input shape or bumping the output version and re-applying.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/udfs/{id}/pause",
          "chunkId": "udfs#lifecycle-routes"
        },
        {
          "kind": "code",
          "literal": "POST /v2/udfs/{id}/resume",
          "chunkId": "udfs#lifecycle-routes"
        },
        {
          "kind": "code",
          "literal": "POST /v2/udfs/{id}/reset-failed",
          "chunkId": "udfs#lifecycle-routes"
        },
        {
          "kind": "code",
          "literal": "failed",
          "chunkId": "udfs#lifecycle-routes"
        },
        {
          "kind": "code",
          "literal": "pending",
          "chunkId": "udfs#lifecycle-routes"
        },
        {
          "kind": "code",
          "literal": "POST /v2/udfs/{id}/discover",
          "chunkId": "udfs#lifecycle-routes"
        },
        {
          "kind": "code",
          "literal": "reset-failed",
          "chunkId": "udfs#lifecycle-routes"
        },
        {
          "kind": "code",
          "literal": "spec.output.version",
          "chunkId": "udfs#lifecycle-routes"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#lifecycle-routes",
          "url": "/docs/udfs#lifecycle-routes",
          "anchor": "lifecycle-routes"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "lifecycle",
        "routes",
        "lists",
        "pause",
        "stop",
        "discovery",
        "dispatch",
        "draining",
        "flight",
        "resume",
        "reset",
        "failed",
        "move",
        "rows",
        "back",
        "pending",
        "discover",
        "trigger",
        "immediate",
        "sweep",
        "recovery",
        "path",
        "after",
        "transient",
        "incident",
        "while",
        "permanent",
        "issues",
        "need",
        "fixing",
        "input",
        "shape",
        "bumping",
        "output",
        "version",
        "applying",
        "post",
        "udfs",
        "spec",
        "route"
      ]
    },
    {
      "id": "udfs#not-in-01",
      "kind": "section",
      "title": "UDFs",
      "heading": "Not in 0.1",
      "group": "Guides",
      "url": "/docs/udfs#not-in-01",
      "summary": "Not in 0.1: cross-namespace aggregate UDFs, chunkers or fan-out transforms (which remain pipelines), multi-output UDFs, and managed image builds.",
      "facts": [
        {
          "kind": "value",
          "literal": "0.1",
          "chunkId": "udfs#not-in-01"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#not-in-01",
          "url": "/docs/udfs#not-in-01",
          "anchor": "not-in-01"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "cross",
        "namespace",
        "aggregate",
        "udfs",
        "chunkers",
        "transforms",
        "remain",
        "pipelines",
        "multi",
        "output",
        "managed",
        "image",
        "builds",
        "those"
      ]
    },
    {
      "id": "udfs#scaling-and-placement",
      "kind": "section",
      "title": "UDFs",
      "heading": "Scaling and placement",
      "group": "Guides",
      "url": "/docs/udfs#scaling-and-placement",
      "summary": "A UDF's scaling spec names a compute pool, a mode, and replica min/max bounded by the pool ceiling, with the minimum set to one for warm workers; the cluster infra-rules object owns shared placement (selectors, tolerations, resource requests, replica ceilings) so workload specs only choose a pool, and extra pod-level config is deep-merged from the worker pod spec without merging container array overrides.",
      "facts": [
        {
          "kind": "code",
          "literal": "spec.scaling",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "pool",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "InfraRules/default",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "mode",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "autoscale",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "fixed",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "disabled",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "replicas.min",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "replicas.max",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "InfraRules",
          "chunkId": "udfs#scaling-and-placement"
        },
        {
          "kind": "code",
          "literal": "spec.worker.podSpec",
          "chunkId": "udfs#scaling-and-placement"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#scaling-and-placement",
          "url": "/docs/udfs#scaling-and-placement",
          "anchor": "scaling-and-placement"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "scaling",
        "placement",
        "spec",
        "names",
        "compute",
        "pool",
        "mode",
        "replica",
        "bounded",
        "ceiling",
        "minimum",
        "warm",
        "workers",
        "cluster",
        "infra",
        "rules",
        "object",
        "owns",
        "shared",
        "selectors",
        "tolerations",
        "resource",
        "requests",
        "ceilings",
        "workload",
        "specs",
        "only",
        "choose",
        "extra",
        "level",
        "config",
        "deep",
        "merged",
        "worker",
        "without",
        "merging",
        "container",
        "array",
        "overrides",
        "infrarules"
      ]
    },
    {
      "id": "udfs#spec-routes",
      "kind": "section",
      "title": "UDFs",
      "heading": "Spec routes",
      "group": "Guides",
      "url": "/docs/udfs#spec-routes",
      "summary": "Lists the UDF spec routes (create a definition and queue, list, read, delete which preserves written output, and read status counts); the create body carries the same shape the CRD spec expresses, covering target namespaces, inputs, output, filter, triggers, worker, schedule, and retry.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/udfs",
          "chunkId": "udfs#spec-routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/udfs",
          "chunkId": "udfs#spec-routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/udfs/{id}",
          "chunkId": "udfs#spec-routes"
        },
        {
          "kind": "code",
          "literal": "DELETE /v2/udfs/{id}",
          "chunkId": "udfs#spec-routes"
        },
        {
          "kind": "code",
          "literal": "GET /v2/udfs/{id}/status",
          "chunkId": "udfs#spec-routes"
        },
        {
          "kind": "code",
          "literal": "spec",
          "chunkId": "udfs#spec-routes"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#spec-routes",
          "url": "/docs/udfs#spec-routes",
          "anchor": "spec-routes"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "spec",
        "routes",
        "lists",
        "create",
        "definition",
        "queue",
        "list",
        "read",
        "delete",
        "preserves",
        "written",
        "output",
        "status",
        "counts",
        "body",
        "carries",
        "same",
        "shape",
        "expresses",
        "covering",
        "target",
        "namespaces",
        "inputs",
        "filter",
        "triggers",
        "worker",
        "schedule",
        "retry",
        "post",
        "udfs",
        "route",
        "behavior",
        "does",
        "depth",
        "flight",
        "failed",
        "content",
        "type",
        "application",
        "json"
      ]
    },
    {
      "id": "udfs#tuning-knobs",
      "kind": "section",
      "title": "UDFs",
      "heading": "Tuning knobs",
      "group": "Guides",
      "url": "/docs/udfs#tuning-knobs",
      "summary": "Lists the UDF tuning knobs: rows per batch, worker call timeout, claim lease duration, time between discovery scans, concurrent in-flight batches per UDF, concurrent discovery scans, and retry attempts before a row lands in failed.",
      "facts": [
        {
          "kind": "code",
          "literal": "worker.batchSize",
          "chunkId": "udfs#tuning-knobs"
        },
        {
          "kind": "code",
          "literal": "worker.timeoutSeconds",
          "chunkId": "udfs#tuning-knobs"
        },
        {
          "kind": "code",
          "literal": "schedule.leaseSeconds",
          "chunkId": "udfs#tuning-knobs"
        },
        {
          "kind": "code",
          "literal": "schedule.discoveryIntervalSeconds",
          "chunkId": "udfs#tuning-knobs"
        },
        {
          "kind": "code",
          "literal": "schedule.maxInFlightBatches",
          "chunkId": "udfs#tuning-knobs"
        },
        {
          "kind": "code",
          "literal": "schedule.maxConcurrentScans",
          "chunkId": "udfs#tuning-knobs"
        },
        {
          "kind": "code",
          "literal": "retry.maxAttempts",
          "chunkId": "udfs#tuning-knobs"
        },
        {
          "kind": "code",
          "literal": "failed",
          "chunkId": "udfs#tuning-knobs"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#tuning-knobs",
          "url": "/docs/udfs#tuning-knobs",
          "anchor": "tuning-knobs"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "tuning",
        "knobs",
        "lists",
        "rows",
        "batch",
        "worker",
        "call",
        "timeout",
        "claim",
        "lease",
        "duration",
        "time",
        "between",
        "discovery",
        "scans",
        "concurrent",
        "flight",
        "batches",
        "retry",
        "attempts",
        "before",
        "lands",
        "failed",
        "batchsize",
        "timeoutseconds",
        "schedule",
        "leaseseconds",
        "discoveryintervalseconds",
        "maxinflightbatches",
        "maxconcurrentscans",
        "maxattempts",
        "knob",
        "bounds",
        "long",
        "held",
        "reissue",
        "scan",
        "jobs",
        "namespace",
        "tries"
      ]
    },
    {
      "id": "udfs#version-markers",
      "kind": "section",
      "title": "UDFs",
      "heading": "Version markers",
      "group": "Guides",
      "url": "/docs/udfs#version-markers",
      "summary": "The output version field is the re-run safety rail: when set, the gateway stamps a per-output version marker alongside every write, so bumping the version and keeping the canonical stale filter triggers re-processing when a model, taxonomy, or prompt changes.",
      "facts": [
        {
          "kind": "code",
          "literal": "spec.output.version",
          "chunkId": "udfs#version-markers"
        },
        {
          "kind": "code",
          "literal": "{attribute}_v",
          "chunkId": "udfs#version-markers"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#version-markers",
          "url": "/docs/udfs#version-markers",
          "anchor": "version-markers"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "version",
        "markers",
        "output",
        "field",
        "safety",
        "rail",
        "gateway",
        "stamps",
        "marker",
        "alongside",
        "every",
        "write",
        "bumping",
        "keeping",
        "canonical",
        "stale",
        "filter",
        "triggers",
        "processing",
        "model",
        "taxonomy",
        "prompt",
        "changes",
        "spec",
        "attribute",
        "bump",
        "keep"
      ]
    },
    {
      "id": "udfs#worker-coordination-routes",
      "kind": "section",
      "title": "UDFs",
      "heading": "Worker coordination routes",
      "group": "Guides",
      "url": "/docs/udfs#worker-coordination-routes",
      "summary": "Lists the worker coordination routes (claim a batch, heartbeat to extend leases, complete to persist output, and fail) that the SDK's worker loop implements so most workloads never call them directly; claim returns batches as namespace/id pairs with declared input columns, rows that cannot be bound surface as explicit bind errors, and on failure a transient kind honors retry while a permanent kind dead-letters immediately.",
      "facts": [
        {
          "kind": "code",
          "literal": "POST /v2/udfs/product-tags/items/complete\nContent-Type: application/json\n\n{\n  \"worker_id\": \"udf-worker-0\",\n  \"items\": [\n    {\"namespace\": \"amazon-products\", \"id\": \"asin-B08N5WRWNW\", \"output\": [\"wireless\", \"waterproof\"]}\n  ]\n}",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "POST /v2/udfs/{id}/claim",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "POST /v2/udfs/{id}/items/heartbeat",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "POST /v2/udfs/{id}/items/complete",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "POST /v2/udfs/{id}/items/fail",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "run_udf_worker",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "claim",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "(namespace, id)",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "fail",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "kind: transient",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "spec.retry",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "kind: permanent",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "kind",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "TransientError",
          "chunkId": "udfs#worker-coordination-routes"
        },
        {
          "kind": "code",
          "literal": "PermanentError",
          "chunkId": "udfs#worker-coordination-routes"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#worker-coordination-routes",
          "url": "/docs/udfs#worker-coordination-routes",
          "anchor": "worker-coordination-routes"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "worker",
        "coordination",
        "routes",
        "lists",
        "claim",
        "batch",
        "heartbeat",
        "extend",
        "leases",
        "complete",
        "persist",
        "output",
        "fail",
        "loop",
        "implements",
        "most",
        "workloads",
        "never",
        "call",
        "directly",
        "returns",
        "batches",
        "namespace",
        "pairs",
        "declared",
        "input",
        "columns",
        "rows",
        "cannot",
        "bound",
        "surface",
        "explicit",
        "bind",
        "errors",
        "failure",
        "transient",
        "kind",
        "honors",
        "retry",
        "while"
      ]
    },
    {
      "id": "udfs#writeback-and-discovery",
      "kind": "section",
      "title": "UDFs",
      "heading": "Writeback and discovery",
      "group": "Guides",
      "url": "/docs/udfs#writeback-and-discovery",
      "summary": "UDF outputs are patched onto the target row as the named attribute with the same writeback semantics across kinds, and when the output version is set the gateway atomically writes the output and its version marker in a single patch; discovery sweeps run an ID scan with the spec filter per target namespace, enqueue and dedupe returned ids, run an implicit first sweep after apply, and run subsequent sweeps on the configured interval.",
      "facts": [
        {
          "kind": "code",
          "literal": "output.kind",
          "chunkId": "udfs#writeback-and-discovery"
        },
        {
          "kind": "code",
          "literal": "spec.output.version",
          "chunkId": "udfs#writeback-and-discovery"
        },
        {
          "kind": "code",
          "literal": "{attribute}_v",
          "chunkId": "udfs#writeback-and-discovery"
        },
        {
          "kind": "code",
          "literal": "spec.filter",
          "chunkId": "udfs#writeback-and-discovery"
        },
        {
          "kind": "code",
          "literal": "target_namespace",
          "chunkId": "udfs#writeback-and-discovery"
        },
        {
          "kind": "code",
          "literal": "schedule.discovery_interval_seconds",
          "chunkId": "udfs#writeback-and-discovery"
        }
      ],
      "sources": [
        {
          "chunkId": "udfs#writeback-and-discovery",
          "url": "/docs/udfs#writeback-and-discovery",
          "anchor": "writeback-and-discovery"
        }
      ],
      "mode": "agent-primary",
      "terms": [
        "writeback",
        "discovery",
        "outputs",
        "patched",
        "onto",
        "target",
        "named",
        "attribute",
        "same",
        "semantics",
        "across",
        "kinds",
        "output",
        "version",
        "gateway",
        "atomically",
        "writes",
        "marker",
        "single",
        "patch",
        "sweeps",
        "scan",
        "spec",
        "filter",
        "namespace",
        "enqueue",
        "dedupe",
        "returned",
        "implicit",
        "first",
        "sweep",
        "after",
        "apply",
        "subsequent",
        "configured",
        "interval",
        "kind",
        "schedule",
        "seconds",
        "type"
      ]
    }
  ],
  "edges": []
}
```

---

# Introduction

Source: https://hevlayer.com/docs


Layer provides a set of drop-in enhancements to your favorite retrieval
systems. Layer lets you scale your own compute over [multi-stage
pipelines](/docs/pipelines), reason about the state of your index, observe
clickstream, track cost, and more.

[Diagram: hev layer 0.1 map]

You run two server components in your own cluster: a Rust **gateway** and a
Kubernetes **operator**. The **gateway** is a transparent proxy in front of
Turbopuffer. It extends
native clients with [fetch](/docs/api/query#fetch), [scans](/docs/scans),
[snapshots](/docs/api/snapshots), [result count](/docs/api/result-count), and
operator-facing semantics around the cache, write path, and
[pipelines](/docs/pipelines) — you swap in Layer's drop-in
client and change nothing else.
It also drives the function runtime: discovering [UDF](/docs/udfs) work, leasing
it to worker pools, retrying, and writing results back, with KEDA scaling each
pool to zero between bursts.

Alongside a set of [wire-compatible clients](/docs/install), Layer
ships an optional GUI [dashboard](/docs/dashboard). The
dashboard manages cluster configuration through CRDs; all other state is
persisted in object storage (S3). No durable state lives in a Layer
process, so the compute tier is stateless and fully elastic.

Because indexing is bursty — especially GPU-bound work — our
[Terraform](/docs/install/terraform) installs
[Karpenter](https://karpenter.sh) as a cluster autoscaler to provision and
scale the nodes Layer's compute runs on. The remaining backing services are the
document cache, the indexing-state store, and the metrics store. Every component
Layer runs alongside is open source:

- **[Karpenter](https://karpenter.sh)** — cluster autoscaler that provisions
  and scales nodes for Layer's bursty, GPU-bound compute (Apache-2.0).
- **[Aerospike](https://aerospike.com)** — NVMe-backed ephemeral document
  cache (AGPL-3.0).
- **[PostgreSQL](https://www.postgresql.org)** — indexing-state store for the
  pipeline and embed queue (PostgreSQL License).
- **[VictoriaMetrics](https://victoriametrics.com)** — metrics store
  (Apache-2.0).

To get started, see the [install guide](/docs/install). For more technical
detail, see [Concepts](/docs/concepts), [Guarantees](/docs/guarantees), and
[Tradeoffs](/docs/tradeoffs).

---

# Concepts

Source: https://hevlayer.com/docs/concepts

## Control loops

Layer uses a control loop as a core primitive for managing your indexes. It reconciles index state against metrics emitted by the search system, which is how Layer applies row-level transformations ([UDFs](/docs/udfs)) and keeps an index's stable view current.

Related: [UDFs](/docs/udfs), [snapshots](/docs/api/snapshots), stable watermark.

## Kubernetes autoscaling

Because Layer is stateless, you can autoscale every tier independently. Karpenter handles node-level scaling, and KEDA scales pods against signals from an embedded PostgreSQL queue. The data in that queue is used for scaling decisions only — it carries no non-recoverable system state.

## Gateway enhancements

Where helpful, the gateway extends your search system with common query patterns and filtering primitives. Layer's enhancements use reserved `_hevlayer_*` attributes; changing the schema of those attributes voids Layer's guarantees, although the gateway is designed to degrade gracefully when that happens. All functionality is exposed through a single client, so applications can route every call through the gateway. Layer works best when traffic flows through it consistently, even for requests that need no extra behavior.

## Scatter/gather

Layer can partition a single namespace into hash buckets — shards — by assigning each row a reserved `_hevlayer_shard` attribute (xxh64 of its id, modulo the shard count). The gateway then scatters a query to every bucket in parallel, one `_hevlayer_shard`-filtered query per shard, and gathers the results: it merges and re-ranks the combined rows down to your requested `top_k` before returning them. Sharding stays invisible to the client — you issue one query and get one ranked result set. The same scatter/gather path backs [result count](/docs/api/result-count), [scans](/docs/scans), and [UDF](/docs/udfs) discovery scans.
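
As a rough illustration of the scatter/gather flow described above, the sketch below assigns rows to buckets and merges per-shard results down to `top_k`. It is not Layer's implementation: `shard_for`, `scatter`, and `gather_top_k` are hypothetical names, scores are assumed to be "higher is better", and `hashlib.blake2b` stands in for xxh64 only so the example runs on the Python standard library (bucket numbers will therefore differ from Layer's).

```python
import hashlib
import heapq


def shard_for(doc_id: str, shard_count: int) -> int:
    """Map a row id to a hash bucket: hash(id) modulo shard_count.

    Assumption: blake2b is a stand-in for xxh64 so the sketch stays stdlib-only.
    """
    digest = hashlib.blake2b(doc_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % shard_count


def scatter(rows, shard_count, top_k):
    """Simulate one shard-filtered query per bucket, each returning its local top_k."""
    buckets = [[] for _ in range(shard_count)]
    for doc_id, score in rows:
        buckets[shard_for(doc_id, shard_count)].append((score, doc_id))
    return [heapq.nlargest(top_k, bucket) for bucket in buckets]


def gather_top_k(per_shard_results, top_k):
    """Merge every shard's results and re-rank down to the requested top_k."""
    merged = [row for shard_rows in per_shard_results for row in shard_rows]
    return heapq.nlargest(top_k, merged)


# Hypothetical rows: (id, score), where a higher score ranks first.
rows = [(f"doc-{i}", float(i)) for i in range(20)]
top = gather_top_k(scatter(rows, shard_count=4, top_k=3), top_k=3)
top_ids = [doc_id for _score, doc_id in top]
```

Because each shard returns its own best `top_k`, the union of shard results always contains the global best `top_k`, which is why the merge step can truncate safely.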

## Pull-through cache

Document [reads](/docs/api/query#fetch) are served by a pull-through cache: the gateway checks the NVMe-backed cache (Aerospike) first, and on a miss reads through to Turbopuffer — or S3 for snapshots — returns the row, and backfills the cache best-effort. The cache is a read accelerator, not a hard dependency: if it is unavailable, reads fall through to origin and still succeed. One logical cache serves every read path, with different uses (document fetch, snapshot field-values) separated by Aerospike `set`.
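
The read path above can be sketched in a few lines. This is a minimal model under stated assumptions, not the gateway's code: `DictStore`, `DownCache`, `CacheUnavailable`, and `fetch_document` are hypothetical names, and plain dictionaries stand in for Aerospike and the origin.

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when the cache cannot be reached."""


class DictStore:
    """In-memory stand-in for either the cache or the origin."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})

    def get(self, doc_id):
        return self.rows.get(doc_id)

    def put(self, doc_id, row):
        self.rows[doc_id] = row


class DownCache:
    """A cache that is unreachable: every call fails."""

    def get(self, doc_id):
        raise CacheUnavailable("cache is down")

    def put(self, doc_id, row):
        raise CacheUnavailable("cache is down")


def fetch_document(doc_id, cache, origin):
    """Return (row, cache_status) using pull-through semantics."""
    try:
        cached = cache.get(doc_id)
    except CacheUnavailable:
        cached = None  # soft failure: fall through to origin
    if cached is not None:
        return cached, "hit"

    row = origin.get(doc_id)  # an origin error would propagate to the caller
    if row is not None:
        try:
            cache.put(doc_id, row)  # best-effort backfill
        except CacheUnavailable:
            pass  # a failed backfill never blocks the read
    return row, "miss"


origin = DictStore({"asin-1": {"title": "Headphones"}})
cache = DictStore()
first = fetch_document("asin-1", cache, origin)
second = fetch_document("asin-1", cache, origin)
degraded = fetch_document("asin-1", DownCache(), origin)
```

The first read misses and backfills, the second is served from the cache, and the read against the unreachable cache still succeeds from origin.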

## Observability as code

Layer's observability contract is defined in the service itself. The gateway emits a self-describing [catalog](/docs/api/metrics) of every metric it exports — names, labels, and example PromQL — so the metric surface is code, not hand-maintained dashboard config. The bundled [dashboard](/docs/dashboard) and any external automation read from that catalog, and an embedded, Prometheus-compatible VictoriaMetrics instance lets you run [PromQL](/docs/api/metrics) against the series directly or bring your own monitoring stack.

## Glossary

| Concept | Current meaning |
| --- | --- |
| [Namespace](/docs/api/introduction) | A Turbopuffer namespace addressed through `/v2/namespaces/{namespace}`. |
| Document | A row id plus attributes, and optionally a vector when writing/searching. |
| Cache | NVMe-backed records keyed by namespace and document id, plus cache sets for pipeline chunks and snapshots. |
| Stable watermark | Epoch-ms cut tracked by the consistency watcher when Turbopuffer index status is up-to-date. |
| [Pipeline](/docs/pipelines) | A PostgreSQL-backed state machine for CPU extraction and GPU embedding work. |
| [Snapshot](/docs/api/snapshots) | A content-addressed S3 facet histogram written after a namespace is observed stable. |
| Facet listing | The distinct values for a configured snapshot field, surfaced as `fields[].values[].v`. |
| Facet count | The document count for a configured snapshot field value, surfaced as `fields[].values[].n`. |
| [Result count](/docs/api/result-count) | A synchronous ranked-query count over FTS or vector query input. |
| [Scan](/docs/scans) | A filter scan that returns matching IDs asynchronously or a matching row count synchronously. |
| [UDF](/docs/udfs) | A stateless container the gateway calls once per row of an index to compute a derived attribute. |
| Gateway | The Rust proxy fronting Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime. |
| [Operator](/docs/kubernetes/operator) | The Kubernetes operator that reconciles Layer's CRDs — functions, pipelines, scaling, and cluster config. |
| Shard | A hash bucket within a single namespace. Each row carries a reserved `_hevlayer_shard` value (xxh64 of its id, modulo the shard count) so the gateway can scatter/gather a query across buckets. |
| CRD | Custom Resource Definition: the Kubernetes-native resources the operator reconciles — [functions](/docs/kubernetes/function-crd), [pipelines](/docs/kubernetes/pipeline-crd), [scaling](/docs/kubernetes/scaling-crd), and [indexes](/docs/kubernetes/index-crd). |
| PromQL | The Prometheus query language. The gateway proxies it to the embedded VictoriaMetrics so you can query [metrics](/docs/api/metrics) without a separate scraper. |

---

# Document model

Source: https://hevlayer.com/docs/document-model

A Layer document is a Turbopuffer row — an id, your attributes, and an optional vector — read and written through the [pull-through cache](/docs/concepts#pull-through-cache). Alongside your own schema, Layer reserves the `_hevlayer_*` attribute prefix for its own bookkeeping. The gateway manages these attributes: your writes and [UDF](/docs/udfs) outputs must not set them, and editing them directly breaks Layer's guarantees (the gateway degrades gracefully if they drift).

| Attribute | Type | Purpose |
| --- | --- | --- |
| `_hevlayer_upserted_at` | integer (epoch ms) | Server-stamped on every write. The gateway filters queries to `_hevlayer_upserted_at <= watermark` to hold the read-consistency cut while the upstream index catches up. |
| `_hevlayer_shard` | integer | Hash bucket assigned at write time (`xxh64(id) % shard_count`), present only on sharded namespaces. Lets the gateway [scatter/gather](/docs/concepts#scattergather) a query across the shards of one namespace. |

The `_hevlayer_` prefix also namespaces internal cache sets — snapshot field-values and search-history clickstream — but those are cache keys, not part of your document schema.
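
To make the read-consistency cut from the table above concrete, here is a small in-memory sketch of the `_hevlayer_upserted_at <= watermark` filter. `visible_rows` and the sample rows are hypothetical, and the sketch assumes the filter is applied only while the upstream index is still catching up.

```python
def visible_rows(rows, watermark_ms, index_up_to_date):
    """Return the rows a consistent read may see.

    While the upstream index is still catching up, only rows stamped at or
    before the stable watermark are visible.
    """
    if index_up_to_date:
        return list(rows)
    return [row for row in rows if row["_hevlayer_upserted_at"] <= watermark_ms]


# Hypothetical rows; _hevlayer_upserted_at is in epoch milliseconds.
rows = [
    {"id": "a", "_hevlayer_upserted_at": 1_700_000_000_000},
    {"id": "b", "_hevlayer_upserted_at": 1_700_000_005_000},
]
watermark_ms = 1_700_000_001_000

during_indexing = [
    row["id"] for row in visible_rows(rows, watermark_ms, index_up_to_date=False)
]
after_indexing = [
    row["id"] for row in visible_rows(rows, watermark_ms, index_up_to_date=True)
]
```

Row `b` was written after the watermark, so it stays hidden until the index reports up-to-date again.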

---

# No Guarantees

Source: https://hevlayer.com/docs/guarantees


Layer can't offer guarantees. We do our best to provide secure, hands-off
infrastructure, but you are ultimately responsible for running it. What we can
offer is a set of promises about how we design, secure, and distribute our
software, which we believe make it easy to use and able to stand the test of
time. This page covers the current status of those promises.

## Commitments

- Your index stays in your search system. We will not reimplement indexing. Layer keeps a copy of your data, but the search index lives in your vector store.
- Your history is backed up to S3. Search history and namespace snapshots are written to the S3 bucket you specify. The format of this data may change prior to v1.0.
- Data on NVMe. Customer document and chunk data is served from NVMe for price/performance. We try not to stray from this pattern, though some use cases may justify a smaller in-memory document cache.
- This documentation is accurate and up to date. When it isn't, that's a bug in the software — report it.
- Metrics and alerts are documented as code, and tested. The observability surface is versioned, reviewable, and covered by tests — not hand-rolled per deployment.
- Graceful degradation. We add graceful degradation support whenever possible — the gateway degrades rather than failing hard.
- Client compatibility. We will (almost) always stay client-compatible with the search systems we front. Where we diverge, it's a feature making an explicit tradeoff we believe is an improvement.

<Callout label="ai disclaimer">
Layer was developed by a single person orchestrating agentic coding tools and building automation. Not a single line of code was hand-written. That said, it was made with ❤️ by a human as much as it was built by AI.
</Callout>

---

# Tradeoffs

Source: https://hevlayer.com/docs/tradeoffs

Layer makes a set of design tradeoffs that we believe improve the functionality of the search engine. This page makes those tradeoffs explicit. As the list grows, we will add configuration where possible so that users can choose their preference.

Layer adds latency to the query path in the following ways:

- An additional network hop (<span style="color: var(--muted); font-weight: 500;">not configurable</span>).
- A query plan that allows for strongly consistent reads during heavy writes ([<span style="color: var(--signal); font-weight: 500;">index configurable</span>](/docs/kubernetes/index-crd)).

Layer also increases index storage requirements in two ways:

- A secondary index for filtering by upsert time (<span style="color: var(--muted); font-weight: 500;">not configurable</span>).
- A secondary index used for scatter/gather sharding (<span style="color: var(--muted); font-weight: 500;">not configurable</span>).

---

# Limits

Source: https://hevlayer.com/docs/limits

Layer is limited by certain constraints of the underlying components we ship with. We will lift these as demand increases.

- **Single-node Aerospike.** We enforce this for simplicity and also believe that a single large NVMe drive offers enough storage for almost every dataset.
- **~4,090 Turbopuffer namespaces.** We use Aerospike sets for logical separation of data, which are limited by the Aerospike Community Edition AGPL license.
- **~3 TB cache size.** Another limitation of the Aerospike license.
- **10,000 distinct values per scan facet field.** Pre-computed snapshot scans cap each facet field's cardinality. If a field exceeds the cap, it is noted in `fields_skipped[]` rather than `fields[]`, so readers can treat every emitted field as complete. See [snapshots](/docs/api/snapshots).
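
A reader of a snapshot can rely on the `fields[]` / `fields_skipped[]` split described in the last item above. The sketch below shows one way to do that. The payload shape is an assumption: only `fields[].values[].v`, `fields[].values[].n`, and `fields_skipped[]` come from these docs, while the `name` key, the string entries in `fields_skipped`, and the `facet_counts` helper are hypothetical.

```python
def facet_counts(snapshot, field):
    """Return {value: count} for a complete facet field, or None if it was skipped."""
    if field in snapshot.get("fields_skipped", []):
        return None  # cardinality exceeded the cap; no partial listing is emitted
    for entry in snapshot.get("fields", []):
        if entry["name"] == field:  # "name" is an assumed key
            return {value["v"]: value["n"] for value in entry["values"]}
    raise KeyError(f"field {field!r} is not configured in this snapshot")


# Hypothetical snapshot payload.
snapshot = {
    "fields": [
        {"name": "brand", "values": [{"v": "acme", "n": 12}, {"v": "globex", "n": 3}]},
    ],
    "fields_skipped": ["sku"],
}

brand_counts = facet_counts(snapshot, "brand")
sku_counts = facet_counts(snapshot, "sku")
```

Returning `None` for a skipped field keeps "no data because the cap was hit" distinct from "an empty but complete listing".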

## No limits

These have no enforced ceiling, but practical limits exist and will surface under load.

- **CRD instances** (`Index`, `InfraRules`, `Pipeline`, `Function`) — bounded only by the etcd and operator throughput of your Kubernetes cluster.
- **Snapshot history per namespace** — durable in S3, accumulates indefinitely; bounded by object storage cost.
- **Search history retention** — accumulates indefinitely in S3; no automatic expiry.
- **Clickstream event volume** — accumulates indefinitely in S3; no automatic expiry.
- **UDF concurrency per function** — KEDA scales replicas to match queue depth, bounded by your cluster's capacity.
- **Pipeline queue depth** — pipeline queues, including chunked document queues, store document IDs and chunk ID lists in S3 manifests and keep only segment state and counters in Postgres.
- **Document size and attribute count** — bounded by Turbopuffer and Aerospike record limits, not by Layer.

---

# Agents

Source: https://hevlayer.com/docs/agents


These docs are queryable from the command line. The same engine behind the
`⌘K` search on this site ships as a CLI, so your coding agent can search,
read, and cite the Layer docs directly — no scraping, no MCP server, no API
key. Two commands wire it up.

## 1. Install the CLI

```sh
go install github.com/hev/ask/cmd/ask@latest
```

The binary is self-contained; any agent harness that can run a shell command
can use it.

## 2. Add the skill

For Claude Code, paste this once:

```sh
mkdir -p ~/.claude/skills/hevlayer-docs
cat > ~/.claude/skills/hevlayer-docs/SKILL.md <<'EOF'
---
name: hevlayer-docs
description: >-
  Query the hev layer docs. Use when the user asks about Layer — the
  Turbopuffer gateway, strong-consistent reads, the stable watermark,
  the pull-through document cache, warm jobs, scans, result count,
  snapshots, pipelines, UDFs, the Index/InfraRules/Pipeline/Function
  CRDs, compute pools, install via Terraform or Helm, failure modes,
  or the dashboard.
---

# hev layer docs

Answer Layer questions from the docs, not from memory. Every verb is a
keyless read:

    ask --endpoint https://hevlayer.com/api/ask search "<question>"
    ask --endpoint https://hevlayer.com/api/ask section get "<id>"
    ask --endpoint https://hevlayer.com/api/ask overview
    ask --endpoint https://hevlayer.com/api/ask glossary get "<term>"

Start with `search`; fetch sections for detail; use `overview` when you
need the full map. Section ids look like
`api/query#strong-consistent-reads`. Cite sections in your answer as
https://hevlayer.com plus the returned `url` field.

If `ask` is missing, install it:
`go install github.com/hev/ask/cmd/ask@latest`
EOF
```

Other harnesses: paste the body of that skill into your `AGENTS.md` — it is
plain instructions around a CLI, nothing Claude-specific.

## 3. Ask

```sh
ask --endpoint https://hevlayer.com/api/ask search "cache is down"
```

```json
{
  "results": [
    {
      "title": "Concepts",
      "heading": "Pull-through cache",
      "url": "/docs/concepts#pull-through-cache",
      "group": "Overview",
      "snippet": "Document reads are served by a pull-through cache: the gateway checks..."
    }
  ]
}
```

From here your agent typically runs `section get` on the winning id and
answers with the citation.
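
For harnesses that script this step, the follow-up can be sketched as below. It only builds the next command and the citation; it does not run the CLI. The helper names are hypothetical, and the rule that a section id is the returned `url` with its `/docs/` prefix removed is an inference from the id and URL examples on this page, not a documented contract.

```python
import json

DOCS_ORIGIN = "https://hevlayer.com"
ASK_ENDPOINT = "https://hevlayer.com/api/ask"


def section_id_from_url(url):
    """Turn '/docs/concepts#pull-through-cache' into 'concepts#pull-through-cache'.

    Assumption: a section id is the url with the '/docs/' prefix removed.
    """
    prefix = "/docs/"
    if not url.startswith(prefix):
        raise ValueError(f"unexpected docs url: {url}")
    return url[len(prefix):]


def follow_up(search_output):
    """Pick the top search result; return (argv for `section get`, citation URL)."""
    results = json.loads(search_output)["results"]
    if not results:
        raise LookupError("search returned no results")
    top = results[0]
    argv = [
        "ask", "--endpoint", ASK_ENDPOINT,
        "section", "get", section_id_from_url(top["url"]),
    ]
    return argv, DOCS_ORIGIN + top["url"]


# The search response shown above, as a string.
search_output = """
{"results": [{"title": "Concepts", "heading": "Pull-through cache",
              "url": "/docs/concepts#pull-through-cache", "group": "Overview",
              "snippet": "Document reads are served by a pull-through cache..."}]}
"""
argv, citation = follow_up(search_output)
```

The resulting `argv` can be handed to whatever process runner the harness already uses, and `citation` follows the "https://hevlayer.com plus the returned `url` field" rule from the skill.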

## The verbs

| Verb | Returns |
| --- | --- |
| `overview` | Orientation context plus the full section map with stable ids |
| `search "<query>"` | Ranked sections with snippets and deep links |
| `section get "<id>"` | One section: summary, exact identifiers, source URL |
| `glossary get "<term>"` | A product term resolved through its aliases (`watermark` → stable watermark) |

## Why answers stay grounded

Search runs over a committed, reviewable digest of these docs — the same
corpus, heading by heading, that renders on this site. Every anchor in it is
verified against the rendered pages in CI, so a cited deep link like
[/docs/api/query#strong-consistent-reads](/docs/api/query#strong-consistent-reads)
always resolves. When the docs change, the digest is rebuilt and recommitted
with them.

<Callout label="No key required">
  Every verb above is a read against the public docs. Nothing to sign up
  for, nothing to configure beyond the endpoint URL.
</Callout>

The docs are also available as plain text for direct ingestion:
[/llms.txt](/llms.txt) (index) and [/llms-full.txt](/llms-full.txt)
(full corpus). The CLI is the better path for agents that can run commands —
it ranks, resolves aliases, and costs a fraction of the tokens.

---

# Roadmap & Changelog

Source: https://hevlayer.com/docs/roadmap

## Up Next

- 🧷 Count and scan primitives — filter-count mode, snapshot truncation removal, route renames ([RFC 0019](https://github.com/hev/layer/blob/main/docs/rfcs/0019-count-and-scan-primitives.md), [#67](https://github.com/hev/layer/issues/67))
- 🚑 Indexing failure-mode E2E runbook — Aerospike stop-writes + Postgres pressure ([#55](https://github.com/hev/layer/issues/55))
- 🧬 Embedding UDF writeback via row re-upsert ([#52](https://github.com/hev/layer/issues/52))
- 🌱 Namespace init UDF for first-time embed population
- 🚦 Snapshot-aware ready signal — `layer.is_stable` honors UDF state ([#46](https://github.com/hev/layer/issues/46))
- 🎨 Full dashboard redesign — 6-tab layout from the prototype ([#11](https://github.com/hev/layer/issues/11))
- 🐚 `layer` CLI — kube-style resource access, queries, scans, and jobs over the gateway REST API

### Later

- 🔐 RBAC: scoped API keys and entitlements as a Layer primitive ([#8](https://github.com/hev/layer/issues/8))
- ♻️ Soft delete with TTL + restore ([#7](https://github.com/hev/layer/issues/7))
- 🪢 Hybrid text fusion — typo-tolerant search via per-token fuzzy + BM25 legs, fused by Turbopuffer-native RRF ([RFC 0022](https://github.com/hev/layer/blob/main/docs/rfcs/0022-hybrid-text-fusion.md), [#18](https://github.com/hev/layer/issues/18))
- ⌨️ Typeahead via Turbopuffer regex index ([#19](https://github.com/hev/layer/issues/19))
- 🕰️ Temporal queries — `as_of` selector for `/query`, `/scans`, `/fetch`, and `/snapshots` ([RFC 0020](https://github.com/hev/layer/blob/main/docs/rfcs/0020-temporal-queries.md), [#68](https://github.com/hev/layer/issues/68))
- 🌿 `copy_from_with_filter` — time travel + subset branching ([#20](https://github.com/hev/layer/issues/20))
- 🐇 Exact kNN result cache keyed by consistency watermark ([#21](https://github.com/hev/layer/issues/21))
- 🧪 A/B variant indexes with operator-controlled rollout ([#23](https://github.com/hev/layer/issues/23))
- 🦚 Per-query observability with LLM-judged Tail Quality ([#41](https://github.com/hev/layer/issues/41))
- 🎞️ Pipeline crash recovery via source replay + deterministic IDs ([#43](https://github.com/hev/layer/issues/43))
- ☠️ Paginated UDF dead-letter list ([#44](https://github.com/hev/layer/issues/44))
- 🏗️ Narrow cluster topology defaults ([#45](https://github.com/hev/layer/issues/45))
- 📣 Write amplification baselines ([#15](https://github.com/hev/layer/issues/15))
- 📮 `layer push` — Python UDF dev experience via Depot ([#64](https://github.com/hev/layer/issues/64))
- 💸 Cost API — AWS + Turbopuffer cost snapshots, timeseries, and rate card ([#35](https://github.com/hev/layer/issues/35))

## 0.1 Release (UAT)

### API hardening

- 🧩 Scaling CRD consolidation — `Pipeline`, `UDF`, `InfraRules` ([RFC 0012](https://github.com/hev/layer/blob/main/docs/rfcs/0012-crd-scaling-consolidation.md))
- 🎛️ `Index` CRD redesign ([RFC 0013](https://github.com/hev/layer/blob/main/docs/rfcs/0013-index-policy-surface.md))
- 📸 Snapshot scan naming conventions ([RFC 0014](https://github.com/hev/layer/blob/main/docs/rfcs/0014-snapshot-noun-scan-verb.md))
- 🧹 Remove unused APIs

### Lifecycle and operability

- 🎚️ [Autoscaling compute](/docs/kubernetes/scaling-crd) for pipelines and UDFs
- 🗄️ [Document cache endpoint](/docs/api/query#fetch) for building multi-stage pipelines
- 📸 [Index snapshot history](/docs/api/snapshots)
- 🧨 Coordinated delete
- ⛵ [Helm](/docs/install/helm) and [Terraform](/docs/install/terraform) install scripts

### Surfaces

- 🪟 [Dashboard MVP](/docs/dashboard) — basic CRD management and observability
- 🐍 Official Python SDK

### Search

- 🎯 Strongly consistent queries during heavy writes via [`_hevlayer_upserted_at`](/docs/guarantees)
- 🧮 [Result count](/docs/api/result-count) over FTS/vector queries via scatter/gather
- 📜 Precomputed facet listings in [snapshots](/docs/api/snapshots)
- 🪙 Precomputed facet counts in [snapshots](/docs/api/snapshots)
- 🪃 [Scans](/docs/api/scans) for filter IDs and filter counts not available in a snapshot
- 🆔 Search by id via document-cached vector
- 📰 [Search history](/docs/api/search-history) saved to S3
- 🗂️ Enhanced [namespace metadata](/docs/api/namespace-metadata)

---

# Install

Source: https://hevlayer.com/docs/install

import LinkGrid from "../../components/docs/LinkGrid.astro";

A hev layer install has two stages. **Terraform** provisions the required
AWS resources: IAM, S3, ECR, networking, cost-read roles, and, for the
recommended path, a fresh EKS cluster. **Helm** installs the gateway,
operator, and document cache into that cluster and wires them to the AWS
resources Terraform produced.

You can skip Terraform if you already have the AWS resources hev layer
needs. At minimum, provide an S3 bucket and gateway IRSA role for
snapshots and history. For the full operations surface, also provide
dashboard cost-read IAM, image registry locations, and cluster-level
components equivalent to the Terraform outputs.

<LinkGrid links={[
  { href: "/docs/install/terraform", label: "Terraform", description: "Provision IAM, S3, ECR, cost access, and the recommended EKS cluster for hev layer." },
  { href: "/docs/install/helm", label: "Helm Install", description: "Install the gateway, operator, and document cache into your cluster." },
  { href: "/docs/kubernetes/operator", label: "Operator Overview", description: "How the layer-operator reconciles Index, InfraRules, Pipeline, and Function CRDs." },
]} />

## What ships in 0.1

The 0.1 install is single-tenant: one Helm release per environment, one
Turbopuffer credential per release, one S3 bucket for snapshot and history
data. Multi-tenant gateway scoping is on the 0.2+ roadmap and is not
exposed at the install layer yet.

---

# Terraform

Source: https://hevlayer.com/docs/install/terraform

import Callout from "../../../components/docs/Callout.astro";

The Terraform configuration in `infra/terraform/` provisions the AWS
resources that the gateway and operator need. It is opinionated about
the resources hev layer needs to behave correctly and conservative about
resources around it. Route53 hosted zones and ACM certificates are
opt-in; most installs bring existing DNS and TLS.

## What it sets up

| Resource | Purpose |
| --- | --- |
| S3 bucket | Durable storage for namespace snapshots, search history, and clickstream events. |
| IAM roles + IRSA policies | Gateway S3 access, dashboard cost-read access, and worker/operator AWS access. |
| ECR repositories | Image registry for the gateway, operator, and customer-built function images. |
| EKS + VPC + node pools | Recommended fresh-cluster runtime for design partners. |
| Route53 + ACM | Optional DNS zones, records, and TLS certificates when `manage_public_dns=true`. |

## Cluster: recommended

Design-partner installs should use a fresh EKS cluster unless there is a
specific reason to bind hev layer to an existing one. The cluster path
provisions:

- a VPC with the subnets, NAT, and endpoints hev layer expects
- an EKS control plane and node groups
- Karpenter for node autoscaling
- the AWS Load Balancer Controller for ingress
- EFS for shared persistent volumes

<Callout label="Bring your own cluster">
If you already operate an EKS cluster, you can disable the cluster modules
and point hev layer at the existing cluster. You are still responsible for
the functional prerequisites: an S3 bucket for snapshots/history, gateway
IRSA that can read/write that bucket, dashboard IRSA for AWS cost and
pricing reads, image registry access, Karpenter or equivalent node
autoscaling for workers, and the AWS Load Balancer Controller if you use
public ingress.
</Callout>

<Callout label="Fresh cluster recommendation">
For design partners, deploy hev layer to a fresh cluster. It keeps worker
autoscaling, document-cache placement, and cost attribution isolated from
unrelated workloads while the 0.1 operating model settles.
</Callout>

## Cost notes

The Terraform configuration is designed to deploy a cost-efficient AWS footprint with
autoscaling for on-demand indexing work. At rest, the fixed costs are
mostly EKS, NAT when private workers need third-party egress, and small
storage lines. Indexing bursts scale worker nodes up through Karpenter
and back down when queues drain.

Heavier search use cases may need more read-side infrastructure:
additional gateway replicas, larger document-cache nodes, or dedicated
node pools for steady read traffic. Contact hev layer for help sizing
read-heavy deployments.

## Outputs

Terraform emits the values the Helm chart needs to install: the S3
bucket name, gateway IRSA role ARN, dashboard cost-read role ARN, ECR
image URLs, and cluster metadata. Pass these into the Helm values file
described in [Helm Install](/docs/install/helm).

---

# Helm Install

Source: https://hevlayer.com/docs/install/helm

import Callout from "../../../components/docs/Callout.astro";

The Helm chart at `infra/helm/layer/` installs the gateway, operator, and
document cache into a cluster that already has the AWS resources from
[Terraform](/docs/install/terraform) or equivalent resources you manage.

## Required values

Most of the chart is opinionated defaults. In a typical install the only
value you must bring from outside the cluster is the Turbopuffer API key.

| Value | Required | Notes |
| --- | --- | --- |
| `turbopuffer.apiKey` | yes | Turbopuffer credential the gateway uses on every upstream request. |
| `gateway.image` | yes | Gateway image URL — Terraform emits this as an ECR output. |
| `gateway.apiKey` | yes | Bearer token clients send as `Authorization: Bearer …`. Chart render fails when blank, by design. |
| `s3.bucket` | yes | S3 bucket Terraform created for snapshots and history. |
| `serviceAccount.roleArn` | yes | IRSA role ARN that grants the gateway access to the S3 bucket. |
| `gateway.indexGc.enabled` | no | Enables namespace hard-delete cleanup of operator-discovered `Index` CRs. |
| `gateway.indexGc.indexNamespace` | no | Namespace containing `Index` CRs. Blank follows `operator.discovery.indexNamespace`, then the Helm release namespace. |
| `dashboard.serviceAccount.roleArn` | for cost tab | IRSA role ARN with AWS pricing, CloudWatch, and cost read access. |
| `ingress.host` | optional | Set when you want a public ingress; use your DNS/TLS or enable Terraform-managed Route53/ACM. |

<Callout label="Why TURBOPUFFER_API_KEY is the headline value">
Most other Helm inputs are wiring between resources the install
process already produced. The Turbopuffer API key is the one thing
hev layer can't generate for you — it's the credential you bring in
from your Turbopuffer account.
</Callout>

## Install

```sh
helm upgrade --install layer ./infra/helm/layer \
  --namespace layer --create-namespace \
  -f values.customer.yaml
```

The chart is not published to a public Helm repository in 0.1 — install
from the source path or from the chart artifact provided during
onboarding.

## What gets installed

- `layer-gateway` — Rust gateway for Turbopuffer-compatible routes, fetch,
  scans, snapshots, warm jobs, and pipeline state.
- `layer-operator` — reconciler for Index, InfraRules, Pipeline, and Function
  CRDs documented in [Kubernetes](/docs/kubernetes/operator).
- `layer-document-cache` — Aerospike-backed document cache, scale-to-zero
  by default.
- Supporting resources: service accounts, IRSA bindings, ingress, and
  CRDs.

---

# Failure Modes

Source: https://hevlayer.com/docs/failure-modes

## Read

If the gateway is down, your queries are down. The document cache is
stateless and can scale to zero with no disruption, and no other components
impact the read path.

## Write

The primary failure mode for writes is Aerospike stop-writes during a
multi-stage pipeline job. Staged documents stay warm in the cache but do
not contain vector data. If the staged data exceeds the Aerospike drive
allocation, Aerospike stops accepting writes and your pipeline degrades
to S3-backed chunk reads. The operator can restart Aerospike; doing so
discards the document cache. Pipeline workers resume automatically: staged
chunk bodies are durable in S3, pending state is in PostgreSQL, and the
gateway refills Aerospike from S3 after reconnect.

The Helm document cache restarts automatically on stop-writes by default
(`documentCache.autoRestartOnStopWrites: true`) and clears its Aerospike
backing file on pod start (`documentCache.storage.resetOnStart: true`). That
makes a pod restart a valid stop-writes recovery action for the Layer-owned
cache. S3 and PostgreSQL must remain healthy; they are the durable recovery
boundary.

---

# Operator Overview

Source: https://hevlayer.com/docs/kubernetes/operator

`layer-operator` manages declarative state for your hev layer
deployment. It has two main jobs: monitoring your indexes for changes
and managing scaling. It does both through a set of abstractions known
as [custom resource definitions
(CRDs)](/docs/concepts#glossary).

The gateway handles the read and write path; the operator handles
everything that is expressed as desired state in the cluster:
which indexes exist, how worker pools scale, and which stateless
functions run against which indexes.

## CRDs

The operator reconciles four resource kinds, each documented on its own
page:

- [Index CRD](/docs/kubernetes/index-crd) — one resource per Turbopuffer
  namespace the gateway should manage.
- [InfraRules CRD](/docs/kubernetes/scaling-crd) — cluster-wide
  compute pools, document cache rules, and shared scaling policy.
- [Pipeline CRD](/docs/kubernetes/pipeline-crd) — staged work that
  changes row count.
- [Function CRD](/docs/kubernetes/function-crd) — stateless user-defined
  functions that read and write attributes on an index.

## Relationship to the gateway

The gateway and the operator are decoupled. The operator reconciles
declarative state; the gateway serves the read and write path. Neither
sits in the other's hot path, so the gateway keeps serving even if the
operator is restarted or lagging.

The link between them is one-directional and read-only. For some
features the gateway reads CRD status — which indexes exist, which
worker pools are ready — to inform what it serves. It never writes to
the CRDs; declarative state is authored by you and reconciled by the
operator, and the gateway is only ever a reader of it.

## Scheduling and node pools

The operator is opinionated about where the workers it creates run. It
does not schedule Pipeline and Function pods onto general cluster
capacity — each compute pool pins to a dedicated, labeled node pool via
`nodeSelector` and `tolerations`, so CPU and GPU work land on the right
nodes and stay isolated from the rest of your cluster. The shipped
defaults assume [Karpenter](https://karpenter.sh) and select on the
`karpenter.sh/nodepool` label, but any labeled node pool works.

This is configured once on `InfraRules/default`, not per workload — see
[InfraRules](/docs/kubernetes/scaling-crd) for the compute-pool fields
and how Pipelines and Functions choose a pool.

---

# Index CRD

Source: https://hevlayer.com/docs/kubernetes/index-crd

An `Index` represents one namespace exposed through the gateway. It
declares the backend, snapshot policy, cache posture, consistency mode,
and access metadata.

```yaml
apiVersion: hevlayer.com/v1
kind: Index
metadata:
  name: products
  namespace: layer
spec:
  backend:
    kind: turbopuffer
    region: aws-us-east-1
    namespace: products
    distanceMetric: cosine_distance
  metadata:
    labels:
      app: shop
    tags:
      - catalog
  snapshot:
    interval: 5m
    retention: never
    facetFields:
      - category
      - brand
  cache:
    ttl: 24h
    capGiB: 64
    mode: standard
    warming:
      threads: 4
  consistency: strong
```

## Backend

| Field | Purpose |
| --- | --- |
| `backend.kind` | `turbopuffer` in the 0.1 runtime. |
| `backend.region` | Backend region identifier. |
| `backend.namespace` | Optional upstream namespace override. Defaults to the Index name. |
| `backend.distanceMetric` | Vector metric, default `cosine_distance`. |

## Snapshot policy

`snapshot.facetFields` is the user-facing source of fields the gateway
materializes into durable snapshots. `retention` defaults to `never` in
0.1 because automatic snapshot GC is not shipped yet.

## Cache policy

`cache.warming.threads` defaults to `4`. Aerospike remains an ephemeral
cache; durable snapshot history stays in S3.

## Status

The operator reports observed generation, snapshot scheduling metadata,
metadata sync state, and conditions.

---

# InfraRules CRD

Source: https://hevlayer.com/docs/kubernetes/scaling-crd

`InfraRules` is the cluster-scoped policy object for Layer-managed
runtime infrastructure. The 0.1 surface has exactly one object:
`InfraRules/default`.

Pipelines and Functions do not reference a separate autoscaling resource.
They set `spec.scaling` inline and choose a pool from
`InfraRules/default.spec.computePools`.

## InfraRules

```yaml
apiVersion: hevlayer.com/v1alpha1
kind: InfraRules
metadata:
  name: default
spec:
  computePools:
    - name: cpu
      kind: cpu
      maxReplicasPerWorkload: 20
      nodeSelector:
        karpenter.sh/nodepool: cpu
      tolerations: []
      resources:
        requests:
          cpu: "500m"
          memory: 512Mi
        limits:
          memory: 1Gi
  documentCache:
    capGiB: 256
    replicationFactor: 1
    scaling:
      mode: autoscale
      nodes:
        min: 1
        max: 3
```

The operator validates that the object is named `default`. Helm can
render the default object with `operator.infraRules.create=true`.

## Compute pools

| Field | Purpose |
| --- | --- |
| `name` | Referenced by `spec.scaling.pool` on Pipeline and Function resources. |
| `kind` | Pool class label such as `cpu` or `gpu`. |
| `gpuType` | Optional descriptive GPU type for GPU pools. |
| `nodeSelector` | Applied to worker pods that choose the pool. |
| `tolerations` | Applied to worker pods that choose the pool. |
| `resources` | Container resources applied to worker pods. |
| `maxReplicasPerWorkload` | Hard ceiling for one Pipeline or Function. |

If a workload names an unknown pool or asks for more replicas than the
pool ceiling, the operator leaves the workload unready and records a
condition on its status.
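
The two rejection rules above can be sketched in a few lines. This is an illustration of the documented behavior, not the operator's code; the `ComputePool` type, the function name, and the reason strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputePool:
    """Hypothetical stand-in for one entry of spec.computePools."""
    name: str
    max_replicas_per_workload: int


def check_scaling(pools: list[ComputePool], pool_name: str,
                  max_replicas: int) -> tuple[bool, str]:
    """Return (ready, reason) for a workload's spec.scaling block."""
    by_name = {pool.name: pool for pool in pools}
    pool = by_name.get(pool_name)
    if pool is None:
        # Unknown pool: the workload stays unready.
        return False, f"unknown pool {pool_name!r}"
    if max_replicas > pool.max_replicas_per_workload:
        # Asking for more than the pool ceiling: also unready.
        return False, (
            f"replicas.max {max_replicas} exceeds pool ceiling "
            f"{pool.max_replicas_per_workload}"
        )
    return True, "ok"


pools = [ComputePool(name="cpu", max_replicas_per_workload=20)]
print(check_scaling(pools, "cpu", 4))
```

Running a check like this in CI before applying a Pipeline or Function manifest would surface the same problems the operator otherwise reports as a status condition.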

## Workload scaling

```yaml
scaling:
  pool: cpu
  mode: autoscale
  replicas:
    min: 0
    max: 4
```

| Mode | Behavior |
| --- | --- |
| `autoscale` | Emit a KEDA `ScaledObject` and let queue depth scale the Deployment between `min` and `max`. |
| `fixed` | Set Deployment replicas to `replicas.min`; no KEDA object is emitted. |
| `disabled` | Scale the Deployment to 0; no KEDA object is emitted. |

Paused workloads also scale to 0. To keep a cold-start-heavy worker
warm, set `mode: autoscale` and `replicas.min: 1`.
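
The mode table and the pause rule reduce to a small decision function. The sketch below restates the documented behavior under assumed names (`replica_bounds`, `emits_scaled_object`); it is not the operator's implementation, and it does not model whether a paused workload keeps its KEDA object, which the docs do not state.

```python
def replica_bounds(mode: str, min_replicas: int, max_replicas: int,
                   paused: bool = False) -> tuple[int, int]:
    """Return the (low, high) replica range a workload may occupy."""
    if paused or mode == "disabled":
        # Paused and disabled workloads both scale to zero.
        return (0, 0)
    if mode == "fixed":
        # Fixed mode pins the Deployment to replicas.min.
        return (min_replicas, min_replicas)
    if mode == "autoscale":
        # Queue depth moves the Deployment between min and max.
        return (min_replicas, max_replicas)
    raise ValueError(f"unknown scaling mode: {mode!r}")


def emits_scaled_object(mode: str) -> bool:
    """Only autoscale mode emits a KEDA ScaledObject, per the table above."""
    return mode == "autoscale"


# A cold-start-heavy worker kept warm with replicas.min = 1.
print(replica_bounds("autoscale", 1, 4))
```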

## Document cache rules

`documentCache` captures the operator-owned document cache envelope:
capacity, replication factor, and node count. Helm still renders the
document-cache KEDA object directly in 0.1; `InfraRules` is the declared
policy shape the operator reports and validates against.

---

# Pipeline CRD

Source: https://hevlayer.com/docs/kubernetes/pipeline-crd

The `Pipeline` CRD declares worker-owned indexing work whose row count
can change between input and output: external ingestion, chunking, and
other fan-out stages. Use a [Function](/docs/kubernetes/function-crd)
when existing rows acquire a derived attribute without changing row
count.

Pipeline and Function resources share the same `spec.worker` and
`spec.scaling` envelopes. `InfraRules/default` owns placement and pool
limits; each workload chooses a pool.

```yaml
apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: product-images
  namespace: layer
spec:
  target:
    namespace: products
  sourceRef:
    kind: sqs
    queueUrl: https://sqs.us-east-1.amazonaws.com/123456789/product-images
  worker:
    image: ghcr.io/hev/product-image-worker:latest
    batchSize: 64
    timeoutSeconds: 60
  scaling:
    pool: cpu
    mode: autoscale
    replicas:
      min: 0
      max: 8
```

## Target

`spec.target.namespace` is the Turbopuffer namespace the pipeline writes.
The gateway pipeline API owns document state, chunks, and vector writes
for that target namespace.

## Source

`spec.sourceRef` is intentionally open JSON so operators can record the
external source that feeds the worker: SQS, Kafka, S3 events, a partner
API, or a one-off migration source. The operator passes it through as
declarative metadata; the worker image owns source-specific behavior.

## Worker

| Field | Purpose |
| --- | --- |
| `image` | Worker image. |
| `batchSize` | Work items per batch. |
| `timeoutSeconds` | Worker call timeout. |
| `podSpec` | Optional pod-level merge patch. |

The operator creates one Deployment per Pipeline.

## Scaling

```yaml
scaling:
  pool: cpu
  mode: autoscale
  replicas:
    min: 0
    max: 8
```

`spec.scaling.pool` must name a pool in `InfraRules/default`.
`mode: autoscale` creates a KEDA `ScaledObject` backed by pipeline queue
depth. `mode: fixed` pins the Deployment to `replicas.min`; `mode:
disabled` scales it to zero.

`spec.paused: true` also scales the worker to zero.

## Status

The operator reports managed object references and readiness conditions.
Queue counts and worker progress are served by the gateway pipeline
status API.

---

# Function CRD

Source: https://hevlayer.com/docs/kubernetes/function-crd

The `Function` CRD declares row-preserving compute over an
[Index](/docs/kubernetes/index-crd). The operator creates worker
resources; the gateway owns discovery, queueing, retries, leases, and
writeback through the UDF API.

```yaml
apiVersion: hevlayer.com/v1alpha1
kind: Function
metadata:
  name: tag-products
  namespace: layer
spec:
  targetNamespaces:
    - products
  inputs:
    - id
    - title
  output:
    attribute: tags
    kind: tags
    version: v1
  filter:
    - "Or"
    - - ["tags_v", "NotEq", "v1"]
      - ["tags_v", "Eq", null]
  worker:
    image: ghcr.io/hev/tag-products:latest
    dispatch: pull
    batchSize: 32
    timeoutSeconds: 30
  schedule:
    discoveryIntervalSeconds: 300
    leaseSeconds: 120
    maxInFlightBatches: 8
    maxConcurrentScans: 1
  retry:
    maxAttempts: 8
    initialBackoffSeconds: 5
    maxBackoffSeconds: 300
  triggers:
    - discovery
  scaling:
    pool: cpu
    mode: autoscale
    replicas:
      min: 0
      max: 6
```

## Selection

Use `targetNamespaces` for explicit namespaces. Use `indexSelector` when
labels on `Index` resources should choose the namespaces.

`filter` preserves arbitrary JSON, including array-form Turbopuffer
filters. The operator stores the shape as-is; the gateway evaluates it
during discovery.

## Worker

| Field | Purpose |
| --- | --- |
| `image` | Worker image. |
| `dispatch` | `pull` for SDK claim/poll workers, `push` for HTTP `/run` workers. |
| `port` | Push-dispatch service port. |
| `batchSize` | Rows per batch. |
| `timeoutSeconds` | Worker call timeout. |
| `podSpec` | Optional pod-level merge patch. |

Pull dispatch creates a Deployment. Push dispatch also creates a Service
and readiness probe.

## Scaling

Function scaling is inline under `spec.scaling`. The operator emits a
KEDA `ScaledObject` when `mode: autoscale`, using
`layer_udf_queue_depth` for the trigger.

The selected pool must exist in `InfraRules/default`. Replica maxima
above the pool's `maxReplicasPerWorkload` are rejected in status.

## Output

`output.kind: embedding` should set `output.dim` so consumers can
validate vector shape. Outputs are patched onto the target row through
the gateway.
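
A consumer-side shape check might look like the following sketch. The function name is hypothetical, and `dim` is assumed to be the value declared in `output.dim`.

```python
def check_embedding(vector: list[float], dim: int) -> None:
    """Raise if a vector's length differs from the declared output.dim."""
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")


# Passes silently when the shape matches the declared dimension.
check_embedding([0.0012, -0.043], dim=2)
```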

Deleting a Function garbage-collects operator-managed Kubernetes
resources. It does not delete already-written attributes.

---

# Introduction

Source: https://hevlayer.com/docs/api/introduction

import Upstream from "../../../components/docs/Upstream.astro";

Layer matches the Turbopuffer wire contract so existing clients keep
working when you point them at the gateway. Where a route has an upstream
equivalent, the site documents what Layer adds — not the upstream
behavior itself. Follow the **Upstream docs** link on each page for the
underlying request/response shape.

## Install

The Python SDK is generated from `apps/layer-gateway/openapi.yaml` and
ships the typed async client (`AsyncHevlayer`).

```sh
pip install hevlayer
```

Requires Python 3.11+. The SDK reads connection info from environment
variables:

| Variable | Purpose |
| --- | --- |
| `LAYER_GATEWAY_URL` | Base URL of the gateway. |
| `LAYER_GATEWAY_API_KEY` | API key sent on every request. |
| `TURBOPUFFER_API_KEY` | Optional direct fallback key for Turbopuffer-compatible SDK calls when the gateway is unreachable. |
| `TURBOPUFFER_API_URL` | Optional direct fallback base URL; defaults to `https://aws-us-east-1.turbopuffer.com`. |
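
For a pre-flight check outside the SDK, the same variables can be read with the standard library. This is a sketch, not SDK code: the `ConnectionEnv` type and `read_connection_env` helper are hypothetical, and treating the two gateway variables as mandatory is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default taken from the table above.
DEFAULT_TURBOPUFFER_API_URL = "https://aws-us-east-1.turbopuffer.com"


@dataclass(frozen=True)
class ConnectionEnv:
    gateway_url: str
    gateway_api_key: str
    turbopuffer_api_key: Optional[str]
    turbopuffer_api_url: str


def read_connection_env(env: Mapping[str, str] = os.environ) -> ConnectionEnv:
    """Collect connection settings; raises KeyError if a gateway value is unset."""
    return ConnectionEnv(
        gateway_url=env["LAYER_GATEWAY_URL"],
        gateway_api_key=env["LAYER_GATEWAY_API_KEY"],
        # The fallback key is optional, so absence is not an error.
        turbopuffer_api_key=env.get("TURBOPUFFER_API_KEY"),
        turbopuffer_api_url=env.get(
            "TURBOPUFFER_API_URL", DEFAULT_TURBOPUFFER_API_URL
        ),
    )
```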

Languages beyond Python are generated on demand through the SDK harness;
reach out if you need one that isn't shipped yet.

## Client fall-through

The Python SDK can fall through to Turbopuffer directly when the gateway is
unreachable. The fallback is limited to calls that can be satisfied without
Layer state: simple vector queries and raw Turbopuffer-compatible methods
such as `write_namespace`, `query_turbopuffer_namespace`, and namespace
schema/listing calls. On fallback, the SDK emits a client log warning and sets
`LayerPerf.fallback` to `turbopuffer_direct` when `with_perf=True`.

Fetches, warm jobs, pipelines, UDFs, `nearest_to_id` queries, and other
Layer-only workflows still fail fast because they depend on gateway-owned
cache, queue, history, or consistency state. Set
`fallback_to_turbopuffer=False` on `AsyncHevlayer` to disable direct
fallback.

## Enhancements to upstream routes

Each of the routes below is wire-compatible with Turbopuffer. The body
of each section describes only what Layer overlays on top.

### Write — `POST /v2/namespaces/{ns}` and `PATCH /v2/namespaces/{ns}`

<Upstream href="https://turbopuffer.com/docs/write">
Upstream contract for upsert, delete, and `patch_rows`.
</Upstream>

- Best-effort NVMe cache mirror before the upstream write.
- Server-stamped `_hevlayer_upserted_at` on every upsert and patch, which powers
  the consistency watermark on the query path.
- `_hevlayer_*` attributes are reserved — writes to them are rejected.

Page: [Write](/docs/api/write).

### Query — `POST /v2/namespaces/{ns}/query`

<Upstream href="https://turbopuffer.com/docs/query">
Upstream contract for vector and FTS queries — request shape, ranking,
filters, attribute selection.
</Upstream>

- Strong-consistent reads via an injected `_hevlayer_upserted_at <= watermark`
  predicate while the upstream index is `updating`.
- One-shot 429 retry with the watermark filter forced on, for queries
  that race a write storm.
- `stable_as_of` echoed on every response so callers can correlate
  freshness across reads.

Page: [Query](/docs/api/query).

### Metadata — `GET /v2/namespaces/{ns}/metadata`

<Upstream href="https://turbopuffer.com/docs/metadata">
Upstream contract for namespace metadata — schema, row count, index
status, timestamps.
</Upstream>

- Proxied upstream verbatim, then enriched with a `layer` block
  containing `stable_as_of` and `is_stable`.

Page: [Namespace metadata](/docs/api/namespace-metadata).

### Cache warm hint — `GET /v1/namespaces/{ns}/hint_cache_warm`

<Upstream href="https://turbopuffer.com/docs">
Upstream contract for the cache warm hint.
</Upstream>

- Forwards the hint upstream, then runs Layer-side warm steps:
  a warm job to backfill the NVMe cache from origin, plus a mirror of
  the latest S3 snapshot body into NVMe.
- Each step is independently toggleable per request.

Page: [Warm cache](/docs/api/warm-cache).

## Cross-cutting conventions

These apply to every endpoint Layer proxies, whether the route is
upstream-compatible or Layer-only.

- **Server-stamped `_hevlayer_upserted_at`.** Every upsert and patch is stamped
  with a server-side epoch-ms watermark. Caller-supplied values are
  silently overwritten.
- **`_hevlayer_*` reserved.** Document attributes prefixed with
  `_hevlayer_` are reserved for the proxy layer. Writing to them is a
  validation error; reading them is fine when explicitly requested.
- **Hard vs soft failures.** Turbopuffer write/query failures are hard
  failures and surface as 5xx. NVMe cache failures are soft and never
  block the response.
- **`x-layer-cache` header.** Fetch responses include `hit`, `miss`, or
  `miss-on-error` so callers can distinguish a cold cache from an
  outage.
- **Consistency hints.** Reads that go through the watermark path
  include `stable_as_of`; queries omit it only on a cold-start gateway
  that has not yet observed a stable poll.
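
As one way to act on the `x-layer-cache` convention, a caller could map the three documented values to outcomes. The function name and outcome labels below are illustrative only.

```python
from typing import Mapping

# Documented header values mapped to hypothetical outcome labels.
CACHE_OUTCOMES = {
    "hit": "served-from-cache",
    "miss": "cold-cache",
    "miss-on-error": "cache-outage",
}


def cache_outcome(headers: Mapping[str, str]) -> str:
    """Classify a fetch response by its x-layer-cache header."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-layer-cache", "").strip().lower()
    return CACHE_OUTCOMES.get(value, "unknown")


print(cache_outcome({"X-Layer-Cache": "miss-on-error"}))
```

Because cache failures are soft, a `miss-on-error` response still carries data; the label is useful for alerting rather than for retry decisions.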

## Compatibility posture

Layer aims to be a drop-in for existing Turbopuffer clients. Routes that
the upstream does not implement are namespaced under `/v2/` and do not
shadow upstream behavior. If a Turbopuffer client sends a request to a
route Layer doesn't proxy, the gateway returns 404 — it does not
silently re-route to an upstream that might handle it differently.

---

# Write & Stage

Source: https://hevlayer.com/docs/api/write

import Upstream from "../../../components/docs/Upstream.astro";

<Upstream href="https://turbopuffer.com/docs/write">
The write path is wire-compatible with the upstream `POST /v2/namespaces/{ns}` endpoint. The shape below shows what Layer adds — see the upstream docs for the full request schema.
</Upstream>

## Upsert and delete

```http
POST /v2/namespaces/products
Content-Type: application/json

{
  "upserts": [
    {
      "id": "asin-B08N5WRWNW",
      "vector": [0.0012, -0.043],
      "attributes": {"title": "Wireless headphones", "category": "Electronics"}
    }
  ],
  "deletes": ["asin-old-001"]
}
```

Status semantics:

- 200 OK once the upstream Turbopuffer write succeeds.
- 422 when both `upserts` and `deletes` are empty.
- 502 when the upstream write/delete fails.
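
A client can catch the 422 case before sending and label the documented codes afterwards. The helper names are hypothetical, and treating a missing key the same as an empty list is an assumption.

```python
def is_empty_write(body: dict) -> bool:
    """True when both `upserts` and `deletes` are missing or empty."""
    return not body.get("upserts") and not body.get("deletes")


# Status codes and meanings from the list above.
WRITE_STATUS = {
    200: "upstream write succeeded",
    422: "both upserts and deletes were empty",
    502: "upstream write/delete failed",
}


def describe_write_status(status: int) -> str:
    """Translate a write response code into the documented meaning."""
    return WRITE_STATUS.get(status, f"unexpected status {status}")


body = {"upserts": [], "deletes": ["asin-old-001"]}
print(is_empty_write(body), describe_write_status(200))
```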

NVMe cache writes happen *before* the upstream call as a best-effort
side effect. They never block the response — they're how Layer keeps
reads fast through the cache.

Every upsert is server-stamped with a hidden `_hevlayer_upserted_at` attribute
(epoch milliseconds). Any caller-supplied value is overwritten — this
stamp powers the consistency watermark on the [query](/docs/api/query)
path.

## Patch

```http
PATCH /v2/namespaces/products
Content-Type: application/json

{
  "patches": [
    {"id": "asin-B08N5WRWNW", "attributes": {"category": "Audio"}}
  ]
}
```

Patch preserves unspecified attributes and maps to Turbopuffer
`patch_rows`. Vectors cannot be patched — update a vector by reading the
row and upserting the full document.

`_hevlayer_upserted_at` is bumped on every patch, so reads through the watermark
filter see the patched row only after it's indexed.
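
Since vectors cannot be patched, a vector update is a read followed by a full upsert. The sketch below builds that upsert body from an existing row. It assumes the row has the same `id`/`attributes` shape as an upsert entry, and it drops `_hevlayer_*` attributes because they are reserved; the function name is hypothetical.

```python
RESERVED_PREFIX = "_hevlayer_"


def upsert_with_new_vector(row: dict, vector: list[float]) -> dict:
    """Build a full-document upsert body that swaps in a new vector."""
    attributes = {
        key: value
        for key, value in row.get("attributes", {}).items()
        # Reserved attributes are server-stamped and must not be written back.
        if not key.startswith(RESERVED_PREFIX)
    }
    return {
        "upserts": [
            {"id": row["id"], "vector": vector, "attributes": attributes}
        ]
    }


row = {
    "id": "asin-B08N5WRWNW",
    "attributes": {"category": "Audio", "_hevlayer_upserted_at": 1715600400000},
}
body = upsert_with_new_vector(row, [0.002, -0.051])
print(body["upserts"][0]["attributes"])
```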

## Pipeline stage

When a document is part of a pipeline, the writer doesn't talk to the
namespace directly. The CPU worker hands chunks off to the pipeline, the
GPU worker writes vectors back, and the gateway is the one calling the
namespace upsert.

```http
PUT /v2/pipelines/product-images/documents/asin-B08N5WRWNW
Content-Type: application/json

{
  "chunks": [
    {"id": "asin-B08N5WRWNW-0", "text": "Wireless noise-cancelling headphones"},
    {"id": "asin-B08N5WRWNW-1", "text": "40-hour battery life", "metadata": {"page": 2}}
  ]
}
```

Staging stores chunks in the NVMe cache and marks the document `pending`.
Re-staging the same document ID replaces the chunks and resets state to
`pending`. The full pipeline API is documented under
[Pipelines](/docs/pipelines).
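
The staging call can be prepared with the standard library alone. This sketch only builds the request and does not send it. The gateway URL and API key are placeholders, the helper name is hypothetical, and the bearer header follows the `gateway.apiKey` convention from the Helm values.

```python
import json
import urllib.request
from urllib.parse import quote


def build_stage_request(gateway_url: str, api_key: str, pipeline: str,
                        document_id: str, chunks: list[dict]) -> urllib.request.Request:
    """Prepare PUT /v2/pipelines/{pipeline}/documents/{document_id}."""
    url = (
        f"{gateway_url.rstrip('/')}/v2/pipelines/{quote(pipeline, safe='')}"
        f"/documents/{quote(document_id, safe='')}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"chunks": chunks}).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_stage_request(
    "http://gateway.example",  # placeholder gateway base URL
    "test-key",                # placeholder API key
    "product-images",
    "asin-B08N5WRWNW",
    [{"id": "asin-B08N5WRWNW-0", "text": "Wireless noise-cancelling headphones"}],
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; re-sending with the same document ID replaces the chunks, as described above.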

## Side effects

| Side effect | Behavior |
| --- | --- |
| NVMe cache mirror | Best-effort, written before the upstream call. If the upstream write then fails, the cache entry is not rolled back, so the gateway can briefly cache a doc that didn't reach the upstream index. Re-sending the upsert resolves it. |
| Snapshot watcher | Re-evaluates freshness on the next poll. Stable namespaces materialize a new snapshot if the histogram shape changed (see [Snapshots](/docs/api/snapshots)). |

---

# Query & Fetch

Source: https://hevlayer.com/docs/api/query

<Upstream href="https://turbopuffer.com/docs/query">
Query is wire-compatible with the upstream `POST /v2/namespaces/{ns}/query` endpoint. The request schema (vector, filters, ranking, attribute selection) is documented upstream. The shape below is what Layer adds on top.
</Upstream>

## Query request

```http
POST /v2/namespaces/products/query
Content-Type: application/json

{
  "vector": [0.0012, -0.043],
  "top_k": 10,
  "filters": ["category", "Eq", "Electronics"],
  "include_attributes": ["title", "category"]
}
```

```json
{
  "results": [
    {"id": "asin-B08N5WRWNW", "dist": 0.42, "attributes": {"title": "..."}}
  ],
  "stable_as_of": 1715600400000
}
```

## Strong-consistent reads

Turbopuffer indexes upserts asynchronously, so a naive query right after
an upsert can return partial results or, under streaming-write pressure,
fail outright with a 429. Layer sidesteps both:

1. Queries run at `consistency=eventual` upstream, so they never block on
   indexing.
2. A background loop polls each registered namespace's `index.status` and
   records the latest status plus, when stable, a watermark equal to
   `poll_start - safety_margin`.
3. Per-query decision:
   - `Updating` → inject a hidden `_hevlayer_upserted_at <= watermark` predicate
     so the read never sees partially-indexed rows.
   - `Stable` or `Unknown` → run without the predicate. The upstream
     index is caught up (or no contrary evidence exists).
4. On a 429 to an unfiltered query, Layer retries once with the watermark
   filter forced on.

Responses report `stable_as_of` (epoch ms), the most recent watermark the
watcher has recorded. The field is omitted on a cold-start gateway that
has not yet observed a stable poll.
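
The steps above can be sketched as a pair of pure functions. This is an illustrative model rather than the gateway's code: the function names are hypothetical, and skipping the filter when no watermark exists yet is an assumption the text does not spell out.

```python
from typing import Optional

SAFETY_MARGIN_MS = 500  # mirrors the CONSISTENCY_SAFETY_MARGIN_MS default


def watermark_for_poll(poll_start_ms: int,
                       safety_margin_ms: int = SAFETY_MARGIN_MS) -> int:
    """Watermark recorded by a poll that observed a stable index."""
    return poll_start_ms - safety_margin_ms


def needs_watermark_filter(status: str,
                           watermark_ms: Optional[int],
                           retry_after_429: bool = False) -> bool:
    """Decide whether a query gets the hidden watermark predicate."""
    if watermark_ms is None:
        # Assumption: with no recorded watermark there is nothing to filter on.
        return False
    if retry_after_429:
        # Step 4: the single retry after a 429 forces the filter on.
        return True
    # Step 3: only `Updating` injects the predicate; `Stable` and
    # `Unknown` run unfiltered.
    return status == "Updating"
```

Keeping the decision separate from the watermark arithmetic mirrors the split between the background watcher and the per-query path.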

## Filter shape

```
["category", "Eq", "Electronics"]                # leaf
["And", [["category", "Eq", "Electronics"],
         ["price", "Lte", 200]]]                 # conjunction
["Or",  [...]]                                   # disjunction
```

Filter shape follows Turbopuffer array syntax. Layer combines the
caller's filter with the watermark predicate using a 2-element `And`
automatically — callers never see `_hevlayer_upserted_at` in their request or
response.
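
A minimal sketch of that combination, assuming the predicate is expressed with the `Lte` operator shown above. The helper name and the order of the two `And` operands are illustrative assumptions.

```python
from typing import Optional

STAMP = "_hevlayer_upserted_at"


def with_watermark(user_filter: Optional[list], watermark_ms: int) -> list:
    """Wrap the caller's filter and the watermark predicate in a 2-element And."""
    predicate = [STAMP, "Lte", watermark_ms]
    if user_filter is None:
        # No caller filter: the predicate stands alone.
        return predicate
    return ["And", [user_filter, predicate]]
```

Because the caller's filter is nested untouched, an existing `And` or `Or` tree keeps its own structure inside the wrapper.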

## Tunables

| Variable | Default | Purpose |
| --- | --- | --- |
| `CONSISTENCY_POLL_INTERVAL_MS` | 1000 | How often the watcher polls each namespace. |
| `CONSISTENCY_SAFETY_MARGIN_MS` | 500 | Cushion between poll time and watermark to cover in-flight upserts. |

## Explain query

```http
POST /v2/namespaces/products/explain_query
```

`explain_query` is proxied to Turbopuffer verbatim — Layer adds nothing
and applies no watermark filter. Use it to inspect upstream query
planning; see the [upstream docs](https://turbopuffer.com/docs) for the
request and response shape.

## Fetch

Fetch is a Layer-only surface — there is no upstream equivalent. The NVMe
cache is checked first; on miss or error the gateway falls through to
Turbopuffer and backfills the cache best-effort.

### Single fetch

```http
GET /v2/namespaces/products/documents/asin-B08N5WRWNW?include_attributes=title,category
```

| Outcome | Status | Header |
| --- | --- | --- |
| Cached hit | 200 | `x-layer-cache: hit` |
| Cache miss, upstream hit, cache backfilled | 200 | `x-layer-cache: miss` |
| Cache unavailable, upstream hit | 200 | `x-layer-cache: miss-on-error` |
| Missing from both layers | 404 | — |

### Batch fetch

```http
POST /v2/namespaces/products/documents
Content-Type: application/json

{
  "ids": ["asin-1", "asin-2", "asin-3"],
  "include_attributes": ["title"]
}
```

```json
{
  "documents": [
    {"id": "asin-1", "attributes": {"title": "..."}},
    {"id": "asin-3", "attributes": {"title": "..."}}
  ],
  "missing": ["asin-2"]
}
```

Batch fetch returns found documents and missing ids inline instead of a
partial 404. `documents` preserves request order; ids the gateway could
not find anywhere land in `missing`.

### Behavior matrix

| Cache state | Single fetch | Batch fetch |
| --- | --- | --- |
| Hit | cache | cache |
| Miss, upstream present | upstream + backfill | upstream + backfill |
| Miss, upstream absent | 404 | inline `missing` |
| Cache unavailable | upstream, `miss-on-error` | upstream, `miss-on-error` |
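
The matrix can be modeled with two dictionaries standing in for the NVMe cache and Turbopuffer. This is a sketch of the documented behavior only: `fetch_one` and `fetch_many` are hypothetical names, and passing `cache=None` is just this model's way of representing an unavailable cache.

```python
from typing import Optional, Tuple


def fetch_one(doc_id: str, cache: Optional[dict],
              upstream: dict) -> Tuple[int, Optional[str], Optional[dict]]:
    """Return (status, x-layer-cache value, document) for one id."""
    if cache is not None and doc_id in cache:
        return 200, "hit", cache[doc_id]
    doc = upstream.get(doc_id)
    if doc is None:
        return 404, None, None  # missing from both layers
    if cache is None:
        return 200, "miss-on-error", doc  # cache unavailable, upstream hit
    cache[doc_id] = doc  # best-effort backfill
    return 200, "miss", doc


def fetch_many(ids: list, cache: Optional[dict], upstream: dict) -> dict:
    """Batch variant: found documents in request order, misses inline."""
    documents, missing = [], []
    for doc_id in ids:
        status, _, doc = fetch_one(doc_id, cache, upstream)
        if status == 200:
            documents.append(doc)
        else:
            missing.append(doc_id)
    return {"documents": documents, "missing": missing}
```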

---

# Namespace metadata

Source: https://hevlayer.com/docs/api/namespace-metadata

<Upstream href="https://turbopuffer.com/docs/metadata">
The metadata payload is proxied verbatim from the upstream `/v2/namespaces/{ns}/metadata` endpoint. Schema, row counts, index status, and timestamps follow the upstream contract. Layer adds a single sub-object on top.
</Upstream>

## Request

```http
GET /v2/namespaces/products/metadata
```

```jsonc
{
  // Proxied from Turbopuffer verbatim
  "schema": { },
  "approx_row_count": 12500,
  "approx_logical_bytes": 48800000,
  "created_at": "2026-03-15T10:30:45Z",
  "updated_at": "2026-05-12T18:49:00Z",
  "last_write_at": "2026-05-12T18:48:30Z",
  "index": { "status": "up-to-date" },

  // Layer enhancement
  "layer": {
    "stable_as_of": 1715600400000,
    "is_stable": true
  }
}
```

## The `layer` block

| Field | Meaning |
| --- | --- |
| `stable_as_of` | Epoch-ms watermark from the most recent stable poll. Null on cold start before the watcher has observed a stable namespace. |
| `is_stable` | Whether the most recent poll observed `index.status == "up-to-date"`. False on cold start, true once the watcher catches up. |

`is_stable` is the *current* signal — it drives the per-query filter-skip
decision on the query path. `stable_as_of` is the *historical* watermark
— the cut a filtered query would apply.

For snapshot history derived from these freshness signals, see
[Snapshots](/docs/api/snapshots).

## List namespaces

`GET /v2/namespaces` is a Layer-only augmented listing. It pages the
upstream namespace list and enriches each row with the same freshness and
cache signals surfaced above. It is the surface the dashboard's inventory
view reads.

```http
GET /v2/namespaces?prefix=prod&page_size=100
```

```jsonc
{
  "namespaces": [
    {
      "name": "products",
      "row_count": 12500,
      "size_bytes": 48800000,
      "stable_as_of_ms": 1715600400000,
      "is_stable": true,
      "cache_state": {"state": "warm", "warm_inflight": false},
      "last_write_ms": 1715600399000,
      "shadow": false,
      "labels": {}
    }
  ],
  "next_cursor": "..."
}
```

| Query param | Purpose |
| --- | --- |
| `prefix` | Restrict to namespaces whose name starts with this string. |
| `cursor` | Pagination cursor from a prior `next_cursor`. |
| `page_size` | Page size; the upstream list page is capped at 1000. |

A per-row metadata failure degrades to a row with `metadata_error` set
rather than dropping the namespace, so the list stays complete even when a
single namespace's metadata call fails. Responses are served from a
short-TTL cache (`NAMESPACE_LIST_CACHE_TTL_MS`, default `10000`) so
dashboard polling does not fan out a metadata call per namespace per
refresh.
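
A client can walk the listing with a small generator. The sketch below takes a hypothetical `fetch_page` callable that wraps `GET /v2/namespaces?cursor=...` and returns the decoded JSON body. It assumes `next_cursor` is absent or empty on the last page, which this page does not state explicitly.

```python
from typing import Callable, Iterator, Optional


def iter_namespaces(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every namespace row, following next_cursor until it runs out."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("namespaces", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break


def split_by_health(rows: list) -> tuple:
    """Separate rows whose metadata call failed from healthy rows."""
    healthy = [row for row in rows if "metadata_error" not in row]
    degraded = [row for row in rows if "metadata_error" in row]
    return healthy, degraded
```

Injecting the page fetcher keeps the loop independent of any particular HTTP client.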

---

# Scan

Source: https://hevlayer.com/docs/api/scans

Scans iterate a namespace by filter. `mode: ids` creates an asynchronous
job and returns IDs through a results route. `mode: count` returns one
number synchronously.

## Routes

| Route | Behavior |
| --- | --- |
| `POST /v2/namespaces/{ns}/scans` | Create an ID scan job or return a count. |
| `GET /v2/namespaces/{ns}/scans` | List ID scan jobs for the namespace. |
| `GET /v2/namespaces/{ns}/scans/{id}` | Read one ID scan job. |
| `GET /v2/namespaces/{ns}/scans/{id}/results` | Read completed scan IDs. |
| `DELETE /v2/namespaces/{ns}/scans/{id}` | Drop the in-memory scan job. |

## ID Mode

```http
POST /v2/namespaces/products/scans
Content-Type: application/json

{
  "source": "auto",
  "mode": "ids",
  "filters": ["category", "Eq", "Electronics"],
  "page_size": 1000
}
```

`mode` defaults to `ids`. Valid ID-mode sources are `auto`, `cache`, and
`origin`.

The create response is `202 Accepted`:

```json
{
  "id": "scan-uuid",
  "namespace": "products",
  "source": "auto",
  "effective_source": "origin",
  "status": "running",
  "progress": 0,
  "documents_scanned": 0,
  "created_at": "2026-05-26T10:00:00Z"
}
```

Read IDs after `status` is `completed`:

```http
GET /v2/namespaces/products/scans/scan-uuid/results?limit=1000&offset=0
```

```json
{
  "ids": ["doc-1", "doc-2"],
  "total": 2
}
```
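
The poll-then-page flow might look like the sketch below. `get_job` and `get_results` are hypothetical callables wrapping the two `GET` routes. Treating any status other than `running` or `completed` as a failure, and reading `total` as the full ID count of the scan, are assumptions.

```python
import time
from typing import Callable, List


def collect_scan_ids(get_job: Callable[[], dict],
                     get_results: Callable[[int, int], dict],
                     page_size: int = 1000,
                     poll_seconds: float = 1.0,
                     max_polls: int = 60) -> List[str]:
    """Wait for the scan job to complete, then page through its IDs."""
    for _ in range(max_polls):
        status = get_job()["status"]
        if status == "completed":
            break
        if status != "running":
            raise RuntimeError(f"scan ended with status {status!r}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("scan did not complete in time")

    ids: List[str] = []
    offset = 0
    while True:
        page = get_results(page_size, offset)  # maps to ?limit=...&offset=...
        ids.extend(page["ids"])
        offset += len(page["ids"])
        # Stop on an empty page or once every reported ID has been read.
        if not page["ids"] or offset >= page["total"]:
            return ids
```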

## Count Mode

```http
POST /v2/namespaces/products/scans
Content-Type: application/json

{
  "mode": "count",
  "source": "auto",
  "filters": ["category", "Eq", "Electronics"],
  "timeout_seconds": 30
}
```

```json
{
  "count": 4210,
  "served_by": "snapshot",
  "snapshot_sha": "3f9e8b21",
  "watermark_ms": 1747300000123,
  "elapsed_ms": 3
}
```

Count-mode sources are `auto`, `snapshot`, `cache`, and `origin`.
Snapshot reads are eligible only for a single leaf `Eq` or `In` filter
on a field present in the latest snapshot `fields[]`. `And`, `Or`,
`Not`, range operators, fields absent from the snapshot, and skipped
fields fall through under `auto` and fail with `412 precondition_failed`
under `source: snapshot`.
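
The eligibility rule reduces to a shape check on the filter. The sketch below is an illustration under stated assumptions: the helper names and the `PreconditionFailed` exception are hypothetical, and `"live"` is a placeholder for whatever cache or origin path `auto` falls through to.

```python
class PreconditionFailed(Exception):
    """Stands in for the 412 precondition_failed response."""


def snapshot_eligible(filters, snapshot_fields) -> bool:
    """True only for a single leaf Eq/In filter on a materialized field."""
    if not (isinstance(filters, list) and len(filters) == 3):
        return False  # And / Or / Not are 2-element arrays, never leaves
    field, op, _value = filters
    return (isinstance(field, str)
            and op in ("Eq", "In")
            and field in snapshot_fields)


def resolve_count_source(source: str, filters, snapshot_fields) -> str:
    """Pick the path a count request takes for a requested source."""
    eligible = snapshot_eligible(filters, snapshot_fields)
    if source == "snapshot":
        if not eligible:
            raise PreconditionFailed("filter not covered by latest snapshot")
        return "snapshot"
    if source == "auto":
        return "snapshot" if eligible else "live"
    return source  # explicit cache / origin requests pass through
```

Skipped fields never appear in `fields[]`, so passing only the materialized field names covers that case as well.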

Live count responses include:

```json
{
  "count": 4210,
  "served_by": "origin",
  "bounded": false,
  "timed_out": false,
  "shards_saturated": 0,
  "shards_total": 1,
  "elapsed_ms": 42
}
```

## Auto-Mode Policy

Auto ties cache freshness to the same consistency watermark used by
strong-consistent queries. The gateway tracks per-namespace
`cache_warmed_through`, the watermark observed at the end of the last
successful origin warm.

| Cache state | Watermark state | Action |
| --- | --- | --- |
| Empty | any | Run origin and stamp `cache_warmed_through`. |
| Populated, `cache_warmed_through >= watermark` | observed | Serve cache. |
| Populated, `cache_warmed_through < watermark` | observed | Serve cache and start a background origin warm. |
| Populated, no `cache_warmed_through` yet | observed | Serve cache and start a background origin warm. |
| Populated | not yet observed | Serve cache. |

When the cache is used, `_hevlayer_upserted_at <= cache_warmed_through` is
added before the user filter so the scan reads a stable, warmed view.
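
The policy table translates directly into a decision function. This is a sketch with hypothetical names; `None` models a watermark that has not been observed yet or a `cache_warmed_through` that has not been stamped.

```python
from typing import Optional, Tuple


def auto_scan_action(cache_populated: bool,
                     cache_warmed_through: Optional[int],
                     watermark: Optional[int]) -> Tuple[str, bool]:
    """Return (source to serve from, whether to start a background warm)."""
    if not cache_populated:
        # Empty cache: run origin, which also stamps cache_warmed_through.
        return "origin", False
    if watermark is None:
        # Populated cache, watermark not yet observed: serve cache as-is.
        return "cache", False
    if cache_warmed_through is None or cache_warmed_through < watermark:
        # Cache is behind, or has never been stamped: serve it and refresh.
        return "cache", True
    return "cache", False
```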

---

# Result Count

Source: https://hevlayer.com/docs/api/result-count

Result count answers "how many rows match this ranked query?" It is
separate from [scan count](/docs/api/scans), which counts rows matching a
filter.

```http
POST /v2/namespaces/products/result-count
Content-Type: application/json

{
  "query": {"field": "title", "fts": "wireless headphones"},
  "filters": ["category", "Eq", "Electronics"],
  "mode": "bounded",
  "timeout_seconds": 30
}
```

```json
{
  "count": 4210,
  "bounded": false,
  "timed_out": false,
  "shards_saturated": 0,
  "shards_total": 1,
  "elapsed_ms": 42
}
```

| Shape | Required fields | Notes |
| --- | --- | --- |
| FTS | `field`, `fts` | BM25 query against a BM25-indexed field. |
| Vector | `vector`, `max_distance` | `max_distance` is required; without an upper bound every row matches. `field` defaults to `vector`. |

| Mode | Behavior |
| --- | --- |
| `bounded` | One scatter/gather. Saturated shards contribute their `top_k` as a lower bound. |
| `exhaustive` | Recurses through saturated shards until every page is short or the request deadline expires. |

Every call carries a deadline, default 30s and server-side max 300s. On
timeout the partial count is returned with `bounded: true` and
`timed_out: true`.
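
On the client side, the flags decide how the count should be worded. The helper below is a hypothetical sketch; reading `bounded: false` as an exact count is an interpretation of the table above, not a documented guarantee.

```python
def describe_count(resp: dict) -> str:
    """Turn a result-count response into a human-readable statement."""
    count = resp["count"]
    if resp.get("timed_out"):
        # Deadline expired: the partial count is only a lower bound.
        return f"at least {count} (timed out)"
    if resp.get("bounded"):
        saturated = resp.get("shards_saturated", 0)
        total = resp.get("shards_total", 0)
        return f"at least {count} ({saturated} of {total} shards saturated)"
    return f"exactly {count}"
```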

---

# Warm cache

Source: https://hevlayer.com/docs/api/warm-cache

Layer exposes two warm surfaces. `hint_cache_warm` is the
Turbopuffer-compatible hint; `warm` is the Layer-only shortcut that
creates a gateway warm job.

<Upstream href="https://turbopuffer.com/docs">
`GET /v1/namespaces/{ns}/hint_cache_warm` matches Turbopuffer's warm-cache hint. The upstream call advises the index to pre-load. Layer additionally runs cache-warm steps on the gateway side.
</Upstream>

## Hint-cache warm

```http
GET /v1/namespaces/products/hint_cache_warm
```

Layer-side steps (all default-on):

| Step | What it does |
| --- | --- |
| `turbopuffer=true` | Forwards the warm hint upstream. |
| `documents=true` | Starts an origin warm job to backfill the NVMe cache. |
| `snapshots=true` | Mirrors the latest S3 snapshot body into NVMe. |

Disable steps independently:

```http
GET /v1/namespaces/products/hint_cache_warm?turbopuffer=false&documents=false&snapshots=true
```

The response reports per-step status. If `documents` is enabled, the
response includes a warm job; poll it through
`GET /v2/namespaces/{ns}/warm-jobs/{id}`.
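
Since every step defaults to on, a client only needs to send the steps it wants to disable. The URL builder below is a hypothetical helper, and `http://gateway:8080` is a placeholder base URL.

```python
from urllib.parse import quote, urlencode


def hint_cache_warm_url(base_url: str, namespace: str,
                        turbopuffer: bool = True,
                        documents: bool = True,
                        snapshots: bool = True) -> str:
    """Build the hint_cache_warm URL, listing only the disabled steps."""
    steps = {"turbopuffer": turbopuffer,
             "documents": documents,
             "snapshots": snapshots}
    disabled = {name: "false" for name, enabled in steps.items() if not enabled}
    url = (f"{base_url.rstrip('/')}/v1/namespaces/"
           f"{quote(namespace, safe='')}/hint_cache_warm")
    return f"{url}?{urlencode(disabled)}" if disabled else url


# Snapshot-only warm, equivalent to the request shown above.
print(hint_cache_warm_url("http://gateway:8080", "products",
                          turbopuffer=False, documents=False))
```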

## Layer warm

`POST /v2/namespaces/{ns}/warm` creates an asynchronous job that pages
through Turbopuffer, backfills Aerospike, and refreshes
`cache_warmed_through`. Use it when bootstrapping a namespace whose data
was written outside the gateway.

```http
POST /v2/namespaces/products/warm?page_size=1000
```

The response is `202 Accepted` with the warm job:

```json
{
  "id": "warm-job-uuid",
  "namespace": "products",
  "status": "running",
  "progress": 0,
  "documents_scanned": 0,
  "created_at": "2026-05-26T10:00:00Z"
}
```

Poll it through:

```http
GET /v2/namespaces/products/warm-jobs/warm-job-uuid
```

## Cache-cold behavior

<Callout label="cache cold">
Warm jobs, cache scans, cache snapshot jobs, and pipeline chunk reads
return 503 `cache_cold` when the NVMe cache is unavailable. Single and
batch fetch fall through to Turbopuffer with
`x-layer-cache: miss-on-error` instead.
</Callout>

The split is deliberate. Fetch is correctness-first: a cache outage must
not turn into a missing document. Warm is throughput-first: warming on a
cold cache would be wasted work, so the gateway surfaces the cold state
to the caller rather than silently no-op-ing.

---

# Snapshot History

Source: https://hevlayer.com/docs/api/snapshots

Snapshots are materialized facet histograms for a namespace. Each body
carries facet values in `values[].v` and their counts in `values[].n`.
Bodies are stored durably in S3, and the latest body is mirrored into
Aerospike.

Use `POST /v2/namespaces/{ns}/snapshots` to materialize a field now. Use
the history and body routes to read the durable chronology written by the
consistency watcher.

## Configure watched fields

The consistency watcher only materializes snapshots for facet fields it
has been told to watch. Configure them with the `LAYER_FACET_FIELDS`
environment variable — a JSON object mapping each namespace to its facet
fields (Helm chart: `gateway.facetFields`):

```sh
export LAYER_FACET_FIELDS='{
  "products": ["category", "brand"],
  "reviews": ["sentiment", "language"]
}'
```

The default is empty, which disables the snapshot writer: with no watched
fields, `source: stored` and eligible `source: auto` jobs have nothing to
read, and the history and activity feeds stay empty. Namespaces discovered
through `GET /v2/namespaces` are registered with the watcher automatically,
but only the fields listed here are materialized. `LAYER_SNAPSHOT_MIN_INTERVAL_MS`
(default `300000`) sets the floor between writes for a namespace.
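
Because the value is hand-written JSON, it can be worth validating before a deploy. The checker below is a hypothetical pre-flight sketch, not the gateway's parser; it only enforces the shape described above (an object mapping namespace names to lists of field names).

```python
import json


def parse_facet_fields(raw: str) -> dict:
    """Validate a LAYER_FACET_FIELDS value and return the parsed mapping."""
    if not raw.strip():
        return {}  # empty default: the snapshot writer stays disabled
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object keyed by namespace")
    for namespace, fields in parsed.items():
        is_string_list = (isinstance(fields, list)
                          and all(isinstance(f, str) for f in fields))
        if not is_string_list:
            raise ValueError(f"{namespace!r} must map to a list of field names")
    return parsed
```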

## Routes

| Route | Behavior |
| --- | --- |
| `POST /v2/namespaces/{ns}/snapshots` | Create an on-demand snapshot job for one field. |
| `GET /v2/namespaces/{ns}/snapshot-jobs` | List in-memory snapshot jobs. |
| `GET /v2/namespaces/{ns}/snapshot-jobs/{id}` | Read one snapshot job. |
| `GET /v2/namespaces/{ns}/history` | Newest-first durable snapshot history. |
| `GET /v2/namespaces/{ns}/snapshots/{sha}` | Full snapshot body by full SHA or 7-char prefix. |
| `GET /v2/activity/snapshots` | Cross-namespace snapshot-write activity stream. |

## Create a snapshot job

```http
POST /v2/namespaces/products/snapshots
Content-Type: application/json

{
  "field": "category",
  "source": "auto",
  "filters": ["brand", "Eq", "Acme"],
  "page_size": 1000
}
```

Valid sources are `auto`, `stored`, `cache`, and `origin`.

| Source | Reads from | Notes |
| --- | --- | --- |
| `auto` | Stored snapshot when possible, otherwise cache/origin policy | Default. Stored snapshots only support unfiltered configured fields. |
| `stored` | Latest S3 snapshot body, with Aerospike mirror as a cache | Fastest path for configured facet fields. |
| `cache` | Aerospike document cache | Supports filters the cache can evaluate. |
| `origin` | Turbopuffer paginated scan | Authoritative. Persists the computed snapshot body to S3. |

The response is `202 Accepted`:

```json
{
  "id": "snapshot-job-uuid",
  "namespace": "products",
  "field": "category",
  "source": "auto",
  "status": "running",
  "progress": 0,
  "documents_scanned": 0,
  "created_at": "2026-05-26T10:00:00Z"
}
```

Poll the job:

```http
GET /v2/namespaces/products/snapshot-jobs/snapshot-job-uuid
```

Completed jobs include `sha` when a body was materialized:

```json
{
  "id": "snapshot-job-uuid",
  "namespace": "products",
  "field": "category",
  "source": "origin",
  "status": "completed",
  "documents_scanned": 12844,
  "sha": "3f9e8b21",
  "stable_as_of": 1747300000123
}
```

## History

```http
GET /v2/namespaces/products/history?limit=20
```

```json
[
  {"watermark_ms": 1747300000123, "sha": "3f9e8b21..."},
  {"watermark_ms": 1747299600045, "sha": "a1c5b09f..."}
]
```

| Query param | Default | Purpose |
| --- | --- | --- |
| `limit` | 50 | Maximum entries returned. Capped at 500. |
| `before` | none | Return entries older than this SHA. 7-char prefixes are accepted. |

The history endpoint lists S3 keys only; it does not read every snapshot
body.

## Snapshot body

```http
GET /v2/namespaces/products/snapshots/3f9e8b2
```

```json
{
  "namespace": "products",
  "watermark_ms": 1747300000123,
  "sha": "3f9e8b21",
  "fields": [
    {
      "name": "category",
      "values": [
        {"v": "books", "n": 1240},
        {"v": "electronics", "n": 873}
      ]
    }
  ],
  "fields_skipped": [
    {
      "name": "tags",
      "reason": "exceeded_cap",
      "distinct_observed": 247000,
      "cap": 10000
    }
  ]
}
```

`fields[].values[].v` is the facet listing. `fields[].values[].n` is the
facet count. Fields present in `fields[]` are complete. Fields above the
10,000 distinct-value cap are listed in `fields_skipped[]` instead of
being partially materialized.
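
A consumer usually wants the histogram for one field as a plain mapping. The sketch below reads a decoded snapshot body of the shape shown above; `facet_counts` and the `FieldSkipped` exception are hypothetical names.

```python
class FieldSkipped(Exception):
    """Raised when the field was listed in fields_skipped[]."""


def facet_counts(body: dict, field: str) -> dict:
    """Return {facet value: count} for one field of a snapshot body."""
    for skipped in body.get("fields_skipped", []):
        if skipped["name"] == field:
            # Skipped fields are never partially materialized.
            raise FieldSkipped(f"{field}: {skipped['reason']}")
    for entry in body.get("fields", []):
        if entry["name"] == field:
            return {value["v"]: value["n"] for value in entry["values"]}
    raise KeyError(field)
```

Raising on skipped fields keeps a caller from mistaking "not materialized" for "no values".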

## Activity

```http
GET /v2/activity/snapshots?since=1747200000000&limit=50
```

| Query param | Required | Purpose |
| --- | --- | --- |
| `since` | yes | Epoch-ms lower bound on `ts_ms`. |
| `limit` | no | Cap 500, default 50. |
| `namespace` | no | Exact namespace filter. |
| `cursor` | no | Pagination cursor from `next_cursor`. |

Activity is snapshot lifecycle only. Search history and clickstream
events have separate feeds.

---

# Query History

Source: https://hevlayer.com/docs/api/search-history

Layer logs every query the gateway serves into a durable JSONL trail in
S3, mirrored into the NVMe cache for fast recent reads. Fetch events
that downstream consumers tag back to a query land in a sibling
clickstream feed. Together they make a search session reconstructable
after the fact — for relevance tuning, A/B comparison, or incident
review.

Both surfaces are Layer-only.

## Routes

| Route | Behavior |
| --- | --- |
| `GET /v2/namespaces/{ns}/search-history` | Per-namespace query log, newest first. |
| `GET /v2/namespaces/{ns}/clickstream` | Fetch events correlated to a search, newest first. |

The `/v1/` versions of both routes are identical aliases held for client
compatibility.

## Search history entry

```json
{
  "entries": [
    {
      "timestamp": "2025-05-22T08:00:00.000Z",
      "timestamp_nanos": 1747900800000000000,
      "namespace": "products",
      "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
      "raw_query": "wireless headphones",
      "stable_as_of": 1747900700000,
      "query": {"vector": "[…]", "top_k": 10, "filters": "[…]"},
      "top_result_ids": ["asin-B08N5WRWNW", "asin-B07PXGQC1Q"],
      "tags": ["app:hev-shop", "route:search", "surface:storefront"]
    }
  ],
  "next_cursor": "1747900799000000000"
}
```

| Field | Meaning |
| --- | --- |
| `timestamp` / `timestamp_nanos` | Wall-clock and nanosecond timestamps. `timestamp_nanos` is the pagination cursor. |
| `trace_id` | Trace context propagated or generated for the query. Joins to the clickstream feed. |
| `raw_query` | Caller-supplied query string from the `x-hevlayer-search-query` header (e.g. the BM25 input). Omitted when the header is absent. |
| `stable_as_of` | Epoch-ms namespace watermark used by the served response. Omitted on cold-start gateways before the namespace has a watermark. |
| `query` | Structured query summary — vector shape, filters, ranking. |
| `top_result_ids` | IDs from the served response, in rank order. |
| `tags` | Caller-supplied labels propagated through request headers. Used for ad-hoc segmentation. |

### Writing metadata

Set `x-hevlayer-search-query` on query requests to capture the human
input, and set `x-hevlayer-tags` to a comma-separated list of
segmentation tags. The Python SDK exposes these as `raw_query` and
`tags`:

```python
query = await client.query_namespace(
    "products",
    {"vector": embedding, "top_k": 10, "include_attributes": ["title"]},
    raw_query="wireless headphones",
    tags=["app:hev-shop", "surface:storefront", "route:search", "page:first"],
)

history = await client.list_search_history(
    "products",
    tags=["app:hev-shop", "route:search", "page:first"],
    limit=20,
)
```

Keep the query text in `raw_query`; use tags for segmentation, not for
duplicating the query string.

### Tag contract

Layer splits `x-hevlayer-tags` and `?tag=` on commas, trims whitespace,
drops empty values, then sorts and dedupes tags before storing or
matching them. Commas are separators and cannot be escaped.

Limits:

| Limit | Value |
| --- | --- |
| Max tags | 32 unique tags per request or filter |
| Max tag length | 128 bytes |
| Allowed characters | ASCII letters, digits, `:`, `_`, `-`, `.`, `/`, `=`, `+` |

The list filter uses AND semantics: `?tag=a,b` returns only entries that
carry both `a` and `b`.
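
The contract can be mirrored client-side to catch bad tags before a request is sent. This is a sketch under assumptions: the function names are hypothetical, and raising on a violated limit is a choice made here, since this page does not say whether the gateway rejects or drops offending tags.

```python
import re

MAX_TAGS = 32
MAX_TAG_BYTES = 128
# ASCII letters, digits, and : _ - . / = +
_ALLOWED = re.compile(r"[A-Za-z0-9:_\-./=+]+")


def normalize_tags(raw: str) -> list:
    """Split on commas, trim, drop empties, dedupe, and sort."""
    tags = sorted({part.strip() for part in raw.split(",") if part.strip()})
    if len(tags) > MAX_TAGS:
        raise ValueError(f"more than {MAX_TAGS} unique tags")
    for tag in tags:
        if len(tag.encode("utf-8")) > MAX_TAG_BYTES:
            raise ValueError(f"tag longer than {MAX_TAG_BYTES} bytes: {tag!r}")
        if not _ALLOWED.fullmatch(tag):
            raise ValueError(f"tag has disallowed characters: {tag!r}")
    return tags


def matches(entry_tags: list, filter_raw: str) -> bool:
    """AND semantics: every filter tag must be present on the entry."""
    return set(normalize_tags(filter_raw)) <= set(entry_tags)
```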

### Query parameters

| Param | Purpose |
| --- | --- |
| `tag` | Comma-separated tag filter. AND semantics — every tag must match. |
| `from` / `to` | RFC3339 time bounds. |
| `before` | Pagination cursor; return entries strictly older than the given `timestamp_nanos`. |
| `limit` | Cap 500, default 50. |

## Clickstream entry

```json
{
  "events": [
    {
      "timestamp": "2025-05-22T08:00:02.143Z",
      "timestamp_nanos": 1747900802143000000,
      "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
      "namespace": "products",
      "doc_id": "asin-B08N5WRWNW",
      "tags": ["session:abc123"],
      "source": "fetch",
      "served_from": "cache"
    }
  ],
  "next_cursor": "1747900802142000000"
}
```

`trace_id` joins to the search-history entry that produced the result;
`served_from` distinguishes a cache hit from an upstream fetch.
`trace_id` is also a supported query parameter so you can pull every
event for a single search session.
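
Joining the two feeds on `trace_id` gives a per-search view of what was shown and what was fetched. The sketch below works on already-decoded `entries` and `events` lists; the helper name and output shape are illustrative.

```python
from collections import defaultdict


def clicks_by_search(entries: list, events: list) -> list:
    """Attach fetched doc ids to the search entry that produced them."""
    fetched = defaultdict(list)
    for event in events:
        fetched[event["trace_id"]].append(event["doc_id"])
    return [
        {
            "trace_id": entry["trace_id"],
            "raw_query": entry.get("raw_query"),  # omitted without the header
            "shown": entry["top_result_ids"],
            "fetched": fetched.get(entry["trace_id"], []),
        }
        for entry in entries
    ]
```

Events whose `trace_id` matches no entry in the page are simply left out, which is the expected case when the two feeds are paged independently.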

## Storage

```text
search-history/{namespace}/{YYYY-MM-DD}/{timestamp_nanos}.jsonl
```

Writes are best-effort and never block the query response. Aerospike
holds a recent window for fast reads; S3 is the durable store. A
cache outage degrades read latency but not durability — list calls walk
the S3 prefix and merge inline.
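
For offline analysis it can help to derive the object key for a given entry. The helper below is hypothetical and assumes the `{YYYY-MM-DD}` partition is the UTC date of `timestamp_nanos`; the page does not state which time zone the gateway uses.

```python
from datetime import datetime, timezone


def history_key(namespace: str, timestamp_nanos: int) -> str:
    """Build the S3 key for one search-history entry (UTC date assumed)."""
    seconds = timestamp_nanos // 1_000_000_000
    day = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"search-history/{namespace}/{day:%Y-%m-%d}/{timestamp_nanos}.jsonl"
```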

---

# Metrics API

Source: https://hevlayer.com/docs/api/metrics

The gateway exposes a Prometheus-shaped metrics surface on its own
endpoint, plus passthrough routes to the bundled VictoriaMetrics
(`vmsingle`) instance so callers can run PromQL without a separate
scraper. A self-describing catalog of every metric the gateway emits
backs both the dashboard's observe tab and external automation.

Per-metric definitions, label conventions, and example PromQL live in the
[metrics catalog](#metrics-catalog) below.

## Routes

| Route | Behavior |
| --- | --- |
| `GET /metrics` | Prometheus exposition from the gateway. |
| `GET /health` | Liveness, NVMe cache connection state, and per-namespace cache state. |
| `GET\|POST /v2/metrics/api/v1/query` | Proxy Prometheus instant query. |
| `GET\|POST /v2/metrics/query` | Short-form instant query proxy. |
| `GET\|POST /v2/metrics/api/v1/query_range` | Proxy Prometheus range query. |
| `GET\|POST /v2/metrics/query_range` | Short-form range query proxy. |
| `GET /v2/metrics/catalog` | List every metric the gateway emits. |
| `GET /v2/metrics/catalog/{name}` | Fetch one catalog entry, including labels and example PromQL. |

## Health

```http
GET /health
```

```json
{
  "status": "ok",
  "version": "0.1.0",
  "aerospike": {
    "connected": true,
    "generation": 3
  },
  "cache_state": [
    {"namespace": "products", "state": "warm", "warmed_through": 1747300000123, "warm_inflight": false}
  ]
}
```

Health always responds `200` while the process is up. The dashboard's
at-a-glance cards read it for degradation signals: `aerospike.connected`
and each `cache_state[].state` tell you whether the gateway is running on
a cold or disconnected NVMe cache — typically just after a restart.
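
Because the status code is always `200`, a probe has to inspect the body. The sketch below extracts degradation signals from a decoded `/health` response; treating any `state` other than `warm` as degraded is an assumption, since `warm` is the only value shown here.

```python
def degraded_signals(health: dict) -> list:
    """List human-readable degradation signals found in a /health body."""
    signals = []
    if not health.get("aerospike", {}).get("connected", False):
        signals.append("aerospike disconnected")
    for entry in health.get("cache_state", []):
        if entry.get("state") != "warm":
            signals.append(f"{entry['namespace']}: cache {entry.get('state')}")
    return signals
```

An empty list means the gateway reports a connected, warm cache for every namespace it tracks.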

## Metrics catalog

The catalog is the operator-facing manifest of every metric the gateway
emits. Each entry carries name, kind (histogram / counter / gauge),
family, labels, description, example PromQL, and (when applicable) the
alert shape it backs.

```http
GET /v2/metrics/catalog
```

```jsonc
{
  "version": "1",
  "entries": [
    {
      "name": "layer_query_duration_seconds",
      "kind": "histogram",
      "family": "query",
      "labels": ["pipeline_id", "namespace", "status"],
      "description": "Total wall-clock for a query through layer.",
      "example_promql": "histogram_quantile(0.99, sum by (le) (rate(layer_query_duration_seconds_bucket[5m])))",
      "alert": {
        "summary": "Query p99 above target",
        "expr": "histogram_quantile(0.99, ...) > 0.5",
        "for": "10m"
      }
    }
  ]
}
```

`version` bumps when the JSON shape changes incompatibly. The dashboard
reads the catalog and groups entries by family so operators don't have to
memorize which prefix lives where.

The same content is also exportable from the repo via
`cargo run -p metrics-catalog --bin export`.
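
External automation can apply the same grouping by family to a decoded catalog response. The helper below is a hypothetical sketch of that grouping, not the dashboard's code.

```python
from collections import defaultdict


def group_by_family(catalog: dict) -> dict:
    """Map each metric family to the sorted names of its metrics."""
    groups = defaultdict(list)
    for entry in catalog.get("entries", []):
        groups[entry["family"]].append(entry["name"])
    return {family: sorted(names) for family, names in sorted(groups.items())}
```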

## PromQL passthrough

The `/v2/metrics/api/v1/query` and `query_range` routes are thin
passthroughs to VictoriaMetrics. Response bodies match Prometheus's HTTP
API shape one-for-one. The short-form aliases under `/v2/metrics/query`
exist to make terminal use ergonomic:

```sh
curl -sG "$LAYER_GATEWAY_URL/v2/metrics/query" \
  --data-urlencode 'query=sum(layer_pipeline_stage_count{stage="pending"})'
```

The gateway does not rewrite queries. Auth happens at the gateway edge;
the upstream VictoriaMetrics instance is never customer-reachable.

---

# Dashboard

Source: https://hevlayer.com/docs/dashboard

The Layer dashboard is the operator surface that ships in-cluster alongside
the gateway. It reads from the same gateway API customers do — no direct
database, Aerospike, or VictoriaMetrics access — and surfaces the views
that justify Layer's role as the operating layer between an application and
its vector store.

Deployments on EKS reach the dashboard at `https://dashboard.hevlayer.com`.
Self-hosted installs expose it via the `layer-dashboard` Service.

## Layout

The dashboard groups everything operators care about into six tabs:

| Tab | What it answers |
| --- | --- |
| **console** | What is happening right now? At-a-glance gauges + activity log. |
| **data** | What is in the indexes? Namespace inventory, snapshot history, schema. |
| **read** | Are queries healthy? Query latency, p99 overhead, Aerospike pool. |
| **write** | Are writes flowing? Pipelines, embed pools, claim/heartbeat state. |
| **cost** | Where is spend going? AWS + Turbopuffer cost lines stacked over time. |
| **observe** | Catalog of every metric the gateway exports, grouped by family. |

## Console

The first view a new operator opens. Two stripes:

- **At a glance** — single-number cards for queries/s, indexed rows/s,
  fetch p99, cache hit ratio, error budget burn. Each card links into the
  matching read / write / observe panel.
- **Activity log** — newest-first stream backed by `/v2/activity/snapshots`
  and the search-history endpoints. Filters are persisted in the URL so
  links survive a refresh.

## Data

The inventory view. Click a namespace to drill into:

- Schema and approximate row count proxied from Turbopuffer metadata.
- Recent snapshot SHAs with field histograms and skipped-field markers
  — see [snapshots](/docs/api/snapshots).
- The current freshness signals (`stable_as_of`, `is_stable`).
- The Index policy fields that govern the namespace — `distanceMetric` and
  the `cache.warming.threads` cap — read from the `Index` resource.
- A unified **jobs** panel covering snapshot, warm, and scan jobs (kind,
  id, status, progress, age) for the namespace.

Two operator actions live here:

- **Trigger snapshot** — materialize a snapshot for one field on demand
  (`POST /v2/namespaces/{ns}/snapshots`), picking the source (`origin`,
  `auto`, `stored`, `cache`).
- **Delete namespace** — `DELETE /v2/namespaces/{ns}`, behind a confirm
  dialog.

This is where operators answer "did the last cutover land?" and "what
shape is this namespace?" without leaving the dashboard.

## Read

Operator answer to "are queries healthy?". Pulls from `layer_query_*`
histograms and the cache metrics families:

- Query latency p50/p95/p99 over the window.
- Layer-side overhead (`query_overhead_seconds`) so the operator can see
  whether slowness is upstream or local.
- Cache hit ratio per namespace, computed from
  `layer_cache_lookups_total`.
- Aerospike pool depth and node state, which makes otherwise silent
  failures visible.
- Aerospike stop-writes, surfaced from
  `layer_aerospike_op_duration_seconds{status="aerospike_stop_writes"}`.

## Write

The pipeline operator view. Surfaces pending / in-flight / failed counts
per pipeline and per UDF, the same numbers KEDA scales from. Click into a
pipeline to see:

- Per-stage counts (`pending`, `embedding`, `indexed`, `failed`).
- Active claims with `worker_id`, lease expiry, heartbeat age.
- Embed pool size and the autoscaling rule attached.
- Reset / pause / resume controls for UDFs (mirrors of the
  `/v2/udfs/{id}/{pause,resume,reset-failed}` endpoints).

The **infra** sub-view leads with the **compute pools** defined in
`InfraRules/default` — the logical pools (name, kind, GPU type,
`maxReplicasPerWorkload`, selector/toleration summary) that pipelines and
UDFs select via `spec.scaling.pool` — above the Karpenter NodePools that
physically provision their nodes.

The write view is the first dashboard stop for PostgreSQL pressure. A
growing `pending` count with rising
`layer_pg_query_duration_seconds{status="pg_error"}` means the queue is
stalled at the indexing-state layer, not at Turbopuffer. Use the
[failure-mode runbook](/docs/failure-modes) before resizing or deleting any
queue state.

## Cost

Stacked-area chart driven by `/v2/cost`, `/v2/cost/timeseries`, and
`/v2/cost/rate-card`. It splits cost into two groups: AWS infrastructure
lines (compute, EBS, S3, NAT, ALB), computed from CloudWatch and the AWS
Pricing API, and Turbopuffer lines (storage, writes, queries), computed
from usage metrics × a code-resident rate card.

The instance picker uses the rate-card endpoint to project the impact of
changing instance types before applying it. Per-namespace attribution is
intentionally not modeled — this view is infra-level only.

## Observe

The full metrics catalog, grouped by family (Turbopuffer ops, cache,
fetch, pipeline progress, resource saturation). Each metric expands into a
sparkline that runs the corresponding PromQL through
`/v2/metrics/api/v1/query_range`. This is the surface operators use when
they need to confirm a hypothesis about behavior without leaving the
dashboard for Grafana.

## Operational notes

<Callout label="cache cold">
Pipeline status is cached in-memory in the gateway to protect PostgreSQL
during repeated dashboard or KEDA polling. `PIPELINE_STATUS_CACHE_TTL_MS`
defaults to 15000.
</Callout>

- Dashboard views should treat cache cold and upstream failures as
  separate operator states. A 503 `cache_cold` is recoverable on its own;
  a 502 from Turbopuffer is not.
- Customer workloads never receive the dashboard URL — only the gateway
  base URL and credentials.
- The dashboard is intentionally read-mostly. Mutating actions (UDF pause,
  InfraRules or scaling edits) are gated through CRD apply or explicit confirm
  dialogs rather than inline controls.

---

# Scans

Source: https://hevlayer.com/docs/scans

Scans answer ad hoc filter questions about a namespace. ID mode creates an
asynchronous job that returns matching document IDs. Count mode returns one
number synchronously and uses the latest snapshot when the filter is covered.

Use scans for bulk exports, manual inspection, UDF discovery debugging,
cache/origin consistency checks, or exact row counts for a filter.

## ID scans

```sh
curl -X POST http://gateway:8080/v2/namespaces/products/scans \
  -H 'content-type: application/json' \
  -d '{"mode": "ids", "source": "auto", "filters": ["category", "Eq", "Electronics"]}'
```

The create call returns `202 Accepted` with a job:

```json
{
  "id": "scan-uuid",
  "namespace": "products",
  "source": "auto",
  "status": "running",
  "progress": 0,
  "documents_scanned": 0,
  "created_at": "2026-05-26T10:00:00Z"
}
```

Poll the job, then read results:

```sh
curl http://gateway:8080/v2/namespaces/products/scans/scan-uuid
curl 'http://gateway:8080/v2/namespaces/products/scans/scan-uuid/results?limit=1000'
```
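The create-then-poll flow can be wrapped in a small helper. This is a minimal sketch rather than SDK code: `wait_for_scan`, `scan_url`, and `http_get_json` are hypothetical names, and because the docs only show `"running"` as a status value, the loop treats any other status as terminal.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://gateway:8080"  # same host the curl examples use


def http_get_json(url: str) -> dict:
    """GET a URL and decode the JSON body (stdlib only)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def scan_url(namespace: str, scan_id: str) -> str:
    """URL of a scan job, matching the poll route shown above."""
    return f"{BASE_URL}/v2/namespaces/{namespace}/scans/{scan_id}"


def wait_for_scan(
    get_job: Callable[[], dict],
    interval_s: float = 1.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a scan job until its status is no longer "running".

    ASSUMPTION: any status other than "running" is terminal. Inspect the
    returned status before reading results.
    """
    for _ in range(max_polls):
        job = get_job()
        if job.get("status") != "running":
            return job
        sleep(interval_s)
    raise TimeoutError(f"scan still running after {max_polls} polls")
```

A caller would pass a closure over the poll route, for example `wait_for_scan(lambda: http_get_json(scan_url("products", "scan-uuid")))`, and only then request `/results`.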

## Count scans

```sh
curl -X POST http://gateway:8080/v2/namespaces/products/scans \
  -H 'content-type: application/json' \
  -d '{"mode": "count", "source": "auto", "filters": ["category", "Eq", "Electronics"]}'
```

```json
{
  "count": 4210,
  "served_by": "snapshot",
  "snapshot_sha": "3f9e8b21",
  "watermark_ms": 1747300000123,
  "elapsed_ms": 3
}
```

`source: auto` checks the latest snapshot first for single-field `Eq` and
`In` filters. If the field is fully present in the snapshot, the response
is served by `snapshot`. Otherwise auto falls through to cache or origin.
Use `source: snapshot` to require the snapshot path; unsupported filters
return `412 precondition_failed`.
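A caller that prefers the exact snapshot path but can tolerate a live count might handle the `412` explicitly. The sketch below assumes an injected `post` callable that returns `(http_status, decoded_body)`; that transport shape and the helper names are illustrative, not part of the gateway API.

```python
from typing import Callable, Tuple

# A transport takes a request body and returns (http_status, decoded_body).
Post = Callable[[dict], Tuple[int, dict]]


def count_request(filters: list, source: str) -> dict:
    """Body for POST /v2/namespaces/{ns}/scans in count mode."""
    return {"mode": "count", "source": source, "filters": filters}


def count_prefer_snapshot(post: Post, filters: list) -> dict:
    """Require the snapshot path first; on 412 retry with source=auto."""
    status, body = post(count_request(filters, "snapshot"))
    if status == 412:  # filter is not eligible for the snapshot path
        status, body = post(count_request(filters, "auto"))
    if status >= 400:
        raise RuntimeError(f"count scan failed with HTTP {status}")
    return body
```

Checking `served_by` on the returned body tells the caller whether the count is exact at a snapshot watermark or came from a live scan.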

## Sources

| Source | ID mode | Count mode |
| --- | --- | --- |
| `auto` | Cache when fresh enough, otherwise origin. | Snapshot first, then cache/origin. |
| `snapshot` | Not supported | Latest snapshot only; requires eligible `Eq` or `In`. |
| `cache` | Aerospike document cache only | Aerospike document cache only. |
| `origin` | Turbopuffer paginated scan | Turbopuffer paginated scan. |

When `auto` resolves to cache, the gateway applies
`_hevlayer_upserted_at <= cache_warmed_through` before the user filter. This makes
the scan a stable warmed view instead of a mixed view of old and new rows.
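Expressed in the filter array syntax, the effective filter looks roughly like the sketch below. The gateway applies this internally; the helper name is hypothetical, and treating `cache_warmed_through` as an epoch-millisecond integer is an assumption.

```python
def with_warm_watermark(user_filter: list, cache_warmed_through: int) -> list:
    """AND the warm-watermark bound in front of the user filter."""
    # ASSUMPTION: cache_warmed_through is an epoch-ms integer.
    bound = ["_hevlayer_upserted_at", "Lte", cache_warmed_through]
    return ["And", [bound, user_filter]]


effective = with_warm_watermark(["category", "Eq", "Electronics"], 1747300000123)
```

Rows upserted after the warm watermark fail the bound and drop out, which is what keeps the view from mixing old and new rows.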

## Filters

Scans accept the same Turbopuffer filter array as [query](/docs/api/query).
On origin scans, the filter is pushed to Turbopuffer. On cache scans, the
gateway evaluates it against cached document attributes.

Supported cache operators are `Eq`, `NotEq`, `Gt`, `Gte`, `Lt`, `Lte`,
`In`, `NotIn`, `And`, `Or`, and `Not`. If `auto` sees a filter the cache
cannot evaluate, it uses origin. Explicit `source: cache` with an
unsupported filter fails rather than returning partial results.
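To make the cache-side evaluation concrete, here is a minimal evaluator for the operators listed above. It is a sketch of the idea, not the gateway's implementation: the function names are hypothetical, and treating a missing attribute as null is an assumption.

```python
import operator
from typing import Any, Mapping

_ORDERING = {"Gt": operator.gt, "Gte": operator.ge,
             "Lt": operator.lt, "Lte": operator.le}
_LEAF_OPS = {"Eq", "NotEq", "In", "NotIn", *_ORDERING}


def cache_can_evaluate(flt: list) -> bool:
    """True when every node uses an operator from the supported list."""
    head = flt[0]
    if head in ("And", "Or"):
        return all(cache_can_evaluate(f) for f in flt[1])
    if head == "Not":
        return cache_can_evaluate(flt[1])
    return len(flt) == 3 and flt[1] in _LEAF_OPS


def matches(doc: Mapping[str, Any], flt: list) -> bool:
    """Evaluate a filter array against one cached document's attributes."""
    head = flt[0]
    if head == "And":
        return all(matches(doc, f) for f in flt[1])
    if head == "Or":
        return any(matches(doc, f) for f in flt[1])
    if head == "Not":
        return not matches(doc, flt[1])
    field, op, value = flt
    actual = doc.get(field)  # ASSUMPTION: missing attribute behaves as null
    if op == "Eq":
        return actual == value
    if op == "NotEq":
        return actual != value
    if op == "In":
        return actual in value
    if op == "NotIn":
        return actual not in value
    if op in _ORDERING:
        return actual is not None and _ORDERING[op](actual, value)
    raise ValueError(f"unsupported cache operator: {op}")
```

Under this model, `auto` would pick the cache only when `cache_can_evaluate` returns `True`, and an explicit `source: cache` request with an unsupported operator would be rejected up front.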

## Operational notes

- ID scan state is in-memory and ephemeral; it resets on gateway restart.
- Count scans have a deadline, default 30s and maximum 300s.
- Snapshot-served count scans are exact at the snapshot `watermark_ms`.
- Live count scans include `bounded`, `timed_out`, and shard fields.

---

# Pipelines

Source: https://hevlayer.com/docs/pipelines

import Diagram from "../../components/docs/Diagram.astro";

A pipeline indexes documents through staged work whose row count changes.
The common shape is **extract** (CPU) and **embed** (GPU). The gateway
tracks document state in PostgreSQL and exports queue depth so the
operator can autoscale workers through KEDA.

Once vectors land in Turbopuffer, query and fetch them through the namespace API — see [Query & Fetch](/docs/api/query).

## Pipeline flow

<Diagram>{`
CPU worker        Gateway                  GPU worker
   |               POST /v2/pipelines        |
   |---- chunks --> PUT /documents/{doc_id}   |
   |               chunks -> S3 + NVMe cache  |
   |               state  -> PostgreSQL       |
   |                                         |
   |               GET /status  <------ KEDA |
   |                                         |
   |               POST /claim <-------------|
   |               GET /chunks <-------------|
   |               PUT /vectors <------------|
   |               vectors -> Turbopuffer    |
`}</Diagram>

**CPU worker** — reads source data, extracts text/metadata, splits into chunks, calls the stage endpoint. Scales on input queue (e.g. SQS depth, Kafka lag).

**GPU worker** — polls the pipeline status endpoint for `pending_count > 0`, fetches chunks from the gateway, runs the embedding model, calls the vectors endpoint. Scales on `pending_count` via KEDA.

The gateway handles chunk storage (S3 backing plus embedded Aerospike cache), vector upsert (Turbopuffer), and state tracking (embedded PostgreSQL). Workers are stateless and never connect to gateway-internal stores.

## Prerequisites

Pipeline routes are registered only when `DATABASE_URL` is configured. The Helm chart sets `DATABASE_URL` to the gateway pod's loopback PostgreSQL sidecar. The migration runs automatically on startup.

```bash
export DATABASE_URL=postgres://hevlayer:hevlayer@localhost:5432/hevlayer
```

## Pipeline CRD

Declare a Pipeline when the operator should own the worker Deployment and
KEDA object. See [Pipeline CRD](/docs/kubernetes/pipeline-crd) for the
full resource reference.

```yaml
apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: product-images
  namespace: layer
spec:
  target:
    namespace: products
  worker:
    image: ghcr.io/hev/product-image-worker:latest
    batchSize: 64
    timeoutSeconds: 60
  scaling:
    pool: cpu
    mode: autoscale
    replicas:
      min: 0
      max: 8
```

`spec.scaling.pool` must name a compute pool in `InfraRules/default`.
`mode: fixed` pins replicas to `replicas.min`; `mode: disabled` and
`spec.paused: true` scale the worker to 0.

## Gateway API

### Create a pipeline

```bash
curl -X POST http://gateway:8080/v2/pipelines \
  -H 'content-type: application/json' \
  -d '{
    "id": "product-images",
    "target_namespace": "products",
    "distance_metric": "cosine_distance"
  }'
```

`distance_metric` defaults to `cosine_distance`. Returns 409 if the pipeline already exists.

### Stage a document (CPU worker)

```bash
curl -X PUT http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW \
  -H 'content-type: application/json' \
  -d '{
    "chunks": [
      {"id": "asin-B08N5WRWNW-0", "text": "Wireless noise-cancelling headphones"},
      {"id": "asin-B08N5WRWNW-1", "text": "40-hour battery life", "metadata": {"page": 2}}
    ]
  }'
```

Each chunk is stored durably in S3 and cached in Aerospike (set:
`pipe_{target_namespace}`). The document is marked `pending`. Re-staging the
same document ID replaces the previously stored chunks and resets the
document to `pending`.
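A CPU worker usually builds the stage body from the chunks it just extracted. The sketch below reproduces the request body from the curl example. The `{doc_id}-{index}` chunk ID scheme is copied from that example and is an assumption about convention, not a documented requirement, and `build_stage_body` is a hypothetical helper.

```python
from typing import Optional, Sequence


def build_stage_body(
    doc_id: str,
    texts: Sequence[str],
    metadata: Optional[Sequence[Optional[dict]]] = None,
) -> dict:
    """Body for PUT /v2/pipelines/{id}/documents/{doc_id}."""
    chunks = []
    for index, text in enumerate(texts):
        # Deterministic chunk IDs keep a re-stage of the same source stable.
        chunk = {"id": f"{doc_id}-{index}", "text": text}
        if metadata and metadata[index]:
            chunk["metadata"] = metadata[index]
        chunks.append(chunk)
    return {"chunks": chunks}
```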

### Get pipeline status (KEDA polling)

```bash
curl http://gateway:8080/v2/pipelines/product-images/status
```

```json
{
  "pipeline_id": "product-images",
  "counts": {"pending": 142, "indexed": 8530},
  "pending_count": 142
}
```

`pending_count` is the field KEDA watches. When it hits zero, GPU workers scale to zero.

### Read chunks and write vectors (GPU worker)

```bash
curl http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW/chunks
```

After embedding, write vectors back. This upserts to Turbopuffer and marks the document `indexed`:

```bash
curl -X PUT http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW/vectors \
  -H 'content-type: application/json' \
  -d '{
    "vectors": [
      {"id": "asin-B08N5WRWNW-0", "vector": [0.0012, -0.043], "attributes": {"text": "..."}}
    ]
  }'
```

### Claim, heartbeat, stage

Workers claim staged documents through Layer instead of mutating PostgreSQL directly. Layer sets `claimed_by` and `claimed_at`, moves rows to the requested claim stage, recovers stale claims older than the lease, and uses `FOR UPDATE SKIP LOCKED` so multiple workers can claim concurrently.

```bash
curl -X POST http://gateway:8080/v2/pipelines/product-images/claim \
  -H 'content-type: application/json' \
  -d '{
    "stage": "pending",
    "claim_stage": "embedding",
    "limit": 2000,
    "worker_id": "gpu-worker-0",
    "lease_seconds": 900
  }'
```

Heartbeat long-running claims:

```bash
curl -X POST http://gateway:8080/v2/pipelines/product-images/documents/heartbeat \
  -H 'content-type: application/json' \
  -d '{
    "document_ids": ["B07XYZ123"],
    "stage": "embedding",
    "worker_id": "gpu-worker-0"
  }'
```

Move claimed documents to a final stage:

```bash
curl -X POST http://gateway:8080/v2/pipelines/product-images/documents/stage \
  -H 'content-type: application/json' \
  -d '{
    "document_ids": ["B07XYZ123"],
    "stage": "indexed",
    "from_stage": "embedding",
    "worker_id": "gpu-worker-0"
  }'
```

Use `stage: "pending"` for release and `stage: "failed"` for permanent failures. Use `create_missing: true` without `from_stage`/`worker_id` when a pipeline enqueues lightweight document IDs without chunks (e.g. aggregate refresh jobs).
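Put together, one pass of a claim-based worker might look like the sketch below. The request bodies follow the examples above. The claim response shape is not documented here, so reading IDs from a `document_ids` key is an assumption, and `post`, `run_once`, and `process` are hypothetical names.

```python
from typing import Callable, Iterable, List

# post(path, body) -> decoded JSON response; inject a real HTTP client here.
Post = Callable[[str, dict], dict]

PIPELINE = "/v2/pipelines/product-images"


def claim_body(worker_id: str, limit: int = 2000, lease_seconds: int = 900) -> dict:
    return {"stage": "pending", "claim_stage": "embedding", "limit": limit,
            "worker_id": worker_id, "lease_seconds": lease_seconds}


def stage_body(doc_ids: Iterable[str], stage: str, worker_id: str) -> dict:
    """Move claimed docs out of `embedding` to indexed, pending, or failed."""
    return {"document_ids": list(doc_ids), "stage": stage,
            "from_stage": "embedding", "worker_id": worker_id}


def run_once(post: Post, worker_id: str, process: Callable[[str], None]) -> dict:
    """Claim one batch, process each document, then report outcomes."""
    claimed = post(f"{PIPELINE}/claim", claim_body(worker_id))
    # ASSUMPTION: the claim response lists IDs under "document_ids".
    doc_ids: List[str] = claimed.get("document_ids", [])
    done, failed = [], []
    for doc_id in doc_ids:
        try:
            process(doc_id)
            done.append(doc_id)
        except ValueError:  # treated in this sketch as a permanent input error
            failed.append(doc_id)
    if done:
        post(f"{PIPELINE}/documents/stage", stage_body(done, "indexed", worker_id))
    if failed:
        post(f"{PIPELINE}/documents/stage", stage_body(failed, "failed", worker_id))
    return {"indexed": done, "failed": failed}
```

A long-running `process` step would also send heartbeats for the IDs it still holds. That is omitted here to keep the control flow visible.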

Pipeline queues are segmented. Layer writes document IDs and chunk ID lists
into compressed S3 manifests and stores only segment leases and counters in
PostgreSQL, so queues scale by segment count rather than by one PostgreSQL row
per document. Set `PIPELINE_SEGMENT_SIZE` to tune the number of logical
documents per segment. The Helm default segment size is 10,000, so 1,000,000
lightweight IDs become about 100 PostgreSQL segment rows.
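The arithmetic behind that figure is a ceiling division. The sketch assumes segments are filled to capacity, which is why the text says "about": splits can leave partially filled segments. `segment_rows` is a hypothetical helper.

```python
def segment_rows(document_count: int, segment_size: int = 10_000) -> int:
    """Minimum number of segment rows needed to cover document_count IDs."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    # Integer ceiling division avoids float rounding on large counts.
    return (document_count + segment_size - 1) // segment_size


rows_for_one_million = segment_rows(1_000_000)  # 100 with the Helm default
```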

Segment manifests are queue state, not durable history. Layer deletes
superseded manifests after segment splits, deletes completed manifests when
documents move to `indexed`, and removes the pipeline segment prefix when the
pipeline is deleted.

## Document lifecycle

```
              stage_document()           write_vectors()
  (new doc) ──────────────────► pending ──────────────────► indexed
                                  ▲
                                  │ re-stage (idempotent)
```

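- **pending** — chunks stored in S3 and cached in Aerospike, waiting for embedding.
- **indexed** — vectors written to Turbopuffer.

Workers that use the claim API also move documents through a claim stage (`embedding` in the examples above) and can park permanent failures in `failed`.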

Re-staging a document resets it to `pending` with new chunks. Useful for reprocessing after source data changes.

## Failure model

- Turbopuffer write failures are hard: the vectors route returns 502 and the document stays in `embedding` for re-claim.
- Aerospike cache failures do not block chunk reads when S3 backing is present.
- PostgreSQL connectivity failures surface as 500 and should be retried with backoff.
- Lease expiry is handled server-side. A worker that crashes mid-embedding has its documents recovered on the next claim sweep.
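The lease behaviour can be modelled in memory to see why a crashed worker does not strand documents. This is a simplified model under stated assumptions: the gateway does this in PostgreSQL with `FOR UPDATE SKIP LOCKED`, and applying the claimer's `lease_seconds` to the stale check is a guess at the semantics. `Doc` and `claim` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    stage: str = "pending"
    claimed_by: Optional[str] = None
    claimed_at: Optional[float] = None


def claim(docs: Dict[str, Doc], worker_id: str, now: float,
          lease_seconds: float, limit: int) -> List[str]:
    """Recover expired claims, then claim up to `limit` pending docs."""
    # Sweep: a claim older than the lease goes back to pending.
    for doc in docs.values():
        if doc.stage == "embedding" and now - doc.claimed_at > lease_seconds:
            doc.stage, doc.claimed_by, doc.claimed_at = "pending", None, None
    claimed: List[str] = []
    for doc_id, doc in docs.items():
        if len(claimed) == limit:
            break
        if doc.stage == "pending":
            doc.stage, doc.claimed_by, doc.claimed_at = "embedding", worker_id, now
            claimed.append(doc_id)
    return claimed
```

A heartbeat in this model would simply refresh `claimed_at`, which is what keeps a slow but healthy worker from losing its batch.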

## Autoscaling

The operator emits the KEDA `ScaledObject` directly from
`Pipeline.spec.scaling`. For manual workers that are not represented by a
Pipeline CR, use the same Prometheus signal:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: gpu-embed-worker
spec:
  scaleTargetRef:
    name: gpu-embed-worker
  minReplicaCount: 0
  maxReplicaCount: 8
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://layer-gateway.layer.svc.cluster.local:8080/v2/metrics
        metricName: product_images_pending
        query: 'sum(layer_pipeline_stage_count{pipeline_id="product-images",stage="pending"}) or vector(0)'
        threshold: "50"              # 1 replica per 50 pending docs
        activationThreshold: "0"     # scale from 0 when any doc is pending (KEDA activates only above this value)
```

This keeps autoscaling close to the same source of truth Layer uses for
claims while keeping PostgreSQL private to the gateway pod.
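The threshold settings translate into simple arithmetic. The sketch below approximates the replica count a single Prometheus trigger converges on. It ignores HPA tolerance, stabilization windows, and cooldown, and it models activation as strictly greater than the activation threshold, which is how KEDA documents it. `desired_replicas` is a hypothetical helper, not a KEDA API.

```python
import math


def desired_replicas(pending: int, threshold: int, activation_threshold: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate steady-state replicas for one queue-depth trigger."""
    if pending <= activation_threshold:
        return min_replicas  # trigger inactive: stay at the floor (0 here)
    wanted = math.ceil(pending / threshold)  # one replica per `threshold` docs
    return max(1, min_replicas, min(wanted, max_replicas))


replicas_for_status_example = desired_replicas(142, 50, 0, 0, 8)  # 3
```

With the `pending_count` of 142 from the status example and a threshold of 50, the worker pool settles at three replicas and drains back to zero once the queue is empty.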

### CPU workers — scale on input source

CPU workers scale on whatever feeds them — SQS queue depth, Kafka consumer lag, S3 event notifications, etc. This is independent of the pipeline API.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-extract-worker
spec:
  scaleTargetRef:
    name: cpu-extract-worker
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/product-images
        queueLength: "10"
        awsRegion: us-east-1
```

---

# UDFs

Source: https://hevlayer.com/docs/udfs

import Diagram from "../../components/docs/Diagram.astro";

A UDF is a stateless worker that preserves row count: one input row
produces one derived attribute on the same row. Embeddings,
classifications, tags, and attribute backfills all use the same primitive.

Use a pipeline when external data becomes rows, or when one row fans out
into many rows. Use a UDF when rows already in Layer acquire derived
attributes.

<Diagram>{`
Gateway                              Worker Deployment
  | create ID scan                     |
  | POST /v2/namespaces/{ns}/scans     |
  | filters: spec.filter               |
  |                                     |
  | enqueue (namespace, id) rows        |
  | into the UDF queue                  |
  |                                     |
  | <----- POST /v2/udfs/{id}/claim ---|
  | -----> rows + input columns ------>|
  |                                     |  fn(*, id, title) -> list[str]
  | <- POST /v2/udfs/{id}/items/complete
  |                                     |
  | writeback: Turbopuffer patch_columns
`}</Diagram>

## Author a worker

The Python SDK turns a normal function into the claim/process/complete
loop.

```python
import asyncio
from hevlayer.udf import PermanentError, TransientError, run_udf_worker, udf


@udf(inputs=["id", "title", "description"], output="tags", kind="tags")
def tag_product(*, id: str, title: str | None, description: str | None) -> list[str]:
    if not title:
        raise PermanentError(f"{id}: missing title")
    try:
        text = f"{title} {description or ''}".lower()
    except TypeError as exc:
        raise TransientError(str(exc)) from exc

    tags: list[str] = []
    if "wireless" in text:
        tags.append("wireless")
    if "waterproof" in text:
        tags.append("waterproof")
    return tags or ["uncategorized"]


if __name__ == "__main__":
    asyncio.run(run_udf_worker(tag_product, udf_id="product-tags"))
```

Function parameters are keyword-only and named to match `inputs`.
Raise `TransientError` for retryable work and `PermanentError` for
unrecoverable input.

## Declare the function

Apply a `Function` CRD. The operator emits a worker `Deployment`,
optional `Service` for push dispatch, and a KEDA `ScaledObject` from
`spec.scaling`. The gateway uses the Function spec to register the UDF
queue and discovery policy.

```yaml
apiVersion: hevlayer.com/v1alpha1
kind: Function
metadata:
  name: product-tags
  namespace: hev-shop
spec:
  paused: false
  targetNamespaces:
    - amazon-products
  inputs:
    - id
    - title
    - description
  output:
    attribute: tags
    kind: tags
    version: v1
  filter:
    - "Or"
    - - ["tags_v", "NotEq", "v1"]
      - ["tags_v", "Eq", null]
  triggers:
    - discovery
  worker:
    image: ghcr.io/hev/hev-shop-udf-product-tags:latest
    dispatch: pull
    batchSize: 16
    timeoutSeconds: 30
  schedule:
    discoveryIntervalSeconds: 300
    leaseSeconds: 120
    maxInFlightBatches: 4
    maxConcurrentScans: 1
  retry:
    maxAttempts: 6
    initialBackoffSeconds: 5
    maxBackoffSeconds: 300
  scaling:
    pool: cpu
    mode: autoscale
    replicas:
      min: 0
      max: 4
```

`spec.filter` is the same JSON tuple syntax used in Turbopuffer queries.
The 0.1 CRD preserves array-form filters, so compound expressions like
the example above can be applied directly.

The worker pod receives `HEVLAYER_UDF_ID`, `HEVLAYER_BASE_URL`,
`HEVLAYER_UDF_BATCH_SIZE`, `HEVLAYER_UDF_TIMEOUT_SECONDS`,
`HEVLAYER_UDF_LEASE_SECONDS`, and `LAYER_GATEWAY_API_KEY`.

The CRD is the source of truth for the worker shape. Use
`POST /v2/udfs/{id}/discover`, `/claim`, and `/items/complete` only for
runtime coordination and manual recovery; do not create a separate
Deployment for the same function unless you also take over scaling and
placement.

## Scaling and placement

`spec.scaling` is the scaling contract for the Function worker.

| Field | Purpose |
| --- | --- |
| `pool` | Name of a compute pool in `InfraRules/default`. |
| `mode` | `autoscale`, `fixed`, or `disabled`. |
| `replicas.min` | Minimum worker replicas. Use `1` for warm workers. |
| `replicas.max` | Maximum worker replicas; must not exceed the pool cap. |

`InfraRules` owns shared placement: node selectors, tolerations, resource
requests, and per-workload replica ceilings. Workload specs choose a
pool; they do not repeat placement rules.

For extra pod-level config, set `spec.worker.podSpec`. It is
deep-merged into the operator pod spec. Container array overrides are
not merged.

## Gateway API

In Kubernetes installs the Function CRD is the source of truth and the
runtime API below is registered from it. These are the same routes the
Python SDK drives. Call them directly to register a UDF without the
operator, or to coordinate and recover workers by hand.

### Spec routes

| Route | Behavior |
| --- | --- |
| `POST /v2/udfs` | Create a UDF definition and queue. |
| `GET /v2/udfs` | List UDFs. |
| `GET /v2/udfs/{id}` | Read a UDF. |
| `DELETE /v2/udfs/{id}` | Delete a UDF and its queue (does not delete written output). |
| `GET /v2/udfs/{id}/status` | Queue depth, in-flight, failed counts. |

The create body carries the same shape the CRD `spec` expresses:

```http
POST /v2/udfs
Content-Type: application/json

{
  "id": "product-tags",
  "spec": {
    "target_namespaces": ["amazon-products"],
    "inputs": ["id", "title", "description"],
    "output": {"attribute": "tags", "kind": "tags", "version": "v1"},
    "filter": ["Or", [["tags_v", "NotEq", "v1"], ["tags_v", "Eq", null]]],
    "triggers": ["discovery"],
    "worker": {
      "image": "ghcr.io/hev/hev-shop-udf-product-tags:latest",
      "port": 8080,
      "batch_size": 16,
      "timeout_seconds": 30
    },
    "schedule": {
      "discovery_interval_seconds": 300,
      "lease_seconds": 120,
      "max_in_flight_batches": 4,
      "max_concurrent_scans": 1
    },
    "retry": {"max_attempts": 6, "initial_backoff_seconds": 5, "max_backoff_seconds": 300}
  }
}
```

### Lifecycle routes

| Route | Behavior |
| --- | --- |
| `POST /v2/udfs/{id}/pause` | Stop both discovery and dispatch. Workers drain in-flight then idle. |
| `POST /v2/udfs/{id}/resume` | Resume discovery and dispatch. |
| `POST /v2/udfs/{id}/reset-failed` | Move every row in `failed` back to `pending`. |
| `POST /v2/udfs/{id}/discover` | Trigger a discovery sweep immediately. |

`reset-failed` is the recovery path after a transient upstream
incident — for permanent issues, fix the input shape or bump
`spec.output.version` and re-apply.

### Worker coordination routes

| Route | Behavior |
| --- | --- |
| `POST /v2/udfs/{id}/claim` | Claim a batch of rows for processing. |
| `POST /v2/udfs/{id}/items/heartbeat` | Extend the lease on in-flight items. |
| `POST /v2/udfs/{id}/items/complete` | Report success and persist output. |
| `POST /v2/udfs/{id}/items/fail` | Report failure (transient or permanent). |

The Python SDK's `run_udf_worker` implements the full loop — most
workloads should never call these routes directly.

```http
POST /v2/udfs/product-tags/items/complete
Content-Type: application/json

{
  "worker_id": "udf-worker-0",
  "items": [
    {"namespace": "amazon-products", "id": "asin-B08N5WRWNW", "output": ["wireless", "waterproof"]}
  ]
}
```

`claim` returns the batch as `(namespace, id)` pairs alongside the input
columns the spec declared. Rows the gateway can't bind from the index
(missing required inputs) surface as bind errors, not silent skips, so
the worker can fail them explicitly rather than retry forever. On `fail`,
`kind: transient` honors `spec.retry` while `kind: permanent`
dead-letters immediately — the SDK derives `kind` from `TransientError` /
`PermanentError`.
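For workers that call the coordination routes by hand, the mapping from exceptions to `complete` and `fail` items can be sketched as follows. The two exception classes are local stand-ins for the SDK's so the block runs on its own. The claimed-row shape (an `inputs` dict per row) and the `error` field on failed items are assumptions, not documented payload fields.

```python
from typing import Any, Callable, Dict, List, Tuple


class TransientError(Exception):
    """Stand-in for hevlayer.udf.TransientError."""


class PermanentError(Exception):
    """Stand-in for hevlayer.udf.PermanentError."""


def process_batch(rows: List[Dict[str, Any]], fn: Callable[..., Any]
                  ) -> Tuple[List[dict], List[dict]]:
    """Run fn over claimed rows; split results into complete and fail items."""
    completed, failed = [], []
    for row in rows:
        key = {"namespace": row["namespace"], "id": row["id"]}
        try:
            # ASSUMPTION: each claimed row carries its inputs as a dict.
            output = fn(**row["inputs"])
        except PermanentError as exc:
            failed.append({**key, "kind": "permanent", "error": str(exc)})
        except TransientError as exc:
            failed.append({**key, "kind": "transient", "error": str(exc)})
        else:
            completed.append({**key, "output": output})
    return completed, failed
```

The first list maps onto the `items` array of `items/complete` shown above. The second would go to `items/fail`, where `kind` decides between retry and dead-letter.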

### Writeback and discovery

UDF outputs are patched onto the target row as the named attribute.
`output.kind` is an SDK type hint; writeback semantics are the same for
tags, classifications, scalars, and vectors. When `spec.output.version`
is set, the gateway atomically writes the output and the matching
`{attribute}_v` marker in a single patch.
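As an illustration of what lands on the row, the sketch below builds one patched row with the output and its marker side by side. The gateway performs this write itself. The row-dict shape is an assumption about the patch payload, and `writeback_row` is a hypothetical helper.

```python
from typing import Any, Optional


def writeback_row(doc_id: str, attribute: str, output: Any,
                  version: Optional[str] = None) -> dict:
    """One patched row: the output plus its optional version marker."""
    row = {"id": doc_id, attribute: output}
    if version is not None:
        # Marker and output travel in the same patch, so a reader never
        # sees a new output paired with an old marker.
        row[f"{attribute}_v"] = version
    return row
```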

Discovery sweeps create an ID scan with `spec.filter` against each
namespace in `target_namespaces`. Returned IDs are enqueued and
deduplicated. The first sweep after create/apply is implicit; subsequent
sweeps run on `schedule.discovery_interval_seconds`.

## Lifecycle

```sh
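# Inspect the Function and its status.
kubectl get function product-tags -n hev-shop
kubectl describe function product-tags -n hev-shop

# Queue depth, in-flight, and failed counts from the gateway.
curl -H "authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  "$LAYER_GATEWAY_URL/v2/udfs/product-tags/status"

# Pause, then resume, through the CRD.
kubectl patch function product-tags -n hev-shop --type=merge -p '{"spec":{"paused":true}}'
kubectl patch function product-tags -n hev-shop --type=merge -p '{"spec":{"paused":false}}'

# Move failed rows back to pending.
curl -X POST -H "authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  "$LAYER_GATEWAY_URL/v2/udfs/product-tags/reset-failed"

# Delete the Function and its operator-managed resources.
kubectl delete function product-tags -n hev-shop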
```

Deletion garbage-collects the operator-managed Deployment, Service, and
ScaledObject. Written outputs are not deleted.

## Version markers

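`spec.output.version` is the re-run safety rail. When set, the gateway
stamps `{attribute}_v` alongside every output write. When a model,
taxonomy, or prompt changes, bump the version and keep the stale filter in
its canonical form (marker not equal to the current version, or marker
null). Rows stamped with an older version then match the filter and are
rediscovered.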
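The stale filter in the `Function` example follows a fixed pattern, so it can be generated from the attribute name and version. A minimal sketch, with `stale_filter` as a hypothetical helper name:

```python
import json


def stale_filter(attribute: str, version: str) -> list:
    """Match rows whose version marker is missing or differs from `version`."""
    marker = f"{attribute}_v"
    return ["Or", [[marker, "NotEq", version], [marker, "Eq", None]]]


# None serializes to JSON null, matching the CRD example.
as_json = json.dumps(stale_filter("tags", "v1"))
```

Generating the filter from the same version string that goes into `spec.output.version` keeps the two from drifting apart on a bump.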

## Tuning knobs

| Knob | What it bounds |
| --- | --- |
| `worker.batchSize` | Rows per worker batch. |
| `worker.timeoutSeconds` | Worker call timeout. |
| `schedule.leaseSeconds` | How long a claim is held before reissue. |
| `schedule.discoveryIntervalSeconds` | Time between discovery scan jobs. |
| `schedule.maxInFlightBatches` | Concurrent worker batches per UDF. |
| `schedule.maxConcurrentScans` | Concurrent namespace discovery jobs. |
| `retry.maxAttempts` | Tries before a row lands in `failed`. |
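To get a feel for how the retry knobs interact, the sketch below computes a delay schedule. The docs name only the initial and maximum backoff, so the doubling factor is an assumption made for illustration, and `backoff_schedule` is a hypothetical helper.

```python
from typing import List


def backoff_schedule(max_attempts: int, initial: float, cap: float,
                     factor: float = 2.0) -> List[float]:
    """Delays before each retry: max_attempts tries means one fewer wait."""
    delays: List[float] = []
    delay = initial
    for _ in range(max(0, max_attempts - 1)):
        delays.append(min(delay, cap))
        delay *= factor  # ASSUMPTION: exponential growth between attempts
    return delays


example = backoff_schedule(max_attempts=6, initial=5, cap=300)
```

Under that assumption the example `Function` spec waits 5, 10, 20, 40, and 80 seconds between its six attempts, so `maxBackoffSeconds: 300` only starts to bind at higher attempt counts.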

## Not in 0.1

- Cross-namespace aggregate UDFs.
- Chunkers or fan-out transforms; those remain pipelines.
- Multi-output UDFs.
- Managed image builds.

---

# hev-shop

Source: https://hevlayer.com/docs/hev-shop

import LinkGrid from "../../components/docs/LinkGrid.astro";

## What hev-shop is

hev-shop is a live semantic shopping app built on the Layer gateway. It turns Amazon Reviews 2023 product and review data into vectors, writes through Layer into Turbopuffer, and serves search, filters, product pages, and review-derived tags.

The running storefront is public so you can see what a Layer-backed workload looks like end to end. The source code is not currently open source — it ships as a reference starter kit granted to design-preview participants.

<LinkGrid links={[
	{ href: "https://hev-shop.com", label: "Live demo", description: "See the running semantic storefront." },
]} />

## Reference starter kit

Design-preview participants get private repo access and fork hev-shop as the starting point for their own workload. The pieces worth knowing before you fork:

- `indexer/app/layer_client.py` — single HTTP path to the Layer gateway.
- `indexer/app/pipeline.py` — claim, heartbeat, stage, and completion lifecycle.
- `web/app/api/search/route.ts` and `web/lib/backend.ts` — search through the backend with `stable_as_of` preserved.
- `helm/hev-shop` — pipeline-metric scaling and optional CPU/GPU node pools.

## Why it matters

The repo is not a generic ecommerce starter. It makes the application contract concrete: stage work, claim work, embed it, write vectors, query with freshness signals, and let the gateway own the Turbopuffer edge — so your team starts from a working pattern, not a blank slate.