# hev layer — full docs

> Concatenated docs surface. Index at https://hevlayer.com/llms.txt.

---

## Search knowledge graph

Source: https://hevlayer.com/docs/search-knowledge-graph
Version: 2
Generated: 2026-06-05T21:31:25.527Z
Content hash: 4260e62c2a0a88b601f4e8e3c0afc368168b80f47c95ecb6610662420ede6458

Context:

## Layer (hev layer)

Layer is a **gateway and function runtime for retrieval systems**: a Rust proxy (the *gateway*) that fronts **Turbopuffer**, plus a Kubernetes *operator*, both running in your own cluster. The gateway is wire-compatible with the Turbopuffer client API — existing clients keep working when pointed at it — and Layer documents only what it *adds* on top of upstream routes, exposing Layer-only features under `/v2/`.

### Core building blocks
- **Gateway** — transparent Turbopuffer proxy adding fetch, scans, result count, facet snapshots, a pull-through document cache, write-path stamping, query consistency, query/clickstream history, warm jobs, pipelines, and a UDF runtime.
- **Operator** — reconciles four CRDs (`Index`, `InfraRules`, `Pipeline`, `Function`). Decoupled from the gateway, which only ever *reads* CRD status.
- **Backing services** (all open source): **Aerospike** (NVMe document cache, ephemeral), **PostgreSQL** (pipeline/indexing-state queue only), **VictoriaMetrics** (metrics), **Karpenter** (node autoscaling), **KEDA** (pod autoscaling to zero). Durable state lives only in **S3** — Layer processes are stateless and elastic.

### Key concepts users ask about
- **Stable watermark / strong-consistent reads** — a background watcher records an epoch-ms watermark when a namespace's Turbopuffer index status is up-to-date; while updating, queries filter to fully-indexed rows so reads never see partial writes. Surfaced via `stableasof`/`isstable`.
- **Reserved `hevlayer` attributes** — server-stamped write watermark and shard key; users must not write them.
- **Pull-through cache** — Aerospike checked first; misses fall through to Turbopuffer/S3 and backfill. Cache failures are soft (never block reads); upstream failures are hard. Hit/miss reported per response.
- **Snapshots & facets** — content-addressed S3 facet histograms written when a namespace is stable.
- **Scans & result count** — filter-shaped questions: scans return IDs or counts; result count answers ranked FTS/vector match counts.
- **Pipelines vs UDFs** — pipelines stage CPU-extracted chunks and GPU-embed them (row count changes); UDFs run a stateless function per row to compute a derived attribute (row count preserved). Both scale via KEDA off queue depth, pinned to compute pools in `InfraRules`.
- **Dashboard** — read-mostly operator GUI reading the same gateway API.

### How users talk about it
Users say "the gateway," "drop-in Turbopuffer client," "warm the cache," "strongly consistent query," "snapshot," "facet counts," "scan a filter," "stage/claim/embed," "UDF/function," "compute pool," and "scale to zero." Install is two-stage: **Terraform** (AWS resources) then **Helm** (gateway/operator/cache).

Glossary:

- Gateway: The Rust transparent proxy in front of Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime. Aliases: layer-gateway, the proxy, rust gateway.
- stable watermark: Epoch-ms cut tracked by the consistency watcher when the upstream index is up-to-date, used to inject a hidden filter for strong-consistent reads. Aliases: watermark, stableasof, consistency watermark.
- pull-through cache: NVMe-backed read accelerator that serves document reads and falls through to origin on miss, never a hard dependency. Aliases: document cache, nvme cache, aerospike.
- UDF: A stateless worker that computes one derived attribute per row of an index, without changing row count. Aliases: user-defined function, function, udfs.
- pipeline: A PostgreSQL-backed staged-work state machine (CPU extract, GPU embed) whose row count can change between input and output. Aliases: pipelines, indexing pipeline.
- operator: The Kubernetes operator that reconciles Layer's CRDs (Index, InfraRules, Pipeline, Function) into worker and scaling resources. Aliases: layer-operator, k8s operator, kubernetes operator.
- CRD: Kubernetes-native resources the operator reconciles to express desired state for indexes, functions, pipelines, and infra rules. Aliases: custom resource definition, index crd, function crd, pipeline crd, infrarules.
- snapshot: A content-addressed S3 facet histogram (listings and counts) written after a namespace is observed stable. Aliases: snapshots, facet snapshot, facet histogram.
- scan: A filter-shaped query that returns matching IDs asynchronously or a matching row count synchronously. Aliases: scans, filter scan.
- ask CLI: Keyless command-line tool that searches, reads, and cites the hev layer docs so a coding agent can answer grounded questions without scraping or an API key. Aliases: ask, hevlayer-docs skill.

Raw JSON:

```json
{ "version": 2, "generatedAt": "2026-06-05T21:31:25.527Z", "contentHash": "4260e62c2a0a88b601f4e8e3c0afc368168b80f47c95ecb6610662420ede6458", "context": "## Layer (hev layer)\n\nLayer is a **gateway and function runtime for retrieval systems**: a Rust proxy (the *gateway*) that fronts **Turbopuffer**, plus a Kubernetes *operator*, both running in your own cluster.
The gateway is wire-compatible with the Turbopuffer client API — existing clients keep working when pointed at it — and Layer documents only what it *adds* on top of upstream routes, exposing Layer-only features under `/v2/`.\n\n### Core building blocks\n- **Gateway** — transparent Turbopuffer proxy adding fetch, scans, result count, facet snapshots, a pull-through document cache, write-path stamping, query consistency, query/clickstream history, warm jobs, pipelines, and a UDF runtime.\n- **Operator** — reconciles four CRDs (`Index`, `InfraRules`, `Pipeline`, `Function`). Decoupled from the gateway, which only ever *reads* CRD status.\n- **Backing services** (all open source): **Aerospike** (NVMe document cache, ephemeral), **PostgreSQL** (pipeline/indexing-state queue only), **VictoriaMetrics** (metrics), **Karpenter** (node autoscaling), **KEDA** (pod autoscaling to zero). Durable state lives only in **S3** — Layer processes are stateless and elastic.\n\n### Key concepts users ask about\n- **Stable watermark / strong-consistent reads** — a background watcher records an epoch-ms watermark when a namespace's Turbopuffer index status is up-to-date; while updating, queries filter to fully-indexed rows so reads never see partial writes. Surfaced via `stableasof`/`isstable`.\n- **Reserved `hevlayer` attributes** — server-stamped write watermark and shard key; users must not write them.\n- **Pull-through cache** — Aerospike checked first; misses fall through to Turbopuffer/S3 and backfill. Cache failures are soft (never block reads); upstream failures are hard. 
Hit/miss reported per response.\n- **Snapshots & facets** — content-addressed S3 facet histograms written when a namespace is stable.\n- **Scans & result count** — filter-shaped questions: scans return IDs or counts; result count answers ranked FTS/vector match counts.\n- **Pipelines vs UDFs** — pipelines stage CPU-extracted chunks and GPU-embed them (row count changes); UDFs run a stateless function per row to compute a derived attribute (row count preserved). Both scale via KEDA off queue depth, pinned to compute pools in `InfraRules`.\n- **Dashboard** — read-mostly operator GUI reading the same gateway API.\n\n### How users talk about it\nUsers say \"the gateway,\" \"drop-in Turbopuffer client,\" \"warm the cache,\" \"strongly consistent query,\" \"snapshot,\" \"facet counts,\" \"scan a filter,\" \"stage/claim/embed,\" \"UDF/function,\" \"compute pool,\" and \"scale to zero.\" Install is two-stage: **Terraform** (AWS resources) then **Helm** (gateway/operator/cache).", "glossary": [ { "term": "Gateway", "aliases": [ "layer-gateway", "the proxy", "rust gateway" ], "definition": "The Rust transparent proxy in front of Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime." }, { "term": "stable watermark", "aliases": [ "watermark", "stableasof", "consistency watermark" ], "definition": "Epoch-ms cut tracked by the consistency watcher when the upstream index is up-to-date, used to inject a hidden filter for strong-consistent reads." }, { "term": "pull-through cache", "aliases": [ "document cache", "nvme cache", "aerospike" ], "definition": "NVMe-backed read accelerator that serves document reads and falls through to origin on miss, never a hard dependency." }, { "term": "UDF", "aliases": [ "user-defined function", "function", "udfs" ], "definition": "A stateless worker that computes one derived attribute per row of an index, without changing row count." 
}, { "term": "pipeline", "aliases": [ "pipelines", "indexing pipeline" ], "definition": "A PostgreSQL-backed staged-work state machine (CPU extract, GPU embed) whose row count can change between input and output." }, { "term": "operator", "aliases": [ "layer-operator", "k8s operator", "kubernetes operator" ], "definition": "The Kubernetes operator that reconciles Layer's CRDs (Index, InfraRules, Pipeline, Function) into worker and scaling resources." }, { "term": "CRD", "aliases": [ "custom resource definition", "index crd", "function crd", "pipeline crd", "infrarules" ], "definition": "Kubernetes-native resources the operator reconciles to express desired state for indexes, functions, pipelines, and infra rules." }, { "term": "snapshot", "aliases": [ "snapshots", "facet snapshot", "facet histogram" ], "definition": "A content-addressed S3 facet histogram (listings and counts) written after a namespace is observed stable." }, { "term": "scan", "aliases": [ "scans", "filter scan" ], "definition": "A filter-shaped query that returns matching IDs asynchronously or a matching row count synchronously." }, { "term": "ask CLI", "aliases": [ "ask", "hevlayer-docs skill" ], "definition": "Keyless command-line tool that searches, reads, and cites the hev layer docs so a coding agent can answer grounded questions without scraping or an API key." 
} ], "overview": "## API\n- Introduction — `api/introduction`\n- Cache warm hint — GET /v1/namespaces/{ns}/hint_cache_warm — `api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm`\n- Client fall-through — `api/introduction#client-fall-through`\n- Compatibility posture — `api/introduction#compatibility-posture`\n- Cross-cutting conventions — `api/introduction#cross-cutting-conventions`\n- Enhancements to upstream routes — `api/introduction#enhancements-to-upstream-routes`\n- Install — `api/introduction#install`\n- Metadata — GET /v2/namespaces/{ns}/metadata — `api/introduction#metadata--get-v2namespacesnsmetadata`\n- Query — POST /v2/namespaces/{ns}/query — `api/introduction#query--post-v2namespacesnsquery`\n- Write — POST /v2/namespaces/{ns} and PATCH /v2/namespaces/{ns} — `api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns`\n- Metrics API — `api/metrics`\n- Health — `api/metrics#health`\n- Metrics catalog — `api/metrics#metrics-catalog`\n- PromQL passthrough — `api/metrics#promql-passthrough`\n- Routes — `api/metrics#routes`\n- Namespace metadata — `api/namespace-metadata`\n- List namespaces — `api/namespace-metadata#list-namespaces`\n- Request — `api/namespace-metadata#request`\n- The layer block — `api/namespace-metadata#the-layer-block`\n- Query & Fetch — `api/query`\n- Batch fetch — `api/query#batch-fetch`\n- Behavior matrix — `api/query#behavior-matrix`\n- Explain query — `api/query#explain-query`\n- Fetch — `api/query#fetch`\n- Filter shape — `api/query#filter-shape`\n- Query request — `api/query#query-request`\n- Single fetch — `api/query#single-fetch`\n- Strong-consistent reads — `api/query#strong-consistent-reads`\n- Tunables — `api/query#tunables`\n- Result Count — `api/result-count`\n- Scan — `api/scans`\n- Auto-Mode Policy — `api/scans#auto-mode-policy`\n- Count Mode — `api/scans#count-mode`\n- ID Mode — `api/scans#id-mode`\n- Routes — `api/scans#routes`\n- Query History — `api/search-history`\n- Clickstream entry — 
`api/search-history#clickstream-entry`\n- Query parameters — `api/search-history#query-parameters`\n- Routes — `api/search-history#routes`\n- Search history entry — `api/search-history#search-history-entry`\n- Storage — `api/search-history#storage`\n- Tag contract — `api/search-history#tag-contract`\n- Writing metadata — `api/search-history#writing-metadata`\n- Snapshot History — `api/snapshots`\n- Activity — `api/snapshots#activity`\n- Configure watched fields — `api/snapshots#configure-watched-fields`\n- Create a snapshot job — `api/snapshots#create-a-snapshot-job`\n- History — `api/snapshots#history`\n- Routes — `api/snapshots#routes`\n- Snapshot body — `api/snapshots#snapshot-body`\n- Warm cache — `api/warm-cache`\n- Cache-cold behavior — `api/warm-cache#cache-cold-behavior`\n- Hint-cache warm — `api/warm-cache#hint-cache-warm`\n- Layer warm — `api/warm-cache#layer-warm`\n- Write & Stage — `api/write`\n- Patch — `api/write#patch`\n- Pipeline stage — `api/write#pipeline-stage`\n- Side effects — `api/write#side-effects`\n- Upsert and delete — `api/write#upsert-and-delete`\n## Guides\n- Dashboard — `dashboard`\n- Console — `dashboard#console`\n- Cost — `dashboard#cost`\n- Data — `dashboard#data`\n- Layout — `dashboard#layout`\n- Observe — `dashboard#observe`\n- Operational notes — `dashboard#operational-notes`\n- Read — `dashboard#read`\n- Write — `dashboard#write`\n- hev-shop — `hev-shop`\n- Reference starter kit — `hev-shop#reference-starter-kit`\n- What hev-shop is — `hev-shop#what-hev-shop-is`\n- Why it matters — `hev-shop#why-it-matters`\n- Pipelines — `pipelines`\n- Autoscaling — `pipelines#autoscaling`\n- Claim, heartbeat, stage — `pipelines#claim-heartbeat-stage`\n- CPU workers — scale on input source — `pipelines#cpu-workers--scale-on-input-source`\n- Create a pipeline — `pipelines#create-a-pipeline`\n- Document lifecycle — `pipelines#document-lifecycle`\n- Failure model — `pipelines#failure-model`\n- Gateway API — `pipelines#gateway-api`\n- Get pipeline 
status (KEDA polling) — `pipelines#get-pipeline-status-keda-polling`\n- Pipeline CRD — `pipelines#pipeline-crd`\n- Pipeline flow — `pipelines#pipeline-flow`\n- Prerequisites — `pipelines#prerequisites`\n- Read chunks and write vectors (GPU worker) — `pipelines#read-chunks-and-write-vectors-gpu-worker`\n- Stage a document (CPU worker) — `pipelines#stage-a-document-cpu-worker`\n- Scans — `scans`\n- Count scans — `scans#count-scans`\n- Filters — `scans#filters`\n- ID scans — `scans#id-scans`\n- Operational notes — `scans#operational-notes`\n- Sources — `scans#sources`\n- Search Knowledge Graph — `search-knowledge-graph`\n- Current graph — `search-knowledge-graph#current-graph`\n- UDFs — `udfs`\n- Author a worker — `udfs#author-a-worker`\n- Declare the function — `udfs#declare-the-function`\n- Gateway API — `udfs#gateway-api`\n- Lifecycle — `udfs#lifecycle`\n- Lifecycle routes — `udfs#lifecycle-routes`\n- Not in 0.1 — `udfs#not-in-01`\n- Scaling and placement — `udfs#scaling-and-placement`\n- Spec routes — `udfs#spec-routes`\n- Tuning knobs — `udfs#tuning-knobs`\n- Version markers — `udfs#version-markers`\n- Worker coordination routes — `udfs#worker-coordination-routes`\n- Writeback and discovery — `udfs#writeback-and-discovery`\n## Operations\n- Failure Modes — `failure-modes`\n- Read — `failure-modes#read`\n- Write — `failure-modes#write`\n- Install — `install`\n- What ships in 0.1 — `install#what-ships-in-01`\n- Helm Install — `install/helm`\n- Install — `install/helm#install`\n- Required values — `install/helm#required-values`\n- What gets installed — `install/helm#what-gets-installed`\n- Terraform — `install/terraform`\n- Cluster: recommended — `install/terraform#cluster-recommended`\n- Cost notes — `install/terraform#cost-notes`\n- Outputs — `install/terraform#outputs`\n- What it sets up — `install/terraform#what-it-sets-up`\n- Function CRD — `kubernetes/function-crd`\n- Output — `kubernetes/function-crd#output`\n- Scaling — `kubernetes/function-crd#scaling`\n- 
Selection — `kubernetes/function-crd#selection`\n- Worker — `kubernetes/function-crd#worker`\n- Index CRD — `kubernetes/index-crd`\n- Backend — `kubernetes/index-crd#backend`\n- Cache policy — `kubernetes/index-crd#cache-policy`\n- Snapshot policy — `kubernetes/index-crd#snapshot-policy`\n- Status — `kubernetes/index-crd#status`\n- Operator Overview — `kubernetes/operator`\n- CRDs — `kubernetes/operator#crds`\n- Relationship to the gateway — `kubernetes/operator#relationship-to-the-gateway`\n- Scheduling and node pools — `kubernetes/operator#scheduling-and-node-pools`\n- Pipeline CRD — `kubernetes/pipeline-crd`\n- Scaling — `kubernetes/pipeline-crd#scaling`\n- Source — `kubernetes/pipeline-crd#source`\n- Status — `kubernetes/pipeline-crd#status`\n- Target — `kubernetes/pipeline-crd#target`\n- Worker — `kubernetes/pipeline-crd#worker`\n- InfraRules CRD — `kubernetes/scaling-crd`\n- Compute pools — `kubernetes/scaling-crd#compute-pools`\n- Document cache rules — `kubernetes/scaling-crd#document-cache-rules`\n- InfraRules — `kubernetes/scaling-crd#infrarules`\n- Workload scaling — `kubernetes/scaling-crd#workload-scaling`\n## Overview\n- Agents — `agents`\n- 1. Install the CLI — `agents#1-install-the-cli`\n- 2. Add the skill — `agents#2-add-the-skill`\n- 3. 
Ask — `agents#3-ask`\n- The verbs — `agents#the-verbs`\n- Why answers stay grounded — `agents#why-answers-stay-grounded`\n- Concepts — `concepts`\n- Control loops — `concepts#control-loops`\n- Gateway enhancements — `concepts#gateway-enhancements`\n- Glossary — `concepts#glossary`\n- Kubernetes autoscaling — `concepts#kubernetes-autoscaling`\n- Observability as code — `concepts#observability-as-code`\n- Pull-through cache — `concepts#pull-through-cache`\n- Scatter/gather — `concepts#scattergather`\n- Document model — `document-model`\n- No Guarantees — `guarantees`\n- Commitments — `guarantees#commitments`\n- Introduction — `index`\n- Limits — `limits`\n- No limits — `limits#no-limits`\n- Roadmap & Changelog — `roadmap`\n- 0.1 Release (UAT) — `roadmap#01-release-uat`\n- API hardening — `roadmap#api-hardening`\n- Later — `roadmap#later`\n- Lifecycle and operability — `roadmap#lifecycle-and-operability`\n- Search — `roadmap#search`\n- Surfaces — `roadmap#surfaces`\n- Up Next — `roadmap#up-next`\n- Tradeoffs — `tradeoffs`", "suggestions": [ "How do I get strongly consistent reads after a write?", "What's the difference between a pipeline and a UDF?", "What happens when the document cache is down?", "How do I install Layer into my cluster?", "Can my coding agent query these docs?" ], "nodes": [ { "id": "agents", "kind": "section", "title": "Agents", "heading": null, "group": "Overview", "url": "/docs/agents", "summary": "Coding agents can query the Layer docs from the command line using the ask CLI, the same search engine behind the site overlay, getting grounded answers with citations and no scraping, MCP server, or API key. 
Two commands wire it up.", "facts": [ { "kind": "code", "literal": "⌘K", "chunkId": "agents" }, { "kind": "value", "literal": "Callout.astro", "chunkId": "agents" } ], "sources": [ { "chunkId": "agents", "url": "/docs/agents", "anchor": null } ], "mode": "agent-primary", "terms": [ "coding", "agents", "query", "layer", "docs", "command", "line", "same", "search", "engine", "behind", "site", "overlay", "getting", "grounded", "answers", "citations", "scraping", "server", "commands", "wire", "callout", "astro", "agent", "install", "file", "skill", "these", "queryable", "ships", "read", "cite", "directly" ] }, { "id": "agents#1-install-the-cli", "kind": "section", "title": "Agents", "heading": "1. Install the CLI", "group": "Overview", "url": "/docs/agents#1-install-the-cli", "summary": "Install the self-contained ask CLI binary via go install; any agent harness that can run a shell command can then use it.", "facts": [ { "kind": "code", "literal": "go install github.com/hev/ask/cmd/ask@latest", "chunkId": "agents#1-install-the-cli" } ], "sources": [ { "chunkId": "agents#1-install-the-cli", "url": "/docs/agents#1-install-the-cli", "anchor": "1-install-the-cli" } ], "mode": "agent-primary", "terms": [ "install", "self", "contained", "binary", "agent", "harness", "shell", "command", "github", "latest" ] }, { "id": "agents#2-add-the-skill", "kind": "section", "title": "Agents", "heading": "2. Add the skill", "group": "Overview", "url": "/docs/agents#2-add-the-skill", "summary": "Add a one-file skill so an agent answers Layer questions from the docs rather than memory: for Claude Code drop a SKILL.md that points the keyless ask verbs at the public endpoint, and for other harnesses paste the same instructions into AGENTS.md. 
Section ids look like api/query#strong-consistent-reads and answers should cite the returned url.", "facts": [ { "kind": "code", "literal": "AGENTS.md", "chunkId": "agents#2-add-the-skill" } ], "sources": [ { "chunkId": "agents#2-add-the-skill", "url": "/docs/agents#2-add-the-skill", "anchor": "2-add-the-skill" } ], "mode": "agent-primary", "terms": [ "skill", "file", "agent", "answers", "layer", "questions", "docs", "rather", "memory", "claude", "code", "drop", "points", "keyless", "verbs", "public", "endpoint", "other", "harnesses", "paste", "same", "instructions", "agents", "section", "look", "like", "query", "strong", "consistent", "reads", "should", "cite", "returned", "once", "mkdir", "skills", "hevlayer", "name", "description", "user" ] }, { "id": "agents#3-ask", "kind": "section", "title": "Agents", "heading": "3. Ask", "group": "Overview", "url": "/docs/agents#3-ask", "summary": "Running the search verb against the endpoint returns ranked sections with titles, headings, deep-link URLs, and snippets; the agent typically then fetches the winning section and answers with its citation.", "facts": [ { "kind": "code", "literal": "ask --endpoint https://hevlayer.com/api/ask search \"cache is down\"", "chunkId": "agents#3-ask" }, { "kind": "code", "literal": "{\n \"results\": [\n {\n \"title\": \"Concepts\",\n \"heading\": \"Pull-through cache\",\n \"url\": \"/docs/concepts#pull-through-cache\",\n \"group\": \"Overview\",\n \"snippet\": \"Document reads are served by a pull-through cache: the gateway checks...\"\n }\n ]\n}", "chunkId": "agents#3-ask" }, { "kind": "code", "literal": "section get", "chunkId": "agents#3-ask" } ], "sources": [ { "chunkId": "agents#3-ask", "url": "/docs/agents#3-ask", "anchor": "3-ask" } ], "mode": "agent-primary", "terms": [ "running", "search", "verb", "against", "endpoint", "returns", "ranked", "sections", "titles", "headings", "deep", "link", "urls", "snippets", "agent", "typically", "fetches", "winning", "section", "answers", 
"citation", "https", "hevlayer", "cache", "down", "results", "title", "concepts", "heading", "pull", "through", "docs", "group", "overview", "snippet", "document", "reads", "served", "gateway", "checks" ] }, { "id": "agents#the-verbs", "kind": "section", "title": "Agents", "heading": "The verbs", "group": "Overview", "url": "/docs/agents#the-verbs", "summary": "The CLI exposes four read verbs: an orientation/section-map overview, a ranked search with snippets and deep links, a single-section detail fetch, and a glossary lookup that resolves a product term through its aliases.", "facts": [ { "kind": "code", "literal": "overview", "chunkId": "agents#the-verbs" }, { "kind": "code", "literal": "search \"\"", "chunkId": "agents#the-verbs" }, { "kind": "code", "literal": "section get \"\"", "chunkId": "agents#the-verbs" }, { "kind": "code", "literal": "glossary get \"\"", "chunkId": "agents#the-verbs" }, { "kind": "code", "literal": "watermark", "chunkId": "agents#the-verbs" } ], "sources": [ { "chunkId": "agents#the-verbs", "url": "/docs/agents#the-verbs", "anchor": "the-verbs" } ], "mode": "agent-primary", "terms": [ "verbs", "exposes", "four", "read", "orientation", "section", "overview", "ranked", "search", "snippets", "deep", "links", "single", "detail", "fetch", "glossary", "lookup", "resolves", "product", "term", "through", "aliases", "query", "watermark", "verb", "returns", "context", "plus", "full", "stable", "sections", "summary", "exact", "identifiers", "source", "resolved" ] }, { "id": "agents#why-answers-stay-grounded", "kind": "section", "title": "Agents", "heading": "Why answers stay grounded", "group": "Overview", "url": "/docs/agents#why-answers-stay-grounded", "summary": "Search runs over a committed, reviewable digest of the docs whose anchors are CI-verified against rendered pages so cited deep links always resolve, and the digest is rebuilt when docs change. 
Every verb is a keyless read; the docs are also available as plain-text llms files, though the CLI is the cheaper, better path for agents that can run commands.", "facts": [ { "kind": "value", "literal": "llms.txt", "chunkId": "agents#why-answers-stay-grounded" }, { "kind": "value", "literal": "llms-full.txt", "chunkId": "agents#why-answers-stay-grounded" } ], "sources": [ { "chunkId": "agents#why-answers-stay-grounded", "url": "/docs/agents#why-answers-stay-grounded", "anchor": "why-answers-stay-grounded" } ], "mode": "agent-primary", "terms": [ "answers", "stay", "grounded", "search", "runs", "committed", "reviewable", "digest", "docs", "whose", "anchors", "verified", "against", "rendered", "pages", "cited", "deep", "links", "always", "resolve", "rebuilt", "change", "every", "verb", "keyless", "read", "also", "available", "plain", "text", "llms", "files", "though", "cheaper", "better", "path", "agents", "commands", "full", "these" ] }, { "id": "api/introduction", "kind": "section", "title": "Introduction", "heading": null, "group": "API", "url": "/docs/api/introduction", "summary": "Layer matches the Turbopuffer wire contract so existing clients keep working when pointed at the gateway, and the docs describe only what Layer adds on top of each route, linking out to upstream for the underlying request/response shapes.", "facts": [ { "kind": "value", "literal": "Upstream.astro", "chunkId": "api/introduction" } ], "sources": [ { "chunkId": "api/introduction", "url": "/docs/api/introduction", "anchor": null } ], "mode": "source-primary", "terms": [ "layer", "matches", "turbopuffer", "wire", "contract", "existing", "clients", "keep", "working", "pointed", "gateway", "docs", "describe", "only", "adds", "route", "linking", "upstream", "underlying", "request", "response", "shapes", "astro", "point", "client", "equivalent", "site", "documents", "behavior", "itself", "follow", "link", "page", "shape" ] }, { "id": 
"api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm", "kind": "section", "title": "Introduction", "heading": "Cache warm hint — GET /v1/namespaces/{ns}/hint_cache_warm", "group": "API", "url": "/docs/api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm", "summary": "The cache warm hint route forwards the hint upstream and then runs Layer-side warm steps: a warm job to backfill the NVMe cache from origin and a mirror of the latest snapshot body into NVMe, with each step independently toggleable per request.", "facts": [ { "kind": "code", "literal": "GET /v1/namespaces/{ns}/hint_cache_warm", "chunkId": "api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm" }, { "kind": "value", "literal": "turbopuffer.com", "chunkId": "api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm" } ], "sources": [ { "chunkId": "api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm", "url": "/docs/api/introduction#cache-warm-hint--get-v1namespacesnshint_cache_warm", "anchor": "cache-warm-hint--get-v1namespacesnshint_cache_warm" } ], "mode": "source-primary", "terms": [ "cache", "warm", "hint", "namespaces", "route", "forwards", "upstream", "runs", "layer", "side", "steps", "backfill", "nvme", "origin", "mirror", "latest", "snapshot", "body", "step", "independently", "toggleable", "request", "turbopuffer", "hintcachewarm", "contract", "plus", "page" ] }, { "id": "api/introduction#client-fall-through", "kind": "section", "title": "Introduction", "heading": "Client fall-through", "group": "API", "url": "/docs/api/introduction#client-fall-through", "summary": "The Python SDK can fall through to Turbopuffer directly when the gateway is unreachable, but only for calls satisfiable without Layer state such as simple vector queries and raw Turbopuffer-compatible methods; Layer-only workflows like fetches, warm jobs, pipelines, UDFs, and search-by-id still fail fast because they depend on gateway-owned state. 
The fallback emits a warning, can be disabled, and is reported in the perf object.", "facts": [ { "kind": "code", "literal": "write_namespace", "chunkId": "api/introduction#client-fall-through" }, { "kind": "code", "literal": "query_turbopuffer_namespace", "chunkId": "api/introduction#client-fall-through" }, { "kind": "code", "literal": "LayerPerf.fallback", "chunkId": "api/introduction#client-fall-through" }, { "kind": "code", "literal": "turbopuffer_direct", "chunkId": "api/introduction#client-fall-through" }, { "kind": "code", "literal": "with_perf=True", "chunkId": "api/introduction#client-fall-through" }, { "kind": "code", "literal": "nearest_to_id", "chunkId": "api/introduction#client-fall-through" }, { "kind": "code", "literal": "fallback_to_turbopuffer=False", "chunkId": "api/introduction#client-fall-through" }, { "kind": "code", "literal": "AsyncHevlayer", "chunkId": "api/introduction#client-fall-through" } ], "sources": [ { "chunkId": "api/introduction#client-fall-through", "url": "/docs/api/introduction#client-fall-through", "anchor": "client-fall-through" } ], "mode": "source-primary", "terms": [ "client", "fall", "through", "python", "turbopuffer", "directly", "gateway", "unreachable", "only", "calls", "satisfiable", "without", "layer", "state", "such", "simple", "vector", "queries", "compatible", "methods", "workflows", "like", "fetches", "warm", "jobs", "pipelines", "udfs", "search", "still", "fail", "fast", "because", "depend", "owned", "fallback", "emits", "warning", "disabled", "reported", "perf" ] }, { "id": "api/introduction#compatibility-posture", "kind": "section", "title": "Introduction", "heading": "Compatibility posture", "group": "API", "url": "/docs/api/introduction#compatibility-posture", "summary": "Layer aims to be a drop-in for existing Turbopuffer clients; routes the upstream does not implement are namespaced separately so they never shadow upstream behavior, and a request to a route Layer does not proxy returns a 404 rather than 
silently re-routing.", "facts": [ { "kind": "code", "literal": "/v2/", "chunkId": "api/introduction#compatibility-posture" } ], "sources": [ { "chunkId": "api/introduction#compatibility-posture", "url": "/docs/api/introduction#compatibility-posture", "anchor": "compatibility-posture" } ], "mode": "source-primary", "terms": [ "compatibility", "posture", "layer", "aims", "drop", "existing", "turbopuffer", "clients", "routes", "upstream", "does", "implement", "namespaced", "separately", "never", "shadow", "behavior", "request", "route", "proxy", "returns", "rather", "silently", "routing", "under", "client", "sends", "doesn", "gateway", "might", "handle", "differently" ] }, { "id": "api/introduction#cross-cutting-conventions", "kind": "section", "title": "Introduction", "heading": "Cross-cutting conventions", "group": "API", "url": "/docs/api/introduction#cross-cutting-conventions", "summary": "Conventions apply to every proxied route: every write is server-stamped with an epoch-ms watermark attribute, the reserved attribute prefix is read-only to callers, Turbopuffer write/query failures are hard 5xx while cache failures are soft and never block, a cache header distinguishes hit/miss/miss-on-error, and reads through the watermark path report their freshness cut.", "facts": [ { "kind": "code", "literal": "_hevlayer_upserted_at", "chunkId": "api/introduction#cross-cutting-conventions" }, { "kind": "code", "literal": "_hevlayer_*", "chunkId": "api/introduction#cross-cutting-conventions" }, { "kind": "code", "literal": "_hevlayer_", "chunkId": "api/introduction#cross-cutting-conventions" }, { "kind": "code", "literal": "x-layer-cache", "chunkId": "api/introduction#cross-cutting-conventions" }, { "kind": "code", "literal": "hit", "chunkId": "api/introduction#cross-cutting-conventions" }, { "kind": "code", "literal": "miss", "chunkId": "api/introduction#cross-cutting-conventions" }, { "kind": "code", "literal": "miss-on-error", "chunkId": 
"api/introduction#cross-cutting-conventions" }, { "kind": "code", "literal": "stable_as_of", "chunkId": "api/introduction#cross-cutting-conventions" } ], "sources": [ { "chunkId": "api/introduction#cross-cutting-conventions", "url": "/docs/api/introduction#cross-cutting-conventions", "anchor": "cross-cutting-conventions" } ], "mode": "source-primary", "terms": [ "cross", "cutting", "conventions", "apply", "every", "proxied", "route", "write", "server", "stamped", "epoch", "watermark", "attribute", "reserved", "prefix", "read", "only", "callers", "turbopuffer", "query", "failures", "hard", "while", "cache", "soft", "never", "block", "header", "distinguishes", "miss", "error", "reads", "through", "path", "report", "their", "freshness", "hevlayer", "upserted", "layer" ] }, { "id": "api/introduction#enhancements-to-upstream-routes", "kind": "section", "title": "Introduction", "heading": "Enhancements to upstream routes", "group": "API", "url": "/docs/api/introduction#enhancements-to-upstream-routes", "summary": "Introduces the section listing the upstream-compatible routes whose bodies describe only the Layer overlay on top of each.", "facts": [], "sources": [ { "chunkId": "api/introduction#enhancements-to-upstream-routes", "url": "/docs/api/introduction#enhancements-to-upstream-routes", "anchor": "enhancements-to-upstream-routes" } ], "mode": "source-primary", "terms": [ "enhancements", "upstream", "routes", "introduces", "section", "listing", "compatible", "whose", "bodies", "describe", "only", "layer", "overlay", "below", "wire", "turbopuffer", "body", "describes", "overlays" ] }, { "id": "api/introduction#install", "kind": "section", "title": "Introduction", "heading": "Install", "group": "API", "url": "/docs/api/introduction#install", "summary": "The Python SDK is generated from the gateway OpenAPI spec, ships a typed async client, requires a recent Python, and reads gateway URL/key plus optional direct-fallback Turbopuffer connection info from environment 
variables. Other languages are generated on demand.", "facts": [ { "kind": "code", "literal": "pip install hevlayer", "chunkId": "api/introduction#install" }, { "kind": "code", "literal": "apps/layer-gateway/openapi.yaml", "chunkId": "api/introduction#install" }, { "kind": "code", "literal": "AsyncHevlayer", "chunkId": "api/introduction#install" }, { "kind": "code", "literal": "LAYER_GATEWAY_URL", "chunkId": "api/introduction#install" }, { "kind": "code", "literal": "LAYER_GATEWAY_API_KEY", "chunkId": "api/introduction#install" }, { "kind": "code", "literal": "TURBOPUFFER_API_KEY", "chunkId": "api/introduction#install" }, { "kind": "code", "literal": "TURBOPUFFER_API_URL", "chunkId": "api/introduction#install" }, { "kind": "code", "literal": "https://aws-us-east-1.turbopuffer.com", "chunkId": "api/introduction#install" }, { "kind": "value", "literal": "3.11", "chunkId": "api/introduction#install" } ], "sources": [ { "chunkId": "api/introduction#install", "url": "/docs/api/introduction#install", "anchor": "install" } ], "mode": "source-primary", "terms": [ "install", "python", "generated", "gateway", "openapi", "spec", "ships", "typed", "async", "client", "requires", "recent", "reads", "plus", "optional", "direct", "fallback", "turbopuffer", "connection", "info", "environment", "variables", "other", "languages", "demand", "hevlayer", "apps", "layer", "yaml", "asynchevlayer", "https", "east", "variable", "purpose", "layergatewayurl", "base", "layergatewayapikey", "sent", "every", "request" ] }, { "id": "api/introduction#metadata--get-v2namespacesnsmetadata", "kind": "section", "title": "Introduction", "heading": "Metadata — GET /v2/namespaces/{ns}/metadata", "group": "API", "url": "/docs/api/introduction#metadata--get-v2namespacesnsmetadata", "summary": "The namespace metadata route proxies the upstream schema, row count, index status, and timestamps verbatim, then enriches the response with a layer block carrying the freshness watermark and stability flag.", 
"facts": [ { "kind": "code", "literal": "GET /v2/namespaces/{ns}/metadata", "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata" }, { "kind": "code", "literal": "layer", "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata" }, { "kind": "code", "literal": "stable_as_of", "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata" }, { "kind": "code", "literal": "is_stable", "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata" }, { "kind": "value", "literal": "turbopuffer.com", "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata" } ], "sources": [ { "chunkId": "api/introduction#metadata--get-v2namespacesnsmetadata", "url": "/docs/api/introduction#metadata--get-v2namespacesnsmetadata", "anchor": "metadata--get-v2namespacesnsmetadata" } ], "mode": "source-primary", "terms": [ "metadata", "namespaces", "namespace", "route", "proxies", "upstream", "schema", "count", "index", "status", "timestamps", "verbatim", "enriches", "response", "layer", "block", "carrying", "freshness", "watermark", "stability", "flag", "stable", "turbopuffer", "contract", "proxied", "enriched", "containing", "stableasof", "isstable", "page" ] }, { "id": "api/introduction#query--post-v2namespacesnsquery", "kind": "section", "title": "Introduction", "heading": "Query — POST /v2/namespaces/{ns}/query", "group": "API", "url": "/docs/api/introduction#query--post-v2namespacesnsquery", "summary": "The query route is upstream-compatible and adds strong-consistent reads via an injected watermark predicate while the index is updating, a one-shot retry with the filter forced on for queries racing a write storm, and a freshness timestamp echoed on every response.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/{ns}/query", "chunkId": "api/introduction#query--post-v2namespacesnsquery" }, { "kind": "code", "literal": "_hevlayer_upserted_at <= watermark", "chunkId": "api/introduction#query--post-v2namespacesnsquery" }, { "kind": 
"code", "literal": "updating", "chunkId": "api/introduction#query--post-v2namespacesnsquery" }, { "kind": "code", "literal": "stable_as_of", "chunkId": "api/introduction#query--post-v2namespacesnsquery" }, { "kind": "value", "literal": "turbopuffer.com", "chunkId": "api/introduction#query--post-v2namespacesnsquery" } ], "sources": [ { "chunkId": "api/introduction#query--post-v2namespacesnsquery", "url": "/docs/api/introduction#query--post-v2namespacesnsquery", "anchor": "query--post-v2namespacesnsquery" } ], "mode": "source-primary", "terms": [ "query", "post", "namespaces", "route", "upstream", "compatible", "adds", "strong", "consistent", "reads", "injected", "watermark", "predicate", "while", "index", "updating", "shot", "retry", "filter", "forced", "queries", "racing", "write", "storm", "freshness", "timestamp", "echoed", "every", "response", "hevlayer", "upserted", "stable", "turbopuffer", "contract", "vector", "request", "shape", "ranking", "filters", "attribute" ] }, { "id": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns", "kind": "section", "title": "Introduction", "heading": "Write — POST /v2/namespaces/{ns} and PATCH /v2/namespaces/{ns}", "group": "API", "url": "/docs/api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns", "summary": "The write and patch routes add a best-effort NVMe cache mirror before the upstream write, a server-stamped watermark attribute on every upsert and patch that powers query consistency, and rejection of writes to the reserved attribute prefix.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/{ns}", "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns" }, { "kind": "code", "literal": "PATCH /v2/namespaces/{ns}", "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns" }, { "kind": "code", "literal": "patch_rows", "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns" }, { "kind": "code", "literal": 
"_hevlayer_upserted_at", "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns" }, { "kind": "code", "literal": "_hevlayer_*", "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns" }, { "kind": "value", "literal": "turbopuffer.com", "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns" } ], "sources": [ { "chunkId": "api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns", "url": "/docs/api/introduction#write--post-v2namespacesns-and-patch-v2namespacesns", "anchor": "write--post-v2namespacesns-and-patch-v2namespacesns" } ], "mode": "source-primary", "terms": [ "write", "post", "namespaces", "patch", "routes", "best", "effort", "nvme", "cache", "mirror", "before", "upstream", "server", "stamped", "watermark", "attribute", "every", "upsert", "powers", "query", "consistency", "rejection", "writes", "reserved", "prefix", "rows", "hevlayer", "upserted", "turbopuffer", "contract", "delete", "patchrows", "hevlayerupsertedat", "path", "attributes", "rejected", "page" ] }, { "id": "api/metrics", "kind": "section", "title": "Metrics API", "heading": null, "group": "API", "url": "/docs/api/metrics", "summary": "The gateway exposes a Prometheus-shaped metrics surface plus passthrough routes to a bundled VictoriaMetrics so callers can run PromQL without a separate scraper, and a self-describing catalog of every emitted metric backs both the dashboard's observe tab and external automation.", "facts": [ { "kind": "code", "literal": "vmsingle", "chunkId": "api/metrics" } ], "sources": [ { "chunkId": "api/metrics", "url": "/docs/api/metrics", "anchor": null } ], "mode": "source-primary", "terms": [ "gateway", "exposes", "prometheus", "shaped", "metrics", "surface", "plus", "passthrough", "routes", "bundled", "victoriametrics", "callers", "promql", "without", "separate", "scraper", "self", "describing", "catalog", "every", "emitted", "metric", "backs", "both", "dashboard", "observe", 
"external", "automation", "vmsingle", "exposition", "endpoint", "instance", "emits", "definitions", "label", "conventions", "example", "live", "below" ] }, { "id": "api/metrics#health", "kind": "section", "title": "Metrics API", "heading": "Health", "group": "API", "url": "/docs/api/metrics#health", "summary": "The health route always returns 200 while the process is up and reports version, cache backing connection state, and per-namespace cache state, which the dashboard reads for degradation signals like a cold or disconnected cache after a restart.", "facts": [ { "kind": "code", "literal": "GET /health", "chunkId": "api/metrics#health" }, { "kind": "code", "literal": "{\n \"status\": \"ok\",\n \"version\": \"0.1.0\",\n \"aerospike\": {\n \"connected\": true,\n \"generation\": 3\n },\n \"cache_state\": [\n {\"namespace\": \"products\", \"state\": \"warm\", \"warmed_through\": 1747300000123, \"warm_inflight\": false}\n ]\n}", "chunkId": "api/metrics#health" }, { "kind": "code", "literal": "200", "chunkId": "api/metrics#health" }, { "kind": "code", "literal": "aerospike.connected", "chunkId": "api/metrics#health" }, { "kind": "code", "literal": "cache_state[].state", "chunkId": "api/metrics#health" } ], "sources": [ { "chunkId": "api/metrics#health", "url": "/docs/api/metrics#health", "anchor": "health" } ], "mode": "source-primary", "terms": [ "health", "route", "always", "returns", "while", "process", "reports", "version", "cache", "backing", "connection", "state", "namespace", "dashboard", "reads", "degradation", "signals", "like", "cold", "disconnected", "after", "restart", "status", "aerospike", "connected", "true", "generation", "products", "warm", "warmed", "through", "1747300000123", "inflight", "false", "cachestate", "warmedthrough", "warminflight", "responds", "glance", "cards" ] }, { "id": "api/metrics#metrics-catalog", "kind": "section", "title": "Metrics API", "heading": "Metrics catalog", "group": "API", "url": "/docs/api/metrics#metrics-catalog", 
"summary": "The metrics catalog is an operator-facing manifest of every emitted metric, each entry carrying name, kind, family, labels, description, example PromQL, and any alert shape it backs, with a version that bumps on incompatible shape changes. The dashboard groups entries by family, and the same content is exportable from the repo.", "facts": [ { "kind": "code", "literal": "GET /v2/metrics/catalog", "chunkId": "api/metrics#metrics-catalog" }, { "kind": "code", "literal": "version", "chunkId": "api/metrics#metrics-catalog" }, { "kind": "code", "literal": "cargo run -p metrics-catalog --bin export", "chunkId": "api/metrics#metrics-catalog" } ], "sources": [ { "chunkId": "api/metrics#metrics-catalog", "url": "/docs/api/metrics#metrics-catalog", "anchor": "metrics-catalog" } ], "mode": "source-primary", "terms": [ "metrics", "catalog", "operator", "facing", "manifest", "every", "emitted", "metric", "entry", "carrying", "name", "kind", "family", "labels", "description", "example", "promql", "alert", "shape", "backs", "version", "bumps", "incompatible", "changes", "dashboard", "groups", "entries", "same", "content", "exportable", "repo", "cargo", "export", "gateway", "emits", "carries", "histogram", "counter", "gauge", "applicable" ] }, { "id": "api/metrics#promql-passthrough", "kind": "section", "title": "Metrics API", "heading": "PromQL passthrough", "group": "API", "url": "/docs/api/metrics#promql-passthrough", "summary": "The metrics query routes are thin, non-rewriting passthroughs to VictoriaMetrics whose response bodies match Prometheus's HTTP API one-for-one, with short-form aliases for ergonomic terminal use; auth happens at the gateway edge and the upstream metrics instance is never customer-reachable.", "facts": [ { "kind": "code", "literal": "curl -sG \"$LAYER_GATEWAY_URL/v2/metrics/query\" \\\n --data-urlencode 'query=sum(layer_pipeline_stage_count{stage=\"pending\"})'", "chunkId": "api/metrics#promql-passthrough" }, { "kind": "code", "literal": 
"/v2/metrics/api/v1/query", "chunkId": "api/metrics#promql-passthrough" }, { "kind": "code", "literal": "query_range", "chunkId": "api/metrics#promql-passthrough" }, { "kind": "code", "literal": "/v2/metrics/query", "chunkId": "api/metrics#promql-passthrough" } ], "sources": [ { "chunkId": "api/metrics#promql-passthrough", "url": "/docs/api/metrics#promql-passthrough", "anchor": "promql-passthrough" } ], "mode": "source-primary", "terms": [ "promql", "passthrough", "metrics", "query", "routes", "thin", "rewriting", "passthroughs", "victoriametrics", "whose", "response", "bodies", "match", "prometheus", "http", "short", "form", "aliases", "ergonomic", "terminal", "auth", "happens", "gateway", "edge", "upstream", "instance", "never", "customer", "reachable", "curl", "layer", "data", "urlencode", "pipeline", "stage", "count", "pending", "range", "queryrange", "shape" ] }, { "id": "api/metrics#routes", "kind": "section", "title": "Metrics API", "heading": "Routes", "group": "API", "url": "/docs/api/metrics#routes", "summary": "Lists the metrics routes: Prometheus exposition, health, instant and range PromQL proxies (full and short-form), the catalog listing, and single catalog-entry fetch.", "facts": [ { "kind": "code", "literal": "GET /metrics", "chunkId": "api/metrics#routes" }, { "kind": "code", "literal": "GET /health", "chunkId": "api/metrics#routes" }, { "kind": "code", "literal": "GET\\|POST /v2/metrics/api/v1/query", "chunkId": "api/metrics#routes" }, { "kind": "code", "literal": "GET\\|POST /v2/metrics/query", "chunkId": "api/metrics#routes" }, { "kind": "code", "literal": "GET\\|POST /v2/metrics/api/v1/query_range", "chunkId": "api/metrics#routes" }, { "kind": "code", "literal": "GET\\|POST /v2/metrics/query_range", "chunkId": "api/metrics#routes" }, { "kind": "code", "literal": "GET /v2/metrics/catalog", "chunkId": "api/metrics#routes" }, { "kind": "code", "literal": "GET /v2/metrics/catalog/{name}", "chunkId": "api/metrics#routes" } ], "sources": [ { 
"chunkId": "api/metrics#routes", "url": "/docs/api/metrics#routes", "anchor": "routes" } ], "mode": "source-primary", "terms": [ "routes", "lists", "metrics", "prometheus", "exposition", "health", "instant", "range", "promql", "proxies", "full", "short", "form", "catalog", "listing", "single", "entry", "fetch", "post", "query", "name", "route", "behavior", "gateway", "liveness", "nvme", "cache", "connection", "state", "namespace", "proxy", "queryrange", "list", "every", "metric", "emits", "including", "labels", "example" ] }, { "id": "api/namespace-metadata", "kind": "section", "title": "Namespace metadata", "heading": null, "group": "API", "url": "/docs/api/namespace-metadata", "summary": "Namespace metadata is proxied verbatim from the upstream endpoint for schema, row counts, index status, and timestamps, with Layer adding a single freshness sub-object on top.", "facts": [ { "kind": "code", "literal": "/v2/namespaces/{ns}/metadata", "chunkId": "api/namespace-metadata" }, { "kind": "value", "literal": "Upstream.astro", "chunkId": "api/namespace-metadata" }, { "kind": "value", "literal": "turbopuffer.com", "chunkId": "api/namespace-metadata" } ], "sources": [ { "chunkId": "api/namespace-metadata", "url": "/docs/api/namespace-metadata", "anchor": null } ], "mode": "source-primary", "terms": [ "namespace", "metadata", "proxied", "verbatim", "upstream", "endpoint", "schema", "counts", "index", "status", "timestamps", "layer", "adding", "single", "freshness", "object", "namespaces", "astro", "turbopuffer", "read", "enriched", "signals", "payload", "follow", "contract", "adds" ] }, { "id": "api/namespace-metadata#list-namespaces", "kind": "section", "title": "Namespace metadata", "heading": "List namespaces", "group": "API", "url": "/docs/api/namespace-metadata#list-namespaces", "summary": "Listing namespaces is a Layer-only augmented, paged listing that enriches each row with freshness and cache signals and backs the dashboard inventory; a per-row metadata failure 
degrades to an error marker rather than dropping the namespace, and responses come from a short-TTL cache so dashboard polling does not fan out a call per namespace.", "facts": [ { "kind": "code", "literal": "GET /v2/namespaces?prefix=prod&page_size=100", "chunkId": "api/namespace-metadata#list-namespaces" }, { "kind": "code", "literal": "{\n \"namespaces\": [\n {\n \"name\": \"products\",\n \"row_count\": 12500,\n \"size_bytes\": 48800000,\n \"stable_as_of_ms\": 1715600400000,\n \"is_stable\": true,\n \"cache_state\": {\"state\": \"warm\", \"warm_inflight\": false},\n \"last_write_ms\": 1715600399000,\n \"shadow\": false,\n \"labels\": {}\n }\n ],\n \"next_cursor\": \"...\"\n}", "chunkId": "api/namespace-metadata#list-namespaces" }, { "kind": "code", "literal": "GET /v2/namespaces", "chunkId": "api/namespace-metadata#list-namespaces" }, { "kind": "code", "literal": "prefix", "chunkId": "api/namespace-metadata#list-namespaces" }, { "kind": "code", "literal": "cursor", "chunkId": "api/namespace-metadata#list-namespaces" }, { "kind": "code", "literal": "next_cursor", "chunkId": "api/namespace-metadata#list-namespaces" }, { "kind": "code", "literal": "page_size", "chunkId": "api/namespace-metadata#list-namespaces" }, { "kind": "code", "literal": "metadata_error", "chunkId": "api/namespace-metadata#list-namespaces" }, { "kind": "code", "literal": "NAMESPACE_LIST_CACHE_TTL_MS", "chunkId": "api/namespace-metadata#list-namespaces" }, { "kind": "code", "literal": "10000", "chunkId": "api/namespace-metadata#list-namespaces" } ], "sources": [ { "chunkId": "api/namespace-metadata#list-namespaces", "url": "/docs/api/namespace-metadata#list-namespaces", "anchor": "list-namespaces" } ], "mode": "source-primary", "terms": [ "list", "namespaces", "listing", "layer", "only", "augmented", "paged", "enriches", "freshness", "cache", "signals", "backs", "dashboard", "inventory", "metadata", "failure", "degrades", "error", "marker", "rather", "dropping", "namespace", "responses", 
"come", "short", "polling", "does", "call", "prefix", "prod", "page", "size", "name", "products", "count", "12500", "bytes", "48800000", "stable", "1715600400000" ] }, { "id": "api/namespace-metadata#request", "kind": "section", "title": "Namespace metadata", "heading": "Request", "group": "API", "url": "/docs/api/namespace-metadata#request", "summary": "A metadata request returns the upstream payload (schema, approximate counts, timestamps, index status) plus a Layer enhancement block carrying the freshness watermark and stability flag.", "facts": [ { "kind": "code", "literal": "GET /v2/namespaces/products/metadata", "chunkId": "api/namespace-metadata#request" }, { "kind": "code", "literal": "{\n // Proxied from Turbopuffer verbatim\n \"schema\": { },\n \"approx_row_count\": 12500,\n \"approx_logical_bytes\": 48800000,\n \"created_at\": \"2026-03-15T10:30:45Z\",\n \"updated_at\": \"2026-05-12T18:49:00Z\",\n \"last_write_at\": \"2026-05-12T18:48:30Z\",\n \"index\": { \"status\": \"up-to-date\" },\n\n // Layer enhancement\n \"layer\": {\n \"stable_as_of\": 1715600400000,\n \"is_stable\": true\n }\n}", "chunkId": "api/namespace-metadata#request" } ], "sources": [ { "chunkId": "api/namespace-metadata#request", "url": "/docs/api/namespace-metadata#request", "anchor": "request" } ], "mode": "source-primary", "terms": [ "request", "metadata", "returns", "upstream", "payload", "schema", "approximate", "counts", "timestamps", "index", "status", "plus", "layer", "enhancement", "block", "carrying", "freshness", "watermark", "stability", "flag", "namespaces", "products", "proxied", "turbopuffer", "verbatim", "approx", "count", "12500", "logical", "bytes", "48800000", "created", "2026", "15t10", "updated", "12t18", "last", "write", "date", "stable" ] }, { "id": "api/namespace-metadata#the-layer-block", "kind": "section", "title": "Namespace metadata", "heading": "The layer block", "group": "API", "url": "/docs/api/namespace-metadata#the-layer-block", "summary": "The layer 
block exposes the epoch-ms watermark from the most recent stable poll and a boolean for whether that poll observed the index up-to-date; the boolean is the current signal driving the per-query filter-skip decision while the watermark is the historical cut a filtered query would apply. Both are null/false on cold start.", "facts": [ { "kind": "code", "literal": "layer", "chunkId": "api/namespace-metadata#the-layer-block" }, { "kind": "code", "literal": "stable_as_of", "chunkId": "api/namespace-metadata#the-layer-block" }, { "kind": "code", "literal": "is_stable", "chunkId": "api/namespace-metadata#the-layer-block" }, { "kind": "code", "literal": "index.status == \"up-to-date\"", "chunkId": "api/namespace-metadata#the-layer-block" } ], "sources": [ { "chunkId": "api/namespace-metadata#the-layer-block", "url": "/docs/api/namespace-metadata#the-layer-block", "anchor": "the-layer-block" } ], "mode": "source-primary", "terms": [ "layer", "block", "exposes", "epoch", "watermark", "most", "recent", "stable", "poll", "boolean", "whether", "observed", "index", "date", "current", "signal", "driving", "query", "filter", "skip", "decision", "while", "historical", "filtered", "would", "apply", "both", "null", "false", "cold", "start", "status", "field", "meaning", "stableasof", "before", "watcher", "namespace", "isstable", "true" ] }, { "id": "api/query", "kind": "section", "title": "Query & Fetch", "heading": null, "group": "API", "url": "/docs/api/query", "summary": "Query is wire-compatible with the upstream query endpoint for vector and full-text search, with the documented shape covering what Layer adds on top, alongside a pull-through document fetch by id.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/{ns}/query", "chunkId": "api/query" }, { "kind": "value", "literal": "Upstream.astro", "chunkId": "api/query" }, { "kind": "value", "literal": "turbopuffer.com", "chunkId": "api/query" } ], "sources": [ { "chunkId": "api/query", "url": "/docs/api/query", 
"anchor": null } ], "mode": "source-primary", "terms": [ "query", "wire", "compatible", "upstream", "endpoint", "vector", "full", "text", "search", "documented", "shape", "covering", "layer", "adds", "alongside", "pull", "through", "document", "fetch", "post", "namespaces", "astro", "turbopuffer", "similarity", "strong", "consistent", "watermark", "handling", "plus", "request", "schema", "filters", "ranking", "attribute", "selection", "below" ] }, { "id": "api/query#batch-fetch", "kind": "section", "title": "Query & Fetch", "heading": "Batch fetch", "group": "API", "url": "/docs/api/query#batch-fetch", "summary": "Batch fetch takes a list of ids and returns found documents and missing ids inline rather than a partial 404, preserving request order for the found documents and collecting unfound ids separately.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/products/documents\nContent-Type: application/json\n\n{\n \"ids\": [\"asin-1\", \"asin-2\", \"asin-3\"],\n \"include_attributes\": [\"title\"]\n}", "chunkId": "api/query#batch-fetch" }, { "kind": "code", "literal": "{\n \"documents\": [\n {\"id\": \"asin-1\", \"attributes\": {\"title\": \"...\"}},\n {\"id\": \"asin-3\", \"attributes\": {\"title\": \"...\"}}\n ],\n \"missing\": [\"asin-2\"]\n}", "chunkId": "api/query#batch-fetch" }, { "kind": "code", "literal": "documents", "chunkId": "api/query#batch-fetch" }, { "kind": "code", "literal": "missing", "chunkId": "api/query#batch-fetch" } ], "sources": [ { "chunkId": "api/query#batch-fetch", "url": "/docs/api/query#batch-fetch", "anchor": "batch-fetch" } ], "mode": "source-primary", "terms": [ "batch", "fetch", "takes", "list", "returns", "found", "documents", "missing", "inline", "rather", "partial", "preserving", "request", "order", "collecting", "unfound", "separately", "post", "namespaces", "products", "content", "type", "application", "json", "asin", "include", "attributes", "title", "includeattributes", "instead", "preserves", "gateway", "could", 
"find", "anywhere", "land" ] }, { "id": "api/query#behavior-matrix", "kind": "section", "title": "Query & Fetch", "heading": "Behavior matrix", "group": "API", "url": "/docs/api/query#behavior-matrix", "summary": "A matrix maps single- and batch-fetch outcomes by cache state: a hit serves cache, a miss with the document present upstream serves upstream and backfills, a miss with no upstream document is a 404 (single) or inline-missing (batch), and an unavailable cache serves upstream with a miss-on-error marker.", "facts": [ { "kind": "code", "literal": "missing", "chunkId": "api/query#behavior-matrix" }, { "kind": "code", "literal": "miss-on-error", "chunkId": "api/query#behavior-matrix" } ], "sources": [ { "chunkId": "api/query#behavior-matrix", "url": "/docs/api/query#behavior-matrix", "anchor": "behavior-matrix" } ], "mode": "source-primary", "terms": [ "behavior", "matrix", "maps", "single", "batch", "fetch", "outcomes", "cache", "state", "serves", "miss", "document", "present", "upstream", "backfills", "inline", "missing", "unavailable", "error", "marker", "backfill", "absent" ] }, { "id": "api/query#explain-query", "kind": "section", "title": "Query & Fetch", "heading": "Explain query", "group": "API", "url": "/docs/api/query#explain-query", "summary": "Explain query is proxied verbatim with no Layer overlay and no watermark filter, useful for inspecting upstream query planning per the upstream docs.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/products/explain_query", "chunkId": "api/query#explain-query" }, { "kind": "code", "literal": "explain_query", "chunkId": "api/query#explain-query" }, { "kind": "value", "literal": "turbopuffer.com", "chunkId": "api/query#explain-query" } ], "sources": [ { "chunkId": "api/query#explain-query", "url": "/docs/api/query#explain-query", "anchor": "explain-query" } ], "mode": "source-primary", "terms": [ "explain", "query", "proxied", "verbatim", "layer", "overlay", "watermark", "filter", "useful", 
"inspecting", "upstream", "planning", "docs", "post", "namespaces", "products", "turbopuffer", "explainquery", "adds", "nothing", "applies", "inspect", "request", "response", "shape" ] }, { "id": "api/query#fetch", "kind": "section", "title": "Query & Fetch", "heading": "Fetch", "group": "API", "url": "/docs/api/query#fetch", "summary": "Fetch is a Layer-only surface with no upstream equivalent: the NVMe cache is checked first, and on miss or error the gateway falls through to Turbopuffer and backfills the cache best-effort.", "facts": [], "sources": [ { "chunkId": "api/query#fetch", "url": "/docs/api/query#fetch", "anchor": "fetch" } ], "mode": "source-primary", "terms": [ "fetch", "layer", "only", "surface", "upstream", "equivalent", "nvme", "cache", "checked", "first", "miss", "error", "gateway", "falls", "through", "turbopuffer", "backfills", "best", "effort", "there" ] }, { "id": "api/query#filter-shape", "kind": "section", "title": "Query & Fetch", "heading": "Filter shape", "group": "API", "url": "/docs/api/query#filter-shape", "summary": "Filters follow the upstream array syntax with leaf, conjunction, and disjunction forms, and Layer automatically combines the caller's filter with the watermark predicate so callers never see the reserved upsert-time attribute in their request or response.", "facts": [ { "kind": "code", "literal": "[\"category\", \"Eq\", \"Electronics\"] # leaf\n[\"And\", [[\"category\", \"Eq\", \"Electronics\"],\n [\"price\", \"Lte\", 200]]] # conjunction\n[\"Or\", [...]] # disjunction", "chunkId": "api/query#filter-shape" }, { "kind": "code", "literal": "And", "chunkId": "api/query#filter-shape" }, { "kind": "code", "literal": "_hevlayer_upserted_at", "chunkId": "api/query#filter-shape" } ], "sources": [ { "chunkId": "api/query#filter-shape", "url": "/docs/api/query#filter-shape", "anchor": "filter-shape" } ], "mode": "source-primary", "terms": [ "filter", "shape", "filters", "follow", "upstream", "array", "syntax", "leaf", "conjunction", 
"disjunction", "forms", "layer", "automatically", "combines", "caller", "watermark", "predicate", "callers", "never", "reserved", "upsert", "time", "attribute", "their", "request", "response", "category", "electronics", "price", "hevlayer", "upserted", "follows", "turbopuffer", "element", "hevlayerupsertedat" ] }, { "id": "api/query#query-request", "kind": "section", "title": "Query & Fetch", "heading": "Query request", "group": "API", "url": "/docs/api/query#query-request", "summary": "A query request posts a vector, top-k, filters, and selected attributes and returns ranked results with id, distance, and attributes, plus the freshness timestamp of the served response.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/products/query\nContent-Type: application/json\n\n{\n \"vector\": [0.0012, -0.043],\n \"top_k\": 10,\n \"filters\": [\"category\", \"Eq\", \"Electronics\"],\n \"include_attributes\": [\"title\", \"category\"]\n}", "chunkId": "api/query#query-request" }, { "kind": "code", "literal": "{\n \"results\": [\n {\"id\": \"asin-B08N5WRWNW\", \"dist\": 0.42, \"attributes\": {\"title\": \"...\"}}\n ],\n \"stable_as_of\": 1715600400000\n}", "chunkId": "api/query#query-request" } ], "sources": [ { "chunkId": "api/query#query-request", "url": "/docs/api/query#query-request", "anchor": "query-request" } ], "mode": "source-primary", "terms": [ "query", "request", "posts", "vector", "filters", "selected", "attributes", "returns", "ranked", "results", "distance", "plus", "freshness", "timestamp", "served", "response", "post", "namespaces", "products", "content", "type", "application", "json", "0012", "category", "electronics", "include", "title", "asin", "b08n5wrwnw", "dist", "stable", "1715600400000", "topk", "includeattributes", "stableasof" ] }, { "id": "api/query#single-fetch", "kind": "section", "title": "Query & Fetch", "heading": "Single fetch", "group": "API", "url": "/docs/api/query#single-fetch", "summary": "Single fetch returns 200 with a cache 
header indicating hit, miss-with-backfill, or miss-on-error depending on cache and upstream state, and a 404 only when the document is absent from both layers.", "facts": [ { "kind": "code", "literal": "GET /v2/namespaces/products/documents/asin-B08N5WRWNW?include_attributes=title,category", "chunkId": "api/query#single-fetch" }, { "kind": "code", "literal": "x-layer-cache: hit", "chunkId": "api/query#single-fetch" }, { "kind": "code", "literal": "x-layer-cache: miss", "chunkId": "api/query#single-fetch" }, { "kind": "code", "literal": "x-layer-cache: miss-on-error", "chunkId": "api/query#single-fetch" } ], "sources": [ { "chunkId": "api/query#single-fetch", "url": "/docs/api/query#single-fetch", "anchor": "single-fetch" } ], "mode": "source-primary", "terms": [ "single", "fetch", "returns", "cache", "header", "indicating", "miss", "backfill", "error", "depending", "upstream", "state", "only", "document", "absent", "both", "layers", "namespaces", "products", "documents", "asin", "b08n5wrwnw", "include", "attributes", "title", "category", "layer", "includeattributes", "outcome", "status", "cached", "backfilled", "unavailable", "missing" ] }, { "id": "api/query#strong-consistent-reads", "kind": "section", "title": "Query & Fetch", "heading": "Strong-consistent reads", "group": "API", "url": "/docs/api/query#strong-consistent-reads", "summary": "Because the upstream indexes upserts asynchronously, a naive read after an upsert can be partial or rate-limited under write pressure; Layer runs queries at eventual consistency upstream, polls each namespace's index status to record a watermark, and per query injects a hidden upsert-time-bounded predicate only while the index is updating, retrying once with the filter forced on after a rate-limit. 
Every response reports the most recent watermark, omitted only on a cold-start gateway.", "facts": [ { "kind": "code", "literal": "consistency=eventual", "chunkId": "api/query#strong-consistent-reads" }, { "kind": "code", "literal": "index.status", "chunkId": "api/query#strong-consistent-reads" }, { "kind": "code", "literal": "poll_start - safety_margin", "chunkId": "api/query#strong-consistent-reads" }, { "kind": "code", "literal": "Updating", "chunkId": "api/query#strong-consistent-reads" }, { "kind": "code", "literal": "_hevlayer_upserted_at <= watermark", "chunkId": "api/query#strong-consistent-reads" }, { "kind": "code", "literal": "Stable", "chunkId": "api/query#strong-consistent-reads" }, { "kind": "code", "literal": "Unknown", "chunkId": "api/query#strong-consistent-reads" }, { "kind": "code", "literal": "stable_as_of", "chunkId": "api/query#strong-consistent-reads" } ], "sources": [ { "chunkId": "api/query#strong-consistent-reads", "url": "/docs/api/query#strong-consistent-reads", "anchor": "strong-consistent-reads" } ], "mode": "source-primary", "terms": [ "strong", "consistent", "reads", "because", "upstream", "indexes", "upserts", "asynchronously", "naive", "read", "after", "upsert", "partial", "rate", "limited", "under", "write", "pressure", "layer", "runs", "queries", "eventual", "consistency", "polls", "namespace", "index", "status", "record", "watermark", "query", "injects", "hidden", "time", "bounded", "predicate", "only", "while", "updating", "retrying", "once" ] }, { "id": "api/query#tunables", "kind": "section", "title": "Query & Fetch", "heading": "Tunables", "group": "API", "url": "/docs/api/query#tunables", "summary": "Two environment tunables control consistency: how often the watcher polls each namespace, and the cushion between poll time and the recorded watermark to cover in-flight upserts.", "facts": [ { "kind": "code", "literal": "CONSISTENCY_POLL_INTERVAL_MS", "chunkId": "api/query#tunables" }, { "kind": "code", "literal": 
"CONSISTENCY_SAFETY_MARGIN_MS", "chunkId": "api/query#tunables" } ], "sources": [ { "chunkId": "api/query#tunables", "url": "/docs/api/query#tunables", "anchor": "tunables" } ], "mode": "source-primary", "terms": [ "tunables", "environment", "control", "consistency", "often", "watcher", "polls", "namespace", "cushion", "between", "poll", "time", "recorded", "watermark", "cover", "flight", "upserts", "interval", "safety", "margin", "variable", "default", "purpose", "consistencypollintervalms", "1000", "consistencysafetymarginms" ] }, { "id": "api/result-count", "kind": "section", "title": "Result Count", "heading": null, "group": "API", "url": "/docs/api/result-count", "summary": "Result count answers how many rows match a ranked FTS or vector query, distinct from scan count which counts rows matching a plain filter. It supports a bounded single-pass mode and an exhaustive recursive mode, carries a request deadline with a server-side maximum, and on timeout returns the partial count flagged as bounded and timed out.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/products/result-count\nContent-Type: application/json\n\n{\n \"query\": {\"field\": \"title\", \"fts\": \"wireless headphones\"},\n \"filters\": [\"category\", \"Eq\", \"Electronics\"],\n \"mode\": \"bounded\",\n \"timeout_seconds\": 30\n}", "chunkId": "api/result-count" }, { "kind": "code", "literal": "{\n \"count\": 4210,\n \"bounded\": false,\n \"timed_out\": false,\n \"shards_saturated\": 0,\n \"shards_total\": 1,\n \"elapsed_ms\": 42\n}", "chunkId": "api/result-count" }, { "kind": "code", "literal": "field", "chunkId": "api/result-count" }, { "kind": "code", "literal": "fts", "chunkId": "api/result-count" }, { "kind": "code", "literal": "vector", "chunkId": "api/result-count" }, { "kind": "code", "literal": "max_distance", "chunkId": "api/result-count" }, { "kind": "code", "literal": "bounded", "chunkId": "api/result-count" }, { "kind": "code", "literal": "top_k", "chunkId": 
"api/result-count" }, { "kind": "code", "literal": "exhaustive", "chunkId": "api/result-count" }, { "kind": "code", "literal": "bounded: true", "chunkId": "api/result-count" }, { "kind": "code", "literal": "timed_out: true", "chunkId": "api/result-count" } ], "sources": [ { "chunkId": "api/result-count", "url": "/docs/api/result-count", "anchor": null } ], "mode": "source-primary", "terms": [ "result", "count", "answers", "many", "rows", "match", "ranked", "vector", "query", "distinct", "scan", "counts", "matching", "plain", "filter", "supports", "bounded", "single", "pass", "mode", "exhaustive", "recursive", "carries", "request", "deadline", "server", "side", "maximum", "timeout", "returns", "partial", "flagged", "timed", "post", "namespaces", "products", "content", "type", "application", "json" ] }, { "id": "api/scans", "kind": "section", "title": "Scan", "heading": null, "group": "API", "url": "/docs/api/scans", "summary": "Scans iterate a namespace by filter: ID mode creates an asynchronous job that returns matching IDs through a results route, while count mode returns a single number synchronously.", "facts": [ { "kind": "code", "literal": "mode: ids", "chunkId": "api/scans" }, { "kind": "code", "literal": "mode: count", "chunkId": "api/scans" } ], "sources": [ { "chunkId": "api/scans", "url": "/docs/api/scans", "anchor": null } ], "mode": "source-primary", "terms": [ "scans", "iterate", "namespace", "filter", "mode", "creates", "asynchronous", "returns", "matching", "through", "results", "route", "while", "count", "single", "number", "synchronously" ] }, { "id": "api/scans#auto-mode-policy", "kind": "section", "title": "Scan", "heading": "Auto-Mode Policy", "group": "API", "url": "/docs/api/scans#auto-mode-policy", "summary": "Auto mode ties cache freshness to the consistency watermark by tracking a per-namespace warmed-through marker; depending on whether the cache is empty, populated and fresh, or populated but stale, the gateway runs origin, serves cache, 
or serves cache while starting a background warm. When cache is used it adds a warmed-through upper-bound predicate so the scan is a stable warmed view.", "facts": [ { "kind": "code", "literal": "cache_warmed_through", "chunkId": "api/scans#auto-mode-policy" }, { "kind": "code", "literal": "cache_warmed_through >= watermark", "chunkId": "api/scans#auto-mode-policy" }, { "kind": "code", "literal": "cache_warmed_through < watermark", "chunkId": "api/scans#auto-mode-policy" }, { "kind": "code", "literal": "_hevlayer_upserted_at <= cache_warmed_through", "chunkId": "api/scans#auto-mode-policy" } ], "sources": [ { "chunkId": "api/scans#auto-mode-policy", "url": "/docs/api/scans#auto-mode-policy", "anchor": "auto-mode-policy" } ], "mode": "source-primary", "terms": [ "auto", "mode", "policy", "ties", "cache", "freshness", "consistency", "watermark", "tracking", "namespace", "warmed", "through", "marker", "depending", "whether", "empty", "populated", "fresh", "stale", "gateway", "runs", "origin", "serves", "while", "starting", "background", "warm", "adds", "upper", "bound", "predicate", "scan", "stable", "view", "hevlayer", "upserted", "same", "strong", "consistent", "queries" ] }, { "id": "api/scans#count-mode", "kind": "section", "title": "Scan", "heading": "Count Mode", "group": "API", "url": "/docs/api/scans#count-mode", "summary": "Count mode posts a filter and a source and returns a single count with the serving source and timing; snapshot reads are eligible only for a single leaf equality/membership filter on a field present in the latest snapshot, and unsupported filters fall through under auto or fail with a precondition error under an explicit snapshot source. 
Live count responses add bounded, timed-out, and shard fields.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/products/scans\nContent-Type: application/json\n\n{\n \"mode\": \"count\",\n \"source\": \"auto\",\n \"filters\": [\"category\", \"Eq\", \"Electronics\"],\n \"timeout_seconds\": 30\n}", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "{\n \"count\": 4210,\n \"served_by\": \"snapshot\",\n \"snapshot_sha\": \"3f9e8b21\",\n \"watermark_ms\": 1747300000123,\n \"elapsed_ms\": 3\n}", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "{\n \"count\": 4210,\n \"served_by\": \"origin\",\n \"bounded\": false,\n \"timed_out\": false,\n \"shards_saturated\": 0,\n \"shards_total\": 1,\n \"elapsed_ms\": 42\n}", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "auto", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "snapshot", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "cache", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "origin", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "Eq", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "In", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "fields[]", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "And", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "Or", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "Not", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "412 precondition_failed", "chunkId": "api/scans#count-mode" }, { "kind": "code", "literal": "source: snapshot", "chunkId": "api/scans#count-mode" } ], "sources": [ { "chunkId": "api/scans#count-mode", "url": "/docs/api/scans#count-mode", "anchor": "count-mode" } ], "mode": "source-primary", "terms": [ "count", "mode", "posts", "filter", "source", "returns", "single", "serving", "timing", "snapshot", "reads", "eligible", 
"only", "leaf", "equality", "membership", "field", "present", "latest", "unsupported", "filters", "fall", "through", "under", "auto", "fail", "precondition", "error", "explicit", "live", "responses", "bounded", "timed", "shard", "fields", "post", "namespaces", "products", "scans", "content" ] }, { "id": "api/scans#id-mode", "kind": "section", "title": "Scan", "heading": "ID Mode", "group": "API", "url": "/docs/api/scans#id-mode", "summary": "ID mode posts a filter and returns an accepted job; once the job reports completed, the matching IDs are read paginated from a results route. Valid sources are auto, cache, and origin.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/products/scans\nContent-Type: application/json\n\n{\n \"source\": \"auto\",\n \"mode\": \"ids\",\n \"filters\": [\"category\", \"Eq\", \"Electronics\"],\n \"page_size\": 1000\n}", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "{\n \"id\": \"scan-uuid\",\n \"namespace\": \"products\",\n \"source\": \"auto\",\n \"effective_source\": \"origin\",\n \"status\": \"running\",\n \"progress\": 0,\n \"documents_scanned\": 0,\n \"created_at\": \"2026-05-26T10:00:00Z\"\n}", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "GET /v2/namespaces/products/scans/scan-uuid/results?limit=1000&offset=0", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "{\n \"ids\": [\"doc-1\", \"doc-2\"],\n \"total\": 2\n}", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "mode", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "ids", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "auto", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "cache", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "origin", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "202 Accepted", "chunkId": "api/scans#id-mode" }, { "kind": "code", "literal": "status", "chunkId": "api/scans#id-mode" }, { "kind": "code", 
"literal": "completed", "chunkId": "api/scans#id-mode" } ], "sources": [ { "chunkId": "api/scans#id-mode", "url": "/docs/api/scans#id-mode", "anchor": "id-mode" } ], "mode": "source-primary", "terms": [ "mode", "posts", "filter", "returns", "accepted", "once", "reports", "completed", "matching", "read", "paginated", "results", "route", "valid", "sources", "auto", "cache", "origin", "post", "namespaces", "products", "scans", "content", "type", "application", "json", "source", "filters", "category", "electronics", "page", "size", "1000", "scan", "uuid", "namespace", "effective", "status", "running", "progress" ] }, { "id": "api/scans#routes", "kind": "section", "title": "Scan", "heading": "Routes", "group": "API", "url": "/docs/api/scans#routes", "summary": "Lists the scan routes: create an ID job or return a count, list jobs, read one job, read completed results, and drop the in-memory job.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/{ns}/scans", "chunkId": "api/scans#routes" }, { "kind": "code", "literal": "GET /v2/namespaces/{ns}/scans", "chunkId": "api/scans#routes" }, { "kind": "code", "literal": "GET /v2/namespaces/{ns}/scans/{id}", "chunkId": "api/scans#routes" }, { "kind": "code", "literal": "GET /v2/namespaces/{ns}/scans/{id}/results", "chunkId": "api/scans#routes" }, { "kind": "code", "literal": "DELETE /v2/namespaces/{ns}/scans/{id}", "chunkId": "api/scans#routes" } ], "sources": [ { "chunkId": "api/scans#routes", "url": "/docs/api/scans#routes", "anchor": "routes" } ], "mode": "source-primary", "terms": [ "routes", "lists", "scan", "create", "return", "count", "list", "jobs", "read", "completed", "results", "drop", "memory", "post", "namespaces", "scans", "delete", "route", "method", "behavior", "namespace" ] }, { "id": "api/search-history", "kind": "section", "title": "Query History", "heading": null, "group": "API", "url": "/docs/api/search-history", "summary": "Layer logs every served query into a durable per-namespace JSONL trail in 
S3 mirrored to NVMe for fast recent reads, and records fetch events that consumers tag back to a query in a sibling clickstream feed, making a search session reconstructable for relevance tuning, A/B comparison, or incident review. Both surfaces are Layer-only.", "facts": [], "sources": [ { "chunkId": "api/search-history", "url": "/docs/api/search-history", "anchor": null } ], "mode": "source-primary", "terms": [ "layer", "logs", "every", "served", "query", "durable", "namespace", "jsonl", "trail", "mirrored", "nvme", "fast", "recent", "reads", "records", "fetch", "events", "consumers", "back", "sibling", "clickstream", "feed", "making", "search", "session", "reconstructable", "relevance", "tuning", "comparison", "incident", "review", "both", "surfaces", "only", "history", "backed", "gateway", "serves", "cache", "downstream" ] }, { "id": "api/search-history#clickstream-entry", "kind": "section", "title": "Query History", "heading": "Clickstream entry", "group": "API", "url": "/docs/api/search-history#clickstream-entry", "summary": "A clickstream entry records timestamps, a trace id joining it to the originating search-history entry, namespace, document id, tags, source, and whether the result was served from cache or an upstream fetch; the trace id is queryable to pull every event for a session.", "facts": [ { "kind": "code", "literal": "{\n \"events\": [\n {\n \"timestamp\": \"2026-05-22T08:00:02.143Z\",\n \"timestamp_nanos\": 1747900802143000000,\n \"trace_id\": \"f81d4fae-7dec-11d0-a765-00a0c91e6bf6\",\n \"namespace\": \"products\",\n \"doc_id\": \"asin-B08N5WRWNW\",\n \"tags\": [\"session:abc123\"],\n \"source\": \"fetch\",\n \"served_from\": \"cache\"\n }\n ],\n \"next_cursor\": \"1747900802142000000\"\n}", "chunkId": "api/search-history#clickstream-entry" }, { "kind": "code", "literal": "trace_id", "chunkId": "api/search-history#clickstream-entry" }, { "kind": "code", "literal": "served_from", "chunkId": "api/search-history#clickstream-entry" } ], "sources": 
[ { "chunkId": "api/search-history#clickstream-entry", "url": "/docs/api/search-history#clickstream-entry", "anchor": "clickstream-entry" } ], "mode": "source-primary", "terms": [ "clickstream", "entry", "records", "timestamps", "trace", "joining", "originating", "search", "history", "namespace", "document", "tags", "source", "whether", "result", "served", "cache", "upstream", "fetch", "queryable", "pull", "every", "event", "session", "events", "timestamp", "2026", "22t08", "143z", "nanos", "1747900802143000000", "f81d4fae", "7dec", "11d0", "a765", "00a0c91e6bf6", "products", "asin", "b08n5wrwnw", "abc123" ] }, { "id": "api/search-history#query-parameters", "kind": "section", "title": "Query History", "heading": "Query parameters", "group": "API", "url": "/docs/api/search-history#query-parameters", "summary": "History list calls accept a comma-separated tag filter with AND semantics, RFC3339 from/to time bounds, a pagination cursor returning entries strictly older than a given timestamp, and a capped limit.", "facts": [ { "kind": "code", "literal": "tag", "chunkId": "api/search-history#query-parameters" }, { "kind": "code", "literal": "from", "chunkId": "api/search-history#query-parameters" }, { "kind": "code", "literal": "to", "chunkId": "api/search-history#query-parameters" }, { "kind": "code", "literal": "before", "chunkId": "api/search-history#query-parameters" }, { "kind": "code", "literal": "timestamp_nanos", "chunkId": "api/search-history#query-parameters" }, { "kind": "code", "literal": "limit", "chunkId": "api/search-history#query-parameters" } ], "sources": [ { "chunkId": "api/search-history#query-parameters", "url": "/docs/api/search-history#query-parameters", "anchor": "query-parameters" } ], "mode": "source-primary", "terms": [ "query", "parameters", "history", "list", "calls", "accept", "comma", "separated", "filter", "semantics", "rfc3339", "time", "bounds", "pagination", "cursor", "returning", "entries", "strictly", "older", "given", "timestamp", 
"capped", "limit", "before", "nanos", "param", "purpose", "every", "must", "match", "return", "timestampnanos", "default" ] }, { "id": "api/search-history#routes", "kind": "section", "title": "Query History", "heading": "Routes", "group": "API", "url": "/docs/api/search-history#routes", "summary": "Two routes return the per-namespace query log and the correlated clickstream feed, both newest-first, with versioned aliases held for client compatibility.", "facts": [ { "kind": "code", "literal": "GET /v2/namespaces/{ns}/search-history", "chunkId": "api/search-history#routes" }, { "kind": "code", "literal": "GET /v2/namespaces/{ns}/clickstream", "chunkId": "api/search-history#routes" }, { "kind": "code", "literal": "/v1/", "chunkId": "api/search-history#routes" } ], "sources": [ { "chunkId": "api/search-history#routes", "url": "/docs/api/search-history#routes", "anchor": "routes" } ], "mode": "source-primary", "terms": [ "routes", "return", "namespace", "query", "correlated", "clickstream", "feed", "both", "newest", "first", "versioned", "aliases", "held", "client", "compatibility", "namespaces", "search", "history", "route", "behavior", "fetch", "events", "versions", "identical" ] }, { "id": "api/search-history#search-history-entry", "kind": "section", "title": "Query History", "heading": "Search history entry", "group": "API", "url": "/docs/api/search-history#search-history-entry", "summary": "A search-history entry records wall-clock and nanosecond timestamps (the cursor), a trace id joining to the clickstream, the caller-supplied raw query string, the freshness watermark used, a structured query summary, the top result ids in rank order, and caller-supplied segmentation tags.", "facts": [ { "kind": "code", "literal": "timestamp", "chunkId": "api/search-history#search-history-entry" }, { "kind": "code", "literal": "timestamp_nanos", "chunkId": "api/search-history#search-history-entry" }, { "kind": "code", "literal": "trace_id", "chunkId": 
"api/search-history#search-history-entry" }, { "kind": "code", "literal": "raw_query", "chunkId": "api/search-history#search-history-entry" }, { "kind": "code", "literal": "x-hevlayer-search-query", "chunkId": "api/search-history#search-history-entry" }, { "kind": "code", "literal": "stable_as_of", "chunkId": "api/search-history#search-history-entry" }, { "kind": "code", "literal": "query", "chunkId": "api/search-history#search-history-entry" }, { "kind": "code", "literal": "top_result_ids", "chunkId": "api/search-history#search-history-entry" }, { "kind": "code", "literal": "tags", "chunkId": "api/search-history#search-history-entry" }, { "kind": "value", "literal": "e.g", "chunkId": "api/search-history#search-history-entry" } ], "sources": [ { "chunkId": "api/search-history#search-history-entry", "url": "/docs/api/search-history#search-history-entry", "anchor": "search-history-entry" } ], "mode": "source-primary", "terms": [ "search", "history", "entry", "records", "wall", "clock", "nanosecond", "timestamps", "cursor", "trace", "joining", "clickstream", "caller", "supplied", "query", "string", "freshness", "watermark", "structured", "summary", "result", "rank", "order", "segmentation", "tags", "timestamp", "nanos", "hevlayer", "stable", "entries", "2026", "22t08", "000z", "timestampnanos", "1747900800000000000", "namespace", "products", "traceid", "f81d4fae", "7dec" ] }, { "id": "api/search-history#storage", "kind": "section", "title": "Query History", "heading": "Storage", "group": "API", "url": "/docs/api/search-history#storage", "summary": "History is stored as date-partitioned JSONL keyed by nanosecond timestamp; writes are best-effort and never block the query response, with the cache holding a recent window for fast reads and S3 as the durable store, so a cache outage degrades read latency but not durability.", "facts": [ { "kind": "code", "literal": "search-history/{namespace}/{YYYY-MM-DD}/{timestamp_nanos}.jsonl", "chunkId": "api/search-history#storage" } 
], "sources": [ { "chunkId": "api/search-history#storage", "url": "/docs/api/search-history#storage", "anchor": "storage" } ], "mode": "source-primary", "terms": [ "storage", "history", "stored", "date", "partitioned", "jsonl", "keyed", "nanosecond", "timestamp", "writes", "best", "effort", "never", "block", "query", "response", "cache", "holding", "recent", "window", "fast", "reads", "durable", "store", "outage", "degrades", "read", "latency", "durability", "search", "namespace", "yyyy", "nanos", "timestampnanos", "aerospike", "holds", "list", "calls", "walk", "prefix" ] }, { "id": "api/search-history#tag-contract", "kind": "section", "title": "Query History", "heading": "Tag contract", "group": "API", "url": "/docs/api/search-history#tag-contract", "summary": "Layer splits, trims, sorts, and dedupes tags from a header and query param before storing or matching them; commas are unescapable separators, and there are caps on tag count, tag length, and allowed characters. List filtering uses AND semantics so all requested tags must match.", "facts": [ { "kind": "code", "literal": "x-hevlayer-tags", "chunkId": "api/search-history#tag-contract" }, { "kind": "code", "literal": "?tag=", "chunkId": "api/search-history#tag-contract" }, { "kind": "code", "literal": "?tag=a,b", "chunkId": "api/search-history#tag-contract" } ], "sources": [ { "chunkId": "api/search-history#tag-contract", "url": "/docs/api/search-history#tag-contract", "anchor": "tag-contract" } ], "mode": "source-primary", "terms": [ "contract", "layer", "splits", "trims", "sorts", "dedupes", "tags", "header", "query", "param", "before", "storing", "matching", "commas", "unescapable", "separators", "there", "caps", "count", "length", "allowed", "characters", "list", "filtering", "uses", "semantics", "requested", "must", "match", "hevlayer", "whitespace", "drops", "empty", "values", "cannot", "escaped", "limits", "limit", "value", "unique" ] }, { "id": "api/search-history#writing-metadata", "kind": "section", 
"title": "Query History", "heading": "Writing metadata", "group": "API", "url": "/docs/api/search-history#writing-metadata", "summary": "Callers set a header to capture the human query input and another header for comma-separated segmentation tags, both exposed by the Python SDK on the query and history-list calls; the guidance is to keep the query text in the raw-query field and use tags only for segmentation.", "facts": [ { "kind": "code", "literal": "query = await client.query_namespace(\n \"products\",\n {\"vector\": embedding, \"top_k\": 10, \"include_attributes\": [\"title\"]},\n raw_query=\"wireless headphones\",\n tags=[\"app:hev-shop\", \"surface:storefront\", \"route:search\", \"page:first\"],\n)\n\nhistory = await client.list_search_history(\n \"products\",\n tags=[\"app:hev-shop\", \"route:search\", \"page:first\"],\n limit=20,\n)", "chunkId": "api/search-history#writing-metadata" }, { "kind": "code", "literal": "x-hevlayer-search-query", "chunkId": "api/search-history#writing-metadata" }, { "kind": "code", "literal": "x-hevlayer-tags", "chunkId": "api/search-history#writing-metadata" }, { "kind": "code", "literal": "raw_query", "chunkId": "api/search-history#writing-metadata" }, { "kind": "code", "literal": "tags", "chunkId": "api/search-history#writing-metadata" } ], "sources": [ { "chunkId": "api/search-history#writing-metadata", "url": "/docs/api/search-history#writing-metadata", "anchor": "writing-metadata" } ], "mode": "source-primary", "terms": [ "writing", "metadata", "callers", "header", "capture", "human", "query", "input", "another", "comma", "separated", "segmentation", "tags", "both", "exposed", "python", "history", "list", "calls", "guidance", "keep", "text", "field", "only", "await", "client", "namespace", "products", "vector", "embedding", "include", "attributes", "title", "wireless", "headphones", "shop", "surface", "storefront", "route", "search" ] }, { "id": "api/snapshots", "kind": "section", "title": "Snapshot History", "heading": 
null, "group": "API", "url": "/docs/api/snapshots", "summary": "Snapshots are materialized facet histograms for a namespace carrying facet listings and counts, stored durably in S3 and mirrored into the cache for the latest body; a route materializes a field on demand, and history and body routes read the durable chronology written by the consistency watcher.", "facts": [ { "kind": "code", "literal": "values[].v", "chunkId": "api/snapshots" }, { "kind": "code", "literal": "values[].n", "chunkId": "api/snapshots" }, { "kind": "code", "literal": "POST /snapshots", "chunkId": "api/snapshots" } ], "sources": [ { "chunkId": "api/snapshots", "url": "/docs/api/snapshots", "anchor": null } ], "mode": "source-primary", "terms": [ "snapshots", "materialized", "facet", "histograms", "namespace", "carrying", "listings", "counts", "stored", "durably", "mirrored", "cache", "latest", "body", "route", "materializes", "field", "demand", "history", "routes", "read", "durable", "chronology", "written", "consistency", "watcher", "values", "post", "snapshot", "jobs", "bodies", "activity", "streams", "carry", "aerospike", "materialize" ] }, { "id": "api/snapshots#activity", "kind": "section", "title": "Snapshot History", "heading": "Activity", "group": "API", "url": "/docs/api/snapshots#activity", "summary": "The snapshot activity stream returns snapshot lifecycle events filtered by a required epoch-ms lower bound, with optional limit, namespace filter, and pagination cursor; it covers snapshots only, as search history and clickstream have separate feeds.", "facts": [ { "kind": "code", "literal": "GET /v2/activity/snapshots?since=1747200000000&limit=50", "chunkId": "api/snapshots#activity" }, { "kind": "code", "literal": "since", "chunkId": "api/snapshots#activity" }, { "kind": "code", "literal": "ts_ms", "chunkId": "api/snapshots#activity" }, { "kind": "code", "literal": "limit", "chunkId": "api/snapshots#activity" }, { "kind": "code", "literal": "namespace", "chunkId": 
"api/snapshots#activity" }, { "kind": "code", "literal": "cursor", "chunkId": "api/snapshots#activity" }, { "kind": "code", "literal": "next_cursor", "chunkId": "api/snapshots#activity" } ], "sources": [ { "chunkId": "api/snapshots#activity", "url": "/docs/api/snapshots#activity", "anchor": "activity" } ], "mode": "source-primary", "terms": [ "activity", "snapshot", "stream", "returns", "lifecycle", "events", "filtered", "required", "epoch", "lower", "bound", "optional", "limit", "namespace", "filter", "pagination", "cursor", "covers", "snapshots", "only", "search", "history", "clickstream", "separate", "feeds", "since", "1747200000000", "next", "query", "param", "purpose", "tsms", "default", "exact", "nextcursor" ] }, { "id": "api/snapshots#configure-watched-fields", "kind": "section", "title": "Snapshot History", "heading": "Configure watched fields", "group": "API", "url": "/docs/api/snapshots#configure-watched-fields", "summary": "The consistency watcher only materializes snapshots for facet fields it is told to watch, configured via an environment variable mapping each namespace to its facet fields (also a Helm value); the default is empty which disables the snapshot writer. 
Auto-discovered namespaces are registered but only listed fields are materialized, and a minimum-interval setting floors the time between writes.", "facts": [ { "kind": "code", "literal": "export LAYER_FACET_FIELDS='{\n \"products\": [\"category\", \"brand\"],\n \"reviews\": [\"sentiment\", \"language\"]\n}'", "chunkId": "api/snapshots#configure-watched-fields" }, { "kind": "code", "literal": "LAYER_FACET_FIELDS", "chunkId": "api/snapshots#configure-watched-fields" }, { "kind": "code", "literal": "gateway.facetFields", "chunkId": "api/snapshots#configure-watched-fields" }, { "kind": "code", "literal": "source: stored", "chunkId": "api/snapshots#configure-watched-fields" }, { "kind": "code", "literal": "source: auto", "chunkId": "api/snapshots#configure-watched-fields" }, { "kind": "code", "literal": "GET /v2/namespaces", "chunkId": "api/snapshots#configure-watched-fields" }, { "kind": "code", "literal": "LAYER_SNAPSHOT_MIN_INTERVAL_MS", "chunkId": "api/snapshots#configure-watched-fields" }, { "kind": "code", "literal": "300000", "chunkId": "api/snapshots#configure-watched-fields" } ], "sources": [ { "chunkId": "api/snapshots#configure-watched-fields", "url": "/docs/api/snapshots#configure-watched-fields", "anchor": "configure-watched-fields" } ], "mode": "source-primary", "terms": [ "configure", "watched", "fields", "consistency", "watcher", "only", "materializes", "snapshots", "facet", "told", "watch", "configured", "environment", "variable", "mapping", "namespace", "also", "helm", "value", "default", "empty", "disables", "snapshot", "writer", "auto", "discovered", "namespaces", "registered", "listed", "materialized", "minimum", "interval", "setting", "floors", "time", "between", "writes", "export", "layer", "products" ] }, { "id": "api/snapshots#create-a-snapshot-job", "kind": "section", "title": "Snapshot History", "heading": "Create a snapshot job", "group": "API", "url": "/docs/api/snapshots#create-a-snapshot-job", "summary": "Creating a snapshot job posts a 
field, source, and optional filter and returns an accepted job to poll; valid sources are auto, stored, cache, and origin, where stored is fastest for configured fields, cache supports filters it can evaluate, and origin is authoritative and persists the computed body to S3. Completed jobs include a content SHA when a body was materialized.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/products/snapshots\nContent-Type: application/json\n\n{\n \"field\": \"category\",\n \"source\": \"auto\",\n \"filters\": [\"brand\", \"Eq\", \"Acme\"],\n \"page_size\": 1000\n}", "chunkId": "api/snapshots#create-a-snapshot-job" }, { "kind": "code", "literal": "{\n \"id\": \"snapshot-job-uuid\",\n \"namespace\": \"products\",\n \"field\": \"category\",\n \"source\": \"auto\",\n \"status\": \"running\",\n \"progress\": 0,\n \"documents_scanned\": 0,\n \"created_at\": \"2026-05-26T10:00:00Z\"\n}", "chunkId": "api/snapshots#create-a-snapshot-job" }, { "kind": "code", "literal": "GET /v2/namespaces/products/snapshot-jobs/snapshot-job-uuid", "chunkId": "api/snapshots#create-a-snapshot-job" }, { "kind": "code", "literal": "{\n \"id\": \"snapshot-job-uuid\",\n \"namespace\": \"products\",\n \"field\": \"category\",\n \"source\": \"origin\",\n \"status\": \"completed\",\n \"documents_scanned\": 12844,\n \"sha\": \"3f9e8b21\",\n \"stable_as_of\": 1747300000123\n}", "chunkId": "api/snapshots#create-a-snapshot-job" }, { "kind": "code", "literal": "auto", "chunkId": "api/snapshots#create-a-snapshot-job" }, { "kind": "code", "literal": "stored", "chunkId": "api/snapshots#create-a-snapshot-job" }, { "kind": "code", "literal": "cache", "chunkId": "api/snapshots#create-a-snapshot-job" }, { "kind": "code", "literal": "origin", "chunkId": "api/snapshots#create-a-snapshot-job" }, { "kind": "code", "literal": "202 Accepted", "chunkId": "api/snapshots#create-a-snapshot-job" }, { "kind": "code", "literal": "sha", "chunkId": "api/snapshots#create-a-snapshot-job" } ], "sources": [ { 
"chunkId": "api/snapshots#create-a-snapshot-job", "url": "/docs/api/snapshots#create-a-snapshot-job", "anchor": "create-a-snapshot-job" } ], "mode": "source-primary", "terms": [ "create", "snapshot", "creating", "posts", "field", "source", "optional", "filter", "returns", "accepted", "poll", "valid", "sources", "auto", "stored", "cache", "origin", "fastest", "configured", "fields", "supports", "filters", "evaluate", "authoritative", "persists", "computed", "body", "completed", "jobs", "include", "content", "materialized", "post", "namespaces", "products", "snapshots", "type", "application", "json", "category" ] }, { "id": "api/snapshots#history", "kind": "section", "title": "Snapshot History", "heading": "History", "group": "API", "url": "/docs/api/snapshots#history", "summary": "The history route lists durable snapshots newest-first as watermark/SHA pairs, accepting a capped limit and a before-cursor that takes 7-character SHA prefixes; it lists S3 keys only and does not read snapshot bodies.", "facts": [ { "kind": "code", "literal": "GET /v2/namespaces/products/history?limit=20", "chunkId": "api/snapshots#history" }, { "kind": "code", "literal": "[\n {\"watermark_ms\": 1747300000123, \"sha\": \"3f9e8b21...\"},\n {\"watermark_ms\": 1747299600045, \"sha\": \"a1c5b09f...\"}\n]", "chunkId": "api/snapshots#history" }, { "kind": "code", "literal": "limit", "chunkId": "api/snapshots#history" }, { "kind": "code", "literal": "before", "chunkId": "api/snapshots#history" } ], "sources": [ { "chunkId": "api/snapshots#history", "url": "/docs/api/snapshots#history", "anchor": "history" } ], "mode": "source-primary", "terms": [ "history", "route", "lists", "durable", "snapshots", "newest", "first", "watermark", "pairs", "accepting", "capped", "limit", "before", "cursor", "takes", "character", "prefixes", "keys", "only", "does", "read", "snapshot", "bodies", "namespaces", "products", "1747300000123", "3f9e8b21", "1747299600045", "a1c5b09f", "watermarkms", "query", "param", 
"default", "purpose", "maximum", "entries", "returned", "none", "return", "older" ] }, { "id": "api/snapshots#routes", "kind": "section", "title": "Snapshot History", "heading": "Routes", "group": "API", "url": "/docs/api/snapshots#routes", "summary": "Lists the snapshot routes: create an on-demand job for one field, list and read jobs, read durable history, fetch a full body by SHA or prefix, and read the cross-namespace snapshot-write activity stream.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/{ns}/snapshots", "chunkId": "api/snapshots#routes" }, { "kind": "code", "literal": "GET /v2/namespaces/{ns}/snapshot-jobs", "chunkId": "api/snapshots#routes" }, { "kind": "code", "literal": "GET /v2/namespaces/{ns}/snapshot-jobs/{id}", "chunkId": "api/snapshots#routes" }, { "kind": "code", "literal": "GET /v2/namespaces/{ns}/history", "chunkId": "api/snapshots#routes" }, { "kind": "code", "literal": "GET /v2/namespaces/{ns}/snapshots/{sha}", "chunkId": "api/snapshots#routes" }, { "kind": "code", "literal": "GET /v2/activity/snapshots", "chunkId": "api/snapshots#routes" } ], "sources": [ { "chunkId": "api/snapshots#routes", "url": "/docs/api/snapshots#routes", "anchor": "routes" } ], "mode": "source-primary", "terms": [ "routes", "lists", "snapshot", "create", "demand", "field", "list", "read", "jobs", "durable", "history", "fetch", "full", "body", "prefix", "cross", "namespace", "write", "activity", "stream", "post", "namespaces", "snapshots", "route", "method", "behavior", "memory", "newest", "first", "char" ] }, { "id": "api/snapshots#snapshot-body", "kind": "section", "title": "Snapshot History", "heading": "Snapshot body", "group": "API", "url": "/docs/api/snapshots#snapshot-body", "summary": "A snapshot body returns the namespace, watermark, SHA, and per-field facet listings with their values and counts, plus a skipped-fields section for fields above the distinct-value cap; fields present in the listings are complete, while over-cap fields are 
reported as skipped rather than partially materialized.", "facts": [ { "kind": "code", "literal": "GET /v2/namespaces/products/snapshots/3f9e8b2", "chunkId": "api/snapshots#snapshot-body" }, { "kind": "code", "literal": "{\n \"namespace\": \"products\",\n \"watermark_ms\": 1747300000123,\n \"sha\": \"3f9e8b21\",\n \"fields\": [\n {\n \"name\": \"category\",\n \"values\": [\n {\"v\": \"books\", \"n\": 1240},\n {\"v\": \"electronics\", \"n\": 873}\n ]\n }\n ],\n \"fields_skipped\": [\n {\n \"name\": \"tags\",\n \"reason\": \"exceeded_cap\",\n \"distinct_observed\": 247000,\n \"cap\": 10000\n }\n ]\n}", "chunkId": "api/snapshots#snapshot-body" }, { "kind": "code", "literal": "fields[].values[].v", "chunkId": "api/snapshots#snapshot-body" }, { "kind": "code", "literal": "fields[].values[].n", "chunkId": "api/snapshots#snapshot-body" }, { "kind": "code", "literal": "fields[]", "chunkId": "api/snapshots#snapshot-body" }, { "kind": "code", "literal": "fields_skipped[]", "chunkId": "api/snapshots#snapshot-body" } ], "sources": [ { "chunkId": "api/snapshots#snapshot-body", "url": "/docs/api/snapshots#snapshot-body", "anchor": "snapshot-body" } ], "mode": "source-primary", "terms": [ "snapshot", "body", "returns", "namespace", "watermark", "field", "facet", "listings", "their", "values", "counts", "plus", "skipped", "fields", "section", "above", "distinct", "value", "present", "complete", "while", "reported", "rather", "partially", "materialized", "namespaces", "products", "snapshots", "3f9e8b2", "1747300000123", "3f9e8b21", "name", "category", "books", "1240", "electronics", "tags", "reason", "exceeded", "observed" ] }, { "id": "api/warm-cache", "kind": "section", "title": "Warm cache", "heading": null, "group": "API", "url": "/docs/api/warm-cache", "summary": "Layer exposes two warm surfaces: a Turbopuffer-compatible warm hint that advises the upstream index to preload and additionally runs Layer-side warm steps, and a Layer-only shortcut that creates a gateway warm job.", 
"facts": [ { "kind": "code", "literal": "hint_cache_warm", "chunkId": "api/warm-cache" }, { "kind": "code", "literal": "warm", "chunkId": "api/warm-cache" }, { "kind": "code", "literal": "GET /v1/namespaces/{ns}/hint_cache_warm", "chunkId": "api/warm-cache" }, { "kind": "value", "literal": "Upstream.astro", "chunkId": "api/warm-cache" }, { "kind": "value", "literal": "Callout.astro", "chunkId": "api/warm-cache" }, { "kind": "value", "literal": "turbopuffer.com", "chunkId": "api/warm-cache" } ], "sources": [ { "chunkId": "api/warm-cache", "url": "/docs/api/warm-cache", "anchor": null } ], "mode": "source-primary", "terms": [ "layer", "exposes", "warm", "surfaces", "turbopuffer", "compatible", "hint", "advises", "upstream", "index", "preload", "additionally", "runs", "side", "steps", "only", "shortcut", "creates", "gateway", "cache", "namespaces", "astro", "callout", "namespace", "nvme", "snapshot", "mirror", "hintcachewarm", "matches", "call", "load" ] }, { "id": "api/warm-cache#cache-cold-behavior", "kind": "section", "title": "Warm cache", "heading": "Cache-cold behavior", "group": "API", "url": "/docs/api/warm-cache#cache-cold-behavior", "summary": "Warm jobs, cache scans, cache snapshot jobs, and pipeline chunk reads return a cache-cold error when the NVMe cache is unavailable, while fetch falls through to upstream with a miss-on-error marker; the split is deliberate because fetch is correctness-first and warming on a cold cache would be wasted work.", "facts": [ { "kind": "code", "literal": "cache_cold", "chunkId": "api/warm-cache#cache-cold-behavior" }, { "kind": "code", "literal": "x-layer-cache: miss-on-error", "chunkId": "api/warm-cache#cache-cold-behavior" } ], "sources": [ { "chunkId": "api/warm-cache#cache-cold-behavior", "url": "/docs/api/warm-cache#cache-cold-behavior", "anchor": "cache-cold-behavior" } ], "mode": "source-primary", "terms": [ "cache", "cold", "behavior", "warm", "jobs", "scans", "snapshot", "pipeline", "chunk", "reads", "return", 
"error", "nvme", "unavailable", "while", "fetch", "falls", "through", "upstream", "miss", "marker", "split", "deliberate", "because", "correctness", "first", "warming", "would", "wasted", "work", "layer", "cachecold", "many", "fall", "turbopuffer", "instead", "outage", "must", "turn", "missing" ] }, { "id": "api/warm-cache#hint-cache-warm", "kind": "section", "title": "Warm cache", "heading": "Hint-cache warm", "group": "API", "url": "/docs/api/warm-cache#hint-cache-warm", "summary": "The warm-hint route runs three default-on Layer steps (forward the hint upstream, start an origin warm job to backfill the cache, and mirror the latest snapshot body into NVMe), each independently disableable via query params; the response reports per-step status and includes a pollable warm job when the documents step is enabled.", "facts": [ { "kind": "code", "literal": "GET /v1/namespaces/products/hint_cache_warm", "chunkId": "api/warm-cache#hint-cache-warm" }, { "kind": "code", "literal": "GET /v1/namespaces/products/hint_cache_warm?turbopuffer=false&documents=false&snapshots=true", "chunkId": "api/warm-cache#hint-cache-warm" }, { "kind": "code", "literal": "turbopuffer=true", "chunkId": "api/warm-cache#hint-cache-warm" }, { "kind": "code", "literal": "documents=true", "chunkId": "api/warm-cache#hint-cache-warm" }, { "kind": "code", "literal": "snapshots=true", "chunkId": "api/warm-cache#hint-cache-warm" }, { "kind": "code", "literal": "documents", "chunkId": "api/warm-cache#hint-cache-warm" }, { "kind": "code", "literal": "/warm-jobs/{id}", "chunkId": "api/warm-cache#hint-cache-warm" } ], "sources": [ { "chunkId": "api/warm-cache#hint-cache-warm", "url": "/docs/api/warm-cache#hint-cache-warm", "anchor": "hint-cache-warm" } ], "mode": "source-primary", "terms": [ "hint", "cache", "warm", "route", "runs", "three", "default", "layer", "steps", "forward", "upstream", "start", "origin", "backfill", "mirror", "latest", "snapshot", "body", "nvme", "independently", "disableable", 
"query", "params", "response", "reports", "step", "status", "includes", "pollable", "documents", "enabled", "namespaces", "products", "turbopuffer", "false", "snapshots", "true", "jobs", "hintcachewarm", "side" ] }, { "id": "api/warm-cache#layer-warm", "kind": "section", "title": "Warm cache", "heading": "Layer warm", "group": "API", "url": "/docs/api/warm-cache#layer-warm", "summary": "The Layer warm route creates an asynchronous job that pages through Turbopuffer, backfills the cache, and refreshes the warmed-through marker, intended for bootstrapping a namespace whose data was written outside the gateway; it returns an accepted warm job to poll.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/products/warm?page_size=1000", "chunkId": "api/warm-cache#layer-warm" }, { "kind": "code", "literal": "{\n \"id\": \"warm-job-uuid\",\n \"namespace\": \"products\",\n \"status\": \"running\",\n \"progress\": 0,\n \"documents_scanned\": 0,\n \"created_at\": \"2026-05-26T10:00:00Z\"\n}", "chunkId": "api/warm-cache#layer-warm" }, { "kind": "code", "literal": "GET /v2/namespaces/products/warm-jobs/warm-job-uuid", "chunkId": "api/warm-cache#layer-warm" }, { "kind": "code", "literal": "POST /v2/namespaces/{ns}/warm", "chunkId": "api/warm-cache#layer-warm" }, { "kind": "code", "literal": "cache_warmed_through", "chunkId": "api/warm-cache#layer-warm" }, { "kind": "code", "literal": "202 Accepted", "chunkId": "api/warm-cache#layer-warm" } ], "sources": [ { "chunkId": "api/warm-cache#layer-warm", "url": "/docs/api/warm-cache#layer-warm", "anchor": "layer-warm" } ], "mode": "source-primary", "terms": [ "layer", "warm", "route", "creates", "asynchronous", "pages", "through", "turbopuffer", "backfills", "cache", "refreshes", "warmed", "marker", "intended", "bootstrapping", "namespace", "whose", "data", "written", "outside", "gateway", "returns", "accepted", "poll", "post", "namespaces", "products", "page", "size", "1000", "uuid", "status", "running", "progress", 
"documents", "scanned", "created", "2026", "26t10", "jobs" ] }, { "id": "api/write", "kind": "section", "title": "Write & Stage", "heading": null, "group": "API", "url": "/docs/api/write", "summary": "The write path is wire-compatible with the upstream write endpoint, with the documented shape showing only what Layer adds and the upstream docs covering the full request schema.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/{ns}", "chunkId": "api/write" }, { "kind": "value", "literal": "Upstream.astro", "chunkId": "api/write" }, { "kind": "value", "literal": "turbopuffer.com", "chunkId": "api/write" } ], "sources": [ { "chunkId": "api/write", "url": "/docs/api/write", "anchor": null } ], "mode": "source-primary", "terms": [ "write", "path", "wire", "compatible", "upstream", "endpoint", "documented", "shape", "showing", "only", "layer", "adds", "docs", "covering", "full", "request", "schema", "post", "namespaces", "astro", "turbopuffer", "upsert", "delete", "patch", "stage", "rows", "namespace", "below", "shows" ] }, { "id": "api/write#patch", "kind": "section", "title": "Write & Stage", "heading": "Patch", "group": "API", "url": "/docs/api/write#patch", "summary": "Patch preserves unspecified attributes and maps to the upstream patch-rows operation, but vectors cannot be patched (re-upsert the full document instead), and the upsert-time stamp is bumped on every patch so watermark-filtered reads see the patched row only after it is indexed.", "facts": [ { "kind": "code", "literal": "PATCH /v2/namespaces/products\nContent-Type: application/json\n\n{\n \"patches\": [\n {\"id\": \"asin-B08N5WRWNW\", \"attributes\": {\"category\": \"Audio\"}}\n ]\n}", "chunkId": "api/write#patch" }, { "kind": "code", "literal": "patch_rows", "chunkId": "api/write#patch" }, { "kind": "code", "literal": "_hevlayer_upserted_at", "chunkId": "api/write#patch" } ], "sources": [ { "chunkId": "api/write#patch", "url": "/docs/api/write#patch", "anchor": "patch" } ], "mode": 
"source-primary", "terms": [ "patch", "preserves", "unspecified", "attributes", "maps", "upstream", "rows", "operation", "vectors", "cannot", "patched", "upsert", "full", "document", "instead", "time", "stamp", "bumped", "every", "watermark", "filtered", "reads", "only", "after", "indexed", "namespaces", "products", "content", "type", "application", "json", "patches", "asin", "b08n5wrwnw", "category", "audio", "hevlayer", "upserted", "turbopuffer", "patchrows" ] }, { "id": "api/write#pipeline-stage", "kind": "section", "title": "Write & Stage", "heading": "Pipeline stage", "group": "API", "url": "/docs/api/write#pipeline-stage", "summary": "When a document is part of a pipeline the writer does not talk to the namespace directly; a CPU worker hands chunks to the pipeline, a GPU worker writes vectors back, and the gateway performs the namespace upsert. Staging stores chunks in the cache and marks the document pending, and re-staging the same id replaces the chunks and resets state.", "facts": [ { "kind": "code", "literal": "PUT /v2/pipelines/product-images/documents/asin-B08N5WRWNW\nContent-Type: application/json\n\n{\n \"chunks\": [\n {\"id\": \"asin-B08N5WRWNW-0\", \"text\": \"Wireless noise-cancelling headphones\"},\n {\"id\": \"asin-B08N5WRWNW-1\", \"text\": \"40-hour battery life\", \"metadata\": {\"page\": 2}}\n ]\n}", "chunkId": "api/write#pipeline-stage" }, { "kind": "code", "literal": "pending", "chunkId": "api/write#pipeline-stage" } ], "sources": [ { "chunkId": "api/write#pipeline-stage", "url": "/docs/api/write#pipeline-stage", "anchor": "pipeline-stage" } ], "mode": "source-primary", "terms": [ "pipeline", "stage", "document", "part", "writer", "does", "talk", "namespace", "directly", "worker", "hands", "chunks", "writes", "vectors", "back", "gateway", "performs", "upsert", "staging", "stores", "cache", "marks", "pending", "same", "replaces", "resets", "state", "pipelines", "product", "images", "documents", "asin", "b08n5wrwnw", "content", "type", 
"application", "json", "text", "wireless", "noise" ] }, { "id": "api/write#side-effects", "kind": "section", "title": "Write & Stage", "heading": "Side effects", "group": "API", "url": "/docs/api/write#side-effects", "summary": "Writes have two side effects: a best-effort NVMe cache mirror written before the upstream call that does not roll back on failure (resolved by re-sending the upsert), and the snapshot watcher re-evaluating freshness on its next poll and materializing a new snapshot if the histogram shape changed.", "facts": [], "sources": [ { "chunkId": "api/write#side-effects", "url": "/docs/api/write#side-effects", "anchor": "side-effects" } ], "mode": "source-primary", "terms": [ "side", "effects", "writes", "best", "effort", "nvme", "cache", "mirror", "written", "before", "upstream", "call", "does", "roll", "back", "failure", "resolved", "sending", "upsert", "snapshot", "watcher", "evaluating", "freshness", "next", "poll", "materializing", "histogram", "shape", "changed", "effect", "behavior", "here", "doesn", "gateway", "briefly", "didn", "reach", "index", "resolves", "evaluates" ] }, { "id": "api/write#upsert-and-delete", "kind": "section", "title": "Write & Stage", "heading": "Upsert and delete", "group": "API", "url": "/docs/api/write#upsert-and-delete", "summary": "Upsert and delete post lists of documents to upsert and ids to delete, returning success once the upstream write succeeds, an error when both lists are empty, and an upstream-failure error otherwise; NVMe cache writes happen first as a non-blocking best-effort side effect, and every upsert is server-stamped with a hidden upsert-time attribute that powers query consistency.", "facts": [ { "kind": "code", "literal": "POST /v2/namespaces/products\nContent-Type: application/json\n\n{\n \"upserts\": [\n {\n \"id\": \"asin-B08N5WRWNW\",\n \"vector\": [0.0012, -0.043],\n \"attributes\": {\"title\": \"Wireless headphones\", \"category\": \"Electronics\"}\n }\n ],\n \"deletes\": 
[\"asin-old-001\"]\n}", "chunkId": "api/write#upsert-and-delete" }, { "kind": "code", "literal": "upserts", "chunkId": "api/write#upsert-and-delete" }, { "kind": "code", "literal": "deletes", "chunkId": "api/write#upsert-and-delete" }, { "kind": "code", "literal": "_hevlayer_upserted_at", "chunkId": "api/write#upsert-and-delete" } ], "sources": [ { "chunkId": "api/write#upsert-and-delete", "url": "/docs/api/write#upsert-and-delete", "anchor": "upsert-and-delete" } ], "mode": "source-primary", "terms": [ "upsert", "delete", "post", "lists", "documents", "returning", "success", "once", "upstream", "write", "succeeds", "error", "both", "empty", "failure", "otherwise", "nvme", "cache", "writes", "happen", "first", "blocking", "best", "effort", "side", "effect", "every", "server", "stamped", "hidden", "time", "attribute", "powers", "query", "consistency", "namespaces", "products", "content", "type", "application" ] }, { "id": "concepts", "kind": "section", "title": "Concepts", "heading": null, "group": "Overview", "url": "/docs/concepts", "summary": "Introduces how the gateway composes Turbopuffer, the NVMe cache, PostgreSQL, S3, and metrics, and the core nouns the reader will work with.", "facts": [], "sources": [ { "chunkId": "concepts", "url": "/docs/concepts", "anchor": null } ], "mode": "agent-primary", "terms": [ "introduces", "gateway", "composes", "turbopuffer", "nvme", "cache", "postgresql", "metrics", "core", "nouns", "reader", "work" ] }, { "id": "concepts#control-loops", "kind": "section", "title": "Concepts", "heading": "Control loops", "group": "Overview", "url": "/docs/concepts#control-loops", "summary": "Layer uses a control loop as a core primitive that reconciles index state against metrics from the search system, which is how it applies row-level transformations and keeps an index's stable view current; related concepts are UDFs, snapshots, and the stable watermark.", "facts": [], "sources": [ { "chunkId": "concepts#control-loops", "url": 
"/docs/concepts#control-loops", "anchor": "control-loops" } ], "mode": "agent-primary", "terms": [ "control", "loops", "layer", "uses", "loop", "core", "primitive", "reconciles", "index", "state", "against", "metrics", "search", "system", "applies", "level", "transformations", "keeps", "stable", "view", "current", "related", "concepts", "udfs", "snapshots", "watermark", "managing", "indexes", "emitted" ] }, { "id": "concepts#gateway-enhancements", "kind": "section", "title": "Concepts", "heading": "Gateway enhancements", "group": "Overview", "url": "/docs/concepts#gateway-enhancements", "summary": "The gateway extends the search system with common query and filtering primitives using reserved attributes, and exposes everything through a single client so applications route every call through the gateway; Layer works best when traffic flows through it consistently, and schema changes on reserved attributes degrade gracefully rather than breaking outright.", "facts": [ { "kind": "code", "literal": "_hevlayer_*", "chunkId": "concepts#gateway-enhancements" } ], "sources": [ { "chunkId": "concepts#gateway-enhancements", "url": "/docs/concepts#gateway-enhancements", "anchor": "gateway-enhancements" } ], "mode": "agent-primary", "terms": [ "gateway", "enhancements", "extends", "search", "system", "common", "query", "filtering", "primitives", "reserved", "attributes", "exposes", "everything", "through", "single", "client", "applications", "route", "every", "call", "layer", "works", "best", "traffic", "flows", "consistently", "schema", "changes", "degrade", "gracefully", "rather", "breaking", "outright", "hevlayer", "helpful", "patterns", "changing", "those", "breaks", "guarantees" ] }, { "id": "concepts#glossary", "kind": "section", "title": "Concepts", "heading": "Glossary", "group": "Overview", "url": "/docs/concepts#glossary", "summary": "Defines Layer's core nouns: namespace, document, cache, stable watermark, pipeline, snapshot, facet listing and count, result count, 
scan, UDF, gateway, operator, shard, CRD, and PromQL, with a one-line current meaning for each.", "facts": [ { "kind": "code", "literal": "/v2/namespaces/{namespace}", "chunkId": "concepts#glossary" }, { "kind": "code", "literal": "fields[].values[].v", "chunkId": "concepts#glossary" }, { "kind": "code", "literal": "fields[].values[].n", "chunkId": "concepts#glossary" }, { "kind": "code", "literal": "_hevlayer_shard", "chunkId": "concepts#glossary" } ], "sources": [ { "chunkId": "concepts#glossary", "url": "/docs/concepts#glossary", "anchor": "glossary" } ], "mode": "agent-primary", "terms": [ "glossary", "defines", "layer", "core", "nouns", "namespace", "document", "cache", "stable", "watermark", "pipeline", "snapshot", "facet", "listing", "count", "result", "scan", "gateway", "operator", "shard", "promql", "line", "current", "meaning", "namespaces", "fields", "values", "hevlayer", "concept", "turbopuffer", "addressed", "through", "plus", "attributes", "optionally", "vector", "writing", "searching", "nvme", "backed" ] }, { "id": "concepts#kubernetes-autoscaling", "kind": "section", "title": "Concepts", "heading": "Kubernetes autoscaling", "group": "Overview", "url": "/docs/concepts#kubernetes-autoscaling", "summary": "Because Layer is stateless, every tier autoscales independently: a node autoscaler handles node-level scaling and a pod autoscaler scales against signals from an embedded PostgreSQL queue whose data is used for scaling decisions only and carries no non-recoverable state.", "facts": [], "sources": [ { "chunkId": "concepts#kubernetes-autoscaling", "url": "/docs/concepts#kubernetes-autoscaling", "anchor": "kubernetes-autoscaling" } ], "mode": "agent-primary", "terms": [ "kubernetes", "autoscaling", "because", "layer", "stateless", "every", "tier", "autoscales", "independently", "node", "autoscaler", "handles", "level", "scaling", "scales", "against", "signals", "embedded", "postgresql", "queue", "whose", "data", "decisions", "only", "carries", 
"recoverable", "state", "autoscale", "karpenter", "keda", "pods", "system" ] }, { "id": "concepts#observability-as-code", "kind": "section", "title": "Concepts", "heading": "Observability as code", "group": "Overview", "url": "/docs/concepts#observability-as-code", "summary": "Layer's observability contract lives in the service itself: the gateway emits a self-describing catalog of every metric (names, labels, example PromQL) so the metric surface is code rather than hand-maintained config, and the bundled dashboard, external automation, and an embedded Prometheus-compatible metrics store all read from it.", "facts": [], "sources": [ { "chunkId": "concepts#observability-as-code", "url": "/docs/concepts#observability-as-code", "anchor": "observability-as-code" } ], "mode": "agent-primary", "terms": [ "observability", "code", "layer", "contract", "lives", "service", "itself", "gateway", "emits", "self", "describing", "catalog", "every", "metric", "names", "labels", "example", "promql", "surface", "rather", "hand", "maintained", "config", "bundled", "dashboard", "external", "automation", "embedded", "prometheus", "compatible", "metrics", "store", "read", "defined", "exports", "victoriametrics", "instance", "lets", "against", "series" ] }, { "id": "concepts#pull-through-cache", "kind": "section", "title": "Concepts", "heading": "Pull-through cache", "group": "Overview", "url": "/docs/concepts#pull-through-cache", "summary": "Document reads are served by a pull-through cache that checks the NVMe-backed cache first and on a miss reads through to origin (or S3 for snapshots), returns the row, and backfills best-effort; the cache is a read accelerator, not a hard dependency, so reads still succeed if it is unavailable, and one logical cache serves every read path separated by set.", "facts": [ { "kind": "code", "literal": "set", "chunkId": "concepts#pull-through-cache" } ], "sources": [ { "chunkId": "concepts#pull-through-cache", "url": "/docs/concepts#pull-through-cache", 
"anchor": "pull-through-cache" } ], "mode": "agent-primary", "terms": [ "pull", "through", "cache", "document", "reads", "served", "checks", "nvme", "backed", "first", "miss", "origin", "snapshots", "returns", "backfills", "best", "effort", "read", "accelerator", "hard", "dependency", "still", "succeed", "unavailable", "logical", "serves", "every", "path", "separated", "gateway", "aerospike", "turbopuffer", "fall", "different", "uses", "fetch", "snapshot", "field", "values" ] }, { "id": "concepts#scattergather", "kind": "section", "title": "Concepts", "heading": "Scatter/gather", "group": "Overview", "url": "/docs/concepts#scattergather", "summary": "Layer can partition a single namespace into hash-bucket shards by assigning each row a reserved shard attribute, then scatters a query to every bucket in parallel and gathers, merges, and re-ranks the results down to the requested top-k; sharding is invisible to the client and the same path backs result count, scans, and UDF discovery scans.", "facts": [ { "kind": "code", "literal": "_hevlayer_shard", "chunkId": "concepts#scattergather" }, { "kind": "code", "literal": "top_k", "chunkId": "concepts#scattergather" }, { "kind": "flag", "literal": "-filtered", "chunkId": "concepts#scattergather" } ], "sources": [ { "chunkId": "concepts#scattergather", "url": "/docs/concepts#scattergather", "anchor": "scattergather" } ], "mode": "agent-primary", "terms": [ "scatter", "gather", "layer", "partition", "single", "namespace", "hash", "bucket", "shards", "assigning", "reserved", "shard", "attribute", "scatters", "query", "every", "parallel", "gathers", "merges", "ranks", "results", "down", "requested", "sharding", "invisible", "client", "same", "path", "backs", "result", "count", "scans", "discovery", "hevlayer", "filtered", "buckets", "hevlayershard", "xxh64", "modulo", "gateway" ] }, { "id": "dashboard", "kind": "section", "title": "Dashboard", "heading": null, "group": "Guides", "url": "/docs/dashboard", "summary": "The Layer 
dashboard is the in-cluster operator surface that reads only from the same gateway API customers use, surfacing the views that justify Layer's role between an application and its vector store; managed deployments reach it at a hosted URL and self-hosted installs expose it via a Service.", "facts": [ { "kind": "code", "literal": "https://dashboard.hevlayer.com", "chunkId": "dashboard" }, { "kind": "code", "literal": "layer-dashboard", "chunkId": "dashboard" }, { "kind": "value", "literal": "Callout.astro", "chunkId": "dashboard" } ], "sources": [ { "chunkId": "dashboard", "url": "/docs/dashboard", "anchor": null } ], "mode": "agent-primary", "terms": [ "layer", "dashboard", "cluster", "operator", "surface", "reads", "only", "same", "gateway", "customers", "surfacing", "views", "justify", "role", "between", "application", "vector", "store", "managed", "deployments", "reach", "hosted", "self", "installs", "expose", "service", "https", "hevlayer", "callout", "astro", "pipeline", "worker", "scaling", "read", "write", "health", "cost", "observability", "operators", "ships" ] }, { "id": "dashboard#console", "kind": "section", "title": "Dashboard", "heading": "Console", "group": "Guides", "url": "/docs/dashboard#console", "summary": "The console is the first operator view, with an at-a-glance stripe of single-number cards (queries/s, indexed rows/s, fetch latency, cache hit ratio, error budget burn) that link into matching panels, and a newest-first activity log backed by the snapshot-activity and search-history feeds, with URL-persisted filters.", "facts": [ { "kind": "code", "literal": "/v2/activity/snapshots", "chunkId": "dashboard#console" } ], "sources": [ { "chunkId": "dashboard#console", "url": "/docs/dashboard#console", "anchor": "console" } ], "mode": "agent-primary", "terms": [ "console", "first", "operator", "view", "glance", "stripe", "single", "number", "cards", "queries", "indexed", "rows", "fetch", "latency", "cache", "ratio", "error", "budget", "burn", 
"link", "matching", "panels", "newest", "activity", "backed", "snapshot", "search", "history", "feeds", "persisted", "filters", "snapshots", "opens", "stripes", "card", "links", "read", "write", "observe", "panel" ] }, { "id": "dashboard#cost", "kind": "section", "title": "Dashboard", "heading": "Cost", "group": "Guides", "url": "/docs/dashboard#cost", "summary": "The cost view is a stacked-area chart driven by cost endpoints that splits spend across AWS infrastructure lines (from CloudWatch and the AWS Pricing API) and Turbopuffer lines (from usage metrics times a code-resident rate card), with an instance picker projecting the impact of changing instance types; per-namespace attribution is intentionally not modeled.", "facts": [ { "kind": "code", "literal": "/v2/cost", "chunkId": "dashboard#cost" }, { "kind": "code", "literal": "/v2/cost/timeseries", "chunkId": "dashboard#cost" }, { "kind": "code", "literal": "/v2/cost/rate-card", "chunkId": "dashboard#cost" } ], "sources": [ { "chunkId": "dashboard#cost", "url": "/docs/dashboard#cost", "anchor": "cost" } ], "mode": "agent-primary", "terms": [ "cost", "view", "stacked", "area", "chart", "driven", "endpoints", "splits", "spend", "across", "infrastructure", "lines", "cloudwatch", "pricing", "turbopuffer", "usage", "metrics", "times", "code", "resident", "rate", "card", "instance", "picker", "projecting", "impact", "changing", "types", "namespace", "attribution", "intentionally", "modeled", "timeseries", "compute", "computed", "storage", "writes", "queries", "uses", "endpoint" ] }, { "id": "dashboard#data", "kind": "section", "title": "Dashboard", "heading": "Data", "group": "Guides", "url": "/docs/dashboard#data", "summary": "The data view is the namespace inventory; drilling into a namespace shows schema and approximate row count, recent snapshot SHAs with histograms and skipped-field markers, current freshness signals, the governing Index policy fields, and a unified jobs panel. 
Two operator actions live here: trigger an on-demand snapshot and delete the namespace behind a confirm dialog.", "facts": [ { "kind": "code", "literal": "stable_as_of", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "is_stable", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "distanceMetric", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "cache.warming.threads", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "Index", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "POST /v2/namespaces/{ns}/snapshots", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "origin", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "auto", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "stored", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "cache", "chunkId": "dashboard#data" }, { "kind": "code", "literal": "DELETE /v2/namespaces/{ns}", "chunkId": "dashboard#data" } ], "sources": [ { "chunkId": "dashboard#data", "url": "/docs/dashboard#data", "anchor": "data" } ], "mode": "agent-primary", "terms": [ "data", "view", "namespace", "inventory", "drilling", "shows", "schema", "approximate", "count", "recent", "snapshot", "shas", "histograms", "skipped", "field", "markers", "current", "freshness", "signals", "governing", "index", "policy", "fields", "unified", "jobs", "panel", "operator", "actions", "live", "here", "trigger", "demand", "delete", "behind", "confirm", "dialog", "stable", "distancemetric", "cache", "warming" ] }, { "id": "dashboard#layout", "kind": "section", "title": "Dashboard", "heading": "Layout", "group": "Guides", "url": "/docs/dashboard#layout", "summary": "The dashboard groups everything into six tabs: console (what is happening now), data (what is in the indexes), read (query health), write (write flow and pipelines), cost (spend over time), and observe (the metrics catalog by family).", "facts": [], "sources": [ { "chunkId": "dashboard#layout", "url": 
"/docs/dashboard#layout", "anchor": "layout" } ], "mode": "agent-primary", "terms": [ "layout", "dashboard", "groups", "everything", "tabs", "console", "happening", "data", "indexes", "read", "query", "health", "write", "flow", "pipelines", "cost", "spend", "time", "observe", "metrics", "catalog", "family", "operators", "care", "about", "answers", "right", "glance", "gauges", "activity", "namespace", "inventory", "snapshot", "history", "schema", "queries", "healthy", "latency", "overhead", "aerospike" ] }, { "id": "dashboard#observe", "kind": "section", "title": "Dashboard", "heading": "Observe", "group": "Guides", "url": "/docs/dashboard#observe", "summary": "The observe view shows the full metrics catalog grouped by family, with each metric expanding into a sparkline that runs its PromQL through the range-query passthrough, used to confirm a behavior hypothesis without leaving the dashboard for an external tool.", "facts": [ { "kind": "code", "literal": "/v2/metrics/api/v1/query_range", "chunkId": "dashboard#observe" } ], "sources": [ { "chunkId": "dashboard#observe", "url": "/docs/dashboard#observe", "anchor": "observe" } ], "mode": "agent-primary", "terms": [ "observe", "view", "shows", "full", "metrics", "catalog", "grouped", "family", "metric", "expanding", "sparkline", "runs", "promql", "through", "range", "query", "passthrough", "confirm", "behavior", "hypothesis", "without", "leaving", "dashboard", "external", "tool", "turbopuffer", "cache", "fetch", "pipeline", "progress", "resource", "saturation", "expands", "corresponding", "queryrange", "surface", "operators", "need", "about", "grafana" ] }, { "id": "dashboard#operational-notes", "kind": "section", "title": "Dashboard", "heading": "Operational notes", "group": "Guides", "url": "/docs/dashboard#operational-notes", "summary": "Pipeline status is cached in-memory in the gateway to protect PostgreSQL during repeated dashboard or autoscaler polling; the dashboard treats a recoverable cache-cold state and a 
non-recoverable upstream failure as separate operator states; customer workloads never receive the dashboard URL, and the dashboard is intentionally read-mostly, with mutations gated behind CRD apply or confirm dialogs.", "facts": [ { "kind": "code", "literal": "PIPELINE_STATUS_CACHE_TTL_MS", "chunkId": "dashboard#operational-notes" }, { "kind": "code", "literal": "cache_cold", "chunkId": "dashboard#operational-notes" } ], "sources": [ { "chunkId": "dashboard#operational-notes", "url": "/docs/dashboard#operational-notes", "anchor": "operational-notes" } ], "mode": "agent-primary", "terms": [ "operational", "notes", "pipeline", "status", "cached", "memory", "gateway", "protect", "postgresql", "during", "repeated", "dashboard", "autoscaler", "polling", "treats", "recoverable", "cache", "cold", "state", "upstream", "failure", "separate", "operator", "states", "never", "receives", "customer", "workloads", "intentionally", "read", "mostly", "mutations", "gated", "behind", "apply", "confirm", "dialogs", "keda", "pipelinestatuscachettlms", "defaults" ] }, { "id": "dashboard#read", "kind": "section", "title": "Dashboard", "heading": "Read", "group": "Guides", "url": "/docs/dashboard#read", "summary": "The read view answers whether queries are healthy, pulling from query and cache metric families to show query latency percentiles, Layer-side overhead so operators can tell upstream from local slowness, per-namespace cache hit ratio, and cache pool depth, node state, and stop-writes as a silent-failure surface.", "facts": [ { "kind": "code", "literal": "layer_query_*", "chunkId": "dashboard#read" }, { "kind": "code", "literal": "query_overhead_seconds", "chunkId": "dashboard#read" }, { "kind": "code", "literal": "layer_cache_lookups_total", "chunkId": "dashboard#read" }, { "kind": "code", "literal": "layer_aerospike_op_duration_seconds{status=\"aerospike_stop_writes\"}", "chunkId": "dashboard#read" } ], "sources": [ { "chunkId": "dashboard#read", "url": "/docs/dashboard#read", 
"anchor": "read" } ], "mode": "agent-primary", "terms": [ "read", "view", "answers", "whether", "queries", "healthy", "pulling", "query", "cache", "metric", "families", "show", "latency", "percentiles", "layer", "side", "overhead", "operators", "tell", "upstream", "local", "slowness", "namespace", "ratio", "pool", "depth", "node", "state", "stop", "writes", "silent", "failure", "surface", "seconds", "lookups", "total", "aerospike", "duration", "status", "operator" ] }, { "id": "dashboard#write", "kind": "section", "title": "Dashboard", "heading": "Write", "group": "Guides", "url": "/docs/dashboard#write", "summary": "The write view is the pipeline operator surface showing pending/in-flight/failed counts per pipeline and UDF (the same numbers the autoscaler uses), per-stage counts, active claims with lease and heartbeat state, embed pool size, and reset/pause/resume controls; an infra sub-view leads with the logical compute pools above the node pools, and it is the first stop for PostgreSQL pressure, pointing operators to the failure-mode runbook before resizing queue state.", "facts": [ { "kind": "code", "literal": "pending", "chunkId": "dashboard#write" }, { "kind": "code", "literal": "embedding", "chunkId": "dashboard#write" }, { "kind": "code", "literal": "indexed", "chunkId": "dashboard#write" }, { "kind": "code", "literal": "failed", "chunkId": "dashboard#write" }, { "kind": "code", "literal": "worker_id", "chunkId": "dashboard#write" }, { "kind": "code", "literal": "/v2/udfs/{id}/{pause,resume,reset-failed}", "chunkId": "dashboard#write" }, { "kind": "code", "literal": "InfraRules/default", "chunkId": "dashboard#write" }, { "kind": "code", "literal": "maxReplicasPerWorkload", "chunkId": "dashboard#write" }, { "kind": "code", "literal": "spec.scaling.pool", "chunkId": "dashboard#write" }, { "kind": "code", "literal": "layer_pg_query_duration_seconds{status=\"pg_error\"}", "chunkId": "dashboard#write" } ], "sources": [ { "chunkId": "dashboard#write", "url": 
"/docs/dashboard#write", "anchor": "write" } ], "mode": "agent-primary", "terms": [ "write", "view", "pipeline", "operator", "surface", "showing", "pending", "flight", "failed", "counts", "same", "numbers", "autoscaler", "uses", "stage", "active", "claims", "lease", "heartbeat", "state", "embed", "pool", "size", "reset", "pause", "resume", "controls", "infra", "leads", "logical", "compute", "pools", "above", "node", "first", "stop", "postgresql", "pressure", "pointing", "operators" ] }, { "id": "document-model", "kind": "section", "title": "Document model", "heading": null, "group": "Overview", "url": "/docs/document-model", "summary": "A Layer document is a Turbopuffer row (id, attributes, optional vector) read and written through the pull-through cache, with Layer reserving an attribute prefix for its own bookkeeping that callers and UDFs must not set; the gateway manages an upsert-time stamp that holds the read-consistency cut and a shard attribute for scatter/gather, and editing reserved attributes directly breaks guarantees but degrades gracefully.", "facts": [ { "kind": "code", "literal": "_hevlayer_*", "chunkId": "document-model" }, { "kind": "code", "literal": "_hevlayer_upserted_at", "chunkId": "document-model" }, { "kind": "code", "literal": "_hevlayer_upserted_at <= watermark", "chunkId": "document-model" }, { "kind": "code", "literal": "_hevlayer_shard", "chunkId": "document-model" }, { "kind": "code", "literal": "xxh64(id) % shard_count", "chunkId": "document-model" }, { "kind": "code", "literal": "_hevlayer_", "chunkId": "document-model" } ], "sources": [ { "chunkId": "document-model", "url": "/docs/document-model", "anchor": null } ], "mode": "agent-primary", "terms": [ "layer", "document", "turbopuffer", "attributes", "optional", "vector", "read", "written", "through", "pull", "cache", "reserving", "attribute", "prefix", "bookkeeping", "callers", "udfs", "must", "gateway", "manages", "upsert", "time", "stamp", "holds", "consistency", "shard", 
"scatter", "gather", "editing", "reserved", "directly", "breaks", "guarantees", "degrades", "gracefully", "hevlayer", "upserted", "watermark", "xxh64", "count" ] }, { "id": "failure-modes", "kind": "section", "title": "Failure Modes", "heading": null, "group": "Operations", "url": "/docs/failure-modes", "summary": "Introduces how reads and writes degrade when the gateway, cache, or pipeline runs into trouble.", "facts": [], "sources": [ { "chunkId": "failure-modes", "url": "/docs/failure-modes", "anchor": null } ], "mode": "agent-primary", "terms": [ "introduces", "reads", "writes", "degrade", "gateway", "cache", "pipeline", "runs", "trouble" ] }, { "id": "failure-modes#read", "kind": "section", "title": "Failure Modes", "heading": "Read", "group": "Operations", "url": "/docs/failure-modes#read", "summary": "If the gateway is down, queries are down; the document cache is stateless and can scale to zero with no disruption, and no other component sits on the read path.", "facts": [], "sources": [ { "chunkId": "failure-modes#read", "url": "/docs/failure-modes#read", "anchor": "read" } ], "mode": "agent-primary", "terms": [ "read", "gateway", "down", "queries", "document", "cache", "stateless", "scale", "zero", "disruption", "other", "component", "sits", "path", "components", "impact" ] }, { "id": "failure-modes#write", "kind": "section", "title": "Failure Modes", "heading": "Write", "group": "Operations", "url": "/docs/failure-modes#write", "summary": "The primary write failure is a cache stop-writes during a multi-stage pipeline job: staged documents stay warm but lack vectors, and exceeding the cache drive allocation halts writes and degrades the pipeline to S3-backed chunk reads. 
Recovery works because chunk bodies are durable in S3 and pending state is in PostgreSQL, so workers resume after the cache refills; the Helm cache restarts on stop-writes and clears its backing file on start, making a pod restart a valid recovery action, with S3 and PostgreSQL as the durable recovery boundary.", "facts": [ { "kind": "code", "literal": "documentCache.autoRestartOnStopWrites: true", "chunkId": "failure-modes#write" }, { "kind": "code", "literal": "documentCache.storage.resetOnStart: true", "chunkId": "failure-modes#write" } ], "sources": [ { "chunkId": "failure-modes#write", "url": "/docs/failure-modes#write", "anchor": "write" } ], "mode": "agent-primary", "terms": [ "write", "primary", "failure", "cache", "stop", "writes", "during", "multi", "stage", "pipeline", "staged", "documents", "stay", "warm", "lack", "vectors", "exceeding", "drive", "allocation", "halts", "degrades", "backed", "chunk", "reads", "recovery", "works", "because", "bodies", "durable", "pending", "state", "postgresql", "workers", "resume", "after", "refills", "helm", "restarts", "clears", "backing" ] }, { "id": "guarantees", "kind": "section", "title": "No Guarantees", "heading": null, "group": "Overview", "url": "/docs/guarantees", "summary": "Layer does not offer hard guarantees; instead it makes a set of design, security, and distribution promises intended to make the software easy to use and durable, and this page tracks the status of those promises for infrastructure the customer is ultimately responsible for.", "facts": [ { "kind": "value", "literal": "Callout.astro", "chunkId": "guarantees" } ], "sources": [ { "chunkId": "guarantees", "url": "/docs/guarantees", "anchor": null } ], "mode": "agent-primary", "terms": [ "layer", "does", "offer", "hard", "guarantees", "instead", "makes", "design", "security", "distribution", "promises", "intended", "make", "software", "easy", "durable", "page", "tracks", "status", "those", "infrastructure", "customer", "ultimately", 
"responsible", "callout", "astro", "here", "commit", "best", "provide", "secure", "hands", "while", "distribute", "believe", "stand", "test", "time", "covers", "specific" ] }, { "id": "guarantees#commitments", "kind": "section", "title": "No Guarantees", "heading": "Commitments", "group": "Overview", "url": "/docs/guarantees#commitments", "summary": "The commitments are: the search index stays in the customer's own search system (Layer will not reimplement indexing), history is backed up to a customer-specified S3 bucket (format may change before v1.0), customer document and chunk data is served from NVMe, the docs are accurate (inaccuracy is a bug to report), observability is documented as code and tested, the gateway degrades gracefully, and Layer stays client-compatible except where divergence is a deliberate improvement. It notes Layer was built by one person orchestrating agentic coding tools.", "facts": [ { "kind": "value", "literal": "v1.0", "chunkId": "guarantees#commitments" } ], "sources": [ { "chunkId": "guarantees#commitments", "url": "/docs/guarantees#commitments", "anchor": "commitments" } ], "mode": "agent-primary", "terms": [ "commitments", "search", "index", "stays", "customer", "system", "layer", "reimplement", "indexing", "history", "backed", "specified", "bucket", "format", "change", "before", "document", "chunk", "data", "served", "nvme", "docs", "accurate", "inaccuracy", "report", "observability", "documented", "code", "tested", "gateway", "degrades", "gracefully", "client", "compatible", "except", "divergence", "deliberate", "improvement", "notes", "built" ] }, { "id": "hev-shop", "kind": "section", "title": "hev-shop", "heading": null, "group": "Guides", "url": "/docs/hev-shop", "summary": "hev-shop is a reference semantic-search application built on Layer, with source included for design-preview participants.", "facts": [ { "kind": "value", "literal": "LinkGrid.astro", "chunkId": "hev-shop" } ], "sources": [ { "chunkId": "hev-shop", "url": 
"/docs/hev-shop", "anchor": null } ], "mode": "agent-primary", "terms": [ "shop", "reference", "semantic", "search", "application", "built", "layer", "source", "included", "design", "preview", "participants", "linkgrid", "astro" ] }, { "id": "hev-shop#reference-starter-kit", "kind": "section", "title": "hev-shop", "heading": "Reference starter kit", "group": "Guides", "url": "/docs/hev-shop#reference-starter-kit", "summary": "Design-preview participants get private repo access and fork hev-shop as a starting point; the pieces worth knowing are the single HTTP client path to the gateway, the claim/heartbeat/stage/completion pipeline lifecycle, the search route preserving the freshness timestamp, and the Helm chart with pipeline-metric scaling and optional CPU/GPU node pools.", "facts": [ { "kind": "value", "literal": "pipeline.py", "chunkId": "hev-shop#reference-starter-kit" }, { "kind": "value", "literal": "route.ts", "chunkId": "hev-shop#reference-starter-kit" }, { "kind": "value", "literal": "backend.ts", "chunkId": "hev-shop#reference-starter-kit" } ], "sources": [ { "chunkId": "hev-shop#reference-starter-kit", "url": "/docs/hev-shop#reference-starter-kit", "anchor": "reference-starter-kit" } ], "mode": "agent-primary", "terms": [ "reference", "starter", "design", "preview", "participants", "private", "repo", "access", "fork", "shop", "starting", "point", "pieces", "worth", "knowing", "single", "http", "client", "path", "gateway", "claim", "heartbeat", "stage", "completion", "pipeline", "lifecycle", "search", "route", "preserving", "freshness", "timestamp", "helm", "chart", "metric", "scaling", "optional", "node", "pools", "backend", "their" ] }, { "id": "hev-shop#what-hev-shop-is", "kind": "section", "title": "hev-shop", "heading": "What hev-shop is", "group": "Guides", "url": "/docs/hev-shop#what-hev-shop-is", "summary": "hev-shop is a live semantic shopping app built on the gateway that turns a public product/review dataset into vectors written through Layer 
into Turbopuffer and serves search, filters, product pages, and review-derived tags; the storefront is public but the source ships only as a reference starter kit to design-preview participants, not as open source.", "facts": [ { "kind": "value", "literal": "hev-shop.com", "chunkId": "hev-shop#what-hev-shop-is" } ], "sources": [ { "chunkId": "hev-shop#what-hev-shop-is", "url": "/docs/hev-shop#what-hev-shop-is", "anchor": "what-hev-shop-is" } ], "mode": "agent-primary", "terms": [ "shop", "live", "semantic", "shopping", "built", "gateway", "turns", "public", "product", "review", "dataset", "vectors", "written", "through", "layer", "turbopuffer", "serves", "search", "filters", "pages", "derived", "tags", "storefront", "source", "ships", "only", "reference", "starter", "design", "preview", "participants", "open", "amazon", "reviews", "2023", "data", "writes", "running", "backed", "workload" ] }, { "id": "hev-shop#why-it-matters", "kind": "section", "title": "hev-shop", "heading": "Why it matters", "group": "Guides", "url": "/docs/hev-shop#why-it-matters", "summary": "The repo is not a generic ecommerce starter but a concrete application contract (stage, claim, embed, write vectors, query with freshness signals, let the gateway own the Turbopuffer edge) so teams start from a working pattern rather than a blank slate.", "facts": [], "sources": [ { "chunkId": "hev-shop#why-it-matters", "url": "/docs/hev-shop#why-it-matters", "anchor": "why-it-matters" } ], "mode": "agent-primary", "terms": [ "matters", "repo", "generic", "ecommerce", "starter", "concrete", "application", "contract", "stage", "claim", "embed", "write", "vectors", "query", "freshness", "signals", "gateway", "turbopuffer", "edge", "teams", "start", "working", "pattern", "rather", "blank", "slate", "makes", "work", "team", "starts" ] }, { "id": "index", "kind": "section", "title": "Introduction", "heading": null, "group": "Overview", "url": "/docs", "summary": "Layer is a gateway and function runtime for 
retrieval systems that scales compute over multi-stage indexing pipelines and runs functions across every row of an index, with durable state in object storage. The customer runs two server components in their cluster: a Rust gateway that transparently proxies Turbopuffer (adding fetch, scans, snapshots, result count, and cache/write/pipeline semantics, and driving the function runtime) and a Kubernetes operator. The stateless compute tier is fully elastic, an optional dashboard manages config through CRDs, a node autoscaler provisions nodes for bursty GPU work, and the backing services (document cache, indexing-state store, metrics store) are all open source.", "facts": [ { "kind": "value", "literal": "Apache-2", "chunkId": "index" }, { "kind": "value", "literal": "AGPL-3", "chunkId": "index" }, { "kind": "value", "literal": "Diagram.astro", "chunkId": "index" }, { "kind": "value", "literal": "0.1", "chunkId": "index" }, { "kind": "value", "literal": "karpenter.sh", "chunkId": "index" }, { "kind": "value", "literal": "Apache-2.0", "chunkId": "index" }, { "kind": "value", "literal": "aerospike.com", "chunkId": "index" }, { "kind": "value", "literal": "AGPL-3.0", "chunkId": "index" }, { "kind": "value", "literal": "www.postgresql.org", "chunkId": "index" }, { "kind": "value", "literal": "victoriametrics.com", "chunkId": "index" }, { "kind": "value", "literal": "2.0", "chunkId": "index" }, { "kind": "value", "literal": "3.0", "chunkId": "index" } ], "sources": [ { "chunkId": "index", "url": "/docs", "anchor": null } ], "mode": "agent-primary", "terms": [ "layer", "gateway", "function", "runtime", "retrieval", "systems", "scales", "compute", "multi", "stage", "indexing", "pipelines", "runs", "functions", "across", "every", "index", "durable", "state", "object", "storage", "customer", "server", "components", "their", "cluster", "rust", "transparently", "proxies", "turbopuffer", "adding", "fetch", "scans", "snapshots", "result", "count", "cache", "write", "pipeline", 
"semantics" ] }, { "id": "install", "kind": "section", "title": "Install", "heading": null, "group": "Operations", "url": "/docs/install", "summary": "A Layer install has two stages: Terraform provisions the required AWS resources (IAM, S3, ECR, networking, cost-read roles, and optionally a fresh cluster), and Helm installs the gateway, operator, and document cache into that cluster wired to those resources. Terraform can be skipped if the AWS resources already exist, at minimum an S3 bucket and gateway IAM role for snapshots and history.", "facts": [ { "kind": "value", "literal": "LinkGrid.astro", "chunkId": "install" } ], "sources": [ { "chunkId": "install", "url": "/docs/install", "anchor": null } ], "mode": "agent-primary", "terms": [ "layer", "install", "stages", "terraform", "provisions", "required", "resources", "networking", "cost", "read", "roles", "optionally", "fresh", "cluster", "helm", "installs", "gateway", "operator", "document", "cache", "wired", "those", "skipped", "already", "exist", "minimum", "bucket", "role", "snapshots", "history", "linkgrid", "astro", "bring", "environment", "runtime", "recommended", "path", "wires", "produced", "skip" ] }, { "id": "install#what-ships-in-01", "kind": "section", "title": "Install", "heading": "What ships in 0.1", "group": "Operations", "url": "/docs/install#what-ships-in-01", "summary": "The 0.1 install is single-tenant: one Helm release per environment, one Turbopuffer credential per release, and one S3 bucket for snapshot and history data, with multi-tenant gateway scoping on the later roadmap and not yet exposed.", "facts": [ { "kind": "value", "literal": "0.1", "chunkId": "install#what-ships-in-01" }, { "kind": "value", "literal": "0.2", "chunkId": "install#what-ships-in-01" } ], "sources": [ { "chunkId": "install#what-ships-in-01", "url": "/docs/install#what-ships-in-01", "anchor": "what-ships-in-01" } ], "mode": "agent-primary", "terms": [ "ships", "install", "single", "tenant", "helm", "release", 
"environment", "turbopuffer", "credential", "bucket", "snapshot", "history", "data", "multi", "gateway", "scoping", "later", "roadmap", "exposed", "layer" ] }, { "id": "install/helm", "kind": "section", "title": "Helm Install", "heading": null, "group": "Operations", "url": "/docs/install/helm", "summary": "The Helm chart installs the gateway, operator, and document cache into a cluster that already has the AWS resources from Terraform or equivalent customer-managed resources.", "facts": [ { "kind": "code", "literal": "infra/helm/layer/", "chunkId": "install/helm" }, { "kind": "value", "literal": "Callout.astro", "chunkId": "install/helm" } ], "sources": [ { "chunkId": "install/helm", "url": "/docs/install/helm", "anchor": null } ], "mode": "agent-primary", "terms": [ "helm", "chart", "installs", "gateway", "operator", "document", "cache", "cluster", "already", "resources", "terraform", "equivalent", "customer", "managed", "infra", "layer", "callout", "astro", "install", "kubernetes", "manage" ] }, { "id": "install/helm#install", "kind": "section", "title": "Helm Install", "heading": "Install", "group": "Operations", "url": "/docs/install/helm#install", "summary": "Install with a Helm upgrade-install into a dedicated namespace using a customer values file; the chart is not published to a public Helm repo in 0.1, so it is installed from the source path or an artifact provided during onboarding.", "facts": [ { "kind": "code", "literal": "helm upgrade --install layer ./infra/helm/layer \\\n --namespace layer --create-namespace \\\n -f values.customer.yaml", "chunkId": "install/helm#install" }, { "kind": "value", "literal": "0.1", "chunkId": "install/helm#install" } ], "sources": [ { "chunkId": "install/helm#install", "url": "/docs/install/helm#install", "anchor": "install" } ], "mode": "agent-primary", "terms": [ "install", "helm", "upgrade", "dedicated", "namespace", "customer", "values", "file", "chart", "published", "public", "repo", "installed", "source", "path", 
"artifact", "provided", "during", "onboarding", "layer", "infra", "create", "yaml", "repository" ] }, { "id": "install/helm#required-values", "kind": "section", "title": "Helm Install", "heading": "Required values", "group": "Operations", "url": "/docs/install/helm#required-values", "summary": "Most of the chart is opinionated defaults; the values that must be brought from outside are the Turbopuffer API key (the one credential Layer cannot generate), the gateway image URL, the client bearer token, the S3 bucket, and the gateway IAM role ARN, with optional values for index GC, the dashboard cost role, and public ingress.", "facts": [ { "kind": "code", "literal": "turbopuffer.apiKey", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "gateway.image", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "gateway.apiKey", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "Authorization: Bearer …", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "s3.bucket", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "serviceAccount.roleArn", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "gateway.indexGc.enabled", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "Index", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "gateway.indexGc.indexNamespace", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "operator.discovery.indexNamespace", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "dashboard.serviceAccount.roleArn", "chunkId": "install/helm#required-values" }, { "kind": "code", "literal": "ingress.host", "chunkId": "install/helm#required-values" } ], "sources": [ { "chunkId": "install/helm#required-values", "url": "/docs/install/helm#required-values", "anchor": "required-values" } ], "mode": "agent-primary", "terms": [ "required", 
"values", "most", "chart", "opinionated", "defaults", "must", "brought", "outside", "turbopuffer", "credential", "layer", "cannot", "generate", "gateway", "image", "client", "bearer", "token", "bucket", "role", "optional", "index", "dashboard", "cost", "public", "ingress", "apikey", "authorization", "serviceaccount", "rolearn", "indexgc", "enabled", "indexnamespace", "operator", "discovery", "host", "typical", "install", "only" ] }, { "id": "install/helm#what-gets-installed", "kind": "section", "title": "Helm Install", "heading": "What gets installed", "group": "Operations", "url": "/docs/install/helm#what-gets-installed", "summary": "Helm installs the Rust gateway for compatible routes plus Layer extensions, the operator that reconciles the four CRDs, and the cache (scale-to-zero by default), along with supporting service accounts, IAM bindings, ingress, and the CRDs.", "facts": [ { "kind": "code", "literal": "layer-gateway", "chunkId": "install/helm#what-gets-installed" }, { "kind": "code", "literal": "layer-operator", "chunkId": "install/helm#what-gets-installed" }, { "kind": "code", "literal": "layer-document-cache", "chunkId": "install/helm#what-gets-installed" } ], "sources": [ { "chunkId": "install/helm#what-gets-installed", "url": "/docs/install/helm#what-gets-installed", "anchor": "what-gets-installed" } ], "mode": "agent-primary", "terms": [ "gets", "installed", "helm", "installs", "rust", "gateway", "compatible", "routes", "plus", "layer", "extensions", "operator", "reconciles", "four", "crds", "cache", "scale", "zero", "default", "along", "supporting", "service", "accounts", "bindings", "ingress", "document", "turbopuffer", "fetch", "scans", "snapshots", "warm", "jobs", "pipeline", "state", "reconciler", "index", "infrarules", "function", "documented", "kubernetes" ] }, { "id": "install/terraform", "kind": "section", "title": "Terraform", "heading": null, "group": "Operations", "url": "/docs/install/terraform", "summary": "The Terraform configuration 
provisions the AWS resources the gateway and operator need, being opinionated about what Layer requires and conservative about surrounding resources; DNS zones and TLS certificates are opt-in since most installs bring existing DNS and TLS.", "facts": [ { "kind": "code", "literal": "infra/terraform/", "chunkId": "install/terraform" }, { "kind": "value", "literal": "Callout.astro", "chunkId": "install/terraform" } ], "sources": [ { "chunkId": "install/terraform", "url": "/docs/install/terraform", "anchor": null } ], "mode": "agent-primary", "terms": [ "terraform", "configuration", "provisions", "resources", "gateway", "operator", "need", "being", "opinionated", "about", "layer", "requires", "conservative", "surrounding", "zones", "certificates", "since", "most", "installs", "bring", "existing", "infra", "callout", "astro", "leaves", "needs", "behave", "correctly", "around", "route53", "hosted" ] }, { "id": "install/terraform#cluster-recommended", "kind": "section", "title": "Terraform", "heading": "Cluster: recommended", "group": "Operations", "url": "/docs/install/terraform#cluster-recommended", "summary": "Design-partner installs should use a fresh cluster unless there is a specific reason not to; the cluster path provisions the VPC, control plane and node groups, a node autoscaler, a load balancer controller, and shared persistent storage. 
Installs reusing an existing cluster must supply the functional prerequisites themselves (S3 bucket, gateway and dashboard IAM, registry access, node autoscaling, and a load balancer controller for public ingress).", "facts": [ { "kind": "value", "literal": "0.1", "chunkId": "install/terraform#cluster-recommended" } ], "sources": [ { "chunkId": "install/terraform#cluster-recommended", "url": "/docs/install/terraform#cluster-recommended", "anchor": "cluster-recommended" } ], "mode": "agent-primary", "terms": [ "cluster", "recommended", "design", "partner", "installs", "should", "fresh", "unless", "there", "specific", "reason", "path", "provisions", "control", "plane", "node", "groups", "autoscaler", "load", "balancer", "controller", "shared", "persistent", "storage", "reusing", "existing", "must", "supply", "functional", "prerequisites", "themselves", "bucket", "gateway", "dashboard", "registry", "access", "autoscaling", "public", "ingress", "bind" ] }, { "id": "install/terraform#cost-notes", "kind": "section", "title": "Terraform", "heading": "Cost notes", "group": "Operations", "url": "/docs/install/terraform#cost-notes", "summary": "The Terraform deploys a cost-efficient footprint with autoscaling for on-demand indexing; at-rest fixed costs are mostly the cluster, NAT, and small storage, indexing bursts scale worker nodes up and back down, and heavier read use cases may need more read-side infrastructure with sizing help available from the vendor.", "facts": [], "sources": [ { "chunkId": "install/terraform#cost-notes", "url": "/docs/install/terraform#cost-notes", "anchor": "cost-notes" } ], "mode": "agent-primary", "terms": [ "cost", "notes", "terraform", "deploys", "efficient", "footprint", "autoscaling", "demand", "indexing", "rest", "fixed", "costs", "mostly", "cluster", "small", "storage", "bursts", "scale", "worker", "nodes", "back", "down", "heavier", "read", "cases", "need", "more", "side", "infrastructure", "sizing", "help", "available", "vendor", 
"designed", "deploy", "work", "private", "workers", "third", "party" ] }, { "id": "install/terraform#outputs", "kind": "section", "title": "Terraform", "heading": "Outputs", "group": "Operations", "url": "/docs/install/terraform#outputs", "summary": "Terraform emits the values the Helm chart needs (S3 bucket name, gateway and dashboard IAM role ARNs, image URLs, and cluster metadata) to be passed into the Helm values file.", "facts": [], "sources": [ { "chunkId": "install/terraform#outputs", "url": "/docs/install/terraform#outputs", "anchor": "outputs" } ], "mode": "agent-primary", "terms": [ "outputs", "terraform", "emits", "values", "helm", "chart", "needs", "bucket", "name", "gateway", "dashboard", "role", "arns", "image", "urls", "cluster", "metadata", "passed", "file", "install", "irsa", "cost", "read", "pass", "these", "described" ] }, { "id": "install/terraform#what-it-sets-up", "kind": "section", "title": "Terraform", "heading": "What it sets up", "group": "Operations", "url": "/docs/install/terraform#what-it-sets-up", "summary": "Terraform sets up an S3 bucket for durable snapshot/history/clickstream storage, IAM roles and policies for gateway/dashboard/worker access, image repositories for the gateway/operator/customer function images, an optional fresh cluster with VPC and node pools, and optional DNS zones and certificates.", "facts": [ { "kind": "code", "literal": "manage_public_dns=true", "chunkId": "install/terraform#what-it-sets-up" } ], "sources": [ { "chunkId": "install/terraform#what-it-sets-up", "url": "/docs/install/terraform#what-it-sets-up", "anchor": "what-it-sets-up" } ], "mode": "agent-primary", "terms": [ "sets", "terraform", "bucket", "durable", "snapshot", "history", "clickstream", "storage", "roles", "policies", "gateway", "dashboard", "worker", "access", "image", "repositories", "operator", "customer", "function", "images", "optional", "fresh", "cluster", "node", "pools", "zones", "certificates", "manage", "public", "true", 
"resource", "purpose", "namespace", "snapshots", "search", "events", "irsa", "cost", "read", "registry" ] }, { "id": "kubernetes/function-crd", "kind": "section", "title": "Function CRD", "heading": null, "group": "Operations", "url": "/docs/kubernetes/function-crd", "summary": "The Function CRD declares row-preserving compute over an index; the operator creates worker resources while the gateway owns discovery, queueing, retries, leases, and writeback. The spec names target namespaces, input columns, the output attribute and kind, a discovery filter, the worker image and dispatch settings, a discovery/lease schedule, retry policy, triggers, and inline scaling.", "facts": [ { "kind": "code", "literal": "Function", "chunkId": "kubernetes/function-crd" } ], "sources": [ { "chunkId": "kubernetes/function-crd", "url": "/docs/kubernetes/function-crd", "anchor": null } ], "mode": "agent-primary", "terms": [ "function", "declares", "preserving", "compute", "index", "operator", "creates", "worker", "resources", "while", "gateway", "owns", "discovery", "queueing", "retries", "leases", "writeback", "spec", "names", "target", "namespaces", "input", "columns", "output", "attribute", "kind", "filter", "image", "dispatch", "settings", "lease", "schedule", "retry", "policy", "triggers", "inline", "scaling", "stateless", "user", "defined" ] }, { "id": "kubernetes/function-crd#output", "kind": "section", "title": "Function CRD", "heading": "Output", "group": "Operations", "url": "/docs/kubernetes/function-crd#output", "summary": "An embedding-kind output should declare its dimension so consumers can validate vector shape; outputs are patched onto the target row through the gateway, and deleting a Function garbage-collects operator-managed resources but does not delete already-written attributes.", "facts": [ { "kind": "code", "literal": "output.kind: embedding", "chunkId": "kubernetes/function-crd#output" }, { "kind": "code", "literal": "output.dim", "chunkId": 
"kubernetes/function-crd#output" } ], "sources": [ { "chunkId": "kubernetes/function-crd#output", "url": "/docs/kubernetes/function-crd#output", "anchor": "output" } ], "mode": "agent-primary", "terms": [ "output", "embedding", "kind", "should", "declare", "dimension", "consumers", "validate", "vector", "shape", "outputs", "patched", "onto", "target", "through", "gateway", "deleting", "function", "garbage", "collects", "operator", "managed", "resources", "does", "delete", "already", "written", "attributes", "kubernetes" ] }, { "id": "kubernetes/function-crd#scaling", "kind": "section", "title": "Function CRD", "heading": "Scaling", "group": "Operations", "url": "/docs/kubernetes/function-crd#scaling", "summary": "Function scaling is inline under the spec; in autoscale mode the operator emits a scaling object triggered by UDF queue depth, the named pool must exist in the cluster infra rules, and a replica maximum above the pool's per-workload ceiling is rejected in status.", "facts": [ { "kind": "code", "literal": "spec.scaling", "chunkId": "kubernetes/function-crd#scaling" }, { "kind": "code", "literal": "ScaledObject", "chunkId": "kubernetes/function-crd#scaling" }, { "kind": "code", "literal": "mode: autoscale", "chunkId": "kubernetes/function-crd#scaling" }, { "kind": "code", "literal": "layer_udf_queue_depth", "chunkId": "kubernetes/function-crd#scaling" }, { "kind": "code", "literal": "InfraRules/default", "chunkId": "kubernetes/function-crd#scaling" }, { "kind": "code", "literal": "maxReplicasPerWorkload", "chunkId": "kubernetes/function-crd#scaling" } ], "sources": [ { "chunkId": "kubernetes/function-crd#scaling", "url": "/docs/kubernetes/function-crd#scaling", "anchor": "scaling" } ], "mode": "agent-primary", "terms": [ "scaling", "function", "inline", "under", "spec", "autoscale", "mode", "operator", "emits", "object", "triggered", "queue", "depth", "named", "pool", "must", "exist", "cluster", "infra", "rules", "replica", "maximum", "above", "workload", 
"ceiling", "rejected", "status", "scaledobject", "layer", "infrarules", "default", "maxreplicasperworkload", "keda", "layerudfqueuedepth", "trigger", "selected", "maxima" ] }, { "id": "kubernetes/function-crd#selection", "kind": "section", "title": "Function CRD", "heading": "Selection", "group": "Operations", "url": "/docs/kubernetes/function-crd#selection", "summary": "Functions select namespaces either explicitly by target list or by label selector on Index resources, and the filter preserves arbitrary JSON including array-form upstream filters, stored as-is by the operator and evaluated by the gateway during discovery.", "facts": [ { "kind": "code", "literal": "targetNamespaces", "chunkId": "kubernetes/function-crd#selection" }, { "kind": "code", "literal": "indexSelector", "chunkId": "kubernetes/function-crd#selection" }, { "kind": "code", "literal": "Index", "chunkId": "kubernetes/function-crd#selection" }, { "kind": "code", "literal": "filter", "chunkId": "kubernetes/function-crd#selection" } ], "sources": [ { "chunkId": "kubernetes/function-crd#selection", "url": "/docs/kubernetes/function-crd#selection", "anchor": "selection" } ], "mode": "agent-primary", "terms": [ "selection", "functions", "select", "namespaces", "either", "explicitly", "target", "list", "label", "selector", "index", "resources", "filter", "preserves", "arbitrary", "json", "including", "array", "form", "upstream", "filters", "stored", "operator", "evaluated", "gateway", "during", "discovery", "targetnamespaces", "indexselector", "explicit", "labels", "should", "choose", "turbopuffer", "stores", "shape", "evaluates" ] }, { "id": "kubernetes/function-crd#worker", "kind": "section", "title": "Function CRD", "heading": "Worker", "group": "Operations", "url": "/docs/kubernetes/function-crd#worker", "summary": "The worker block sets the image, dispatch mode (pull for SDK claim/poll workers, push for HTTP workers), push port, batch size, call timeout, and an optional pod-level merge patch; pull 
dispatch creates a Deployment while push dispatch also creates a Service and readiness probe.", "facts": [ { "kind": "code", "literal": "image", "chunkId": "kubernetes/function-crd#worker" }, { "kind": "code", "literal": "dispatch", "chunkId": "kubernetes/function-crd#worker" }, { "kind": "code", "literal": "pull", "chunkId": "kubernetes/function-crd#worker" }, { "kind": "code", "literal": "push", "chunkId": "kubernetes/function-crd#worker" }, { "kind": "code", "literal": "/run", "chunkId": "kubernetes/function-crd#worker" }, { "kind": "code", "literal": "port", "chunkId": "kubernetes/function-crd#worker" }, { "kind": "code", "literal": "batchSize", "chunkId": "kubernetes/function-crd#worker" }, { "kind": "code", "literal": "timeoutSeconds", "chunkId": "kubernetes/function-crd#worker" }, { "kind": "code", "literal": "podSpec", "chunkId": "kubernetes/function-crd#worker" } ], "sources": [ { "chunkId": "kubernetes/function-crd#worker", "url": "/docs/kubernetes/function-crd#worker", "anchor": "worker" } ], "mode": "agent-primary", "terms": [ "worker", "block", "sets", "image", "dispatch", "mode", "pull", "claim", "poll", "workers", "push", "http", "port", "batch", "size", "call", "timeout", "optional", "level", "merge", "patch", "creates", "deployment", "while", "also", "service", "readiness", "probe", "batchsize", "timeoutseconds", "podspec", "field", "purpose", "rows" ] }, { "id": "kubernetes/index-crd", "kind": "section", "title": "Index CRD", "heading": null, "group": "Operations", "url": "/docs/kubernetes/index-crd", "summary": "An Index represents one namespace exposed through the gateway, declaring the backend, snapshot policy, cache posture, consistency mode, and access metadata.", "facts": [ { "kind": "code", "literal": "Index", "chunkId": "kubernetes/index-crd" } ], "sources": [ { "chunkId": "kubernetes/index-crd", "url": "/docs/kubernetes/index-crd", "anchor": null } ], "mode": "agent-primary", "terms": [ "index", "represents", "namespace", "exposed", 
"through", "gateway", "declaring", "backend", "snapshot", "policy", "cache", "posture", "consistency", "mode", "access", "metadata", "declarative", "representation", "managed", "layer", "declares", "apiversion", "hevlayer", "kind", "name", "products", "spec", "turbopuffer", "region", "east", "distancemetric", "cosinedistance", "labels", "shop", "tags", "catalog", "interval", "retention", "never", "facetfields" ] }, { "id": "kubernetes/index-crd#backend", "kind": "section", "title": "Index CRD", "heading": "Backend", "group": "Operations", "url": "/docs/kubernetes/index-crd#backend", "summary": "The backend block sets the backend kind (Turbopuffer in 0.1), the region, an optional upstream namespace override defaulting to the Index name, and the vector distance metric.", "facts": [ { "kind": "code", "literal": "backend.kind", "chunkId": "kubernetes/index-crd#backend" }, { "kind": "code", "literal": "turbopuffer", "chunkId": "kubernetes/index-crd#backend" }, { "kind": "code", "literal": "backend.region", "chunkId": "kubernetes/index-crd#backend" }, { "kind": "code", "literal": "backend.namespace", "chunkId": "kubernetes/index-crd#backend" }, { "kind": "code", "literal": "backend.distanceMetric", "chunkId": "kubernetes/index-crd#backend" }, { "kind": "code", "literal": "cosine_distance", "chunkId": "kubernetes/index-crd#backend" }, { "kind": "value", "literal": "0.1", "chunkId": "kubernetes/index-crd#backend" } ], "sources": [ { "chunkId": "kubernetes/index-crd#backend", "url": "/docs/kubernetes/index-crd#backend", "anchor": "backend" } ], "mode": "agent-primary", "terms": [ "backend", "block", "sets", "kind", "turbopuffer", "region", "optional", "upstream", "namespace", "override", "defaulting", "index", "name", "vector", "distance", "metric", "distancemetric", "cosine", "field", "purpose", "runtime", "identifier", "defaults", "default", "cosinedistance" ] }, { "id": "kubernetes/index-crd#cache-policy", "kind": "section", "title": "Index CRD", "heading": "Cache 
policy", "group": "Operations", "url": "/docs/kubernetes/index-crd#cache-policy", "summary": "The cache policy sets a warming thread count default while the cache remains ephemeral and durable snapshot history stays in S3.", "facts": [ { "kind": "code", "literal": "cache.warming.threads", "chunkId": "kubernetes/index-crd#cache-policy" } ], "sources": [ { "chunkId": "kubernetes/index-crd#cache-policy", "url": "/docs/kubernetes/index-crd#cache-policy", "anchor": "cache-policy" } ], "mode": "agent-primary", "terms": [ "cache", "policy", "sets", "warming", "thread", "count", "default", "while", "remains", "ephemeral", "durable", "snapshot", "history", "stays", "threads", "defaults", "aerospike" ] }, { "id": "kubernetes/index-crd#snapshot-policy", "kind": "section", "title": "Index CRD", "heading": "Snapshot policy", "group": "Operations", "url": "/docs/kubernetes/index-crd#snapshot-policy", "summary": "The snapshot policy's facet-fields list is the user-facing source of fields the gateway materializes into durable snapshots, with retention defaulting to never in 0.1 because automatic snapshot garbage collection has not shipped.", "facts": [ { "kind": "code", "literal": "snapshot.facetFields", "chunkId": "kubernetes/index-crd#snapshot-policy" }, { "kind": "code", "literal": "retention", "chunkId": "kubernetes/index-crd#snapshot-policy" }, { "kind": "code", "literal": "never", "chunkId": "kubernetes/index-crd#snapshot-policy" }, { "kind": "value", "literal": "0.1", "chunkId": "kubernetes/index-crd#snapshot-policy" } ], "sources": [ { "chunkId": "kubernetes/index-crd#snapshot-policy", "url": "/docs/kubernetes/index-crd#snapshot-policy", "anchor": "snapshot-policy" } ], "mode": "agent-primary", "terms": [ "snapshot", "policy", "facet", "fields", "list", "user", "facing", "source", "gateway", "materializes", "durable", "snapshots", "retention", "defaulting", "never", "because", "automatic", "garbage", "collection", "shipped", "facetfields", "defaults" ] }, { "id": 
"kubernetes/index-crd#status", "kind": "section", "title": "Index CRD", "heading": "Status", "group": "Operations", "url": "/docs/kubernetes/index-crd#status", "summary": "The operator reports observed generation, snapshot scheduling metadata, metadata sync state, and conditions on the Index status.", "facts": [], "sources": [ { "chunkId": "kubernetes/index-crd#status", "url": "/docs/kubernetes/index-crd#status", "anchor": "status" } ], "mode": "agent-primary", "terms": [ "status", "operator", "reports", "observed", "generation", "snapshot", "scheduling", "metadata", "sync", "state", "conditions", "index" ] }, { "id": "kubernetes/operator", "kind": "section", "title": "Operator Overview", "heading": null, "group": "Operations", "url": "/docs/kubernetes/operator", "summary": "The operator manages declarative state for a deployment, monitoring index changes and managing scaling through custom resource definitions; the gateway owns the read/write path while the operator owns everything expressed as desired cluster state, such as which indexes exist, how worker pools scale, and which functions run against which indexes.", "facts": [ { "kind": "code", "literal": "layer-operator", "chunkId": "kubernetes/operator" } ], "sources": [ { "chunkId": "kubernetes/operator", "url": "/docs/kubernetes/operator", "anchor": null } ], "mode": "agent-primary", "terms": [ "operator", "manages", "declarative", "state", "deployment", "monitoring", "index", "changes", "managing", "scaling", "through", "custom", "resource", "definitions", "gateway", "owns", "read", "write", "path", "while", "everything", "expressed", "desired", "cluster", "such", "indexes", "exist", "worker", "pools", "scale", "functions", "against", "layer", "reconciles", "relates", "serves", "crucial", "does", "abstractions", "known" ] }, { "id": "kubernetes/operator#crds", "kind": "section", "title": "Operator Overview", "heading": "CRDs", "group": "Operations", "url": "/docs/kubernetes/operator#crds", "summary": "The 
operator reconciles four resource kinds, each on its own page: Index (one per managed namespace), InfraRules (cluster-wide compute pools, cache rules, and shared scaling policy), Pipeline (staged work that changes row count), and Function (stateless functions that read and write attributes).", "facts": [], "sources": [ { "chunkId": "kubernetes/operator#crds", "url": "/docs/kubernetes/operator#crds", "anchor": "crds" } ], "mode": "agent-primary", "terms": [ "crds", "operator", "reconciles", "four", "resource", "kinds", "page", "index", "managed", "namespace", "infrarules", "cluster", "wide", "compute", "pools", "cache", "rules", "shared", "scaling", "policy", "pipeline", "staged", "work", "changes", "count", "function", "stateless", "functions", "read", "write", "attributes", "documented", "turbopuffer", "gateway", "should", "manage", "document", "user", "defined" ] }, { "id": "kubernetes/operator#relationship-to-the-gateway", "kind": "section", "title": "Operator Overview", "heading": "Relationship to the gateway", "group": "Operations", "url": "/docs/kubernetes/operator#relationship-to-the-gateway", "summary": "The gateway and operator are decoupled and neither sits in the other's hot path, so the gateway keeps serving even if the operator restarts or lags; the link is one-directional and read-only, with the gateway reading CRD status to inform what it serves but never writing CRDs, which are authored by the customer and reconciled by the operator.", "facts": [], "sources": [ { "chunkId": "kubernetes/operator#relationship-to-the-gateway", "url": "/docs/kubernetes/operator#relationship-to-the-gateway", "anchor": "relationship-to-the-gateway" } ], "mode": "agent-primary", "terms": [ "relationship", "gateway", "operator", "decoupled", "neither", "sits", "other", "path", "keeps", "serving", "even", "restarts", "lags", "link", "directional", "read", "only", "reading", "status", "inform", "serves", "never", "writing", "crds", "authored", "customer", "reconciled", 
"reconciles", "declarative", "state", "write", "restarted", "lagging", "between", "some", "features", "reads", "indexes", "exist", "worker" ] }, { "id": "kubernetes/operator#scheduling-and-node-pools", "kind": "section", "title": "Operator Overview", "heading": "Scheduling and node pools", "group": "Operations", "url": "/docs/kubernetes/operator#scheduling-and-node-pools", "summary": "The operator does not schedule pipeline and function pods onto general capacity; each compute pool pins to a dedicated labeled node pool via node selectors and tolerations so CPU and GPU work lands on the right isolated nodes. The shipped defaults assume the node autoscaler's pool label but any labeled pool works, and this is configured once on the cluster infra-rules object rather than per workload.", "facts": [ { "kind": "code", "literal": "nodeSelector", "chunkId": "kubernetes/operator#scheduling-and-node-pools" }, { "kind": "code", "literal": "tolerations", "chunkId": "kubernetes/operator#scheduling-and-node-pools" }, { "kind": "code", "literal": "karpenter.sh/nodepool", "chunkId": "kubernetes/operator#scheduling-and-node-pools" }, { "kind": "code", "literal": "InfraRules/default", "chunkId": "kubernetes/operator#scheduling-and-node-pools" }, { "kind": "value", "literal": "karpenter.sh", "chunkId": "kubernetes/operator#scheduling-and-node-pools" } ], "sources": [ { "chunkId": "kubernetes/operator#scheduling-and-node-pools", "url": "/docs/kubernetes/operator#scheduling-and-node-pools", "anchor": "scheduling-and-node-pools" } ], "mode": "agent-primary", "terms": [ "scheduling", "node", "pools", "operator", "does", "schedule", "pipeline", "function", "pods", "onto", "general", "capacity", "compute", "pool", "pins", "dedicated", "labeled", "selectors", "tolerations", "work", "lands", "right", "isolated", "nodes", "shipped", "defaults", "assume", "autoscaler", "label", "works", "configured", "once", "cluster", "infra", "rules", "object", "rather", "workload", "nodeselector", 
"karpenter" ] }, { "id": "kubernetes/pipeline-crd", "kind": "section", "title": "Pipeline CRD", "heading": null, "group": "Operations", "url": "/docs/kubernetes/pipeline-crd", "summary": "The Pipeline CRD declares worker-owned indexing work whose row count can change between input and output (ingestion, chunking, fan-out), as opposed to a Function used when existing rows gain a derived attribute without changing count; Pipelines and Functions share the same worker and scaling envelopes, with the cluster infra-rules object owning placement and pool limits and each workload choosing a pool. The spec names a target namespace, an open source reference, the worker, and inline scaling.", "facts": [ { "kind": "code", "literal": "Pipeline", "chunkId": "kubernetes/pipeline-crd" }, { "kind": "code", "literal": "spec.worker", "chunkId": "kubernetes/pipeline-crd" }, { "kind": "code", "literal": "spec.scaling", "chunkId": "kubernetes/pipeline-crd" }, { "kind": "code", "literal": "InfraRules/default", "chunkId": "kubernetes/pipeline-crd" } ], "sources": [ { "chunkId": "kubernetes/pipeline-crd", "url": "/docs/kubernetes/pipeline-crd", "anchor": null } ], "mode": "agent-primary", "terms": [ "pipeline", "declares", "worker", "owned", "indexing", "work", "whose", "count", "change", "between", "input", "output", "ingestion", "chunking", "opposed", "function", "existing", "rows", "gain", "derived", "attribute", "without", "changing", "pipelines", "functions", "share", "same", "scaling", "envelopes", "cluster", "infra", "rules", "object", "owning", "placement", "pool", "limits", "workload", "choosing", "spec" ] }, { "id": "kubernetes/pipeline-crd#scaling", "kind": "section", "title": "Pipeline CRD", "heading": "Scaling", "group": "Operations", "url": "/docs/kubernetes/pipeline-crd#scaling", "summary": "Pipeline scaling sets a pool that must exist in the cluster infra rules and a mode: autoscale creates a scaling object backed by pipeline queue depth, fixed pins the deployment to the 
minimum, and disabled (or pausing) scales it to zero.", "facts": [ { "kind": "code", "literal": "scaling:\n pool: cpu\n mode: autoscale\n replicas:\n min: 0\n max: 8", "chunkId": "kubernetes/pipeline-crd#scaling" }, { "kind": "code", "literal": "spec.scaling.pool", "chunkId": "kubernetes/pipeline-crd#scaling" }, { "kind": "code", "literal": "InfraRules/default", "chunkId": "kubernetes/pipeline-crd#scaling" }, { "kind": "code", "literal": "mode: autoscale", "chunkId": "kubernetes/pipeline-crd#scaling" }, { "kind": "code", "literal": "ScaledObject", "chunkId": "kubernetes/pipeline-crd#scaling" }, { "kind": "code", "literal": "mode: fixed", "chunkId": "kubernetes/pipeline-crd#scaling" }, { "kind": "code", "literal": "replicas.min", "chunkId": "kubernetes/pipeline-crd#scaling" }, { "kind": "code", "literal": "spec.paused: true", "chunkId": "kubernetes/pipeline-crd#scaling" } ], "sources": [ { "chunkId": "kubernetes/pipeline-crd#scaling", "url": "/docs/kubernetes/pipeline-crd#scaling", "anchor": "scaling" } ], "mode": "agent-primary", "terms": [ "scaling", "pipeline", "sets", "pool", "must", "exist", "cluster", "infra", "rules", "mode", "autoscale", "creates", "object", "backed", "queue", "depth", "fixed", "pins", "deployment", "minimum", "disabled", "pausing", "scales", "zero", "replicas", "spec", "infrarules", "default", "scaledobject", "paused", "true", "name", "keda", "also", "worker" ] }, { "id": "kubernetes/pipeline-crd#source", "kind": "section", "title": "Pipeline CRD", "heading": "Source", "group": "Operations", "url": "/docs/kubernetes/pipeline-crd#source", "summary": "The source reference is intentionally open JSON so operators can record the external feed (queue, stream, object events, partner API, or migration); the operator passes it through as declarative metadata and the worker image owns source-specific behavior.", "facts": [ { "kind": "code", "literal": "spec.sourceRef", "chunkId": "kubernetes/pipeline-crd#source" } ], "sources": [ { "chunkId": 
"kubernetes/pipeline-crd#source", "url": "/docs/kubernetes/pipeline-crd#source", "anchor": "source" } ], "mode": "agent-primary", "terms": [ "source", "reference", "intentionally", "open", "json", "operators", "record", "external", "feed", "queue", "stream", "object", "events", "partner", "migration", "operator", "passes", "through", "declarative", "metadata", "worker", "image", "owns", "specific", "behavior", "spec", "sourceref", "feeds", "kafka" ] }, { "id": "kubernetes/pipeline-crd#status", "kind": "section", "title": "Pipeline CRD", "heading": "Status", "group": "Operations", "url": "/docs/kubernetes/pipeline-crd#status", "summary": "The operator reports managed object references and readiness conditions on the Pipeline status, while queue counts and worker progress are served by the gateway pipeline status API.", "facts": [], "sources": [ { "chunkId": "kubernetes/pipeline-crd#status", "url": "/docs/kubernetes/pipeline-crd#status", "anchor": "status" } ], "mode": "agent-primary", "terms": [ "status", "operator", "reports", "managed", "object", "references", "readiness", "conditions", "pipeline", "while", "queue", "counts", "worker", "progress", "served", "gateway" ] }, { "id": "kubernetes/pipeline-crd#target", "kind": "section", "title": "Pipeline CRD", "heading": "Target", "group": "Operations", "url": "/docs/kubernetes/pipeline-crd#target", "summary": "The target namespace is the namespace the pipeline writes, and the gateway pipeline API owns document state, chunks, and vector writes for that target namespace.", "facts": [ { "kind": "code", "literal": "spec.target.namespace", "chunkId": "kubernetes/pipeline-crd#target" } ], "sources": [ { "chunkId": "kubernetes/pipeline-crd#target", "url": "/docs/kubernetes/pipeline-crd#target", "anchor": "target" } ], "mode": "agent-primary", "terms": [ "target", "namespace", "pipeline", "writes", "gateway", "owns", "document", "state", "chunks", "vector", "spec", "turbopuffer" ] }, { "id": "kubernetes/pipeline-crd#worker", 
"kind": "section", "title": "Pipeline CRD", "heading": "Worker", "group": "Operations", "url": "/docs/kubernetes/pipeline-crd#worker", "summary": "The Pipeline worker block sets the image, batch size, call timeout, and an optional pod-level merge patch, with the operator creating one Deployment per Pipeline.", "facts": [ { "kind": "code", "literal": "image", "chunkId": "kubernetes/pipeline-crd#worker" }, { "kind": "code", "literal": "batchSize", "chunkId": "kubernetes/pipeline-crd#worker" }, { "kind": "code", "literal": "timeoutSeconds", "chunkId": "kubernetes/pipeline-crd#worker" }, { "kind": "code", "literal": "podSpec", "chunkId": "kubernetes/pipeline-crd#worker" } ], "sources": [ { "chunkId": "kubernetes/pipeline-crd#worker", "url": "/docs/kubernetes/pipeline-crd#worker", "anchor": "worker" } ], "mode": "agent-primary", "terms": [ "worker", "pipeline", "block", "sets", "image", "batch", "size", "call", "timeout", "optional", "level", "merge", "patch", "operator", "creating", "deployment", "batchsize", "timeoutseconds", "podspec", "field", "purpose", "work", "items", "creates" ] }, { "id": "kubernetes/scaling-crd", "kind": "section", "title": "InfraRules CRD", "heading": null, "group": "Operations", "url": "/docs/kubernetes/scaling-crd", "summary": "InfraRules is the cluster-scoped policy object for Layer-managed runtime infrastructure, with exactly one object in 0.1; Pipelines and Functions do not reference a separate autoscaling resource but set inline scaling and choose a compute pool defined here.", "facts": [ { "kind": "code", "literal": "InfraRules", "chunkId": "kubernetes/scaling-crd" }, { "kind": "code", "literal": "InfraRules/default", "chunkId": "kubernetes/scaling-crd" }, { "kind": "code", "literal": "spec.scaling", "chunkId": "kubernetes/scaling-crd" }, { "kind": "code", "literal": "InfraRules/default.spec.computePools", "chunkId": "kubernetes/scaling-crd" }, { "kind": "value", "literal": "0.1", "chunkId": "kubernetes/scaling-crd" } ], "sources": [ { 
"chunkId": "kubernetes/scaling-crd", "url": "/docs/kubernetes/scaling-crd", "anchor": null } ], "mode": "agent-primary", "terms": [ "infrarules", "cluster", "scoped", "policy", "object", "layer", "managed", "runtime", "infrastructure", "exactly", "pipelines", "functions", "reference", "separate", "autoscaling", "resource", "inline", "scaling", "choose", "compute", "pool", "defined", "here", "default", "spec", "computepools", "wide", "pools", "document", "cache", "rules", "workload", "surface" ] }, { "id": "kubernetes/scaling-crd#compute-pools", "kind": "section", "title": "InfraRules CRD", "heading": "Compute pools", "group": "Operations", "url": "/docs/kubernetes/scaling-crd#compute-pools", "summary": "Each compute pool declares a name referenced by workloads, a class label, an optional GPU type, node selectors and tolerations applied to chosen worker pods, container resources, and a hard per-workload replica ceiling; a workload naming an unknown pool or exceeding the ceiling is left unready with a status condition.", "facts": [ { "kind": "code", "literal": "name", "chunkId": "kubernetes/scaling-crd#compute-pools" }, { "kind": "code", "literal": "spec.scaling.pool", "chunkId": "kubernetes/scaling-crd#compute-pools" }, { "kind": "code", "literal": "kind", "chunkId": "kubernetes/scaling-crd#compute-pools" }, { "kind": "code", "literal": "cpu", "chunkId": "kubernetes/scaling-crd#compute-pools" }, { "kind": "code", "literal": "gpu", "chunkId": "kubernetes/scaling-crd#compute-pools" }, { "kind": "code", "literal": "gpuType", "chunkId": "kubernetes/scaling-crd#compute-pools" }, { "kind": "code", "literal": "nodeSelector", "chunkId": "kubernetes/scaling-crd#compute-pools" }, { "kind": "code", "literal": "tolerations", "chunkId": "kubernetes/scaling-crd#compute-pools" }, { "kind": "code", "literal": "resources", "chunkId": "kubernetes/scaling-crd#compute-pools" }, { "kind": "code", "literal": "maxReplicasPerWorkload", "chunkId": "kubernetes/scaling-crd#compute-pools" } ], 
"sources": [ { "chunkId": "kubernetes/scaling-crd#compute-pools", "url": "/docs/kubernetes/scaling-crd#compute-pools", "anchor": "compute-pools" } ], "mode": "agent-primary", "terms": [ "compute", "pools", "pool", "declares", "name", "referenced", "workloads", "class", "label", "optional", "type", "node", "selectors", "tolerations", "applied", "chosen", "worker", "pods", "container", "resources", "hard", "workload", "replica", "ceiling", "naming", "unknown", "exceeding", "left", "unready", "status", "condition", "spec", "scaling", "kind", "gputype", "nodeselector", "maxreplicasperworkload", "field", "purpose", "pipeline" ] }, { "id": "kubernetes/scaling-crd#document-cache-rules", "kind": "section", "title": "InfraRules CRD", "heading": "Document cache rules", "group": "Operations", "url": "/docs/kubernetes/scaling-crd#document-cache-rules", "summary": "The document-cache block captures the operator-owned cache envelope (capacity, replication factor, node count); in 0.1 Helm still renders the cache scaling object directly while this section is the declared policy shape the operator reports and validates against.", "facts": [ { "kind": "code", "literal": "documentCache", "chunkId": "kubernetes/scaling-crd#document-cache-rules" }, { "kind": "code", "literal": "InfraRules", "chunkId": "kubernetes/scaling-crd#document-cache-rules" }, { "kind": "value", "literal": "0.1", "chunkId": "kubernetes/scaling-crd#document-cache-rules" } ], "sources": [ { "chunkId": "kubernetes/scaling-crd#document-cache-rules", "url": "/docs/kubernetes/scaling-crd#document-cache-rules", "anchor": "document-cache-rules" } ], "mode": "agent-primary", "terms": [ "document", "cache", "rules", "block", "captures", "operator", "owned", "envelope", "capacity", "replication", "factor", "node", "count", "helm", "still", "renders", "scaling", "object", "directly", "while", "section", "declared", "policy", "shape", "reports", "validates", "against", "documentcache", "infrarules", "keda" ] }, { "id": 
"kubernetes/scaling-crd#infrarules", "kind": "section", "title": "InfraRules CRD", "heading": "InfraRules", "group": "Operations", "url": "/docs/kubernetes/scaling-crd#infrarules", "summary": "The InfraRules object (which must be named default and can be rendered by Helm) declares compute pools, the document-cache envelope, and node scaling for the cluster's Layer infrastructure.", "facts": [ { "kind": "code", "literal": "default", "chunkId": "kubernetes/scaling-crd#infrarules" }, { "kind": "code", "literal": "operator.infraRules.create=true", "chunkId": "kubernetes/scaling-crd#infrarules" } ], "sources": [ { "chunkId": "kubernetes/scaling-crd#infrarules", "url": "/docs/kubernetes/scaling-crd#infrarules", "anchor": "infrarules" } ], "mode": "agent-primary", "terms": [ "infrarules", "object", "must", "named", "default", "rendered", "helm", "declares", "compute", "pools", "document", "cache", "envelope", "node", "scaling", "cluster", "layer", "infrastructure", "operator", "create", "true", "apiversion", "hevlayer", "v1alpha1", "kind", "metadata", "name", "spec", "computepools", "maxreplicasperworkload", "nodeselector", "karpenter", "nodepool", "tolerations", "resources", "requests", "500m", "memory", "512mi", "limits" ] }, { "id": "kubernetes/scaling-crd#workload-scaling", "kind": "section", "title": "InfraRules CRD", "heading": "Workload scaling", "group": "Operations", "url": "/docs/kubernetes/scaling-crd#workload-scaling", "summary": "Workload scaling chooses a pool and a mode: autoscale emits a scaling object letting queue depth scale the deployment between min and max, fixed sets replicas to the minimum with no scaling object, and disabled (or pausing) scales to zero; keep a cold-start-heavy worker warm by autoscaling with a minimum of one.", "facts": [ { "kind": "code", "literal": "scaling:\n pool: cpu\n mode: autoscale\n replicas:\n min: 0\n max: 4", "chunkId": "kubernetes/scaling-crd#workload-scaling" }, { "kind": "code", "literal": "autoscale", "chunkId": 
"kubernetes/scaling-crd#workload-scaling" }, { "kind": "code", "literal": "ScaledObject", "chunkId": "kubernetes/scaling-crd#workload-scaling" }, { "kind": "code", "literal": "min", "chunkId": "kubernetes/scaling-crd#workload-scaling" }, { "kind": "code", "literal": "max", "chunkId": "kubernetes/scaling-crd#workload-scaling" }, { "kind": "code", "literal": "fixed", "chunkId": "kubernetes/scaling-crd#workload-scaling" }, { "kind": "code", "literal": "replicas.min", "chunkId": "kubernetes/scaling-crd#workload-scaling" }, { "kind": "code", "literal": "disabled", "chunkId": "kubernetes/scaling-crd#workload-scaling" }, { "kind": "code", "literal": "mode: autoscale", "chunkId": "kubernetes/scaling-crd#workload-scaling" }, { "kind": "code", "literal": "replicas.min: 1", "chunkId": "kubernetes/scaling-crd#workload-scaling" } ], "sources": [ { "chunkId": "kubernetes/scaling-crd#workload-scaling", "url": "/docs/kubernetes/scaling-crd#workload-scaling", "anchor": "workload-scaling" } ], "mode": "agent-primary", "terms": [ "workload", "scaling", "chooses", "pool", "mode", "autoscale", "emits", "object", "letting", "queue", "depth", "scale", "deployment", "between", "fixed", "sets", "replicas", "minimum", "disabled", "pausing", "scales", "zero", "keep", "cold", "start", "heavy", "worker", "warm", "autoscaling", "scaledobject", "behavior", "emit", "keda", "emitted", "paused", "workloads", "also" ] }, { "id": "limits", "kind": "section", "title": "Limits", "heading": null, "group": "Overview", "url": "/docs/limits", "summary": "Layer inherits ceilings from its bundled components that will lift as demand grows: a single-node cache, a maximum number of Turbopuffer namespaces, a maximum cache size, and a distinct-value cap per scan facet field (fields over the cap are reported as skipped rather than partially materialized so emitted fields are always complete).", "facts": [ { "kind": "code", "literal": "fields_skipped[]", "chunkId": "limits" }, { "kind": "code", "literal": 
"fields[]", "chunkId": "limits" } ], "sources": [ { "chunkId": "limits", "url": "/docs/limits", "anchor": null } ], "mode": "agent-primary", "terms": [ "layer", "inherits", "ceilings", "bundled", "components", "lift", "demand", "grows", "single", "node", "cache", "maximum", "number", "turbopuffer", "namespaces", "size", "distinct", "value", "scan", "facet", "field", "fields", "reported", "skipped", "rather", "partially", "materialized", "emitted", "always", "complete", "current", "inherited", "ship", "limited", "certain", "constraints", "underlying", "these", "increases", "aerospike" ] }, { "id": "limits#no-limits", "kind": "section", "title": "Limits", "heading": "No limits", "group": "Overview", "url": "/docs/limits#no-limits", "summary": "Several things have no enforced ceiling but practical limits under load: CRD instance counts (bounded by cluster throughput), snapshot/search/clickstream history (durable in S3 with no automatic expiry, bounded by storage cost), UDF concurrency (bounded by cluster capacity), pipeline queue depth (kept compact via S3 manifests), and document size and attribute count (bounded by the underlying stores, not by Layer).", "facts": [ { "kind": "code", "literal": "Index", "chunkId": "limits#no-limits" }, { "kind": "code", "literal": "Function", "chunkId": "limits#no-limits" }, { "kind": "code", "literal": "Pipeline", "chunkId": "limits#no-limits" }, { "kind": "code", "literal": "Scaling", "chunkId": "limits#no-limits" } ], "sources": [ { "chunkId": "limits#no-limits", "url": "/docs/limits#no-limits", "anchor": "no-limits" } ], "mode": "agent-primary", "terms": [ "limits", "several", "things", "enforced", "ceiling", "practical", "under", "load", "instance", "counts", "bounded", "cluster", "throughput", "snapshot", "search", "clickstream", "history", "durable", "automatic", "expiry", "storage", "cost", "concurrency", "capacity", "pipeline", "queue", "depth", "kept", "compact", "manifests", "document", "size", "attribute", "count", 
"underlying", "stores", "layer", "index", "function", "scaling" ] }, { "id": "pipelines", "kind": "section", "title": "Pipelines", "heading": null, "group": "Guides", "url": "/docs/pipelines", "summary": "A pipeline indexes documents through staged work whose row count changes, commonly CPU extract then GPU embed; the gateway tracks document state in PostgreSQL and exports queue depth so the operator can autoscale workers, and once vectors land in Turbopuffer they are queried and fetched through the namespace API.", "facts": [ { "kind": "value", "literal": "Diagram.astro", "chunkId": "pipelines" } ], "sources": [ { "chunkId": "pipelines", "url": "/docs/pipelines", "anchor": null } ], "mode": "agent-primary", "terms": [ "pipeline", "indexes", "documents", "through", "staged", "work", "whose", "count", "changes", "commonly", "extract", "embed", "gateway", "tracks", "document", "state", "postgresql", "exports", "queue", "depth", "operator", "autoscale", "workers", "once", "vectors", "land", "turbopuffer", "queried", "fetched", "namespace", "diagram", "astro", "extraction", "embedding", "chunk", "handoff", "keda", "scaling", "signals", "common" ] }, { "id": "pipelines#autoscaling", "kind": "section", "title": "Pipelines", "heading": "Autoscaling", "group": "Guides", "url": "/docs/pipelines#autoscaling", "summary": "The operator emits the scaling object directly from a Pipeline's scaling spec; manual workers not represented by a Pipeline CR can use the same Prometheus pending-count signal via a scaling object so autoscaling stays close to the same source of truth Layer uses for claims while keeping PostgreSQL private to the gateway pod.", "facts": [ { "kind": "code", "literal": "Pipeline.spec.scaling", "chunkId": "pipelines#autoscaling" } ], "sources": [ { "chunkId": "pipelines#autoscaling", "url": "/docs/pipelines#autoscaling", "anchor": "autoscaling" } ], "mode": "agent-primary", "terms": [ "autoscaling", "operator", "emits", "scaling", "object", "directly", 
"pipeline", "spec", "manual", "workers", "represented", "same", "prometheus", "pending", "count", "signal", "stays", "close", "source", "truth", "layer", "uses", "claims", "while", "keeping", "postgresql", "private", "gateway", "keda", "apiversion", "v1alpha1", "kind", "scaledobject", "metadata", "name", "embed", "worker", "scaletargetref", "minreplicacount", "maxreplicacount" ] }, { "id": "pipelines#claim-heartbeat-stage", "kind": "section", "title": "Pipelines", "heading": "Claim, heartbeat, stage", "group": "Guides", "url": "/docs/pipelines#claim-heartbeat-stage", "summary": "Workers claim staged documents through the gateway rather than mutating PostgreSQL directly; the gateway records claim ownership and time, moves rows to the requested stage, recovers stale claims past their lease, and uses skip-locked semantics for concurrent claims, with heartbeat and stage routes to extend leases and move documents to final stages. Pipeline queues are segmented: document and chunk id lists go into compressed S3 manifests while only segment leases and counters live in PostgreSQL, so queues scale by segment count, with manifests treated as queue state rather than durable history and cleaned up as segments split, complete, or the pipeline is deleted.", "facts": [ { "kind": "code", "literal": "POST /v2/pipelines/product-images/claim\n{\n \"stage\": \"pending\",\n \"claim_stage\": \"embedding\",\n \"limit\": 2000,\n \"worker_id\": \"gpu-worker-0\",\n \"lease_seconds\": 900\n}", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "POST /v2/pipelines/product-images/documents/heartbeat\n{\n \"document_ids\": [\"B07XYZ123\"],\n \"stage\": \"embedding\",\n \"worker_id\": \"gpu-worker-0\"\n}", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "POST /v2/pipelines/product-images/documents/stage\n{\n \"document_ids\": [\"B07XYZ123\"],\n \"stage\": \"indexed\",\n \"from_stage\": \"embedding\",\n \"worker_id\": \"gpu-worker-0\"\n}", 
"chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "claimed_by", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "claimed_at", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "FOR UPDATE SKIP LOCKED", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "stage: \"pending\"", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "stage: \"failed\"", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "create_missing: true", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "from_stage", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "worker_id", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "PIPELINE_SEGMENT_SIZE", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "code", "literal": "indexed", "chunkId": "pipelines#claim-heartbeat-stage" }, { "kind": "value", "literal": "e.g", "chunkId": "pipelines#claim-heartbeat-stage" } ], "sources": [ { "chunkId": "pipelines#claim-heartbeat-stage", "url": "/docs/pipelines#claim-heartbeat-stage", "anchor": "claim-heartbeat-stage" } ], "mode": "agent-primary", "terms": [ "claim", "heartbeat", "stage", "workers", "staged", "documents", "through", "gateway", "rather", "mutating", "postgresql", "directly", "records", "ownership", "time", "moves", "rows", "requested", "recovers", "stale", "claims", "past", "their", "lease", "uses", "skip", "locked", "semantics", "concurrent", "routes", "extend", "leases", "move", "final", "stages", "pipeline", "queues", "segmented", "document", "chunk" ] }, { "id": "pipelines#cpu-workers--scale-on-input-source", "kind": "section", "title": "Pipelines", "heading": "CPU workers — scale on input source", "group": "Guides", "url": "/docs/pipelines#cpu-workers--scale-on-input-source", "summary": "CPU workers scale on whatever feeds them (queue depth, consumer 
lag, object event notifications), independent of the pipeline API, via a scaling object triggered on the input source.", "facts": [ { "kind": "code", "literal": "apiVersion: keda.sh/v1alpha1\nkind: ScaledObject\nmetadata:\n name: cpu-extract-worker\nspec:\n triggers:\n - type: aws-sqs-queue\n metadata:\n queueURL: https://sqs.us-east-1.amazonaws.com/123456789/product-images\n queueLength: \"10\"\n awsRegion: us-east-1", "chunkId": "pipelines#cpu-workers--scale-on-input-source" } ], "sources": [ { "chunkId": "pipelines#cpu-workers--scale-on-input-source", "url": "/docs/pipelines#cpu-workers--scale-on-input-source", "anchor": "cpu-workers--scale-on-input-source" } ], "mode": "agent-primary", "terms": [ "workers", "scale", "input", "source", "whatever", "feeds", "queue", "depth", "consumer", "object", "event", "notifications", "independent", "pipeline", "scaling", "triggered", "apiversion", "keda", "v1alpha1", "kind", "scaledobject", "metadata", "name", "extract", "worker", "spec", "triggers", "type", "queueurl", "https", "east", "amazonaws", "123456789", "product", "images", "queuelength", "awsregion", "kafka" ] }, { "id": "pipelines#create-a-pipeline", "kind": "section", "title": "Pipelines", "heading": "Create a pipeline", "group": "Guides", "url": "/docs/pipelines#create-a-pipeline", "summary": "A pipeline is created by posting an id, target namespace, and distance metric (defaulting to cosine); the call conflicts if the pipeline already exists.", "facts": [ { "kind": "code", "literal": "curl -X POST http://gateway:8080/v2/pipelines \\\n -H 'content-type: application/json' \\\n -d '{\n \"id\": \"product-images\",\n \"target_namespace\": \"products\",\n \"distance_metric\": \"cosine_distance\"\n }'", "chunkId": "pipelines#create-a-pipeline" }, { "kind": "code", "literal": "distance_metric", "chunkId": "pipelines#create-a-pipeline" }, { "kind": "code", "literal": "cosine_distance", "chunkId": "pipelines#create-a-pipeline" } ], "sources": [ { "chunkId": 
"pipelines#create-a-pipeline", "url": "/docs/pipelines#create-a-pipeline", "anchor": "create-a-pipeline" } ], "mode": "agent-primary", "terms": [ "create", "pipeline", "created", "posting", "target", "namespace", "distance", "metric", "defaulting", "cosine", "call", "conflicts", "already", "exists", "curl", "post", "http", "gateway", "8080", "pipelines", "content", "type", "application", "json", "product", "images", "products", "targetnamespace", "distancemetric", "cosinedistance", "defaults", "returns" ] }, { "id": "pipelines#document-lifecycle", "kind": "section", "title": "Pipelines", "heading": "Document lifecycle", "group": "Guides", "url": "/docs/pipelines#document-lifecycle", "summary": "A staged document moves from pending (chunks stored in the cache awaiting embedding) to indexed (vectors written to Turbopuffer); re-staging a document idempotently resets it to pending with new chunks, useful for reprocessing after source data changes.", "facts": [ { "kind": "code", "literal": "stage_document() write_vectors()\n (new doc) ──────────────────► pending ──────────────────► indexed\n ▲\n │ re-stage (idempotent)", "chunkId": "pipelines#document-lifecycle" }, { "kind": "code", "literal": "pending", "chunkId": "pipelines#document-lifecycle" } ], "sources": [ { "chunkId": "pipelines#document-lifecycle", "url": "/docs/pipelines#document-lifecycle", "anchor": "document-lifecycle" } ], "mode": "agent-primary", "terms": [ "document", "lifecycle", "staged", "moves", "pending", "chunks", "stored", "cache", "awaiting", "embedding", "indexed", "vectors", "written", "turbopuffer", "staging", "idempotently", "resets", "useful", "reprocessing", "after", "source", "data", "changes", "stage", "write", "idempotent", "stagedocument", "writevectors", "aerospike", "waiting" ] }, { "id": "pipelines#failure-model", "kind": "section", "title": "Pipelines", "heading": "Failure model", "group": "Guides", "url": "/docs/pipelines#failure-model", "summary": "Upstream write failures are 
hard: the vectors route errors and the document stays in the embedding stage for re-claim; cache failures do not block chunk reads when S3 backing is present, PostgreSQL connectivity surfaces as a retryable error, and lease expiry is handled server-side so a worker crashing mid-embedding has its documents recovered on the next claim sweep.", "facts": [ { "kind": "code", "literal": "embedding", "chunkId": "pipelines#failure-model" } ], "sources": [ { "chunkId": "pipelines#failure-model", "url": "/docs/pipelines#failure-model", "anchor": "failure-model" } ], "mode": "agent-primary", "terms": [ "failure", "model", "upstream", "write", "failures", "hard", "vectors", "route", "errors", "document", "stays", "embedding", "stage", "claim", "cache", "block", "chunk", "reads", "backing", "present", "postgresql", "connectivity", "surfaces", "retryable", "error", "lease", "expiry", "handled", "server", "side", "worker", "crashing", "documents", "recovered", "next", "sweep", "turbopuffer", "returns", "aerospike", "should" ] }, { "id": "pipelines#gateway-api", "kind": "section", "title": "Pipelines", "heading": "Gateway API", "group": "Guides", "url": "/docs/pipelines#gateway-api", "summary": "Section header introducing the gateway pipeline API.", "facts": [], "sources": [ { "chunkId": "pipelines#gateway-api", "url": "/docs/pipelines#gateway-api", "anchor": "gateway-api" } ], "mode": "agent-primary", "terms": [ "gateway", "section", "header", "introducing", "pipeline" ] }, { "id": "pipelines#get-pipeline-status-keda-polling", "kind": "section", "title": "Pipelines", "heading": "Get pipeline status (KEDA polling)", "group": "Guides", "url": "/docs/pipelines#get-pipeline-status-keda-polling", "summary": "The pipeline status route returns per-stage counts and a pending-count field that the autoscaler watches; when it reaches zero, GPU workers scale to zero.", "facts": [ { "kind": "code", "literal": "curl http://gateway:8080/v2/pipelines/product-images/status", "chunkId": 
"pipelines#get-pipeline-status-keda-polling" }, { "kind": "code", "literal": "{\n \"pipeline_id\": \"product-images\",\n \"counts\": {\"pending\": 142, \"indexed\": 8530},\n \"pending_count\": 142\n}", "chunkId": "pipelines#get-pipeline-status-keda-polling" }, { "kind": "code", "literal": "pending_count", "chunkId": "pipelines#get-pipeline-status-keda-polling" } ], "sources": [ { "chunkId": "pipelines#get-pipeline-status-keda-polling", "url": "/docs/pipelines#get-pipeline-status-keda-polling", "anchor": "get-pipeline-status-keda-polling" } ], "mode": "agent-primary", "terms": [ "pipeline", "status", "keda", "polling", "route", "returns", "stage", "counts", "pending", "count", "field", "autoscaler", "watches", "reaches", "zero", "workers", "scale", "curl", "http", "gateway", "8080", "pipelines", "product", "images", "indexed", "8530", "pipelineid", "pendingcount", "hits" ] }, { "id": "pipelines#pipeline-crd", "kind": "section", "title": "Pipelines", "heading": "Pipeline CRD", "group": "Guides", "url": "/docs/pipelines#pipeline-crd", "summary": "Declare a Pipeline CRD when the operator should own the worker Deployment and scaling object, naming a target namespace, the worker, and inline scaling whose pool must exist in the cluster infra rules; fixed mode pins to the minimum and disabled or paused scales to zero. 
Full detail is on the Pipeline CRD page.", "facts": [ { "kind": "code", "literal": "apiVersion: hevlayer.com/v1alpha1\nkind: Pipeline\nmetadata:\n name: product-images\n namespace: layer\nspec:\n target:\n namespace: products\n worker:\n image: ghcr.io/hev/product-image-worker:latest\n batchSize: 64\n timeoutSeconds: 60\n scaling:\n pool: cpu\n mode: autoscale\n replicas:\n min: 0\n max: 8", "chunkId": "pipelines#pipeline-crd" }, { "kind": "code", "literal": "spec.scaling.pool", "chunkId": "pipelines#pipeline-crd" }, { "kind": "code", "literal": "InfraRules/default", "chunkId": "pipelines#pipeline-crd" }, { "kind": "code", "literal": "mode: fixed", "chunkId": "pipelines#pipeline-crd" }, { "kind": "code", "literal": "replicas.min", "chunkId": "pipelines#pipeline-crd" }, { "kind": "code", "literal": "mode: disabled", "chunkId": "pipelines#pipeline-crd" }, { "kind": "code", "literal": "spec.paused: true", "chunkId": "pipelines#pipeline-crd" } ], "sources": [ { "chunkId": "pipelines#pipeline-crd", "url": "/docs/pipelines#pipeline-crd", "anchor": "pipeline-crd" } ], "mode": "agent-primary", "terms": [ "pipeline", "declare", "operator", "should", "worker", "deployment", "scaling", "object", "naming", "target", "namespace", "inline", "whose", "pool", "must", "exist", "cluster", "infra", "rules", "fixed", "mode", "pins", "minimum", "disabled", "paused", "scales", "zero", "full", "detail", "page", "apiversion", "hevlayer", "v1alpha1", "kind", "metadata", "name", "product", "images", "layer", "spec" ] }, { "id": "pipelines#pipeline-flow", "kind": "section", "title": "Pipelines", "heading": "Pipeline flow", "group": "Guides", "url": "/docs/pipelines#pipeline-flow", "summary": "In the flow, a CPU worker reads source data, extracts and chunks it, and calls the stage endpoint (scaling on its input queue), while a GPU worker polls pipeline status for pending work, fetches chunks, embeds, and calls the vectors endpoint (scaling on pending count); the gateway handles chunk storage, 
vector upsert, and state tracking, and workers stay stateless and never connect to gateway-internal stores.", "facts": [ { "kind": "code", "literal": "pending_count > 0", "chunkId": "pipelines#pipeline-flow" }, { "kind": "code", "literal": "pending_count", "chunkId": "pipelines#pipeline-flow" }, { "kind": "value", "literal": "e.g", "chunkId": "pipelines#pipeline-flow" } ], "sources": [ { "chunkId": "pipelines#pipeline-flow", "url": "/docs/pipelines#pipeline-flow", "anchor": "pipeline-flow" } ], "mode": "agent-primary", "terms": [ "pipeline", "flow", "worker", "reads", "source", "data", "extracts", "chunks", "calls", "stage", "endpoint", "scaling", "input", "queue", "while", "polls", "status", "pending", "work", "fetches", "embeds", "vectors", "count", "gateway", "handles", "chunk", "storage", "vector", "upsert", "state", "tracking", "workers", "stay", "stateless", "never", "connect", "internal", "stores", "post", "pipelines" ] }, { "id": "pipelines#prerequisites", "kind": "section", "title": "Pipelines", "heading": "Prerequisites", "group": "Guides", "url": "/docs/pipelines#prerequisites", "summary": "Pipeline routes are registered only when the database connection is configured; the Helm chart points it at the gateway pod's loopback PostgreSQL sidecar and the migration runs automatically on startup.", "facts": [ { "kind": "code", "literal": "export DATABASE_URL=postgres://hevlayer:hevlayer@localhost:5432/hevlayer", "chunkId": "pipelines#prerequisites" }, { "kind": "code", "literal": "DATABASE_URL", "chunkId": "pipelines#prerequisites" } ], "sources": [ { "chunkId": "pipelines#prerequisites", "url": "/docs/pipelines#prerequisites", "anchor": "prerequisites" } ], "mode": "agent-primary", "terms": [ "prerequisites", "pipeline", "routes", "registered", "only", "database", "connection", "configured", "helm", "chart", "points", "gateway", "loopback", "postgresql", "sidecar", "migration", "runs", "automatically", "startup", "export", "postgres", "hevlayer", "localhost", 
"5432", "databaseurl", "sets" ] }, { "id": "pipelines#read-chunks-and-write-vectors-gpu-worker", "kind": "section", "title": "Pipelines", "heading": "Read chunks and write vectors (GPU worker)", "group": "Guides", "url": "/docs/pipelines#read-chunks-and-write-vectors-gpu-worker", "summary": "A GPU worker reads a document's chunks from the gateway, then after embedding writes vectors back through a route that upserts to Turbopuffer and marks the document indexed.", "facts": [ { "kind": "code", "literal": "curl http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW/chunks", "chunkId": "pipelines#read-chunks-and-write-vectors-gpu-worker" }, { "kind": "code", "literal": "curl -X PUT http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW/vectors \\\n -H 'content-type: application/json' \\\n -d '{\n \"vectors\": [\n {\"id\": \"asin-B08N5WRWNW-0\", \"vector\": [0.0012, -0.043], \"attributes\": {\"text\": \"...\"}}\n ]\n }'", "chunkId": "pipelines#read-chunks-and-write-vectors-gpu-worker" }, { "kind": "code", "literal": "indexed", "chunkId": "pipelines#read-chunks-and-write-vectors-gpu-worker" } ], "sources": [ { "chunkId": "pipelines#read-chunks-and-write-vectors-gpu-worker", "url": "/docs/pipelines#read-chunks-and-write-vectors-gpu-worker", "anchor": "read-chunks-and-write-vectors-gpu-worker" } ], "mode": "agent-primary", "terms": [ "read", "chunks", "write", "vectors", "worker", "reads", "document", "gateway", "after", "embedding", "writes", "back", "through", "route", "upserts", "turbopuffer", "marks", "indexed", "curl", "http", "8080", "pipelines", "product", "images", "documents", "asin", "b08n5wrwnw", "content", "type", "application", "json", "vector", "0012", "attributes", "text" ] }, { "id": "pipelines#stage-a-document-cpu-worker", "kind": "section", "title": "Pipelines", "heading": "Stage a document (CPU worker)", "group": "Guides", "url": "/docs/pipelines#stage-a-document-cpu-worker", "summary": "A CPU worker stages a 
document by putting its chunks to the gateway; each chunk is stored durably in S3 and cached, the document is marked pending, and re-staging the same id replaces the previous chunk backing and resets it to pending.", "facts": [ { "kind": "code", "literal": "curl -X PUT http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW \\\n -H 'content-type: application/json' \\\n -d '{\n \"chunks\": [\n {\"id\": \"asin-B08N5WRWNW-0\", \"text\": \"Wireless noise-cancelling headphones\"},\n {\"id\": \"asin-B08N5WRWNW-1\", \"text\": \"40-hour battery life\", \"metadata\": {\"page\": 2}}\n ]\n }'", "chunkId": "pipelines#stage-a-document-cpu-worker" }, { "kind": "code", "literal": "pipe_{target_namespace}", "chunkId": "pipelines#stage-a-document-cpu-worker" }, { "kind": "code", "literal": "pending", "chunkId": "pipelines#stage-a-document-cpu-worker" } ], "sources": [ { "chunkId": "pipelines#stage-a-document-cpu-worker", "url": "/docs/pipelines#stage-a-document-cpu-worker", "anchor": "stage-a-document-cpu-worker" } ], "mode": "agent-primary", "terms": [ "stage", "document", "worker", "stages", "putting", "chunks", "gateway", "chunk", "stored", "durably", "cached", "marked", "pending", "staging", "same", "replaces", "previous", "backing", "resets", "curl", "http", "8080", "pipelines", "product", "images", "documents", "asin", "b08n5wrwnw", "content", "type", "application", "json", "text", "wireless", "noise", "cancelling", "headphones", "hour", "battery", "life" ] }, { "id": "roadmap", "kind": "section", "title": "Roadmap & Changelog", "heading": null, "group": "Overview", "url": "/docs/roadmap", "summary": "Introduces where hev layer is headed next and what has already shipped.", "facts": [], "sources": [ { "chunkId": "roadmap", "url": "/docs/roadmap", "anchor": null } ], "mode": "agent-primary", "terms": [ "introduces", "layer", "headed", "next", "already", "shipped" ] }, { "id": "roadmap#01-release-uat", "kind": "section", "title": "Roadmap & Changelog", 
"heading": "0.1 Release (UAT)", "group": "Overview", "url": "/docs/roadmap#01-release-uat", "summary": "Header marking the 0.1 release acceptance-testing milestone.", "facts": [ { "kind": "value", "literal": "0.1", "chunkId": "roadmap#01-release-uat" } ], "sources": [ { "chunkId": "roadmap#01-release-uat", "url": "/docs/roadmap#01-release-uat", "anchor": "01-release-uat" } ], "mode": "agent-primary", "terms": [ "release", "header", "marking", "acceptance", "testing", "milestone" ] }, { "id": "roadmap#api-hardening", "kind": "section", "title": "Roadmap & Changelog", "heading": "API hardening", "group": "Overview", "url": "/docs/roadmap#api-hardening", "summary": "Planned API hardening items: consolidating the scaling CRDs, redesigning the Index CRD, snapshot-scan naming conventions, and removing unused APIs, several tracked by RFCs.", "facts": [ { "kind": "code", "literal": "Pipeline", "chunkId": "roadmap#api-hardening" }, { "kind": "code", "literal": "UDF", "chunkId": "roadmap#api-hardening" }, { "kind": "code", "literal": "InfraRules", "chunkId": "roadmap#api-hardening" }, { "kind": "code", "literal": "Index", "chunkId": "roadmap#api-hardening" }, { "kind": "value", "literal": "github.com", "chunkId": "roadmap#api-hardening" }, { "kind": "value", "literal": "0012-crd-scaling-consolidation.md", "chunkId": "roadmap#api-hardening" }, { "kind": "value", "literal": "0013-index-policy-surface.md", "chunkId": "roadmap#api-hardening" }, { "kind": "value", "literal": "0014-snapshot-noun-scan-verb.md", "chunkId": "roadmap#api-hardening" } ], "sources": [ { "chunkId": "roadmap#api-hardening", "url": "/docs/roadmap#api-hardening", "anchor": "api-hardening" } ], "mode": "agent-primary", "terms": [ "hardening", "planned", "items", "consolidating", "scaling", "crds", "redesigning", "index", "snapshot", "scan", "naming", "conventions", "removing", "unused", "apis", "several", "tracked", "rfcs", "pipeline", "infrarules", "github", "0012", "consolidation", "0013", "policy", 
"surface", "0014", "noun", "verb", "redesign", "remove" ] }, { "id": "roadmap#later", "kind": "section", "title": "Roadmap & Changelog", "heading": "Later", "group": "Overview", "url": "/docs/roadmap#later", "summary": "Longer-horizon roadmap items including scoped API keys and entitlements, soft delete with restore, hybrid fuzzy text fusion, typeahead, temporal as-of queries, branching, an exact kNN result cache, A/B variant indexes, per-query LLM-judged quality, pipeline crash recovery, dead-letter listing, a Python UDF push dev experience, and a cost API.", "facts": [ { "kind": "code", "literal": "as_of", "chunkId": "roadmap#later" }, { "kind": "code", "literal": "/query", "chunkId": "roadmap#later" }, { "kind": "code", "literal": "/scans", "chunkId": "roadmap#later" }, { "kind": "code", "literal": "/fetch", "chunkId": "roadmap#later" }, { "kind": "code", "literal": "/snapshots", "chunkId": "roadmap#later" }, { "kind": "code", "literal": "copy_from_with_filter", "chunkId": "roadmap#later" }, { "kind": "code", "literal": "layer push", "chunkId": "roadmap#later" }, { "kind": "value", "literal": "github.com", "chunkId": "roadmap#later" }, { "kind": "value", "literal": "0022-hybrid-text-fusion.md", "chunkId": "roadmap#later" }, { "kind": "value", "literal": "0020-temporal-queries.md", "chunkId": "roadmap#later" } ], "sources": [ { "chunkId": "roadmap#later", "url": "/docs/roadmap#later", "anchor": "later" } ], "mode": "agent-primary", "terms": [ "later", "longer", "horizon", "roadmap", "items", "including", "scoped", "keys", "entitlements", "soft", "delete", "restore", "hybrid", "fuzzy", "text", "fusion", "typeahead", "temporal", "queries", "branching", "exact", "result", "cache", "variant", "indexes", "query", "judged", "quality", "pipeline", "crash", "recovery", "dead", "letter", "listing", "python", "push", "experience", "cost", "scans", "fetch" ] }, { "id": "roadmap#lifecycle-and-operability", "kind": "section", "title": "Roadmap & Changelog", "heading": 
"Lifecycle and operability", "group": "Overview", "url": "/docs/roadmap#lifecycle-and-operability", "summary": "Shipped lifecycle and operability items: autoscaling compute for pipelines and UDFs, a document-cache endpoint for multi-stage pipelines, index snapshot history, coordinated delete, and the Helm and Terraform install scripts.", "facts": [], "sources": [ { "chunkId": "roadmap#lifecycle-and-operability", "url": "/docs/roadmap#lifecycle-and-operability", "anchor": "lifecycle-and-operability" } ], "mode": "agent-primary", "terms": [ "lifecycle", "operability", "shipped", "items", "autoscaling", "compute", "pipelines", "udfs", "document", "cache", "endpoint", "multi", "stage", "index", "snapshot", "history", "coordinated", "delete", "helm", "terraform", "install", "scripts", "building" ] }, { "id": "roadmap#search", "kind": "section", "title": "Roadmap & Changelog", "heading": "Search", "group": "Overview", "url": "/docs/roadmap#search", "summary": "Shipped search features: strongly consistent queries during heavy writes, result count over ranked queries via scatter/gather, precomputed facet listings and counts in snapshots, scans for filter IDs and counts, search-by-id, search history saved to S3, and enhanced namespace metadata.", "facts": [ { "kind": "code", "literal": "_hevlayer_upserted_at", "chunkId": "roadmap#search" } ], "sources": [ { "chunkId": "roadmap#search", "url": "/docs/roadmap#search", "anchor": "search" } ], "mode": "agent-primary", "terms": [ "search", "shipped", "features", "strongly", "consistent", "queries", "during", "heavy", "writes", "result", "count", "ranked", "scatter", "gather", "precomputed", "facet", "listings", "counts", "snapshots", "scans", "filter", "history", "saved", "enhanced", "namespace", "metadata", "hevlayer", "upserted", "hevlayerupsertedat", "vector", "available", "snapshot", "document", "cached" ] }, { "id": "roadmap#surfaces", "kind": "section", "title": "Roadmap & Changelog", "heading": "Surfaces", "group": 
"Overview", "url": "/docs/roadmap#surfaces", "summary": "Shipped surfaces: a dashboard MVP with basic CRD management and observability, and an official Python SDK.", "facts": [], "sources": [ { "chunkId": "roadmap#surfaces", "url": "/docs/roadmap#surfaces", "anchor": "surfaces" } ], "mode": "agent-primary", "terms": [ "surfaces", "shipped", "dashboard", "basic", "management", "observability", "official", "python" ] }, { "id": "roadmap#up-next", "kind": "section", "title": "Roadmap & Changelog", "heading": "Up Next", "group": "Overview", "url": "/docs/roadmap#up-next", "summary": "Near-term items: count and scan primitives and route renames, an indexing failure-mode runbook, embedding UDF writeback via re-upsert, a namespace-init UDF, a snapshot-aware ready signal, a full dashboard redesign, and a kube-style CLI over the gateway REST API.", "facts": [ { "kind": "code", "literal": "layer.is_stable", "chunkId": "roadmap#up-next" }, { "kind": "code", "literal": "layer", "chunkId": "roadmap#up-next" }, { "kind": "value", "literal": "github.com", "chunkId": "roadmap#up-next" }, { "kind": "value", "literal": "0019-count-and-scan-primitives.md", "chunkId": "roadmap#up-next" } ], "sources": [ { "chunkId": "roadmap#up-next", "url": "/docs/roadmap#up-next", "anchor": "up-next" } ], "mode": "agent-primary", "terms": [ "next", "near", "term", "items", "count", "scan", "primitives", "route", "renames", "indexing", "failure", "mode", "runbook", "embedding", "writeback", "upsert", "namespace", "init", "snapshot", "aware", "ready", "signal", "full", "dashboard", "redesign", "kube", "style", "gateway", "rest", "layer", "stable", "github", "0019", "filter", "truncation", "removal", "aerospike", "stop", "writes", "postgres" ] }, { "id": "scans", "kind": "section", "title": "Scans", "heading": null, "group": "Guides", "url": "/docs/scans", "summary": "Scans answer ad hoc filter questions about a namespace: ID mode creates an async job returning matching ids, and count mode returns one 
number synchronously using the latest snapshot when the filter is covered; uses include bulk exports, manual inspection, UDF discovery debugging, cache/origin consistency checks, and exact filter row counts.", "facts": [], "sources": [ { "chunkId": "scans", "url": "/docs/scans", "anchor": null } ], "mode": "agent-primary", "terms": [ "scans", "answer", "filter", "questions", "about", "namespace", "mode", "creates", "async", "returning", "matching", "count", "returns", "number", "synchronously", "latest", "snapshot", "covered", "uses", "include", "bulk", "exports", "manual", "inspection", "discovery", "debugging", "cache", "origin", "consistency", "checks", "exact", "counts", "shaped", "jobs", "synchronous", "asynchronous", "document" ] }, { "id": "scans#count-scans", "kind": "section", "title": "Scans", "heading": "Count scans", "group": "Guides", "url": "/docs/scans#count-scans", "summary": "A count scan posts a filter and source and returns a count with the serving source; auto checks the latest snapshot first for single-field equality and membership filters and falls through to cache or origin otherwise, while an explicit snapshot source requires a supported filter and fails with a precondition error if unsupported.", "facts": [ { "kind": "code", "literal": "curl -X POST http://gateway:8080/v2/namespaces/products/scans \\\n -H 'content-type: application/json' \\\n -d '{\"mode\": \"count\", \"source\": \"auto\", \"filters\": [\"category\", \"Eq\", \"Electronics\"]}'", "chunkId": "scans#count-scans" }, { "kind": "code", "literal": "{\n \"count\": 4210,\n \"served_by\": \"snapshot\",\n \"snapshot_sha\": \"3f9e8b21\",\n \"watermark_ms\": 1747300000123,\n \"elapsed_ms\": 3\n}", "chunkId": "scans#count-scans" }, { "kind": "code", "literal": "source: auto", "chunkId": "scans#count-scans" }, { "kind": "code", "literal": "Eq", "chunkId": "scans#count-scans" }, { "kind": "code", "literal": "In", "chunkId": "scans#count-scans" }, { "kind": "code", "literal": "snapshot", 
"chunkId": "scans#count-scans" }, { "kind": "code", "literal": "source: snapshot", "chunkId": "scans#count-scans" }, { "kind": "code", "literal": "412 precondition_failed", "chunkId": "scans#count-scans" } ], "sources": [ { "chunkId": "scans#count-scans", "url": "/docs/scans#count-scans", "anchor": "count-scans" } ], "mode": "agent-primary", "terms": [ "count", "scans", "scan", "posts", "filter", "source", "returns", "serving", "auto", "checks", "latest", "snapshot", "first", "single", "field", "equality", "membership", "filters", "falls", "through", "cache", "origin", "otherwise", "while", "explicit", "requires", "supported", "fails", "precondition", "error", "unsupported", "curl", "post", "http", "gateway", "8080", "namespaces", "products", "content", "type" ] }, { "id": "scans#filters", "kind": "section", "title": "Scans", "heading": "Filters", "group": "Guides", "url": "/docs/scans#filters", "summary": "Scans accept the same filter array as query; on origin scans the filter is pushed to Turbopuffer and on cache scans the gateway evaluates a supported set of operators against cached attributes. 
Auto uses origin when the cache cannot evaluate a filter, while an explicit cache source with an unsupported filter fails rather than returning partial results.", "facts": [ { "kind": "code", "literal": "Eq", "chunkId": "scans#filters" }, { "kind": "code", "literal": "NotEq", "chunkId": "scans#filters" }, { "kind": "code", "literal": "Gt", "chunkId": "scans#filters" }, { "kind": "code", "literal": "Gte", "chunkId": "scans#filters" }, { "kind": "code", "literal": "Lt", "chunkId": "scans#filters" }, { "kind": "code", "literal": "Lte", "chunkId": "scans#filters" }, { "kind": "code", "literal": "In", "chunkId": "scans#filters" }, { "kind": "code", "literal": "NotIn", "chunkId": "scans#filters" }, { "kind": "code", "literal": "And", "chunkId": "scans#filters" }, { "kind": "code", "literal": "Or", "chunkId": "scans#filters" }, { "kind": "code", "literal": "Not", "chunkId": "scans#filters" }, { "kind": "code", "literal": "auto", "chunkId": "scans#filters" }, { "kind": "code", "literal": "source: cache", "chunkId": "scans#filters" } ], "sources": [ { "chunkId": "scans#filters", "url": "/docs/scans#filters", "anchor": "filters" } ], "mode": "agent-primary", "terms": [ "filters", "scans", "accept", "same", "filter", "array", "query", "origin", "pushed", "turbopuffer", "cache", "gateway", "evaluates", "supported", "operators", "against", "cached", "attributes", "auto", "uses", "cannot", "evaluate", "while", "explicit", "source", "unsupported", "fails", "rather", "returning", "partial", "results", "noteq", "notin", "document", "sees" ] }, { "id": "scans#id-scans", "kind": "section", "title": "Scans", "heading": "ID scans", "group": "Guides", "url": "/docs/scans#id-scans", "summary": "An ID scan posts a filter and source and returns an accepted job; the caller polls the job and then reads the matching ids paginated from a results route.", "facts": [ { "kind": "code", "literal": "curl -X POST http://gateway:8080/v2/namespaces/products/scans \\\n -H 'content-type: 
application/json' \\\n -d '{\"mode\": \"ids\", \"source\": \"auto\", \"filters\": [\"category\", \"Eq\", \"Electronics\"]}'", "chunkId": "scans#id-scans" }, { "kind": "code", "literal": "{\n \"id\": \"scan-uuid\",\n \"namespace\": \"products\",\n \"source\": \"auto\",\n \"status\": \"running\",\n \"progress\": 0,\n \"documents_scanned\": 0,\n \"created_at\": \"2026-05-26T10:00:00Z\"\n}", "chunkId": "scans#id-scans" }, { "kind": "code", "literal": "curl http://gateway:8080/v2/namespaces/products/scans/scan-uuid\ncurl 'http://gateway:8080/v2/namespaces/products/scans/scan-uuid/results?limit=1000'", "chunkId": "scans#id-scans" }, { "kind": "code", "literal": "202 Accepted", "chunkId": "scans#id-scans" } ], "sources": [ { "chunkId": "scans#id-scans", "url": "/docs/scans#id-scans", "anchor": "id-scans" } ], "mode": "agent-primary", "terms": [ "scans", "scan", "posts", "filter", "source", "returns", "accepted", "caller", "polls", "reads", "matching", "paginated", "results", "route", "curl", "post", "http", "gateway", "8080", "namespaces", "products", "content", "type", "application", "json", "mode", "auto", "filters", "category", "electronics", "uuid", "namespace", "status", "running", "progress", "documents", "scanned", "created", "2026", "26t10" ] }, { "id": "scans#operational-notes", "kind": "section", "title": "Scans", "heading": "Operational notes", "group": "Guides", "url": "/docs/scans#operational-notes", "summary": "ID scan state is in-memory and resets on gateway restart, count scans carry a deadline with a server-side maximum, snapshot-served counts are exact at the snapshot watermark, and live counts include bounded, timed-out, and shard fields.", "facts": [ { "kind": "code", "literal": "watermark_ms", "chunkId": "scans#operational-notes" }, { "kind": "code", "literal": "bounded", "chunkId": "scans#operational-notes" }, { "kind": "code", "literal": "timed_out", "chunkId": "scans#operational-notes" } ], "sources": [ { "chunkId": "scans#operational-notes", 
"url": "/docs/scans#operational-notes", "anchor": "operational-notes" } ], "mode": "agent-primary", "terms": [ "operational", "notes", "scan", "state", "memory", "resets", "gateway", "restart", "count", "scans", "carry", "deadline", "server", "side", "maximum", "snapshot", "served", "counts", "exact", "watermark", "live", "include", "bounded", "timed", "shard", "fields", "ephemeral", "default", "300s", "watermarkms", "timedout" ] }, { "id": "scans#sources", "kind": "section", "title": "Scans", "heading": "Sources", "group": "Guides", "url": "/docs/scans#sources", "summary": "Lists the scan sources per mode: auto (cache when fresh else origin; snapshot first then cache/origin for counts), snapshot (count only, requiring eligible equality/membership), cache (cache only), and origin (paginated upstream scan). When auto resolves to cache the gateway adds a warmed-through upper bound before the user filter so the scan is a stable warmed view.", "facts": [ { "kind": "code", "literal": "auto", "chunkId": "scans#sources" }, { "kind": "code", "literal": "snapshot", "chunkId": "scans#sources" }, { "kind": "code", "literal": "Eq", "chunkId": "scans#sources" }, { "kind": "code", "literal": "In", "chunkId": "scans#sources" }, { "kind": "code", "literal": "cache", "chunkId": "scans#sources" }, { "kind": "code", "literal": "origin", "chunkId": "scans#sources" }, { "kind": "code", "literal": "_hevlayer_upserted_at <= cache_warmed_through", "chunkId": "scans#sources" } ], "sources": [ { "chunkId": "scans#sources", "url": "/docs/scans#sources", "anchor": "sources" } ], "mode": "agent-primary", "terms": [ "sources", "lists", "scan", "mode", "auto", "cache", "fresh", "else", "origin", "snapshot", "first", "counts", "count", "only", "requiring", "eligible", "equality", "membership", "paginated", "upstream", "resolves", "gateway", "adds", "warmed", "through", "upper", "bound", "before", "user", "filter", "stable", "view", "hevlayer", "upserted", "source", "enough", "otherwise", 
"supported", "latest", "requires" ] }, { "id": "search-knowledge-graph", "kind": "section", "title": "Search Knowledge Graph", "heading": null, "group": "Guides", "url": "/docs/search-knowledge-graph", "summary": "This page documents the generated knowledge graph the docs search bundles to expand domain terms before ranking pages, including query context, canonical terms, aliases, and the raw JSON artifact rendered from the committed site build.", "facts": [ { "kind": "value", "literal": "KnowledgeGraphView.astro", "chunkId": "search-knowledge-graph" } ], "sources": [ { "chunkId": "search-knowledge-graph", "url": "/docs/search-knowledge-graph", "anchor": null } ], "mode": "agent-primary", "terms": [ "page", "documents", "generated", "knowledge", "graph", "docs", "search", "bundles", "expand", "domain", "terms", "before", "ranking", "pages", "including", "query", "context", "canonical", "aliases", "json", "artifact", "rendered", "committed", "site", "build", "knowledgegraphview", "astro", "currently", "bundled", "layer", "endpoint", "uses", "candidate", "renders", "exact" ] }, { "id": "search-knowledge-graph#current-graph", "kind": "section", "title": "Search Knowledge Graph", "heading": "Current graph", "group": "Guides", "url": "/docs/search-knowledge-graph#current-graph", "summary": "Header introducing the rendering of the current committed knowledge-graph artifact.", "facts": [], "sources": [ { "chunkId": "search-knowledge-graph#current-graph", "url": "/docs/search-knowledge-graph#current-graph", "anchor": "current-graph" } ], "mode": "agent-primary", "terms": [ "current", "graph", "header", "introducing", "rendering", "committed", "knowledge", "artifact" ] }, { "id": "tradeoffs", "kind": "section", "title": "Tradeoffs", "heading": null, "group": "Overview", "url": "/docs/tradeoffs", "summary": "This page makes Layer's design tradeoffs explicit, with configuration offered where possible: it adds query-path latency through an extra network hop (not configurable) 
and an optionally configurable strong-consistency query plan, and increases index storage through secondary indexing for upsert-time filtering and for scatter/gather sharding (both not configurable).", "facts": [ { "kind": "flag", "literal": "--muted", "chunkId": "tradeoffs" }, { "kind": "flag", "literal": "--signal", "chunkId": "tradeoffs" } ], "sources": [ { "chunkId": "tradeoffs", "url": "/docs/tradeoffs", "anchor": null } ], "mode": "agent-primary", "terms": [ "page", "makes", "layer", "design", "tradeoffs", "explicit", "configuration", "offered", "possible", "adds", "query", "path", "latency", "through", "extra", "network", "configurable", "optionally", "strong", "consistency", "plan", "increases", "index", "storage", "secondary", "indexing", "upsert", "time", "filtering", "scatter", "gather", "sharding", "both", "muted", "signal", "current", "product", "posture", "cases", "trying" ] }, { "id": "udfs", "kind": "section", "title": "UDFs", "heading": null, "group": "Guides", "url": "/docs/udfs", "summary": "A UDF is a stateless worker that preserves row count, producing one derived attribute per input row, used for embeddings, classifications, tags, and backfills; use a pipeline when external data becomes rows or one row fans out into many, and a UDF when existing rows acquire derived attributes. 
The gateway runs an ID-scan discovery, enqueues rows, leases them to a worker via claim/complete, and writes results back to Turbopuffer.", "facts": [ { "kind": "value", "literal": "Diagram.astro", "chunkId": "udfs" }, { "kind": "value", "literal": "spec.filter", "chunkId": "udfs" } ], "sources": [ { "chunkId": "udfs", "url": "/docs/udfs", "anchor": null } ], "mode": "agent-primary", "terms": [ "stateless", "worker", "preserves", "count", "producing", "derived", "attribute", "input", "embeddings", "classifications", "tags", "backfills", "pipeline", "external", "data", "becomes", "rows", "fans", "many", "existing", "acquire", "attributes", "gateway", "runs", "scan", "discovery", "enqueues", "leases", "claim", "complete", "writes", "results", "back", "turbopuffer", "diagram", "astro", "spec", "filter", "user", "defined" ] }, { "id": "udfs#author-a-worker", "kind": "section", "title": "UDFs", "heading": "Author a worker", "group": "Guides", "url": "/docs/udfs#author-a-worker", "summary": "The Python SDK turns a normal function into the claim/process/complete loop via a decorator declaring inputs, output attribute, and kind; function parameters are keyword-only and named to match the inputs, and the author raises a transient error for retryable work and a permanent error for unrecoverable input.", "facts": [ { "kind": "code", "literal": "inputs", "chunkId": "udfs#author-a-worker" }, { "kind": "code", "literal": "TransientError", "chunkId": "udfs#author-a-worker" }, { "kind": "code", "literal": "PermanentError", "chunkId": "udfs#author-a-worker" } ], "sources": [ { "chunkId": "udfs#author-a-worker", "url": "/docs/udfs#author-a-worker", "anchor": "author-a-worker" } ], "mode": "agent-primary", "terms": [ "author", "worker", "python", "turns", "normal", "function", "claim", "process", "complete", "loop", "decorator", "declaring", "inputs", "output", "attribute", "kind", "parameters", "keyword", "only", "named", "match", "raises", "transient", "error", "retryable", "work", 
"permanent", "unrecoverable", "input", "transienterror", "permanenterror", "hevlayer", "import", "runudfworker", "title", "description", "tags", "tagproduct", "none", "list" ] }, { "id": "udfs#declare-the-function", "kind": "section", "title": "UDFs", "heading": "Declare the function", "group": "Guides", "url": "/docs/udfs#declare-the-function", "summary": "A UDF is declared by applying a Function CRD, from which the operator emits a worker Deployment, an optional push Service, and a scaling object, and from which the gateway registers the UDF queue and discovery policy; the filter uses the same tuple syntax as upstream queries, the worker pod receives Layer environment variables, and the CRD is the source of truth so the runtime routes are only for registration without the operator or for manual recovery.", "facts": [ { "kind": "code", "literal": "Function", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "Deployment", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "Service", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "ScaledObject", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "spec.scaling", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "spec.filter", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "HEVLAYER_UDF_ID", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "HEVLAYER_BASE_URL", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "HEVLAYER_UDF_BATCH_SIZE", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "HEVLAYER_UDF_TIMEOUT_SECONDS", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "HEVLAYER_UDF_LEASE_SECONDS", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "LAYER_GATEWAY_API_KEY", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "POST /v2/udfs/{id}/discover", "chunkId": 
"udfs#declare-the-function" }, { "kind": "code", "literal": "claim", "chunkId": "udfs#declare-the-function" }, { "kind": "code", "literal": "complete", "chunkId": "udfs#declare-the-function" }, { "kind": "value", "literal": "0.1", "chunkId": "udfs#declare-the-function" } ], "sources": [ { "chunkId": "udfs#declare-the-function", "url": "/docs/udfs#declare-the-function", "anchor": "declare-the-function" } ], "mode": "agent-primary", "terms": [ "declare", "function", "declared", "applying", "operator", "emits", "worker", "deployment", "optional", "push", "service", "scaling", "object", "gateway", "registers", "queue", "discovery", "policy", "filter", "uses", "same", "tuple", "syntax", "upstream", "queries", "receives", "layer", "environment", "variables", "source", "truth", "runtime", "routes", "only", "registration", "without", "manual", "recovery", "scaledobject", "spec" ] }, { "id": "udfs#gateway-api", "kind": "section", "title": "UDFs", "heading": "Gateway API", "group": "Guides", "url": "/docs/udfs#gateway-api", "summary": "In Kubernetes installs the Function CRD is the source of truth and the runtime API is registered from it; these routes are the same surface the Python SDK drives and the path for registering a UDF without the operator or coordinating and recovering workers by hand.", "facts": [], "sources": [ { "chunkId": "udfs#gateway-api", "url": "/docs/udfs#gateway-api", "anchor": "gateway-api" } ], "mode": "agent-primary", "terms": [ "gateway", "kubernetes", "installs", "function", "source", "truth", "runtime", "registered", "these", "routes", "same", "surface", "python", "drives", "path", "registering", "without", "operator", "coordinating", "recovering", "workers", "hand", "below", "reach", "register", "coordinate", "recover" ] }, { "id": "udfs#lifecycle", "kind": "section", "title": "UDFs", "heading": "Lifecycle", "group": "Guides", "url": "/docs/udfs#lifecycle", "summary": "UDF lifecycle is managed via kubectl on the Function resource and gateway 
routes: read status, pause and resume by patching the spec, reset failed rows, and delete; deletion garbage-collects the operator-managed Deployment, Service, and scaling object but does not delete written outputs.", "facts": [], "sources": [ { "chunkId": "udfs#lifecycle", "url": "/docs/udfs#lifecycle", "anchor": "lifecycle" } ], "mode": "agent-primary", "terms": [ "lifecycle", "managed", "kubectl", "function", "resource", "gateway", "routes", "read", "status", "pause", "resume", "patching", "spec", "reset", "failed", "rows", "delete", "deletion", "garbage", "collects", "operator", "deployment", "service", "scaling", "object", "does", "written", "outputs", "product", "tags", "describe", "curl", "authorization", "bearer", "layergatewayapikey", "layergatewayurl", "udfs", "patch", "type", "merge" ] }, { "id": "udfs#lifecycle-routes", "kind": "section", "title": "UDFs", "heading": "Lifecycle routes", "group": "Guides", "url": "/docs/udfs#lifecycle-routes", "summary": "Lists the lifecycle routes: pause (stop discovery and dispatch, draining in-flight), resume, reset-failed (move failed rows back to pending), and discover (trigger an immediate sweep); reset-failed is the recovery path after a transient incident, while permanent issues need fixing the input shape or bumping the output version and re-applying.", "facts": [ { "kind": "code", "literal": "POST /v2/udfs/{id}/pause", "chunkId": "udfs#lifecycle-routes" }, { "kind": "code", "literal": "POST /v2/udfs/{id}/resume", "chunkId": "udfs#lifecycle-routes" }, { "kind": "code", "literal": "POST /v2/udfs/{id}/reset-failed", "chunkId": "udfs#lifecycle-routes" }, { "kind": "code", "literal": "failed", "chunkId": "udfs#lifecycle-routes" }, { "kind": "code", "literal": "pending", "chunkId": "udfs#lifecycle-routes" }, { "kind": "code", "literal": "POST /v2/udfs/{id}/discover", "chunkId": "udfs#lifecycle-routes" }, { "kind": "code", "literal": "reset-failed", "chunkId": "udfs#lifecycle-routes" }, { "kind": "code", "literal": 
"spec.output.version", "chunkId": "udfs#lifecycle-routes" } ], "sources": [ { "chunkId": "udfs#lifecycle-routes", "url": "/docs/udfs#lifecycle-routes", "anchor": "lifecycle-routes" } ], "mode": "agent-primary", "terms": [ "lifecycle", "routes", "lists", "pause", "stop", "discovery", "dispatch", "draining", "flight", "resume", "reset", "failed", "move", "rows", "back", "pending", "discover", "trigger", "immediate", "sweep", "recovery", "path", "after", "transient", "incident", "while", "permanent", "issues", "need", "fixing", "input", "shape", "bumping", "output", "version", "applying", "post", "udfs", "spec", "route" ] }, { "id": "udfs#not-in-01", "kind": "section", "title": "UDFs", "heading": "Not in 0.1", "group": "Guides", "url": "/docs/udfs#not-in-01", "summary": "Not in 0.1: cross-namespace aggregate UDFs, chunkers or fan-out transforms (which remain pipelines), multi-output UDFs, and managed image builds.", "facts": [ { "kind": "value", "literal": "0.1", "chunkId": "udfs#not-in-01" } ], "sources": [ { "chunkId": "udfs#not-in-01", "url": "/docs/udfs#not-in-01", "anchor": "not-in-01" } ], "mode": "agent-primary", "terms": [ "cross", "namespace", "aggregate", "udfs", "chunkers", "transforms", "remain", "pipelines", "multi", "output", "managed", "image", "builds", "those" ] }, { "id": "udfs#scaling-and-placement", "kind": "section", "title": "UDFs", "heading": "Scaling and placement", "group": "Guides", "url": "/docs/udfs#scaling-and-placement", "summary": "A UDF's scaling spec names a compute pool, a mode, and replica min/max bounded by the pool ceiling, with the minimum set to one for warm workers; the cluster infra-rules object owns shared placement (selectors, tolerations, resource requests, replica ceilings) so workload specs only choose a pool, and extra pod-level config is deep-merged from the worker pod spec without merging container array overrides.", "facts": [ { "kind": "code", "literal": "spec.scaling", "chunkId": "udfs#scaling-and-placement" }, { 
"kind": "code", "literal": "pool", "chunkId": "udfs#scaling-and-placement" }, { "kind": "code", "literal": "InfraRules/default", "chunkId": "udfs#scaling-and-placement" }, { "kind": "code", "literal": "mode", "chunkId": "udfs#scaling-and-placement" }, { "kind": "code", "literal": "autoscale", "chunkId": "udfs#scaling-and-placement" }, { "kind": "code", "literal": "fixed", "chunkId": "udfs#scaling-and-placement" }, { "kind": "code", "literal": "disabled", "chunkId": "udfs#scaling-and-placement" }, { "kind": "code", "literal": "replicas.min", "chunkId": "udfs#scaling-and-placement" }, { "kind": "code", "literal": "replicas.max", "chunkId": "udfs#scaling-and-placement" }, { "kind": "code", "literal": "InfraRules", "chunkId": "udfs#scaling-and-placement" }, { "kind": "code", "literal": "spec.worker.podSpec", "chunkId": "udfs#scaling-and-placement" } ], "sources": [ { "chunkId": "udfs#scaling-and-placement", "url": "/docs/udfs#scaling-and-placement", "anchor": "scaling-and-placement" } ], "mode": "agent-primary", "terms": [ "scaling", "placement", "spec", "names", "compute", "pool", "mode", "replica", "bounded", "ceiling", "minimum", "warm", "workers", "cluster", "infra", "rules", "object", "owns", "shared", "selectors", "tolerations", "resource", "requests", "ceilings", "workload", "specs", "only", "choose", "extra", "level", "config", "deep", "merged", "worker", "without", "merging", "container", "array", "overrides", "infrarules" ] }, { "id": "udfs#spec-routes", "kind": "section", "title": "UDFs", "heading": "Spec routes", "group": "Guides", "url": "/docs/udfs#spec-routes", "summary": "Lists the UDF spec routes (create a definition and queue, list, read, delete which preserves written output, and read status counts); the create body carries the same shape the CRD spec expresses, covering target namespaces, inputs, output, filter, triggers, worker, schedule, and retry.", "facts": [ { "kind": "code", "literal": "POST /v2/udfs", "chunkId": "udfs#spec-routes" }, { 
"kind": "code", "literal": "GET /v2/udfs", "chunkId": "udfs#spec-routes" }, { "kind": "code", "literal": "GET /v2/udfs/{id}", "chunkId": "udfs#spec-routes" }, { "kind": "code", "literal": "DELETE /v2/udfs/{id}", "chunkId": "udfs#spec-routes" }, { "kind": "code", "literal": "GET /v2/udfs/{id}/status", "chunkId": "udfs#spec-routes" }, { "kind": "code", "literal": "spec", "chunkId": "udfs#spec-routes" } ], "sources": [ { "chunkId": "udfs#spec-routes", "url": "/docs/udfs#spec-routes", "anchor": "spec-routes" } ], "mode": "agent-primary", "terms": [ "spec", "routes", "lists", "create", "definition", "queue", "list", "read", "delete", "preserves", "written", "output", "status", "counts", "body", "carries", "same", "shape", "expresses", "covering", "target", "namespaces", "inputs", "filter", "triggers", "worker", "schedule", "retry", "post", "udfs", "route", "behavior", "does", "depth", "flight", "failed", "content", "type", "application", "json" ] }, { "id": "udfs#tuning-knobs", "kind": "section", "title": "UDFs", "heading": "Tuning knobs", "group": "Guides", "url": "/docs/udfs#tuning-knobs", "summary": "Lists the UDF tuning knobs: rows per batch, worker call timeout, claim lease duration, time between discovery scans, concurrent in-flight batches per UDF, concurrent discovery scans, and retry attempts before a row lands in failed.", "facts": [ { "kind": "code", "literal": "worker.batchSize", "chunkId": "udfs#tuning-knobs" }, { "kind": "code", "literal": "worker.timeoutSeconds", "chunkId": "udfs#tuning-knobs" }, { "kind": "code", "literal": "schedule.leaseSeconds", "chunkId": "udfs#tuning-knobs" }, { "kind": "code", "literal": "schedule.discoveryIntervalSeconds", "chunkId": "udfs#tuning-knobs" }, { "kind": "code", "literal": "schedule.maxInFlightBatches", "chunkId": "udfs#tuning-knobs" }, { "kind": "code", "literal": "schedule.maxConcurrentScans", "chunkId": "udfs#tuning-knobs" }, { "kind": "code", "literal": "retry.maxAttempts", "chunkId": "udfs#tuning-knobs" }, { 
"kind": "code", "literal": "failed", "chunkId": "udfs#tuning-knobs" } ], "sources": [ { "chunkId": "udfs#tuning-knobs", "url": "/docs/udfs#tuning-knobs", "anchor": "tuning-knobs" } ], "mode": "agent-primary", "terms": [ "tuning", "knobs", "lists", "rows", "batch", "worker", "call", "timeout", "claim", "lease", "duration", "time", "between", "discovery", "scans", "concurrent", "flight", "batches", "retry", "attempts", "before", "lands", "failed", "batchsize", "timeoutseconds", "schedule", "leaseseconds", "discoveryintervalseconds", "maxinflightbatches", "maxconcurrentscans", "maxattempts", "knob", "bounds", "long", "held", "reissue", "scan", "jobs", "namespace", "tries" ] }, { "id": "udfs#version-markers", "kind": "section", "title": "UDFs", "heading": "Version markers", "group": "Guides", "url": "/docs/udfs#version-markers", "summary": "The output version field is the re-run safety rail: when set, the gateway stamps a per-output version marker alongside every write, so bumping the version and keeping the canonical stale filter triggers re-processing when a model, taxonomy, or prompt changes.", "facts": [ { "kind": "code", "literal": "spec.output.version", "chunkId": "udfs#version-markers" }, { "kind": "code", "literal": "{attribute}_v", "chunkId": "udfs#version-markers" } ], "sources": [ { "chunkId": "udfs#version-markers", "url": "/docs/udfs#version-markers", "anchor": "version-markers" } ], "mode": "agent-primary", "terms": [ "version", "markers", "output", "field", "safety", "rail", "gateway", "stamps", "marker", "alongside", "every", "write", "bumping", "keeping", "canonical", "stale", "filter", "triggers", "processing", "model", "taxonomy", "prompt", "changes", "spec", "attribute", "bump", "keep" ] }, { "id": "udfs#worker-coordination-routes", "kind": "section", "title": "UDFs", "heading": "Worker coordination routes", "group": "Guides", "url": "/docs/udfs#worker-coordination-routes", "summary": "Lists the worker coordination routes (claim a batch, heartbeat 
to extend leases, complete to persist output, and fail) that the SDK's worker loop implements so most workloads never call them directly; claim returns batches as namespace/id pairs with declared input columns, rows that cannot be bound surface as explicit bind errors, and on failure a transient kind honors retry while a permanent kind dead-letters immediately.", "facts": [ { "kind": "code", "literal": "POST /v2/udfs/product-tags/items/complete\nContent-Type: application/json\n\n{\n \"worker_id\": \"udf-worker-0\",\n \"items\": [\n {\"namespace\": \"amazon-products\", \"id\": \"asin-B08N5WRWNW\", \"output\": [\"wireless\", \"waterproof\"]}\n ]\n}", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "POST /v2/udfs/{id}/claim", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "POST /v2/udfs/{id}/items/heartbeat", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "POST /v2/udfs/{id}/items/complete", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "POST /v2/udfs/{id}/items/fail", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "run_udf_worker", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "claim", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "(namespace, id)", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "fail", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "kind: transient", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "spec.retry", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "kind: permanent", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "kind", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", "literal": "TransientError", "chunkId": "udfs#worker-coordination-routes" }, { "kind": "code", 
"literal": "PermanentError", "chunkId": "udfs#worker-coordination-routes" } ], "sources": [ { "chunkId": "udfs#worker-coordination-routes", "url": "/docs/udfs#worker-coordination-routes", "anchor": "worker-coordination-routes" } ], "mode": "agent-primary", "terms": [ "worker", "coordination", "routes", "lists", "claim", "batch", "heartbeat", "extend", "leases", "complete", "persist", "output", "fail", "loop", "implements", "most", "workloads", "never", "call", "directly", "returns", "batches", "namespace", "pairs", "declared", "input", "columns", "rows", "cannot", "bound", "surface", "explicit", "bind", "errors", "failure", "transient", "kind", "honors", "retry", "while" ] }, { "id": "udfs#writeback-and-discovery", "kind": "section", "title": "UDFs", "heading": "Writeback and discovery", "group": "Guides", "url": "/docs/udfs#writeback-and-discovery", "summary": "UDF outputs are patched onto the target row as the named attribute with the same writeback semantics across kinds, and when the output version is set the gateway atomically writes the output and its version marker in a single patch; discovery sweeps run an ID scan with the spec filter per target namespace, enqueue and dedupe returned ids, run an implicit first sweep after apply, and run subsequent sweeps on the configured interval.", "facts": [ { "kind": "code", "literal": "output.kind", "chunkId": "udfs#writeback-and-discovery" }, { "kind": "code", "literal": "spec.output.version", "chunkId": "udfs#writeback-and-discovery" }, { "kind": "code", "literal": "{attribute}_v", "chunkId": "udfs#writeback-and-discovery" }, { "kind": "code", "literal": "spec.filter", "chunkId": "udfs#writeback-and-discovery" }, { "kind": "code", "literal": "target_namespace", "chunkId": "udfs#writeback-and-discovery" }, { "kind": "code", "literal": "schedule.discovery_interval_seconds", "chunkId": "udfs#writeback-and-discovery" } ], "sources": [ { "chunkId": "udfs#writeback-and-discovery", "url": 
"/docs/udfs#writeback-and-discovery", "anchor": "writeback-and-discovery" } ], "mode": "agent-primary", "terms": [ "writeback", "discovery", "outputs", "patched", "onto", "target", "named", "attribute", "same", "semantics", "across", "kinds", "output", "version", "gateway", "atomically", "writes", "marker", "single", "patch", "sweeps", "scan", "spec", "filter", "namespace", "enqueue", "dedupe", "returned", "implicit", "first", "sweep", "after", "apply", "subsequent", "configured", "interval", "kind", "schedule", "seconds", "type" ] } ], "edges": [] }
```

---

# Introduction

Source: https://hevlayer.com/docs

Layer provides a set of drop-in enhancements to your favorite retrieval systems. Layer lets you scale your own compute over [multi-stage pipelines](/docs/pipelines), reason about the state of your index, observe clickstream, track cost, and more.

You run two server components in your own cluster: a Rust **gateway** and a Kubernetes **operator**.

The **gateway** is a transparent proxy in front of Turbopuffer. It extends native clients with [fetch](/docs/api/query#fetch), [scans](/docs/scans), [snapshots](/docs/api/snapshots), [result count](/docs/api/result-count), and operator-facing semantics around the cache, write path, and [pipelines](/docs/pipelines) — you swap in Layer's drop-in client and change nothing else. It also drives the function runtime: discovering [UDF](/docs/udfs) work, leasing it to worker pools, retrying, and writing results back, with KEDA scaling each pool to zero between bursts.

Alongside its [wire-compatible clients](/docs/install), Layer ships an optional GUI [dashboard](/docs/dashboard). The dashboard manages cluster configuration through CRDs; all other state is persisted in object storage (S3). No durable state lives in a Layer process, so the compute tier is stateless and fully elastic.
Because indexing is bursty — especially GPU-bound work — our [Terraform](/docs/install/terraform) installs [Karpenter](https://karpenter.sh) as a cluster autoscaler to provision and scale the nodes Layer's compute runs on. The remaining backing services are the document cache, the indexing-state store, and the metrics store. Every component Layer runs alongside is open source: - **[Karpenter](https://karpenter.sh)** — cluster autoscaler that provisions and scales nodes for Layer's bursty, GPU-bound compute (Apache-2.0). - **[Aerospike](https://aerospike.com)** — NVMe-backed ephemeral document cache (AGPL-3.0). - **[PostgreSQL](https://www.postgresql.org)** — indexing-state store for the pipeline and embed queue (PostgreSQL License). - **[VictoriaMetrics](https://victoriametrics.com)** — metrics store (Apache-2.0). To get started, see the [install guide](/docs/install). For more technical detail, see [Concepts](/docs/concepts), [Guarantees](/docs/guarantees), and [Tradeoffs](/docs/tradeoffs). --- # Concepts Source: https://hevlayer.com/docs/concepts ## Control loops Layer uses a control loop as a core primitive for managing your indexes. It reconciles index state against metrics emitted by the search system, which is how Layer applies row-level transformations ([UDFs](/docs/udfs)) and keeps an index's stable view current. Related: [UDFs](/docs/udfs), [snapshots](/docs/api/snapshots), stable watermark. ## Kubernetes autoscaling Because Layer is stateless, you can autoscale every tier independently. Karpenter handles node-level scaling, and KEDA scales pods against signals from an embedded PostgreSQL queue. The data in that queue is used for scaling decisions only — it carries no non-recoverable system state. ## Gateway enhancements Where helpful, the gateway extends your search system with common query patterns and filtering primitives. 
Layer's enhancements use reserved `_hevlayer_*` attributes; changing the schema on those attributes breaks Layer's guarantees but should degrade gracefully. All functionality is exposed through a single client, so applications can route every call through the gateway — Layer works best when traffic flows through it consistently, even for requests that need no extra behavior. ## Scatter/gather Layer can partition a single namespace into hash buckets — shards — by assigning each row a reserved `_hevlayer_shard` attribute (xxh64 of its id, modulo the shard count). The gateway then scatters a query to every bucket in parallel, one `_hevlayer_shard`-filtered query per shard, and gathers the results: it merges and re-ranks the combined rows down to your requested `top_k` before returning them. Sharding stays invisible to the client — you issue one query and get one ranked result set. The same scatter/gather path backs [result count](/docs/api/result-count), [scans](/docs/scans), and [UDF](/docs/udfs) discovery scans. ## Pull-through cache Document [reads](/docs/api/query#fetch) are served by a pull-through cache: the gateway checks the NVMe-backed cache (Aerospike) first, and on a miss reads through to Turbopuffer — or S3 for snapshots — returns the row, and backfills the cache best-effort. The cache is a read accelerator, not a hard dependency: if it is unavailable, reads fall through to origin and still succeed. One logical cache serves every read path, with different uses (document fetch, snapshot field-values) separated by Aerospike `set`. ## Observability as code Layer's observability contract is defined in the service itself. The gateway emits a self-describing [catalog](/docs/api/metrics) of every metric it exports — names, labels, and example PromQL — so the metric surface is code, not hand-maintained dashboard config. 
The bundled [dashboard](/docs/dashboard) and any external automation read from that catalog, and an embedded, Prometheus-compatible VictoriaMetrics instance lets you run [PromQL](/docs/api/metrics) against the series directly or bring your own monitoring stack. ## Glossary | Concept | Current meaning | | --- | --- | | [Namespace](/docs/api/introduction) | A Turbopuffer namespace addressed through `/v2/namespaces/{namespace}`. | | Document | A row id plus attributes, and optionally a vector when writing/searching. | | Cache | NVMe-backed records keyed by namespace and document id, plus cache sets for pipeline chunks and snapshots. | | Stable watermark | Epoch-ms cut tracked by the consistency watcher when Turbopuffer index status is up-to-date. | | [Pipeline](/docs/pipelines) | A PostgreSQL-backed state machine for CPU extraction and GPU embedding work. | | [Snapshot](/docs/api/snapshots) | A content-addressed S3 facet histogram written after a namespace is observed stable. | | Facet listing | The distinct values for a configured snapshot field, surfaced as `fields[].values[].v`. | | Facet count | The document count for a configured snapshot field value, surfaced as `fields[].values[].n`. | | [Result count](/docs/api/result-count) | A synchronous ranked-query count over FTS or vector query input. | | [Scan](/docs/scans) | A filter scan that returns matching IDs asynchronously or a matching row count synchronously. | | [UDF](/docs/udfs) | A stateless container the gateway calls once per row of an index to compute a derived attribute. | | Gateway | The Rust proxy fronting Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime. | | [Operator](/docs/kubernetes/operator) | The Kubernetes operator that reconciles Layer's CRDs — functions, pipelines, scaling, and cluster config. | | Shard | A hash bucket within a single namespace. 
Each row carries a reserved `_hevlayer_shard` value (xxh64 of its id, modulo the shard count) so the gateway can scatter/gather a query across buckets. | | CRD | Custom Resource Definition: the Kubernetes-native resources the operator reconciles — [functions](/docs/kubernetes/function-crd), [pipelines](/docs/kubernetes/pipeline-crd), [scaling](/docs/kubernetes/scaling-crd), and [indexes](/docs/kubernetes/index-crd). | | PromQL | The Prometheus query language. The gateway proxies it to the embedded VictoriaMetrics so you can query [metrics](/docs/api/metrics) without a separate scraper. | --- # Document model Source: https://hevlayer.com/docs/document-model A Layer document is a Turbopuffer row — an id, your attributes, and an optional vector — read and written through the [pull-through cache](/docs/concepts#pull-through-cache). Alongside your own schema, Layer reserves the `_hevlayer_*` attribute prefix for its own bookkeeping. The gateway manages these attributes: your writes and [UDF](/docs/udfs) outputs must not set them, and editing them directly breaks Layer's guarantees (the gateway degrades gracefully if they drift). | Attribute | Type | Purpose | | --- | --- | --- | | `_hevlayer_upserted_at` | integer (epoch ms) | Server-stamped on every write. The gateway filters queries to `_hevlayer_upserted_at <= watermark` to hold the read-consistency cut while the upstream index catches up. | | `_hevlayer_shard` | integer | Hash bucket assigned at write time (`xxh64(id) % shard_count`), present only on sharded namespaces. Lets the gateway [scatter/gather](/docs/concepts#scattergather) a query across the shards of one namespace. | The `_hevlayer_` prefix also namespaces internal cache sets — snapshot field-values and search-history clickstream — but those are cache keys, not part of your document schema. --- # No Guarantees Source: https://hevlayer.com/docs/guarantees import Callout from "../../components/docs/Callout.astro"; Layer can't offer guarantees. 
We try our best to provide secure, hands-off infrastructure that you are ultimately responsible for. While we can't offer guarantees, we make a set of promises in how we design, secure, and distribute our software that we believe make it easy to use and will stand the test of time. This page covers the specific status of those promises. ## Commitments - Your index stays in your search system. We will not reimplement indexing. Layer keeps a copy of your data, but the search index lives in your vector store. - Your history is backed up to S3. Search history and namespace snapshots are written to the S3 bucket you specify. The format of this data may change prior to v1.0. - Data on NVMe. Customer document and chunk data is served from NVMe for price/performance. We try not to stray from this pattern, though some use cases may justify a smaller in-memory document cache. - This documentation is accurate and up to date. When it isn't, that's a bug in the software — report it. - Metrics and alerts are documented as code, and tested. The observability surface is versioned, reviewable, and covered by tests — not hand-rolled per deployment. - Graceful degradation. We add graceful degradation support whenever possible — the gateway degrades rather than failing hard. - Client compatibility. We will (almost) always stay client-compatible with the search systems we front. Where we diverge, it's a feature making an explicit tradeoff we believe is an improvement. Layer was developed by a single person orchestrating agentic coding tools and building automation. Not a single line of code was hand-written. That said, it was made with ❤️ by a human as much as it is built by AI. --- # Tradeoffs Source: https://hevlayer.com/docs/tradeoffs Layer makes a set of design tradeoffs we believe improve functionality of the search engine. This page makes those tradeoffs explicit. As this list grows, we will offer configuration where possible to allow users to configure their preference. 
Layer adds latency to the query path in the following ways:

- An additional network hop (not configurable).
- A query plan that allows for strongly consistent reads during heavy writes ([index configurable](/docs/kubernetes/index-crd)).

Layer also increases index storage requirements in the following ways:

- A secondary index for filtering by upsert time (not configurable).
- A secondary index used for scatter/gather sharding (not configurable).

---

# Limits

Source: https://hevlayer.com/docs/limits

Layer is limited by certain constraints of the underlying components we ship with. We will lift these as demand increases.

- **Single-node Aerospike.** We enforce this for simplicity and also believe that a single large NVMe drive offers enough storage for almost every dataset.
- **~4,090 Turbopuffer namespaces.** We use Aerospike sets for logical separation of data, which are limited by the Aerospike Community Edition AGPL license.
- **~3 TB cache size.** Another limitation of the Aerospike license.
- **10,000 distinct values per scan facet field.** Pre-computed snapshot scans cap each facet field's cardinality. If a field exceeds the cap, it is noted in `fields_skipped[]` rather than `fields[]`, so readers can treat every emitted field as complete. See [snapshots](/docs/api/snapshots).

## No limits

These have no enforced ceiling, but practical limits exist and will surface under load.

- **CRD instances** (`Index`, `Function`, `Pipeline`, `Scaling`) — bounded only by the etcd and operator throughput of your Kubernetes cluster.
- **Snapshot history per namespace** — durable in S3, accumulates indefinitely; bounded by object storage cost.
- **Search history retention** — accumulates indefinitely in S3; no automatic expiry.
- **Clickstream event volume** — accumulates indefinitely in S3; no automatic expiry.
- **UDF concurrency per function** — KEDA scales replicas to match queue depth, bounded by your cluster's capacity.
- **Pipeline queue depth** — pipeline queues, including chunked document queues, store document IDs and chunk ID lists in S3 manifests and keep only segment state and counters in Postgres.
- **Document size and attribute count** — bounded by Turbopuffer and Aerospike record limits, not by Layer.

---

# Agents

Source: https://hevlayer.com/docs/agents

These docs are queryable from the command line. The same engine behind the `⌘K` search on this site ships as a CLI, so your coding agent can search, read, and cite the Layer docs directly — no scraping, no MCP server, no API key. Two commands wire it up.

## 1. Install the CLI

```sh
go install github.com/hev/ask/cmd/ask@latest
```

The binary is self-contained; any agent harness that can run a shell command can use it.

## 2. Add the skill

For Claude Code, paste this once:

```sh
mkdir -p ~/.claude/skills/hevlayer-docs
cat > ~/.claude/skills/hevlayer-docs/SKILL.md <<'EOF'
---
name: hevlayer-docs
description: >-
  Query the hev layer docs. Use when the user asks about Layer — the
  Turbopuffer gateway, strong-consistent reads, the stable watermark, the
  pull-through document cache, warm jobs, scans, result count, snapshots,
  pipelines, UDFs, the Index/InfraRules/Pipeline/Function CRDs, compute
  pools, install via Terraform or Helm, failure modes, or the dashboard.
---

# hev layer docs

Answer Layer questions from the docs, not from memory. Every verb is a
keyless read:

    ask --endpoint https://hevlayer.com/api/ask search ""
    ask --endpoint https://hevlayer.com/api/ask section get ""
    ask --endpoint https://hevlayer.com/api/ask overview
    ask --endpoint https://hevlayer.com/api/ask glossary get ""

Start with `search`; fetch sections for detail; use `overview` when you
need the full map. Section ids look like `api/query#strong-consistent-reads`.
Cite sections in your answer as https://hevlayer.com plus the returned
`url` field.

If `ask` is missing, install it: `go install github.com/hev/ask/cmd/ask@latest`
EOF
```

Other harnesses: paste the body of that skill into your `AGENTS.md` — it is plain instructions around a CLI, nothing Claude-specific.

## 3. Ask

```sh
ask --endpoint https://hevlayer.com/api/ask search "cache is down"
```

```json
{
  "results": [
    {
      "title": "Concepts",
      "heading": "Pull-through cache",
      "url": "/docs/concepts#pull-through-cache",
      "group": "Overview",
      "snippet": "Document reads are served by a pull-through cache: the gateway checks..."
    }
  ]
}
```

From here your agent typically runs `section get` on the winning id and answers with the citation.

## The verbs

| Verb | Returns |
| --- | --- |
| `overview` | Orientation context plus the full section map with stable ids |
| `search ""` | Ranked sections with snippets and deep links |
| `section get ""` | One section: summary, exact identifiers, source URL |
| `glossary get ""` | A product term resolved through its aliases (`watermark` → stable watermark) |

## Why answers stay grounded

Search runs over a committed, reviewable digest of these docs — the same corpus, heading by heading, that renders on this site. Every anchor in it is verified against the rendered pages in CI, so a cited deep link like [/docs/api/query#strong-consistent-reads](/docs/api/query#strong-consistent-reads) always resolves. When the docs change, the digest is rebuilt and recommitted with them.

Every verb above is a read against the public docs. Nothing to sign up for, nothing to configure beyond the endpoint URL.

The docs are also available as plain text for direct ingestion: [/llms.txt](/llms.txt) (index) and [/llms-full.txt](/llms-full.txt) (full corpus). The CLI is the better path for agents that can run commands — it ranks, resolves aliases, and costs a fraction of the tokens.
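The citation rule from the skill (docs origin plus the returned `url` field) can be sketched in a few lines. This is a minimal illustration, assuming the JSON shape shown in the sample search response on this page; the `citations` helper and the `DOCS_ORIGIN` constant are hypothetical names, not part of the `ask` CLI.

```python
import json

# Origin that the skill says to prepend to each returned `url` field.
DOCS_ORIGIN = "https://hevlayer.com"


def citations(search_output: str) -> list[tuple[str, str]]:
    """Return (heading, absolute URL) pairs from `ask search` JSON output.

    Assumes the response carries a top-level `results` list whose items
    have `heading` and `url` keys, as in the sample response above.
    """
    payload = json.loads(search_output)
    return [(r["heading"], DOCS_ORIGIN + r["url"]) for r in payload.get("results", [])]


# Sample payload copied from the search example (snippet shortened).
sample = json.dumps({
    "results": [
        {
            "title": "Concepts",
            "heading": "Pull-through cache",
            "url": "/docs/concepts#pull-through-cache",
            "group": "Overview",
            "snippet": "...",
        }
    ]
})

for heading, url in citations(sample):
    print(f"{heading}: {url}")
```

An agent harness could pipe the CLI's stdout into a helper like this before composing its answer, so every cited link is absolute.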
--- # Roadmap & Changelog Source: https://hevlayer.com/docs/roadmap ## Up Next - 🧷 Count and scan primitives — filter-count mode, snapshot truncation removal, route renames ([RFC 0019](https://github.com/hev/layer/blob/main/docs/rfcs/0019-count-and-scan-primitives.md), [#67](https://github.com/hev/layer/issues/67)) - 🚑 Indexing failure-mode E2E runbook — Aerospike stop-writes + Postgres pressure ([#55](https://github.com/hev/layer/issues/55)) - 🧬 Embedding UDF writeback via row re-upsert ([#52](https://github.com/hev/layer/issues/52)) - 🌱 Namespace init UDF for first-time embed population - 🚦 Snapshot-aware ready signal — `layer.is_stable` honors UDF state ([#46](https://github.com/hev/layer/issues/46)) - 🎨 Full dashboard redesign — 6-tab layout from the prototype ([#11](https://github.com/hev/layer/issues/11)) - 🐚 `layer` CLI — kube-style resource access, queries, scans, and jobs over the gateway REST API ### Later - 🔐 RBAC: scoped API keys and entitlements as a Layer primitive ([#8](https://github.com/hev/layer/issues/8)) - ♻️ Soft delete with TTL + restore ([#7](https://github.com/hev/layer/issues/7)) - 🪢 Hybrid text fusion — typo-tolerant search via per-token fuzzy + BM25 legs, fused by Turbopuffer-native RRF ([RFC 0022](https://github.com/hev/layer/blob/main/docs/rfcs/0022-hybrid-text-fusion.md), [#18](https://github.com/hev/layer/issues/18)) - ⌨️ Typeahead via Turbopuffer regex index ([#19](https://github.com/hev/layer/issues/19)) - 🕰️ Temporal queries — `as_of` selector for `/query`, `/scans`, `/fetch`, and `/snapshots` ([RFC 0020](https://github.com/hev/layer/blob/main/docs/rfcs/0020-temporal-queries.md), [#68](https://github.com/hev/layer/issues/68)) - 🌿 `copy_from_with_filter` — time travel + subset branching ([#20](https://github.com/hev/layer/issues/20)) - 🐇 Exact kNN result cache keyed by consistency watermark ([#21](https://github.com/hev/layer/issues/21)) - 🧪 A/B variant indexes with operator-controlled rollout 
([#23](https://github.com/hev/layer/issues/23)) - 🦚 Per-query observability with LLM-judged Tail Quality ([#41](https://github.com/hev/layer/issues/41)) - 🎞️ Pipeline crash recovery via source replay + deterministic IDs ([#43](https://github.com/hev/layer/issues/43)) - ☠️ Paginated UDF dead-letter list ([#44](https://github.com/hev/layer/issues/44)) - 🏗️ Narrow cluster topology defaults ([#45](https://github.com/hev/layer/issues/45)) - 📣 Write amplification baselines ([#15](https://github.com/hev/layer/issues/15)) - 📮 `layer push` — Python UDF dev experience via Depot ([#64](https://github.com/hev/layer/issues/64)) - 💸 Cost API — AWS + Turbopuffer cost snapshots, timeseries, and rate card ([#35](https://github.com/hev/layer/issues/35)) ## 0.1 Release (UAT) ### API hardening - 🧩 Scaling CRD consolidation — `Pipeline`, `UDF`, `InfraRules` ([RFC 0012](https://github.com/hev/layer/blob/main/docs/rfcs/0012-crd-scaling-consolidation.md)) - 🎛️ `Index` CRD redesign ([RFC 0013](https://github.com/hev/layer/blob/main/docs/rfcs/0013-index-policy-surface.md)) - 📸 Snapshot scan naming conventions ([RFC 0014](https://github.com/hev/layer/blob/main/docs/rfcs/0014-snapshot-noun-scan-verb.md)) - 🧹 Remove unused APIs ### Lifecycle and operability - 🎚️ [Autoscaling compute](/docs/kubernetes/scaling-crd) for pipelines and UDFs - 🗄️ [Document cache endpoint](/docs/api/query#fetch) for building multi-stage pipelines - 📸 [Index snapshot history](/docs/api/snapshots) - 🧨 Coordinated delete - ⛵ [Helm](/docs/install/helm) and [Terraform](/docs/install/terraform) install scripts ### Surfaces - 🪟 [Dashboard MVP](/docs/dashboard) — basic CRD management and observability - 🐍 Official Python SDK ### Search - 🎯 Strongly consistent queries during heavy writes via [`_hevlayer_upserted_at`](/docs/guarantees) - 🧮 [Result count](/docs/api/result-count) over FTS/vector queries via scatter/gather - 📜 Precomputed facet listings in [snapshots](/docs/api/snapshots) - 🪙 Precomputed facet counts in 
[snapshots](/docs/api/snapshots) - 🪃 [Scans](/docs/api/scans) for filter IDs and filter counts not available in a snapshot - 🆔 Search by id via document-cached vector - 📰 [Search history](/docs/api/search-history) saved to S3 - 🗂️ Enhanced [namespace metadata](/docs/api/namespace-metadata) --- # Install Source: https://hevlayer.com/docs/install import LinkGrid from "../../components/docs/LinkGrid.astro"; A hev layer install has two stages. **Terraform** provisions the required AWS resources: IAM, S3, ECR, networking, cost-read roles, and, for the recommended path, a fresh EKS cluster. **Helm** installs the gateway, operator, and document cache into that cluster and wires them to the AWS resources Terraform produced. You can skip Terraform if you already have the AWS resources hev layer needs. At minimum, provide an S3 bucket and gateway IRSA role for snapshots and history. For the full operations surface, also provide dashboard cost-read IAM, image registry locations, and cluster-level components equivalent to the Terraform outputs. ## What ships in 0.1 The 0.1 install is single-tenant: one Helm release per environment, one Turbopuffer credential per release, one S3 bucket for snapshot and history data. Multi-tenant gateway scoping is on the 0.2+ roadmap and is not exposed at the install layer yet. --- # Terraform Source: https://hevlayer.com/docs/install/terraform import Callout from "../../../components/docs/Callout.astro"; The Terraform configuration in `infra/terraform/` provisions the AWS resources that the gateway and operator need. It is opinionated about the resources hev layer needs to behave correctly and conservative about resources around it. Route53 hosted zones and ACM certificates are opt-in; most installs bring existing DNS and TLS. ## What it sets up | Resource | Purpose | | --- | --- | | S3 bucket | Durable storage for namespace snapshots, search history, and clickstream events. 
| | IAM roles + IRSA policies | Gateway S3 access, dashboard cost-read access, and worker/operator AWS access. | | ECR repositories | Image registry for the gateway, operator, and customer-built function images. | | EKS + VPC + node pools | Recommended fresh-cluster runtime for design partners. | | Route53 + ACM | Optional DNS zones, records, and TLS certificates when `manage_public_dns=true`. | ## Cluster: recommended Design-partner installs should use a fresh EKS cluster unless there is a specific reason to bind hev layer to an existing one. The cluster path provisions: - a VPC with the subnets, NAT, and endpoints hev layer expects - an EKS control plane and node groups - Karpenter for node autoscaling - the AWS Load Balancer Controller for ingress - EFS for shared persistent volumes If you already operate an EKS cluster, you can disable the cluster modules and point hev layer at the existing cluster. You are still responsible for the functional prerequisites: an S3 bucket for snapshots/history, gateway IRSA that can read/write that bucket, dashboard IRSA for AWS cost and pricing reads, image registry access, Karpenter or equivalent node autoscaling for workers, and the AWS Load Balancer Controller if you use public ingress. For design partners, deploy hev layer to a fresh cluster. It keeps worker autoscaling, document-cache placement, and cost attribution isolated from unrelated workloads while the 0.1 operating model settles. ## Cost notes The Terraform is designed to deploy a cost-efficient AWS footprint with autoscaling for on-demand indexing work. At rest, the fixed costs are mostly EKS, NAT when private workers need third-party egress, and small storage lines. Indexing bursts scale worker nodes up through Karpenter and back down when queues drain. Heavier search use cases may need more read-side infrastructure: additional gateway replicas, larger document-cache nodes, or dedicated node pools for steady read traffic. 
Contact hev layer for help sizing read-heavy deployments. ## Outputs Terraform emits the values the Helm chart needs to install: the S3 bucket name, gateway IRSA role ARN, dashboard cost-read role ARN, ECR image URLs, and cluster metadata. Pass these into the Helm values file described in [Helm Install](/docs/install/helm). --- # Helm Install Source: https://hevlayer.com/docs/install/helm import Callout from "../../../components/docs/Callout.astro"; The Helm chart at `infra/helm/layer/` installs the gateway, operator, and document cache into a cluster that already has the AWS resources from [Terraform](/docs/install/terraform) or equivalent resources you manage. ## Required values Most of the chart is opinionated defaults. In a typical install the only value you must bring from outside the cluster is the Turbopuffer API key. | Value | Required | Notes | | --- | --- | --- | | `turbopuffer.apiKey` | yes | Turbopuffer credential the gateway uses on every upstream request. | | `gateway.image` | yes | Gateway image URL — Terraform emits this as an ECR output. | | `gateway.apiKey` | yes | Bearer token clients send as `Authorization: Bearer …`. Chart render fails when blank, by design. | | `s3.bucket` | yes | S3 bucket Terraform created for snapshots and history. | | `serviceAccount.roleArn` | yes | IRSA role ARN that grants the gateway access to the S3 bucket. | | `gateway.indexGc.enabled` | no | Enables namespace hard-delete cleanup of operator-discovered `Index` CRs. | | `gateway.indexGc.indexNamespace` | no | Namespace containing `Index` CRs. Blank follows `operator.discovery.indexNamespace`, then the Helm release namespace. | | `dashboard.serviceAccount.roleArn` | for cost tab | IRSA role ARN with AWS pricing, CloudWatch, and cost read access. | | `ingress.host` | optional | Set when you want a public ingress; use your DNS/TLS or enable Terraform-managed Route53/ACM. | Most other Helm inputs are wiring between resources the install process already produced. 
The Turbopuffer API key is the one thing hev layer can't generate for you — it's the credential you bring in from your Turbopuffer account.

## Install

```sh
helm upgrade --install layer ./infra/helm/layer \
  --namespace layer --create-namespace \
  -f values.customer.yaml
```

The chart is not published to a public Helm repository in 0.1 — install from the source path or from the chart artifact provided during onboarding.

## What gets installed

- `layer-gateway` — Rust gateway for Turbopuffer-compatible routes, fetch, scans, snapshots, warm jobs, and pipeline state.
- `layer-operator` — reconciler for Index, InfraRules, Pipeline, and Function CRDs documented in [Kubernetes](/docs/kubernetes/operator).
- `layer-document-cache` — Aerospike-backed document cache, scale-to-zero by default.
- Supporting resources: service accounts, IRSA bindings, ingress, and CRDs.

---

# Failure Modes

Source: https://hevlayer.com/docs/failure-modes

## Read

If the gateway is down, your queries are down. The document cache is stateless and can scale to zero with no disruption, and no other components impact the read path.

## Write

The primary failure mode for writes is Aerospike stop-writes during a multi-stage pipeline job. Staged documents stay warm in the cache but do not contain vector data. If this data exceeds the Aerospike drive allocation, the system stops accepting writes and your pipeline degrades to S3-backed chunk reads.

The operator can restart Aerospike, which discards the contents of the document cache. Pipeline workers resume automatically: staged chunk bodies are durable in S3, pending state is in PostgreSQL, and the gateway refills Aerospike from S3 after reconnect.

The Helm document cache restarts automatically on stop-writes by default (`documentCache.autoRestartOnStopWrites: true`) and clears its Aerospike backing file on pod start (`documentCache.storage.resetOnStart: true`). That makes a pod restart a valid stop-writes recovery action for the Layer-owned cache.
S3 and PostgreSQL must remain healthy; they are the durable recovery boundary. --- # Operator Overview Source: https://hevlayer.com/docs/kubernetes/operator `layer-operator` manages declarative state for your hev layer deployment. It serves a few crucial functions — monitoring for changes to your indexes and managing scaling. It does this through a set of abstractions known as [custom resource definitions (CRDs)](/docs/concepts#glossary). The gateway handles the read and write path; the operator handles everything that wants to be expressed as desired state in the cluster: which indexes exist, how worker pools scale, and which stateless functions run against which indexes. ## CRDs The operator reconciles four resource kinds, each documented on its own page: - [Index CRD](/docs/kubernetes/index-crd) — one resource per Turbopuffer namespace the gateway should manage. - [InfraRules CRD](/docs/kubernetes/scaling-crd) — cluster-wide compute pools, document cache rules, and shared scaling policy. - [Pipeline CRD](/docs/kubernetes/pipeline-crd) — staged work that changes row count. - [Function CRD](/docs/kubernetes/function-crd) — stateless user-defined functions that read and write attributes on an index. ## Relationship to the gateway The gateway and the operator are decoupled. The operator reconciles declarative state; the gateway serves the read and write path. Neither sits in the other's hot path, so the gateway keeps serving even if the operator is restarted or lagging. The link between them is one-directional and read-only. For some features the gateway reads CRD status — which indexes exist, which worker pools are ready — to inform what it serves. It never writes to the CRDs; declarative state is authored by you and reconciled by the operator, and the gateway is only ever a reader of it. ## Scheduling and node pools The operator is opinionated about where the workers it creates run. 
It does not schedule Pipeline and Function pods onto general cluster capacity — each compute pool pins to a dedicated, labeled node pool via `nodeSelector` and `tolerations`, so CPU and GPU work land on the right nodes and stay isolated from the rest of your cluster. The shipped defaults assume [Karpenter](https://karpenter.sh) and select on the `karpenter.sh/nodepool` label, but any labeled node pool works.

This is configured once on `InfraRules/default`, not per workload — see [InfraRules](/docs/kubernetes/scaling-crd) for the compute-pool fields and how Pipelines and Functions choose a pool.

---

# Index CRD

Source: https://hevlayer.com/docs/kubernetes/index-crd

An `Index` represents one namespace exposed through the gateway. It declares the backend, snapshot policy, cache posture, consistency mode, and access metadata.

```yaml
apiVersion: hevlayer.com/v1
kind: Index
metadata:
  name: products
  namespace: layer
spec:
  backend:
    kind: turbopuffer
    region: aws-us-east-1
    namespace: products
    distanceMetric: cosine_distance
  metadata:
    labels:
      app: shop
    tags:
      - catalog
  snapshot:
    interval: 5m
    retention: never
    facetFields:
      - category
      - brand
  cache:
    ttl: 24h
    capGiB: 64
    mode: standard
    warming:
      threads: 4
  consistency: strong
```

## Backend

| Field | Purpose |
| --- | --- |
| `backend.kind` | `turbopuffer` in the 0.1 runtime. |
| `backend.region` | Backend region identifier. |
| `backend.namespace` | Optional upstream namespace override. Defaults to the Index name. |
| `backend.distanceMetric` | Vector metric, default `cosine_distance`. |

## Snapshot policy

`snapshot.facetFields` is the user-facing source of fields the gateway materializes into durable snapshots. `retention` defaults to `never` in 0.1 because automatic snapshot GC is not shipped yet.

## Cache policy

`cache.warming.threads` defaults to `4`. Aerospike remains an ephemeral cache; durable snapshot history stays in S3.
## Status The operator reports observed generation, snapshot scheduling metadata, metadata sync state, and conditions. --- # InfraRules CRD Source: https://hevlayer.com/docs/kubernetes/scaling-crd `InfraRules` is the cluster-scoped policy object for Layer-managed runtime infrastructure. The 0.1 surface has exactly one object: `InfraRules/default`. Pipelines and Functions do not reference a separate autoscaling resource. They set `spec.scaling` inline and choose a pool from `InfraRules/default.spec.computePools`. ## InfraRules ```yaml apiVersion: hevlayer.com/v1alpha1 kind: InfraRules metadata: name: default spec: computePools: - name: cpu kind: cpu maxReplicasPerWorkload: 20 nodeSelector: karpenter.sh/nodepool: cpu tolerations: [] resources: requests: cpu: "500m" memory: 512Mi limits: memory: 1Gi documentCache: capGiB: 256 replicationFactor: 1 scaling: mode: autoscale nodes: min: 1 max: 3 ``` The operator validates that the object is named `default`. Helm can render the default object with `operator.infraRules.create=true`. ## Compute pools | Field | Purpose | | --- | --- | | `name` | Referenced by `spec.scaling.pool` on Pipeline and Function resources. | | `kind` | Pool class label such as `cpu` or `gpu`. | | `gpuType` | Optional descriptive GPU type for GPU pools. | | `nodeSelector` | Applied to worker pods that choose the pool. | | `tolerations` | Applied to worker pods that choose the pool. | | `resources` | Container resources applied to worker pods. | | `maxReplicasPerWorkload` | Hard ceiling for one Pipeline or Function. | If a workload names an unknown pool or asks for more replicas than the pool ceiling, the operator leaves the workload unready and records a condition on its status. ## Workload scaling ```yaml scaling: pool: cpu mode: autoscale replicas: min: 0 max: 4 ``` | Mode | Behavior | | --- | --- | | `autoscale` | Emit a KEDA `ScaledObject` and let queue depth scale the Deployment between `min` and `max`. 
| | `fixed` | Set Deployment replicas to `replicas.min`; no KEDA object is emitted. | | `disabled` | Scale the Deployment to 0; no KEDA object is emitted. | Paused workloads also scale to 0. To keep a cold-start-heavy worker warm, set `mode: autoscale` and `replicas.min: 1`. ## Document cache rules `documentCache` captures the operator-owned document cache envelope: capacity, replication factor, and node count. Helm still renders the document-cache KEDA object directly in 0.1; `InfraRules` is the declared policy shape the operator reports and validates against. --- # Pipeline CRD Source: https://hevlayer.com/docs/kubernetes/pipeline-crd The `Pipeline` CRD declares worker-owned indexing work whose row count can change between input and output: external ingestion, chunking, and other fan-out stages. Use a [Function](/docs/kubernetes/function-crd) when existing rows acquire a derived attribute without changing row count. Pipeline and Function resources share the same `spec.worker` and `spec.scaling` envelopes. `InfraRules/default` owns placement and pool limits; each workload chooses a pool. ```yaml apiVersion: hevlayer.com/v1alpha1 kind: Pipeline metadata: name: product-images namespace: layer spec: target: namespace: products sourceRef: kind: sqs queueUrl: https://sqs.us-east-1.amazonaws.com/123456789/product-images worker: image: ghcr.io/hev/product-image-worker:latest batchSize: 64 timeoutSeconds: 60 scaling: pool: cpu mode: autoscale replicas: min: 0 max: 8 ``` ## Target `spec.target.namespace` is the Turbopuffer namespace the pipeline writes. The gateway pipeline API owns document state, chunks, and vector writes for that target namespace. ## Source `spec.sourceRef` is intentionally open JSON so operators can record the external source that feeds the worker: SQS, Kafka, S3 events, a partner API, or a one-off migration source. The operator passes it through as declarative metadata; the worker image owns source-specific behavior. 
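The pool and scaling rules that apply to a Pipeline (described on the [InfraRules](/docs/kubernetes/scaling-crd) page) reduce to a small decision. The sketch below is an illustration under stated assumptions, not the operator's reconcile code: `resolve_scaling` is a hypothetical name, a paused workload is assumed to be handled like `disabled`, and the pool ceiling is assumed to apply to whichever replica bound is larger.

```python
def resolve_scaling(scaling: dict, pools: dict, paused: bool = False) -> dict:
    """Resolve a workload's spec.scaling against InfraRules compute pools.

    `pools` maps a pool name to its entry from spec.computePools.
    Hypothetical helper for illustration only.
    """
    pool = pools.get(scaling.get("pool"))
    if pool is None:
        # Unknown pool: the workload stays unready with a status condition.
        return {"ready": False, "reason": "unknown pool"}

    replicas = scaling.get("replicas", {})
    low, high = replicas.get("min", 0), replicas.get("max", 0)
    # Assumption: the ceiling applies to whichever bound is larger.
    if max(low, high) > pool["maxReplicasPerWorkload"]:
        return {"ready": False, "reason": "replicas exceed pool ceiling"}

    mode = scaling.get("mode")
    # Assumption: a paused workload is handled like `disabled`.
    if paused or mode == "disabled":
        return {"ready": True, "keda": False, "replicas": 0}
    if mode == "fixed":
        # Fixed mode pins the Deployment to replicas.min; no KEDA object.
        return {"ready": True, "keda": False, "replicas": low}
    if mode == "autoscale":
        # KEDA moves the Deployment between min and max on queue depth.
        return {"ready": True, "keda": True, "min": low, "max": high}
    return {"ready": False, "reason": f"unknown mode {mode!r}"}
```

For example, with a `cpu` pool whose `maxReplicasPerWorkload` is 20, a request for `max: 21` resolves to an unready workload rather than a clamped one, matching the "leaves the workload unready" wording.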
## Worker | Field | Purpose | | --- | --- | | `image` | Worker image. | | `batchSize` | Work items per batch. | | `timeoutSeconds` | Worker call timeout. | | `podSpec` | Optional pod-level merge patch. | The operator creates one Deployment per Pipeline. ## Scaling ```yaml scaling: pool: cpu mode: autoscale replicas: min: 0 max: 8 ``` `spec.scaling.pool` must name a pool in `InfraRules/default`. `mode: autoscale` creates a KEDA `ScaledObject` backed by pipeline queue depth. `mode: fixed` pins the Deployment to `replicas.min`; `mode: disabled` scales it to zero. `spec.paused: true` also scales the worker to zero. ## Status The operator reports managed object references and readiness conditions. Queue counts and worker progress are served by the gateway pipeline status API. --- # Function CRD Source: https://hevlayer.com/docs/kubernetes/function-crd The `Function` CRD declares row-preserving compute over an [Index](/docs/kubernetes/index-crd). The operator creates worker resources; the gateway owns discovery, queueing, retries, leases, and writeback through the UDF API. ```yaml apiVersion: hevlayer.com/v1alpha1 kind: Function metadata: name: tag-products namespace: layer spec: targetNamespaces: - products inputs: - id - title output: attribute: tags kind: tags version: v1 filter: - "Or" - - ["tags_v", "NotEq", "v1"] - ["tags_v", "Eq", null] worker: image: ghcr.io/hev/tag-products:latest dispatch: pull batchSize: 32 timeoutSeconds: 30 schedule: discoveryIntervalSeconds: 300 leaseSeconds: 120 maxInFlightBatches: 8 maxConcurrentScans: 1 retry: maxAttempts: 8 initialBackoffSeconds: 5 maxBackoffSeconds: 300 triggers: - discovery scaling: pool: cpu mode: autoscale replicas: min: 0 max: 6 ``` ## Selection Use `targetNamespaces` for explicit namespaces. Use `indexSelector` when labels on `Index` resources should choose the namespaces. `filter` preserves arbitrary JSON, including array-form Turbopuffer filters. 
The operator stores the shape as-is; the gateway evaluates it during discovery. ## Worker | Field | Purpose | | --- | --- | | `image` | Worker image. | | `dispatch` | `pull` for SDK claim/poll workers, `push` for HTTP `/run` workers. | | `port` | Push-dispatch service port. | | `batchSize` | Rows per batch. | | `timeoutSeconds` | Worker call timeout. | | `podSpec` | Optional pod-level merge patch. | Pull dispatch creates a Deployment. Push dispatch also creates a Service and readiness probe. ## Scaling Function scaling is inline under `spec.scaling`. The operator emits a KEDA `ScaledObject` when `mode: autoscale`, using `layer_udf_queue_depth` for the trigger. The selected pool must exist in `InfraRules/default`. Replica maxima above the pool's `maxReplicasPerWorkload` are rejected in status. ## Output `output.kind: embedding` should set `output.dim` so consumers can validate vector shape. Outputs are patched onto the target row through the gateway. Deleting a Function garbage-collects operator-managed Kubernetes resources. It does not delete already-written attributes. --- # Introduction Source: https://hevlayer.com/docs/api/introduction Layer matches the Turbopuffer wire contract so existing clients keep working when you point them at the gateway. Where a route has an upstream equivalent, the site documents what Layer adds — not the upstream behavior itself. Follow the **Upstream docs** link on each page for the underlying request/response shape. ## Install The Python SDK is generated from `apps/layer-gateway/openapi.yaml` and ships the typed async client (`AsyncHevlayer`). ```sh pip install hevlayer ``` Requires Python 3.11+. The SDK reads connection info from environment variables: | Variable | Purpose | | --- | --- | | `LAYER_GATEWAY_URL` | Base URL of the gateway. | | `LAYER_GATEWAY_API_KEY` | API key sent on every request.
| | `TURBOPUFFER_API_KEY` | Optional direct fallback key for Turbopuffer-compatible SDK calls when the gateway is unreachable. | | `TURBOPUFFER_API_URL` | Optional direct fallback base URL; defaults to `https://aws-us-east-1.turbopuffer.com`. | Languages beyond Python are generated on demand through the SDK harness; reach out if you need one that isn't shipped yet. ## Client fall-through The Python SDK can fall through to Turbopuffer direct when the gateway is unreachable. The fallback is limited to calls that can be satisfied without Layer state: simple vector queries and raw Turbopuffer-compatible methods such as `write_namespace`, `query_turbopuffer_namespace`, and namespace schema/listing calls. It emits a client log warning and sets `LayerPerf.fallback` to `turbopuffer_direct` when `with_perf=True`. Fetches, warm jobs, pipelines, UDFs, `nearest_to_id` queries, and other Layer-only workflows still fail fast because they depend on gateway-owned cache, queue, history, or consistency state. Set `fallback_to_turbopuffer=False` on `AsyncHevlayer` to disable direct fallback. ## Enhancements to upstream routes Each of the routes below is wire-compatible with Turbopuffer. The body of each section describes only what Layer overlays on top. ### Write — `POST /v2/namespaces/{ns}` and `PATCH /v2/namespaces/{ns}` Upstream contract for upsert, delete, and `patch_rows`. - Best-effort NVMe cache mirror before the upstream write. - Server-stamped `_hevlayer_upserted_at` on every upsert and patch, which powers the consistency watermark on the query path. - `_hevlayer_*` attributes are reserved — writes to them are rejected. Page: [Write](/docs/api/write). ### Query — `POST /v2/namespaces/{ns}/query` Upstream contract for vector and FTS queries — request shape, ranking, filters, attribute selection. - Strong-consistent reads via an injected `_hevlayer_upserted_at <= watermark` predicate while the upstream index is `updating`. 
- One-shot 429 retry with the watermark filter forced on, for queries that race a write storm. - `stable_as_of` echoed on every response so callers can correlate freshness across reads. Page: [Query](/docs/api/query). ### Metadata — `GET /v2/namespaces/{ns}/metadata` Upstream contract for namespace metadata — schema, row count, index status, timestamps. - Proxied upstream verbatim, then enriched with a `layer` block containing `stable_as_of` and `is_stable`. Page: [Namespace metadata](/docs/api/namespace-metadata). ### Cache warm hint — `GET /v1/namespaces/{ns}/hint_cache_warm` Upstream contract for the cache warm hint. - Forwards the hint upstream, then runs Layer-side warm steps: a warm job to backfill the NVMe cache from origin, plus a mirror of the latest S3 snapshot body into NVMe. - Each step is independently toggleable per request. Page: [Warm cache](/docs/api/warm-cache). ## Cross-cutting conventions These apply to every endpoint Layer proxies, whether the route is upstream-compatible or Layer-only. - **Server-stamped `_hevlayer_upserted_at`.** Every upsert and patch is stamped with a server-side epoch-ms watermark. Caller-supplied values are silently overwritten. - **`_hevlayer_*` reserved.** Document attributes prefixed with `_hevlayer_` are reserved for the proxy layer. Writing to them is a validation error; reading them is fine when explicitly requested. - **Hard vs soft failures.** Turbopuffer write/query failures are hard failures and surface as 5xx. NVMe cache failures are soft and never block the response. - **`x-layer-cache` header.** Fetch responses include `hit`, `miss`, or `miss-on-error` so callers can distinguish a cold cache from an outage. - **Consistency hints.** Reads that go through the watermark path include `stable_as_of`; queries omit it only on a cold-start gateway that has not yet observed a stable poll. ## Compatibility posture Layer aims to be a drop-in for existing Turbopuffer clients. 
Routes that the upstream does not implement are namespaced under `/v2/` and do not shadow upstream behavior. If a Turbopuffer client sends a request to a route Layer doesn't proxy, the gateway returns 404 — it does not silently re-route to an upstream that might handle it differently. --- # Write & Stage Source: https://hevlayer.com/docs/api/write The write path is wire-compatible with the upstream `POST /v2/namespaces/{ns}` endpoint. The shape below shows what Layer adds — see the upstream docs for the full request schema. ## Upsert and delete ```http POST /v2/namespaces/products Content-Type: application/json { "upserts": [ { "id": "asin-B08N5WRWNW", "vector": [0.0012, -0.043], "attributes": {"title": "Wireless headphones", "category": "Electronics"} } ], "deletes": ["asin-old-001"] } ``` Status semantics: - 200 OK once the upstream Turbopuffer write succeeds. - 422 when both `upserts` and `deletes` are empty. - 502 when the upstream write/delete fails. NVMe cache writes happen *before* the upstream call as a best-effort side effect. They never block the response — they're how Layer keeps reads fast through the cache. Every upsert is server-stamped with a hidden `_hevlayer_upserted_at` attribute (epoch milliseconds). Any caller-supplied value is overwritten — this stamp powers the consistency watermark on the [query](/docs/api/query) path. ## Patch ```http PATCH /v2/namespaces/products Content-Type: application/json { "patches": [ {"id": "asin-B08N5WRWNW", "attributes": {"category": "Audio"}} ] } ``` Patch preserves unspecified attributes and maps to Turbopuffer `patch_rows`. Vectors cannot be patched — update a vector by reading the row and upserting the full document. `_hevlayer_upserted_at` is bumped on every patch, so reads through the watermark filter see the patched row only after it's indexed.
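The write and patch rules above (empty bodies, reserved `_hevlayer_*` attributes, no vector patches) are easy to check before a request leaves the client. The following is a minimal client-side sketch, assuming you want to catch these cases early; `preflight_write` and `preflight_patch` are hypothetical helpers and are not part of the SDK or the gateway.

```python
RESERVED_PREFIX = "_hevlayer_"


def _reserved_keys(row: dict) -> list[str]:
    """Attribute names on a row that fall in the reserved _hevlayer_ namespace."""
    return [k for k in row.get("attributes", {}) if k.startswith(RESERVED_PREFIX)]


def preflight_write(body: dict) -> list[str]:
    """Check a POST body (upserts/deletes) before sending; empty list means OK."""
    problems = []
    upserts = body.get("upserts") or []
    deletes = body.get("deletes") or []
    if not upserts and not deletes:
        problems.append("both upserts and deletes are empty (documented as 422)")
    for row in upserts:
        for key in _reserved_keys(row):
            problems.append(f"{row.get('id')}: reserved attribute {key}")
    return problems


def preflight_patch(body: dict) -> list[str]:
    """Check a PATCH body before sending; empty list means OK."""
    problems = []
    for row in body.get("patches") or []:
        if "vector" in row:
            problems.append(f"{row.get('id')}: vectors cannot be patched, upsert the full document")
        for key in _reserved_keys(row):
            problems.append(f"{row.get('id')}: reserved attribute {key}")
    return problems
```

Flagging reserved attributes on the client keeps the caller informed regardless of how the gateway treats a caller-supplied value.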
## Pipeline stage When a document is part of a pipeline, the writer doesn't talk to the namespace directly. The CPU worker hands chunks off to the pipeline, the GPU worker writes vectors back, and the gateway is the one calling the namespace upsert. ```http PUT /v2/pipelines/product-images/documents/asin-B08N5WRWNW Content-Type: application/json { "chunks": [ {"id": "asin-B08N5WRWNW-0", "text": "Wireless noise-cancelling headphones"}, {"id": "asin-B08N5WRWNW-1", "text": "40-hour battery life", "metadata": {"page": 2}} ] } ``` Staging stores chunks in the NVMe cache and marks the document `pending`. Re-staging the same document ID replaces the chunks and resets state to `pending`. The full pipeline API is documented under [Pipelines](/docs/pipelines). ## Side effects | Side effect | Behavior | | --- | --- | | NVMe cache mirror | Best-effort, written before the upstream call. A failure here doesn't roll back; the gateway can briefly cache a doc that didn't reach the upstream index. Re-sending the upsert resolves it. | | Snapshot watcher | Re-evaluates freshness on the next poll. Stable namespaces materialize a new snapshot if the histogram shape changed (see [Snapshots](/docs/api/snapshots)). | --- # Query & Fetch Source: https://hevlayer.com/docs/api/query Query is wire-compatible with the upstream `POST /v2/namespaces/{ns}/query` endpoint. The request schema (vector, filters, ranking, attribute selection) is documented upstream. The shape below is what Layer adds on top.
## Query request ```http POST /v2/namespaces/products/query Content-Type: application/json { "vector": [0.0012, -0.043], "top_k": 10, "filters": ["category", "Eq", "Electronics"], "include_attributes": ["title", "category"] } ``` ```json { "results": [ {"id": "asin-B08N5WRWNW", "dist": 0.42, "attributes": {"title": "..."}} ], "stable_as_of": 1715600400000 } ``` ## Strong-consistent reads Turbopuffer indexes upserts asynchronously, so a naive query right after an upsert can return partial results, or fail outright with a 429 under streaming-write pressure. Layer sidesteps both: 1. Queries run at `consistency=eventual` upstream, so they never block on indexing. 2. A background loop polls each registered namespace's `index.status` and records the latest status plus, when stable, a watermark equal to `poll_start - safety_margin`. 3. Per-query decision: - `Updating` → inject a hidden `_hevlayer_upserted_at <= watermark` predicate so the read never sees partially-indexed rows. - `Stable` or `Unknown` → run without the predicate. The upstream index is caught up (or no contrary evidence exists). 4. On a 429 to an unfiltered query, Layer retries once with the watermark filter forced on. Responses report `stable_as_of` (epoch ms), the most recent watermark the watcher has recorded. The field is omitted only on a cold-start gateway that has not yet observed a stable poll. ## Filter shape ``` ["category", "Eq", "Electronics"] # leaf ["And", [["category", "Eq", "Electronics"], ["price", "Lte", 200]]] # conjunction ["Or", [...]] # disjunction ``` Filter shape follows Turbopuffer array syntax. Layer combines the caller's filter with the watermark predicate using a 2-element `And` automatically — callers never see `_hevlayer_upserted_at` in their request or response. ## Tunables | Variable | Default | Purpose | | --- | --- | --- | | `CONSISTENCY_POLL_INTERVAL_MS` | 1000 | How often the watcher polls each namespace.
| | `CONSISTENCY_SAFETY_MARGIN_MS` | 500 | Cushion between poll time and watermark to cover in-flight upserts. | ## Explain query ```http POST /v2/namespaces/products/explain_query ``` `explain_query` is proxied to Turbopuffer verbatim — Layer adds nothing and applies no watermark filter. Use it to inspect upstream query planning; see the [upstream docs](https://turbopuffer.com/docs) for the request and response shape. ## Fetch Fetch is a Layer-only surface — there is no upstream equivalent. The NVMe cache is checked first; on miss or error the gateway falls through to Turbopuffer and backfills the cache best-effort. ### Single fetch ```http GET /v2/namespaces/products/documents/asin-B08N5WRWNW?include_attributes=title,category ``` | Outcome | Status | Header | | --- | --- | --- | | Cached hit | 200 | `x-layer-cache: hit` | | Cache miss, upstream hit, cache backfilled | 200 | `x-layer-cache: miss` | | Cache unavailable, upstream hit | 200 | `x-layer-cache: miss-on-error` | | Missing from both layers | 404 | — | ### Batch fetch ```http POST /v2/namespaces/products/documents Content-Type: application/json { "ids": ["asin-1", "asin-2", "asin-3"], "include_attributes": ["title"] } ``` ```json { "documents": [ {"id": "asin-1", "attributes": {"title": "..."}}, {"id": "asin-3", "attributes": {"title": "..."}} ], "missing": ["asin-2"] } ``` Batch fetch returns found documents and missing ids inline instead of a partial 404. `documents` preserves request order; ids the gateway could not find anywhere land in `missing`. 
### Behavior matrix | Cache state | Single fetch | Batch fetch | | --- | --- | --- | | Hit | cache | cache | | Miss, upstream present | upstream + backfill | upstream + backfill | | Miss, upstream absent | 404 | inline `missing` | | Cache unavailable | upstream, `miss-on-error` | upstream, `miss-on-error` | --- # Namespace metadata Source: https://hevlayer.com/docs/api/namespace-metadata The metadata payload is proxied verbatim from the upstream `/v2/namespaces/{ns}/metadata` endpoint. Schema, row counts, index status, and timestamps follow the upstream contract. Layer adds a single sub-object on top. ## Request ```http GET /v2/namespaces/products/metadata ``` ```jsonc { // Proxied from Turbopuffer verbatim "schema": { }, "approx_row_count": 12500, "approx_logical_bytes": 48800000, "created_at": "2026-03-15T10:30:45Z", "updated_at": "2026-05-12T18:49:00Z", "last_write_at": "2026-05-12T18:48:30Z", "index": { "status": "up-to-date" }, // Layer enhancement "layer": { "stable_as_of": 1715600400000, "is_stable": true } } ``` ## The `layer` block | Field | Meaning | | --- | --- | | `stable_as_of` | Epoch-ms watermark from the most recent stable poll. Null on cold start before the watcher has observed a stable namespace. | | `is_stable` | Whether the most recent poll observed `index.status == "up-to-date"`. False on cold start, true once the watcher catches up. | `is_stable` is the *current* signal — it drives the per-query filter-skip decision on the query path. `stable_as_of` is the *historical* watermark — the cut a filtered query would apply. For snapshot history derived from these freshness signals, see [Snapshots](/docs/api/snapshots). ## List namespaces `GET /v2/namespaces` is a Layer-only augmented listing. It pages the upstream namespace list and enriches each row with the same freshness and cache signals surfaced above. It is the surface the dashboard's inventory view reads.
```http GET /v2/namespaces?prefix=prod&page_size=100 ``` ```jsonc { "namespaces": [ { "name": "products", "row_count": 12500, "size_bytes": 48800000, "stable_as_of_ms": 1715600400000, "is_stable": true, "cache_state": {"state": "warm", "warm_inflight": false}, "last_write_ms": 1715600399000, "shadow": false, "labels": {} } ], "next_cursor": "..." } ``` | Query param | Purpose | | --- | --- | | `prefix` | Restrict to namespaces whose name starts with this string. | | `cursor` | Pagination cursor from a prior `next_cursor`. | | `page_size` | Page size; the upstream list page is capped at 1000. | A per-row metadata failure degrades to a row with `metadata_error` set rather than dropping the namespace, so the list stays complete even when a single namespace's metadata call fails. Responses are served from a short-TTL cache (`NAMESPACE_LIST_CACHE_TTL_MS`, default `10000`) so dashboard polling does not fan out a metadata call per namespace per refresh. --- # Scan Source: https://hevlayer.com/docs/api/scans Scans iterate a namespace by filter. `mode: ids` creates an asynchronous job and returns IDs through a results route. `mode: count` returns one number synchronously. ## Routes | Route | Method | Behavior | | --- | --- | --- | | `POST /v2/namespaces/{ns}/scans` | POST | Create an ID scan job or return a count. | | `GET /v2/namespaces/{ns}/scans` | GET | List ID scan jobs for the namespace. | | `GET /v2/namespaces/{ns}/scans/{id}` | GET | Read one ID scan job. | | `GET /v2/namespaces/{ns}/scans/{id}/results` | GET | Read completed scan IDs. | | `DELETE /v2/namespaces/{ns}/scans/{id}` | DELETE | Drop the in-memory scan job. | ## ID Mode ```http POST /v2/namespaces/products/scans Content-Type: application/json { "source": "auto", "mode": "ids", "filters": ["category", "Eq", "Electronics"], "page_size": 1000 } ``` `mode` defaults to `ids`. Valid ID-mode sources are `auto`, `cache`, and `origin`. 
The create response is `202 Accepted`: ```json { "id": "scan-uuid", "namespace": "products", "source": "auto", "effective_source": "origin", "status": "running", "progress": 0, "documents_scanned": 0, "created_at": "2026-05-26T10:00:00Z" } ``` Read IDs after `status` is `completed`: ```http GET /v2/namespaces/products/scans/scan-uuid/results?limit=1000&offset=0 ``` ```json { "ids": ["doc-1", "doc-2"], "total": 2 } ``` ## Count Mode ```http POST /v2/namespaces/products/scans Content-Type: application/json { "mode": "count", "source": "auto", "filters": ["category", "Eq", "Electronics"], "timeout_seconds": 30 } ``` ```json { "count": 4210, "served_by": "snapshot", "snapshot_sha": "3f9e8b21", "watermark_ms": 1747300000123, "elapsed_ms": 3 } ``` Count-mode sources are `auto`, `snapshot`, `cache`, and `origin`. Snapshot reads are eligible only for a single leaf `Eq` or `In` filter on a field present in the latest snapshot `fields[]`. `And`, `Or`, `Not`, range operators, fields absent from the snapshot, and skipped fields fall through under `auto` and fail with `412 precondition_failed` under `source: snapshot`. Live count responses include: ```json { "count": 4210, "served_by": "origin", "bounded": false, "timed_out": false, "shards_saturated": 0, "shards_total": 1, "elapsed_ms": 42 } ``` ## Auto-Mode Policy Auto ties cache freshness to the same consistency watermark used by strong-consistent queries. The gateway tracks per-namespace `cache_warmed_through`, the watermark observed at the end of the last successful origin warm. | Cache state | Watermark state | Action | | --- | --- | --- | | Empty | any | Run origin and stamp `cache_warmed_through`. | | Populated, `cache_warmed_through >= watermark` | observed | Serve cache. | | Populated, `cache_warmed_through < watermark` | observed | Serve cache and start a background origin warm. | | Populated, no `cache_warmed_through` yet | observed | Serve cache and start a background origin warm. 
| | Populated | not yet observed | Serve cache. | When cache is used, `_hevlayer_upserted_at <= cache_warmed_through` is added before the user filter so the scan is a stable warmed view. --- # Result Count Source: https://hevlayer.com/docs/api/result-count Result count answers "how many rows match this ranked query?" It is separate from [scan count](/docs/api/scans), which counts rows matching a filter. ```http POST /v2/namespaces/products/result-count Content-Type: application/json { "query": {"field": "title", "fts": "wireless headphones"}, "filters": ["category", "Eq", "Electronics"], "mode": "bounded", "timeout_seconds": 30 } ``` ```json { "count": 4210, "bounded": false, "timed_out": false, "shards_saturated": 0, "shards_total": 1, "elapsed_ms": 42 } ``` | Shape | Required fields | Notes | | --- | --- | --- | | FTS | `field`, `fts` | BM25 query against a BM25-indexed field. | | Vector | `vector`, `max_distance` | `max_distance` is required; without an upper bound every row matches. `field` defaults to `vector`. | | Mode | Behavior | | --- | --- | | `bounded` | One scatter/gather. Saturated shards contribute their `top_k` as a lower bound. | | `exhaustive` | Recurses through saturated shards until every page is short or the request deadline expires. | Every call carries a deadline, default 30s and server-side max 300s. On timeout the partial count is returned with `bounded: true` and `timed_out: true`. --- # Warm cache Source: https://hevlayer.com/docs/api/warm-cache Layer exposes two warm surfaces. `hint_cache_warm` is the Turbopuffer-compatible hint; `warm` is the Layer-only shortcut that creates a gateway warm job. `GET /v1/namespaces/{ns}/hint_cache_warm` matches Turbopuffer's warm-cache hint. The upstream call advises the index to pre-load. Layer additionally runs cache-warm steps on the gateway side.
## Hint-cache warm ```http GET /v1/namespaces/products/hint_cache_warm ``` Layer-side steps (all default-on): | Step | What it does | | --- | --- | | `turbopuffer=true` | Forwards the warm hint upstream. | | `documents=true` | Starts an origin warm job to backfill the NVMe cache. | | `snapshots=true` | Mirrors the latest S3 snapshot body into NVMe. | Disable steps independently: ```http GET /v1/namespaces/products/hint_cache_warm?turbopuffer=false&documents=false&snapshots=true ``` The response reports per-step status. If `documents` is enabled, the response includes a warm job; poll it through `/warm-jobs/{id}`. ## Layer warm `POST /v2/namespaces/{ns}/warm` creates an asynchronous job that pages through Turbopuffer, backfills Aerospike, and refreshes `cache_warmed_through`. Use it when bootstrapping a namespace whose data was written outside the gateway. ```http POST /v2/namespaces/products/warm?page_size=1000 ``` The response is `202 Accepted` with the warm job: ```json { "id": "warm-job-uuid", "namespace": "products", "status": "running", "progress": 0, "documents_scanned": 0, "created_at": "2026-05-26T10:00:00Z" } ``` Poll it through: ```http GET /v2/namespaces/products/warm-jobs/warm-job-uuid ``` ## Cache-cold behavior Warm jobs, cache scans, cache snapshot jobs, and pipeline chunk reads return 503 `cache_cold` when the NVMe cache is unavailable. Fetch and fetch-many fall through to Turbopuffer with `x-layer-cache: miss-on-error` instead. The split is deliberate. Fetch is correctness-first: a cache outage must not turn into a missing document. Warm is throughput-first: warming on a cold cache would be wasted work, so the gateway surfaces the cold state to the caller rather than silently no-op-ing. --- # Snapshot History Source: https://hevlayer.com/docs/api/snapshots Snapshots are materialized facet histograms for a namespace. 
They carry facet listings in `values[].v` and facet counts in `values[].n`, stored durably in S3 and mirrored into Aerospike for the latest body. Use `POST /snapshots` to materialize a field now. Use history and body routes to read the durable chronology written by the consistency watcher. ## Configure watched fields The consistency watcher only materializes snapshots for facet fields it has been told to watch. Configure them with the `LAYER_FACET_FIELDS` environment variable — a JSON object mapping each namespace to its facet fields (Helm chart: `gateway.facetFields`): ```sh export LAYER_FACET_FIELDS='{ "products": ["category", "brand"], "reviews": ["sentiment", "language"] }' ``` The default is empty, which disables the snapshot writer: with no watched fields, `source: stored` and eligible `source: auto` jobs have nothing to read, and the history and activity feeds stay empty. Namespaces discovered through `GET /v2/namespaces` are registered with the watcher automatically, but only the fields listed here are materialized. `LAYER_SNAPSHOT_MIN_INTERVAL_MS` (default `300000`) sets the floor between writes for a namespace. ## Routes | Route | Method | Behavior | | --- | --- | --- | | `POST /v2/namespaces/{ns}/snapshots` | POST | Create an on-demand snapshot job for one field. | | `GET /v2/namespaces/{ns}/snapshot-jobs` | GET | List in-memory snapshot jobs. | | `GET /v2/namespaces/{ns}/snapshot-jobs/{id}` | GET | Read one snapshot job. | | `GET /v2/namespaces/{ns}/history` | GET | Newest-first durable snapshot history. | | `GET /v2/namespaces/{ns}/snapshots/{sha}` | GET | Full snapshot body by full SHA or 7-char prefix. | | `GET /v2/activity/snapshots` | GET | Cross-namespace snapshot-write activity stream. | ## Create a snapshot job ```http POST /v2/namespaces/products/snapshots Content-Type: application/json { "field": "category", "source": "auto", "filters": ["brand", "Eq", "Acme"], "page_size": 1000 } ``` Valid sources are `auto`, `stored`, `cache`, and `origin`. 
| Source | Reads from | Notes | | --- | --- | --- | | `auto` | Stored snapshot when possible, otherwise cache/origin policy | Default. Stored snapshots only support unfiltered configured fields. | | `stored` | Latest S3 snapshot body, with Aerospike mirror as a cache | Fastest path for configured facet fields. | | `cache` | Aerospike document cache | Supports filters the cache can evaluate. | | `origin` | Turbopuffer paginated scan | Authoritative. Persists the computed snapshot body to S3. | The response is `202 Accepted`: ```json { "id": "snapshot-job-uuid", "namespace": "products", "field": "category", "source": "auto", "status": "running", "progress": 0, "documents_scanned": 0, "created_at": "2026-05-26T10:00:00Z" } ``` Poll the job: ```http GET /v2/namespaces/products/snapshot-jobs/snapshot-job-uuid ``` Completed jobs include `sha` when a body was materialized: ```json { "id": "snapshot-job-uuid", "namespace": "products", "field": "category", "source": "origin", "status": "completed", "documents_scanned": 12844, "sha": "3f9e8b21", "stable_as_of": 1747300000123 } ``` ## History ```http GET /v2/namespaces/products/history?limit=20 ``` ```json [ {"watermark_ms": 1747300000123, "sha": "3f9e8b21..."}, {"watermark_ms": 1747299600045, "sha": "a1c5b09f..."} ] ``` | Query param | Default | Purpose | | --- | --- | --- | | `limit` | 50 | Maximum entries returned. Capped at 500. | | `before` | none | Return entries older than this SHA. 7-char prefixes are accepted. | The history endpoint lists S3 keys only; it does not read every snapshot body. ## Snapshot body ```http GET /v2/namespaces/products/snapshots/3f9e8b2 ``` ```json { "namespace": "products", "watermark_ms": 1747300000123, "sha": "3f9e8b21", "fields": [ { "name": "category", "values": [ {"v": "books", "n": 1240}, {"v": "electronics", "n": 873} ] } ], "fields_skipped": [ { "name": "tags", "reason": "exceeded_cap", "distinct_observed": 247000, "cap": 10000 } ] } ``` `fields[].values[].v` is the facet listing. 
`fields[].values[].n` is the facet count. Fields present in `fields[]` are complete. Fields above the 10,000 distinct-value cap are listed in `fields_skipped[]` instead of being partially materialized. ## Activity ```http GET /v2/activity/snapshots?since=1747200000000&limit=50 ``` | Query param | Required | Purpose | | --- | --- | --- | | `since` | yes | Epoch-ms lower bound on `ts_ms`. | | `limit` | no | Cap 500, default 50. | | `namespace` | no | Exact namespace filter. | | `cursor` | no | Pagination cursor from `next_cursor`. | Activity is snapshot lifecycle only. Search history and clickstream events have separate feeds. --- # Query History Source: https://hevlayer.com/docs/api/search-history Layer logs every query the gateway serves into a durable JSONL trail in S3, mirrored into the NVMe cache for fast recent reads. Fetch events that downstream consumers tag back to a query land in a sibling clickstream feed. Together they make a search session reconstructable after the fact — for relevance tuning, A/B comparison, or incident review. Both surfaces are Layer-only. ## Routes | Route | Behavior | | --- | --- | | `GET /v2/namespaces/{ns}/search-history` | Per-namespace query log, newest first. | | `GET /v2/namespaces/{ns}/clickstream` | Fetch events correlated to a search, newest first. | The `/v1/` versions of both routes are identical aliases held for client compatibility. 
## Search history entry ```json { "entries": [ { "timestamp": "2026-05-22T08:00:00.000Z", "timestamp_nanos": 1747900800000000000, "namespace": "products", "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6", "raw_query": "wireless headphones", "stable_as_of": 1747900700000, "query": {"vector": "[…]", "top_k": 10, "filters": "[…]"}, "top_result_ids": ["asin-B08N5WRWNW", "asin-B07PXGQC1Q"], "tags": ["app:hev-shop", "route:search", "surface:storefront"] } ], "next_cursor": "1747900799000000000" } ``` | Field | Meaning | | --- | --- | | `timestamp` / `timestamp_nanos` | Wall-clock and nanosecond timestamps. `timestamp_nanos` is the pagination cursor. | | `trace_id` | Trace context propagated or generated for the query. Joins to the clickstream feed. | | `raw_query` | Caller-supplied query string from the `x-hevlayer-search-query` header (e.g. the BM25 input). Omitted when the header is absent. | | `stable_as_of` | Epoch-ms namespace watermark used by the served response. Omitted on cold-start gateways before the namespace has a watermark. | | `query` | Structured query summary — vector shape, filters, ranking. | | `top_result_ids` | IDs from the served response, in rank order. | | `tags` | Caller-supplied labels propagated through request headers. Used for ad-hoc segmentation. | ### Writing metadata Set `x-hevlayer-search-query` on query requests to capture the human input, and set `x-hevlayer-tags` to a comma-separated list of segmentation tags. The Python SDK exposes these as `raw_query` and `tags`: ```python query = await client.query_namespace( "products", {"vector": embedding, "top_k": 10, "include_attributes": ["title"]}, raw_query="wireless headphones", tags=["app:hev-shop", "surface:storefront", "route:search", "page:first"], ) history = await client.list_search_history( "products", tags=["app:hev-shop", "route:search", "page:first"], limit=20, ) ``` Keep the query text in `raw_query`; use tags for segmentation, not for duplicating the query string. 
### Tag contract

Layer splits `x-hevlayer-tags` and `?tag=` on commas, trims whitespace, drops empty values, then sorts and dedupes tags before storing or matching them. Commas are separators and cannot be escaped.

Limits:

| Limit | Value |
| --- | --- |
| Max tags | 32 unique tags per request or filter |
| Max tag length | 128 bytes |
| Allowed characters | ASCII letters, digits, `:`, `_`, `-`, `.`, `/`, `=`, `+` |

The list filter uses AND semantics: `?tag=a,b` returns only entries that carry both `a` and `b`.

### Query parameters

| Param | Purpose |
| --- | --- |
| `tag` | Comma-separated tag filter. AND semantics — every tag must match. |
| `from` / `to` | RFC3339 time bounds. |
| `before` | Pagination cursor; return entries strictly older than the given `timestamp_nanos`. |
| `limit` | Cap 500, default 50. |

## Clickstream entry

```json
{
  "events": [
    {
      "timestamp": "2025-05-22T08:00:02.143Z",
      "timestamp_nanos": 1747900802143000000,
      "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
      "namespace": "products",
      "doc_id": "asin-B08N5WRWNW",
      "tags": ["session:abc123"],
      "source": "fetch",
      "served_from": "cache"
    }
  ],
  "next_cursor": "1747900802142000000"
}
```

`trace_id` joins to the search-history entry that produced the result; `served_from` distinguishes a cache hit from an upstream fetch. `trace_id` is also a supported query parameter so you can pull every event for a single search session.

## Storage

```text
search-history/{namespace}/{YYYY-MM-DD}/{timestamp_nanos}.jsonl
```

Writes are best-effort and never block the query response. Aerospike holds a recent window for fast reads; S3 is the durable store. A cache outage degrades read latency but not durability — list calls walk the S3 prefix and merge inline.
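
The tag contract above (split on commas, trim, drop empties, sort, dedupe, then enforce the limits) can be sketched in a few lines of Python. This is an illustration of the documented rules, not the gateway's implementation: `normalize_tags` and `matches` are hypothetical names, and raising `ValueError` on a limit violation is an assumption, since this page does not say how the gateway responds to invalid tags.

```python
import re

MAX_TAGS = 32        # unique tags per request or filter
MAX_TAG_BYTES = 128  # per-tag length limit, in bytes
# ASCII letters, digits, and the punctuation listed in the limits table.
_ALLOWED = re.compile(r"[A-Za-z0-9:_\-./=+]+")


def normalize_tags(raw: str) -> list[str]:
    """Split on commas, trim whitespace, drop empties, then sort and dedupe."""
    tags = sorted({part.strip() for part in raw.split(",") if part.strip()})
    if len(tags) > MAX_TAGS:
        raise ValueError(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for tag in tags:
        if len(tag.encode("utf-8")) > MAX_TAG_BYTES:
            raise ValueError(f"tag longer than {MAX_TAG_BYTES} bytes")
        if not _ALLOWED.fullmatch(tag):
            # Assumption: invalid tags are rejected rather than silently dropped.
            raise ValueError(f"invalid characters in tag: {tag!r}")
    return tags


def matches(entry_tags: list[str], tag_filter: str) -> bool:
    """AND semantics: every tag in the filter must be present on the entry."""
    return set(normalize_tags(tag_filter)) <= set(entry_tags)
```

For example, `normalize_tags(" route:search, app:hev-shop,,app:hev-shop ")` returns `["app:hev-shop", "route:search"]`, which is why the order of tags in a request does not affect matching.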
--- # Metrics API Source: https://hevlayer.com/docs/api/metrics The gateway exposes a Prometheus-shaped metrics surface on its own endpoint, plus passthrough routes to the bundled VictoriaMetrics (`vmsingle`) instance so callers can run PromQL without a separate scraper. A self-describing catalog of every metric the gateway emits backs both the dashboard's observe tab and external automation. Per-metric definitions, label conventions, and example PromQL live in the [metrics catalog](#metrics-catalog) below. ## Routes | Route | Behavior | | --- | --- | | `GET /metrics` | Prometheus exposition from the gateway. | | `GET /health` | Liveness, NVMe cache connection state, and per-namespace cache state. | | `GET\|POST /v2/metrics/api/v1/query` | Proxy Prometheus instant query. | | `GET\|POST /v2/metrics/query` | Short-form instant query proxy. | | `GET\|POST /v2/metrics/api/v1/query_range` | Proxy Prometheus range query. | | `GET\|POST /v2/metrics/query_range` | Short-form range query proxy. | | `GET /v2/metrics/catalog` | List every metric the gateway emits. | | `GET /v2/metrics/catalog/{name}` | Fetch one catalog entry, including labels and example PromQL. | ## Health ```http GET /health ``` ```json { "status": "ok", "version": "0.1.0", "aerospike": { "connected": true, "generation": 3 }, "cache_state": [ {"namespace": "products", "state": "warm", "warmed_through": 1747300000123, "warm_inflight": false} ] } ``` Health always responds `200` while the process is up. The dashboard's at-a-glance cards read it for degradation signals: `aerospike.connected` and each `cache_state[].state` tell you whether the gateway is running on a cold or disconnected NVMe cache — typically just after a restart. ## Metrics catalog The catalog is the operator-facing manifest of every metric the gateway emits. Each entry carries name, kind (histogram / counter / gauge), family, labels, description, example PromQL, and (when applicable) the alert shape it backs. 
```http
GET /v2/metrics/catalog
```

```jsonc
{
  "version": "1",
  "entries": [
    {
      "name": "layer_query_duration_seconds",
      "kind": "histogram",
      "family": "query",
      "labels": ["pipeline_id", "namespace", "status"],
      "description": "Total wall-clock for a query through layer.",
      "example_promql": "histogram_quantile(0.99, sum by (le) (rate(layer_query_duration_seconds_bucket[5m])))",
      "alert": {
        "summary": "Query p99 above target",
        "expr": "histogram_quantile(0.99, ...) > 0.5",
        "for": "10m"
      }
    }
  ]
}
```

`version` bumps when the JSON shape changes incompatibly. The dashboard's observe tab reads the catalog and groups entries by family so operators don't have to memorize which prefix lives where. The same content is also exportable from the repo via `cargo run -p metrics-catalog --bin export`.

## PromQL passthrough

The `/v2/metrics/api/v1/query` and `/v2/metrics/api/v1/query_range` routes are thin passthroughs to VictoriaMetrics. Response bodies match Prometheus's HTTP API shape one-for-one. The short-form aliases `/v2/metrics/query` and `/v2/metrics/query_range` exist to make terminal use ergonomic:

```sh
curl -sG "$LAYER_GATEWAY_URL/v2/metrics/query" \
  --data-urlencode 'query=sum(layer_pipeline_stage_count{stage="pending"})'
```

The gateway does not rewrite queries. Auth happens at the gateway edge; the upstream VictoriaMetrics instance is never customer-reachable.

---

# Dashboard

Source: https://hevlayer.com/docs/dashboard

The Layer dashboard is the operator surface that ships in-cluster alongside the gateway. It reads from the same gateway API customers do — no direct database, Aerospike, or VictoriaMetrics access — and surfaces the views that justify Layer's role as the operating layer between an application and its vector store.

Deployments on EKS reach the dashboard at `https://dashboard.hevlayer.com`. Self-hosted installs expose it via the `layer-dashboard` Service.
## Layout The dashboard groups everything operators care about into six tabs: | Tab | What it answers | | --- | --- | | **console** | What is happening right now? At-a-glance gauges + activity log. | | **data** | What is in the indexes? Namespace inventory, snapshot history, schema. | | **read** | Are queries healthy? Query latency, p99 overhead, Aerospike pool. | | **write** | Are writes flowing? Pipelines, embed pools, claim/heartbeat state. | | **cost** | Where is spend going? AWS + Turbopuffer cost lines stacked over time. | | **observe** | Catalog of every metric the gateway exports, grouped by family. | ## Console The first view a new operator opens. Two stripes: - **At a glance** — single-number cards for queries/s, indexed rows/s, fetch p99, cache hit ratio, error budget burn. Each card links into the matching read / write / observe panel. - **Activity log** — newest-first stream backed by `/v2/activity/snapshots` and the search-history endpoints. Filters are persisted in the URL so links survive a refresh. ## Data The inventory view. Click a namespace to drill into: - Schema and approximate row count proxied from Turbopuffer metadata. - Recent snapshot SHAs with field histograms and skipped-field markers — see [snapshots](/docs/api/snapshots). - The current freshness signals (`stable_as_of`, `is_stable`). - The Index policy fields that govern the namespace — `distanceMetric` and the `cache.warming.threads` cap — read from the `Index` resource. - A unified **jobs** panel covering snapshot, warm, and scan jobs (kind, id, status, progress, age) for the namespace. Two operator actions live here: - **Trigger snapshot** — materialize a snapshot for one field on demand (`POST /v2/namespaces/{ns}/snapshots`), picking the source (`origin`, `auto`, `stored`, `cache`). - **Delete namespace** — `DELETE /v2/namespaces/{ns}`, behind a confirm dialog. This is where operators answer "did the last cutover land?" and "what shape is this namespace?" 
without leaving the dashboard. ## Read Operator answer to "are queries healthy?". Pulls from `layer_query_*` histograms and the cache metrics families: - Query latency p50/p95/p99 over the window. - Layer-side overhead (`query_overhead_seconds`) so the operator can see whether slowness is upstream or local. - Cache hit ratio per namespace, computed from `layer_cache_lookups_total`. - Aerospike pool depth and node state — visible silent-failure surface. - Aerospike stop-writes, surfaced from `layer_aerospike_op_duration_seconds{status="aerospike_stop_writes"}`. ## Write The pipeline operator view. Surfaces pending / in-flight / failed counts per pipeline and per UDF, the same numbers KEDA scales from. Click into a pipeline to see: - Per-stage counts (`pending`, `embedding`, `indexed`, `failed`). - Active claims with `worker_id`, lease expiry, heartbeat age. - Embed pool size and the autoscaling rule attached. - Reset / pause / resume controls for UDFs (mirrors of the `/v2/udfs/{id}/{pause,resume,reset-failed}` endpoints). The **infra** sub-view leads with the **compute pools** defined in `InfraRules/default` — the logical pools (name, kind, GPU type, `maxReplicasPerWorkload`, selector/toleration summary) that pipelines and UDFs select via `spec.scaling.pool` — above the Karpenter NodePools that physically provision their nodes. The write view is the first dashboard stop for PostgreSQL pressure. A growing `pending` count with rising `layer_pg_query_duration_seconds{status="pg_error"}` means the queue is stalled at the indexing-state layer, not at Turbopuffer. Use the [failure-mode runbook](/docs/failure-modes) before resizing or deleting any queue state. ## Cost Stacked-area chart driven by `/v2/cost`, `/v2/cost/timeseries`, and `/v2/cost/rate-card`. 
Splits cost across AWS infrastructure lines (compute, EBS, S3, NAT, ALB) computed from CloudWatch + AWS Pricing API and Turbopuffer lines (storage, writes, queries) computed from usage metrics × a code-resident rate card. The instance picker uses the rate-card endpoint to project the impact of changing instance types before applying it. Per-namespace attribution is intentionally not modeled — this view is infra-level only. ## Observe The full metrics catalog, grouped by family (Turbopuffer ops, cache, fetch, pipeline progress, resource saturation). Each metric expands into a sparkline that runs the corresponding PromQL through `/v2/metrics/api/v1/query_range`. This is the surface operators use when they need to confirm a hypothesis about behavior without leaving the dashboard for Grafana. ## Operational notes Pipeline status is cached in-memory in the gateway to protect PostgreSQL during repeated dashboard or KEDA polling. `PIPELINE_STATUS_CACHE_TTL_MS` defaults to 15000. - Dashboard views should treat cache cold and upstream failures as separate operator states. A 503 `cache_cold` is recoverable on its own; a 502 from Turbopuffer is not. - Customer workloads never receive the dashboard URL — only the gateway base URL and credentials. - The dashboard is intentionally read-mostly. Mutating actions (UDF pause, InfraRules or scaling edits) are gated through CRD apply or explicit confirm dialogs rather than inline controls. --- # Scans Source: https://hevlayer.com/docs/scans Scans answer ad hoc filter questions about a namespace. ID mode creates an asynchronous job that returns matching document IDs. Count mode returns one number synchronously and uses the latest snapshot when the filter is covered. Use scans for bulk exports, manual inspection, UDF discovery debugging, cache/origin consistency checks, or exact row counts for a filter. 
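
Every scan request on this page shares one body shape: `mode`, `source`, and `filters`. Here is a minimal sketch of a client-side helper that assembles that body and rejects the combinations the Sources table on this page rules out, before the request is sent. `build_scan_request` and `snapshot_eligible` are hypothetical names, not SDK functions, and the gateway remains the authority on validation (it returns `412 precondition_failed` for unsupported snapshot filters).

```python
from typing import Any

MODES = {"ids", "count"}
SOURCES = {"auto", "snapshot", "cache", "origin"}


def snapshot_eligible(filters: list[Any]) -> bool:
    """True for a single-field `Eq` or `In` tuple, e.g. ["category", "Eq", "Electronics"]."""
    return (
        len(filters) == 3
        and isinstance(filters[0], str)
        and filters[1] in ("Eq", "In")
    )


def build_scan_request(mode: str, filters: list[Any], source: str = "auto") -> dict[str, Any]:
    """Assemble the JSON body for POST /v2/namespaces/{ns}/scans."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if source == "snapshot":
        if mode == "ids":
            raise ValueError("source 'snapshot' is not supported in ID mode")
        if not snapshot_eligible(filters):
            raise ValueError("source 'snapshot' requires a single-field Eq or In filter")
    return {"mode": mode, "source": source, "filters": filters}
```

Checking locally is only a convenience: it avoids a round trip for requests that the documented rules already exclude.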
## ID scans ```sh curl -X POST http://gateway:8080/v2/namespaces/products/scans \ -H 'content-type: application/json' \ -d '{"mode": "ids", "source": "auto", "filters": ["category", "Eq", "Electronics"]}' ``` The create call returns `202 Accepted` with a job: ```json { "id": "scan-uuid", "namespace": "products", "source": "auto", "status": "running", "progress": 0, "documents_scanned": 0, "created_at": "2026-05-26T10:00:00Z" } ``` Poll the job, then read results: ```sh curl http://gateway:8080/v2/namespaces/products/scans/scan-uuid curl 'http://gateway:8080/v2/namespaces/products/scans/scan-uuid/results?limit=1000' ``` ## Count scans ```sh curl -X POST http://gateway:8080/v2/namespaces/products/scans \ -H 'content-type: application/json' \ -d '{"mode": "count", "source": "auto", "filters": ["category", "Eq", "Electronics"]}' ``` ```json { "count": 4210, "served_by": "snapshot", "snapshot_sha": "3f9e8b21", "watermark_ms": 1747300000123, "elapsed_ms": 3 } ``` `source: auto` checks the latest snapshot first for single-field `Eq` and `In` filters. If the field is fully present in the snapshot, the response is served by `snapshot`. Otherwise auto falls through to cache or origin. Use `source: snapshot` to require the snapshot path; unsupported filters return `412 precondition_failed`. ## Sources | Source | ID mode | Count mode | | --- | --- | --- | | `auto` | Cache when fresh enough, otherwise origin | Snapshot first, then cache/origin. | | `snapshot` | Not supported | Latest snapshot only; requires eligible `Eq` or `In`. | | `cache` | Aerospike document cache only | Aerospike document cache only. | | `origin` | Turbopuffer paginated scan | Turbopuffer paginated scan. | When `auto` resolves to cache, the gateway applies `_hevlayer_upserted_at <= cache_warmed_through` before the user filter. This makes the scan a stable warmed view instead of a mixed view of old and new rows. ## Filters Scans accept the same Turbopuffer filter array as [query](/docs/api/query). 
On origin scans, the filter is pushed to Turbopuffer. On cache scans, the gateway evaluates it against cached document attributes. Supported cache operators are `Eq`, `NotEq`, `Gt`, `Gte`, `Lt`, `Lte`, `In`, `NotIn`, `And`, `Or`, and `Not`.

If `auto` sees a filter the cache cannot evaluate, it uses origin. Explicit `source: cache` with an unsupported filter fails rather than returning partial results.

## Operational notes

- ID scan state is in-memory and ephemeral; it resets on gateway restart.
- Count scans have a deadline, default 30s and maximum 300s.
- Snapshot-served count scans are exact at the snapshot `watermark_ms`.
- Live count scans include `bounded`, `timed_out`, and shard fields.

---

# Pipelines

Source: https://hevlayer.com/docs/pipelines

A pipeline indexes documents through staged work whose row count changes. The common shape is **extract** (CPU) and **embed** (GPU). The gateway tracks document state in PostgreSQL and exports queue depth so the operator can autoscale workers through KEDA.

Once vectors land in Turbopuffer, query and fetch them through the namespace API — see [Query & Fetch](/docs/api/query).

## Pipeline flow

```text
CPU worker          Gateway                    GPU worker
  |                 POST /v2/pipelines              |
  |---- chunks -->  PUT /documents/{doc_id}         |
  |                 chunks -> S3 + NVMe cache       |
  |                 state -> PostgreSQL             |
  |                                                 |
  |                 GET /status <------ KEDA        |
  |                                                 |
  |                 POST /claim <-------------------|
  |                 GET /chunks <-------------------|
  |                 PUT /vectors <------------------|
  |                 vectors -> Turbopuffer          |
```

**CPU worker** — reads source data, extracts text/metadata, splits into chunks, calls the stage endpoint. Scales on input queue (e.g. SQS depth, Kafka lag).

**GPU worker** — polls the pipeline status endpoint for `pending_count > 0`, fetches chunks from the gateway, runs the embedding model, calls the vectors endpoint. Scales on `pending_count` via KEDA.
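
The GPU worker's part of the flow (check `pending_count`, read chunks, embed, write vectors) can be sketched as two small functions. This is a sketch under stated assumptions, not the reference worker: `Transport` is a hypothetical callable standing in for whatever HTTP client the worker uses, `has_pending` and `embed_document` are illustrative names, and the shape of the chunks response is assumed to mirror the stage request body because this page does not document it. Claiming, which is how a worker learns which document IDs to process, is omitted here.

```python
from typing import Any, Callable

# Hypothetical transport: (method, path, json_body) -> decoded JSON response.
Transport = Callable[[str, str, Any], Any]
Embedder = Callable[[str], list[float]]


def has_pending(http: Transport, pipeline_id: str) -> bool:
    """Poll the status route; `pending_count` is the same field KEDA scales on."""
    status = http("GET", f"/v2/pipelines/{pipeline_id}/status", None)
    return status["pending_count"] > 0


def embed_document(http: Transport, pipeline_id: str, doc_id: str, embed: Embedder) -> int:
    """Fetch one document's chunks, embed them, and write the vectors back."""
    base = f"/v2/pipelines/{pipeline_id}/documents/{doc_id}"
    # ASSUMPTION: the chunks response looks like {"chunks": [{"id": ..., "text": ...}]}.
    chunks = http("GET", f"{base}/chunks", None)["chunks"]
    vectors = [
        {"id": c["id"], "vector": embed(c["text"]), "attributes": {"text": c["text"]}}
        for c in chunks
    ]
    # Writing vectors upserts to Turbopuffer and marks the document indexed.
    http("PUT", f"{base}/vectors", {"vectors": vectors})
    return len(vectors)
```

Passing the transport in as a callable keeps the worker logic testable without a running gateway; a real worker would wrap an HTTP client that adds the gateway base URL and credentials.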
The gateway handles chunk storage (S3 backing plus embedded Aerospike cache), vector upsert (Turbopuffer), and state tracking (embedded PostgreSQL). Workers are stateless and never connect to gateway-internal stores. ## Prerequisites Pipeline routes are registered only when `DATABASE_URL` is configured. The Helm chart sets `DATABASE_URL` to the gateway pod's loopback PostgreSQL sidecar. The migration runs automatically on startup. ```bash export DATABASE_URL=postgres://hevlayer:hevlayer@localhost:5432/hevlayer ``` ## Pipeline CRD Declare a Pipeline when the operator should own the worker Deployment and KEDA object. See [Pipeline CRD](/docs/kubernetes/pipeline-crd) for the full resource reference. ```yaml apiVersion: hevlayer.com/v1alpha1 kind: Pipeline metadata: name: product-images namespace: layer spec: target: namespace: products worker: image: ghcr.io/hev/product-image-worker:latest batchSize: 64 timeoutSeconds: 60 scaling: pool: cpu mode: autoscale replicas: min: 0 max: 8 ``` `spec.scaling.pool` must name a compute pool in `InfraRules/default`. `mode: fixed` pins replicas to `replicas.min`; `mode: disabled` and `spec.paused: true` scale the worker to 0. ## Gateway API ### Create a pipeline ```bash curl -X POST http://gateway:8080/v2/pipelines \ -H 'content-type: application/json' \ -d '{ "id": "product-images", "target_namespace": "products", "distance_metric": "cosine_distance" }' ``` `distance_metric` defaults to `cosine_distance`. Returns 409 if the pipeline already exists. ### Stage a document (CPU worker) ```bash curl -X PUT http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW \ -H 'content-type: application/json' \ -d '{ "chunks": [ {"id": "asin-B08N5WRWNW-0", "text": "Wireless noise-cancelling headphones"}, {"id": "asin-B08N5WRWNW-1", "text": "40-hour battery life", "metadata": {"page": 2}} ] }' ``` Each chunk is stored durably in S3 and cached in Aerospike (set: `pipe_{target_namespace}`). The document is marked `pending`. 
Re-staging the same document ID replaces the previous chunk backing and resets it to `pending`.

### Get pipeline status (KEDA polling)

```bash
curl http://gateway:8080/v2/pipelines/product-images/status
```

```json
{
  "pipeline_id": "product-images",
  "counts": {"pending": 142, "indexed": 8530},
  "pending_count": 142
}
```

`pending_count` is the field KEDA watches. When it hits zero, GPU workers scale to zero.

### Read chunks and write vectors (GPU worker)

```bash
curl http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW/chunks
```

After embedding, write vectors back. This upserts to Turbopuffer and marks the document `indexed`:

```bash
curl -X PUT http://gateway:8080/v2/pipelines/product-images/documents/asin-B08N5WRWNW/vectors \
  -H 'content-type: application/json' \
  -d '{
    "vectors": [
      {"id": "asin-B08N5WRWNW-0", "vector": [0.0012, -0.043], "attributes": {"text": "..."}}
    ]
  }'
```

### Claim, heartbeat, stage

Workers claim staged documents through Layer instead of mutating PostgreSQL directly. Layer sets `claimed_by` and `claimed_at`, moves rows to the requested claim stage, recovers stale claims older than the lease, and uses `FOR UPDATE SKIP LOCKED` so multiple workers can claim concurrently.

```http
POST /v2/pipelines/product-images/claim
Content-Type: application/json

{
  "stage": "pending",
  "claim_stage": "embedding",
  "limit": 2000,
  "worker_id": "gpu-worker-0",
  "lease_seconds": 900
}
```

Heartbeat long-running claims:

```http
POST /v2/pipelines/product-images/documents/heartbeat
Content-Type: application/json

{
  "document_ids": ["B07XYZ123"],
  "stage": "embedding",
  "worker_id": "gpu-worker-0"
}
```

Move claimed documents to a final stage:

```http
POST /v2/pipelines/product-images/documents/stage
Content-Type: application/json

{
  "document_ids": ["B07XYZ123"],
  "stage": "indexed",
  "from_stage": "embedding",
  "worker_id": "gpu-worker-0"
}
```

Use `stage: "pending"` for release and `stage: "failed"` for permanent failures.
Use `create_missing: true` without `from_stage`/`worker_id` when a pipeline enqueues lightweight document IDs without chunks (e.g. aggregate refresh jobs). Pipeline queues are segmented. Layer writes document IDs and chunk ID lists into compressed S3 manifests and stores only segment leases and counters in PostgreSQL, so queues scale by segment count rather than by one PostgreSQL row per document. Set `PIPELINE_SEGMENT_SIZE` to tune the number of logical documents per segment. The Helm default segment size is 10,000, so 1,000,000 lightweight IDs become about 100 PostgreSQL segment rows. Segment manifests are queue state, not durable history. Layer deletes superseded manifests after segment splits, deletes completed manifests when documents move to `indexed`, and removes the pipeline segment prefix when the pipeline is deleted. ## Document lifecycle ``` stage_document() write_vectors() (new doc) ──────────────────► pending ──────────────────► indexed ▲ │ re-stage (idempotent) ``` - **pending** — chunks stored in Aerospike, waiting for embedding. - **indexed** — vectors written to Turbopuffer. Re-staging a document resets it to `pending` with new chunks. Useful for reprocessing after source data changes. ## Failure model - Turbopuffer write failures are hard: the vectors route returns 502 and the document stays in `embedding` for re-claim. - Aerospike cache failures do not block chunk reads when S3 backing is present; PostgreSQL connectivity surfaces as 500 and should be retried with backoff. - Lease expiry is handled server-side. A worker that crashes mid-embedding has its documents recovered on the next claim sweep. ## Autoscaling The operator emits KEDA directly from `Pipeline.spec.scaling`. 
For manual workers that are not represented by a Pipeline CR, use the same Prometheus signal:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: gpu-embed-worker
spec:
  scaleTargetRef:
    name: gpu-embed-worker
  minReplicaCount: 0
  maxReplicaCount: 8
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://layer-gateway.layer.svc.cluster.local:8080/v2/metrics
        metricName: product_images_pending
        query: 'sum(layer_pipeline_stage_count{pipeline_id="product-images",stage="pending"}) or vector(0)'
        threshold: "50"           # 1 replica per 50 pending docs
        activationThreshold: "0"  # scale from 0 when any doc is pending; KEDA activates only when the value is strictly greater than this
```

This keeps autoscaling close to the same source of truth Layer uses for claims while keeping PostgreSQL private to the gateway pod.

### CPU workers — scale on input source

CPU workers scale on whatever feeds them — SQS queue depth, Kafka consumer lag, S3 event notifications, etc. This is independent of the pipeline API.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-extract-worker
spec:
  scaleTargetRef:
    name: cpu-extract-worker  # assumes the worker Deployment shares this name
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/product-images
        queueLength: "10"
        awsRegion: us-east-1
```

---

# UDFs

Source: https://hevlayer.com/docs/udfs

A UDF is a stateless worker that preserves row count: one input row produces one derived attribute on the same row. Embeddings, classifications, tags, and attribute backfills all use the same primitive.

Use a pipeline when external data becomes rows, or when one row fans out into many rows. Use a UDF when rows already in Layer acquire derived attributes.
```text
Gateway                              Worker Deployment
   |  create ID scan                         |
   |  POST /v2/namespaces/{ns}/scans         |
   |  filters: spec.filter                   |
   |                                         |
   |  enqueue (namespace, id) rows           |
   |  into the UDF queue                     |
   |                                         |
   | <----- POST /v2/udfs/{id}/claim --------|
   | -----> rows + input columns ----------->|
   |                                         |  fn(*, id, title) -> list[str]
   | <- POST /v2/udfs/{id}/items/complete    |
   |                                         |
   |  writeback: Turbopuffer patch_columns
```

## Author a worker

The Python SDK turns a normal function into the claim/process/complete loop.

```python
import asyncio

from hevlayer.udf import PermanentError, TransientError, run_udf_worker, udf


@udf(inputs=["id", "title", "description"], output="tags", kind="tags")
def tag_product(*, id: str, title: str | None, description: str | None) -> list[str]:
    # Unrecoverable input: a missing title will not appear on retry.
    if not title:
        raise PermanentError(f"{id}: missing title")
    try:
        text = f"{title} {description or ''}".lower()
    except TypeError as exc:
        # Retryable failure: report it as transient so the retry policy applies.
        raise TransientError(str(exc)) from exc

    tags: list[str] = []
    if "wireless" in text:
        tags.append("wireless")
    if "waterproof" in text:
        tags.append("waterproof")
    return tags or ["uncategorized"]


if __name__ == "__main__":
    asyncio.run(run_udf_worker(tag_product, udf_id="product-tags"))
```

Function parameters are keyword-only and named to match `inputs`. Raise `TransientError` for retryable work and `PermanentError` for unrecoverable input.

## Declare the function

Apply a `Function` CRD. The operator emits a worker `Deployment`, optional `Service` for push dispatch, and a KEDA `ScaledObject` from `spec.scaling`. The gateway uses the Function spec to register the UDF queue and discovery policy.
```yaml apiVersion: hevlayer.com/v1alpha1 kind: Function metadata: name: product-tags namespace: hev-shop spec: paused: false targetNamespaces: - amazon-products inputs: - id - title - description output: attribute: tags kind: tags version: v1 filter: - "Or" - - ["tags_v", "NotEq", "v1"] - ["tags_v", "Eq", null] triggers: - discovery worker: image: ghcr.io/hev/hev-shop-udf-product-tags:latest dispatch: pull batchSize: 16 timeoutSeconds: 30 schedule: discoveryIntervalSeconds: 300 leaseSeconds: 120 maxInFlightBatches: 4 maxConcurrentScans: 1 retry: maxAttempts: 6 initialBackoffSeconds: 5 maxBackoffSeconds: 300 scaling: pool: cpu mode: autoscale replicas: min: 0 max: 4 ``` `spec.filter` is the same JSON tuple syntax used in Turbopuffer queries. The 0.1 CRD preserves array-form filters, so compound expressions like the example above can be applied directly. The worker pod receives `HEVLAYER_UDF_ID`, `HEVLAYER_BASE_URL`, `HEVLAYER_UDF_BATCH_SIZE`, `HEVLAYER_UDF_TIMEOUT_SECONDS`, `HEVLAYER_UDF_LEASE_SECONDS`, and `LAYER_GATEWAY_API_KEY`. The CRD is the source of truth for the worker shape. Use `POST /v2/udfs/{id}/discover`, `claim`, and `complete` only for runtime coordination and manual recovery; do not create a separate Deployment for the same function unless you also take over scaling and placement. ## Scaling and placement `spec.scaling` is the scaling contract for the Function worker. | Field | Purpose | | --- | --- | | `pool` | Name of a compute pool in `InfraRules/default`. | | `mode` | `autoscale`, `fixed`, or `disabled`. | | `replicas.min` | Minimum worker replicas. Use `1` for warm workers. | | `replicas.max` | Maximum worker replicas; must not exceed the pool cap. | `InfraRules` owns shared placement: node selectors, tolerations, resource requests, and per-workload replica ceilings. Workload specs choose a pool; they do not repeat placement rules. For extra pod-level config, set `spec.worker.podSpec`. It is deep-merged into the operator pod spec. 
Container array overrides are not merged.

## Gateway API

In Kubernetes installs the Function CRD is the source of truth and the runtime API below is registered from it. These routes are the same surface the Python SDK drives. Use them directly to register a UDF without the operator, or to coordinate and recover workers by hand.

### Spec routes

| Route | Behavior |
| --- | --- |
| `POST /v2/udfs` | Create a UDF definition and queue. |
| `GET /v2/udfs` | List UDFs. |
| `GET /v2/udfs/{id}` | Read a UDF. |
| `DELETE /v2/udfs/{id}` | Delete a UDF and its queue (does not delete written output). |
| `GET /v2/udfs/{id}/status` | Queue depth, in-flight, failed counts. |

The create body carries the same shape the CRD `spec` expresses:

```http
POST /v2/udfs
Content-Type: application/json

{
  "id": "product-tags",
  "spec": {
    "target_namespaces": ["amazon-products"],
    "inputs": ["id", "title", "description"],
    "output": {"attribute": "tags", "kind": "tags", "version": "v1"},
    "filter": ["Or", [["tags_v", "NotEq", "v1"], ["tags_v", "Eq", null]]],
    "triggers": ["discovery"],
    "worker": {
      "image": "ghcr.io/hev/hev-shop-udf-product-tags:latest",
      "port": 8080,
      "batch_size": 16,
      "timeout_seconds": 30
    },
    "schedule": {
      "discovery_interval_seconds": 300,
      "lease_seconds": 120,
      "max_in_flight_batches": 4,
      "max_concurrent_scans": 1
    },
    "retry": {"max_attempts": 6, "initial_backoff_seconds": 5, "max_backoff_seconds": 300}
  }
}
```

### Lifecycle routes

| Route | Behavior |
| --- | --- |
| `POST /v2/udfs/{id}/pause` | Stop both discovery and dispatch. Workers drain in-flight then idle. |
| `POST /v2/udfs/{id}/resume` | Resume discovery and dispatch. |
| `POST /v2/udfs/{id}/reset-failed` | Move every row in `failed` back to `pending`. |
| `POST /v2/udfs/{id}/discover` | Trigger a discovery sweep immediately. |

`reset-failed` is the recovery path after a transient upstream incident — for permanent issues, fix the input shape or bump `spec.output.version` and re-apply.
### Worker coordination routes | Route | Behavior | | --- | --- | | `POST /v2/udfs/{id}/claim` | Claim a batch of rows for processing. | | `POST /v2/udfs/{id}/items/heartbeat` | Extend the lease on in-flight items. | | `POST /v2/udfs/{id}/items/complete` | Report success and persist output. | | `POST /v2/udfs/{id}/items/fail` | Report failure (transient or permanent). | The Python SDK's `run_udf_worker` implements the full loop — most workloads should never call these routes directly. ```http POST /v2/udfs/product-tags/items/complete Content-Type: application/json { "worker_id": "udf-worker-0", "items": [ {"namespace": "amazon-products", "id": "asin-B08N5WRWNW", "output": ["wireless", "waterproof"]} ] } ``` `claim` returns the batch as `(namespace, id)` pairs alongside the input columns the spec declared. Rows the gateway can't bind from the index (missing required inputs) surface as bind errors, not silent skips, so the worker can fail them explicitly rather than retry forever. On `fail`, `kind: transient` honors `spec.retry` while `kind: permanent` dead-letters immediately — the SDK derives `kind` from `TransientError` / `PermanentError`. ### Writeback and discovery UDF outputs are patched onto the target row as the named attribute. `output.kind` is an SDK type hint; writeback semantics are the same for tags, classifications, scalars, and vectors. When `spec.output.version` is set, the gateway atomically writes the output and the matching `{attribute}_v` marker in a single patch. Discovery sweeps create an ID scan with `spec.filter` against each `target_namespace`. Returned IDs are enqueued and deduplicated. The first sweep after create/apply is implicit; subsequent sweeps run on `schedule.discovery_interval_seconds`. 
## Lifecycle

```sh
kubectl get function product-tags
kubectl describe function product-tags

curl -H "authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  $LAYER_GATEWAY_URL/v2/udfs/product-tags/status

kubectl patch function product-tags --type=merge -p '{"spec":{"paused":true}}'
kubectl patch function product-tags --type=merge -p '{"spec":{"paused":false}}'

curl -X POST -H "authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  $LAYER_GATEWAY_URL/v2/udfs/product-tags/reset-failed

kubectl delete function product-tags
```

Deletion garbage-collects the operator-managed Deployment, Service, and ScaledObject. Written outputs are not deleted.

## Version markers

`spec.output.version` is the re-run safety rail. When set, the gateway stamps `{attribute}_v` alongside every output write. Bump the version and keep the canonical stale filter when a model, taxonomy, or prompt changes.

## Tuning knobs

| Knob | What it bounds |
| --- | --- |
| `worker.batchSize` | Rows per worker batch. |
| `worker.timeoutSeconds` | Worker call timeout. |
| `schedule.leaseSeconds` | How long a claim is held before reissue. |
| `schedule.discoveryIntervalSeconds` | Time between discovery scan jobs. |
| `schedule.maxInFlightBatches` | Concurrent worker batches per UDF. |
| `schedule.maxConcurrentScans` | Concurrent namespace discovery jobs. |
| `retry.maxAttempts` | Tries before a row lands in `failed`. |

## Not in 0.1

- Cross-namespace aggregate UDFs.
- Chunkers or fan-out transforms; those remain pipelines.
- Multi-output UDFs.
- Managed image builds.

---

# hev-shop

Source: https://hevlayer.com/docs/hev-shop

## What hev-shop is

hev-shop is a live semantic shopping app built on the Layer gateway. It turns Amazon Reviews 2023 product and review data into vectors, writes through Layer into Turbopuffer, and serves search, filters, product pages, and review-derived tags.
The running storefront is public so you can see what a Layer-backed workload looks like end to end. The source code is not currently open source — it ships as a reference starter kit granted to design-preview participants. ## Reference starter kit Design-preview participants get private repo access and fork hev-shop as the starting point for their own workload. The pieces worth knowing before you fork: - indexer/app/layer_client.py — single HTTP path to the Layer gateway. - indexer/app/pipeline.py — claim, heartbeat, stage, and completion lifecycle. - web/app/api/search/route.ts and web/lib/backend.ts — search through the backend with stable_as_of preserved. - helm/hev-shop — pipeline-metric scaling and optional CPU/GPU node pools. ## Why it matters The repo is not a generic ecommerce starter. It makes the application contract concrete: stage work, claim work, embed it, write vectors, query with freshness signals, and let the gateway own the Turbopuffer edge — so your team starts from a working pattern, not a blank slate.