# hev layer

> Retrieval gateway and transform runtime around Turbopuffer.

hev layer is a transparent proxy in front of Turbopuffer that adds operational semantics (document cache, namespace snapshots, search history, pipeline state) plus a Kubernetes-native UDF transform runtime for embedding, classification, tagging, and attribute migration.

These docs are queryable from the command line, which works better than reading this file. Install `ask` (`go install github.com/hev/ask/cmd/ask@latest`), then run `ask --endpoint https://hevlayer.com/api/ask search ""` for ranked sections with deep links, `section get ""` for detail, and `overview` for the full map. Setup and verbs: https://hevlayer.com/docs/agents

The full concatenated docs are at https://hevlayer.com/llms-full.txt. The docs search knowledge graph is at https://hevlayer.com/docs/search-knowledge-graph.

## Overview

- [Introduction](https://hevlayer.com/docs): Layer is a gateway and function runtime for modern retrieval systems. It scales compute for multi-stage indexing pipelines and runs functions across every row of your index, with all durable state in object storage.
- [Concepts](https://hevlayer.com/docs/concepts): How the gateway composes Turbopuffer, NVMe cache, PostgreSQL, S3, and metrics, and the core nouns you'll work with.
- [Document model](https://hevlayer.com/docs/document-model): A Layer document and the reserved attributes the gateway manages on every Turbopuffer row.
- [No Guarantees](https://hevlayer.com/docs/guarantees): Layer can't offer guarantees; here's what we commit to instead.
- [Tradeoffs](https://hevlayer.com/docs/tradeoffs): The current product posture and the cases it is not trying to cover.
- [Limits](https://hevlayer.com/docs/limits): Current ceilings inherited from the components we ship with, and what we don't cap.
- [Agents](https://hevlayer.com/docs/agents): Use the Layer docs from your coding agent: install the ask CLI, add a one-file skill, and get grounded answers with citations.
- [Roadmap & Changelog](https://hevlayer.com/docs/roadmap): Where hev layer is headed next, and what has shipped.

## Operations

- [Install](https://hevlayer.com/docs/install): How to bring up a hev layer environment: AWS resources via Terraform, runtime via Helm.
- [Terraform](https://hevlayer.com/docs/install/terraform): What the hev layer Terraform configuration provisions in AWS, and what it leaves for you to bring.
- [Helm Install](https://hevlayer.com/docs/install/helm): Install the hev layer gateway, operator, and document cache into a Kubernetes cluster.
- [Failure Modes](https://hevlayer.com/docs/failure-modes): How reads and writes degrade when the gateway, cache, or pipeline runs into trouble.
- [Operator Overview](https://hevlayer.com/docs/kubernetes/operator): What layer-operator reconciles and how it relates to the gateway.
- [Index CRD](https://hevlayer.com/docs/kubernetes/index-crd): Declarative representation of a namespace managed by Layer.
- [InfraRules CRD](https://hevlayer.com/docs/kubernetes/scaling-crd): Cluster-wide compute pools, document cache rules, and workload scaling.
- [Pipeline CRD](https://hevlayer.com/docs/kubernetes/pipeline-crd): Staged row-changing work declared as a Kubernetes resource.
- [Function CRD](https://hevlayer.com/docs/kubernetes/function-crd): Stateless user-defined functions declared as Kubernetes resources.

## API

- [Introduction](https://hevlayer.com/docs/api/introduction): What Layer adds on top of the Turbopuffer wire contract, and how to point a client at the gateway.
- [Write & Stage](https://hevlayer.com/docs/api/write): Upsert, delete, patch, and stage rows in a namespace.
- [Query & Fetch](https://hevlayer.com/docs/api/query): Vector similarity search with strongly consistent watermark handling, plus pull-through document fetch by id.
- [Namespace metadata](https://hevlayer.com/docs/api/namespace-metadata): Read namespace metadata enriched with Layer freshness signals.
- [Scan](https://hevlayer.com/docs/api/scans): Filter scans in ID and count mode.
- [Result Count](https://hevlayer.com/docs/api/result-count): Conclusive or bounded count over ranked FTS and vector queries.
- [Warm cache](https://hevlayer.com/docs/api/warm-cache): Warm a namespace's NVMe cache and snapshot mirror.
- [Snapshot History](https://hevlayer.com/docs/api/snapshots): Facet snapshot jobs, history, bodies, and activity streams.
- [Query History](https://hevlayer.com/docs/api/search-history): Per-namespace query and clickstream history backed by JSONL in S3.
- [Metrics API](https://hevlayer.com/docs/api/metrics): Prometheus exposition, PromQL passthrough, and the gateway metrics catalog.

## Guides

- [Dashboard](https://hevlayer.com/docs/dashboard): Pipeline, worker, scaling, read/write health, cost, and observability views for operators.
- [Scans](https://hevlayer.com/docs/scans): Answer filter-shaped questions with ID jobs or synchronous counts.
- [Pipelines](https://hevlayer.com/docs/pipelines): CPU extraction, GPU embedding, chunk handoff, and KEDA scaling signals.
- [UDFs](https://hevlayer.com/docs/udfs): User-defined functions over rows of a Layer index.
- [hev-shop](https://hevlayer.com/docs/hev-shop): Reference semantic-search application built on Layer. Source is included with the design preview.

## Search knowledge graph

- Source: https://hevlayer.com/docs/search-knowledge-graph
- Version: 2
- Generated: 2026-06-05T21:31:25.527Z
- Content hash: 4260e62c2a0a88b601f4e8e3c0afc368168b80f47c95ecb6610662420ede6458

Context:

## Layer (hev layer)

Layer is a **gateway and function runtime for retrieval systems**: a Rust proxy (the *gateway*) that fronts **Turbopuffer**, plus a Kubernetes *operator*, both running in your own cluster. The gateway is wire-compatible with the Turbopuffer client API, so existing clients keep working when pointed at it. Layer documents only what it *adds* on top of upstream routes and exposes Layer-only features under `/v2/`.
### Core building blocks

- **Gateway**: transparent Turbopuffer proxy adding fetch, scans, result count, facet snapshots, a pull-through document cache, write-path stamping, query consistency, query/clickstream history, warm jobs, pipelines, and a UDF runtime.
- **Operator**: reconciles four CRDs (`Index`, `InfraRules`, `Pipeline`, `Function`). It is decoupled from the gateway, which only ever *reads* CRD status.
- **Backing services** (all open source): **Aerospike** (NVMe document cache, ephemeral), **PostgreSQL** (pipeline/indexing-state queue only), **VictoriaMetrics** (metrics), **Karpenter** (node autoscaling), **KEDA** (pod autoscaling, including scale to zero). Durable state lives only in **S3**; Layer processes are stateless and elastic.

### Key concepts users ask about

- **Stable watermark / strongly consistent reads**: a background watcher records an epoch-ms watermark when a namespace's Turbopuffer index status is up to date. While the index is updating, queries are filtered to fully indexed rows, so reads never see partial writes. Surfaced via `stableasof`/`isstable`.
- **Reserved `hevlayer` attributes**: the server-stamped write watermark and shard key; users must not write them.
- **Pull-through cache**: Aerospike is checked first; misses fall through to Turbopuffer/S3 and backfill the cache. Cache failures are soft (they never block reads); upstream failures are hard. Hit/miss is reported per response.
- **Snapshots & facets**: content-addressed S3 facet histograms written when a namespace is stable.
- **Scans & result count**: for filter-shaped questions, scans return IDs or counts; result count reports match counts for ranked FTS/vector queries.
- **Pipelines vs UDFs**: pipelines stage CPU-extracted chunks and GPU-embed them (row count changes); UDFs run a stateless function per row to compute a derived attribute (row count preserved). Both scale via KEDA off queue depth, pinned to compute pools in `InfraRules`.
- **Dashboard**: read-mostly operator GUI reading the same gateway API.
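A minimal sketch of how the hidden strong-consistency filter could be assembled, in Python. It is not the gateway's code (the gateway is Rust): the attribute name `hevlayer_write_ms` and the function `consistent_filter` are hypothetical, and the filter uses Turbopuffer's list-style `[attribute, operator, value]` syntax with an `And` combinator. It shows the idea described above: while the index is updating, rows stamped after the stable watermark are excluded.

```python
from typing import Any, Optional

# Hypothetical name for the reserved, server-stamped write watermark (epoch ms).
WRITE_WATERMARK_ATTR = "hevlayer_write_ms"

# A Turbopuffer-style filter expression, e.g. ["category", "Eq", "shoes"].
Filter = Any


def consistent_filter(
    user_filter: Optional[Filter],
    index_up_to_date: bool,
    stable_as_of_ms: int,
) -> Optional[Filter]:
    """Return the filter to send upstream for a strongly consistent read."""
    if index_up_to_date:
        # Every stamped row is already indexed: pass the caller's filter through.
        return user_filter
    # Index is still updating: hide rows written after the stable watermark.
    cut = [WRITE_WATERMARK_ATTR, "Lte", stable_as_of_ms]
    if user_filter is None:
        return cut
    return ["And", [user_filter, cut]]


if __name__ == "__main__":
    print(consistent_filter(["category", "Eq", "shoes"], False, 1_717_000_000_000))
```

Injecting the cut as an extra `And` clause leaves the caller's own filter untouched, which keeps the rewrite invisible to existing Turbopuffer clients.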
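The pull-through read path can be sketched the same way. Assumptions: `Cache`, `origin`, and `pull_through_fetch` are hypothetical stand-ins (for Aerospike and for the Turbopuffer/S3 origin), and real cache clients raise their own exception types rather than a bare `Exception`. What the sketch encodes from the text: cache errors degrade to a miss, origin errors propagate, successful misses backfill the cache, and each result reports hit or miss.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class Cache(Protocol):
    """Hypothetical cache interface standing in for the NVMe document cache."""

    def get(self, key: str) -> Optional[dict]: ...

    def put(self, key: str, doc: dict) -> None: ...


@dataclass
class FetchResult:
    doc: Optional[dict]
    cache_hit: bool  # reported per response


def pull_through_fetch(
    doc_id: str, cache: Cache, origin: Callable[[str], Optional[dict]]
) -> FetchResult:
    # 1. Cache first. Any cache error is soft: treat it as a miss.
    try:
        cached = cache.get(doc_id)
    except Exception:
        cached = None
    if cached is not None:
        return FetchResult(doc=cached, cache_hit=True)

    # 2. Fall through to origin. Origin errors are hard, so they propagate.
    doc = origin(doc_id)

    # 3. Backfill on a successful read; a failed backfill never fails the read.
    if doc is not None:
        try:
            cache.put(doc_id, doc)
        except Exception:
            pass
    return FetchResult(doc=doc, cache_hit=False)
```

Because both cache calls are wrapped, an unavailable cache only costs latency; the origin call is the one dependency allowed to fail the request.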
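For the KEDA scaling mentioned under pipelines and UDFs, the replica arithmetic for a queue-depth trigger is roughly the following. This assumes an `AverageValue` metric target (KEDA's default metric type) and a hypothetical per-replica target; it ignores HPA tolerance, stabilization windows, and cooldown, so treat it as a back-of-the-envelope model rather than what the operator actually configures.

```python
import math


def desired_replicas(
    queue_depth: int,
    target_per_replica: int,
    min_replicas: int = 0,
    max_replicas: int = 10,
    activation_threshold: int = 0,
) -> int:
    """Rough replica count for a worker scaled on queue depth."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    # At or below the activation threshold the trigger is inactive,
    # so the workload falls back to its floor (zero enables scale-to-zero).
    if queue_depth <= activation_threshold:
        return min_replicas
    # AverageValue target: enough replicas that each sees about `target` items.
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 250 queued items with a target of 100 per replica asks for 3 workers, and an empty queue lets the workload scale to zero when `min_replicas` is 0.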
### How users talk about it

Users say "the gateway," "drop-in Turbopuffer client," "warm the cache," "strongly consistent query," "snapshot," "facet counts," "scan a filter," "stage/claim/embed," "UDF/function," "compute pool," and "scale to zero." Install is two-stage: **Terraform** (AWS resources), then **Helm** (gateway/operator/cache).

Glossary:

- Gateway: The Rust transparent proxy in front of Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime. Aliases: layer-gateway, the proxy, rust gateway.
- stable watermark: Epoch-ms cut tracked by the consistency watcher when the upstream index is up to date, used to inject a hidden filter for strongly consistent reads. Aliases: watermark, stableasof, consistency watermark.
- pull-through cache: NVMe-backed read accelerator that serves document reads and falls through to origin on a miss; never a hard dependency. Aliases: document cache, nvme cache, aerospike.
- UDF: A stateless worker that computes one derived attribute per row of an index, without changing row count. Aliases: user-defined function, function, udfs.
- pipeline: A PostgreSQL-backed staged-work state machine (CPU extract, GPU embed) whose row count can change between input and output. Aliases: pipelines, indexing pipeline.
- operator: The Kubernetes operator that reconciles Layer's CRDs (Index, InfraRules, Pipeline, Function) into worker and scaling resources. Aliases: layer-operator, k8s operator, kubernetes operator.
- CRD: Kubernetes-native resources the operator reconciles to express desired state for indexes, functions, pipelines, and infra rules. Aliases: custom resource definition, index crd, function crd, pipeline crd, infrarules.
- snapshot: A content-addressed S3 facet histogram (listings and counts) written after a namespace is observed stable. Aliases: snapshots, facet snapshot, facet histogram.
- scan: A filter-shaped query that returns matching IDs asynchronously or a matching row count synchronously. Aliases: scans, filter scan.
- ask CLI: Keyless command-line tool that searches, reads, and cites the hev layer docs so a coding agent can answer grounded questions without scraping or an API key. Aliases: ask, hevlayer-docs skill.
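The row-count distinction between UDFs and pipelines is easiest to see side by side. Both functions are hypothetical illustrations, not Layer APIs: `apply_udf` emits exactly one row per input with one derived attribute added, while `chunk_stage` stands in for CPU extraction and may emit zero or several chunk rows per input (the `#n` id scheme and the `parent_id` field are assumptions).

```python
from typing import Callable, Iterable, Iterator

Row = dict


def apply_udf(
    rows: Iterable[Row], attr: str, fn: Callable[[Row], object]
) -> Iterator[Row]:
    """UDF shape: exactly one output row per input row, plus one derived attribute."""
    for row in rows:
        yield {**row, attr: fn(row)}


def chunk_stage(rows: Iterable[Row], text_attr: str, size: int) -> Iterator[Row]:
    """Pipeline-extraction shape: each input row yields zero or more chunk rows."""
    for row in rows:
        text = row.get(text_attr, "")
        for start in range(0, len(text), size):
            yield {
                "id": f"{row['id']}#{start // size}",  # hypothetical chunk id scheme
                "parent_id": row["id"],
                text_attr: text[start : start + size],
            }
```

Fed one ten-character text and one empty text, `apply_udf` returns two rows, while `chunk_stage` with a chunk size of 4 returns three.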
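Content addressing, as used for facet snapshots, means the S3 key is derived from the bytes being stored, so an unchanged histogram maps to the same object. A minimal sketch, assuming SHA-256 over canonical JSON and a hypothetical `snapshots/<namespace>/` key prefix; Layer's actual hash, serialization, and key layout are not specified here.

```python
import hashlib
import json
from typing import Tuple


def snapshot_key(namespace: str, facets: dict) -> Tuple[str, bytes]:
    """Return (object key, body) for a content-addressed facet snapshot."""
    # Canonical JSON: sorted keys and no whitespace, so equal histograms
    # always serialize to identical bytes.
    body = json.dumps(facets, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(body).hexdigest()
    # Hypothetical key layout; identical content maps to the same key.
    return f"snapshots/{namespace}/{digest}.json", body
```

Canonicalizing before hashing matters: without `sort_keys`, two equal histograms built in different insertion orders would produce different keys and duplicate objects.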