# hev layer

> Retrieval gateway and transform runtime around Turbopuffer.

hev layer is a transparent proxy in front of Turbopuffer that adds operational semantics — document cache, namespace snapshots, search history, pipeline state — plus a Kubernetes-native UDF transform runtime for embedding, classification, tagging, and attribute migration.

These docs are queryable from the command line, which works better than reading this file. Install `ask` (`go install github.com/hev/ask/cmd/ask@latest`), then run `ask --endpoint https://hevlayer.com/api/ask search "<question>"` for ranked sections with deep links, `section get "<id>"` for detail, and `overview` for the full map. Setup and verbs: https://hevlayer.com/docs/agents

The full concatenated docs are at https://hevlayer.com/llms-full.txt.
The docs search knowledge graph is at https://hevlayer.com/docs/search-knowledge-graph.

## Overview
- [Introduction](https://hevlayer.com/docs): Layer is a gateway and function runtime for modern retrieval systems. It scales compute for multi-stage indexing pipelines and runs functions across every row of your index, with all durable state in object storage.
- [Concepts](https://hevlayer.com/docs/concepts): How the gateway composes Turbopuffer, NVMe cache, PostgreSQL, S3, and metrics — and the core nouns you'll work with.
- [Document model](https://hevlayer.com/docs/document-model): A Layer document and the reserved attributes the gateway manages on every Turbopuffer row.
- [No Guarantees](https://hevlayer.com/docs/guarantees): Layer can't offer guarantees — here's what we commit to instead.
- [Tradeoffs](https://hevlayer.com/docs/tradeoffs): The current product posture and the cases it is not trying to cover.
- [Limits](https://hevlayer.com/docs/limits): Current ceilings inherited from the components we ship with, and what we don't cap.
- [Agents](https://hevlayer.com/docs/agents): Use the Layer docs from your coding agent: install the ask CLI, add a one-file skill, and get grounded answers with citations.
- [Roadmap & Changelog](https://hevlayer.com/docs/roadmap): Where hev layer is headed next, and what has shipped.

## Operations
- [Install](https://hevlayer.com/docs/install): How to bring up a hev layer environment: AWS resources via Terraform, runtime via Helm.
- [Terraform](https://hevlayer.com/docs/install/terraform): What the hev layer Terraform configuration provisions in AWS — and what it leaves for you to bring.
- [Helm Install](https://hevlayer.com/docs/install/helm): Install the hev layer gateway, operator, and document cache into a Kubernetes cluster.
- [Failure Modes](https://hevlayer.com/docs/failure-modes): How reads and writes degrade when the gateway, cache, or pipeline runs into trouble.
- [Operator Overview](https://hevlayer.com/docs/kubernetes/operator): What layer-operator reconciles and how it relates to the gateway.
- [Index CRD](https://hevlayer.com/docs/kubernetes/index-crd): Declarative representation of a namespace managed by Layer.
- [InfraRules CRD](https://hevlayer.com/docs/kubernetes/scaling-crd): Cluster-wide compute pools, document cache rules, and workload scaling.
- [Pipeline CRD](https://hevlayer.com/docs/kubernetes/pipeline-crd): Staged row-changing work declared as a Kubernetes resource.
- [Function CRD](https://hevlayer.com/docs/kubernetes/function-crd): Stateless user-defined functions declared as Kubernetes resources.

## API
- [Introduction](https://hevlayer.com/docs/api/introduction): What Layer adds on top of the Turbopuffer wire contract, and how to point a client at the gateway.
- [Write & Stage](https://hevlayer.com/docs/api/write): Upsert, delete, patch, and stage rows in a namespace.
- [Query & Fetch](https://hevlayer.com/docs/api/query): Vector similarity search with strong-consistent watermark handling, plus pull-through document fetch by id.
- [Namespace metadata](https://hevlayer.com/docs/api/namespace-metadata): Read namespace metadata enriched with Layer freshness signals.
- [Scan](https://hevlayer.com/docs/api/scans): Filter scans in ID and count mode.
- [Result Count](https://hevlayer.com/docs/api/result-count): Conclusive or bounded count over ranked FTS and vector queries.
- [Warm cache](https://hevlayer.com/docs/api/warm-cache): Warm a namespace's NVMe cache and snapshot mirror.
- [Snapshot History](https://hevlayer.com/docs/api/snapshots): Facet snapshot jobs, history, bodies, and activity streams.
- [Query History](https://hevlayer.com/docs/api/search-history): Per-namespace query and clickstream history backed by JSONL in S3.
- [Metrics API](https://hevlayer.com/docs/api/metrics): Prometheus exposition, PromQL passthrough, and the gateway metrics catalog.

## Guides
- [Dashboard](https://hevlayer.com/docs/dashboard): Pipeline, worker, scaling, read/write health, cost, and observability views for operators.
- [Scans](https://hevlayer.com/docs/scans): Answer filter-shaped questions with ID jobs or synchronous counts.
- [Pipelines](https://hevlayer.com/docs/pipelines): CPU extraction, GPU embedding, chunk handoff, and KEDA scaling signals.
- [UDFs](https://hevlayer.com/docs/udfs): User-defined functions over rows of a Layer index.
- [hev-shop](https://hevlayer.com/docs/hev-shop): Reference semantic-search application built on Layer. Source included with the design preview.

## Search knowledge graph

Source: https://hevlayer.com/docs/search-knowledge-graph
Version: 2
Generated: 2026-06-05T21:31:25.527Z
Content hash: 4260e62c2a0a88b601f4e8e3c0afc368168b80f47c95ecb6610662420ede6458

Context:
## Layer (hev layer)

Layer is a **gateway and function runtime for retrieval systems**: a Rust proxy (the *gateway*) that fronts **Turbopuffer**, plus a Kubernetes *operator*, both running in your own cluster. The gateway is wire-compatible with the Turbopuffer client API — existing clients keep working when pointed at it — and Layer documents only what it *adds* on top of upstream routes, exposing Layer-only features under `/v2/`.

### Core building blocks
- **Gateway** — transparent Turbopuffer proxy adding fetch, scans, result count, facet snapshots, a pull-through document cache, write-path stamping, query consistency, query/clickstream history, warm jobs, pipelines, and a UDF runtime.
- **Operator** — reconciles four CRDs (`Index`, `InfraRules`, `Pipeline`, `Function`). Decoupled from the gateway, which only ever *reads* CRD status.
- **Backing services** (all open source): **Aerospike** (NVMe document cache, ephemeral), **PostgreSQL** (pipeline/indexing-state queue only), **VictoriaMetrics** (metrics), **Karpenter** (node autoscaling), **KEDA** (pod autoscaling to zero). Durable state lives only in **S3** — Layer processes are stateless and elastic.

### Key concepts users ask about
- **Stable watermark / strong-consistent reads** — a background watcher records an epoch-ms watermark when a namespace's Turbopuffer index status is up-to-date; while updating, queries filter to fully-indexed rows so reads never see partial writes. Surfaced via `stableasof`/`isstable`.
- **Reserved `hevlayer` attributes** — server-stamped write watermark and shard key; users must not write them.
- **Pull-through cache** — Aerospike checked first; misses fall through to Turbopuffer/S3 and backfill. Cache failures are soft (never block reads); upstream failures are hard. Hit/miss reported per response.
- **Snapshots & facets** — content-addressed S3 facet histograms written when a namespace is stable.
- **Scans & result count** — filter-shaped questions: scans return IDs or counts; result count answers ranked FTS/vector match counts.
- **Pipelines vs UDFs** — pipelines stage CPU-extracted chunks and GPU-embed them (row count changes); UDFs run a stateless function per row to compute a derived attribute (row count preserved). Both scale via KEDA off queue depth, pinned to compute pools in `InfraRules`.
- **Dashboard** — read-mostly operator GUI reading the same gateway API.
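
The stable-watermark behavior in the list above can be pictured as a small filter rewrite. The sketch below is illustrative only, not the gateway's implementation. It assumes the hidden filter compares a server-stamped write watermark against the recorded stable watermark. The attribute name `written_at_ms`, the `NamespaceState` type, and the list-shaped filter expressions are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class NamespaceState:
    """Hypothetical view of what the consistency watcher tracks per namespace."""
    is_stable: bool    # True when the upstream index status is up-to-date
    stable_as_of: int  # epoch-ms watermark recorded the last time it was stable


def effective_filter(
    user_filter: Optional[List[Any]],
    state: NamespaceState,
    watermark_attr: str = "written_at_ms",  # hypothetical attribute name
) -> Optional[List[Any]]:
    """Return the filter a strong-consistent query would run with.

    While the namespace is stable, the user's filter passes through unchanged.
    While it is updating, a hidden clause restricts results to rows written at
    or before the stable watermark, so partially indexed writes stay invisible.
    """
    if state.is_stable:
        return user_filter
    hidden = [watermark_attr, "Lte", state.stable_as_of]
    if user_filter is None:
        return hidden
    return ["And", [user_filter, hidden]]
```

With `is_stable=False` and `stable_as_of=1000`, a user filter `["color", "Eq", "red"]` becomes an `And` of that clause and `["written_at_ms", "Lte", 1000]`. Once the namespace reports stable again, the same call returns the user filter untouched.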

### How users talk about it
Users say "the gateway," "drop-in Turbopuffer client," "warm the cache," "strongly consistent query," "snapshot," "facet counts," "scan a filter," "stage/claim/embed," "UDF/function," "compute pool," and "scale to zero." Install is two-stage: **Terraform** (AWS resources) then **Helm** (gateway/operator/cache).

Glossary:
- Gateway: The Rust transparent proxy in front of Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime. Aliases: layer-gateway, the proxy, rust gateway.
- stable watermark: Epoch-ms cut tracked by the consistency watcher when the upstream index is up-to-date, used to inject a hidden filter for strong-consistent reads. Aliases: watermark, stableasof, consistency watermark.
- pull-through cache: NVMe-backed read accelerator that serves document reads and falls through to origin on miss, never a hard dependency. Aliases: document cache, nvme cache, aerospike.
- UDF: A stateless worker that computes one derived attribute per row of an index, without changing row count. Aliases: user-defined function, function, udfs.
- pipeline: A PostgreSQL-backed staged-work state machine (CPU extract, GPU embed) whose row count can change between input and output. Aliases: pipelines, indexing pipeline.
- operator: The Kubernetes operator that reconciles Layer's CRDs (Index, InfraRules, Pipeline, Function) into worker and scaling resources. Aliases: layer-operator, k8s operator, kubernetes operator.
- CRD: Kubernetes-native resources the operator reconciles to express desired state for indexes, functions, pipelines, and infra rules. Aliases: custom resource definition, index crd, function crd, pipeline crd, infrarules.
- snapshot: A content-addressed S3 facet histogram (listings and counts) written after a namespace is observed stable. Aliases: snapshots, facet snapshot, facet histogram.
- scan: A filter-shaped query that returns matching IDs asynchronously or a matching row count synchronously. Aliases: scans, filter scan.
- ask CLI: Keyless command-line tool that searches, reads, and cites the hev layer docs so a coding agent can answer grounded questions without scraping or an API key. Aliases: ask, hevlayer-docs skill.
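
The pull-through cache entry in the glossary describes a read path with soft cache failures and hard origin failures. A minimal sketch of that control flow follows, assuming three caller-supplied callables. `PullThroughCache`, `CacheError`, and `FetchResult` are hypothetical names for illustration and are not part of the Layer or Aerospike APIs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CacheError(Exception):
    """Hypothetical error raised when the cache tier is unavailable."""


@dataclass
class FetchResult:
    doc: dict
    cache_hit: bool  # hit/miss is reported alongside each response


class PullThroughCache:
    def __init__(
        self,
        cache_get: Callable[[str], Optional[dict]],  # may raise CacheError
        cache_put: Callable[[str, dict], None],      # may raise CacheError
        origin_get: Callable[[str], dict],           # errors propagate to caller
    ) -> None:
        self.cache_get = cache_get
        self.cache_put = cache_put
        self.origin_get = origin_get

    def fetch(self, doc_id: str) -> FetchResult:
        # 1. Check the cache first. A cache failure is soft: treat it as a miss.
        try:
            cached = self.cache_get(doc_id)
        except CacheError:
            cached = None
        if cached is not None:
            return FetchResult(doc=cached, cache_hit=True)

        # 2. Fall through to origin. An origin failure is hard: let it raise.
        doc = self.origin_get(doc_id)

        # 3. Backfill the cache. A failed backfill never blocks the read.
        try:
            self.cache_put(doc_id, doc)
        except CacheError:
            pass
        return FetchResult(doc=doc, cache_hit=False)
```

Wiring it to two plain dictionaries, for example `PullThroughCache(store.get, store.__setitem__, origin.__getitem__)`, shows the first fetch of an id reporting a miss and the second reporting a hit. Passing cache callables that always raise `CacheError` still serves every read from origin.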