
Query History

Layer logs every query the gateway serves into a durable JSONL trail in S3, mirrored into the NVMe cache for fast recent reads. Fetch events that downstream consumers tag back to a query land in a sibling clickstream feed. Together they make a search session reconstructable after the fact — for relevance tuning, A/B comparison, or incident review.

Both surfaces are Layer-only.

Routes

| Route | Behavior |
| --- | --- |
| GET /v2/namespaces/{ns}/search-history | Per-namespace query log, newest first. |
| GET /v2/namespaces/{ns}/clickstream | Fetch events correlated to a search, newest first. |

The /v1/ versions of both routes are identical aliases held for client compatibility.

Search history entry

{
  "entries": [
    {
      "timestamp": "2025-05-22T08:00:00.000Z",
      "timestamp_nanos": 1747900800000000000,
      "namespace": "products",
      "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
      "raw_query": "wireless headphones",
      "stable_as_of": 1747900700000,
      "query": {"vector": "[…]", "top_k": 10, "filters": "[…]"},
      "top_result_ids": ["asin-B08N5WRWNW", "asin-B07PXGQC1Q"],
      "tags": ["app:hev-shop", "route:search", "surface:storefront"]
    }
  ],
  "next_cursor": "1747900799000000000"
}
| Field | Meaning |
| --- | --- |
| timestamp / timestamp_nanos | Wall-clock and nanosecond timestamps. timestamp_nanos is the pagination cursor. |
| trace_id | Trace context propagated or generated for the query. Joins to the clickstream feed. |
| raw_query | Caller-supplied query string from the x-hevlayer-search-query header (e.g. the BM25 input). Omitted when the header is absent. |
| stable_as_of | Epoch-ms namespace watermark used by the served response. Omitted on cold-start gateways before the namespace has a watermark. |
| query | Structured query summary: vector shape, filters, ranking. |
| top_result_ids | IDs from the served response, in rank order. |
| tags | Caller-supplied labels propagated through request headers. Used for ad-hoc segmentation. |

Writing metadata

Set x-hevlayer-search-query on query requests to capture the human input, and set x-hevlayer-tags to a comma-separated list of segmentation tags. The Python SDK exposes these as raw_query and tags:

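# Assumes `client` is an initialized async SDK client and `embedding` is the
# query vector. Run this inside a coroutine or an async REPL.

# Record the human-typed query and segmentation tags alongside the vector query.
query = await client.query_namespace(
    "products",
    {"vector": embedding, "top_k": 10, "include_attributes": ["title"]},
    raw_query="wireless headphones",
    tags=["app:hev-shop", "surface:storefront", "route:search", "page:first"],
)

# List recent history entries that carry all three tags (AND semantics).
history = await client.list_search_history(
    "products",
    tags=["app:hev-shop", "route:search", "page:first"],
    limit=20,
)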

Keep the query text in raw_query; use tags for segmentation, not for duplicating the query string.
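
For clients that do not go through the Python SDK, the two headers can be assembled by hand. The following is a minimal sketch, not the SDK's implementation: build_metadata_headers is a hypothetical helper, and rejecting commas client-side is an assumption made here because commas cannot be escaped in the tag header.

```python
def build_metadata_headers(raw_query=None, tags=()):
    """Build the metadata headers for a query request (hypothetical helper)."""
    headers = {}
    if raw_query:
        # Human input goes in its own header, never in the tags.
        headers["x-hevlayer-search-query"] = raw_query

    # Trim whitespace and drop empty values before joining.
    cleaned = [t.strip() for t in tags if t.strip()]
    if any("," in t for t in cleaned):
        # Assumption: fail early, since a comma would be read as a separator.
        raise ValueError("tags cannot contain commas")
    if cleaned:
        headers["x-hevlayer-tags"] = ",".join(cleaned)
    return headers


headers = build_metadata_headers(
    raw_query="wireless headphones",
    tags=["app:hev-shop", "route:search"],
)
```

The returned dict can be merged into the headers of whatever HTTP client issues the query request.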

Tag contract

Layer splits x-hevlayer-tags and ?tag= on commas, trims whitespace, drops empty values, then sorts and dedupes tags before storing or matching them. Commas are separators and cannot be escaped.

Limits:

| Limit | Value |
| --- | --- |
| Max tags | 32 unique tags per request or filter |
| Max tag length | 128 bytes |
| Allowed characters | ASCII letters, digits, :, _, -, ., /, =, + |

The list filter uses AND semantics: ?tag=a,b returns only entries that carry both a and b.
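
The contract above can be mirrored client-side to catch bad tags before a request is sent. This is a minimal sketch of the described rules, not Layer's own code: normalize_tags and matches are hypothetical names, and raising ValueError on a violated limit is an assumption, since the text does not say how the gateway reacts to invalid tags.

```python
import re

MAX_TAGS = 32          # unique tags per request or filter
MAX_TAG_BYTES = 128    # per-tag length limit, in bytes
# ASCII letters, digits, and : _ - . / = +
_ALLOWED = re.compile(r"[A-Za-z0-9:_\-./=+]+")


def normalize_tags(raw: str) -> list[str]:
    """Split on commas, trim, drop empties, then sort and dedupe."""
    tags = sorted({part.strip() for part in raw.split(",") if part.strip()})
    if len(tags) > MAX_TAGS:
        raise ValueError(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for tag in tags:
        if len(tag.encode("utf-8")) > MAX_TAG_BYTES:
            raise ValueError(f"tag longer than {MAX_TAG_BYTES} bytes: {tag!r}")
        if not _ALLOWED.fullmatch(tag):
            raise ValueError(f"tag has disallowed characters: {tag!r}")
    return tags


def matches(entry_tags, filter_tags) -> bool:
    """AND semantics: every filter tag must be present on the entry."""
    return set(filter_tags) <= set(entry_tags)


stored = normalize_tags(" route:search, app:hev-shop,,app:hev-shop ")
print(stored)                                   # ['app:hev-shop', 'route:search']
print(matches(stored, normalize_tags("app:hev-shop")))  # True
```

Because both sides are normalized the same way, the order and spacing of tags in a header or in ?tag= does not affect matching.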

Query parameters

| Param | Purpose |
| --- | --- |
| tag | Comma-separated tag filter. AND semantics: every tag must match. |
| from / to | RFC3339 time bounds. |
| before | Pagination cursor; return entries strictly older than the given timestamp_nanos. |
| limit | Cap 500, default 50. |
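
Paging through the log with before can be sketched as a small generator. Several things here are assumptions rather than documented behavior: that next_cursor from one page is passed back as before on the next, that a missing or empty next_cursor marks the last page, and the base URL. fetch_json is a caller-supplied function (for example a thin wrapper around an HTTP client) that takes a URL and returns the decoded JSON body.

```python
from urllib.parse import urlencode


def history_url(base_url, namespace, *, tags=(), before=None, limit=50):
    """Build a search-history list URL from the documented parameters."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    params = {"limit": limit}
    if tags:
        params["tag"] = ",".join(tags)   # AND filter, comma-separated
    if before is not None:
        params["before"] = before        # strictly older than this cursor
    return f"{base_url}/v2/namespaces/{namespace}/search-history?{urlencode(params)}"


def iter_history(fetch_json, base_url, namespace, **kwargs):
    """Yield entries newest first, following the cursor until it runs out."""
    before = None
    while True:
        page = fetch_json(history_url(base_url, namespace, before=before, **kwargs))
        yield from page.get("entries", [])
        before = page.get("next_cursor")
        if not before:   # assumption: no cursor means no further pages
            return
```

Injecting fetch_json keeps the paging logic independent of any particular HTTP library and makes it easy to exercise with canned pages.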

Clickstream entry

{
  "events": [
    {
      "timestamp": "2025-05-22T08:00:02.143Z",
      "timestamp_nanos": 1747900802143000000,
      "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
      "namespace": "products",
      "doc_id": "asin-B08N5WRWNW",
      "tags": ["session:abc123"],
      "source": "fetch",
      "served_from": "cache"
    }
  ],
  "next_cursor": "1747900802142000000"
}

trace_id joins to the search-history entry that produced the result; served_from distinguishes a cache hit from an upstream fetch. trace_id is also a supported query parameter so you can pull every event for a single search session.
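
The join on trace_id can be sketched in a few lines once both feeds have been fetched. This is an illustrative helper, not part of the SDK: join_clicks is a hypothetical name, and it only relies on the trace_id, doc_id, raw_query, and top_result_ids fields shown in the sample payloads.

```python
from collections import defaultdict


def join_clicks(entries, events):
    """Attach fetched documents, with their served rank, to each search entry."""
    fetched_by_trace = defaultdict(list)
    for event in events:
        fetched_by_trace[event["trace_id"]].append(event["doc_id"])

    sessions = []
    for entry in entries:
        # 1-based rank of each document in the served response.
        ranks = {doc_id: i + 1 for i, doc_id in enumerate(entry["top_result_ids"])}
        fetched = fetched_by_trace.get(entry["trace_id"], [])
        sessions.append({
            "trace_id": entry["trace_id"],
            "raw_query": entry.get("raw_query"),   # omitted when the header was absent
            # rank is None if the fetched doc was not among top_result_ids
            "fetched": [(doc_id, ranks.get(doc_id)) for doc_id in fetched],
        })
    return sessions
```

The resulting per-search records give the rank of each fetched document, which is the usual starting point for relevance tuning or A/B comparison.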

Storage

search-history/{namespace}/{YYYY-MM-DD}/{timestamp_nanos}.jsonl

Writes are best-effort and never block the query response. Aerospike holds a recent window for fast reads; S3 is the durable store. A cache outage degrades read latency but not durability — list calls walk the S3 prefix and merge inline.
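
For offline analysis directly against the bucket, the object key for an entry can be derived from its timestamp. This is a sketch under one stated assumption: that the YYYY-MM-DD segment is the UTC date of timestamp_nanos, which the layout above does not spell out. history_object_key is a hypothetical helper.

```python
from datetime import datetime, timezone


def history_object_key(namespace: str, timestamp_nanos: int) -> str:
    """Build the S3 key for a search-history object (UTC date assumed)."""
    # Integer division avoids float rounding on nanosecond-scale values.
    day = datetime.fromtimestamp(timestamp_nanos // 1_000_000_000, tz=timezone.utc)
    return f"search-history/{namespace}/{day:%Y-%m-%d}/{timestamp_nanos}.jsonl"
```

Since keys within a day sort by timestamp_nanos, listing a single date prefix returns that day's objects in chronological order.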
