API

Query History

Layer logs every query the gateway serves to a durable JSONL trail in S3 and mirrors it into the NVMe cache for fast reads of recent entries. Fetch events that downstream consumers tag back to a query land in a sibling clickstream feed. Together, the two feeds let you reconstruct a search session after the fact for relevance tuning, A/B comparison, or incident review.

Both surfaces are Layer-only.

Routes

Route	Behavior
`GET /v2/namespaces/{ns}/search-history`	Per-namespace query log, newest first.
`GET /v2/namespaces/{ns}/clickstream`	Fetch events correlated to a search, newest first.

The `/v1/` versions of both routes are identical aliases, kept for client compatibility.

Search history entry

{
  "entries": [
    {
      "timestamp": "2025-05-22T08:00:00.000Z",
      "timestamp_nanos": 1747900800000000000,
      "namespace": "products",
      "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
      "raw_query": "wireless headphones",
      "stable_as_of": 1747900700000,
      "query": {"vector": "[…]", "top_k": 10, "filters": "[…]"},
      "top_result_ids": ["asin-B08N5WRWNW", "asin-B07PXGQC1Q"],
      "tags": ["app:hev-shop", "route:search", "surface:storefront"]
    }
  ],
  "next_cursor": "1747900799000000000"
}

Field	Meaning
`timestamp` / `timestamp_nanos`	Wall-clock time as an RFC 3339 string and as epoch nanoseconds. `timestamp_nanos` is the pagination cursor.
`trace_id`	Trace context propagated or generated for the query. Joins to the clickstream feed.
`raw_query`	Caller-supplied query string from the `x-hevlayer-search-query` header (e.g. the BM25 input). Omitted when the header is absent.
`stable_as_of`	Epoch-ms namespace watermark used by the served response. Omitted on cold-start gateways before the namespace has a watermark.
`query`	Structured query summary — vector shape, filters, ranking.
`top_result_ids`	IDs from the served response, in rank order.
`tags`	Caller-supplied labels propagated through request headers. Used for ad-hoc segmentation.
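The two timestamp fields can be converted into each other with integer arithmetic. The following is a minimal sketch, assuming `timestamp` is a UTC RFC 3339 string with millisecond precision, as in the sample entry; the helper names are hypothetical and not part of the SDK.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_rfc3339(timestamp_nanos: int) -> str:
    """Render epoch nanoseconds as a millisecond-precision UTC string."""
    # Integer division avoids the rounding that float seconds would introduce.
    seconds, nanos = divmod(timestamp_nanos, 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    millis = nanos // 1_000_000
    return f"{moment:%Y-%m-%dT%H:%M:%S}.{millis:03d}Z"


def rfc3339_to_nanos(timestamp: str) -> int:
    """Parse a UTC RFC 3339 string (ending in 'Z') into epoch nanoseconds."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    moment = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    # Exact integer division of timedeltas yields whole microseconds.
    micros = (moment - EPOCH) // timedelta(microseconds=1)
    return micros * 1_000


print(nanos_to_rfc3339(1747900802143000000))
```

Note that `stable_as_of` uses epoch milliseconds rather than nanoseconds, so divide a `timestamp_nanos` value by 1,000,000 before comparing the two.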

Writing metadata

Set `x-hevlayer-search-query` on query requests to capture the human input, and set `x-hevlayer-tags` to a comma-separated list of segmentation tags. The Python SDK exposes these as `raw_query` and `tags`:

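# Assumes `client` is an initialized async SDK client and `embedding` is the
# query vector; both calls must run inside a coroutine.
query = await client.query_namespace(
    "products",
    {"vector": embedding, "top_k": 10, "include_attributes": ["title"]},
    raw_query="wireless headphones",
    tags=["app:hev-shop", "surface:storefront", "route:search", "page:first"],
)

# List the 20 newest history entries that carry all three tags.
history = await client.list_search_history(
    "products",
    tags=["app:hev-shop", "route:search", "page:first"],
    limit=20,
)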

Keep the query text in `raw_query`; use `tags` for segmentation, not for duplicating the query string.
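Clients that do not use the SDK can set the two headers directly. The sketch below only builds the header dictionary; the function name `metadata_headers` is hypothetical, and how non-ASCII query text should be encoded in the header is not covered here, so treat that as an open assumption.

```python
def metadata_headers(raw_query=None, tags=()):
    """Build the optional Layer metadata headers for a query request."""
    headers = {}
    if raw_query:
        # Human-entered query text, recorded as `raw_query` in the history.
        headers["x-hevlayer-search-query"] = raw_query
    if tags:
        for tag in tags:
            # Commas always separate tags, so one inside a tag would split it.
            if "," in tag:
                raise ValueError(f"tag contains a comma: {tag!r}")
        headers["x-hevlayer-tags"] = ",".join(tags)
    return headers


print(metadata_headers("wireless headphones", ["app:hev-shop", "route:search"]))
```

Merge the returned dictionary into whatever headers your HTTP client already sends with the query request.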

Tag contract

Layer splits `x-hevlayer-tags` and `?tag=` on commas, trims whitespace, drops empty values, then sorts and dedupes the tags before storing or matching them. Commas are always separators and cannot be escaped.

Limits:

Limit	Value
Max tags	32 unique tags per request or filter
Max tag length	128 bytes
Allowed characters	ASCII letters, digits, `:`, `_`, `-`, `.`, `/`, `=`, `+`

The list filter uses AND semantics: `?tag=a,b` returns only entries that carry both `a` and `b`.
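The normalization steps, limits, and AND matching can be mirrored on the client to catch bad tags before a request is sent. This is a sketch under assumptions, not Layer's implementation: the function names are hypothetical, raising `ValueError` on a limit violation is a choice made here (how the server responds is not specified above), and the sort uses plain code-point order.

```python
import re

MAX_TAGS = 32        # unique tags per request or filter
MAX_TAG_BYTES = 128  # maximum length of one tag
ALLOWED = re.compile(r"[A-Za-z0-9:_\-./=+]+")


def normalize_tags(raw):
    """Split on commas, trim, drop empties, dedupe, and sort."""
    parts = [part.strip() for part in raw.split(",")]
    tags = sorted({part for part in parts if part})
    if len(tags) > MAX_TAGS:
        raise ValueError(f"{len(tags)} unique tags exceeds {MAX_TAGS}")
    for tag in tags:
        if len(tag.encode("utf-8")) > MAX_TAG_BYTES:
            raise ValueError(f"tag longer than {MAX_TAG_BYTES} bytes: {tag!r}")
        if not ALLOWED.fullmatch(tag):
            raise ValueError(f"tag has disallowed characters: {tag!r}")
    return tags


def matches_all(filter_tags, entry_tags):
    """AND semantics: every filter tag must be present on the entry."""
    return set(filter_tags) <= set(entry_tags)


stored = normalize_tags(" route:search, app:hev-shop,,app:hev-shop ")
print(stored)
print(matches_all(normalize_tags("app:hev-shop,route:search"), stored))
```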

Query parameters

Param	Purpose
`tag`	Comma-separated tag filter. AND semantics — every tag must match.
`from` / `to`	RFC3339 time bounds.
`before`	Pagination cursor; return entries strictly older than the given `timestamp_nanos`.
`limit`	Maximum number of entries to return. Defaults to 50; capped at 500.
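These parameters combine into a simple pagination loop. The sketch below assumes that the `next_cursor` value from a response is passed back as `before` on the next request, and that a missing or empty `next_cursor` means there are no more pages; neither behavior is spelled out above. `fetch_page` is a hypothetical callable that takes a path and returns the decoded JSON body, so the sketch stays independent of any HTTP client.

```python
from urllib.parse import quote, urlencode

MAX_LIMIT = 500  # documented cap on `limit`


def history_path(namespace, tags=(), before=None, limit=50):
    """Build the search-history path and query string for one page."""
    params = {"limit": min(limit, MAX_LIMIT)}
    if tags:
        params["tag"] = ",".join(tags)  # AND filter, comma-separated
    if before is not None:
        params["before"] = before       # entries strictly older than this
    ns = quote(namespace, safe="")
    return f"/v2/namespaces/{ns}/search-history?{urlencode(params)}"


def iter_history(fetch_page, namespace, tags=(), limit=50):
    """Yield entries newest first, following cursors until exhausted."""
    cursor = None
    while True:
        page = fetch_page(history_path(namespace, tags, cursor, limit))
        yield from page.get("entries", [])
        cursor = page.get("next_cursor")
        if not cursor:  # assumption: no cursor means no further pages
            return


print(history_path("products", tags=["app:hev-shop", "route:search"], limit=20))
```

Because `iter_history` is a generator, a caller can stop early (for example, after the first entry older than some cutoff) without fetching the remaining pages.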

Clickstream entry

{
  "events": [
    {
      "timestamp": "2025-05-22T08:00:02.143Z",
      "timestamp_nanos": 1747900802143000000,
      "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
      "namespace": "products",
      "doc_id": "asin-B08N5WRWNW",
      "tags": ["session:abc123"],
      "source": "fetch",
      "served_from": "cache"
    }
  ],
  "next_cursor": "1747900802142000000"
}

`trace_id` joins to the search-history entry that produced the result; `served_from` distinguishes a cache hit from an upstream fetch. `trace_id` is also a supported query parameter, so you can pull every event for a single search session.
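The `trace_id` join can be done client-side once both feeds have been fetched. The following sketch pairs each history entry with its fetch events and looks up the rank of each fetched document in `top_result_ids`; the function name and the output shape are illustrative choices, not part of the API.

```python
from collections import defaultdict


def join_clicks(entries, events):
    """Attach clickstream events to history entries by trace_id."""
    by_trace = defaultdict(list)
    for event in events:
        by_trace[event["trace_id"]].append(event)

    sessions = []
    for entry in entries:
        # 1-based rank of each served document, for click-position analysis.
        ranks = {
            doc_id: position
            for position, doc_id in enumerate(entry.get("top_result_ids", []), 1)
        }
        clicks = by_trace.get(entry["trace_id"], [])
        sessions.append({
            "trace_id": entry["trace_id"],
            "raw_query": entry.get("raw_query"),  # absent if header not sent
            # Rank is None when the fetched doc was not in the served results.
            "clicked": [(c["doc_id"], ranks.get(c["doc_id"])) for c in clicks],
        })
    return sessions


entries = [{
    "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "raw_query": "wireless headphones",
    "top_result_ids": ["asin-B08N5WRWNW", "asin-B07PXGQC1Q"],
}]
events = [{
    "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "doc_id": "asin-B08N5WRWNW",
}]
print(join_clicks(entries, events))
```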

Storage

search-history/{namespace}/{YYYY-MM-DD}/{timestamp_nanos}.jsonl

Writes are best-effort and never block the query response. Aerospike serves as the cache and holds a recent window for fast reads; S3 is the durable store. A cache outage degrades read latency but not durability: list calls walk the S3 prefix and merge the results inline.
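To make the key layout and the read-side merge concrete, here is a small sketch. It assumes the `{YYYY-MM-DD}` segment is the UTC date of `timestamp_nanos`, and that entries can be deduplicated on the pair (`timestamp_nanos`, `trace_id`); both are assumptions, and the function names are hypothetical rather than Layer internals.

```python
from datetime import datetime, timezone


def history_object_key(namespace, timestamp_nanos):
    """Build the object key for one history record (UTC date assumed)."""
    seconds = timestamp_nanos // 1_000_000_000
    day = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"search-history/{namespace}/{day}/{timestamp_nanos}.jsonl"


def merge_newest_first(cached, durable):
    """Merge cache and S3 entries, dropping duplicates, newest first."""
    merged = {}
    for entry in list(durable) + list(cached):
        # The same record may appear in both stores; keep a single copy.
        merged[(entry["timestamp_nanos"], entry.get("trace_id"))] = entry
    return sorted(
        merged.values(), key=lambda e: e["timestamp_nanos"], reverse=True
    )


print(history_object_key("products", 1747900800000000000))
```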