Query History
Layer logs every query the gateway serves to a durable JSONL trail in S3 and mirrors recent entries into the NVMe cache for fast reads. Fetch events that downstream consumers tag with a query's trace_id land in a sibling clickstream feed. Together, the two feeds let you reconstruct a search session after the fact for relevance tuning, A/B comparison, or incident review.
Both surfaces are Layer-only.
Routes
| Route | Behavior |
|---|---|
| GET /v2/namespaces/{ns}/search-history | Per-namespace query log, newest first. |
| GET /v2/namespaces/{ns}/clickstream | Fetch events correlated to a search, newest first. |
The /v1/ versions of both routes are identical aliases kept for client
compatibility.
Search history entry
{
  "entries": [
    {
      "timestamp": "2025-05-22T08:00:00.000Z",
      "timestamp_nanos": 1747900800000000000,
      "namespace": "products",
      "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
      "raw_query": "wireless headphones",
      "stable_as_of": 1747900700000,
      "query": {"vector": "[…]", "top_k": 10, "filters": "[…]"},
      "top_result_ids": ["asin-B08N5WRWNW", "asin-B07PXGQC1Q"],
      "tags": ["app:hev-shop", "route:search", "surface:storefront"]
    }
  ],
  "next_cursor": "1747900799000000000"
}
| Field | Meaning |
|---|---|
| timestamp / timestamp_nanos | Wall-clock and nanosecond timestamps. timestamp_nanos is the pagination cursor. |
| trace_id | Trace context propagated or generated for the query. Joins to the clickstream feed. |
| raw_query | Caller-supplied query string from the x-hevlayer-search-query header (e.g. the BM25 input). Omitted when the header is absent. |
| stable_as_of | Epoch-ms namespace watermark used by the served response. Omitted on cold-start gateways before the namespace has a watermark. |
| query | Structured query summary: vector shape, filters, ranking. |
| top_result_ids | IDs from the served response, in rank order. |
| tags | Caller-supplied labels propagated through request headers. Used for ad-hoc segmentation. |
Writing metadata
Set x-hevlayer-search-query on query requests to capture the human
input, and set x-hevlayer-tags to a comma-separated list of
segmentation tags. The Python SDK exposes these as raw_query and
tags:
# Run a query and record the human input plus segmentation tags with it.
query = await client.query_namespace(
    "products",
    {"vector": embedding, "top_k": 10, "include_attributes": ["title"]},
    raw_query="wireless headphones",
    tags=["app:hev-shop", "surface:storefront", "route:search", "page:first"],
)
# List recent history entries that carry all three tags.
history = await client.list_search_history(
    "products",
    tags=["app:hev-shop", "route:search", "page:first"],
    limit=20,
)
Keep the query text in raw_query; use tags for segmentation, not for
duplicating the query string.
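For callers that do not use the SDK, the same metadata can be attached as plain request headers. The following is a minimal sketch, not the SDK's implementation: build_metadata_headers is a hypothetical helper, and only the two header names come from this page. It rejects tags that contain a comma, since the tags header is a comma-separated list.

```python
from typing import Optional, Sequence


def build_metadata_headers(
    raw_query: Optional[str] = None,
    tags: Sequence[str] = (),
) -> dict[str, str]:
    """Build the optional search-metadata headers for a query request.

    Hypothetical helper: only the header names are taken from the docs.
    """
    headers: dict[str, str] = {}
    if raw_query:
        # The human-entered query text travels in its own header, not in a tag.
        headers["x-hevlayer-search-query"] = raw_query
    if tags:
        for tag in tags:
            if "," in tag:
                # A comma would split the tag into two when the header is parsed.
                raise ValueError(f"tag contains a comma: {tag!r}")
        headers["x-hevlayer-tags"] = ",".join(tags)
    return headers
```

Merge the returned dict into the headers of whatever HTTP client sends the query request.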
Tag contract
Layer splits x-hevlayer-tags and ?tag= on commas, trims whitespace,
drops empty values, then sorts and dedupes tags before storing or
matching them. Commas are separators and cannot be escaped.
Limits:
| Limit | Value |
|---|---|
| Max tags | 32 unique tags per request or filter |
| Max tag length | 128 bytes |
| Allowed characters | ASCII letters, digits, :, _, -, ., /, =, + |
The list filter uses AND semantics: ?tag=a,b returns only entries that
carry both a and b.
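The normalization and matching rules above can be mirrored client-side, for example to validate tags before sending them. This is a sketch under stated assumptions, not Layer's code: normalize_tags and matches_filter are hypothetical names, raising ValueError on a limit violation is an assumption (this page does not say how Layer responds), and sorting uses plain string order.

```python
import re

MAX_TAGS = 32        # unique tags per request or filter
MAX_TAG_BYTES = 128  # per-tag length limit
# ASCII letters, digits, and : _ - . / = +
_ALLOWED = re.compile(r"[A-Za-z0-9:_\-./=+]+")


def normalize_tags(raw: str) -> list[str]:
    """Split on commas, trim, drop empties, dedupe, and sort."""
    tags = sorted({part.strip() for part in raw.split(",")} - {""})
    if len(tags) > MAX_TAGS:
        raise ValueError(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for tag in tags:
        if not _ALLOWED.fullmatch(tag):
            raise ValueError(f"tag has disallowed characters: {tag!r}")
        if len(tag.encode("utf-8")) > MAX_TAG_BYTES:
            raise ValueError(f"tag longer than {MAX_TAG_BYTES} bytes: {tag!r}")
    return tags


def matches_filter(entry_tags: list[str], filter_raw: str) -> bool:
    """AND semantics: every tag in the filter must be present on the entry."""
    return set(normalize_tags(filter_raw)) <= set(entry_tags)
```

Because tags are sorted and deduplicated before matching, the order in which a caller lists tags in a header or filter has no effect on the result.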
Query parameters
| Param | Purpose |
|---|---|
| tag | Comma-separated tag filter. AND semantics: every tag must match. |
| from / to | RFC3339 time bounds. |
| before | Pagination cursor; return entries strictly older than the given timestamp_nanos. |
| limit | Maximum number of entries to return. Default 50, capped at 500. |
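The before cursor and the next_cursor field compose into a simple paging loop. The sketch below assumes each response has the entries / next_cursor shape shown earlier and that next_cursor is missing or empty on the last page; this page does not state the end-of-list signal, so treat that as an assumption. history_params and iter_history are hypothetical helpers, and fetch_page stands in for whatever HTTP client performs the GET.

```python
from typing import Any, Callable, Iterator, Optional, Sequence
from urllib.parse import urlencode

Page = dict[str, Any]


def history_params(
    tags: Sequence[str] = (),
    before: Optional[str] = None,
    limit: int = 50,
) -> str:
    """Encode search-history query parameters as a URL query string."""
    # Clamp to the documented cap of 500.
    params: dict[str, str] = {"limit": str(max(1, min(limit, 500)))}
    if tags:
        params["tag"] = ",".join(tags)
    if before is not None:
        params["before"] = before
    return urlencode(params)


def iter_history(
    fetch_page: Callable[[str], Page],
    tags: Sequence[str] = (),
    limit: int = 50,
) -> Iterator[dict[str, Any]]:
    """Yield entries newest first, following next_cursor until it runs out."""
    before: Optional[str] = None
    while True:
        page = fetch_page(history_params(tags=tags, before=before, limit=limit))
        yield from page.get("entries", [])
        before = page.get("next_cursor")
        if not before:  # assumed end-of-list signal
            return
```

fetch_page receives the encoded query string and returns the decoded JSON body, which keeps the loop independent of any particular HTTP library.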
Clickstream entry
{
  "events": [
    {
      "timestamp": "2025-05-22T08:00:02.143Z",
      "timestamp_nanos": 1747900802143000000,
      "trace_id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
      "namespace": "products",
      "doc_id": "asin-B08N5WRWNW",
      "tags": ["session:abc123"],
      "source": "fetch",
      "served_from": "cache"
    }
  ],
  "next_cursor": "1747900802142000000"
}
trace_id joins to the search-history entry that produced the result;
served_from distinguishes a cache hit from an upstream fetch.
trace_id is also a supported query parameter so you can pull every
event for a single search session.
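The trace_id join can be done client-side once both feeds have been fetched. The following is one possible sketch: join_clicks is a hypothetical helper that groups clickstream events under the search-history entry with the same trace_id and records where each fetched document ranked in top_result_ids. It assumes the field names shown in the two example payloads.

```python
from collections import defaultdict
from typing import Any


def join_clicks(
    entries: list[dict[str, Any]],
    events: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Attach clickstream events to the search-history entry sharing their trace_id."""
    by_trace: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for event in events:
        by_trace[event["trace_id"]].append(event)

    sessions = []
    for entry in entries:
        ranked = entry.get("top_result_ids", [])
        clicks = []
        # Oldest fetch first, so clicks read in the order they happened.
        for event in sorted(by_trace.get(entry["trace_id"], []),
                            key=lambda e: e["timestamp_nanos"]):
            doc_id = event["doc_id"]
            # 1-based rank in the served results, or None if the doc was not served.
            rank = ranked.index(doc_id) + 1 if doc_id in ranked else None
            clicks.append({"doc_id": doc_id, "rank": rank,
                           "served_from": event.get("served_from")})
        sessions.append({
            "trace_id": entry["trace_id"],
            "raw_query": entry.get("raw_query"),  # omitted when the header was absent
            "clicks": clicks,
        })
    return sessions
```

The rank of each fetched document is the kind of signal relevance tuning typically starts from, for example the share of sessions whose first fetch was the top-ranked result.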
Storage
search-history/{namespace}/{YYYY-MM-DD}/{timestamp_nanos}.jsonl
Writes are best-effort and never block the query response. Aerospike holds a recent window for fast reads; S3 is the durable store. A cache outage degrades read latency but not durability: list calls fall back to walking the S3 prefix and merging the results inline.
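To locate a given entry in S3, the key layout above can be reproduced from an entry's namespace and timestamp_nanos. This is a minimal sketch with one explicit assumption: the YYYY-MM-DD partition is taken to be the UTC date of the timestamp, which this page does not confirm. history_object_key is a hypothetical helper.

```python
from datetime import datetime, timezone


def history_object_key(namespace: str, timestamp_nanos: int) -> str:
    """Build the object key for a search-history entry.

    Assumes the date partition is the UTC calendar date of the entry.
    """
    seconds = timestamp_nanos // 1_000_000_000
    day = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"search-history/{namespace}/{day}/{timestamp_nanos}.jsonl"
```

Dropping the final path segment gives the per-day prefix to list when scanning a whole day of history.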