ADR-508: Configurable Search Similarity Threshold
Context
Semantic search returns noisy results: nonsense queries match real concepts.
Empirically, kg search query "asdfgh qwerty zxcvb" (random characters) returns
concepts, and every query — relevant or not — tops out around the same score.
Root cause (measured)
The active embedding profile is nomic-ai/nomic-embed-text-v1.5 (768-dim,
local provider; kg_api.embedding_profile, ADR-804). Measuring cosine similarity
directly against the live model:
| Query kind | cosine (max) |
|---|---|
| Gibberish vs domain concepts | ~0.47–0.51 |
| Relevant query ("application security") vs domain concepts | ~0.60–0.65 |
The scoring path is correct — proper cosine over L2-normalized vectors
(api/app/lib/age_client/query.py), no rescaling. The high floor is inherent to
this embedding model: nomic v1.5 has a compressed cosine distribution where even
out-of-distribution text sits at ~0.5.
The defect is therefore calibration, not computation: the clients disagree on the threshold and one of them sits below the model's noise floor.
| Client | default min_similarity |
effect on nomic v1.5 |
|---|---|---|
API model (models/queries.py) |
0.7 | too high — relevant queries return 0 |
CLI (kg search) |
0.7 | same |
FUSE (kg-fuse, ADR-715.1) |
0.5 | too low — below the ~0.5 gibberish floor, so noise passes |
The "right" threshold (~0.55–0.60 for this model) is model- and corpus-dependent. Hardcoding it in three clients is what produced the split-brain behavior. It belongs in configuration.
Related finding (deferred, not this ADR's fix)
The nomic task prefixes (search_query: / search_document:) are configured in the
profile but never applied: _embed_sentence_transformers
(api/app/lib/embedding_model_manager.py) gates on the configured strings, then
applies them via sentence-transformers prompt_name — but the model's named prompts
are empty strings, so prompt_name="query" is a no-op (verified: prefixed and
unprefixed embeddings are identical). Applying the prefixes correctly requires
re-embedding the corpus (query and document spaces must match). Measured on
real-domain text, correct prefixes were neutral-to-slightly-worse for separation,
so fixing this does not address the noise problem and is out of scope here. See
Phase 2 / a future embedding-correctness ADR.
Decision
Introduce a single server-side, runtime-configurable default similarity threshold,
stored in the existing kg_api.platform_config key-value table (migration 031),
and have clients inherit it rather than hardcode their own.
-
Storage. Seed
search_default_similarity_threshold(default0.6, chosen from the measurements above) via a new migration, using the existingset_platform_config/get_platform_confighelpers. Plaintext (non-secret), consistent with otherplatform_configentries. -
API.
SearchRequest.min_similaritybecomesOptional[float](defaultNone). When a request omits it,/query/searchreads the configured default fromplatform_config(falling back to0.6if unset). An admin-gated endpoint (require_permission(...), per theadmin.pyconvention) exposes GET/PUT of the value. -
Clients inherit. CLI (
kg search) and FUSE both change their default from a hardcoded number to "unset", so an omitted flag inherits the server default. This supersedes the FUSE0.5constant (ADR-715.1) and the CLI/API0.7. Explicit--min-similarity/.meta/thresholdstill override per query. -
Surfaces. The value is settable via
operator/configure.py platform-config(already supports it), the new admin API,kg admin search-threshold, and a web settings surface. -
Floor the FUSE auto-adjust (supersedes part of ADR-715.1). ADR-715.1's
mkdirauto-adjust lowered a zero-result query's threshold to "show files". Once the default is calibrated (and clients inherit it), that auto-lowering works against noise filtering: a gibberish query that returns 0 at the default would be dropped below it and resurface noise. Auto-adjust is therefore floored at the server default — it must never lower below it. Because a freshly created FUSE query inherits exactly that default, there is nothing above it to surface at creation, so themkdirprobe now no-ops for inherited queries (the common case) — effectively retiring the auto-lower behavior while the calibrated default does the job. The machinery is retained for a possible future path that creates queries with an explicit, over-tight threshold.
Phased delivery: Phase 1 = schema + API + CLI + FUSE + web (this ADR). Phase 2 = the prefix-application fix + corpus re-embed, gated on a real before/after separation eval (separate ADR).
Consequences
Positive
- One place to tune search precision per deployment; ends the split-brain 0.5/0.7 defaults.
- Operators can raise/lower the floor to match their embedding model and corpus without a redeploy.
- Noise filtering is consistent across surfaces: CLI/API and FUSE all apply the same calibrated default, and FUSE no longer auto-lowers below it (the floor), so gibberish no longer resurfaces on the FUSE surface.
- No new tables or mechanisms — reuses
platform_config,configure.py, RBAC admin endpoints.
Negative
min_similaritybecoming optional touches the API model and every client default; care needed so scripts passing explicit values are unaffected.- A per-deployment default can be mis-set (too high → empty results; too low → noise). Mitigated by the FUSE auto-adjust hint and a sane seed.
Neutral
- Does not change the embedding model or scoring math. The nomic floor remains; the threshold is calibrated around it.
- The prefix no-op bug remains until Phase 2.
Alternatives Considered
- Hardcode a higher default (~0.6) in each client. Rejected: perpetuates split-brain defaults and can't be tuned per deployment/model.
- Fix the nomic prefixes + re-embed as the primary fix. Rejected as the primary lever: measured separation was neutral-to-worse; it's a correctness improvement, not the noise fix. Deferred to Phase 2, evidence-gated.
- Switch to a wider-range embedding model (e.g. the inactive
text-embedding-3-smallprofile). Viable but orthogonal and heavier (external provider, re-embed); the config surface makes the threshold tunable regardless of model, and model choice stays a separateembedding_profiledecision (ADR-804). - Per-ontology thresholds. Deferred: start with one global default; the key-value store can grow scoped keys later if needed.