Long search queries need a keyword fallback

A search box usually receives two different kinds of input.

The first kind is a label: "privacy crop", "Claude GPT", "pickup proof". Exact matching works reasonably well for that. The phrase is short, the order matters, and the user probably expects a small set of records that use the same words.

The second kind is closer to a sentence: "delivery apps prioritize proof privacy", "which model should we use for code review", "receipt photo not enough for warranty transfer". People type this when they remember the shape of the record but not the exact title.

If the search system treats the whole sentence as one exact phrase, it can miss the right record even when all the useful words are present. This is especially frustrating on small knowledge sites: the user knows the topic exists somewhere, yet the search box behaves as if it does not.

A keyword fallback is a second pass that runs when exact phrase matching finds nothing or too few results. It splits the query into meaningful words, removes weak glue words, and looks for records where several useful words appear across the title, tags, summary, body, side labels, or description.
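
The two passes can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the stop-word list, the field names, and the `enough` and `min_hits` thresholds are all assumptions, and records are assumed to be plain dicts whose fields are strings.

```python
import re

# Hypothetical glue-word list; a real site would tune this to its own content.
STOP_WORDS = {"a", "an", "and", "the", "for", "to", "of", "in", "on", "is",
              "we", "should", "which", "not", "with", "use"}

# Assumed field names; tags are treated as one space-separated string.
SEARCHED_FIELDS = ("title", "tags", "summary", "body")


def tokenize(text):
    """Lowercase, split on non-word characters, drop glue and one-letter words."""
    found = re.findall(r"\w+", text.lower())
    return [w for w in found if w not in STOP_WORDS and len(w) > 1]


def search(query, records, enough=1, min_hits=2):
    """Exact phrase pass first; keyword pass only if it returned too little."""
    phrase = query.lower().strip()
    if not phrase:
        return []
    exact = [r for r in records
             if any(phrase in r.get(f, "").lower() for f in SEARCHED_FIELDS)]
    if len(exact) >= enough:
        return exact

    terms = set(tokenize(query))
    if not terms:
        return exact
    seen = {id(r) for r in exact}
    needed = min(min_hits, len(terms))
    scored = []
    for r in records:
        if id(r) in seen:
            continue
        record_words = set()
        for f in SEARCHED_FIELDS:
            record_words.update(tokenize(r.get(f, "")))
        hits = len(terms & record_words)
        if hits >= needed:
            scored.append((hits, r))
    # Records matching more query words come first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return exact + [r for _, r in scored]
```

Requiring at least two matching words (`min_hits`) is one way to keep a single common word from pulling in unrelated records. The right threshold depends on the size of the collection.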

The fallback should not replace exact phrase search. Exact phrase matching is still valuable when someone pastes a title, slug, error line, or quoted sentence. The fallback is there for the more common human behavior: half-remembering the issue and typing the words that stuck.

Useful behavior:

- rank exact title and slug matches first
- then rank records that contain multiple query words
- give title and tag hits more weight than body hits
- include summaries and short descriptions because they often contain the plain-language version
- keep result snippets honest about why the record matched
- avoid treating tiny words as strong signals
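
One way to express this list as a scoring function is sketched below. The weights, the exact-match bonus, the minimum term length, and the `slug` and `description` field names are assumptions chosen for illustration. The `reasons` value records which words matched in which field, so a snippet can show why a record appeared.

```python
import re

# Hypothetical weights: title and tag hits count for more than body hits.
FIELD_WEIGHTS = {"title": 5.0, "tags": 4.0, "summary": 2.0,
                 "description": 2.0, "body": 1.0}
EXACT_BONUS = 100.0   # large enough to keep exact title/slug matches on top
MIN_TERM_LENGTH = 3   # shorter words are ignored as signals


def terms_of(text):
    """Set of lowercase words that are long enough to count."""
    return {w for w in re.findall(r"\w+", text.lower())
            if len(w) >= MIN_TERM_LENGTH}


def slugify(text):
    """Assumed slug convention: lowercase words joined by hyphens."""
    return "-".join(re.findall(r"\w+", text.lower()))


def score(query, record):
    """Return (score, reasons); reasons maps each matched field to its words."""
    total = 0.0
    reasons = {}
    title = record.get("title", "")
    slug = record.get("slug", "")
    cleaned = query.strip().lower()
    if cleaned and (cleaned == title.lower() or (slug and slugify(query) == slug)):
        total += EXACT_BONUS
        reasons["exact"] = ["title or slug"]
    query_terms = terms_of(query)
    for field, weight in FIELD_WEIGHTS.items():
        matched = sorted(query_terms & terms_of(record.get(field, "")))
        if matched:
            total += weight * len(matched)
            reasons[field] = matched
    return total, reasons


def rank(query, records):
    """Score every record, drop non-matches, and sort best first."""
    results = []
    for r in records:
        total, reasons = score(query, r)
        if total > 0:
            results.append({"record": r, "score": total, "reasons": reasons})
    results.sort(key=lambda item: item["score"], reverse=True)
    return results
```

Keeping `reasons` next to the score means the snippet can be built from the fields that actually matched instead of from a generic excerpt.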

Edge cases:

- A two-word query may not need much fallback.
- A long copied error message may need exact substring handling before token splitting.
- Regional-language search may need script-aware token rules.
- Very broad words like "proof" or "delivery" should not swamp newer, more precise records.
- Arena and debate records need side labels and descriptions searched, not only titles.
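
For the broad-word case, one common option is to weight each word by how rare it is in the collection (inverse document frequency). The sketch below is only one possible approach, and the single `text` field and the function names are assumptions made for illustration.

```python
import math
import re


def words(text):
    """Set of lowercase words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))


def idf_weights(records):
    """Map each word to log(1 + N / df), so rarer words get larger weights."""
    n = len(records)
    doc_freq = {}
    for r in records:
        for w in words(r["text"]):
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return {w: math.log(1 + n / df) for w, df in doc_freq.items()}


def weighted_overlap(query, record, weights):
    """Sum the weights of query words that also appear in the record."""
    shared = words(query) & words(record["text"])
    return sum(weights.get(w, 0.0) for w in shared)
```

With weights like these, a word that appears in nearly every record contributes little, so a record that also matches a rare word such as "warranty" outranks one that matches only "proof".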

The practical test is simple: if a person types five or six useful words from a remembered record, at least one canonical result should show up. It does not have to be first every time, but it should not disappear just because the words are not adjacent.
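
This check is easy to automate as a small regression helper. In the sketch below, `check_recall` is a hypothetical helper, and the two search functions are toy stand-ins for a real search backend, included only to show the difference the fallback makes.

```python
def check_recall(search_fn, cases, top_n=10):
    """Return the (query, expected_id) cases missing from the top_n results."""
    failures = []
    for query, expected_id in cases:
        result_ids = [r["id"] for r in search_fn(query)[:top_n]]
        if expected_id not in result_ids:
            failures.append((query, expected_id))
    return failures


# Toy data and stand-in search functions, for illustration only.
RECORDS = [
    {"id": "warranty-transfer",
     "text": "A receipt photo alone is not enough proof for a warranty transfer"},
]


def phrase_only(query):
    """Matches only when the whole query appears as one adjacent phrase."""
    return [r for r in RECORDS if query.lower() in r["text"].lower()]


def any_order(query):
    """Matches when at least three query words appear anywhere in the text."""
    wanted = set(query.lower().split())
    return [r for r in RECORDS
            if len(wanted & set(r["text"].lower().split())) >= 3]


CASES = [("receipt photo not enough for warranty transfer", "warranty-transfer")]
```

Running `check_recall(phrase_only, CASES)` reports the case as a failure because the words are not adjacent in the record, while `check_recall(any_order, CASES)` returns an empty list.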
