Search functions
These functions are used in conjunction with the @@ operator (the 'matches' operator) to either collect the relevance score or highlight the searched keywords within the content.
| Function | Description |
|---|---|
| search::analyze() | Returns the output of a defined search analyzer |
| search::highlight() | Highlights the matching keywords |
| search::linear() | Performs weighted linear search |
| search::offsets() | Returns the position of the matching keywords |
| search::rrf() | Performs RRF (reciprocal rank fusion) search |
| search::score() | Returns the relevance score |
Note
The examples below assume the following queries:
search::analyze
The search::analyze function returns the output of a defined search analyzer on an input string.
First, define the analyzer using the DEFINE ANALYZER statement.
Next, pass the analyzer to the search::analyze function. The following example shows this function, and its output, when used in a RETURN statement:
search::highlight
The search::highlight function highlights the matching keywords for the predicate reference number.
The following example shows this function, and its output, when used in a RETURN statement:
The optional Boolean parameter can be set to true to explicitly request that the whole found term be highlighted,
or set to false to highlight only the sequence of characters we are looking for. This must be used with an edgengram or ngram filter.
The default value is true.
search::linear
Notes on the arguments and output of this function:
Input:
lists: array of result arrays. Each inner array must be pre‑sorted most‑relevant‑first (BM25 score descending, distance ascending already inverted, etc.).
weights: array of numeric weights, one per result list (must have the same length as lists).
limit: maximum number of documents to return (must be ≥ 1).
norm: normalization method, either "minmax" for MinMax normalization or "zscore" for Z-score normalization.
Processing:
Computes the union of all candidate ids.
The function automatically extracts scores from documents using the following priority:
distance field: converted using 1.0 / (1.0 + distance) (lower distance = higher score)
ft_score field: used directly (full-text search scores)
score field: used directly (generic scores)
Rank-based fallback: 1.0 / (1.0 + rank) if no score field is found
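The priority order above can be sketched in Python. This is an illustration of the described rules, not the database's implementation: the helper name `extract_score` is hypothetical, and treating `rank` as 0-based is an assumption, since the text does not say where ranks start.

```python
from typing import Any, Mapping


def extract_score(row: Mapping[str, Any], rank: int) -> float:
    """Return a 'higher is better' score for one result row.

    `rank` is the row's position in its pre-sorted list
    (0-based here, which is an assumption).
    """
    # 1. A distance is inverted so that a smaller distance scores higher.
    if row.get("distance") is not None:
        return 1.0 / (1.0 + float(row["distance"]))
    # 2. A full-text score is used as-is.
    if row.get("ft_score") is not None:
        return float(row["ft_score"])
    # 3. A generic score is used as-is.
    if row.get("score") is not None:
        return float(row["score"])
    # 4. Otherwise fall back to the row's position in the list.
    return 1.0 / (1.0 + rank)
```

A row carrying both `distance` and `ft_score` is scored from `distance`, because that field is checked first.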
Normalization Methods:
MinMax: scales scores to the [0, 1] range using (score - min) / (max - min)
Z-score: standardizes scores using (score - mean) / std_dev
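The two formulas can be written out as a short Python sketch. Two choices here are assumptions that the text does not settle: the standard deviation is the population one (`statistics.pstdev`), and a list whose scores are all equal is mapped to zeros to avoid dividing by zero.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def minmax(scores: Sequence[float]) -> List[float]:
    """Scale scores to [0, 1] with (score - min) / (max - min)."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: identical scores normalize to 0.0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def zscore(scores: Sequence[float]) -> List[float]:
    """Standardize scores with (score - mean) / std_dev."""
    if not scores:
        return []
    mean = fmean(scores)
    std = pstdev(scores)  # Assumption: population standard deviation.
    if std == 0:
        return [0.0 for _ in scores]
    return [(s - mean) / std for s in scores]
```

MinMax output is bounded, which makes weights easier to reason about. Z-score output is unbounded and can be negative, but a single extreme score distorts it less.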
When merging field data from the per‑list rows, keeps the first non‑null value encountered in the order the lists were supplied, or the last one if there are several fields with the same key.
Sorts by linear_score descending and truncates to limit.
Output:
Array of merged result objects, each containing original fields and an added linear_score.
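Putting the steps together, the whole procedure might look like the following Python sketch. It is a reading of the notes above rather than the actual implementation. Normalizing each list separately before weighting, using 0-based ranks in the fallback, keying rows on an `id` field, and the function name `linear_fuse` are all assumptions.

```python
from statistics import fmean, pstdev


def _raw_score(row, rank):
    # Priority: distance (inverted), then ft_score, then score, then rank.
    for key in ("distance", "ft_score", "score"):
        value = row.get(key)
        if value is not None:
            return 1.0 / (1.0 + value) if key == "distance" else float(value)
    return 1.0 / (1.0 + rank)  # Assumption: 0-based rank.


def _normalize(scores, norm):
    if norm == "minmax":
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]
    if norm == "zscore":
        mean, std = fmean(scores), pstdev(scores)
        return [(s - mean) / std if std > 0 else 0.0 for s in scores]
    raise ValueError(f"unknown normalization method: {norm!r}")


def linear_fuse(lists, weights, limit, norm="minmax"):
    """Weighted linear fusion of several pre-sorted result lists."""
    if len(lists) != len(weights):
        raise ValueError("weights must have the same length as lists")
    if limit < 1:
        raise ValueError("limit must be >= 1")

    totals = {}  # id -> weighted sum of normalized scores
    merged = {}  # id -> merged fields
    for rows, weight in zip(lists, weights):
        if not rows:
            continue
        raw = [_raw_score(row, rank) for rank, row in enumerate(rows)]
        # Assumption: each list is normalized on its own before weighting.
        for row, score in zip(rows, _normalize(raw, norm)):
            key = row["id"]
            totals[key] = totals.get(key, 0.0) + weight * score
            target = merged.setdefault(key, {})
            for field, value in row.items():
                # Keep the first non-null value in list order.
                if target.get(field) is None:
                    target[field] = value

    ranked = sorted(totals, key=totals.get, reverse=True)[:limit]
    return [{**merged[key], "linear_score": totals[key]} for key in ranked]
```

For example, fusing a full-text list `[{"id": "a", "ft_score": 3.0}, {"id": "b", "ft_score": 1.0}]` with a vector list `[{"id": "b", "distance": 0.0}, {"id": "c", "distance": 1.0}]` under weights `[2.0, 1.0]` ranks `a` ahead of `b`, because the full-text list carries twice the weight.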
Output of the final search::linear() queries:
search::offsets
The search::offsets function returns the position of the matching keywords for the predicate reference number.
The following example shows this function, and its output, when used in a RETURN statement:
The output returns the start s and end e positions of each matched term found within the original field.
The full-text index is capable of indexing both single strings and arrays of strings. In this example, the key 0 indicates that we're highlighting the first string within the title field, which contains an array of strings.
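As an illustration of how such an offsets object could be consumed on the client side, the Python sketch below slices the matched terms back out of the field. It assumes that `s` is inclusive, that `e` is exclusive, and that both count characters rather than bytes. None of this is stated above, so check it against real output before relying on it. The helper name `matched_terms` is hypothetical.

```python
from typing import Dict, List, Mapping, Sequence


def matched_terms(
    values: Sequence[str],
    offsets: Mapping[int, Sequence[Dict[str, int]]],
) -> List[str]:
    """Return the substrings that the offsets point at.

    `values` holds the strings of the indexed field (a single string
    is passed as a one-element list); `offsets` maps the index of a
    string to its list of {"s": start, "e": end} spans.
    """
    found = []
    for index, spans in offsets.items():
        text = values[int(index)]
        for span in spans:
            # Assumption: end position is exclusive, as in Python slicing.
            found.append(text[span["s"]:span["e"]])
    return found
```

With `values = ["Rust Web Programming"]` and `offsets = {0: [{"s": 0, "e": 4}, {"s": 5, "e": 8}]}`, the helper returns the two matched words from the first (and only) string.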
The optional boolean parameter can be set to true to explicitly request that the whole found term be highlighted,
or set to false to highlight only the sequence of characters we are looking for. This must be used with an edgengram or ngram filter.
The default value is true.
search::rrf
Notes on the arguments and output of this function:
Input:
lists: array of result arrays. Each inner array must be pre‑sorted most‑relevant‑first (BM25 score descending, distance ascending already inverted, etc.).
limit: maximum number of fused results to return.
k (optional): RRF constant; defaults to 60.
See this paper for why 60 tends to be the default k value:
Our intuition in choosing this formula derived from fact that while highly-ranked documents are more important, the importance of lower-ranked documents does not vanish as it would were, say, an exponential function used. The constant k mitigates the impact of high rankings by outlier systems.
Processing:
Computes the union of all candidate ids.
For each candidate, derives its rank in each list and computes rrf_score = Σ 1/(k + rank).
When merging field data from the per‑list rows, keeps the first non‑null value encountered in the order the lists were supplied, or the last one if there are several fields with the same key.
Sorts by rrf_score descending and truncates to limit.
Output:
Array of merged result objects, each containing original fields and an added rrf_score.
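The processing steps can be sketched in Python as follows. This illustrates the reciprocal rank fusion formula and is not the database's own code. It assumes ranks start at 1 (as in the original RRF formulation) and that rows are keyed on an `id` field. The function name `rrf_fuse` and the output field name used in the sketch are illustrative.

```python
def rrf_fuse(lists, limit, k=60):
    """Fuse pre-sorted result lists with reciprocal rank fusion."""
    scores = {}  # id -> sum of 1 / (k + rank) over all lists
    merged = {}  # id -> merged fields
    for rows in lists:
        # Assumption: the best row in each list has rank 1.
        for rank, row in enumerate(rows, start=1):
            key = row["id"]
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
            target = merged.setdefault(key, {})
            for field, value in row.items():
                # Keep the first non-null value in list order.
                if target.get(field) is None:
                    target[field] = value

    ranked = sorted(scores, key=scores.get, reverse=True)[:limit]
    return [{**merged[key], "rrf_score": scores[key]} for key in ranked]
```

With `k = 60`, a document ranked first in one list contributes 1/61 and a document ranked second contributes 1/62, so a document that appears in two lists usually outranks one that tops only a single list. Only ranks enter the formula, which is why the raw scores of the input lists do not need to be comparable.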
Output of the final search::rrf() query:
search::score
The search::score function returns the relevance score corresponding to the given 'matches' predicate reference numbers.
The following example shows this function, and its output, when used in a RETURN statement: