The following functions are provided for semantic text search.
Function | Return type | Description |
|---|---|---|
pgx_similarity_search(view pg_catalog.regclass, query text, num_result integer default 5, distance_operator text default '<=>', OUT embedding_uuid uuid, OUT chunk text, OUT distance float8) | SETOF record | Searches the embedding view specified in view for text similar to the text specified in query. At most num_result results are returned. The distance calculation method is specified with distance_operator. |
pgx_similarity_search_checking_index(view pg_catalog.regclass, query text, num_result integer default 5, distance_operator text default '<=>', OUT embedding_uuid uuid, OUT chunk text, OUT distance float8) | SETOF record | An error occurs if the index operator defined in the embedding column of the table that references the view does not match the operator specified in distance_operator. Otherwise, it behaves the same as the pgx_similarity_search function. |
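As an illustration of how the semantic text search function might be called from an application, here is a minimal Python sketch. It assumes a DB-API 2.0 connection whose driver accepts `%s` placeholders (psycopg is one such driver). The helper name `similarity_search` and the explicit casts are choices made for this example, not part of the product.

```python
def similarity_search(conn, view, query, num_result=5, distance_operator="<=>"):
    """Run pgx_similarity_search and return (embedding_uuid, chunk, distance) rows.

    `conn` is any DB-API 2.0 connection whose driver uses %s placeholders
    (an assumption of this sketch).
    """
    # The casts mirror the argument types listed in the function table.
    sql = (
        "SELECT embedding_uuid, chunk, distance "
        "FROM pgx_similarity_search(%s::regclass, %s::text, %s::integer, %s::text)"
    )
    cur = conn.cursor()
    try:
        cur.execute(sql, (view, query, num_result, distance_operator))
        return cur.fetchall()
    finally:
        cur.close()
```

Replacing the function name with pgx_similarity_search_checking_index in the SQL text would give the variant that raises an error when the index operator and distance_operator do not match.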
The following functions are provided for hybrid search.
Function | Return type | Description |
|---|---|---|
pgx_hybrid_search(query jsonb, OUT queryid bigint, OUT context_id uuid, OUT chunk text, OUT score float8) | SETOF record | Searches the specified embedded view and retrieves the relevant text. For the return value, refer to "3.13.3.3 Details of the pgx_hybrid_search Function". For query details, refer to "Hybrid search results". |
pgx_list_search_results(queryid bigint, subquery_type text) | SETOF record | If trace information for the specified query ID is recorded, it returns the search results. If subquery_type is not specified, it returns the results of a hybrid search. If subquery_type is semantic or fulltext, it returns the results of semantic text search or full-text search conducted internally, respectively. |
pgx_list_search_result_metrics(queryid bigint, subquery_type text) | SETOF record | Returns search results along with the evaluation value of the text chunk. Other specifications follow the pgx_list_search_results function. |
pgx_list_contexts(queryid bigint) | SETOF record | If trace information for the specified query ID is recorded, it extracts and returns a list of text chunks returned as search results in either hybrid search processing or subqueries executed internally. The extracted list of text chunks is inserted into the evaluation value table. |
pgx_hybrid_search_trace_size() | bigint | Returns the size of the trace information and evaluation value table in bytes. |
pgx_grant_access_on_hybrid_search_trace(role name) | None | Grants the role the access rights necessary for recording and evaluating trace information. |
pgx_revoke_access_on_hybrid_search_trace(role name) | None | Revokes the access rights necessary for recording and evaluating trace information from the role. |
The pgx_list_search_results function returns a set of records of the following type, in descending order of score.
Column | Type | Description |
|---|---|---|
queryid | bigint | Query ID for the hybrid search feature. |
context_id | uuid | Identifier of the returned text chunk. |
chunk | text | Returned text chunk. |
score | real | Hybrid search score. |
The pgx_list_search_result_metrics function returns a set of records of the following type, in descending order of score.
Column | Type | Description |
|---|---|---|
queryid | bigint | Query ID for the hybrid search feature. |
context_id | uuid | Identifier of the returned text chunk. |
chunk | text | Returned text chunk. |
score | real | Hybrid search score. |
metrics | jsonb | Evaluation value for each input text chunk. |
The pgx_list_contexts function returns a set of records of the following type, in descending order of score.
Column | Type | Description |
|---|---|---|
queryid | bigint | Query ID for the hybrid search feature. |
context_id | uuid | Identifier of the returned text chunk. |
chunk | text | Returned text chunk. |
query argument
Specify search conditions in JSON format for the query argument. Specify the following for each key.
Key | JSON format | Description |
|---|---|---|
target_view | string | Embedded view to be searched. |
search_fusion | string | How to combine semantic text search and full-text search. |
rrf_k | number | Constant k used in the weighted RRF score calculation. The default is 60. |
topN | number | Maximum number of results returned from hybrid search. |
semantic.search_text | string | Text to search with semantic text search. |
semantic.distance_operator | string | Distance operator in vector comparison. The default is "<=>". |
semantic.num_result | number | Maximum number of items to be retrieved by semantic text search. |
semantic.score_weight | number | Relative weighting (importance) of semantic text search. |
fulltext.search_condition | string | Full-text search condition (an expression that returns a boolean value specified in the WHERE clause). |
fulltext.num_result | number | Maximum number of items to be retrieved by full-text search. |
fulltext.score_weight | number | Relative weight (importance) of full-text search. |
fulltext.score_expression | string | An expression that returns a floating-point score for full-text search. |
See
For the function used in calculating scores in full-text search, refer to "Full Text Search" in the PostgreSQL Documentation.
Information
You can specify a matching operator for full-text search as a condition for full-text search. Specify the operator supported by the full-text search index defined in the table to be searched. The search text for full-text search must be specified in the pattern required by the matching operator.
Example) Query argument of the pgx_hybrid_search function
{
  "target_view": "sample_embeddings",
  "search_fusion": "UNION",
  "topN": 20,
  "semantic": {
    "search_text": "text for search",
    "distance_operator": "<=>",
    "num_result": 10,
    "score_weight": 1
  },
  "fulltext": {
    "search_condition": "chunk @@ websearch_to_tsquery('search text')",
    "num_result": 10,
    "score_weight": 1,
    "score_expression": "ts_rank_cd(to_tsvector(chunk), websearch_to_tsquery('search text'))"
  }
}

The search conditions specified by the query argument
The search conditions specified by the query argument correspond to the following flow. (This flow is for explaining the meaning of the search process and is not the same as the actual access plan.)

Full-text search: Perform a full-text search based on the conditions specified in fulltext.search_condition.
Full-text search score calculation: Perform score calculation using fulltext.score_expression. If not specified, full-text search will result in an error.
Full-text search ranking & topN: Rank the results in order of highest score. Only the number of search results specified by fulltext.num_result from the top will be adopted, and the rest will be discarded.
Semantic text search: Perform semantic text search based on vector similarity search using semantic.search_text and semantic.distance_operator.
Semantic text search ranking & topN: Rank results in order of smallest distance from vector similarity search results. Only the number of search results specified by semantic.num_result from the top will be adopted, and the rest will be discarded.
Integration: Integrate the results of full-text search and semantic text search according to the method specified in search_fusion.
Ranking: Calculate scores based on the ranks of full-text search and semantic text search and assign final ranks. Use fulltext.score_weight, semantic.score_weight and rrf_k.
topN: Return only the specified number of results from the top based on the final rank.
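The ranking, truncation, and integration steps of this flow can be sketched on in-memory data. The following Python example is illustrative only: the function names are hypothetical, the hit lists stand in for the results of the two searches, and it assumes that a "UNION" fusion keeps every chunk adopted by at least one search method. It does not reflect the actual access plan.

```python
def top_by(hits, limit, higher_is_better):
    """Rank (chunk_id, value) pairs and keep only the first `limit` of them."""
    ordered = sorted(hits, key=lambda h: h[1], reverse=higher_is_better)
    return ordered[:limit]

def candidate_ranks(fulltext_hits, semantic_hits, fulltext_num, semantic_num):
    """Return {chunk_id: {method: rank}} for chunks adopted by either method."""
    # Full-text hits carry a score: higher is better.
    fulltext_top = top_by(fulltext_hits, fulltext_num, higher_is_better=True)
    # Semantic hits carry a distance: smaller is better.
    semantic_top = top_by(semantic_hits, semantic_num, higher_is_better=False)

    ranks = {}
    for method, hits in (("fulltext", fulltext_top), ("semantic", semantic_top)):
        for rank, (chunk_id, _) in enumerate(hits, start=1):
            ranks.setdefault(chunk_id, {})[method] = rank
    return ranks

# Example: three hits per method, two adopted from each.
fulltext_hits = [("a", 0.9), ("b", 0.4), ("c", 0.7)]
semantic_hits = [("c", 0.12), ("d", 0.20), ("a", 0.25)]
ranks = candidate_ranks(fulltext_hits, semantic_hits, fulltext_num=2, semantic_num=2)
print(ranks)
```

In this example, chunk "b" is discarded by the full-text cutoff, and the semantic hit for "a" is discarded by the semantic cutoff, so "a" enters the final ranking with a full-text rank only.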
Score calculation
The score of the integrated result of multiple ranked results is calculated using a method called weighted RRF (Reciprocal Rank Fusion). Unweighted RRF considers the reciprocal of the rank of each search method as the score and takes the sum of multiple search methods as the final score. The final rank is determined based on that score. Weighted RRF calculates the sum of multiple search methods by multiplying each score by the weight for each search method as a coefficient.
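The method for calculating the score WRRF(d) using weighted RRF for a certain search result d is shown below. i represents the search method, and the sum of scores for all search methods is obtained.

$$\mathrm{WRRF}(d) = \sum_{i} \frac{w_i}{k + \mathrm{rank}_i(d)}$$

Here, w_i is the weight of search method i (semantic.score_weight or fulltext.score_weight), and rank_i(d) is the rank of d in the results of search method i.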
The value of k can be changed with the rrf_k parameter, and the default is 60. The smaller this parameter is, the greater the impact of differences in the original rank.
The rank (rank_i(d)) for search results not returned by one of the search methods is considered a sufficiently large value (considered to have no impact on the final score).
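To make the score calculation concrete, the following Python sketch computes weighted RRF scores from per-method ranks. It assumes the commonly used form weight / (k + rank) for each term. The function names and the sample ranks are hypothetical, and a result missing from one search method is modeled as contributing zero for that method.

```python
def wrrf_scores(ranks, weights, k=60):
    """Compute weighted RRF scores.

    ranks:   {result_id: {method: rank}}, ranks start at 1
    weights: {method: weight}
    k:       RRF constant (rrf_k), 60 by default
    """
    scores = {}
    for result_id, per_method in ranks.items():
        # Methods that did not return this result are absent from per_method,
        # so they add nothing, which models a "sufficiently large" rank.
        scores[result_id] = sum(
            weights[method] / (k + rank) for method, rank in per_method.items()
        )
    return scores

def top_n(scores, n):
    """Return the n best (result_id, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]

sample_ranks = {
    "a": {"fulltext": 1},
    "c": {"fulltext": 2, "semantic": 1},
    "d": {"semantic": 2},
}
sample_weights = {"fulltext": 1, "semantic": 1}
sample_scores = wrrf_scores(sample_ranks, sample_weights, k=60)
print(top_n(sample_scores, 2))
```

With k = 60, a rank-1 hit contributes 1/61 and a rank-2 hit 1/62, so the two are nearly equal. With k = 1, they contribute 1/2 and 1/3, which shows how a smaller k amplifies differences in the original rank.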
Hybrid search results
The pgx_hybrid_search function returns a set of records of the following type.
Column | Type | Description |
|---|---|---|
queryid | bigint | Query ID of the hybrid search feature used to refer to trace information. |
context_id | uuid | Identifier of the returned text chunk. |
chunk | text | Returned text chunk. |
score | real | Hybrid search score. |
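The following Python sketch shows one way an application might assemble the query argument and read the returned records. It assumes a DB-API 2.0 connection whose driver accepts `%s` placeholders (psycopg is one such driver). The helper names `build_query` and `hybrid_search` are hypothetical, and the key values reuse those from the query argument example.

```python
import json

COLUMNS = ("queryid", "context_id", "chunk", "score")

def build_query(target_view, search_text, search_condition, score_expression,
                top_n=20, num_result=10):
    """Assemble the JSON search conditions for pgx_hybrid_search."""
    # search_condition and score_expression are SQL expressions, so the caller
    # is responsible for constructing them safely.
    return {
        "target_view": target_view,
        "search_fusion": "UNION",
        "topN": top_n,
        "semantic": {
            "search_text": search_text,
            "distance_operator": "<=>",
            "num_result": num_result,
            "score_weight": 1,
        },
        "fulltext": {
            "search_condition": search_condition,
            "num_result": num_result,
            "score_weight": 1,
            "score_expression": score_expression,
        },
    }

def hybrid_search(conn, query):
    """Run pgx_hybrid_search and return each record as a dict keyed by column."""
    sql = "SELECT queryid, context_id, chunk, score FROM pgx_hybrid_search(%s::jsonb)"
    cur = conn.cursor()
    try:
        cur.execute(sql, (json.dumps(query),))
        return [dict(zip(COLUMNS, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

The queryid value in each returned record can then be passed to functions such as pgx_list_search_results to refer to the recorded trace information.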