Enterprise Postgres 18 Knowledge Data Management Feature User's Guide

3.13.3 Search Functions

3.13.3.1 Semantic Text Search

The following functions are provided for semantic text search.

| Function | Return type | Description |
| --- | --- | --- |
| pgx_similarity_search(view pg_catalog.regclass, query text, num_result integer default 5, distance_operator text default '<=>', OUT embedding_uuid uuid, OUT chunk text, OUT distance float8) | SETOF record | Searches the embedding view specified in view to obtain text similar to the text specified in query. Returns at most the number of results specified in num_result. The distance calculation method can be specified with distance_operator. |
| pgx_similarity_search_checking_index(view pg_catalog.regclass, query text, num_result integer default 5, distance_operator text default '<=>', OUT embedding_uuid uuid, OUT chunk text, OUT distance float8) | SETOF record | An error occurs if the index operator defined in the embedding column of the table that references the view does not match the operator specified in distance_operator. Otherwise, it behaves the same as the pgx_similarity_search function. |

The following functions are provided for hybrid search.

| Function | Return type | Description |
| --- | --- | --- |
| pgx_hybrid_search(query jsonb, OUT queryid bigint, OUT context_id uuid, OUT chunk text, OUT score float8) | SETOF record | Searches the specified embedded view and retrieves the relevant text. For details of the query argument, refer to "query argument" in "3.13.3.3 Details of the pgx_hybrid_search Function". For the return value, refer to "Hybrid search results" in the same section. |
| pgx_list_search_results(queryid bigint, subquery_type text) | SETOF record | If trace information for the specified query ID is recorded, returns the search results. If subquery_type is not specified, returns the results of the hybrid search. If subquery_type is semantic or fulltext, returns the results of the semantic text search or the full-text search performed internally, respectively. |
| pgx_list_search_result_metrics(queryid bigint, subquery_type text) | SETOF record | Returns search results along with the evaluation value of each text chunk. Otherwise, it behaves the same as the pgx_list_search_results function. |
| pgx_list_contexts(queryid bigint) | SETOF record | If trace information for the specified query ID is recorded, extracts and returns the list of text chunks returned as search results by either the hybrid search or the subqueries executed internally. The extracted list of text chunks is inserted into the evaluation value table. |
| pgx_hybrid_search_trace_size() | bigint | Returns the size, in bytes, of the trace information and the evaluation value table. |
| pgx_grant_access_on_hybrid_search_trace(role name) | None | Grants the role the access rights needed to record and evaluate trace information. |
| pgx_revoke_access_on_hybrid_search_trace(role name) | None | Revokes the access rights for recording and evaluating trace information that were granted to the role. |


The pgx_list_search_results function returns a set of records of the following type, in descending order of score.

| Column | Type | Description |
| --- | --- | --- |
| queryid | bigint | Query ID for the hybrid search feature. |
| context_id | uuid | Identifier of the returned text chunk. Unique within the embedded view. |
| chunk | text | Returned text chunk. |
| score | real | Hybrid search score. A higher score indicates a better match with the conditions. |


The pgx_list_search_results_metrics function returns a set of records of the following type, in descending order of score.

| Column | Type | Description |
| --- | --- | --- |
| queryid | bigint | Query ID for the hybrid search feature. |
| context_id | uuid | Identifier of the returned text chunk. Unique within the embedded view. |
| chunk | text | Returned text chunk. |
| score | real | Hybrid search score. A higher score indicates a better match with the conditions. |
| metrics | jsonb | Evaluation value for each input text chunk. |


The pgx_list_contexts function returns a set of records of the following type, in descending order of score.

| Column | Type | Description |
| --- | --- | --- |
| queryid | bigint | Query ID for the hybrid search feature. |
| context_id | uuid | Identifier of the returned text chunk. Unique within the embedded view. |
| chunk | text | Returned text chunk. |

3.13.3.3 Details of the pgx_hybrid_search Function

query argument

Specify search conditions in JSON format for the query argument. Specify the following for each key.

| Key | JSON format | Description |
| --- | --- | --- |
| target_view | string | Embedded view to be searched. Interpreted as pg_catalog.regclass. Cannot be omitted. |
| search_fusion | string | How to combine semantic text search and full-text search. The default is "UNION". "UNION": Returns the union of each search result. "INTERSECT": Returns the intersection of each search result. "TEXT_ONLY": Returns the intersection with all results of the full-text search. "VECTOR_ONLY": Returns the intersection with all results of the semantic text search. "MINUS_TEXT": Returns the results of the semantic text search excluding the intersection. "MINUS_VECTOR": Returns the results of the full-text search excluding the intersection. |
| rrf_k | number | Value of the constant k used in weighted RRF. If a floating-point value is specified, the fractional part is truncated. The default is 60. |
| topN | number | Maximum number of results returned from hybrid search. If a floating-point value is specified, an error occurs. The default is 20. |
| semantic.search_text | string | Text to search for with semantic text search. Cannot be omitted. |
| semantic.distance_operator | string | Distance operator used in vector comparison. The default is "<=>". |
| semantic.num_result | number | Maximum number of items retrieved by semantic text search. If a floating-point value is specified, an error occurs. If omitted, it is calculated from topN: half of topN if search_fusion is "UNION", otherwise the same as topN. |
| semantic.score_weight | number | Relative weight (importance) of semantic text search. If a floating-point value is specified, the fractional part is truncated. The default is 1. |
| fulltext.search_condition | string | Full-text search condition (an expression that returns a boolean value, as specified in a WHERE clause). The column targeted for full-text search in the embedded view is named chunk. For the definition of the embedded view, refer to "3.3.3 Definition of Vectorization". Cannot be omitted. |
| fulltext.num_result | number | Maximum number of items retrieved by full-text search. If a floating-point value is specified, an error occurs. If omitted, it is calculated in the same way as semantic.num_result. |
| fulltext.score_weight | number | Relative weight (importance) of full-text search. If a floating-point value is specified, the fractional part is truncated. The default is 1. |
| fulltext.score_expression | string | An expression that returns a floating-point score for full-text search. It can refer to tableoid and ctid. Cannot be omitted. |

See

For the function used in calculating scores in full-text search, refer to "Full Text Search" in the PostgreSQL Documentation.

Information

You can specify a matching operator for full-text search as a condition for full-text search. Specify the operator supported by the full-text search index defined in the table to be searched. The search text for full-text search must be specified in the pattern required by the matching operator.

Example) Query argument of the pgx_hybrid_search function

{
  "target_view": "sample_embeddings",
  "search_fusion": "UNION",
  "topN": 20,
  "semantic": {
    "search_text": "text for search",
    "distance_operator": "<=>",
    "num_result": 10,
    "score_weight": 1
  },
  "fulltext": {
    "search_condition": "chunk @@ websearch_to_tsquery('search text')",
    "num_result": 10,
    "score_weight": 1,
    "score_expression": "ts_rank_cd(to_tsvector(chunk), websearch_to_tsquery('search text'))"
  }
}
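
As a minimal sketch, a query argument like the one in this example could be assembled in application code and serialized to JSON before it is passed to pgx_hybrid_search. The helper name build_hybrid_query and its parameters are hypothetical and not part of the product; only the JSON keys are taken from the key descriptions in this section. Optional keys such as num_result and score_weight are left out so that the documented defaults apply.

```python
import json


def build_hybrid_query(target_view, search_text, search_condition,
                       score_expression, top_n=20, search_fusion="UNION"):
    """Return a JSON string for the query argument (hypothetical helper)."""
    # topN is documented to raise an error for floating-point values,
    # so reject anything that is not a plain integer before sending it.
    if isinstance(top_n, bool) or not isinstance(top_n, int):
        raise ValueError("topN must be an integer")
    query = {
        "target_view": target_view,                 # cannot be omitted
        "search_fusion": search_fusion,
        "topN": top_n,
        "semantic": {"search_text": search_text},   # cannot be omitted
        "fulltext": {
            "search_condition": search_condition,   # cannot be omitted
            "score_expression": score_expression,   # cannot be omitted
        },
    }
    return json.dumps(query)


query_json = build_hybrid_query(
    "sample_embeddings",
    "text for search",
    "chunk @@ websearch_to_tsquery('search text')",
    "ts_rank_cd(to_tsvector(chunk), websearch_to_tsquery('search text'))",
)
```

The resulting string can then be bound as the jsonb argument of pgx_hybrid_search through whatever database driver the application uses.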

The search conditions specified by the query argument

The search conditions specified by the query argument correspond to the following flow. (This flow is for explaining the meaning of the search process and is not the same as the actual access plan.)
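
As a rough illustration of the combination step in this flow, the search_fusion modes can be read as set operations over the identifiers returned by the two internal searches. The sketch below is one interpretation of the key descriptions, not the product's implementation: the function name combine is hypothetical, and treating "TEXT_ONLY" and "VECTOR_ONLY" as "keep every result of that one search" is an assumption.

```python
def combine(search_fusion, semantic_ids, fulltext_ids):
    """Return the set of context IDs kept by a search_fusion mode (illustrative)."""
    semantic, fulltext = set(semantic_ids), set(fulltext_ids)
    modes = {
        "UNION": semantic | fulltext,         # results of either search
        "INTERSECT": semantic & fulltext,     # results of both searches
        "TEXT_ONLY": fulltext,                # assumed: all full-text results
        "VECTOR_ONLY": semantic,              # assumed: all semantic results
        "MINUS_TEXT": semantic - fulltext,    # semantic results not in the intersection
        "MINUS_VECTOR": fulltext - semantic,  # full-text results not in the intersection
    }
    if search_fusion not in modes:
        raise ValueError(f"unknown search_fusion: {search_fusion}")
    return modes[search_fusion]


semantic_hits = ["a", "b", "c"]
fulltext_hits = ["b", "d"]
union_ids = combine("UNION", semantic_hits, fulltext_hits)            # a, b, c and d
minus_text_ids = combine("MINUS_TEXT", semantic_hits, fulltext_hits)  # a and c
```

This only shows which chunks survive the combination; the score of each surviving chunk is computed separately.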

Score calculation

The score of the integrated result of multiple ranked results is calculated using a method called weighted RRF (Reciprocal Rank Fusion). Unweighted RRF considers the reciprocal of the rank of each search method as the score and takes the sum of multiple search methods as the final score. The final rank is determined based on that score. Weighted RRF calculates the sum of multiple search methods by multiplying each score by the weight for each search method as a coefficient.

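The score WRRF(d) of a search result d under weighted RRF is calculated as follows. Here i denotes a search method, w_i is the weight of search method i (its score_weight), rank_i(d) is the rank of d in the results of search method i, and k is a constant. The sum is taken over all search methods.

$$\mathrm{WRRF}(d) = \sum_{i} \frac{w_i}{k + \mathrm{rank}_i(d)}$$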

The value of k can be changed with the rrf_k parameter, and the default is 60. The smaller this parameter is, the greater the impact of differences in the original rank.

The rank rank_i(d) of a search result that is not returned by search method i is treated as a sufficiently large value, so that search method has no impact on the final score.
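
The calculation described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not the product's implementation: the function name weighted_rrf is hypothetical, and it assumes the common RRF form in which each search method contributes weight / (k + rank) with ranks starting at 1. A result that a search method did not return simply receives no contribution from that method, which matches treating its rank as a sufficiently large value.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked ID lists with weighted Reciprocal Rank Fusion (illustrative).

    rankings: dict mapping a search-method name to its result IDs, best first.
    weights:  dict mapping the same names to a relative weight.
    """
    scores = defaultdict(float)
    for method, ranked_ids in rankings.items():
        # Ranks are assumed to start at 1 for the best result.
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weights[method] / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


rankings = {"semantic": ["a", "b", "c"], "fulltext": ["b", "d"]}
weights = {"semantic": 1, "fulltext": 1}
fused = weighted_rrf(rankings, weights, k=60)
```

With these inputs, "b" ranks first because it appears in both lists, and the remaining results follow in the order "a", "d", "c". Lowering k widens the gap between adjacent ranks, as noted for the rrf_k parameter.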

Hybrid search results

The pgx_hybrid_search function returns a set of records of the following type.

| Column | Type | Description |
| --- | --- | --- |
| queryid | bigint | Query ID of the hybrid search feature used to refer to trace information. A valid value is returned if trace information is recorded. Otherwise, 0 is returned. |
| context_id | uuid | Identifier of the returned text chunk. Unique within the embedded view. |
| chunk | text | Returned text chunk. |
| score | real | Hybrid search score. A higher score indicates a better match with the conditions. |