Enterprise Postgres 18 Knowledge Data Management Feature User's Guide

3.13.1 Vectorization Functions

3.13.1.1 Defining Vectorization

For the parameters to specify when defining vectorization, refer to the pgai documentation. To perform vectorization within the database, you must specify pgx_vectorizer.schedule_vectorizer for the scheduling argument.
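As an illustration, a definition that schedules vectorization inside the database might look like the following sketch. The table name, column name, and model are hypothetical, and the embedding and chunking arguments follow one form shown in the pgai documentation; they may differ in the pgai version bundled with your installation, so check that documentation before use.

```sql
-- Sketch only. public.documents, its body column, and the model are hypothetical.
SELECT ai.create_vectorizer(
    'public.documents'::regclass,
    -- Assumed argument forms; confirm against the pgai documentation.
    embedding  => ai.embedding_openai('text-embedding-3-small', 1536),
    chunking   => ai.chunking_recursive_character_text_splitter('body'),
    -- Required for vectorization performed within the database.
    scheduling => pgx_vectorizer.schedule_vectorizer(interval '10 minutes')
);
```

The call returns the ID of the new vectorization definition, which the management functions take as vectorizer_id.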

3.13.1.2 Vectorization Schedule

You can generate vector data from text data in the following ways.

The timing of vectorization is determined by the schedule specified when defining vectorization. If schedule_none is specified, periodic vectorization will not be performed. To perform vectorization at a specified time, run the run_vectorize_worker function. If schedule_vectorizer is specified, periodic vectorization will be performed within the database.

Automatic vectorization can be disabled with the ai.disable_vectorizer_schedule function. It can also be re-enabled with the ai.enable_vectorizer_schedule function.
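The controls described above could be used as in the following sketch. The vectorizer ID 1 is hypothetical, and qualifying run_vectorize_worker with the pgx_vectorizer schema is an assumption based on how schedule_vectorizer is qualified in this guide.

```sql
-- Sketch only. 1 is a hypothetical vectorizer ID.

-- Run vectorization once, on demand (useful with schedule_none).
SELECT pgx_vectorizer.run_vectorize_worker(1);

-- Pause periodic vectorization for this definition.
SELECT ai.disable_vectorizer_schedule(1);

-- Resume periodic vectorization.
SELECT ai.enable_vectorizer_schedule(1);
```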

3.13.1.3 Vectorizer Management Functions

The following functions are provided for the vectorization feature.

Function

Return type

Description

pgx_create_vectorizer([Arguments that can be specified with ai.create_vectorizer], fulltext_indexing => jsonb)

integer

Defines vectorization and automatically defines a full-text search index for the chunks.
Returns the ID of the created vectorizer.
In addition to the arguments that can be specified for ai.create_vectorizer, specify the definition of the full-text search index (pgx_fulltext_indexing_gin() or pgx_fulltext_indexing_gist()) in fulltext_indexing.
fulltext_indexing can be omitted when you only define vectorization or when you use a full-text search index other than GiST or GIN.
If it is omitted, no full-text search index is created.

pgx_fulltext_indexing_gin(target_column text, fastupdate boolean, gin_pending_list_limit integer, opclass text, min_rows integer, create_when_queue_empty boolean)

jsonb

Used when creating a GIN full-text search index.
Returns the value to specify for fulltext_indexing in pgx_create_vectorizer.
target_column is used when using expression indexes or covering indexes.
The column name for text chunks is chunk, and the default value for target_column is chunk.
Specify the operator class to use in opclass.
min_rows specifies the threshold for creating the index. The default is 100000. The full-text search index is created when the number of records in the embedded table exceeds this threshold.
create_when_queue_empty specifies when the full-text search index is created. The default is true, in which case the index is created after vectorization is completed.
The remaining parameters correspond to index storage parameters.

pgx_fulltext_indexing_gist(target_column text, fillfactor integer, buffering text, opclass text, min_rows integer, create_when_queue_empty boolean)

jsonb

Used when creating a GiST full-text search index.
Returns the value to specify for fulltext_indexing in pgx_create_vectorizer.
Specify the operator class to use in opclass.
The settings for target_column, min_rows, and create_when_queue_empty are the same as those of pgx_fulltext_indexing_gin.
The remaining parameters correspond to index storage parameters.

pgx_delete_fulltext_index_config(vectorizer_id integer)

void

Used when deleting a vectorization definition created by pgx_create_vectorizer.
Deletes the full-text search information associated with the vectorization definition created by the pgx_create_vectorizer function.

get_vectorizer_id(view_name pg_catalog.pg_regclass)

integer

Returns the ID of the vectorization definition that corresponds to the specified embedded view.

schedule_vectorizer(schedule_interval interval)

json

Specify the interval between vectorization runs in schedule_interval.
This function returns the JSON to specify in the scheduling argument of the create_vectorizer function.
If schedule_interval is omitted, an interval of 10 minutes is used.

alter_vectorizer_processing(vectorizer_id integer, batch_size integer, concurrency integer)

void

Changes the amount of data converted at one time (batch_size) and the number of concurrent workers (concurrency) for the vectorization definition with the ID specified in vectorizer_id.

alter_vectorizer_schedule(vectorizer_id integer, schedule_interval interval)

void

Changes the interval between vectorization runs for the vectorization definition with the ID specified in vectorizer_id.
If schedule_interval is omitted, an interval of 5 minutes is used.

run_vectorize_worker(vectorizer_id integer)

integer

Immediately starts the vectorization process, in the background, for the vectorization definition with the ID specified in vectorizer_id.
Returns the PID of the started process.

start_vectorize_scheduler(void)

void

Starts the vectorize scheduler, which connects to the database in which this SQL function was executed.
If the vectorize scheduler is already running, an error occurs.

pgx_embedding_onnx(model text, dimension integer)

jsonb

Generates a configuration JSON object for use with the pgx_create_vectorizer function. It is used when performing automatic vectorization with a model imported into the database. Specify the model name and the number of dimensions of the vector to be generated. These arguments cannot be omitted. If the specified model has not been imported, a warning message is output.
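The functions in this table might be combined as in the following sketch. The table, view, and model names, the dimension, and the vectorizer ID are all hypothetical. Qualifying the functions with the pgx_vectorizer schema is an assumption, and the chunking argument follows one form shown in the pgai documentation, which may differ in your pgai version.

```sql
-- Sketch only. All object names, the model, and the ID 1 are hypothetical.

-- Define vectorization with an imported ONNX model and a GIN full-text index.
SELECT pgx_vectorizer.pgx_create_vectorizer(
    'public.documents'::regclass,
    embedding         => pgx_vectorizer.pgx_embedding_onnx('my_onnx_model', 384),
    -- Assumed argument form; confirm against the pgai documentation.
    chunking          => ai.chunking_recursive_character_text_splitter('body'),
    scheduling        => pgx_vectorizer.schedule_vectorizer(interval '10 minutes'),
    -- Create the index once the embedded table exceeds 50000 records.
    fulltext_indexing => pgx_vectorizer.pgx_fulltext_indexing_gin(min_rows => 50000)
);

-- Look up the definition ID from the embedded view.
SELECT pgx_vectorizer.get_vectorizer_id('public.documents_embedding'::regclass);

-- Convert 100 records at a time with 2 concurrent workers.
SELECT pgx_vectorizer.alter_vectorizer_processing(1, 100, 2);

-- Change the interval between vectorization runs to 30 minutes.
SELECT pgx_vectorizer.alter_vectorizer_schedule(1, interval '30 minutes');
```

Because every parameter of pgx_fulltext_indexing_gin has a default, only the values that differ from the defaults need to be passed by name.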

See

For information about the arguments that can be specified with ai.create_vectorizer, refer to the pgai documentation.