For the parameters that can be specified when defining vectorization, refer to the pgai documentation. To perform vectorization within the database, you must specify pgx_vectorizer.schedule_vectorizer for scheduling.
You can generate vector data from text data in the following ways:
- Manually start vectorization processing
- Perform vectorization processing periodically within the database
The timing of vectorization is determined by the schedule specified when defining vectorization. If schedule_none is specified, periodic vectorization is not performed; to perform vectorization at a time of your choosing, run the run_vectorize_worker function. If schedule_vectorizer is specified, vectorization is performed periodically within the database.
Automatic vectorization can be disabled with the ai.disable_vectorizer_schedule function. It can also be re-enabled with the ai.enable_vectorizer_schedule function.
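
The controls described above might be used as in the following minimal sketch. It assumes that run_vectorize_worker lives in the pgx_vectorizer schema (as pgx_vectorizer.schedule_vectorizer suggests), that the ai.* schedule functions take the vectorizer ID as their argument, and that a vectorization definition with ID 1 already exists; the ID is a hypothetical value.

```sql
-- Assumption: a vectorization definition with ID 1 exists (hypothetical ID).
-- Assumption: run_vectorize_worker is installed in the pgx_vectorizer schema.

-- Manually start vectorization, for example for a definition
-- that was created without a periodic schedule.
SELECT pgx_vectorizer.run_vectorize_worker(1);

-- Temporarily stop automatic (periodic) vectorization for the definition.
SELECT ai.disable_vectorizer_schedule(1);

-- Resume automatic vectorization for the same definition.
SELECT ai.enable_vectorizer_schedule(1);
```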
The following functions are provided for the vectorization feature.
| Function | Return type | Description |
|---|---|---|
| pgx_create_vectorizer([Arguments that can be specified with ai.create_vectorizer], fulltext_indexing => jsonb) | integer | Defines vectorization and automatically defines a full-text search index on the chunks. |
| pgx_fulltext_indexing_gin(target_column text, fastupdate boolean, gin_pending_list_limit integer, opclass text, min_rows integer, create_when_queue_empty boolean) | jsonb | Used to create the full-text search index as a GIN index. |
| pgx_fulltext_indexing_gist(target_column text, fillfactor integer, buffering text, opclass text, min_rows integer, create_when_queue_empty boolean) | jsonb | Used to create the full-text search index as a GiST index. |
| pgx_delete_fulltext_index_config(vectorizer_id integer) | void | Used when deleting a vectorization definition created with pgx_create_vectorizer. Deletes the full-text search information associated with that vectorization definition. |
| get_vectorizer_id(view_name pg_catalog.pg_regclass) | integer | Returns the ID of the vectorization definition that corresponds to the specified embedded view. |
| schedule_vectorizer(schedule_interval interval) | json | Specifies the interval for the vectorization process in schedule_interval. |
| alter_vectorizer_processing(vectorizer_id integer, batch_size integer, concurrency integer) | void | Changes the amount of data converted at one time and the worker multiplicity for the vectorization definition with the ID specified in vectorizer_id. |
| alter_vectorizer_schedule(vectorizer_id integer, schedule_interval interval) | void | Changes the interval of the vectorization process for the vectorization definition with the ID specified in vectorizer_id. |
| run_vectorize_worker(vectorizer_id integer) | integer | Immediately starts the vectorization process, in the background, for the vectorization definition with the ID specified in vectorizer_id. |
| start_vectorize_scheduler(void) | void | Starts the vectorize scheduler, which connects to the database in which this SQL function was executed. |
| pgx_embedding_onnx(model text, dimension integer) | jsonb | Generates a configuration JSON object for use with the pgx_create_vectorizer function. Used when performing automatic vectorization with a model imported into the database. Specify the model name and the number of dimensions of the vector to be generated. This cannot be omitted. If the specified model has not been imported, a warning message is output. |
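
As an illustration of how some of the functions in the table might be called, the following sketch uses only the signatures listed above. The pgx_vectorizer schema qualification, the view name documents_embedding, the model name my_model, the vectorizer ID 1, and all numeric values are assumptions chosen for the example, not values taken from this manual.

```sql
-- Assumption: all functions below are installed in the pgx_vectorizer schema.

-- Look up the vectorization definition ID from its embedded view
-- (documents_embedding is a hypothetical view name).
SELECT pgx_vectorizer.get_vectorizer_id('documents_embedding'::regclass);

-- Change the vectorization interval of definition 1 (hypothetical ID)
-- to 30 minutes.
SELECT pgx_vectorizer.alter_vectorizer_schedule(1, interval '30 minutes');

-- Convert 100 items at a time with a worker multiplicity of 2
-- (arguments: vectorizer_id, batch_size, concurrency).
SELECT pgx_vectorizer.alter_vectorizer_processing(1, 100, 2);

-- Build the configuration objects that are passed to pgx_create_vectorizer:
-- an embedding configuration for an imported model (hypothetical name, 384 dimensions)
-- and a scheduling configuration with a one-hour interval.
SELECT pgx_vectorizer.pgx_embedding_onnx('my_model', 384);
SELECT pgx_vectorizer.schedule_vectorizer(interval '1 hour');
```

The last two calls only return the JSON configuration; in practice they would be supplied as arguments when the vectorization is defined, rather than executed on their own.
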
See
For information about the arguments that can be specified with ai.create_vectorizer, refer to the pgai documentation.