Support for semantic text search and automatic vectorization is provided as an extension called pgx_vectorizer. pgx_vectorizer uses pgai, which relies on pgvector and plpython3u.
Refer to "pgvector" in the Installation and Setup Guide for Server to set up pgvector.
If you want to manage ONNX format models in a database and use them for vectorization, refer to "Chapter 5 Model Management in the Database".
If you are already using the pgx_vectorizer extension and are migrating from Fujitsu Enterprise Postgres 17 SP1 or 17 SP2, refer to "3.2.2.3 Migration from Fujitsu Enterprise Postgres 17 SP1 and 17 SP2" to upgrade the pgx_vectorizer extension and the pgai extension.
As a superuser, execute the following command to set up pgai. "<x>" indicates the product version.
$ su -
# cp -r /opt/fsepv<x>server64/OSS/pgai-extension/* /opt/fsepv<x>server64/
Configure the following before starting the instance.
Set the PYTHONPATH environment variable as follows. For <Python package installation destination>, specify the installation destination listed in "Related Software" in the "Installation and Setup Guide for Server".
PYTHONPATH=/opt/fsepv<x>server64/psycopg/python3.11/site-packages/:<Python package installation destination>:$PYTHONPATH
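The setting above can be sketched as a small shell helper. This is a minimal sketch only: the function name build_pythonpath is hypothetical, and the placeholders are left as they appear in this manual, so replace them with the values for your environment. The helper appends the existing PYTHONPATH only when it is non-empty, which avoids leaving a trailing colon.

```shell
# Hypothetical helper (not part of the product): build the PYTHONPATH value.
#   $1: product version (the "<x>" in the installation path)
#   $2: Python package installation destination
#   $3: existing PYTHONPATH value (may be empty)
build_pythonpath() {
    printf '%s\n' "/opt/fsepv${1}server64/psycopg/python3.11/site-packages/:${2}${3:+:$3}"
}

# Print the value with the manual's placeholders still in place.
build_pythonpath "<x>" "<Python package installation destination>" "${PYTHONPATH:-}"

# To apply it before starting the instance, one option is:
#   PYTHONPATH=$(build_pythonpath ...); export PYTHONPATH
```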
Set the following parameters.
Add pgx_vectorizer to the shared_preload_libraries parameter.
Specify the maximum parallelism for vectorization processing in the pgx_vectorizer.max_vectorize_worker parameter.
Add the following value to the value of the max_worker_processes parameter:
number of databases in which to enable the pgx_vectorizer functionality + pgx_vectorizer.max_vectorize_worker + 2
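As a worked example of this calculation, the following shell sketch computes the new max_worker_processes value. The function name and the sample figures (a current value of 8, two databases, and pgx_vectorizer.max_vectorize_worker set to 4) are illustrative assumptions, not values taken from this manual.

```shell
# Hypothetical helper: compute the new max_worker_processes value.
#   $1: current value of max_worker_processes
#   $2: number of databases in which pgx_vectorizer is enabled
#   $3: value of pgx_vectorizer.max_vectorize_worker
calc_max_worker_processes() {
    echo $(( $1 + $2 + $3 + 2 ))
}

# Illustrative figures only: 8 + 2 + 4 + 2
calc_max_worker_processes 8 2 4    # prints 16
```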
If you have changed the installation destination of the Fujitsu Enterprise Postgres Server feature to a location other than the standard installation destination, specify the following for the pgx_vectorizer.pgai_worker_path parameter.
<Fujitsu Enterprise Postgres server feature Installation Directory>/OSS/pgai-worker/bin/pgai
Execute CREATE EXTENSION for the database that will use this feature.
Adding the CASCADE option will also enable the dependent pgai, pgvector, and plpython3u at the same time.
After CREATE EXTENSION, execute the start_vectorize_scheduler function to start the vectorize scheduler. The vectorize scheduler is a feature that schedules workers to perform vectorization.
Example) Connecting to the database "rag_database" using the psql command
rag_database=# CREATE EXTENSION IF NOT EXISTS pgx_vectorizer CASCADE;
CREATE EXTENSION
rag_database=# SELECT pgx_vectorizer.start_vectorize_scheduler(); -- Starting the vectorize scheduler
After enabling the extension, reload the postgresql.conf parameter settings with a command such as pg_ctl reload.
Create a database user for the automatic vectorization feature to use when it performs vectorization in the background, and register it as the vectorization user. Specify a password so that password authentication can be used.
CREATE ROLE <worker_user> PASSWORD '<worker password>' … LOGIN;
SELECT pgx_vectorizer.set_worker_setting('user', 'VECTORIZE_USER', '<worker_user>');
This database user will connect to Fujitsu Enterprise Postgres as an application, so modify the pg_hba.conf file for client authentication. Set it so that the database user created above can connect to the database that uses the pgx_vectorizer function from localhost using password authentication.
host <ai-database> <worker_user> 127.0.0.1/32 scram-sha-256
host <ai-database> <worker_user> ::1/128 scram-sha-256
The background vectorization process runs with the privileges of the OS user that starts Fujitsu Enterprise Postgres. The password that the above database user needs to connect to Fujitsu Enterprise Postgres is read from the password file of that OS user, so specify in that file the connection information that the vectorization process requires. The password file in the default location is used. For details about the password file, refer to "The Password File" in the PostgreSQL Documentation.
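A password file entry has the form hostname:port:database:username:password, and any colon or backslash inside a field must be escaped with a backslash. The following shell sketch builds such a line. The helper names are hypothetical, and the host, port, database, user, and password values are assumptions for illustration, since this manual does not state which host and port the vectorization process uses.

```shell
# Hypothetical helper: escape '\' and ':' in one field of a password file entry.
pgpass_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/:/\\:/g'
}

# Print one password file line: hostname:port:database:username:password
pgpass_line() {
    printf '%s:%s:%s:%s:%s\n' \
        "$(pgpass_escape "$1")" "$(pgpass_escape "$2")" "$(pgpass_escape "$3")" \
        "$(pgpass_escape "$4")" "$(pgpass_escape "$5")"
}

# Illustrative values only; the colon in the password is escaped in the output.
pgpass_line localhost 5432 rag_database worker_user 'pa:ss'
```

To use the result, append the printed line to the password file in the default location (~/.pgpass on Linux) of the OS user that starts Fujitsu Enterprise Postgres, and restrict the file with chmod 0600, because a password file with broader permissions is ignored.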
If you have set up the pgx_vectorizer extension with Fujitsu Enterprise Postgres 17 SP1 or 17 SP2, you need to upgrade the pgx_vectorizer extension. Follow the steps below to perform the upgrade.
If the database server is running, stop the database server.
$ pg_ctl stop -D <data_directory>
Remove pgx_vectorizer from the shared_preload_libraries parameter in postgresql.conf.
Upgrade the product.
Start the database server.
$ pg_ctl start -D <data_directory>
Update pgvector, pgai, and pgx_vectorizer.
ALTER EXTENSION vector UPDATE;
ALTER EXTENSION ai UPDATE;
ALTER EXTENSION pgx_vectorizer UPDATE;
Add pgx_vectorizer to the shared_preload_libraries parameter in postgresql.conf.
Restart the database server.
$ pg_ctl restart -D <data_directory>