Support for semantic text search and automatic vectorization is provided as an extension called pgx_vectorizer. pgx_vectorizer uses pgai, which relies on pgvector and plpython3u.
Refer to "pgvector" in the Installation and Setup Guide for Server to set up pgvector.
As a superuser, execute the following command to set up pgai. "<x>" indicates the product version.
$ su -
# cp -r /opt/fsepv<x>server64/OSS/pgai-extension/* /opt/fsepv<x>server64/
Note
With the Fujitsu Enterprise Postgres server feature installed, plpython3u is configured to use the following.
For RHEL8, RHEL9: Python 3.9
For SLES15: Python 3.6
By setting up pgai as above, plpython3u will be configured to use Python 3.11. Therefore, existing PL/Python programs that use plpython3u may no longer work.
To restore plpython3u to the configuration it had immediately after installation, refer to "3.2.3 Removing".
Set the following environment variable before starting the instance.
PYTHONPATH=/opt/fsepv<x>server64/psycopg/python3.11/site-packages/:$PYTHONPATH
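The setting above could be applied in the shell that starts the instance roughly as follows. This is a minimal sketch: the variable names fep_version and site_packages are hypothetical, and the value 17 only stands in for "<x>". It also avoids leaving a trailing colon when PYTHONPATH was previously unset.

```shell
# Hypothetical product version; substitute the value of <x> for your installation.
fep_version=17
site_packages="/opt/fsepv${fep_version}server64/psycopg/python3.11/site-packages/"

# Prepend the directory and keep any existing PYTHONPATH entries.
# ${PYTHONPATH:+:$PYTHONPATH} expands to nothing when PYTHONPATH is unset or empty.
PYTHONPATH="${site_packages}${PYTHONPATH:+:$PYTHONPATH}"
export PYTHONPATH

echo "$PYTHONPATH"
```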
Set the following parameters.
Add pgx_vectorizer to the shared_preload_libraries parameter.
Specify the maximum parallelism for vectorization processing in the pgx_vectorizer.max_vectorize_worker parameter.
Add the following value to the value of the max_worker_processes parameter:
number of databases in which pgx_vectorizer is enabled + pgx_vectorizer.max_vectorize_worker + 2
If you have changed the installation destination of the Fujitsu Enterprise Postgres Server feature to a location other than the standard installation destination, specify the following for the pgx_vectorizer.pgai_worker_path parameter.
<Fujitsu Enterprise Postgres server feature Installation Directory>/OSS/pgai-worker/bin/pgai
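As a worked example of the max_worker_processes calculation above, the following sketch computes the new value and prints the resulting postgresql.conf lines. All numbers are hypothetical: it assumes the current max_worker_processes is 8 (the PostgreSQL default), two databases that will use pgx_vectorizer, and a pgx_vectorizer.max_vectorize_worker of 4.

```shell
# Hypothetical example values; replace them with your own.
current_max_worker_processes=8   # assumed current setting (PostgreSQL default)
vectorizer_databases=2           # databases in which pgx_vectorizer will be enabled
max_vectorize_worker=4           # planned pgx_vectorizer.max_vectorize_worker

# Value to add: number of databases + pgx_vectorizer.max_vectorize_worker + 2
additional=$((vectorizer_databases + max_vectorize_worker + 2))
new_max_worker_processes=$((current_max_worker_processes + additional))

# Print the lines as they could appear in postgresql.conf.
# If shared_preload_libraries already lists other libraries, append
# pgx_vectorizer to that list instead of replacing it.
printf "shared_preload_libraries = 'pgx_vectorizer'\n"
printf "pgx_vectorizer.max_vectorize_worker = %d\n" "$max_vectorize_worker"
printf "max_worker_processes = %d\n" "$new_max_worker_processes"
```

With these assumed values, 2 + 4 + 2 = 8 is added to the existing 8, giving max_worker_processes = 16.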
Execute CREATE EXTENSION for the database that will use this feature.
Adding the CASCADE option will also enable the dependent pgai, pgvector, and plpython3u at the same time.
After CREATE EXTENSION, execute the start_vectorize_scheduler function to start the vectorize scheduler.
Example) Connecting to the database "rag_database" using the psql command
rag_database=# CREATE EXTENSION IF NOT EXISTS pgx_vectorizer CASCADE;
CREATE EXTENSION
rag_database=# SELECT pgx_vectorizer.start_vectorize_scheduler(); -- Starting the vectorize scheduler
After enabling the extended features, update the postgresql.conf parameter settings using commands such as pg_ctl reload.
Create a database user that the automatic vectorization feature uses to perform vectorization in the background, and register it as the vectorization user. Specify a password so that password authentication can be used.
CREATE ROLE <worker_user> PASSWORD '<worker password>' … LOGIN;
SELECT pgx_vectorizer.set_worker_setting('user', 'VECTORIZE_USER', '<worker_user>');
This database user will connect to Fujitsu Enterprise Postgres as an application, so modify the pg_hba.conf file for client authentication. Set it so that the database user created above can connect to the database that uses the pgx_vectorizer function from localhost using password authentication.
host <ai-database> <worker_user> 127.0.0.1/32 scram-sha-256
host <ai-database> <worker_user> ::1/128 scram-sha-256
The background vectorization process runs with the privileges of the OS user that starts Fujitsu Enterprise Postgres. The password that the above database user needs to connect to Fujitsu Enterprise Postgres is read from the password file of that OS user, so specify in the password file the information that the vectorization process requires to connect. The password file in the default location is used. For details about the password file, refer to "The Password File" in the PostgreSQL Documentation.
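The following sketch shows one way the password file entry could be added. Run it as the OS user that starts Fujitsu Enterprise Postgres. The role name, password, and database name are hypothetical placeholders. The host and port fields are wildcarded only because this section does not state which values the vectorization process uses; narrow them if you know them.

```shell
# Hypothetical placeholder values; replace them with your own.
database="rag_database"
worker_user="vectorize_worker"
worker_password="change-me"

# Default password file location of the OS user that starts the instance.
pgpass_file="$HOME/.pgpass"

# Entry format: hostname:port:database:username:password
# "*" matches any host and any port (an assumption made for this sketch).
# A ":" or "\" inside a field must be escaped with a backslash.
entry="*:*:${database}:${worker_user}:${worker_password}"

# Create the file if needed and restrict its permissions before writing
# the password; the file is ignored if group or others can access it.
touch "$pgpass_file"
chmod 600 "$pgpass_file"
printf '%s\n' "$entry" >> "$pgpass_file"
```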