This feature is provided as an extension called pgx_inference. It works together with pgx_vectorizer, which provides automatic vectorization and semantic text search. To perform semantic text search and automatic vectorization using a model imported into the database, set up pgx_vectorizer in addition to this feature. For instructions on setting up pgx_vectorizer, refer to "3.2.2.2 Setting Up pgx_vectorizer".
Set the following parameters.
shared_preload_libraries
Add pgx_inference. With this setting, the load launcher is started internally when the database instance starts.
max_worker_processes
Add the number of databases in which this feature will be enabled, plus 1, to the current value. This is the number of worker processes that this feature starts.
pgx_inference.triton_model_repository_path
Set the path of the model repository specified when setting up the inference server. Only a path on the machine where the database is running can be set. Because model files are generated in and deleted from this directory during loading, it is recommended to prepare a dedicated directory for this feature.
pgx_inference.triton_grpc_port
This is the port number used when sending requests to the inference server via gRPC. Set the gRPC port number specified during the setup of the inference server.
pgx_inference.triton_ort_extensions_library_filename
Specify the absolute path of the shared library (onnxruntime-extensions) on the server where the Triton Inference Server is running.
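As an illustration of the parameters above, a postgresql.conf fragment might look like the following. This is a minimal sketch, not a recommended configuration: the directory and library paths are hypothetical, the max_worker_processes value assumes the PostgreSQL default of 8 and two databases using this feature (8 + 2 + 1 = 11), and port 8001 assumes the inference server was left on Triton's default gRPC port.

```
# Append pgx_inference to any libraries already listed here.
shared_preload_libraries = 'pgx_inference'

# Assumed: default of 8, plus 2 databases using this feature, plus 1.
max_worker_processes = 11

# Hypothetical dedicated directory on the database machine.
pgx_inference.triton_model_repository_path = '/var/lib/pgx_inference/models'

# Assumed: the gRPC port given when the inference server was set up (Triton default is 8001).
pgx_inference.triton_grpc_port = 8001

# Hypothetical absolute path on the server where the Triton Inference Server runs.
pgx_inference.triton_ort_extensions_library_filename = '/opt/onnxruntime-extensions/libortextensions.so'
```

Changes to shared_preload_libraries and max_worker_processes take effect only after the database instance is restarted.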
If mTLS authentication is configured on the inference server, set the following parameters.
pgx_inference.triton_use_ssl
Specify whether to enable SSL communication with the Triton Inference Server. If enabled, set the following three parameters.
pgx_inference.triton_grpc_root_certificates
Specify the path to the CA file for verifying the server certificate chain used in the gRPC connection.
pgx_inference.triton_grpc_certificate_chain
Specify the path of the client certificate chain used for the gRPC connection.
pgx_inference.triton_grpc_private_key
Specify the path of the client private key used for the gRPC connection.
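For an inference server with mTLS configured, the four parameters above could be combined as in the following sketch. The certificate and key paths are hypothetical, and writing the triton_use_ssl value as on assumes the parameter accepts a standard PostgreSQL boolean; check the parameter reference for the accepted values.

```
# Assumed boolean syntax; enables SSL communication with the Triton Inference Server.
pgx_inference.triton_use_ssl = on

# Hypothetical paths: CA file, client certificate chain, and client private key.
pgx_inference.triton_grpc_root_certificates = '/etc/pgx_inference/certs/ca.crt'
pgx_inference.triton_grpc_certificate_chain = '/etc/pgx_inference/certs/client.crt'
pgx_inference.triton_grpc_private_key = '/etc/pgx_inference/certs/client.key'
```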
Execute CREATE EXTENSION for the database that will use this feature.
CREATE EXTENSION pgx_inference CASCADE;
Start the load launcher as the user who executed CREATE EXTENSION.
SELECT pgx_inference.pgx_launch_load_launcher();
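After completing these steps, the setup can be spot-checked with standard PostgreSQL commands and catalog views. The following is a sketch of one possible check, not part of the documented procedure; it uses only SHOW, pg_extension, and pg_settings, and makes no assumption about pgx_inference's own functions.

```sql
-- Confirm that pgx_inference appears in the preloaded libraries.
SHOW shared_preload_libraries;

-- Confirm that the extension is installed in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pgx_inference';

-- List the pgx_inference parameters and their current values.
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'pgx_inference.%'
ORDER BY name;
```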