To monitor the state of the load launcher, refer to the pg_stat_activity view.
To check the total size of the model files loaded on the inference server, execute the pgx_get_total_model_size function. The status of the models loaded on the inference server can be checked in the pgx_triton_model_status view.
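As a sketch, these checks might look like the following; the text names only the function and the view, so the assumption here is that pgx_get_total_model_size takes no arguments and the view's column list is left to SELECT * (check the product reference for the actual signature and columns):

```sql
-- Total size of the model files loaded on the inference server
-- (assumes pgx_get_total_model_size takes no arguments).
SELECT pgx_get_total_model_size();

-- Status of each model loaded on the inference server.
-- Column names are product-defined, so SELECT * avoids guessing them.
SELECT * FROM pgx_triton_model_status;
```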
The logs output by the inference server and the PostgreSQL server logs are written separately. For the inference server, check the log file specified during its setup. To monitor vectorization and semantic text search performed by the imported model, refer to "3.6 Monitoring Vectorization Processing for Semantic Text Search" or "3.10 Performance Tuning of Semantic Text Search", and if a failure is suspected, check the server logs. Use the timestamps output in each log to correlate the PostgreSQL server logs with the inference server logs.
With this feature, a model imported into the database is managed on the Triton Inference Server under a name derived from the model name managed by Fujitsu Enterprise Postgres: the prefix "pgx_" is added, and the system identifier (hereinafter "SystemID"), a system-specific value generated when the database cluster is created, is appended as a suffix. For example, a model imported as "sample_model" into a database cluster with SystemID "7553888845569639366" is treated as "pgx_sample_model_7553888845569639366" on the Triton Inference Server. If an error message about a specific model is output in the Fujitsu Enterprise Postgres server log, check the Triton Inference Server log for entries about the model name with the "pgx_" prefix added.
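The naming rule above can be reproduced in SQL by concatenating the prefix, the model name, and the SystemID obtained from pg_control_system(); "sample_model" below is the example name from the text:

```sql
-- Derive the Triton-side name for a model imported as "sample_model":
-- prefix "pgx_" + model name + "_" + SystemID of this database cluster.
SELECT 'pgx_' || 'sample_model' || '_' || system_identifier AS triton_model_name
FROM pg_control_system();
```

For the example cluster in the text, this returns "pgx_sample_model_7553888845569639366".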
The SystemID of the database cluster can be checked with the following SQL.
SELECT system_identifier FROM pg_control_system();
Large model files are written to the directory set in the pgx_inference.triton_model_repository_path parameter. To prevent backups from becoming bloated, place this directory outside the database cluster.
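For example, in postgresql.conf (the path shown is illustrative; any directory outside the database cluster directory will do):

```
# Model repository placed outside the database cluster (PGDATA)
# so that backups do not include the large model files.
pgx_inference.triton_model_repository_path = '/srv/triton_models'
```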
For the memory-related GUC parameters, use the default values in principle. If other workloads should take priority over this feature, set the parameters to non-default values. If the machine has little memory, or if multiple system administrators have model loading privileges, loading models may exhaust memory and affect processes other than this feature. In that case, set the pgx_inference.total_model_size_limit parameter to a value other than unlimited to impose an upper limit on the total size of models that can be loaded.
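A minimal postgresql.conf sketch; the value and unit syntax accepted by pgx_inference.total_model_size_limit is an assumption here (the text states only that a value other than unlimited imposes an upper limit), so check the parameter reference for the actual syntax:

```
# Cap the total size of models that can be loaded on the inference server;
# the '8GB' value and unit notation are illustrative assumptions.
pgx_inference.total_model_size_limit = '8GB'
```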
See
For more information on managing the resources used by the inference server, refer to the official documentation of Triton Inference Server.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/onnxruntime_backend/README.html