Users with system administrator privileges for this feature can request that imported models be loaded by executing the pgx_load_model function.
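A load request might look like the following sketch. The model name 'my_embedding_model' is a placeholder for a model you have already imported, and the exact argument list of pgx_load_model is an assumption here; check the function reference for your version.

```sql
-- Request that an imported model be loaded into the inference process.
-- 'my_embedding_model' is a placeholder; substitute the name of your imported model.
SELECT pgx_load_model('my_embedding_model');
```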
When a model load is requested, the load launcher running in the database loads the specified model file into the inference process. At this time, the model file is written to the directory set in the pgx_inference.triton_model_repository_path parameter.
When the pgx_unload_model function is executed, it requests that the model specified in the argument be unloaded. As with loading, the load launcher unloads the requested model from the inference server. Once the model is unloaded, vectorization and semantic text search using that model can no longer be performed.
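An unload request follows the same pattern as loading. As before, the model name is a placeholder and the exact signature of pgx_unload_model may differ in your environment:

```sql
-- Request that a previously loaded model be unloaded from the inference server.
-- After this completes, vectorization and semantic text search with this model fail.
SELECT pgx_unload_model('my_embedding_model');
```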
To check whether a model is available after executing the pgx_load_model function or the pgx_unload_model function, query the pgx_triton_model_status view. If the model is unavailable, the reason is displayed in the reason column of that view.
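A status check could be sketched as follows. The source confirms the view name and the reason column; the model_name column used to filter is an assumption, as is the placeholder model name:

```sql
-- Inspect the model's status after a load or unload request.
-- If the model is unavailable, the reason column explains why.
SELECT *
FROM pgx_triton_model_status
WHERE model_name = 'my_embedding_model';
```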
Models that have been requested to be loaded are automatically reloaded after an instance restart. Load requests also propagate to all standby servers, and the models are loaded on each of them; unload requests are likewise synchronized to all standby servers.