Optimal Topology

VSS Deployment Topologies

VSS supports several deployment topologies, each optimized for different GPU types and performance requirements. The right choice depends on your hardware configuration, in particular the GPU model and the amount of device memory per GPU.

Default Topology

The default topology dedicates four GPUs to the LLM NIM, two GPUs to the VSS ingestion and retrieval pipeline, and one GPU each to the NeMo embedding and reranking NIMs, for eight GPUs in total. This topology is designed for systems where a single GPU is not large enough to host multiple NIMs, for example, systems with L40S GPUs.

For details on the default topology configuration, refer to Default Deployment Topology and Models in Use.
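The default layout can be pictured as a mapping from each component to a disjoint set of GPUs. The following is a minimal sketch, assuming a single eight-GPU node; the component names and device indices (0 to 7) are hypothetical placeholders chosen for illustration, not the identifiers used by the actual VSS deployment configuration. It only checks that no GPU is shared between components and counts how many GPUs the layout needs.

```python
# Hypothetical component names and device indices for an 8-GPU node.
# The real assignment is defined in the VSS deployment configuration.
DEFAULT_TOPOLOGY: dict[str, list[int]] = {
    "llm_nim": [0, 1, 2, 3],               # four GPUs for the LLM NIM
    "vss_ingestion_retrieval": [4, 5],     # two GPUs for ingestion and retrieval
    "embedding_nim": [6],                  # one dedicated GPU for embedding
    "reranking_nim": [7],                  # one dedicated GPU for reranking
}


def validate_dedicated(topology: dict[str, list[int]]) -> int:
    """Raise if any GPU is assigned to two components; return the GPU count."""
    seen: set[int] = set()
    for component, gpus in topology.items():
        overlap = seen.intersection(gpus)
        if overlap:
            raise ValueError(f"{component} reuses GPUs {sorted(overlap)}")
        seen.update(gpus)
    return len(seen)


print(validate_dedicated(DEFAULT_TOPOLOGY))  # 8
```

Because every component owns its GPUs exclusively, the node must provide the full sum of all assignments, which is why this layout suits GPUs with less device memory per card.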

Shared GPU Topology

For high-performance GPUs with 80 GB or more of device memory, such as H100, H200, or A100 (80 GB), there is no need to dedicate individual GPUs to the embedding and reranking NIMs. On these systems, use the GPU-sharing topology instead for better GPU utilization and higher throughput.

For configuration details, refer to Optional Deployment Topology with GPU Sharing.
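The selection guidance in this section can be summarized as a simple decision rule. The sketch below assumes that the GPU model is known as a plain string such as "H100" and that the per-GPU memory is given in GB; the function name `choose_topology` and the returned labels are hypothetical and are not part of any VSS tool or API. It encodes only the rule stated here: H100, H200, or A100 with at least 80 GB of device memory selects the GPU-sharing topology, and everything else falls back to the default topology.

```python
# Hypothetical helper that encodes the guidance from this page; it is not a VSS API.
HIGH_MEMORY_GPUS = {"H100", "H200", "A100"}
MIN_SHARED_MEMORY_GB = 80  # threshold stated in the documentation


def choose_topology(gpu_model: str, memory_gb: float) -> str:
    """Return "shared" for high-memory H100/H200/A100 GPUs, else "default"."""
    if gpu_model.upper() in HIGH_MEMORY_GPUS and memory_gb >= MIN_SHARED_MEMORY_GB:
        return "shared"
    return "default"


print(choose_topology("H100", 80))   # shared
print(choose_topology("A100", 40))   # default (below the 80 GB threshold)
print(choose_topology("L40S", 48))   # default
```

The memory check matters because the A100 also ships in a 40 GB variant, which falls below the 80 GB threshold stated above.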

Note

For optimal performance on H100, H200, or A100 GPUs, always use the GPU-sharing topology. The default topology might not fully utilize the capabilities of these high-performance GPUs.