NVIDIA NeMo

NVIDIA NeMo is a modular suite of APIs and libraries that helps developers manage the AI agent lifecycle: building, deploying, and optimizing AI agents at scale.

Easy-to-use containerized APIs for data preparation, model customization, evaluation, guardrailing, and continuous optimization of AI agents.
Flexible open-source framework for end-to-end training and development of generative AI models, scaling seamlessly from a single GPU to multi-node clusters.
Open-source toolkit for evaluation-based development and optimization of agentic systems.

NVIDIA NeMo Microservices

A modular collection of containerized services exposed via intuitive APIs that enables developers to seamlessly integrate NeMo into existing platforms.

Build high-quality, use-case-specific datasets with fast previews, built-in evaluations, and scalable workflows.
Fine-tune language models with your proprietary data to build domain-specific AI agents (see the request sketch after this list).
Benchmark and monitor model and agent effectiveness with standard and custom metrics, including LLM-as-a-judge.
Build high-accuracy retrieval-augmented generation (RAG) pipelines with open-source models and privacy-preserving data access.
Add safety, policy, and topical control to model responses.
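
Because each microservice is exposed as a plain HTTP API, a customization job can be launched from any HTTP client. The sketch below is a minimal illustration in Python; the host, the /v1/customization/jobs route, and the payload fields are assumptions modeled on typical NeMo Customizer deployments, not a definitive contract, so check your deployment's API reference.

    # Minimal sketch: submit a fine-tuning job to a NeMo Customizer microservice.
    # The base URL, route, and payload fields are illustrative assumptions.
    import requests

    CUSTOMIZER_URL = "http://nemo-customizer.example.com"  # hypothetical host

    job = {
        "config": "meta/llama-3.1-8b-instruct",     # base model to customize (assumed field)
        "dataset": {"name": "support-tickets-v1"},  # previously uploaded dataset (assumed field)
        "hyperparameters": {                        # assumed field names
            "training_type": "sft",
            "finetuning_type": "lora",
            "epochs": 3,
        },
    }

    resp = requests.post(f"{CUSTOMIZER_URL}/v1/customization/jobs", json=job, timeout=30)
    resp.raise_for_status()
    print("job id:", resp.json().get("id"))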

Develop multimodal generative AI models with the open-source NeMo Framework.

NVIDIA NeMo Framework

A modular open-source Python framework for large-scale pretraining, post-training, and reinforcement learning of multimodal generative AI models.

Clean, filter, and prepare multimodal data with a GPU-accelerated Python library.
Align models with a scalable post-training library that integrates Hugging Face and Megatron optimizations.
Evaluate model performance with streamlined deployment, benchmark support, and advanced harnesses.
Train natively with accelerated PyTorch and fine-tune Hugging Face models on Day 0.
Train and fine-tune large models using Megatron-Core parallelism with a PyTorch-native training loop.
Add programmable safety, control, and compliance to LLM and agentic systems.
Configure, execute, and track training or evaluation jobs across local, on-prem, and cloud clusters.
Export and deploy models to production using TensorRT, TensorRT-LLM, vLLM engines, and Triton backends.
Develop vision foundation models with a PyTorch-native training loop powered by both Megatron-Core and PyTorch backends.
Extend LLM capabilities with reference pipelines for synthetic data generation, training, and benchmark evaluation.
Train and deploy speech AI models, including ASR and TTS, with export support to NVIDIA Riva (a minimal transcription sketch follows this list).
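
As a taste of the framework's Python surface, the sketch below loads a pretrained ASR checkpoint and transcribes an audio file. The audio path is a placeholder, and exact signatures vary across NeMo releases, so treat this as a shape rather than a guarantee.

    # Minimal sketch: transcribe speech with a pretrained NeMo ASR model.
    # The audio path is a placeholder; API details vary by release.
    import nemo.collections.asr as nemo_asr

    # Download a pretrained Conformer-CTC checkpoint (published model name).
    asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")

    # Transcribe a local 16 kHz mono WAV file (hypothetical path).
    transcripts = asr_model.transcribe(["sample.wav"])
    print(transcripts[0])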

Monitor and optimize the performance of AI agents and multi-agent systems.

Build, profile, evaluate, and optimize agentic systems with an open-source, framework-agnostic observability toolkit, as illustrated below.
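
The toolkit's core job is tracing and timing every step an agent takes. As a framework-agnostic illustration of that idea (plain Python, not the toolkit's actual API), the sketch below wraps each step of a toy agent loop in a timing span and prints the collected trace.

    # Illustration only: time each step of a toy agent loop with nested spans.
    # This is plain Python, not the NeMo Agent Toolkit API.
    import time
    from contextlib import contextmanager

    trace = []  # collected spans: (name, seconds)

    @contextmanager
    def span(name):
        start = time.perf_counter()
        try:
            yield
        finally:
            trace.append((name, time.perf_counter() - start))

    def plan(query):        # hypothetical stand-in for a planner component
        time.sleep(0.01)
        return ["search", "summarize"]

    def execute(step):      # hypothetical stand-in for a tool call
        time.sleep(0.02)
        return f"result of {step}"

    with span("agent_run"):
        with span("plan"):
            steps = plan("quarterly revenue")
        for s in steps:
            with span(f"tool:{s}"):
                execute(s)

    for name, secs in trace:  # a profiler would aggregate spans like these
        print(f"{name:>12}: {secs * 1000:.1f} ms")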

Reference workflows with code, models, and deployment guides that help developers quickly build and scale AI solutions.

Build a custom deep researcher powered by state-of-the-art models that continuously process and synthesize multimodal enterprise data, enabling reasoning, planning, and refinement to generate comprehensive reports.
Build a data flywheel to continuously optimize AI agents for latency, cost, and accuracy using automated data curation, evaluation, and fine-tuning with NeMo microservices.
Continuously extract, embed, and index multimodal data for fast, accurate semantic search using NeMo Retriever models, as sketched below.
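
NeMo Retriever embedding models are typically served behind an OpenAI-compatible endpoint, so indexing reduces to batched embedding calls. A minimal sketch, assuming a locally hosted embedding NIM on localhost:8000 and the nvidia/nv-embedqa-e5-v5 model; base URL, API key, and model name are deployment-specific.

    # Minimal sketch: embed passages with a NeMo Retriever embedding NIM via
    # its OpenAI-compatible API. Endpoint and model name are assumptions.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used-locally")

    docs = [
        "NeMo is a suite of libraries for building AI agents.",
        "NeMo Retriever models power semantic search pipelines.",
    ]

    resp = client.embeddings.create(
        model="nvidia/nv-embedqa-e5-v5",       # assumed retriever model name
        input=docs,
        extra_body={"input_type": "passage"},  # passage vs. query embeddings
    )
    vectors = [d.embedding for d in resp.data]  # store these in a vector index
    print(len(vectors), "vectors of dimension", len(vectors[0]))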

Deploy and manage AI workloads as scalable, performance-optimized services to seamlessly power enterprise-grade AI agents in production.

Containerized microservices for secure, performant, and reliable deployment of AI models anywhere (see the client sketch after this list).
Kubernetes-native operator for automating deployment, scaling, and lifecycle management of NIM and NeMo microservices.
Reference architectures standardizing hardware, networking, and software to build scalable, secure, and high-performance AI infrastructure for production setups.
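
A deployed NIM exposes an OpenAI-compatible endpoint, so existing client code runs against it unchanged. A minimal sketch, assuming a chat NIM serving meta/llama-3.1-8b-instruct on localhost:8000; the port and model name depend on how the container was launched.

    # Minimal sketch: query a locally deployed NIM through its OpenAI-compatible
    # API. Port and model name are deployment-specific assumptions.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used-locally")

    reply = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # model served by this NIM
        messages=[{"role": "user", "content": "Summarize NVIDIA NeMo in one line."}],
        max_tokens=64,
    )
    print(reply.choices[0].message.content)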