Debuggability Guide#

Overview#

NVIDIA Cloud Functions provides comprehensive debuggability features through two main approaches:

Real-Time Logs
- Access near real-time logs for faster debugging
- Available through both NGC UI and CLI
- No long-term storage; logs are ephemeral and exist only during the workload lifecycle
- Significantly reduced latency compared to traditional logging solutions
Remote Command Execution
- Execute commands on function or task containers for debugging purposes
- Support for common Linux commands in NGC CLI
- Secure, controlled access to container environments

Real-Time Logs#

Real-time logs allow you to view function or task logs with minimal latency, providing immediate feedback during development and troubleshooting.

Key Benefits#

Immediate Feedback: View logs in near real-time, reducing debugging cycles
Reduced Latency: Significantly faster than historical log solutions
Multiple Access Methods: Available through both NGC UI and CLI interfaces

Getting Started#

Real-time logs are accessible for deployed NVIDIA Cloud Functions.

Access Logs via NGC UI
- Navigate to your function in the NGC UI
- Click the 3-dots button next to an active function
- Select “View Logs” or “View Version Logs”
- In the Logs page, you’ll see two tabs:
  - History Logs: For historical log analysis across different function version instances
  - Live Tail Logs: For near real-time log streaming
Using Live Tail Logs in NGC UI
- Select the “Live Tail Logs” tab
- Choose the Cluster name and Instance ID
- Click “Start Session” to begin viewing live logs
- Use “Pause Session” to temporarily halt the log stream
- Use “Resume Session” to continue viewing logs
- Click “Stop Session” to end the streaming session
- Filter logs using the search box for quick identification of specific events

Note

Real-time logs are available after a function instance is actively running and a real-time logging session has begun. Once an instance terminates or restarts, these logs are no longer accessible. For historical log analysis, use the History Logs tab.
Live Tail logs are not stored and cannot be replayed after a session ends or after the 50,000-line buffer is exceeded.
Currently, live tail logs are only supported for functions deployed to GFN and DGXC cloud environments (note that not all GFN and DGXC environments may be supported).
Live tail logs for tasks are not yet supported; support is planned.
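
The bounded buffer described in this note can be pictured with standard tools. The sketch below is illustrative only: `keep_last` is a hypothetical helper name, and it models a fixed-size line buffer with `tail -n`. It does not describe how the NGC UI implements its own buffer.

```shell
# Hypothetical helper: keep only the newest N lines of a stream read from stdin,
# mimicking a fixed-size console buffer (older lines are dropped and cannot be replayed).
keep_last() {
  tail -n "$1"
}

# Example: a 5-line stream passed through a 3-line buffer keeps only lines 3 to 5.
printf 'line1\nline2\nline3\nline4\nline5\n' | keep_last 3
```

With a limit of 50,000, the same behavior applies: once the stream is longer than the buffer, the oldest lines are gone, which is why History Logs are the place to look for older output.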

Remote Command Execution#

Remote command execution lets you run commands directly in the container environment of your function or task for advanced debugging. What you can do depends on your own container image: for example, if the container is distroless, you may not be able to access the file system of the target function or task container. Commands run with the root directory of the target container as the default working directory.
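
Because commands start in the root directory of the container, it can help to build the command string so that it changes directory first. The sketch below is a minimal illustration: `in_dir` is a hypothetical helper, `/app` is an assumed application directory, and the approach relies on `cd` and `&&` both being in the supported command set.

```shell
# Hypothetical helper: build a command string that changes into a directory
# before running a command, using only `cd` and `&&`.
in_dir() {
  # $1 = directory inside the container, $2 = command to run there
  printf 'cd %s && %s\n' "$1" "$2"
}

# /app is an assumed path, not one defined by NVCF.
in_dir /app "ls -la"
```

The resulting string is what you would pass as the value of `--command`. Using absolute paths in the command itself is an equally valid alternative.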

Key Benefits#

Interactive Debugging: Execute commands for troubleshooting without redeploying
Container Inspection: Examine file systems, processes, and environment variables
Secure Access: Commands are executed in a controlled, secure environment
Distroless Support: Debug containers with minimal operating system components

Getting Started#

View Available Instances
- Navigate to your function or task in the NGC UI or use the CLI
- Use the CLI to list instances:

# Function
ngc cf fn instance ls <function-id>:<version-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org

# Task
ngc cf task instance ls <task-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org

Execute Commands via NGC CLI

Syntax#

# Function
ngc cf fn instance exec <function-id>:<version-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org
--instance-id <instance-id> #Instance ID
--pod-name <pod-name> #Pod name used
--container-name <container-name> #Container name used
--command "<linux-command>" #linux command to be executed

# Task
ngc cf task instance exec <task-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org
--instance-id <instance-id> #Instance ID
--pod-name <pod-name> #Pod name used
--container-name <container-name> #Container name used
--command "<linux-command>" #linux command to be executed

Example#

# Function
ngc cf fn instance exec my-function:v1 \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main \
  --command "ls -la"

# Task
ngc cf task instance exec my-task \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main \
  --command "ls -la"
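
When the same organization and team are used repeatedly, the long flag list can be wrapped in a small shell function. The following is a sketch under stated assumptions: the function name `nvcf_fn_exec` and the variables `NVCF_ORG`, `NVCF_TEAM`, and `DRY_RUN` are hypothetical and are not part of the NGC CLI. Only the flags shown in the syntax above are used.

```shell
# Hypothetical wrapper around `ngc cf fn instance exec`.
# Arguments: <function-id>:<version-id> <instance-id> <pod-name> <container-name> <command>
# NVCF_ORG and NVCF_TEAM are assumed variables set by the caller.
nvcf_fn_exec() {
  # Rebuild the positional parameters as the full command line.
  set -- ngc cf fn instance exec "$1" \
    --org "$NVCF_ORG" --team "$NVCF_TEAM" \
    --instance-id "$2" --pod-name "$3" --container-name "$4" \
    --command "$5"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print instead of running; quoting of the --command value is not shown.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Dry run: prints the command that would be executed without calling the CLI.
NVCF_ORG=my-organization
NVCF_TEAM=my-team
DRY_RUN=1
nvcf_fn_exec my-function:v1 instance-1 pod-1234 main "ls -la"
```

Setting `DRY_RUN=0` (or leaving it unset) runs the real command. A task variant would follow the same pattern with `ngc cf task instance exec`.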

NGC CLI Requirements and Examples#

CLI Version Requirements#

The debuggability features are only available in NGC CLI versions 3.131.5 and newer.
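
A script that depends on these features can guard against an older CLI by comparing version strings. The sketch below is a generic three-component comparison: `version_ge` is a hypothetical helper, and how you obtain the installed version string is left out because the CLI's version output format is not covered here.

```shell
# Hypothetical helper: return 0 if dotted version $1 is greater than or equal
# to $2. Both must have exactly three numeric components (major.minor.patch).
version_ge() {
  old_ifs=$IFS
  IFS=.
  # Split both versions on dots: $1..$3 hold the first, $4..$6 the second.
  set -- $1 $2
  IFS=$old_ifs
  [ "$1" -gt "$4" ] && return 0
  [ "$1" -lt "$4" ] && return 1
  [ "$2" -gt "$5" ] && return 0
  [ "$2" -lt "$5" ] && return 1
  [ "$3" -ge "$6" ]
}

# Example checks against the minimum version stated above.
version_ge 3.131.5 3.131.5 && echo "3.131.5 is new enough"
version_ge 3.41.2 3.131.5 || echo "3.41.2 is too old"
```

Comparing each component numerically avoids the mistake of a plain string comparison, which would rank 3.41.2 above 3.131.5.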

Detailed CLI Examples#

List Function Instances, Containers, and Pods

Syntax#

# Function
ngc cf fn instance ls <function-id>:<version-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org

# Task
ngc cf task instance ls <task-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org

Example#

# Function
ngc cf fn instance ls my-function:v1 \
  --org my-organization \
  --team my-team

# Task
ngc cf task instance ls my-task \
  --org my-organization \
  --team my-team

Execute Commands on Target Containers

Syntax#

# Function
ngc cf fn instance exec <function-id>:<version-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org
--instance-id <instance-id> #Instance ID
--pod-name <pod-name> #Pod name used
--container-name <container-name> #Container name used
--command "<linux-command>" #linux command to be executed

# Task
ngc cf task instance exec <task-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org
--instance-id <instance-id> #Instance ID
--pod-name <pod-name> #Pod name used
--container-name <container-name> #Container name used
--command "<linux-command>" #linux command to be executed

Example#

# Function
ngc cf fn instance exec my-function:v1 \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main \
  --command "ls -la"

# Task
ngc cf task instance exec my-task \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main \
  --command "ls -la"

Attach Log Output from a Specific Pod Container

Syntax#

# Function
ngc cf fn instance logs <function-id>:<version-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org
--instance-id <instance-id> #Instance ID
--pod-name <pod-name> #Pod name used
--container-name <container-name> #Container name used

# Task
ngc cf task instance logs <task-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org
--instance-id <instance-id> #Instance ID
--pod-name <pod-name> #Pod name used
--container-name <container-name> #Container name used

Example#

# Function
ngc cf fn instance logs my-function:v1 \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main

# Task
ngc cf task instance logs my-task \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main

Attach Log Output from an Entire Instance

Syntax#

# Function
ngc cf fn instance logs <function-id>:<version-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org
--instance-id <instance-id> #Instance ID

# Task
ngc cf task instance logs <task-id>
--org <org-id> #NGC Organization ID
--team <team-name> #Team name in an org
--instance-id <instance-id> #Instance ID

Example#

# Function
ngc cf fn instance logs my-function:v1 \
  --org my-organization \
  --team my-team \
  --instance-id instance-1

# Task
ngc cf task instance logs my-task \
  --org my-organization \
  --team my-team \
  --instance-id instance-1
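
Since live logs are not stored, one option is to keep a local copy while watching for specific events. The sketch below assumes the `logs` command writes log lines to standard output so that it can be piped; that behavior is an assumption, not something stated in this guide. `capture_and_filter` is a hypothetical helper name.

```shell
# Hypothetical helper: append every line from stdin to a local file while
# printing only the lines that match a case-insensitive pattern.
capture_and_filter() {
  # $1 = file that receives the full stream, $2 = pattern to display
  tee -a "$1" | grep -i -- "$2"
}

# Demonstration with a stand-in stream instead of the real CLI.
log_file=$(mktemp)
printf 'INFO started\nERROR disk full\nINFO done\n' | capture_and_filter "$log_file" error
```

In practice the stand-in `printf` would be replaced by one of the `ngc cf fn instance logs` or `ngc cf task instance logs` invocations shown above. The full stream ends up in the file, and only matching lines appear on screen.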

Supported Commands#

The following commands are supported for remote execution:

Command/Method	Description
cat	Display file contents
ls	List directory contents
cd	Change directory
pwd	Print working directory
man	Display manual pages
sort	Sort lines of text files
df	Report file system disk space usage
du	Estimate file space usage
grep	Search for patterns in files
find	Search for files
head	Display beginning of files
more	Page through text
less	Page through text with more features
tail	Display end of files
wc	Print newline, word, and byte counts
cut	Remove sections from lines
echo	Display a line of text
printf	Format and print data
print	Print data
ps	Report process status
base64	Base64 encode/decode
Pipe (|)	Pipe output
Input redirect (<)	Redirect input
Command separator (;)	Separate commands
Command chaining (&&)	Chain commands

Note

The command execution environment is isolated and has no impact on the function’s running state. Command execution is logged for security and audit purposes.
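
Before sending a command, a local pre-check against the supported-commands table can catch obvious mistakes. The sketch below is a naive client-side check only: `is_supported` and `SUPPORTED` are hypothetical names, and the actual validation performed by the service is not described in this guide and may differ.

```shell
# Command names copied from the supported-commands table.
SUPPORTED="cat ls cd pwd man sort df du grep find head more less tail wc cut echo printf print ps base64"

# Hypothetical pre-check: return 0 if every command in the line is in SUPPORTED.
# Naive by design: it splits on |, ; and & without understanding quotes, so an
# operator inside a quoted argument is treated as a separator.
is_supported() {
  segments=$(printf '%s\n' "$1" | tr '|;&' '\n\n\n')
  status=0
  while read -r first _rest; do
    [ -z "$first" ] && continue        # skip empty segments, e.g. inside "&&"
    case " $SUPPORTED " in
      *" $first "*) ;;                 # command name is in the list
      *) echo "unsupported command: $first" >&2; status=1 ;;
    esac
  done <<EOF
$segments
EOF
  return "$status"
}

is_supported "ls -la /tmp | grep log && wc -l" && echo "accepted"
is_supported "rm -rf /tmp/cache" || echo "rejected"
```

Only the first word of each segment is checked, which matches the idea of a limited command set but does not inspect arguments or options.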

Security#

NVCF ensures secure debugging capabilities:

Authentication and authorization for all debugging actions
Container isolation prevents unauthorized access
Limited command set to prevent system modifications
Access control based on NGC permissions
All debugging actions are logged and auditable

Troubleshooting#

Common Error Codes#

Error Code	Description	Possible Resolution
400 (BadRequestException)	Function/Task is inactive or invalid parameters provided	Ensure function/task is active and parameters are correct
401 (NotAuthorizedException)	Invalid authentication token	Check that your NGC API key or SSA token is valid
403 (ForbiddenException)	Insufficient permissions or function/task does not exist	Verify that your token has the appropriate scopes and the function/task exists
404 (NotFoundException)	Selected pod/container/instance does not exist	Verify that the specified resources exist and are correctly named
429 (TooManyRequestsException)	Rate limit exceeded	Reduce the frequency of requests and try again later
500 (UpstreamException)	Internal service error	Contact support if the issue persists
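
In scripts, the table above can be turned into a lookup so that a failing call prints the suggested resolution. This is a minimal sketch: `nvcf_error_hint` is a hypothetical helper, and how the status code is extracted from the CLI output is left to the caller.

```shell
# Hypothetical helper: map an HTTP status code from the table to its resolution.
# Returns 1 for codes that are not listed.
nvcf_error_hint() {
  case "$1" in
    400) echo "Ensure function/task is active and parameters are correct" ;;
    401) echo "Check that your NGC API key or SSA token is valid" ;;
    403) echo "Verify that your token has the appropriate scopes and the function/task exists" ;;
    404) echo "Verify that the specified resources exist and are correctly named" ;;
    429) echo "Reduce the frequency of requests and try again later" ;;
    500) echo "Contact support if the issue persists" ;;
    *)   echo "Unknown status code: $1"; return 1 ;;
  esac
}

nvcf_error_hint 404
```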

Required Permissions#

To use the debuggability features, ensure your NGC API key has the correct permissions:

When generating an NGC API key from the NGC console, select the “Cloud Function” permission
This permission grants the necessary access to use both Live Tail Logs and Command Execution features

Limitations#

Real-time logs are ephemeral with no long-term storage
Historical logs are still available through the standard logging system
Command execution is limited to a predefined set of commands
Debugging sessions have a maximum duration of 2 hours
Output size is limited to 2 MB per command
Live tail logs are only supported for functions deployed to GFN and DGXC cloud environments
Live tail logs view maintains a maximum of 50,000 lines in the console buffer
Real-time logs cannot be searched in aggregate across functions (e.g., searching for a string across all functions in an organization)

Appendix A: Terminology#

Term	Definition
NGC	NVIDIA GPU Cloud, which provides a way for users to set up and manage access to NVIDIA cloud services
NVCF	NVIDIA Cloud Functions
NVCT	NVIDIA Cloud Tasks
Ephemeral Container	A temporary container created within a pod for debugging purposes
Real-time Logs	Logs streamed with minimal latency during function execution
DGXC	DGX Cloud service
History Logs	Logs stored for longer-term analysis with search capabilities
Live Tail Logs	Near real-time streaming logs with minimal latency
Distroless Container	A container image with minimal operating system components