Debuggability Guide

Overview

NVIDIA Cloud Functions (NVCF) provides debugging features through two main approaches:

  1. Real-Time Logs

    • Access near real-time logs for faster debugging

    • Available through both NGC UI and CLI

    • No long-term storage: logs are ephemeral and exist only during the workload lifecycle

    • Significantly reduced latency compared to traditional logging solutions

  2. Remote Command Execution

    • Execute commands on function or task containers for debugging purposes

    • Supports a predefined set of common Linux commands through the NGC CLI

    • Secure, controlled access to container environments

Real-Time Logs

Real-time logs allow you to view function or task logs with minimal latency, providing immediate feedback during development and troubleshooting.

Key Benefits

  • Immediate Feedback: View logs in near real-time, reducing debugging cycles

  • Reduced Latency: Significantly faster than historical log solutions

  • Multiple Access Methods: Available through both NGC UI and CLI interfaces

Getting Started

Real-time logs are accessible for deployed NVIDIA Cloud Functions.

  1. Access Logs via NGC UI

    • Navigate to your function in the NGC UI

    • Click the three-dot menu button next to an active function

    • Select “View Logs” or “View Version Logs”

    • In the Logs page, you’ll see two tabs:

      • History Logs: For historical log analysis across different function version instances

      • Live Tail Logs: For near real-time log streaming

  2. Using Live Tail Logs in NGC UI

    • Select the “Live Tail Logs” tab

    • Choose the Cluster name and Instance ID

    • Click “Start Session” to begin viewing live logs

    • Use “Pause Session” to temporarily halt the log stream

    • Use “Resume Session” to continue viewing logs

    • Click “Stop Session” to end the streaming session

    • Filter logs using the search box for quick identification of specific events

Note

  • Real-time logs are available after a function instance is actively running and a real-time logging session has begun. Once an instance terminates or restarts, these logs are no longer accessible. For historical log analysis, use the History Logs tab.

  • Live tail logs are not stored and cannot be replayed after a session ends or after the 50,000-line console buffer is exceeded.

  • Currently, live tail logs are only supported for functions deployed to GFN and DGXC cloud environments, and some GFN and DGXC environments may not be supported.

  • Live tail log support for tasks is coming soon.

Remote Command Execution

Remote command execution lets you run commands directly in a function’s or task’s container for advanced debugging. What you can do depends on your own container image: for example, if the container is distroless, you may not be able to access the target function or task container’s file system. Commands run with the root directory (/) of the target container as the default working directory.

Key Benefits

  • Interactive Debugging: Execute commands for troubleshooting without redeploying

  • Container Inspection: Examine file systems, processes, and environment variables

  • Secure Access: Commands are executed in a controlled, secure environment

  • Distroless Support: Debug containers with minimal operating system components, subject to the file system limitation noted above

Getting Started

  1. View Available Instances

    • Navigate to your function or task in the NGC UI or use the CLI

    • Use the CLI to list instances:

# Function
# --org: NGC organization ID; --team: team name within the org
ngc cf fn instance ls <function-id>:<version-id> \
  --org <org-id> \
  --team <team-name>

# Task
ngc cf task instance ls <task-id> \
  --org <org-id> \
  --team <team-name>

  2. Execute Commands via NGC CLI

Syntax

# Function
# --org: NGC organization ID; --team: team name within the org
# --instance-id, --pod-name, --container-name: target instance, pod, and container
# --command: Linux command to run in the target container
ngc cf fn instance exec <function-id>:<version-id> \
  --org <org-id> \
  --team <team-name> \
  --instance-id <instance-id> \
  --pod-name <pod-name> \
  --container-name <container-name> \
  --command "<linux-command>"

# Task
ngc cf task instance exec <task-id> \
  --org <org-id> \
  --team <team-name> \
  --instance-id <instance-id> \
  --pod-name <pod-name> \
  --container-name <container-name> \
  --command "<linux-command>"

Example

# Function
ngc cf fn instance exec my-function:v1 \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main \
  --command "ls -la"

# Task
ngc cf task instance exec my-task \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main \
  --command "ls -la"
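If you run the same exec call repeatedly, a small shell function can keep the flag order consistent. The sketch below is a hypothetical convenience wrapper: the name `nvcf_exec` and its positional argument order are our own and are not part of the NGC CLI. It only forwards its arguments to the documented `ngc cf fn instance exec` and `ngc cf task instance exec` commands.

```shell
# Hypothetical wrapper (not part of the NGC CLI) around the documented exec commands.
# Usage: nvcf_exec fn|task <target-id> <org> <team> <instance-id> <pod-name> <container-name> <command>
#   <target-id> is <function-id>:<version-id> for "fn" and <task-id> for "task".
nvcf_exec() {
  if [ "$#" -ne 8 ]; then
    echo "usage: nvcf_exec fn|task <target-id> <org> <team> <instance-id> <pod-name> <container-name> <command>" >&2
    return 2
  fi
  # Only the two documented target kinds are accepted.
  case $1 in
    fn|task) ;;
    *) echo "nvcf_exec: first argument must be 'fn' or 'task'" >&2; return 2 ;;
  esac
  # Forward everything to the CLI, quoting each value so spaces survive.
  ngc cf "$1" instance exec "$2" \
    --org "$3" \
    --team "$4" \
    --instance-id "$5" \
    --pod-name "$6" \
    --container-name "$7" \
    --command "$8"
}
```

For example, `nvcf_exec fn my-function:v1 my-organization my-team instance-1 pod-1234 main "ls -la"` issues the same call as the function example above.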

NGC CLI Requirements and Examples

CLI Version Requirements

The debuggability features are only available in NGC CLI versions 3.131.5 and newer.
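To check this requirement in a script, you can compare the installed version against 3.131.5. The sketch below is a hypothetical POSIX shell helper (`version_ge` and `MIN_NGC_CLI_VERSION` are our own names). It assumes purely numeric, dot-separated version strings and does not handle suffixes such as `rc1`.

```shell
# Minimum NGC CLI version for the debuggability features (from this guide).
MIN_NGC_CLI_VERSION=3.131.5

# version_ge A B: succeed (exit 0) if dotted version A >= B.
# Hypothetical helper; assumes numeric components only (e.g. 3.131.5).
version_ge() {
  _va=$1
  _vb=$2
  while [ -n "$_va" ] || [ -n "$_vb" ]; do
    # Take the leading component of each version; a missing one counts as 0.
    _ca=${_va%%.*}
    _cb=${_vb%%.*}
    _ca=${_ca:-0}
    _cb=${_cb:-0}
    if [ "$_ca" -gt "$_cb" ]; then return 0; fi
    if [ "$_ca" -lt "$_cb" ]; then return 1; fi
    # Drop the component just compared and move to the next one.
    case $_va in *.*) _va=${_va#*.} ;; *) _va= ;; esac
    case $_vb in *.*) _vb=${_vb#*.} ;; *) _vb= ;; esac
  done
  return 0  # all components are equal
}
```

To feed it the installed version, assuming `ngc --version` prints a line that ends with the version number (adjust the parsing if your output differs): `version_ge "$(ngc --version | awk '{print $NF}')" "$MIN_NGC_CLI_VERSION" || echo "Please upgrade the NGC CLI"`.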

Detailed CLI Examples

  1. List Function Instances, Containers, and Pods

Syntax

# Function
# --org: NGC organization ID; --team: team name within the org
ngc cf fn instance ls <function-id>:<version-id> \
  --org <org-id> \
  --team <team-name>

# Task
ngc cf task instance ls <task-id> \
  --org <org-id> \
  --team <team-name>

Example

# Function
ngc cf fn instance ls my-function:v1 \
  --org my-organization \
  --team my-team

# Task
ngc cf task instance ls my-task \
  --org my-organization \
  --team my-team

  2. Execute Commands on Target Containers

Syntax

# Function
# --org: NGC organization ID; --team: team name within the org
# --instance-id, --pod-name, --container-name: target instance, pod, and container
# --command: Linux command to run in the target container
ngc cf fn instance exec <function-id>:<version-id> \
  --org <org-id> \
  --team <team-name> \
  --instance-id <instance-id> \
  --pod-name <pod-name> \
  --container-name <container-name> \
  --command "<linux-command>"

# Task
ngc cf task instance exec <task-id> \
  --org <org-id> \
  --team <team-name> \
  --instance-id <instance-id> \
  --pod-name <pod-name> \
  --container-name <container-name> \
  --command "<linux-command>"

Example

# Function
ngc cf fn instance exec my-function:v1 \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main \
  --command "ls -la"

# Task
ngc cf task instance exec my-task \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main \
  --command "ls -la"
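For a first-pass inspection, you often want to run several read-only commands against the same container. The sketch below loops over a few commands from the supported list and prints a header before each result. The function name `collect_diagnostics` and the particular commands are illustrative assumptions. Tools such as `ps` may be missing from minimal or distroless images.

```shell
# Hypothetical helper: run a few read-only commands from the supported list
# against one function instance container, printing a header before each.
# Arguments: <function-id>:<version-id> <org> <team> <instance-id> <pod-name> <container-name>
collect_diagnostics() {
  if [ "$#" -ne 6 ]; then
    echo "usage: collect_diagnostics <function-id>:<version-id> <org> <team> <instance-id> <pod-name> <container-name>" >&2
    return 2
  fi
  # Each exec call starts in the container's root directory, so paths are absolute.
  for _cmd in "pwd" "ps aux" "df -h" "du -sh /tmp" "ls -la /"; do
    echo "=== $_cmd ==="
    ngc cf fn instance exec "$1" \
      --org "$2" \
      --team "$3" \
      --instance-id "$4" \
      --pod-name "$5" \
      --container-name "$6" \
      --command "$_cmd"
  done
}
```

For example: `collect_diagnostics my-function:v1 my-organization my-team instance-1 pod-1234 main`. Keeping the list to read-only commands means the loop only inspects the container and never tries to change it.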
  3. Attach Log Output from a Specific Pod Container

Syntax

# Function
# --org: NGC organization ID; --team: team name within the org
# --instance-id, --pod-name, --container-name: target instance, pod, and container
ngc cf fn instance logs <function-id>:<version-id> \
  --org <org-id> \
  --team <team-name> \
  --instance-id <instance-id> \
  --pod-name <pod-name> \
  --container-name <container-name>

# Task
ngc cf task instance logs <task-id> \
  --org <org-id> \
  --team <team-name> \
  --instance-id <instance-id> \
  --pod-name <pod-name> \
  --container-name <container-name>

Example

# Function
ngc cf fn instance logs my-function:v1 \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main

# Task
ngc cf task instance logs my-task \
  --org my-organization \
  --team my-team \
  --instance-id instance-1 \
  --pod-name pod-1234 \
  --container-name main

  4. Attach Log Output from an Entire Instance

Syntax

# Function
# --org: NGC organization ID; --team: team name within the org
# --instance-id: instance whose log output to attach
ngc cf fn instance logs <function-id>:<version-id> \
  --org <org-id> \
  --team <team-name> \
  --instance-id <instance-id>

# Task
ngc cf task instance logs <task-id> \
  --org <org-id> \
  --team <team-name> \
  --instance-id <instance-id>

Example

# Function
ngc cf fn instance logs my-function:v1 \
  --org my-organization \
  --team my-team \
  --instance-id instance-1

# Task
ngc cf task instance logs my-task \
  --org my-organization \
  --team my-team \
  --instance-id instance-1
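Real-time log output is not stored, so you may want to keep a local copy of what the CLI prints while you debug. The sketch below is a hypothetical helper (`capture_logs` is our own name). It runs any command, shows the output in the terminal, and writes the same output to a file with `tee`. It assumes the `logs` command writes to standard output or standard error.

```shell
# Hypothetical helper: run a command, echo its combined stdout/stderr to the
# terminal, and save the same stream to a local file (overwritten each run).
# Usage: capture_logs <output-file> <command> [args...]
capture_logs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: capture_logs <output-file> <command> [args...]" >&2
    return 2
  fi
  _outfile=$1
  shift
  # The pipeline's exit status is tee's, not the wrapped command's.
  "$@" 2>&1 | tee "$_outfile"
}
```

For example: `capture_logs "nvcf-$(date +%Y%m%dT%H%M%S).log" ngc cf fn instance logs my-function:v1 --org my-organization --team my-team --instance-id instance-1`. Stop the stream as you normally would. Everything printed up to that point stays in the file.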

Supported Commands

The following commands are supported for remote execution:

Command/Operator        Description
cat                     Display file contents
ls                      List directory contents
cd                      Change directory
pwd                     Print working directory
man                     Display manual pages
sort                    Sort lines of text files
df                      Report file system disk space usage
du                      Estimate file space usage
grep                    Search for patterns in files
find                    Search for files
head                    Display the beginning of files
more                    Page through text
less                    Page through text with more features
tail                    Display the end of files
wc                      Print newline, word, and byte counts
cut                     Remove sections from lines
echo                    Display a line of text
printf                  Format and print data
print                   Print data
ps                      Report process status
base64                  Base64 encode/decode
Pipe (|)                Send one command's output to another command's input
Input redirect (<)      Redirect input from a file
Command separator (;)   Separate commands
Command chaining (&&)   Run the next command only if the previous one succeeds
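Only the commands and operators listed above are supported, so it can help to check a `--command` string locally before sending it. The following is a hypothetical client-side pre-check written for POSIX `sh`. The function name and the `SUPPORTED_CMDS` variable are our own, and the service's own validation remains authoritative. The function splits the string on `|`, `;`, and `&&` and checks the first word of each part against the table. It also rejects operators the table does not list, such as `||`, `>`, a lone `&`, and command substitution.

```shell
# Command names from the supported-commands table.
SUPPORTED_CMDS="cat ls cd pwd man sort df du grep find head more less tail wc cut echo printf print ps base64"

# is_supported_command "<command string>": succeed if every simple command uses
# a listed command name and only listed operators (|, <, ;, &&).
# Hypothetical local pre-check; the service's validation is authoritative.
is_supported_command() {
  _cmdline=$1
  [ -n "$_cmdline" ] || return 1
  # Reject operators that are not in the table: ||, output redirects,
  # and command substitution.
  case $_cmdline in
    *'||'*|*'>'*|*'`'*|*'$('*) return 1 ;;
  esac
  # Normalize && and | to ; so the string splits into simple commands.
  _normalized=$(printf '%s\n' "$_cmdline" | sed 's/&&/;/g; s/|/;/g')
  # Any remaining & is a background operator, which is not listed.
  case $_normalized in
    *'&'*) return 1 ;;
  esac
  # Check the first word of each segment against the allowlist.
  # The loop runs in a subshell, so "exit 1" only ends the pipeline,
  # whose status becomes the function's return value.
  printf '%s\n' "$_normalized" | tr ';' '\n' | while read -r _word _rest; do
    [ -z "$_word" ] && continue
    case " $SUPPORTED_CMDS " in
      *" $_word "*) : ;;
      *) exit 1 ;;
    esac
  done
}
```

For example, `is_supported_command "ps aux | grep python"` succeeds, while `is_supported_command "echo hi > out.txt"` fails because output redirection is not listed. The check only looks at command names and operators, not at arguments.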

Note

The command execution environment is isolated and has no impact on the function’s running state. Command execution is logged for security and audit purposes.

Security

NVCF secures debugging capabilities through the following measures:

  • Authentication and authorization for all debugging actions

  • Container isolation prevents unauthorized access

  • Limited command set to prevent system modifications

  • Access control based on NGC permissions

  • All debugging actions are logged and auditable

Troubleshooting

Common Error Codes

  • 400 (BadRequestException): The function or task is inactive, or invalid parameters were provided. Resolution: Ensure the function or task is active and the parameters are correct.

  • 401 (NotAuthorizedException): The authentication token is invalid. Resolution: Check that your NGC API key or SSA token is valid.

  • 403 (ForbiddenException): Permissions are insufficient, or the function or task does not exist. Resolution: Verify that your token has the appropriate scopes and that the function or task exists.

  • 404 (NotFoundException): The selected pod, container, or instance does not exist. Resolution: Verify that the specified resources exist and are named correctly.

  • 429 (TooManyRequestsException): The rate limit was exceeded. Resolution: Reduce the frequency of requests and try again later.

  • 500 (UpstreamException): An internal service error occurred. Resolution: Contact support if the issue persists.
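For 429 (TooManyRequestsException), the suggested resolution is to slow down and try again. Exponential backoff is one common way to do that in a script. The sketch below is a hypothetical helper, not an NGC CLI feature. It assumes the CLI exits with a non-zero status when a request fails, and it retries on any failure because it does not inspect the error code. Use it only for read-only calls such as `instance ls`, and keep the attempt count small.

```shell
# Hypothetical helper: retry a command with exponential backoff.
# Usage: retry_with_backoff <max-attempts> <command> [args...]
# Retries on ANY non-zero exit status; it cannot tell a 429 apart from other
# failures. Base delay in seconds comes from RETRY_BASE_DELAY (default 2).
retry_with_backoff() {
  _max_attempts=$1
  shift
  _attempt=1
  _delay=${RETRY_BASE_DELAY:-2}
  while :; do
    "$@" && return 0
    if [ "$_attempt" -ge "$_max_attempts" ]; then
      echo "retry_with_backoff: giving up after $_attempt attempt(s)" >&2
      return 1
    fi
    echo "retry_with_backoff: attempt $_attempt failed; retrying in ${_delay}s" >&2
    sleep "$_delay"
    _attempt=$((_attempt + 1))
    _delay=$((_delay * 2))  # double the wait after each failure
  done
}
```

For example: `retry_with_backoff 4 ngc cf fn instance ls my-function:v1 --org my-organization --team my-team` waits 2, 4, and then 8 seconds between attempts before giving up.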

Required Permissions

To use the debuggability features, ensure your NGC API key has the correct permissions:

  • When generating an NGC API key from the NGC console, select the “Cloud Function” permission

  • This permission grants the necessary access to use both Live Tail Logs and Command Execution features

Limitations

  • Real-time logs are ephemeral with no long-term storage

  • Historical logs are still available through the standard logging system

  • Command execution is limited to a predefined set of commands

  • Debugging sessions have a maximum duration of 2 hours

  • Output size is limited to 2MB per command

  • Live tail logs are only supported for functions deployed to GFN and DGXC cloud environments

  • Live tail logs view maintains a maximum of 50,000 lines in the console buffer

  • Real-time logs cannot be searched on aggregate across all functions (e.g., searching for a string across all functions in an organization)

Appendix A: Terminology

Term                    Definition
NGC                     NVIDIA GPU Cloud, which lets users set up and manage access to NVIDIA cloud services
NVCF                    NVIDIA Cloud Functions
NVCT                    NVIDIA Cloud Tasks
Ephemeral Container     A temporary container created within a pod for debugging purposes
Real-time Logs          Logs streamed with minimal latency during function execution
DGXC                    DGX Cloud service
History Logs            Logs stored for longer-term analysis with search capabilities
Live Tail Logs          Near real-time streaming logs with minimal latency
Distroless Container    A container image with minimal operating system components