cnvrg is an AI OS designed to help you effectively manage and utilize your compute resources. You do this from the Dashboard tab of your organization.
Inside the Dashboard tab, you get a complete overview of how your resources are allocated and utilized.
Along the top of the page, there is an overview of the current status of all your jobs and compute allocations across your entire organization and all of its compute resources.
The summary displays:
- Current active jobs
- Current pending jobs (queued or initializing)
- Number of compute resources
- Assigned and total available GPU cores
- Assigned and total available CPU cores
- Assigned and total memory
- Number of nodes among your compute resources
# Live Resources Chart
The Live Resources graph provides live insights into the allocation and utilization for each of your compute resources. Each resource has a line representing allocation (% of full cluster) and a line representing utilization (% of full cluster) for CPU, GPU and memory.
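The difference between the two lines can be illustrated with a small arithmetic sketch. The numbers and the `percent_of_cluster` helper below are hypothetical and are not part of cnvrg; the sketch only assumes, as described above, that both allocation and utilization are expressed as a percentage of the full cluster.

```python
# Hypothetical figures for one compute resource (not taken from cnvrg).
cluster_cpu_cores = 64      # total CPU cores in the cluster
allocated_cpu_cores = 32    # cores reserved by jobs
used_cpu_cores = 8          # cores actually in use


def percent_of_cluster(amount: float, cluster_total: float) -> float:
    """Return `amount` as a percentage of the full cluster capacity."""
    if cluster_total <= 0:
        raise ValueError("cluster_total must be positive")
    return 100.0 * amount / cluster_total


allocation_pct = percent_of_cluster(allocated_cpu_cores, cluster_cpu_cores)
utilization_pct = percent_of_cluster(used_cpu_cores, cluster_cpu_cores)
print(f"allocation: {allocation_pct:.1f}%  utilization: {utilization_pct:.1f}%")
# prints: allocation: 50.0%  utilization: 12.5%
```

In this example, half of the cluster is reserved but only an eighth of it is busy, which is the kind of gap the two lines on the graph make visible.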
Highlight a line on the graph by hovering over its name. Click its name to toggle the line's visibility on and off.
The Metric menu toggles the visibility of the CPU, GPU and memory metrics.
The Compute menu lets you toggle visibility for each resource individually, or enable All.
When you hover over a data point on the graph, a tooltip appears showing the allocation and utilization for that metric (CPU, GPU or memory) and the relevant jobs.
To dock the tooltip on the right side of the graph, click <<. When undocked, the tooltip displays a couple of relevant jobs; when docked, it displays more. To undock the tooltip, click >>.
# Job Table
Below the Live Resources chart is the Job Table. It contains a full list of all current and historical jobs run in the organization, along with accompanying utilization metadata. All metadata is on a per-job basis. For instance, Utilized CPU refers to the portion of the CPU allocated to a specific job that the job actually utilized.
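As a rough illustration of the per-job basis, the sketch below computes a utilization percentage relative to a job's own allocation rather than to the whole cluster. The job record, its field names and the `utilized_percent` helper are hypothetical and do not come from cnvrg.

```python
def utilized_percent(used: float, allocated: float) -> float:
    """Percentage of a job's own allocation that the job actually used."""
    if allocated <= 0:
        return 0.0  # assumption: a job with no allocation is reported as 0%
    return 100.0 * used / allocated


# Hypothetical job: 4 CPU cores allocated, 1 core's worth actually used.
job = {"title": "train-model", "allocated_cpu": 4.0, "used_cpu": 1.0}
utilized_cpu = utilized_percent(job["used_cpu"], job["allocated_cpu"])
print(f"{job['title']}: Utilized CPU = {utilized_cpu:.0f}%")
# prints: train-model: Utilized CPU = 25%
```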
# Customize the table
You can control the columns that appear in the table by clicking the Customize button. The following fields can be chosen:
- Type: The type of job (experiment, workspace, endpoint or app).
- Title: The name of the job.
- Project: The project the job belongs to.
- User: The user who started the job.
- Status: The resource utilization status (active, inactive or pending).
- Job Status: The status of the job itself (initializing, ongoing, success, aborted, error or scheduled).
- Duration: Length of time the job has been running (current or total).
- Created At: When the job was created.
- Resource: The compute resource used.
- Compute: The compute template used.
- Allocated GPU: Total allocated GPU for the job.
- Allocated CPU: Total allocated CPU for the job.
- Allocated Memory: Total allocated memory for the job.
- Utilized GPU: Percentage of allocated GPU utilized by the job.
- Utilized CPU: Percentage of allocated CPU utilized by the job.
- Utilized Memory: Percentage of allocated memory utilized by the job.
- Image: Container image used by the job.
- Node Selectors: Any Kubernetes node selectors used to choose the node or node pool.
- Datasets Attached: Datasets used by the job.
- Datasets Size: Size of the datasets used by the job.
# Filter the table
To filter the table, click the Filter icon, then add one or more key, comparison and value combinations. Added filters appear along the top of the table.
To remove a filter, click the X next to it.
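Conceptually, each filter is a (key, comparison, value) triple applied to the rows of the table. The sketch below is a minimal model of that idea, not cnvrg's implementation: the row fields, the set of comparison symbols and the choice to combine multiple filters with AND are all assumptions made for illustration.

```python
import operator

# Assumed comparison symbols; the actual options in the UI may differ.
COMPARISONS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def apply_filters(rows, filters):
    """Keep rows that satisfy every (key, comparison, value) filter (AND)."""
    return [
        row
        for row in rows
        if all(COMPARISONS[comp](row[key], value) for key, comp, value in filters)
    ]


# Hypothetical job rows.
jobs = [
    {"title": "train-a", "type": "experiment", "utilized_cpu": 80},
    {"title": "notebook", "type": "workspace", "utilized_cpu": 5},
    {"title": "train-b", "type": "experiment", "utilized_cpu": 10},
]
filters = [("type", "=", "experiment"), ("utilized_cpu", "<", 50)]
matches = apply_filters(jobs, filters)
print([job["title"] for job in matches])
# prints: ['train-b']
```

A filter like this one is a simple way to spot experiments that hold an allocation but use little of it.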
# Search through the table
You can use the Search bar to search the table by project, user and title. The search is case-sensitive.
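The effect of case sensitivity can be seen in a small sketch. It assumes the search is a substring match over the project, user and title fields; the source states only which fields are searched and that case matters, so the matching rule, the sample rows and the `search_jobs` helper are hypothetical.

```python
SEARCH_FIELDS = ("project", "user", "title")


def search_jobs(rows, query):
    """Case-sensitive substring search over the project, user and title fields."""
    return [
        row for row in rows
        if any(query in row[field] for field in SEARCH_FIELDS)
    ]


# Hypothetical job rows.
jobs = [
    {"project": "Vision", "user": "dana", "title": "resnet-train"},
    {"project": "nlp", "user": "Lee", "title": "Vision-eval"},
    {"project": "nlp", "user": "dana", "title": "bert-finetune"},
]
upper_hits = search_jobs(jobs, "Vision")
lower_hits = search_jobs(jobs, "vision")
print(len(upper_hits), len(lower_hits))
# prints: 2 0
```

Because "vision" and "Vision" are treated as different strings, a search that returns nothing may simply need different capitalization.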