cnvrg is an AI OS, designed to help you manage and utilize all of your compute resources effectively. This is enabled through the Dashboard tab of your organization.
Inside the Dashboard tab, you can get a complete overview of how your resources are allocated and how they are being utilized.
Along the top of the page, there is an overview of the current status of all of your jobs and compute allocations across your entire organization and all of your compute resources.
The summary displays:
- Current active jobs
- Current pending jobs (queued or initializing)
- Number of compute resources
- Assigned and total available GPU cores
- Assigned and total available CPU cores
- Assigned and total memory
- Number of nodes across your compute resources
# Live Resources Chart
The Live Resources graph provides live insights into the allocation and utilization of each of your compute resources. Each resource has a line representing allocation (% of full cluster) and a line representing utilization (% of full cluster) for CPU, GPU and memory.
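As a rough illustration of the two lines on the graph, the following sketch computes allocation and utilization as percentages of the full cluster. The function name and the numbers are hypothetical and are not part of cnvrg; they only show how the two percentages relate to the same cluster total.

```python
def percent_of_cluster(amount: float, cluster_total: float) -> float:
    """Return `amount` as a percentage of the cluster's total capacity."""
    if cluster_total <= 0:
        raise ValueError("cluster_total must be positive")
    return 100.0 * amount / cluster_total


# Hypothetical cluster with 64 CPU cores (illustrative values only).
cluster_cpu = 64
allocated_cpu = 16  # cores reserved by running jobs
used_cpu = 8        # cores actually busy

allocation_pct = percent_of_cluster(allocated_cpu, cluster_cpu)
utilization_pct = percent_of_cluster(used_cpu, cluster_cpu)
print(allocation_pct, utilization_pct)
```

A large gap between the two values suggests that jobs have reserved more of the cluster than they are using.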
You can highlight a line on the graph by hovering over its name. Click its name to toggle the line's visibility on and off.
The Metric menu lets you toggle the visibility of the CPU, GPU and memory metrics on and off.
The Compute menu lets you toggle the visibility of each resource individually, or enable All.
When you hover over a data point on the graph, a tooltip appears showing the allocation and utilization for that metric (CPU, GPU or memory) and the relevant jobs.
To dock the tooltip to the right side of the graph, click <<. When undocked, the tooltip displays only a few relevant jobs; when docked, it lists more. To undock the tooltip, click >>.
# Job Table
Below the Live Resources chart is the Job Table. It contains a full list of all current and historical jobs run in the organization, along with accompanying utilization metadata. All of the metadata is on a per-job basis. For instance, Utilized CPU refers to how much of the CPU allocated to a specific job was utilized by that job.
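To make the per-job basis concrete, here is a minimal sketch of how a value such as Utilized CPU could be derived from a job's own allocation. The field names and figures are assumptions made for illustration; they do not describe how cnvrg computes or stores these values.

```python
def job_utilization_pct(used: float, allocated: float) -> float:
    """Percentage of a job's own allocation that the job actually used."""
    if allocated <= 0:
        return 0.0  # nothing allocated, so report zero utilization
    return 100.0 * used / allocated


# Hypothetical job: 4 CPU cores allocated, 1 core in use.
job = {"allocated_cpu": 4.0, "used_cpu": 1.0}
utilized_cpu = job_utilization_pct(job["used_cpu"], job["allocated_cpu"])
print(utilized_cpu)
```

The denominator here is the job's allocation, not the capacity of the whole cluster.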
# Customize the table
You can control which columns appear in the table by clicking the Customize button. The following fields can be chosen:
- Type: The type of job (experiment, workspace, endpoint or app).
- Title: The name of the job.
- Project: Which project the job belongs to.
- User: Which user started the job.
- Status: The resource utilization status (active, inactive or pending).
- Job Status: The status of the job itself (initializing, ongoing, success, aborted, error or scheduled).
- Duration: Length of time the job has been running for (current or total).
- Created At: When the job was created.
- Resource: Which compute resource was used.
- Compute: Which compute template was used.
- Allocated GPU: Total allocated GPU for the job.
- Allocated CPU: Total allocated CPU for the job.
- Allocated Memory: Total allocated memory for the job.
- Utilized GPU: Percentage of allocated GPU utilized by job.
- Utilized CPU: Percentage of allocated CPU utilized by job.
- Utilized Memory: Percentage of allocated memory utilized by job.
- Image: Container used by job.
- Node Selectors: Any Kubernetes selector used for choosing node/node pool.
- Datasets Attached: Datasets used by job.
- Datasets Size: Size of dataset used by job.
# Filter the table
You can filter the table by clicking the Filter icon. You can then add one or more key, comparison and value combinations to filter the table. Added filters will appear along the top of the table.
To remove a filter, click the X next to it.
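The key, comparison and value combinations can be thought of as simple predicates over the table's rows. The sketch below is an illustration only: the comparison symbols, the field names and the choice to combine multiple filters with AND are all assumptions, not documented cnvrg behavior.

```python
import operator

# Hypothetical set of comparisons; the real UI may offer different ones.
COMPARISONS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def apply_filters(jobs, filters):
    """Keep jobs that satisfy every (key, comparison, value) filter."""
    return [
        job
        for job in jobs
        if all(COMPARISONS[comparison](job[key], value)
               for key, comparison, value in filters)
    ]


# Illustrative rows, not real data.
jobs = [
    {"title": "train-resnet", "type": "experiment", "utilized_gpu": 85},
    {"title": "notebook", "type": "workspace", "utilized_gpu": 5},
    {"title": "train-bert", "type": "experiment", "utilized_gpu": 20},
]
filters = [("type", "=", "experiment"), ("utilized_gpu", "<", 50)]
matching = apply_filters(jobs, filters)
print([job["title"] for job in matching])
```

A combination like this one could be used to spot experiments that leave most of their allocated GPU idle.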
# Search through the table
You can use the Search bar to search the table by project, user and title. The search is case-sensitive.
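The effect of case sensitivity can be sketched as follows. This assumes plain substring matching over the three searchable fields, which is a guess made for illustration; the actual matching rules are not described here, and the data is made up.

```python
def search_jobs(jobs, query):
    """Case-sensitive substring search over project, user and title."""
    fields = ("project", "user", "title")
    return [job for job in jobs if any(query in job[field] for field in fields)]


# Illustrative rows, not real data.
jobs = [
    {"project": "Vision", "user": "dana", "title": "train-resnet"},
    {"project": "nlp", "user": "Lee", "title": "Train-bert"},
]
lower_hits = search_jobs(jobs, "train")  # matches only the lowercase title
upper_hits = search_jobs(jobs, "Train")  # matches only the capitalized title
print([job["title"] for job in lower_hits], [job["title"] for job in upper_hits])
```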