# Monitor your experiment's health

While an experiment is running, it's important to follow its progress so you can confirm that you are getting the results you expect as it advances.

In addition to monitoring progress, it's also important to monitor the health of the experiment, especially to check that its memory usage isn't growing out of control and that you won't hit errors such as out-of-memory (OOM).

# Experiment metrics

Each experiment has live charts for CPU, memory, I/O, GPU utilization, and GPU memory.

Sometimes, though, you may want to monitor an experiment more closely and check that it won't reach an out-of-memory state.

You can use the cnvrg SDK to check the experiment's stats from your code:

```python
from cnvrg import Experiment
Experiment.get_utilization()
# {'cpu': pcputimes(user=0.8726608, system=0.162425856, children_user=0.0, children_system=0.0),
#  'cpu_precent': 0.0,
#  'memory_info': 0.5182981491088867,
#  'threads': 9}
```

If the reported memory or CPU usage is higher than it should be, or the experiment could reach an out-of-memory state, you can stop the current experiment and restart it:

```python
from cnvrg import Experiment
Experiment().restart(
    message="I'm restarting because the memory is over 90%",
    sync=True
)
```

When `sync` is set to `True`, cnvrg first syncs the current state of the experiment, and only then stops the experiment and reruns it.
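The check-then-restart flow described above could be wrapped in a small polling helper. The following is a minimal sketch, not part of the cnvrg SDK: the names `memory_too_high` and `watch`, the 90 threshold, and the polling parameters are all hypothetical, and it assumes that `memory_info` is reported in the same units as the threshold you pick. The stats source and the restart action are passed in as plain callables, so the sketch runs here with made-up sample readings.

```python
import time


def memory_too_high(utilization, threshold=90.0):
    """Return True if the reported memory value exceeds the threshold.

    Assumption: 'memory_info' uses the same units as `threshold`.
    The units are not stated in the stats output, so check them first.
    """
    memory = utilization.get("memory_info")
    return memory is not None and memory > threshold


def watch(get_utilization, on_high_memory, threshold=90.0,
          interval=0.0, max_checks=3):
    """Poll the stats up to `max_checks` times.

    Calls `on_high_memory` once and stops as soon as a reading is over
    the threshold. Returns True if the callback was triggered.
    """
    for _ in range(max_checks):
        if memory_too_high(get_utilization(), threshold):
            on_high_memory()
            return True
        time.sleep(interval)
    return False


# Demo with made-up readings shaped like the utilization dict.
readings = iter([{"memory_info": 10.0}, {"memory_info": 95.0}])
events = []
triggered = watch(lambda: next(readings), lambda: events.append("restart"))
print(triggered, events)
```

Inside a real experiment, `get_utilization` would be `Experiment.get_utilization`, and `on_high_memory` would be a function that calls `Experiment().restart(message=..., sync=True)` as shown earlier.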

Last Updated: 1/6/2020, 7:55:00 AM