DVC Options - Metrics and Plots
The DVC way
In this part we look at metrics and plots produced by ZnTrack Nodes. All dvc run options listed here can be used via dvc.<option>, with the exception of params, which is handled automatically. Each of these options takes either a str or a pathlib.Path pointing to the file in which the content should be stored. As shown before, dvc.deps can also take another Node as an argument.
[1]:
import zntrack
from pathlib import Path
import json
import pandas as pd
import numpy as np
[2]:
zntrack.config.nb_name = "04_metrics_and_plots.ipynb"
[4]:
!git init
!dvc init
Initialized empty Git repository in /tmp/tmpq5kwayap/.git/
Initialized DVC repository.
You can now commit the changes to git.
+---------------------------------------------------------------------+
| |
| DVC has enabled anonymous aggregate usage analytics. |
| Read the analytics documentation (and how to opt-out) here: |
| <https://dvc.org/doc/user-guide/analytics> |
| |
+---------------------------------------------------------------------+
What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>
In the following we define a simple Node that produces a metric and a plot output using json and pandas. We will queue multiple experiments with different outputs and then compare them afterwards.
[5]:
class MetricAndPlot(zntrack.Node):
    # File-based outputs: one metrics file and one plots file.
    my_metric: Path = zntrack.dvc.metrics(Path("my_metric.json"))
    my_plots: Path = zntrack.dvc.plots("my_plots.csv")
    # Parameter that scales both the metrics and the plotted x range.
    pre_factor = zntrack.zn.params(1.0)

    def run(self):
        self.my_metric.write_text(
            json.dumps(
                {"metric_1": 17 * self.pre_factor, "metric_2": 42 * self.pre_factor}
            )
        )

        x_data = np.linspace(0, 1.0 * self.pre_factor, 1000)
        y_data = np.exp(x_data)
        df = pd.DataFrame({"y": y_data, "x": x_data}).set_index("x")
        df.to_csv(self.my_plots)
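To make the two output files concrete, here is a minimal sketch that writes the same content as the run() method above without ZnTrack, NumPy, or pandas. The helper name write_outputs and the temporary output directory are assumptions made for illustration; only the file names, the metric keys, and the x/y relationship come from the Node above.

```python
import csv
import json
import math
import tempfile
from pathlib import Path


def write_outputs(pre_factor: float, out_dir: Path, n_points: int = 1000) -> None:
    """Hypothetical stand-in for MetricAndPlot.run() using only the standard library."""
    metrics = {"metric_1": 17 * pre_factor, "metric_2": 42 * pre_factor}
    (out_dir / "my_metric.json").write_text(json.dumps(metrics))

    # Evenly spaced x values on [0, pre_factor], mirroring the np.linspace call.
    step = pre_factor / (n_points - 1)
    with (out_dir / "my_plots.csv").open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y"])  # named first column, like the DataFrame index "x"
        for i in range(n_points):
            x = i * step
            writer.writerow([x, math.exp(x)])


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    write_outputs(pre_factor=2, out_dir=out_dir)
    metrics = json.loads((out_dir / "my_metric.json").read_text())
    with (out_dir / "my_plots.csv").open(newline="") as handle:
        rows = list(csv.reader(handle))

print(metrics)                   # the two scaled metric values
print(rows[0], len(rows) - 1)    # header and number of data rows
```

The metrics file is a flat JSON object and the plots file is a CSV with a named x column, which is the shape DVC reads when it compares experiments.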
[6]:
with zntrack.Project() as project:
    node = MetricAndPlot()

project.run()

!git add .
!git commit -m "First Run"
Running DVC command: 'stage add --name MetricAndPlot --force ...'
Creating 'dvc.yaml'
Adding stage 'MetricAndPlot' in 'dvc.yaml'
To track the changes with git, run:
git add .gitignore dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
Jupyter support is an experimental feature! Please save your notebook before running this command!
Submit issues to https://github.com/zincware/ZnTrack.
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
[main (root-commit) 57992f7] First Run
11 files changed, 778 insertions(+)
create mode 100644 .dvc/.gitignore
create mode 100644 .dvc/config
create mode 100644 .dvcignore
create mode 100644 .gitignore
create mode 100644 04_metrics_and_plots.ipynb
create mode 100644 dvc.lock
create mode 100644 dvc.yaml
create mode 100644 params.yaml
create mode 100644 src/MetricAndPlot.py
create mode 100644 src/__pycache__/MetricAndPlot.cpython-310.pyc
create mode 100644 zntrack.json
[7]:
with project.create_experiment(name="factor_2"):
    node.pre_factor = 2
with project.create_experiment(name="factor_3"):
    node.pre_factor = 3
with project.create_experiment(name="factor_4"):
    node.pre_factor = 4
with project.create_experiment(name="factor_5"):
    node.pre_factor = 5

project.run_exp()
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
Queued experiment 'factor_2' for future execution.
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
Queued experiment 'factor_3' for future execution.
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
Queued experiment 'factor_4' for future execution.
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
Queued experiment 'factor_5' for future execution.
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
Following logs for all queued experiments. Use Ctrl+C to stop following logs (experiment execution will continue).
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.yaml dvc.lock params.yaml
To enable auto staging, run:
dvc config core.autostage true
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock params.yaml dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.yaml dvc.lock params.yaml
To enable auto staging, run:
dvc config core.autostage true
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock params.yaml dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
Ran experiment(s):
To apply the results of an experiment to your workspace run:
dvc exp apply <exp>
To promote an experiment to a Git branch run:
dvc exp branch <exp> <branch>
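The log above shows each experiment being queued and then all of them being executed. As a rough sketch, the same workflow could be expressed with plain DVC commands. Whether ZnTrack issues exactly these commands is an assumption; the helper build_queue_commands is hypothetical, and the parameter key MetricAndPlot.pre_factor is assumed from the column name that dvc exp show reports. The commands are only assembled here, not executed.

```python
from typing import Dict, List


def build_queue_commands(param: str, experiments: Dict[str, float]) -> List[List[str]]:
    """Hypothetical helper: DVC CLI calls that queue one experiment per value, then run them."""
    commands = [
        ["dvc", "exp", "run", "--queue", "--name", name, "--set-param", f"{param}={value}"]
        for name, value in experiments.items()
    ]
    # Execute everything that was queued.
    commands.append(["dvc", "exp", "run", "--run-all"])
    return commands


commands = build_queue_commands(
    "MetricAndPlot.pre_factor",
    {"factor_2": 2, "factor_3": 3, "factor_4": 4, "factor_5": 5},
)
for command in commands:
    print(" ".join(command))
```

Each list could be passed to subprocess.run inside an initialized DVC repository; the exact flags available depend on the installed DVC version.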
Now that all experiments are done, we can look at the metrics directly with dvc exp show or dvc metrics show/diff.
[8]:
!dvc exp show --csv > exp_show.csv
pd.read_csv("exp_show.csv", index_col=0)
[8]:
Experiment | rev | typ | Created | parent | metric_1 | metric_2 | MetricAndPlot.pre_factor |
---|---|---|---|---|---|---|---|
NaN | workspace | baseline | NaN | NaN | 17.0 | 42.0 | 5.0 |
main | 57992f7 | baseline | 2023-03-08T14:27:34 | NaN | 17.0 | 42.0 | 1.0 |
factor_5 | 8491765 | branch_commit | 2023-03-08T14:27:51 | NaN | 85.0 | 210.0 | 5.0 |
factor_4 | 646b1ad | branch_commit | 2023-03-08T14:27:49 | NaN | 68.0 | 168.0 | 4.0 |
factor_3 | efaea50 | branch_commit | 2023-03-08T14:27:46 | NaN | 51.0 | 126.0 | 3.0 |
factor_2 | 8790933 | branch_base | 2023-03-08T14:27:44 | NaN | 34.0 | 84.0 | 2.0 |
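The table can also be compared programmatically. The sketch below computes how far each experiment's metrics moved from the main baseline, similar in spirit to what dvc metrics diff reports. The CSV text is an assumed layout that copies the values from the table above (the workspace row is left out); the actual exp_show.csv written by DVC may differ in details.

```python
import csv
import io

# Assumed CSV layout, with values copied from the table above.
EXP_SHOW_CSV = """\
Experiment,rev,typ,Created,parent,metric_1,metric_2,MetricAndPlot.pre_factor
main,57992f7,baseline,2023-03-08T14:27:34,,17.0,42.0,1.0
factor_5,8491765,branch_commit,2023-03-08T14:27:51,,85.0,210.0,5.0
factor_4,646b1ad,branch_commit,2023-03-08T14:27:49,,68.0,168.0,4.0
factor_3,efaea50,branch_commit,2023-03-08T14:27:46,,51.0,126.0,3.0
factor_2,8790933,branch_base,2023-03-08T14:27:44,,34.0,84.0,2.0
"""

rows = {row["Experiment"]: row for row in csv.DictReader(io.StringIO(EXP_SHOW_CSV))}
baseline = rows.pop("main")

# Absolute change of each metric relative to the baseline commit.
changes = {
    name: {
        metric: float(row[metric]) - float(baseline[metric])
        for metric in ("metric_1", "metric_2")
    }
    for name, row in rows.items()
}

for name in sorted(changes):
    print(name, changes[name])
```

Because both metrics scale linearly with pre_factor, the change grows by 17 and 42 respectively for each step of the factor.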
We can also use dvc plots show/diff to evaluate the plot data that we produced.
[9]:
!dvc plots diff HEAD factor_2 factor_3 factor_4 factor_5
file:///tmp/tmpq5kwayap/dvc_plots/index.html
The ZnTrack way
ZnTrack provides an easier way to handle metrics. Similar to zn.outs(), which does not require defining a path to the outs file, one can use zn.metrics(). The same is possible for plots via zn.plots(). Additional options can be passed as keyword arguments to zn.plots().
[10]:
class ZnTrackMetric(zntrack.Node):
    my_metric = zntrack.zn.metrics()
    my_plot = zntrack.zn.plots()

    def run(self):
        self.my_metric = {"alpha": 1.0, "beta": 0.00473}
        self.my_plot = pd.DataFrame({"val": np.sin(np.linspace(0, 3.14, 100))})
        # For DVC it is required that the index has a column name.
        self.my_plot.index.name = "index"


with zntrack.Project() as project:
    node = ZnTrackMetric()

project.run()
DeprecationWarning for write_graph: Building a graph is now done using 'with zntrack.Project() as project: ...' (Deprecated since 0.6.0)
Running DVC command: 'stage add --name ZnTrackMetric --force ...'
Adding stage 'ZnTrackMetric' in 'dvc.yaml'
To track the changes with git, run:
git add dvc.yaml nodes/ZnTrackMetric/.gitignore
To enable auto staging, run:
dvc config core.autostage true
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
Running DVC command: 'repro ZnTrackMetric'
Running stage 'ZnTrackMetric':
> zntrack run src.ZnTrackMetric.ZnTrackMetric --name ZnTrackMetric
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
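To illustrate the idea of a metric attribute that needs no explicit path, here is a minimal sketch using a Python descriptor. This is not ZnTrack's implementation: the classes JsonMetric and DemoNode, the root attribute, and the file location nodes/<NodeName>/<attribute>.json are all hypothetical and chosen only for illustration (the log above only shows that a nodes/ZnTrackMetric/ directory is involved).

```python
import json
import tempfile
from pathlib import Path


class JsonMetric:
    """Hypothetical descriptor: persists a dict as JSON at an automatically derived path."""

    def __set_name__(self, owner, name):
        self.name = name

    def _path(self, obj) -> Path:
        # Assumed layout: <root>/nodes/<NodeName>/<attribute>.json
        return Path(obj.root) / "nodes" / type(obj).__name__ / f"{self.name}.json"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return json.loads(self._path(obj).read_text())

    def __set__(self, obj, value):
        path = self._path(obj)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(value))


class DemoNode:
    my_metric = JsonMetric()

    def __init__(self, root):
        self.root = root

    def run(self):
        # Assigning the attribute writes the file; no path is given by the user.
        self.my_metric = {"alpha": 1.0, "beta": 0.00473}


with tempfile.TemporaryDirectory() as tmp:
    node = DemoNode(tmp)
    node.run()
    stored = node.my_metric
    metric_file_exists = (Path(tmp) / "nodes" / "DemoNode" / "my_metric.json").exists()

print(stored, metric_file_exists)
```

The point of the sketch is the design choice: the attribute name and the Node name together determine where the data lives, so the user only assigns a value.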
[11]:
!dvc exp show --csv > exp_show.csv
pd.read_csv("exp_show.csv", index_col=0)
[11]:
Experiment | rev | typ | Created | parent | metric_1 | metric_2 | alpha | beta | MetricAndPlot.pre_factor |
---|---|---|---|---|---|---|---|---|---|
NaN | workspace | baseline | NaN | NaN | 17.0 | 42.0 | 1.0 | 0.00473 | 5.0 |
main | 57992f7 | baseline | 2023-03-08T14:27:34 | NaN | 17.0 | 42.0 | NaN | NaN | 1.0 |
factor_5 | 8491765 | branch_commit | 2023-03-08T14:27:51 | NaN | 85.0 | 210.0 | NaN | NaN | 5.0 |
factor_4 | 646b1ad | branch_commit | 2023-03-08T14:27:49 | NaN | 68.0 | 168.0 | NaN | NaN | 4.0 |
factor_3 | efaea50 | branch_commit | 2023-03-08T14:27:46 | NaN | 51.0 | 126.0 | NaN | NaN | 3.0 |
factor_2 | 8790933 | branch_base | 2023-03-08T14:27:44 | NaN | 34.0 | 84.0 | NaN | NaN | 2.0 |
[12]:
!dvc plots show
file:///tmp/tmpq5kwayap/dvc_plots/index.html
[13]:
temp_dir.cleanup()