DVC Options - Metrics and Plots

The DVC way

In this part we look at metrics and plots produced by ZnTrack Nodes. All dvc run options listed here can be used via dvc.<option>, with the exception of params, which is handled automatically. Each option takes either a str or a pathlib.Path pointing to the file in which the content should be stored. As shown before, dvc.deps can also take another Node as an argument.
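Accepting either a str or a pathlib.Path is cheap to support, because wrapping a value in Path(...) gives the same result for both types. The following is a minimal sketch of that normalization; the helper name as_path is hypothetical and is not part of ZnTrack or DVC.

```python
from pathlib import Path
from typing import Union


def as_path(target: Union[str, Path]) -> Path:
    """Normalize a str or Path into a Path (hypothetical helper, for illustration only)."""
    return Path(target)


# Both spellings refer to the same file.
print(as_path("my_plots.csv") == as_path(Path("my_plots.csv")))
```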

[1]:

import zntrack
from pathlib import Path
import json
import pandas as pd
import numpy as np

[2]:

zntrack.config.nb_name = "04_metrics_and_plots.ipynb"

[4]:

!git init
!dvc init

Initialized empty Git repository in /tmp/tmpq5kwayap/.git/
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>

In the following we define a simple Node that produces a metric and a plot output using json and pandas. We will queue multiple experiments with different outputs and then compare them afterwards.

[5]:

class MetricAndPlot(zntrack.Node):
    my_metric: Path = zntrack.dvc.metrics(Path("my_metric.json"))
    my_plots: Path = zntrack.dvc.plots("my_plots.csv")
    pre_factor = zntrack.zn.params(1.0)

    def run(self):
        self.my_metric.write_text(
            json.dumps(
                {"metric_1": 17 * self.pre_factor, "metric_2": 42 * self.pre_factor}
            )
        )

        x_data = np.linspace(0, 1.0 * self.pre_factor, 1000)
        y_data = np.exp(x_data)
        df = pd.DataFrame({"y": y_data, "x": x_data}).set_index("x")

        df.to_csv(self.my_plots)
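
To make the two output files concrete, here is a standard-library-only sketch of what the run method above writes for a given pre_factor. It is an illustration, not part of the notebook: the function name write_outputs and the n_points argument are assumptions, and it uses the csv and math modules in place of pandas and numpy.

```python
import csv
import json
import math
import tempfile
from pathlib import Path


def write_outputs(pre_factor: float, workdir: Path, n_points: int = 5) -> None:
    """Write the metric and plot files that a run with `pre_factor` would produce."""
    metrics = {"metric_1": 17 * pre_factor, "metric_2": 42 * pre_factor}
    (workdir / "my_metric.json").write_text(json.dumps(metrics))

    # Evenly spaced x values on [0, pre_factor], both endpoints included.
    step = pre_factor / (n_points - 1)
    with open(workdir / "my_plots.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y"])  # header row with named columns
        for i in range(n_points):
            x = i * step
            writer.writerow([x, math.exp(x)])


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    write_outputs(2.0, workdir)
    print(json.loads((workdir / "my_metric.json").read_text()))
    print((workdir / "my_plots.csv").read_text())
```

The metric file is a flat JSON mapping from metric name to number, and the plot file is a CSV whose header names every column.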

[6]:

with zntrack.Project() as project:
    node = MetricAndPlot()

project.run()
!git add .
!git commit -m "First Run"

Running DVC command: 'stage add --name MetricAndPlot --force ...'

Creating 'dvc.yaml'
Adding stage 'MetricAndPlot' in 'dvc.yaml'

To track the changes with git, run:

        git add .gitignore dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true

Jupyter support is an experimental feature! Please save your notebook before running this command!
Submit issues to https://github.com/zincware/ZnTrack.
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script

Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot

[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py

Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
[main (root-commit) 57992f7] First Run
 11 files changed, 778 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvcignore
 create mode 100644 .gitignore
 create mode 100644 04_metrics_and_plots.ipynb
 create mode 100644 dvc.lock
 create mode 100644 dvc.yaml
 create mode 100644 params.yaml
 create mode 100644 src/MetricAndPlot.py
 create mode 100644 src/__pycache__/MetricAndPlot.cpython-310.pyc
 create mode 100644 zntrack.json

[7]:

with project.create_experiment(name="factor_2"):
    node.pre_factor = 2
with project.create_experiment(name="factor_3"):
    node.pre_factor = 3
with project.create_experiment(name="factor_4"):
    node.pre_factor = 4
with project.create_experiment(name="factor_5"):
    node.pre_factor = 5

project.run_exp()

[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py

Queued experiment 'factor_2' for future execution.

[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script

Queued experiment 'factor_3' for future execution.

[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script

Queued experiment 'factor_4' for future execution.

[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script

Queued experiment 'factor_5' for future execution.

[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py

Following logs for all queued experiments. Use Ctrl+C to stop following logs (experiment execution will continue).

Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.yaml dvc.lock params.yaml

To enable auto staging, run:

        dvc config core.autostage true
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock params.yaml dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.yaml dvc.lock params.yaml

To enable auto staging, run:

        dvc config core.autostage true
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock params.yaml dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true

Ran experiment(s):
To apply the results of an experiment to your workspace run:

        dvc exp apply <exp>

To promote an experiment to a Git branch run:

        dvc exp branch <exp> <branch>

Now that all experiments are done, we can look at the metrics directly with dvc exp show, or with dvc metrics show and dvc metrics diff.
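
Conceptually, comparing metrics between two revisions amounts to a per-key difference of two metric dictionaries. The sketch below shows that idea for the baseline (pre_factor = 1) and the factor_2 experiment, using the formulas from the run method. It is not DVC's implementation and does not reproduce its output format; the function name metrics_diff is hypothetical.

```python
def metrics_diff(old: dict, new: dict) -> dict:
    """Return {name: (old, new, change)} for every metric present in both dicts."""
    return {
        name: (old[name], new[name], new[name] - old[name])
        for name in old
        if name in new
    }


baseline = {"metric_1": 17 * 1.0, "metric_2": 42 * 1.0}  # pre_factor = 1
factor_2 = {"metric_1": 17 * 2.0, "metric_2": 42 * 2.0}  # pre_factor = 2

for name, (old, new, change) in metrics_diff(baseline, factor_2).items():
    print(f"{name}: {old} -> {new} (change {change:+})")
```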

[8]:

!dvc exp show --csv > exp_show.csv
pd.read_csv("exp_show.csv", index_col=0)

[8]:

Experiment  rev        typ            Created              parent  metric_1  metric_2  MetricAndPlot.pre_factor
NaN         workspace  baseline       NaN                  NaN     17.0      42.0      5.0
main        57992f7    baseline       2023-03-08T14:27:34  NaN     17.0      42.0      1.0
factor_5    8491765    branch_commit  2023-03-08T14:27:51  NaN     85.0      210.0     5.0
factor_4    646b1ad    branch_commit  2023-03-08T14:27:49  NaN     68.0      168.0     4.0
factor_3    efaea50    branch_commit  2023-03-08T14:27:46  NaN     51.0      126.0     3.0
factor_2    8790933    branch_base    2023-03-08T14:27:44  NaN     34.0      84.0      2.0

We can also use dvc plots show/diff to evaluate the plot data that we produced.

[9]:

!dvc plots diff HEAD factor_2 factor_3 factor_4 factor_5

file:///tmp/tmpq5kwayap/dvc_plots/index.html

The ZnTrack way

ZnTrack provides an easier way to handle metrics. Similar to zn.outs(), which does not require defining a path to the outs file, one can use zn.metrics(). The same is possible for plots via zn.plots(). To define additional options, pass them as keyword arguments to zn.plots().
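
One way to picture a path-free metrics attribute is a descriptor that derives the storage location from the class and attribute names and serializes on assignment. The sketch below illustrates that pattern in plain Python. It is not ZnTrack's implementation: the names AutoPathMetric and ToyNode are hypothetical, and the file layout nodes/<ClassName>/<attribute>.json is an assumption loosely modelled on the nodes/ZnTrackMetric/ directory that appears in the log output of this notebook.

```python
import json
import tempfile
from pathlib import Path


class AutoPathMetric:
    """Hypothetical descriptor that stores a dict as JSON at an automatically chosen path."""

    def __set_name__(self, owner, attr_name):
        self.attr_name = attr_name

    def _path(self, instance) -> Path:
        # Assumed layout: <root>/nodes/<ClassName>/<attribute>.json
        return instance.root / "nodes" / type(instance).__name__ / f"{self.attr_name}.json"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return json.loads(self._path(instance).read_text())

    def __set__(self, instance, value):
        path = self._path(instance)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(value))


class ToyNode:
    my_metric = AutoPathMetric()

    def __init__(self, root: Path):
        self.root = root

    def run(self):
        # Assigning the attribute writes the file; no path is given by the user.
        self.my_metric = {"alpha": 1.0, "beta": 0.00473}


with tempfile.TemporaryDirectory() as tmp:
    node = ToyNode(Path(tmp))
    node.run()
    print(sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.json")))
    print(node.my_metric)
```

The user-facing code only assigns a Python object, while the file name and location stay an implementation detail.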

[10]:

class ZnTrackMetric(zntrack.Node):
    my_metric = zntrack.zn.metrics()
    my_plot = zntrack.zn.plots()

    def run(self):
        self.my_metric = {"alpha": 1.0, "beta": 0.00473}
        self.my_plot = pd.DataFrame({"val": np.sin(np.linspace(0, 3.14, 100))})
        # DVC requires the index column to have a name
        self.my_plot.index.name = "index"


with zntrack.Project() as project:
    node = ZnTrackMetric()

project.run()

DeprecationWarning for write_graph: Building a graph is now done using 'with zntrack.Project() as project: ...' (Deprecated since 0.6.0)
Running DVC command: 'stage add --name ZnTrackMetric --force ...'

Adding stage 'ZnTrackMetric' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml nodes/ZnTrackMetric/.gitignore

To enable auto staging, run:

        dvc config core.autostage true

[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
Running DVC command: 'repro ZnTrackMetric'

Running stage 'ZnTrackMetric':
> zntrack run src.ZnTrackMetric.ZnTrackMetric --name ZnTrackMetric
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.

[11]:

!dvc exp show --csv > exp_show.csv
pd.read_csv("exp_show.csv", index_col=0)

[11]:

Experiment  rev        typ            Created              parent  metric_1  metric_2  alpha  beta     MetricAndPlot.pre_factor
NaN         workspace  baseline       NaN                  NaN     17.0      42.0      1.0    0.00473  5.0
main        57992f7    baseline       2023-03-08T14:27:34  NaN     17.0      42.0      NaN    NaN      1.0
factor_5    8491765    branch_commit  2023-03-08T14:27:51  NaN     85.0      210.0     NaN    NaN      5.0
factor_4    646b1ad    branch_commit  2023-03-08T14:27:49  NaN     68.0      168.0     NaN    NaN      4.0
factor_3    efaea50    branch_commit  2023-03-08T14:27:46  NaN     51.0      126.0     NaN    NaN      3.0
factor_2    8790933    branch_base    2023-03-08T14:27:44  NaN     34.0      84.0      NaN    NaN      2.0

[12]:

!dvc plots show

file:///tmp/tmpq5kwayap/dvc_plots/index.html

[13]:

# Remove the temporary working directory (temp_dir comes from a setup cell not shown above)
temp_dir.cleanup()