DVC Options - Metrics and Plots

The DVC way

In this part we look at metrics and plots produced by ZnTrack Nodes. All dvc run options listed here are available as dvc.<option>, with the exception of params, which is handled automatically. Each of these options takes either a str or a pathlib.Path pointing to the file in which the content should be stored. As shown before, dvc.deps can also take another Node as an argument.

[1]:
import zntrack
from pathlib import Path
import json
import pandas as pd
import numpy as np
[2]:
zntrack.config.nb_name = "04_metrics_and_plots.ipynb"
[4]:
!git init
!dvc init
Initialized empty Git repository in /tmp/tmpq5kwayap/.git/
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>

In the following we define a simple Node that produces a metric and a plot output using json and pandas. We will queue multiple experiments with different outputs and then compare them afterwards.

[5]:
class MetricAndPlot(zntrack.Node):
    my_metric: Path = zntrack.dvc.metrics(Path("my_metric.json"))
    my_plots: Path = zntrack.dvc.plots("my_plots.csv")
    pre_factor = zntrack.zn.params(1.0)

    def run(self):
        self.my_metric.write_text(
            json.dumps(
                {"metric_1": 17 * self.pre_factor, "metric_2": 42 * self.pre_factor}
            )
        )

        x_data = np.linspace(0, 1.0 * self.pre_factor, 1000)
        y_data = np.exp(x_data)
        df = pd.DataFrame({"y": y_data, "x": x_data}).set_index("x")

        df.to_csv(self.my_plots)
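
To make the two output files more concrete, the following is a minimal standard-library sketch of what the run method above writes. It does not use ZnTrack, pandas or numpy; the helper name write_outputs and the temporary directory are assumptions made for illustration only.

```python
import csv
import json
import math
import tempfile
from pathlib import Path


def write_outputs(directory: Path, pre_factor: float = 1.0, n_points: int = 1000):
    """Hypothetical helper that mimics the files written by MetricAndPlot.run."""
    metric_file = directory / "my_metric.json"
    plots_file = directory / "my_plots.csv"

    # The metric file is a flat JSON mapping from metric name to value.
    metric_file.write_text(
        json.dumps({"metric_1": 17 * pre_factor, "metric_2": 42 * pre_factor})
    )

    # The plot file is a CSV with a header row; x runs from 0 to pre_factor
    # with both endpoints included, and y = exp(x).
    step = pre_factor / (n_points - 1)
    with plots_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y"])
        for i in range(n_points):
            x = i * step
            writer.writerow([x, math.exp(x)])

    return metric_file, plots_file


with tempfile.TemporaryDirectory() as tmp:
    metric_file, plots_file = write_outputs(Path(tmp), pre_factor=2.0)
    metrics = json.loads(metric_file.read_text())
    with plots_file.open(newline="") as handle:
        rows = list(csv.reader(handle))
```

Both files are plain text, so they can be inspected and diffed without any special tooling.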
[6]:
with zntrack.Project() as project:
    node = MetricAndPlot()

project.run()
!git add .
!git commit -m "First Run"
Running DVC command: 'stage add --name MetricAndPlot --force ...'
Creating 'dvc.yaml'
Adding stage 'MetricAndPlot' in 'dvc.yaml'

To track the changes with git, run:

        git add .gitignore dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
Jupyter support is an experimental feature! Please save your notebook before running this command!
Submit issues to https://github.com/zincware/ZnTrack.
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
[main (root-commit) 57992f7] First Run
 11 files changed, 778 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvcignore
 create mode 100644 .gitignore
 create mode 100644 04_metrics_and_plots.ipynb
 create mode 100644 dvc.lock
 create mode 100644 dvc.yaml
 create mode 100644 params.yaml
 create mode 100644 src/MetricAndPlot.py
 create mode 100644 src/__pycache__/MetricAndPlot.cpython-310.pyc
 create mode 100644 zntrack.json
[7]:
with project.create_experiment(name="factor_2"):
    node.pre_factor = 2
with project.create_experiment(name="factor_3"):
    node.pre_factor = 3
with project.create_experiment(name="factor_4"):
    node.pre_factor = 4
with project.create_experiment(name="factor_5"):
    node.pre_factor = 5

project.run_exp()
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
Queued experiment 'factor_2' for future execution.
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
Queued experiment 'factor_3' for future execution.
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
Queued experiment 'factor_4' for future execution.
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
Queued experiment 'factor_5' for future execution.
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
Following logs for all queued experiments. Use Ctrl+C to stop following logs (experiment execution will continue).

Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.yaml dvc.lock params.yaml

To enable auto staging, run:

        dvc config core.autostage true
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock params.yaml dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.yaml dvc.lock params.yaml

To enable auto staging, run:

        dvc config core.autostage true
Running stage 'MetricAndPlot':
> zntrack run src.MetricAndPlot.MetricAndPlot --name MetricAndPlot
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock params.yaml dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true

Ran experiment(s):
To apply the results of an experiment to your workspace run:

        dvc exp apply <exp>

To promote an experiment to a Git branch run:

        dvc exp branch <exp> <branch>

Now that all experiments are done, we can look at the metrics directly with dvc exp show or dvc metrics show/diff.

[8]:
!dvc exp show --csv > exp_show.csv
pd.read_csv("exp_show.csv", index_col=0)
[8]:
Experiment  rev        typ            Created              parent  metric_1  metric_2  MetricAndPlot.pre_factor
NaN         workspace  baseline       NaN                  NaN     17.0      42.0      5.0
main        57992f7    baseline       2023-03-08T14:27:34  NaN     17.0      42.0      1.0
factor_5    8491765    branch_commit  2023-03-08T14:27:51  NaN     85.0      210.0     5.0
factor_4    646b1ad    branch_commit  2023-03-08T14:27:49  NaN     68.0      168.0     4.0
factor_3    efaea50    branch_commit  2023-03-08T14:27:46  NaN     51.0      126.0     3.0
factor_2    8790933    branch_base    2023-03-08T14:27:44  NaN     34.0      84.0      2.0
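
The metric values of the committed experiments follow directly from the Node definition: metric_1 is 17 * pre_factor and metric_2 is 42 * pre_factor. The sketch below checks this with the standard library only. The CSV text is a reduced copy of the rows shown above (the workspace row and some columns are left out), and the names EXP_SHOW_CSV, expected_metrics and check_rows are hypothetical.

```python
import csv
import io

# Reduced copy of the experiment rows from the table above.
EXP_SHOW_CSV = """Experiment,rev,metric_1,metric_2,MetricAndPlot.pre_factor
main,57992f7,17.0,42.0,1.0
factor_5,8491765,85.0,210.0,5.0
factor_4,646b1ad,68.0,168.0,4.0
factor_3,efaea50,51.0,126.0,3.0
factor_2,8790933,34.0,84.0,2.0
"""


def expected_metrics(pre_factor: float) -> dict:
    """Metrics that MetricAndPlot.run writes for a given pre_factor."""
    return {"metric_1": 17 * pre_factor, "metric_2": 42 * pre_factor}


def check_rows(csv_text: str) -> dict:
    """Map each experiment name to True if its metrics match its pre_factor."""
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        expected = expected_metrics(float(row["MetricAndPlot.pre_factor"]))
        results[row["Experiment"]] = all(
            float(row[name]) == value for name, value in expected.items()
        )
    return results


consistency = check_rows(EXP_SHOW_CSV)
```

A check like this is a quick way to confirm that each experiment actually ran with the parameter it was queued with.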

We can also use dvc plots show/diff to evaluate the plot data that we produced.

[9]:
!dvc plots diff HEAD factor_2 factor_3 factor_4 factor_5
file:///tmp/tmpq5kwayap/dvc_plots/index.html

The ZnTrack way

ZnTrack provides an easier way to handle metrics. Similar to zn.outs(), which does not require defining a path to the output file, one can use zn.metrics(). The same is possible for plots via zn.plots(). To define additional options, pass them as keyword arguments to zn.plots().

[10]:
class ZnTrackMetric(zntrack.Node):
    my_metric = zntrack.zn.metrics()
    my_plot = zntrack.zn.plots()

    def run(self):
        self.my_metric = {"alpha": 1.0, "beta": 0.00473}
        self.my_plot = pd.DataFrame({"val": np.sin(np.linspace(0, 3.14, 100))})
        # DVC requires the index to have a column name
        self.my_plot.index.name = "index"


with zntrack.Project() as project:
    node = ZnTrackMetric()

project.run()
DeprecationWarning for write_graph: Building a graph is now done using 'with zntrack.Project() as project: ...' (Deprecated since 0.6.0)
Running DVC command: 'stage add --name ZnTrackMetric --force ...'
Adding stage 'ZnTrackMetric' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml nodes/ZnTrackMetric/.gitignore

To enable auto staging, run:

        dvc config core.autostage true
[NbConvertApp] Converting notebook 04_metrics_and_plots.ipynb to script
[NbConvertApp] Writing 3574 bytes to 04_metrics_and_plots.py
Running DVC command: 'repro ZnTrackMetric'
Running stage 'ZnTrackMetric':
> zntrack run src.ZnTrackMetric.ZnTrackMetric --name ZnTrackMetric
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
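
The idea of a metric attribute that needs no explicit path can be illustrated with a plain Python descriptor. This is only a conceptual sketch and not ZnTrack's implementation: the class names AutoJsonField and DemoNode, the root attribute, and the nodes/<ClassName>/<attribute>.json layout are all assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


class AutoJsonField:
    """Hypothetical descriptor that stores its value as JSON at a derived path."""

    def __set_name__(self, owner, name):
        # Remember the attribute name so a file name can be derived from it.
        self.name = name

    def _path(self, instance) -> Path:
        # Assumed layout: <root>/nodes/<ClassName>/<attribute>.json
        return instance.root / "nodes" / type(instance).__name__ / f"{self.name}.json"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return json.loads(self._path(instance).read_text())

    def __set__(self, instance, value):
        path = self._path(instance)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(value))


class DemoNode:
    my_metric = AutoJsonField()

    def __init__(self, root: Path):
        self.root = root

    def run(self):
        # Assigning the attribute is enough; the file location is derived.
        self.my_metric = {"alpha": 1.0, "beta": 0.00473}


with tempfile.TemporaryDirectory() as tmp:
    demo = DemoNode(Path(tmp))
    demo.run()
    stored_file = Path(tmp) / "nodes" / "DemoNode" / "my_metric.json"
    file_exists = stored_file.exists()
    loaded = demo.my_metric
```

The point of the design is that the user only assigns a value in run, while the storage location is derived from the class and attribute names.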
[11]:
!dvc exp show --csv > exp_show.csv
pd.read_csv("exp_show.csv", index_col=0)
[11]:
Experiment  rev        typ            Created              parent  metric_1  metric_2  alpha  beta     MetricAndPlot.pre_factor
NaN         workspace  baseline       NaN                  NaN     17.0      42.0      1.0    0.00473  5.0
main        57992f7    baseline       2023-03-08T14:27:34  NaN     17.0      42.0      NaN    NaN      1.0
factor_5    8491765    branch_commit  2023-03-08T14:27:51  NaN     85.0      210.0     NaN    NaN      5.0
factor_4    646b1ad    branch_commit  2023-03-08T14:27:49  NaN     68.0      168.0     NaN    NaN      4.0
factor_3    efaea50    branch_commit  2023-03-08T14:27:46  NaN     51.0      126.0     NaN    NaN      3.0
factor_2    8790933    branch_base    2023-03-08T14:27:44  NaN     34.0      84.0      NaN    NaN      2.0
[12]:
!dvc plots show
file:///tmp/tmpq5kwayap/dvc_plots/index.html
[13]:
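# temp_dir is the temporary working directory, presumably created in a setup cell that is not shown above
temp_dir.cleanup()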