Overview#

ZnTrack is a user-friendly framework that simplifies the creation and tracking of experiments. It’s built on top of DVC, a powerful tool for version controlling machine learning projects. If you’re not familiar with DVC, we highly recommend reading the Getting Started guide to learn more about it.

While DVC provides all the necessary functionality, it was designed to be language independent. As a result, using it from Python often requires writing custom scripts, managing dependencies, and working with configuration files. ZnTrack addresses these challenges by providing a Python-specific interface that’s easy to use and well-integrated with Python workflows.

Just as Git was originally designed as a low-level version control engine on top of which others could build front ends, ZnTrack builds on top of DVC to provide a feature-rich, user-friendly interface for Python developers. You can think of it as similar to using Django or SQLAlchemy to make working with SQL easier and more tailored to Python. With ZnTrack, you can streamline the steps involved in experiment tracking and management.

Jupyter Notebook Support#

ZnTrack can extract Nodes defined in Jupyter Notebooks. It tries to extract the Node definition and write it into a Python file, so it needs to know the name of the notebook.

For more complex workflows, it is recommended to define the Nodes inside Python files and import them into Jupyter Notebooks.

[1]:
from zntrack import config

# When using ZnTrack we can write our code inside a Jupyter notebook.
# We can make use of this functionality by setting the `nb_name` config as follows:
config.nb_name = "01_Intro.ipynb"

Setup#

Every project starts inside an empty directory. We can initialize a new project by running git init and then dvc init inside that directory.

[2]:
from zntrack.utils import cwd_temp_dir

temp_dir = cwd_temp_dir()
[3]:
!git init
!dvc init
Initialized empty Git repository in /tmp/tmpkyrcn10i/.git/
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>
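
The setup cells above rely on a helper that places the notebook in a fresh, empty directory. As a rough illustration of the idea only, the following standard-library sketch creates a temporary directory and makes it the working directory. The function name `enter_temp_dir` is hypothetical, and this is not necessarily how `cwd_temp_dir` is implemented.

```python
import os
import tempfile


def enter_temp_dir():
    """Create a temporary directory, change into it, and return its handle."""
    temp_dir = tempfile.TemporaryDirectory()
    os.chdir(temp_dir.name)
    return temp_dir


original_cwd = os.getcwd()
temp_dir = enter_temp_dir()
print(os.listdir("."))  # a fresh directory has no entries yet

# `git init` and `dvc init` would be run at this point.

os.chdir(original_cwd)  # leave the directory before removing it
temp_dir.cleanup()
```

Returning the handle keeps the directory alive for as long as the caller holds on to it, which is why the notebook assigns the result to a variable.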

Nodes#

A DVC pipeline is organized into multiple stages. With ZnTrack, you create a stage by inheriting from zntrack.Node and implementing a run() method.

The run() method defines the logic of your pipeline stage, which will later be executed by the pipeline manager (e.g. via dvc repro).

As an example, let’s create a RandomNumber Node that generates a random integer from 0 up to, but not including, a parameterized maximum value. To do this, we use zn.params() and zn.outs() to declare the Node’s parameters and outputs:

[4]:
from zntrack import Node, zn, Project
from random import randrange


class RandomNumber(Node):
    number = zn.outs()
    maximum = zn.params()

    def run(self):
        self.number = randrange(self.maximum)
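
As a quick check of the range produced by the Node above, the following sketch draws many values with `randrange` and confirms that the upper bound is excluded. The value of `maximum` and the number of draws are illustrative choices.

```python
from random import randrange

maximum = 5  # illustrative value, not taken from the notebook
samples = {randrange(maximum) for _ in range(1000)}

# Every draw lies in the half-open interval [0, maximum).
print(all(0 <= sample < maximum for sample in samples))
```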

ZnTrack automatically generates an __init__ method that accepts all zn.params and other inputs. If you write a custom __init__, it must call super().__init__(**kwargs) for ZnTrack to work:

class RandomNumber(Node):
    def __init__(self, maximum=None, **kwargs):
        super().__init__(**kwargs)
        self.maximum = maximum

In most cases, a ZnTrack Node behaves just like a normal Python class.

[5]:
random_number = RandomNumber(maximum=512)
random_number.run()
print(random_number.number)
14

To add the Node to the DVC pipeline, we create it inside a Project context manager and then call project.run().

[6]:
with Project() as project:
    node = RandomNumber(maximum=512)

project.run()
Running DVC command: 'stage add --name RandomNumber --force ...'
Creating 'dvc.yaml'
Adding stage 'RandomNumber' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml nodes/RandomNumber/.gitignore

To enable auto staging, run:

        dvc config core.autostage true
Jupyter support is an experimental feature! Please save your notebook before running this command!
Submit issues to https://github.com/zincware/ZnTrack.
[NbConvertApp] Converting notebook 01_Intro.ipynb to script
Running stage 'RandomNumber':
> zntrack run src.RandomNumber.RandomNumber --name RandomNumber
[NbConvertApp] Writing 4644 bytes to 01_Intro.py
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
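
The pattern used above, creating Nodes inside a `with` block and running them afterwards, can be sketched with a plain context manager. The classes `ToyProject` and `ToyNode` below are hypothetical and only illustrate one way such registration could work. They do not reflect how ZnTrack or DVC build and execute the pipeline.

```python
class ToyProject:
    """Hypothetical stand-in for a project; not ZnTrack's implementation."""

    active = None  # project currently collecting nodes, if any

    def __init__(self):
        self.nodes = []

    def __enter__(self):
        ToyProject.active = self
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        ToyProject.active = None
        return False  # do not swallow exceptions

    def run(self):
        # Execute every node that was registered inside the with block.
        for node in self.nodes:
            node.run()


class ToyNode:
    def __init__(self, **params):
        self.params = params
        self.executed = False
        # Register with the active project, if there is one.
        if ToyProject.active is not None:
            ToyProject.active.nodes.append(self)

    def run(self):
        self.executed = True


with ToyProject() as project:
    node = ToyNode(maximum=512)  # registered, but not executed yet

print(node.executed)
project.run()
print(node.executed)
```

Separating registration from execution lets the whole graph be described first and run in one step afterwards.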

To access the results, we load the Node with its load() method and look at the number attribute.

[7]:
node.load()
node.number
[7]:
354

Instead of passing parameters directly, you can also pass a parameter file using zntrack.dvc.params(<param_file>). A list of all supported file formats (e.g. JSON and YAML) can be found in the DVC Params documentation.
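
As a small illustration of what such a parameter file might contain, the following sketch writes a JSON file and reads it back using only the standard library. The file name `parameter.json` and the key layout are assumptions made for this example. Consult the ZnTrack and DVC documentation for the structure they actually expect.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file name and key layout, chosen only for illustration.
with tempfile.TemporaryDirectory() as directory:
    param_file = Path(directory) / "parameter.json"
    param_file.write_text(json.dumps({"maximum": 512}, indent=2))

    # Reading the file back yields the same parameters.
    loaded = json.loads(param_file.read_text())

print(loaded["maximum"])
```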