Functions

So far we have only considered converting a Python class into a ZnTrack Node. Whilst ZnTrack classes are the more powerful tool, a lightweight alternative is to wrap a Python function with @zntrack.nodify, which gives access to a subset of the available ZnTrack tools.

[1]:
from zntrack import config

# ZnTrack also supports Nodes that are defined inside a Jupyter notebook.
# To enable this, set the `nb_name` config to the notebook's file name:
config.nb_name = "07_functions.ipynb"

In the following example we will create an output file and write a parameter value to it.

[4]:
from zntrack import nodify, NodeConfig
import pathlib


@nodify(outs=pathlib.Path("outs.txt"), params={"text": "Lorem Ipsum"})
def write_text(cfg: NodeConfig):
    cfg.outs.write_text(cfg.params.text)

The @nodify decorator lets us define the DVC run options, such as outs or deps, together with a parameter dictionary. The params are cast into a DotDict, which allows us to access them either via cfg.params["text"] or directly via cfg.params.text. Calling the decorated function only creates the Node (a DVC stage) and does not execute the function body. Passing run=True tells DVC to run the stage as well.
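To make this behaviour more concrete, here is a minimal, self-contained sketch that mimics it without ZnTrack or DVC. The names DotDict, ToyNodeConfig, toy_nodify and toy_write_text are hypothetical; this is not ZnTrack's implementation and it does not create a DVC stage. It only illustrates two ideas from the paragraph above: parameters readable both as cfg.params["text"] and cfg.params.text, and a decorator whose function body runs only when run=True.

```python
import functools
import pathlib
import tempfile
from typing import Any, Callable, Optional


class DotDict(dict):
    """Hypothetical dict whose keys can also be read as attributes."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err


class ToyNodeConfig:
    """Hypothetical stand-in for NodeConfig holding one Node's run options."""

    def __init__(self, outs=None, deps=None, params: Optional[dict] = None):
        self.outs = outs
        self.deps = deps
        self.params = DotDict(params or {})


def toy_nodify(outs=None, deps=None, params: Optional[dict] = None) -> Callable:
    """Hypothetical decorator: builds a config and runs the body only if run=True."""

    def decorator(func: Callable[[ToyNodeConfig], None]) -> Callable[..., ToyNodeConfig]:
        @functools.wraps(func)
        def wrapper(run: bool = False) -> ToyNodeConfig:
            cfg = ToyNodeConfig(outs=outs, deps=deps, params=params)
            if run:
                # Only now is the wrapped function body executed.
                func(cfg)
            return cfg

        return wrapper

    return decorator


# Write into a fresh temporary directory to avoid touching the working directory.
out_dir = pathlib.Path(tempfile.mkdtemp())


@toy_nodify(outs=out_dir / "outs.txt", params={"text": "Lorem Ipsum"})
def toy_write_text(cfg: ToyNodeConfig) -> None:
    cfg.outs.write_text(cfg.params.text)


cfg = toy_write_text()  # builds the config only; nothing is written yet
written_before_run = cfg.outs.exists()
cfg = toy_write_text(run=True)  # executes the body and writes outs.txt
print(cfg.outs.read_text())  # Lorem Ipsum
```

One consequence of this DotDict design: attribute access falls back to the dictionary only when normal attribute lookup fails, so a parameter named like a dict method (for example "items") has to be read as cfg.params["items"].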

[5]:
cfg = write_text(run=True)
Jupyter support is an experimental feature! Please save your notebook before running this command!
Submit issues to https://github.com/zincware/ZnTrack.
[NbConvertApp] Converting notebook 07_functions.ipynb to script
[NbConvertApp] Writing 2495 bytes to 07_functions.py
Running DVC command: 'stage add -n write_text --force ...'
Creating 'dvc.yaml'
Adding stage 'write_text' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml .gitignore

To enable auto staging, run:

        dvc config core.autostage true
Running DVC command: 'repro write_text'
Running stage 'write_text':
> zntrack run src.write_text.write_text
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
[6]:
cfg.outs.read_text()
[6]:
'Lorem Ipsum'

We can also build DAGs by using the output files of one Node as the dependencies of another.

[7]:
@nodify(
    deps=pathlib.Path("outs.txt"),
    outs=[pathlib.Path("part_1.txt"), pathlib.Path("part_2.txt")],
)
def split_text(cfg: NodeConfig):
    text = cfg.deps.read_text()
    for text_part, outs_file in zip(text.split(" "), cfg.outs):
        outs_file.write_text(text_part)
[8]:
_ = split_text(run=True)
[NbConvertApp] Converting notebook 07_functions.ipynb to script
[NbConvertApp] Writing 2495 bytes to 07_functions.py
Running DVC command: 'stage add -n split_text --force ...'
Adding stage 'split_text' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml .gitignore

To enable auto staging, run:

        dvc config core.autostage true
Running DVC command: 'repro split_text'
Stage 'write_text' didn't change, skipping
Running stage 'split_text':
> zntrack run src.split_text.split_text
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
[9]:
print(pathlib.Path("part_1.txt").read_text())
print(pathlib.Path("part_2.txt").read_text())
Lorem
Ipsum
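The execution order follows from these file links: outs.txt is an output of write_text and a dependency of split_text, so write_text has to come first, which is why `dvc repro split_text` checked write_text before running split_text. The following is a minimal sketch of that idea using the standard-library graphlib module, assuming a simplified, hypothetical dict-based description of the two stages; it is not DVC's own implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical stage descriptions mirroring the two Nodes above:
# each stage lists the files it reads (deps) and writes (outs).
stages = {
    "split_text": {"deps": ["outs.txt"], "outs": ["part_1.txt", "part_2.txt"]},
    "write_text": {"deps": [], "outs": ["outs.txt"]},
}


def build_dag(stages: dict[str, dict[str, list[str]]]) -> dict[str, set[str]]:
    """Map each stage to the set of stages that produce its deps."""
    producer = {}
    for name, stage in stages.items():
        for out in stage["outs"]:
            producer[out] = name
    # Deps not produced by any stage (e.g. raw input files) add no edge here.
    return {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }


dag = build_dag(stages)
# TopologicalSorter expects a mapping of node -> predecessors.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['write_text', 'split_text']
```

A real DVC pipeline additionally compares hashes of a stage's deps, outs and command against dvc.lock to decide whether it can be skipped, as in the "Stage 'write_text' didn't change, skipping" line above; the sketch leaves that step out.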

Pros and Cons

Wrapping a Python function and converting it into a Node is closer to the original DVC API. It provides all the basic functionality and is well suited to compact functions. The ZnTrack class API provides more powerful tools, such as zn.<method>, and can be used without configuring any file names. Depending on personal preference, either approach can be used, or the two can be combined to get the most out of ZnTrack and DVC.