Dependencies

ZnTrack offers two ways to set up dependencies:

  1. Node-based dependencies

  2. File-based dependencies

Node dependencies

We first look at Node-based dependencies, starting from a RandomNumber "Hello World" example. The first stage creates a random number; we then add another Node that depends on it. This is done with zn.deps.

This gives the dependent Node access to all attributes of the Node it depends on.
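
As a rough mental model, and not a description of ZnTrack's internals, a Node dependency behaves like holding a reference to the upstream object and reading its attributes. The plain-Python sketch below uses hypothetical stand-in classes (`PlainRandomNumber`, `PlainComputePower`) with no tracking, no DVC, and no persistence:

```python
from dataclasses import dataclass
from random import randrange
from typing import Optional


@dataclass
class PlainRandomNumber:
    """Hypothetical stand-in for an upstream node (no ZnTrack involved)."""

    maximum: int
    number: Optional[float] = None

    def run(self) -> None:
        self.number = float(randrange(self.maximum))


@dataclass
class PlainComputePower:
    """Hypothetical downstream node that reads an attribute of its dependency."""

    random_number: PlainRandomNumber
    power: float
    number: Optional[float] = None

    def run(self) -> None:
        self.number = self.random_number.number ** self.power


upstream = PlainRandomNumber(maximum=16)
upstream.run()  # the upstream must run first so that `number` exists

downstream = PlainComputePower(random_number=upstream, power=2.0)
downstream.run()

print(f"{upstream.number} ^ {downstream.power} = {downstream.number}")
```

In this sketch the run order is written by hand. In the ZnTrack example that follows, the relationship is recorded as a stage dependency in dvc.yaml instead, so DVC can order the stages.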

[1]:
import zntrack
from random import randrange
from pathlib import Path
[2]:
zntrack.config.nb_name = "03_dependencies.ipynb"
[4]:
!git init
!dvc init
Initialized empty Git repository in /tmp/tmpazvjbvbl/.git/
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>
[5]:
class RandomNumber(zntrack.Node):
    maximum = zntrack.zn.params()
    number = zntrack.zn.outs()

    def run(self):
        self.number = float(randrange(self.maximum))


class ComputePower(zntrack.Node):
    random_number: RandomNumber = zntrack.zn.deps()
    number = zntrack.zn.outs()
    power = zntrack.zn.params()

    def run(self):
        self.number = self.random_number.number**self.power

We can now create the stages the usual way and look at the outcomes. This will create the following graph for us:

[Image: dependency graph in which ComputePower depends on RandomNumber]

[6]:
with zntrack.Project() as project:
    random_number = RandomNumber(maximum=16)
    compute_power = ComputePower(random_number=random_number, power=2.0)
project.run(repro=False)
Running DVC command: 'stage add --name RandomNumber --force ...'
Creating 'dvc.yaml'
Adding stage 'RandomNumber' in 'dvc.yaml'

To track the changes with git, run:

        git add nodes/RandomNumber/.gitignore dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
Jupyter support is an experimental feature! Please save your notebook before running this command!
Submit issues to https://github.com/zincware/ZnTrack.
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
  validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name ComputePower --force ...'
Adding stage 'ComputePower' in 'dvc.yaml'

To track the changes with git, run:

        git add nodes/ComputePower/.gitignore dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
  validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
[7]:
!dvc repro
Running stage 'RandomNumber':
> zntrack run src.RandomNumber.RandomNumber --name RandomNumber
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

Running stage 'ComputePower':
> zntrack run src.ComputePower.ComputePower --name ComputePower
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
[8]:
random_number.load()
compute_power.load()
print(f"{random_number.number} ^ {compute_power.power} = {compute_power.number}")
3.0 ^ 2.0 = 9.0

File dependencies

The second way to specify dependencies in ZnTrack is to depend on files. This is useful when a pipeline requires output files from a previous stage, or when we want to track changes in an input file. To create a file dependency, we first write our random number to a file and then use the path to that file as the dependency. A file dependency is set by passing a pathlib.Path or a str to dvc.deps. Like other dvc.<...> attributes, it also supports lists:

dependency: Path = dvc.deps([Path('some_file.txt'), 'some_other_file.txt'])

Info: Node working directory

It is recommended to store files created by a node in the node’s working directory (nwd), which is located at ./nodes/<nodename>. You can access the nwd using zntrack.nwd. Here’s an example:

file: Path = dvc.outs(zntrack.nwd / "random_number.txt")
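
The layout described above can be sketched with pathlib alone. The helper `node_working_directory` is hypothetical and only mirrors the ./nodes/<nodename> convention stated here; it is not part of ZnTrack, where zntrack.nwd is resolved for you. The node name WriteToFile is taken from the next cell:

```python
from pathlib import Path


def node_working_directory(node_name: str, root: Path = Path("nodes")) -> Path:
    """Hypothetical helper: build the ./nodes/<nodename> directory path."""
    return root / node_name


output_file = node_working_directory("WriteToFile") / "random_number.txt"
print(output_file.as_posix())  # nodes/WriteToFile/random_number.txt
```

Keeping each node's files under its own directory means two nodes that write a file with the same name do not overwrite each other.
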
[9]:
# zntrack: break


class WriteToFile(zntrack.Node):
    random_number: RandomNumber = zntrack.zn.deps()
    file: Path = zntrack.dvc.outs(zntrack.nwd / "random_number.txt")

    def run(self):
        self.file.write_text(str(self.random_number.number))


class PowerFromFile(zntrack.Node):
    file: Path = zntrack.zn.deps()
    number = zntrack.zn.outs()
    power = zntrack.zn.params(2)

    def run(self):
        number = float(self.file.read_text())
        self.number = number**self.power


class ComparePowers(zntrack.Node):
    power_deps = zntrack.zn.deps()

    def run(self):
        assert self.power_deps[0].number == self.power_deps[1].number
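
The file handoff in these classes can be tried without ZnTrack. The sketch below is only an illustration: it uses a temporary directory instead of the node working directory, and the value 3.0 is a fixed example rather than a freshly drawn random number:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "random_number.txt"

    # What WriteToFile.run does: serialize the number as text.
    file.write_text(str(3.0))

    # What PowerFromFile.run does: parse the text and raise it to a power.
    number_from_file = float(file.read_text())
    power_from_file = number_from_file ** 2

# What ComputePower.run does with the in-memory value.
power_from_node = 3.0 ** 2.0

# The check performed by ComparePowers.run.
assert power_from_file == power_from_node
print(power_from_file)  # 9.0
```

Both routes should give the same result, because str() followed by float() round-trips a Python float exactly.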

Let us create the stages and look at the graph.

[12]:
project.nodes
[12]:
NodeView((UUID('fcedbb2b-2f78-4e30-9d73-8663d88f83aa'), UUID('3b6008c4-a1bb-4fdf-9ce0-389c429fe4bf'), UUID('fc4b13b3-dd53-4347-8af1-3b8c6ca75b2a'), UUID('afdb6202-8db2-4f1d-a379-4af819846aee'), UUID('908d7dd4-405d-4da4-a40f-9cb323f5f0d2'), UUID('4d70c58b-8b22-44b6-aa97-43528094e209')))
[17]:
with zntrack.Project() as project:
    random_number = RandomNumber(maximum=16)
    compute_power = ComputePower(random_number=random_number, power=2.0)

    write_to_file = WriteToFile(random_number=random_number)
    power_from_file = PowerFromFile(file=write_to_file.file)
    compare_powerts = ComparePowers(power_deps=[power_from_file, compute_power])
project.run()
Running DVC command: 'stage add --name RandomNumber --force ...'
Modifying stage 'RandomNumber' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
  validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name ComputePower --force ...'
Modifying stage 'ComputePower' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
  validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name WriteToFile --force ...'
Adding stage 'WriteToFile' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml nodes/WriteToFile/.gitignore

To enable auto staging, run:

        dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
  validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name PowerFromFile --force ...'
Adding stage 'PowerFromFile' in 'dvc.yaml'

To track the changes with git, run:

        git add nodes/PowerFromFile/.gitignore dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
  validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name ComparePowers --force ...'
Adding stage 'ComparePowers' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
  validate(nb)
Stage 'RandomNumber' didn't change, skipping
Running stage 'WriteToFile':
> zntrack run src.WriteToFile.WriteToFile --name WriteToFile
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Updating lock file 'dvc.lock'

Running stage 'PowerFromFile':
> zntrack run src.PowerFromFile.PowerFromFile --name PowerFromFile
Updating lock file 'dvc.lock'

Stage 'ComputePower' didn't change, skipping
Running stage 'ComparePowers':
> zntrack run src.ComparePowers.ComparePowers --name ComparePowers
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
[18]:
!dvc dag
                +--------------+
                | RandomNumber |
                +--------------+
                **             ***
             ***                  ***
           **                        **
 +-------------+                       **
 | WriteToFile |                        *
 +-------------+                        *
        *                               *
        *                               *
        *                               *
+---------------+               +--------------+
| PowerFromFile |               | ComputePower |
+---------------+               +--------------+
                **             ***
                  ***        **
                     **    **
                +---------------+
                | ComparePowers |
                +---------------+
[19]:
# to verify we can also run the method manually
compare_powerts.load()
compare_powerts.run()

If we now look at our dvc.yaml, we can see that Node dependencies rely on the output files written to nodes/<node_name>/ (here nodes/RandomNumber/number.json), while the file dependency is connected directly to the passed file.

[20]:
from IPython.display import Pretty, display

display(Pretty("dvc.yaml"))
stages:
  RandomNumber:
    cmd: zntrack run src.RandomNumber.RandomNumber --name RandomNumber
    params:
    - RandomNumber
    outs:
    - nodes/RandomNumber/number.json
  ComputePower:
    cmd: zntrack run src.ComputePower.ComputePower --name ComputePower
    deps:
    - nodes/RandomNumber/number.json
    params:
    - ComputePower
    outs:
    - nodes/ComputePower/number.json
  WriteToFile:
    cmd: zntrack run src.WriteToFile.WriteToFile --name WriteToFile
    deps:
    - nodes/RandomNumber/number.json
    outs:
    - nodes/WriteToFile/random_number.txt
  PowerFromFile:
    cmd: zntrack run src.PowerFromFile.PowerFromFile --name PowerFromFile
    deps:
    - nodes/WriteToFile/random_number.txt
    params:
    - PowerFromFile
    outs:
    - nodes/PowerFromFile/number.json
  ComparePowers:
    cmd: zntrack run src.ComparePowers.ComparePowers --name ComparePowers
    deps:
    - nodes/ComputePower/number.json
    - nodes/PowerFromFile/number.json
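
To see how the graph printed by dvc dag follows from this file, the sketch below matches each stage's deps against the outs of the other stages and sorts the result topologically. It is an illustration of the idea, not DVC's actual resolution logic; the `stages` dictionary is copied by hand from the dvc.yaml above, and the names `producer` and `predecessors` are made up for this example:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# deps/outs copied by hand from the dvc.yaml shown above (cmd and params omitted).
stages = {
    "RandomNumber": {"deps": [], "outs": ["nodes/RandomNumber/number.json"]},
    "ComputePower": {
        "deps": ["nodes/RandomNumber/number.json"],
        "outs": ["nodes/ComputePower/number.json"],
    },
    "WriteToFile": {
        "deps": ["nodes/RandomNumber/number.json"],
        "outs": ["nodes/WriteToFile/random_number.txt"],
    },
    "PowerFromFile": {
        "deps": ["nodes/WriteToFile/random_number.txt"],
        "outs": ["nodes/PowerFromFile/number.json"],
    },
    "ComparePowers": {
        "deps": [
            "nodes/ComputePower/number.json",
            "nodes/PowerFromFile/number.json",
        ],
        "outs": [],
    },
}

# Map every output file to the stage that produces it.
producer = {out: name for name, stage in stages.items() for out in stage["outs"]}

# A stage depends on whichever stage produces each of its deps.
predecessors = {
    name: {producer[dep] for dep in stage["deps"] if dep in producer}
    for name, stage in stages.items()
}

# One valid execution order; several orders can satisfy the same graph.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

Whichever order is printed, RandomNumber comes first and ComparePowers last, and WriteToFile precedes PowerFromFile, which matches the dvc dag output.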

Node attributes as dependencies

It is also possible to specify a Node attribute as a dependency. In this case, you access the value of the attribute directly instead of going through the Node instance. This works for all dvc.<option> and zn.<option> attributes as well as, for example, class properties. Note that the DVC dependencies are still written for the full Node and are not limited to the single attribute. Such a dependency can be defined with the zntrack.getdeps function; in the example below, the attribute is passed directly as random_number.number inside the zntrack.Project context.

[21]:
class ComputePowerFromNumber(zntrack.Node):
    number: float = zntrack.zn.deps()  # this will be a float instead of RandomNumber

    power: int = zntrack.zn.params()
    result: float = zntrack.zn.outs()

    def run(self):
        self.result = self.number**self.power
[22]:
with zntrack.Project() as project:
    random_number = RandomNumber(maximum=16)
    compute_power = ComputePowerFromNumber(number=random_number.number, power=2.0)
project.run()
Running DVC command: 'stage add --name RandomNumber --force ...'
Modifying stage 'RandomNumber' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml

To enable auto staging, run:

        dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
  validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name ComputePowerFromNumber --force ...'
Adding stage 'ComputePowerFromNumber' in 'dvc.yaml'

To track the changes with git, run:

        git add dvc.yaml nodes/ComputePowerFromNumber/.gitignore

To enable auto staging, run:

        dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
  validate(nb)
Stage 'RandomNumber' didn't change, skipping
Stage 'WriteToFile' didn't change, skipping
Stage 'PowerFromFile' didn't change, skipping
Stage 'ComputePower' didn't change, skipping
Stage 'ComparePowers' didn't change, skipping
Running stage 'ComputePowerFromNumber':
> zntrack run src.ComputePowerFromNumber.ComputePowerFromNumber --name ComputePowerFromNumber
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Updating lock file 'dvc.lock'

To track the changes with git, run:

        git add dvc.lock

To enable auto staging, run:

        dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.

getdeps(RandomNumber, "number") can also be replaced by getdeps(RandomNumber["nodename"], "number") or getdeps(RandomNumber.load(name="nodename"), "number"). The first argument represents the Node and the second argument is the attribute, similar to getattr(). ZnTrack also provides a shorthand for this via RandomNumber @ "number" or RandomNumber["nodename"] @ "number".

[23]:
compute_power.load()
[24]:
print(f"{compute_power.number} ^ {compute_power.power} = {compute_power.result}")
3.0 ^ 2.0 = 9.0