Dependencies
For ZnTrack there are two different ways to set up dependencies:
Node based dependencies
File based dependencies
Node dependencies
We will first look at Node-based dependencies, starting from a RandomNumber Hello World example. In our first stage we create a random number, and then we add another Node that depends on it. We can do this very easily by using zn.deps. This allows us to access all properties of the dependency through that attribute.
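To make the idea concrete without ZnTrack or DVC, here is a minimal plain-Python sketch of a node dependency: the downstream object holds a reference to the upstream one and reads its attributes. The class names PlainRandomNumber and PlainComputePower and the seed parameter are hypothetical stand-ins and not part of ZnTrack; nothing is tracked, cached, or serialized here.

```python
from dataclasses import dataclass, field
from random import Random


@dataclass
class PlainRandomNumber:
    """Hypothetical stand-in for a Node that produces a number."""

    maximum: int
    seed: int = 0  # assumption: a fixed seed, only to keep the sketch reproducible
    number: float = field(default=None, init=False)

    def run(self) -> None:
        self.number = float(Random(self.seed).randrange(self.maximum))


@dataclass
class PlainComputePower:
    """Hypothetical stand-in for a Node that depends on another object."""

    random_number: PlainRandomNumber  # the dependency is a reference to the upstream object
    power: float
    number: float = field(default=None, init=False)

    def run(self) -> None:
        # every attribute of the dependency is reachable through the reference
        self.number = self.random_number.number ** self.power


upstream = PlainRandomNumber(maximum=16)
downstream = PlainComputePower(random_number=upstream, power=2.0)
upstream.run()
downstream.run()
print(f"{upstream.number} ^ {downstream.power} = {downstream.number}")
```

The ZnTrack Nodes in the following cells have the same shape, with the difference that the values are stored on disk and the execution order is handled by DVC.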
[1]:
import zntrack
from random import randrange
from pathlib import Path
[2]:
zntrack.config.nb_name = "03_dependencies.ipynb"
[4]:
!git init
!dvc init
Initialized empty Git repository in /tmp/tmpazvjbvbl/.git/
Initialized DVC repository.
You can now commit the changes to git.
+---------------------------------------------------------------------+
| |
| DVC has enabled anonymous aggregate usage analytics. |
| Read the analytics documentation (and how to opt-out) here: |
| <https://dvc.org/doc/user-guide/analytics> |
| |
+---------------------------------------------------------------------+
What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>
[5]:
class RandomNumber(zntrack.Node):
    maximum = zntrack.zn.params()
    number = zntrack.zn.outs()

    def run(self):
        self.number = float(randrange(self.maximum))


class ComputePower(zntrack.Node):
    random_number: RandomNumber = zntrack.zn.deps()
    number = zntrack.zn.outs()
    power = zntrack.zn.params()

    def run(self):
        self.number = self.random_number.number**self.power
We can now create the stages the usual way and look at the outcomes. This will create the following graph for us:
[6]:
with zntrack.Project() as project:
    random_number = RandomNumber(maximum=16)
    compute_power = ComputePower(random_number=random_number, power=2.0)
project.run(repro=False)
Running DVC command: 'stage add --name RandomNumber --force ...'
Creating 'dvc.yaml'
Adding stage 'RandomNumber' in 'dvc.yaml'
To track the changes with git, run:
git add nodes/RandomNumber/.gitignore dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
Jupyter support is an experimental feature! Please save your notebook before running this command!
Submit issues to https://github.com/zincware/ZnTrack.
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name ComputePower --force ...'
Adding stage 'ComputePower' in 'dvc.yaml'
To track the changes with git, run:
git add nodes/ComputePower/.gitignore dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
[7]:
!dvc repro
Running stage 'RandomNumber':
> zntrack run src.RandomNumber.RandomNumber --name RandomNumber
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'
Running stage 'ComputePower':
> zntrack run src.ComputePower.ComputePower --name ComputePower
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
[8]:
random_number.load()
compute_power.load()
print(f"{random_number.number} ^ {compute_power.power} = {compute_power.number}")
3.0 ^ 2.0 = 9.0
File dependencies
The second approach for specifying dependencies in ZnTrack is to depend on files. This is useful when our pipeline requires output files from a previous stage, or when we want to track the changes in an input file. To create a file dependency, we first create a file from our random number. We then use the path to that file as our dependency. Setting a file dependency is simple and can be done by passing a pathlib.Path or str to the dvc.deps method. Like other dvc.<...> attributes, it also supports lists:
dependency: Path = dvc.deps([Path('some_file.txt'), 'some_other_file.txt'])
Info: Node working directory
It is recommended to store files created by a node in the node’s working directory (nwd), which is located at ./nodes/<nodename>. You can access the nwd using zntrack.nwd. Here’s an example:
file: Path = dvc.outs(zntrack.nwd / "random_number.txt")
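The file hand-off itself can be sketched with the standard library alone. The helpers write_number and power_from_file below are hypothetical and not part of ZnTrack; they only mimic the ./nodes/<nodename> layout inside a temporary directory and assume the number is stored as plain text.

```python
import tempfile
from pathlib import Path


def write_number(number: float, nwd: Path) -> Path:
    """Write the number to a text file inside the given working directory."""
    nwd.mkdir(parents=True, exist_ok=True)
    file = nwd / "random_number.txt"
    file.write_text(str(number))
    return file


def power_from_file(file: Path, power: float = 2.0) -> float:
    """Read the number back from the file and raise it to `power`."""
    return float(file.read_text()) ** power


with tempfile.TemporaryDirectory() as tmp:
    # mimic the ./nodes/<nodename> layout without touching the real repository
    nwd = Path(tmp) / "nodes" / "WriteToFile"
    file = write_number(3.0, nwd)
    result = power_from_file(file)

print(result)  # 9.0
```

The only thing the second function needs from the first is the path, which is exactly what a file dependency records.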
[9]:
# zntrack: break
class WriteToFile(zntrack.Node):
    random_number: RandomNumber = zntrack.zn.deps()
    file: Path = zntrack.dvc.outs(zntrack.nwd / "random_number.txt")

    def run(self):
        self.file.write_text(str(self.random_number.number))


class PowerFromFile(zntrack.Node):
    file: Path = zntrack.zn.deps()
    number = zntrack.zn.outs()
    power = zntrack.zn.params(2)

    def run(self):
        number = float(self.file.read_text())
        self.number = number**self.power


class ComparePowers(zntrack.Node):
    power_deps = zntrack.zn.deps()

    def run(self):
        assert self.power_deps[0].number == self.power_deps[1].number
Let us create the stages and look at the graph.
[12]:
project.nodes
[12]:
NodeView((UUID('fcedbb2b-2f78-4e30-9d73-8663d88f83aa'), UUID('3b6008c4-a1bb-4fdf-9ce0-389c429fe4bf'), UUID('fc4b13b3-dd53-4347-8af1-3b8c6ca75b2a'), UUID('afdb6202-8db2-4f1d-a379-4af819846aee'), UUID('908d7dd4-405d-4da4-a40f-9cb323f5f0d2'), UUID('4d70c58b-8b22-44b6-aa97-43528094e209')))
[17]:
with zntrack.Project() as project:
    random_number = RandomNumber(maximum=16)
    compute_power = ComputePower(random_number=random_number, power=2.0)
    write_to_file = WriteToFile(random_number=random_number)
    power_from_file = PowerFromFile(file=write_to_file.file)
    compare_powerts = ComparePowers(power_deps=[power_from_file, compute_power])
project.run()
Running DVC command: 'stage add --name RandomNumber --force ...'
Modifying stage 'RandomNumber' in 'dvc.yaml'
To track the changes with git, run:
git add dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name ComputePower --force ...'
Modifying stage 'ComputePower' in 'dvc.yaml'
To track the changes with git, run:
git add dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name WriteToFile --force ...'
Adding stage 'WriteToFile' in 'dvc.yaml'
To track the changes with git, run:
git add dvc.yaml nodes/WriteToFile/.gitignore
To enable auto staging, run:
dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name PowerFromFile --force ...'
Adding stage 'PowerFromFile' in 'dvc.yaml'
To track the changes with git, run:
git add nodes/PowerFromFile/.gitignore dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name ComparePowers --force ...'
Adding stage 'ComparePowers' in 'dvc.yaml'
To track the changes with git, run:
git add dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
Stage 'RandomNumber' didn't change, skipping
Running stage 'WriteToFile':
> zntrack run src.WriteToFile.WriteToFile --name WriteToFile
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Updating lock file 'dvc.lock'
Running stage 'PowerFromFile':
> zntrack run src.PowerFromFile.PowerFromFile --name PowerFromFile
Updating lock file 'dvc.lock'
Stage 'ComputePower' didn't change, skipping
Running stage 'ComparePowers':
> zntrack run src.ComparePowers.ComparePowers --name ComparePowers
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
[18]:
!dvc dag
                +--------------+
                | RandomNumber |
                +--------------+
                **           ***
              **                ***
            **                     **
+-------------+                      **
| WriteToFile |                       *
+-------------+                       *
        *                             *
        *                             *
        *                             *
+---------------+             +--------------+
| PowerFromFile |             | ComputePower |
+---------------+             +--------------+
                **           ***
                  ***      **
                     **  **
              +---------------+
              | ComparePowers |
              +---------------+
[19]:
# to verify we can also run the method manually
compare_powerts.load()
compare_powerts.run()
If we now look at our dvc.yaml, we can see that our Node dependencies rely on the output files of the upstream Node, such as nodes/<node_name>/number.json, while the file dependency is directly connected to the passed file.
[20]:
from IPython.display import Pretty, display
display(Pretty("dvc.yaml"))
stages:
  RandomNumber:
    cmd: zntrack run src.RandomNumber.RandomNumber --name RandomNumber
    params:
    - RandomNumber
    outs:
    - nodes/RandomNumber/number.json
  ComputePower:
    cmd: zntrack run src.ComputePower.ComputePower --name ComputePower
    deps:
    - nodes/RandomNumber/number.json
    params:
    - ComputePower
    outs:
    - nodes/ComputePower/number.json
  WriteToFile:
    cmd: zntrack run src.WriteToFile.WriteToFile --name WriteToFile
    deps:
    - nodes/RandomNumber/number.json
    outs:
    - nodes/WriteToFile/random_number.txt
  PowerFromFile:
    cmd: zntrack run src.PowerFromFile.PowerFromFile --name PowerFromFile
    deps:
    - nodes/WriteToFile/random_number.txt
    params:
    - PowerFromFile
    outs:
    - nodes/PowerFromFile/number.json
  ComparePowers:
    cmd: zntrack run src.ComparePowers.ComparePowers --name ComparePowers
    deps:
    - nodes/ComputePower/number.json
    - nodes/PowerFromFile/number.json
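The graph printed by dvc dag follows from these entries: a stage depends on another stage when one of its deps is listed in the other's outs. The sketch below reproduces that matching for the file above. The stages are copied into a plain dict to avoid a YAML dependency, and the helper stage_edges is hypothetical; it only does exact path matching and is not how DVC is implemented.

```python
# deps and outs copied from the dvc.yaml shown above (cmd and params omitted)
stages = {
    "RandomNumber": {"outs": ["nodes/RandomNumber/number.json"]},
    "ComputePower": {
        "deps": ["nodes/RandomNumber/number.json"],
        "outs": ["nodes/ComputePower/number.json"],
    },
    "WriteToFile": {
        "deps": ["nodes/RandomNumber/number.json"],
        "outs": ["nodes/WriteToFile/random_number.txt"],
    },
    "PowerFromFile": {
        "deps": ["nodes/WriteToFile/random_number.txt"],
        "outs": ["nodes/PowerFromFile/number.json"],
    },
    "ComparePowers": {
        "deps": [
            "nodes/ComputePower/number.json",
            "nodes/PowerFromFile/number.json",
        ],
    },
}


def stage_edges(stages: dict) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs by matching deps against outs."""
    # map every output path to the stage that produces it
    producer = {
        out: name for name, stage in stages.items() for out in stage.get("outs", [])
    }
    edges = []
    for name, stage in stages.items():
        for dep in stage.get("deps", []):
            if dep in producer:  # deps without a producing stage are plain input files
                edges.append((producer[dep], name))
    return edges


edges = stage_edges(stages)
for upstream, downstream in edges:
    print(f"{upstream} -> {downstream}")
```

The five resulting edges are the ones drawn in the dvc dag output: the Node dependencies and the file dependency end up in the same graph because both are expressed as paths.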
Node attributes as dependencies
It is also possible to specify a Node attribute as a dependency. In this case you will be able to access the value of the attribute directly instead of going through the Node instance. This can be used for all dvc.<option> and zn.<option> as well as e.g. class properties. Note that the dvc dependencies will still be written for the full Node and won’t be limited to the Node attribute. To define a dependency on an attribute, the zntrack.getdeps function is required.
[21]:
class ComputePowerFromNumber(zntrack.Node):
    number: float = zntrack.zn.deps()  # this will be a float instead of RandomNumber
    power: int = zntrack.zn.params()
    result: float = zntrack.zn.outs()

    def run(self):
        self.result = self.number**self.power
[22]:
with zntrack.Project() as project:
    random_number = RandomNumber(maximum=16)
    compute_power = ComputePowerFromNumber(number=random_number.number, power=2.0)
project.run()
Running DVC command: 'stage add --name RandomNumber --force ...'
Modifying stage 'RandomNumber' in 'dvc.yaml'
To track the changes with git, run:
git add dvc.yaml
To enable auto staging, run:
dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Running DVC command: 'stage add --name ComputePowerFromNumber --force ...'
Adding stage 'ComputePowerFromNumber' in 'dvc.yaml'
To track the changes with git, run:
git add dvc.yaml nodes/ComputePowerFromNumber/.gitignore
To enable auto staging, run:
dvc config core.autostage true
[NbConvertApp] Converting notebook 03_dependencies.ipynb to script
/data/fzills/miniconda3/envs/zntrack/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
Stage 'RandomNumber' didn't change, skipping
Stage 'WriteToFile' didn't change, skipping
Stage 'PowerFromFile' didn't change, skipping
Stage 'ComputePower' didn't change, skipping
Stage 'ComparePowers' didn't change, skipping
Running stage 'ComputePowerFromNumber':
> zntrack run src.ComputePowerFromNumber.ComputePowerFromNumber --name ComputePowerFromNumber
[NbConvertApp] Writing 6203 bytes to 03_dependencies.py
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
getdeps(RandomNumber, "number") can also be replaced by getdeps(RandomNumber["nodename"], "number") or getdeps(RandomNumber.load(name="nodename"), "number"). The first argument represents the Node and the second argument is the attribute, similar to getattr(). ZnTrack also provides a shorthand for this via RandomNumber @ "number" or RandomNumber["nodename"] @ "number".
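The comparison with getattr() can be illustrated in plain Python. The classes AttributeRef and ToyNode below are hypothetical and only sketch one way an @ shorthand could be built on the __matmul__ hook; this is an assumption for illustration and not ZnTrack's actual implementation.

```python
class AttributeRef:
    """Hypothetical record meaning 'attribute `name` of `node`'."""

    def __init__(self, node, name):
        self.node = node
        self.name = name

    def resolve(self):
        # the attribute is looked up only when the value is needed
        return getattr(self.node, self.name)


class ToyNode:
    """Hypothetical stand-in for a Node with a single attribute."""

    def __init__(self, number):
        self.number = number

    def __matmul__(self, name):
        # enables the `node @ "attribute"` spelling
        return AttributeRef(self, name)


node = ToyNode(number=3.0)
ref = node @ "number"
print(ref.resolve())  # 3.0
```

Storing the pair of object and attribute name, rather than the value, is what lets the lookup happen later, once the upstream value exists.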
[23]:
compute_power.load()
[24]:
print(f"{compute_power.number} ^ {compute_power.power} = {compute_power.result}")
3.0 ^ 2.0 = 9.0