Data Version Control
Data Version Control (DVC) is a fundamental building block of the ZnTrack package. To learn more about DVC, please refer to the DVC documentation. DVC keeps track of the files a workflow uses and produces. When a result is already available, DVC loads it from its cache instead of recomputing it.
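To make the caching idea concrete, here is a toy, in-memory sketch in Python. It is not how DVC is implemented. It only borrows the underlying idea: a stage's result can be identified by a hash of its name, parameters and input data. DVC itself hashes file contents (MD5 by default) and stores outputs in a cache directory. The names `stage_key` and `ResultCache` are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict


def stage_key(name: str, params: Dict[str, object], deps: Dict[str, bytes]) -> str:
    """Combine stage name, parameters and dependency contents into one hash."""
    h = hashlib.md5()
    h.update(name.encode())
    # sort_keys makes the key independent of dict insertion order
    h.update(json.dumps(params, sort_keys=True).encode())
    for path in sorted(deps):
        h.update(path.encode())
        h.update(hashlib.md5(deps[path]).hexdigest().encode())
    return h.hexdigest()


class ResultCache:
    """Hypothetical in-memory cache: a stage only runs on a cache miss."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}
        self.runs = 0  # how often a stage function actually executed

    def run_stage(
        self,
        name: str,
        func: Callable[..., object],
        params: Dict[str, object],
        deps: Dict[str, bytes],
    ) -> object:
        key = stage_key(name, params, deps)
        if key in self._store:
            return self._store[key]  # cache hit: reuse the stored result
        self.runs += 1
        result = func(**params)  # cache miss: compute and store
        self._store[key] = result
        return result


if __name__ == "__main__":
    cache = ResultCache()
    deps = {"data.csv": b"1,2,3"}
    cache.run_stage("square", lambda x: x * x, {"x": 4}, deps)  # computed
    cache.run_stage("square", lambda x: x * x, {"x": 4}, deps)  # loaded from cache
    print(cache.runs)  # 1
```

Changing `x` or the bytes of `data.csv` produces a new key and triggers a rerun. Unchanged inputs are served from the cache.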
Why not just DVC?
DVC provides all the tools necessary to build the workflows described in the previous section. However, it is designed as a command-line tool. With ZnTrack, you can build the same workflows more conveniently by using Python functions and classes directly. This is especially useful for complex workflows, where code reuse matters. Additionally, nodes written in ZnTrack can easily be shared with others and even installed with pip (as we’ll see in later sections).
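The following plain-Python sketch illustrates the general idea of deriving a DVC stage definition from a class instead of writing it by hand. It is hypothetical throughout. `param`, `out`, `GenerateNumbers`, `to_dvc_config` and the `cmd` string are invented for this example and are not ZnTrack's API, and the files ZnTrack actually generates may look different. Only the output shape follows DVC's layout: `dvc.yaml` lists stages with `cmd`, `params` and `outs`, and parameter values live in `params.yaml`.

```python
import json
import random
from dataclasses import dataclass, field, fields
from pathlib import Path
from typing import Any, Dict, Tuple


def param(default: Any):
    """Hypothetical marker: declare a field as a tracked parameter."""
    return field(default=default, metadata={"kind": "param"})


def out(path: str):
    """Hypothetical marker: declare a field as an output file."""
    return field(default=path, metadata={"kind": "out"})


@dataclass
class GenerateNumbers:
    maximum: int = param(10)
    numbers_file: str = out("numbers.json")

    def run(self) -> None:
        # The workflow logic lives in an ordinary Python method.
        numbers = [random.randint(0, self.maximum) for _ in range(5)]
        Path(self.numbers_file).write_text(json.dumps(numbers))


def to_dvc_config(node: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (dvc.yaml-like dict, params.yaml-like dict) for one node."""
    name = type(node).__name__
    params: Dict[str, Any] = {}
    outs = []
    for f in fields(node):
        kind = f.metadata.get("kind")
        if kind == "param":
            params[f.name] = getattr(node, f.name)
        elif kind == "out":
            outs.append(getattr(node, f.name))
    stage = {
        "cmd": f"python run_node.py {name}",  # hypothetical runner script
        "params": [f"{name}.{key}" for key in params],  # dotted keys into params.yaml
        "outs": outs,
    }
    return {"stages": {name: stage}}, {name: params}


if __name__ == "__main__":
    dvc_yaml, params_yaml = to_dvc_config(GenerateNumbers(maximum=100))
    print(json.dumps(dvc_yaml, indent=2))
    print(json.dumps(params_yaml, indent=2))
```

In this sketch, Python describes the workflow and the resulting configuration is what DVC would execute and cache. That is roughly the division of labor described above, without claiming to match ZnTrack's internals.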
DVC CLI
When using ZnTrack, you’ll often make extensive use of the DVC CLI. Here are some of the most important commands:
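dvc init: Initializes a DVC repository.
dvc repro: Reproduces the pipeline, rerunning only the stages whose dependencies or parameters have changed.
dvc exp run: Runs the pipeline as a DVC experiment.
dvc checkout: Restores the pipeline's results from the cache into the workspace.
dvc push: Pushes the results of the pipeline to remote storage.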
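If you want to drive the commands above from a Python script, a thin wrapper around `subprocess` is one option. The sketch below assumes the `dvc` executable is on your `PATH`. The helper name `dvc` and its `dry_run` switch are hypothetical, and extra arguments are passed through to the CLI unchanged.

```python
import shutil
import subprocess
from typing import List

# Subcommands from the list above; anything else is rejected.
SUBCOMMANDS = {"init", "repro", "exp run", "checkout", "push"}


def dvc(subcommand: str, *args: str, dry_run: bool = False) -> List[str]:
    """Build (and, unless dry_run is set, execute) a DVC CLI call."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    argv = ["dvc", *subcommand.split(), *args]
    if dry_run:
        return argv  # only report what would be executed
    if shutil.which("dvc") is None:
        raise RuntimeError("the 'dvc' executable was not found on PATH")
    # check=True raises CalledProcessError if DVC exits with an error
    subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # A typical order: set up the repository, run the pipeline, upload results.
    for cmd in ("init", "repro", "push"):
        print(" ".join(dvc(cmd, dry_run=True)))
```

Restricting the wrapper to known subcommands keeps typos from reaching the shell. Dropping `dry_run=True` runs the commands for real.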