Version control is used to keep track of modifications made in a software code. Similarly, when building machine learning (ML) systems, it is essential to track things, such as the datasets used to train the model, the hyperparameters and pipeline used, the version of tensorflow used to create the model, and many more.
ML artifacts’ history and lineage are very complicated than a simple, linear log. Git can be used to track the code to one extent, but we need something to track your models, datasets, and more. The complexity of ML code and artifacts like models, datasets, and much more requires a similar approach.
Therefore, the researchers have introduced Machine Learning Metadata (MLMD), a standalone library to track one’s entire ML workflow’s full lineage from data ingestion, data preprocessing, validation, training, evaluation, deployment, etc. MLMD also comes integrated with TensorFlow Extended. Read More