Data Versioning
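Tracking changes to datasets over time, analogous to version control for code. Essential for reproducibility and auditing: a result can be tied to the exact data version that produced it, and any later change to that data is visible. Key tools: DVC (Data Version Control), lakeFS, Delta Lake.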
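To make the idea concrete, here is a minimal sketch of content-based versioning using only the Python standard library. It is not how DVC, lakeFS, or Delta Lake are implemented; the function names `dataset_version` and `record_version` and the JSON manifest layout are hypothetical, chosen only for illustration. The sketch assumes a dataset is a single file and that its version ID is the SHA-256 hash of its bytes.

```python
import hashlib
import json
from pathlib import Path


def dataset_version(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file's bytes, used here as a version ID."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(path: Path, manifest: Path) -> str:
    """Append the dataset's current version ID to a JSON manifest if it changed.

    The manifest (a hypothetical format) is a list of
    {"file": ..., "version": ...} entries, oldest first.
    """
    version = dataset_version(path)
    history = json.loads(manifest.read_text()) if manifest.exists() else []
    # Only record a new entry when the content differs from the last one.
    if not history or history[-1]["version"] != version:
        history.append({"file": path.name, "version": version})
        manifest.write_text(json.dumps(history, indent=2))
    return version
```

If the manifest is committed alongside the code, each commit pins the data version it was run against, which is roughly the role that small pointer files play in a tool like DVC.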
Related
- Reproducibility (data versioning enables reproducibility)
- Data Validation (validate each data version)