🔰 Code Repository, Models and Experiments
- [ ] Codebase is well-organized
- [ ] Model naming is clear and intuitive
- [ ] Experiment logs are accurate and detailed
Consider MLflow, W&B, or other similar tools and services.
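A minimal sketch of run logging with MLflow; the experiment name, run name, and metric values are illustrative placeholders, not part of the original checklist.

```python
# A minimal MLflow logging sketch; names and values are illustrative.
import mlflow

mlflow.set_experiment("demo-experiment")          # hypothetical experiment name

with mlflow.start_run(run_name="baseline-run"):   # hypothetical run name
    # Record hyperparameters so the run can be reproduced later
    mlflow.log_params({"lr": 1e-3, "batch_size": 32, "epochs": 5})

    for epoch in range(5):
        val_loss = 1.0 / (epoch + 1)              # placeholder; log your real validation metric
        mlflow.log_metric("val_loss", val_loss, step=epoch)
```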
- [ ] Essential metadata for each model is available
For example: dataset version, training script version, and training parameters.
Consider a data and model versioning tool such as DVC.
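One simple way to keep this metadata with the model is a sidecar JSON file written next to the checkpoint; the sketch below assumes the training script lives in a git repository, and the file names and dataset tag are hypothetical.

```python
# A sketch of saving essential metadata next to a model checkpoint; values are illustrative.
import json
import subprocess

def current_git_commit() -> str:
    # Assumes the training script is tracked in a git repository
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

metadata = {
    "dataset_version": "v1.2",                    # hypothetical dataset tag (e.g. a DVC tag)
    "train_script_commit": current_git_commit(),  # version of the training code
    "params": {"lr": 1e-3, "batch_size": 32, "epochs": 50},
}

with open("model_v7.meta.json", "w") as f:        # stored alongside model_v7.pt
    json.dump(metadata, f, indent=2)
```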
📊 Data Preparation and Analysis
- [ ] Scripts/tools for visualizing the original data are used
To ensure the data is interpreted correctly and the labels are adequate.
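A sketch of overlaying annotations on a source image to eyeball label quality; the image path and the (x, y, w, h, class) box format are assumptions for illustration.

```python
# A sketch of drawing hypothetical bounding-box labels over a source image.
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

image = Image.open("sample_0001.jpg")                           # hypothetical source image
boxes = [(40, 60, 120, 80, "cat"), (200, 30, 90, 110, "dog")]   # hypothetical annotations

fig, ax = plt.subplots()
ax.imshow(image)
for x, y, w, h, label in boxes:
    ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor="red"))
    ax.text(x, y - 5, label, color="red")
plt.show()
```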
- [ ] Analysis of the original data is conducted
Evaluate characteristics such as the number of classes, the distribution of samples per class, object size distribution (for detection), and pixel distribution in masks (for segmentation), among others.
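A sketch of computing such statistics; the annotation list below is a stand-in for your real labels.

```python
# A sketch of basic dataset statistics over hypothetical annotations.
from collections import Counter
import matplotlib.pyplot as plt

# Hypothetical annotations: (class_name, object_area_in_pixels)
annotations = [("cat", 1200), ("dog", 560), ("cat", 980), ("dog", 2400), ("cat", 300)]

class_counts = Counter(cls for cls, _ in annotations)
print("Samples per class:", class_counts)

areas = [area for _, area in annotations]
plt.hist(areas, bins=20)
plt.xlabel("object area, px")
plt.ylabel("count")
plt.title("Object size distribution")
plt.show()
```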
🗄 Datasets and Integrity
- [ ] Data has been converted to an optimal format
Consider HDF5, one of the most convenient formats.
To reduce storage volume and disk load, store data as 8-bit if the loss of precision is acceptable.
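A sketch of writing images into an 8-bit HDF5 dataset with h5py; the shapes, file name, and chunk size are illustrative assumptions.

```python
# A sketch of converting images to an 8-bit HDF5 dataset; paths and shapes are illustrative.
import h5py
import numpy as np

images = np.random.rand(100, 256, 256, 3)             # stand-in for loaded images in [0, 1]
images_u8 = (images * 255).round().astype(np.uint8)   # 8-bit storage to reduce volume

with h5py.File("train.h5", "w") as f:
    f.create_dataset("images", data=images_u8, compression="gzip", chunks=(8, 256, 256, 3))
```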
- [ ] Data is split into separate Train and Test sets
Ideally, Validation and Test should also be kept distinct.
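A sketch of a train/validation/test split with scikit-learn; the ratios and the placeholder data are assumptions.

```python
# A sketch of a stratified train/validation/test split; data and ratios are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000)                                   # stand-in for sample indices
y = np.random.randint(0, 3, size=1000)                # stand-in for class labels

# 70% train, then split the remaining 30% equally into validation and test
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42, stratify=y_tmp)
```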
- [ ] Data in the databases/sets is randomly shuffled
- [ ] The relationship between the original data and the data in the databases is preserved
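A sketch of both points: shuffling with a fixed seed so the order is reproducible, while keeping the source path for every stored sample; the file list is hypothetical.

```python
# A sketch of a seeded shuffle that preserves the link back to the original files.
import numpy as np

source_files = [f"raw/img_{i:04d}.png" for i in range(1000)]   # hypothetical original data
order = np.random.default_rng(seed=42).permutation(len(source_files))

# Store samples in shuffled order, but keep the source path for each stored sample,
# e.g. by writing `shuffled_sources` into the HDF5 file next to the image dataset.
shuffled_sources = [source_files[i] for i in order]
```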
- [ ] Metadata is associated with the data
E.g., HDF5 attributes can store the data generation script version, its parameters, etc.
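A sketch of attaching such metadata as HDF5 file attributes; the attribute names and values are illustrative.

```python
# A sketch of writing generation metadata as HDF5 attributes; values are illustrative.
import h5py

with h5py.File("train.h5", "a") as f:
    f.attrs["generator_script_version"] = "prepare_data.py @ 3f2a1c9"  # hypothetical commit
    f.attrs["source_dataset"] = "raw_v1.2"                             # hypothetical dataset tag
    f.attrs["image_dtype"] = "uint8"
    f.attrs["num_samples"] = len(f["images"]) if "images" in f else 0
```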
- [ ] A script for visualizing data from the database has been developed
This verifies that the data is stored correctly in the database.
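A sketch of such a script: read a few samples back from the HDF5 file and display them to confirm nothing was lost or corrupted during conversion; the file and dataset names are assumptions.

```python
# A sketch of reading samples back from the HDF5 file for visual inspection.
import h5py
import matplotlib.pyplot as plt

with h5py.File("train.h5", "r") as f:
    images = f["images"]
    for i in range(3):                  # inspect a few stored samples
        plt.imshow(images[i])
        plt.title(f"sample {i}")
        plt.show()
```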
🧮 Evaluating Models