ML Training Debugging
Systematic debugging guide for machine learning training issues with PyTorch Lightning.
Quick Diagnosis
Identify your problem category:
| Symptom | Category | Quick Check |
|---|---|---|
| NaN or Inf in loss | Loss Issues | Check learning rate, gradient clipping |
| Validation loss >> training loss | Overfitting | Add regularization, data augmentation |
| Both losses high | Underfitting | Increase model capacity, train longer |
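The triage in the table can be sketched as a small helper that maps the latest training and validation losses to a category. This is only an illustration: the function name `diagnose` and the thresholds `gap_ratio` and `high_loss` are hypothetical, not part of PyTorch Lightning, and reasonable values depend on the loss function and task. It also assumes non-negative losses.

```python
import math


def diagnose(train_loss, val_loss, gap_ratio=1.5, high_loss=1.0):
    """Map the latest losses to a category from the Quick Diagnosis table.

    gap_ratio and high_loss are illustrative thresholds (assumptions),
    not recommended values. Losses are assumed to be non-negative.
    """
    # NaN or Inf in either loss points to a loss issue.
    if not (math.isfinite(train_loss) and math.isfinite(val_loss)):
        return "Loss Issues"
    # Both losses above the "high" threshold suggests underfitting.
    if train_loss > high_loss and val_loss > high_loss:
        return "Underfitting"
    # Validation loss well above training loss suggests overfitting.
    if val_loss > gap_ratio * train_loss:
        return "Overfitting"
    return "No obvious issue"


if __name__ == "__main__":
    print(diagnose(float("nan"), 0.5))
    print(diagnose(0.1, 0.9))
    print(diagnose(2.3, 2.4))
```

For the "Loss Issues" row, Lightning's `Trainer` accepts a `gradient_clip_val` argument, which is one place to apply the gradient clipping mentioned in the table.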