Optimizers and learning-rate schedulers — step 2 of 7

Why do most training runs decay the learning rate over time — larger early, smaller later?
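
To make "larger early, smaller later" concrete, here is a minimal sketch of one common decay shape, a cosine schedule. The function name `cosine_decay` and the values `lr_max=0.1`, `lr_min=0.001`, and `total_steps=100` are illustrative assumptions, not something this lesson specifies; other schedules (step decay, exponential decay, linear decay) follow the same larger-then-smaller pattern.

```python
import math


def cosine_decay(step: int, total_steps: int,
                 lr_max: float = 0.1, lr_min: float = 0.001) -> float:
    """Return the learning rate for `step` on a half-cosine from lr_max to lr_min.

    All default values are hypothetical and chosen only for illustration.
    """
    # Fraction of training completed, clamped to 1.0 past the final step.
    progress = min(step, total_steps) / total_steps
    # cos goes from 1 to -1 as progress goes from 0 to 1,
    # so the rate slides smoothly from lr_max down to lr_min.
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Learning rate at every step of a hypothetical 100-step run.
schedule = [cosine_decay(s, total_steps=100) for s in range(101)]

# Inspect the start, middle, and end of the schedule.
for s in (0, 50, 100):
    print(s, schedule[s])
```

In a training loop, the value returned for the current step would be used as the optimizer's step size for that update, so early updates move the weights far and later updates make only small adjustments.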