Optimizer clipvalue and clipnorm not working in TensorFlow 2.0
Describe the current behavior: The clipvalue and clipnorm arguments on Optimizers do nothing!
Describe the expected behavior: With clipvalue=0 or clipnorm=0, no training should occur (all gradients should be clipped to 0). Instead, the network still trains, and with a large learning rate the loss goes to NaN.
Code to reproduce the issue: The gradient is clearly not zero, since the network weights change at every iteration; see the sketch after the sanity check below.
Sanity check with lr=0: No training occurs when lr=0, as expected.
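A minimal reproduction sketch (the data, model, and hyperparameters here are illustrative, not from the original report): with clipvalue=0 the weights still change after one epoch, while with learning_rate=0 they do not.

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(256, 10).astype("float32")
y = np.random.rand(256, 1).astype("float32")

def weight_change(optimizer):
    """Train for one epoch and report how much the Dense kernel moved."""
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
    model.compile(optimizer=optimizer, loss="mse")
    before = model.get_weights()[0].copy()
    model.fit(x, y, epochs=1, batch_size=32, verbose=0)
    return np.abs(model.get_weights()[0] - before).max()

# With clipvalue=0 every gradient should be clipped to 0, so no weight should
# move -- yet the reported change is non-zero:
print(weight_change(tf.keras.optimizers.SGD(learning_rate=0.1, clipvalue=0.0)))

# Sanity check: with learning_rate=0 the weights stay put, as expected:
print(weight_change(tf.keras.optimizers.SGD(learning_rate=0.0)))
```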
Answer from tomerk:
@karlhjm No, gradient clipping is still disabled in 2.2 when distribution strategies are enabled.
@zaccharieramzi Yes, happy to elaborate. There are two possible places to clip when distribution strategies are enabled:
- before the per-replica gradients are aggregated across replicas, or
- after the gradients have been aggregated.
We want it to work the second way (clipping after gradients are aggregated). The issue is that the optimizers are written with the clipping happening in the code before aggregation does.
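To make the distinction concrete, here is a rough sketch of a custom training loop under tf.distribute (model, shapes, and clip thresholds are all illustrative). The commented-out line shows per-replica clipping before aggregation; the active path all-reduces the gradients first, clips the aggregated result, and passes experimental_aggregate_gradients=False (available from TF 2.2) so the optimizer does not aggregate a second time.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
    optimizer = tf.keras.optimizers.SGD(0.1)

@tf.function
def train_step(x, y):
    def step_fn(x, y):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.keras.losses.MSE(y, model(x, training=True)))
        grads = tape.gradient(loss, model.trainable_variables)

        # (1) Clipping here would act on the per-replica gradients, i.e. before
        #     aggregation -- this is where the optimizer code currently clips:
        # grads = [tf.clip_by_value(g, -1.0, 1.0) for g in grads]

        # (2) Clipping after aggregation: all-reduce the gradients across
        #     replicas first, then clip the aggregated result.
        ctx = tf.distribute.get_replica_context()
        grads = ctx.all_reduce(tf.distribute.ReduceOp.MEAN, grads)
        grads, _ = tf.clip_by_global_norm(grads, 1.0)

        # Tell the optimizer the gradients are already aggregated so it does
        # not all-reduce them a second time (argument available in TF 2.2+).
        optimizer.apply_gradients(
            zip(grads, model.trainable_variables),
            experimental_aggregate_gradients=False)
        return loss

    return strategy.run(step_fn, args=(x, y))

train_step(tf.random.normal((32, 10)), tf.random.normal((32, 1)))
```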
We looked into changing this, but it would have required changes to apply_gradients and the other non-minimize methods, among other options. So rather than doing that, we decided to leave this disabled for now and to roll support for it into a larger optimizer refactoring that addresses a broader set of issues (the RFC for that is at https://github.com/tensorflow/community/pull/234).
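In the meantime, a possible user-side workaround (a sketch, not something prescribed in this thread) is to clip gradients explicitly in a custom train_step (overridable from TF 2.2), since the clipvalue/clipnorm arguments are silently ignored under distribution strategies. Note that this clips the per-replica gradients before aggregation, so it only approximates the "clip after aggregation" behavior discussed above.

```python
import tensorflow as tf

class ClippedModel(tf.keras.Model):
    """Hypothetical workaround: clip gradients explicitly in train_step."""

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        # Explicit clipping; under a distribution strategy this runs per
        # replica, before the optimizer aggregates gradients.
        grads, _ = tf.clip_by_global_norm(grads, 1.0)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

# Usage with the functional API (names here are illustrative):
inputs = tf.keras.Input(shape=(10,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = ClippedModel(inputs, outputs)
model.compile(optimizer="sgd", loss="mse")
```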