[TF 2.0] Respect masking in Keras loss reduction (i.e., support an equivalent of the default TF 1.x loss reduction SUM_OVER_NONZERO_WEIGHTS)
Describe the feature and the current behavior/state.
In TF 1.x, the default loss reduction was SUM_OVER_NONZERO_WEIGHTS. For NLP tasks with sequence inputs, it normalized losses by the number of valid elements (i.e., the number of non-padding words in the input sentences).
With the Keras API, the default reduction SUM_OVER_BATCH_SIZE does not respect masking, so if a batch of sequences is passed as input, the loss is normalized by the total batch size, including padding (masked) elements. No reduction available in Keras recovers the previous TF 1.x default.
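To make the difference concrete, here is a plain NumPy illustration (not the actual Keras internals) of how the two reductions diverge when padding positions carry zero weight:

```python
import numpy as np

# Per-element losses for a batch of 2 sequences, padded to length 4.
# mask == 0 marks padding positions.
losses = np.array([[0.5, 0.3, 0.2, 0.0],
                   [0.4, 0.0, 0.0, 0.0]])
mask = np.array([[1.0, 1.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0, 0.0]])

masked = losses * mask

# TF 1.x default, SUM_OVER_NONZERO_WEIGHTS: divide by the number
# of unmasked (non-padding) elements.
sum_over_nonzero = masked.sum() / mask.sum()

# Keras default, SUM_OVER_BATCH_SIZE: divide by the total number
# of elements, padding included.
sum_over_batch = masked.sum() / losses.size

print(sum_over_nonzero, sum_over_batch)  # 0.35 vs 0.175
```

The masked version averages only over the 4 valid positions, while the Keras default averages over all 8, halving the loss whenever half of the batch is padding.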
Will this change the current API? How?
My proposal is to respect masking in the SUM_OVER_BATCH_SIZE reduction: if a mask is set, the batch size should arguably correspond to the number of unmasked elements anyway.
Alternatively, a new reduction
SUM_OVER_MASKED_BATCH_SIZE could be added.
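In the meantime, the TF 1.x behavior can be recovered by hand by rescaling the per-element weights so that the existing SUM_OVER_BATCH_SIZE division yields the masked mean. A minimal sketch of that rescaling, shown in plain NumPy for illustration only:

```python
import numpy as np

losses = np.array([[0.5, 0.3, 0.2, 0.0],
                   [0.4, 0.0, 0.0, 0.0]])
mask = np.array([[1.0, 1.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0, 0.0]])

# Scale the mask so that summing the weighted losses and dividing by
# the full element count (what SUM_OVER_BATCH_SIZE does) gives the
# mean over unmasked elements.
weights = mask * (mask.size / mask.sum())

recovered = (losses * weights).sum() / losses.size
masked_mean = (losses * mask).sum() / mask.sum()
print(recovered, masked_mean)  # both 0.35
```

In Keras this rescaling would have to be applied through `sample_weight`, which is exactly the kind of boilerplate a built-in masked reduction would remove.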
Who will benefit with this feature?
I think anyone using masked losses, especially those upgrading from TF 1.x.
pavithrasv commented:
@foxik would you like to send us a PR with this change?