Can synchronized batch norm (SyncBN) be used to avoid cheating? Is shuffling BN a must?
I had the same question about SyncBN vs. ShuffleBN: can the former effectively prevent cheating? SimCLR appears to use SyncBN (referred to as "Global BN").
SyncBN works out of the box in PyTorch, whereas ShuffleBN requires a bit more hacking. Does the fact that ShuffleBN was chosen mean it is better, or simply that SyncBN wasn't available when MoCo was designed?
Answer (KaimingHe):
SyncBN is not sufficient in the case of MoCo: the keys in the queue still come from different batches, so the BN statistics of the current batch can serve as a signature revealing which subset the positive key may belong to.
Also, in a typical multi-GPU setting, ShuffleBN is faster than SyncBN: ShuffleBN requires only two AllGather operations in the entire network (and the input shuffle can even be waived if implemented in the data loader), whereas SyncBN requires communication at every BN layer.
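To make the shuffle/unshuffle idea concrete, here is a minimal single-process sketch in numpy (not MoCo's actual implementation, which uses `torch.distributed` broadcast/all_gather across real GPUs; the function name and shard simulation are hypothetical). Samples are permuted globally before per-"GPU" normalization, so a key's BN statistics no longer identify its original batch, and the inverse permutation restores the original order afterwards:

```python
import numpy as np

def shuffle_bn_sketch(keys, num_gpus, rng):
    """Simulate ShuffleBN on one process: globally shuffle samples,
    normalize each simulated GPU shard with its own stats, unshuffle."""
    n = keys.shape[0]
    idx_shuffle = rng.permutation(n)         # global shuffle of sample order
    idx_unshuffle = np.argsort(idx_shuffle)  # inverse permutation

    shuffled = keys[idx_shuffle]
    # Each "GPU" normalizes with statistics from its own *shuffled* shard,
    # so per-batch BN stats cannot act as a signature for the positive key.
    shards = np.split(shuffled, num_gpus)
    normed = [(s - s.mean(axis=0)) / (s.std(axis=0) + 1e-5) for s in shards]

    # Undo the shuffle so outputs line up with the original key order.
    return np.concatenate(normed)[idx_unshuffle]

rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))               # 8 keys, 4 features
out = shuffle_bn_sketch(keys, num_gpus=2, rng=rng)
```

Only the shuffle and unshuffle steps involve cross-GPU communication in the real setting, which is why ShuffleBN needs just two AllGathers end to end.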