Kaiming He (KaimingHe) · Facebook · http://kaiminghe.com/ · Research Scientist at FAIR

KaimingHe/deep-residual-networks 5422

Deep Residual Learning for Image Recognition

ShaoqingRen/faster_rcnn 2352

Faster R-CNN

KaimingHe/resnet-1k-layers 760

Deep Residual Networks with 1K Layers

ShaoqingRen/caffe 102

Caffe fork that supports SPP_net or faster R-CNN

ppwwyyxx/moco.tensorflow 100

A TensorFlow re-implementation of Momentum Contrast (MoCo): https://arxiv.org/abs/1911.05722

KaimingHe/examples 6

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

issue closed facebookresearch/moco

Are there results with other normalizations?

Hello, thanks for the awesome project and paper.

Are there some results with other normalizations (instance norm, layer norm...) instead of shuffle BN?

I found that shuffle BN takes about 20% of the time in def forward(self, im_q, im_k) under both V100x4 and V100x2 settings.

In addition, the shuffle time is 6x longer than the inference of key features in https://github.com/facebookresearch/moco/blob/master/moco/builder.py#L133-L135:

  • shuffle time (line 133): 0.06 s
  • inference (line 135): 0.01 s

I think that if replacing batch norm with other normalizations does not hurt the results, we could make model training faster.
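For reference, CUDA kernels launch asynchronously, so per-section wall-clock numbers like those above are only meaningful if the GPU is synchronized around the measured region. A minimal sketch of such a measurement, assuming PyTorch with CUDA; the timed helper below is hypothetical and not part of this repo:

```python
# Minimal sketch: synchronize before and after the measured call so that
# asynchronous CUDA kernels are not attributed to the wrong section.
import time
import torch

def timed(fn, *args, **kwargs):
    torch.cuda.synchronize()              # finish all pending GPU work first
    start = time.time()
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()              # wait for fn's kernels to complete
    return out, time.time() - start
```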

closed time in a month

LeeDoYup

issue comment facebookresearch/moco

Are there results with other normalizations?

I am not sure how you profile your timing. Shuffling is not slow at all. It is much faster than other alternatives such as SyncBN, because SyncBN happens for all layers, but shuffling is just a one-time effort. Shuffling of the input can be further optimized into the dataloader (not in this code), so is virtually free; shuffling of the output feature is very small.

We have also tried GroupNorm, and it is ~2% worse than BN.
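For reference, a minimal sketch of the batch-shuffle idea described above, assuming a standard DDP setup with equal per-GPU batch sizes; the actual implementation is in moco/builder.py and differs in details:

```python
# Sketch of batch shuffle: gather the key batch from all GPUs once, apply a
# shared global permutation, and let each GPU keep its own shuffled slice,
# so per-GPU BN statistics are computed over a shuffled subset.
import torch
import torch.distributed as dist

@torch.no_grad()
def batch_shuffle(x):
    world_size = dist.get_world_size()
    rank = dist.get_rank()

    # one-time AllGather of the local batch across GPUs
    gathered = [torch.zeros_like(x) for _ in range(world_size)]
    dist.all_gather(gathered, x)
    x_all = torch.cat(gathered, dim=0)

    # a shared random permutation, broadcast from rank 0 so all GPUs agree
    idx_shuffle = torch.randperm(x_all.size(0), device=x.device)
    dist.broadcast(idx_shuffle, src=0)
    idx_unshuffle = torch.argsort(idx_shuffle)

    # each GPU keeps its own shuffled slice (assumes equal per-GPU batch size)
    idx_this = idx_shuffle.view(world_size, -1)[rank]
    return x_all[idx_this], idx_unshuffle
```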

LeeDoYup

comment created time in a month

started google-research/simclr

started time in a month

create branch facebookresearch/moco

branch : KaimingHe-patch-2

created branch time in a month

PR opened facebookresearch/moco

Update README.md on Colab notebook (CLA Signed)
+1 -0

0 comment

1 changed file

pr created time in a month

fork KaimingHe/examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

fork in 2 months

issue comment facebookresearch/moco

torch.multiprocessing.spawn got error "process 0 terminated with exit code 1"

There is not enough information to locate the problem. My guess: if you use 2 GPUs without changing the batch size, it is very likely running out of memory.

MiaoZhang0525

comment created time in 2 months

issue closed facebookresearch/moco

torch.multiprocessing.spawn got error "process 0 terminated with exit code 1"

When I use a node with 2 GPUs to run this code, I get this error. Could you help me solve it?

Sincerely

closed time in 2 months

MiaoZhang0525

issue closed facebookresearch/moco

Validation step missing in main_moco.py?

Hey guys, thanks a lot for open-sourcing the MoCo code!

I was going through main_moco.py and noticed that it doesn't have a validation step. Does adding a simple validation on the pretext task make sense, or do the authors advise against it and instead suggest using a downstream task only to make the final decision, like the kNN-based monitor given in the paper for the Shuffle-BN ablation?

Adding a validation step should be easy, but it is not there, so I was curious.

closed time in 2 months

ankuPRK

issue comment facebookresearch/moco

Validation step missing in main_moco.py?

See #49

ankuPRK

comment created time in 2 months

issue closed facebookresearch/moco

Code and settings of semantic segmentation task

Hi, I'd like to reproduce the results of the semantic segmentation tasks (VOC and LVIS), but I couldn't find the code and settings files. Do you have any plans to provide them in this repository? Thanks.

closed time in 2 months

yshinya6

issue comment facebookresearch/moco

Code and settings of semantic segmentation task

We have not planned to provide the semantic segmentation code. We followed the publicly available DeepLab implementation, and the modifications and settings have been specified in the paper.

yshinya6

comment created time in 2 months

issue closed facebookresearch/moco

Use configurable loggers rather than swallowing print statements

Just a nit, but print statements are used throughout the code rather than configurable loggers. Since the print builtin is overridden on processes other than process 0, this can be surprising to developers. Consider using python's logging module to make this configuration more standardized and clear.
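A minimal sketch of the kind of per-rank logger the issue has in mind (hypothetical; not part of this repo):

```python
# Hypothetical sketch: a configurable logger in place of an overridden print(),
# verbose only on rank 0, while other ranks still surface warnings and errors.
import logging

def setup_logger(rank):
    logger = logging.getLogger("moco")
    logger.setLevel(logging.INFO if rank == 0 else logging.WARNING)
    if not logger.handlers:                       # avoid duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            f"[rank {rank}] %(asctime)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```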

closed time in 2 months

kmatzen

issue comment facebookresearch/moco

Use configurable loggers rather than swallowing print statements

This repo aims to be a minimal revision on the PyTorch ImageNet training code for illustrating the MoCo idea. Improving and customizing the interface or software design is beyond the scope of this repo. Please feel free to branch and share your version.

kmatzen

comment created time in 2 months

issue closed facebookresearch/moco

Pre-train time too long.

According to your results, pre-training for 200 epochs (ResNet-50 baseline) takes 53 hours on an 8x V100 machine. But training on my 8x V100 machine is three to four times slower than this, and I don't know why. Maybe the environment configuration is different. Could you release your environment configuration? Thanks!

This is the pre-training log: ~0.6 s per batch, ~3000 s (about 1 h) per epoch.

2020-07-16T09:20:06.867Z: [1,0]<stdout>:Epoch: [16][4000/5004]	Time  1.300 ( 0.685)	Data  0.000 ( 0.084)	Loss 1.0633e+00 (1.2471e+00)	Acc@1 100.00 ( 95.40)	Acc@5 100.00 ( 97.76)
2020-07-16T09:20:12.016Z: [1,0]<stdout>:Epoch: [16][4010/5004]	Time  0.309 ( 0.685)	Data  0.000 ( 0.084)	Loss 1.4829e+00 (1.2472e+00)	Acc@1  87.50 ( 95.40)	Acc@5  93.75 ( 97.76)
2020-07-16T09:20:18.283Z: [1,0]<stdout>:Epoch: [16][4020/5004]	Time  1.043 ( 0.685)	Data  0.000 ( 0.084)	Loss 1.1532e+00 (1.2472e+00)	Acc@1  96.88 ( 95.40)	Acc@5  96.88 ( 97.75)
2020-07-16T09:20:24.301Z: [1,0]<stdout>:Epoch: [16][4030/5004]	Time  0.271 ( 0.685)	Data  0.000 ( 0.084)	Loss 1.1201e+00 (1.2469e+00)	Acc@1  96.88 ( 95.40)	Acc@5 100.00 ( 97.75)
2020-07-16T09:20:30.259Z: [1,0]<stdout>:Epoch: [16][4040/5004]	Time  0.413 ( 0.684)	Data  0.000 ( 0.083)	Loss 1.4439e+00 (1.2468e+00)	Acc@1  90.62 ( 95.40)	Acc@5  93.75 ( 97.75)
2020-07-16T09:20:36.487Z: [1,0]<stdout>:Epoch: [16][4050/5004]	Time  0.213 ( 0.684)	Data  0.000 ( 0.083)	Loss 1.1293e+00 (1.2468e+00)	Acc@1  93.75 ( 95.40)	Acc@5 100.00 ( 97.76)
2020-07-16T09:20:42.951Z: [1,0]<stdout>:Epoch: [16][4060/5004]	Time  0.232 ( 0.684)	Data  0.000 ( 0.083)	Loss 1.1727e+00 (1.2470e+00)	Acc@1 100.00 ( 95.40)	Acc@5 100.00 ( 97.75)
2020-07-16T09:20:48.433Z: [1,0]<stdout>:Epoch: [16][4070/5004]	Time  0.260 ( 0.684)	Data  0.000 ( 0.083)	Loss 1.3516e+00 (1.2469e+00)	Acc@1  96.88 ( 95.40)	Acc@5  96.88 ( 97.75)
2020-07-16T09:20:54.556Z: [1,0]<stdout>:Epoch: [16][4080/5004]	Time  0.271 ( 0.684)	Data  0.000 ( 0.083)	Loss 1.0669e+00 (1.2469e+00)	Acc@1  96.88 ( 95.40)	Acc@5 100.00 ( 97.76)
2020-07-16T09:21:01.362Z: [1,0]<stdout>:Epoch: [16][4090/5004]	Time  0.914 ( 0.684)	Data  0.000 ( 0.082)	Loss 1.3178e+00 (1.2468e+00)	Acc@1  90.62 ( 95.40)	Acc@5  96.88 ( 97.75)
2020-07-16T09:21:07.425Z: [1,0]<stdout>:Epoch: [16][4100/5004]	Time  0.215 ( 0.683)	Data  0.000 ( 0.082)	Loss 9.2172e-01 (1.2467e+00)	Acc@1 100.00 ( 95.40)	Acc@5 100.00 ( 97.75)
2020-07-16T09:21:14.707Z: [1,0]<stdout>:Epoch: [16][4110/5004]	Time  0.359 ( 0.684)	Data  0.000 ( 0.082)	Loss 1.3362e+00 (1.2468e+00)	Acc@1  96.88 ( 95.40)	Acc@5  96.88 ( 97.75)
➜  2020-7-16 nvidia-smi
Thu Jul 16 09:41:17 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:05:00.0 Off |                    0 |
| N/A   54C    P0   181W / 250W |   4802MiB / 32480MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:08:00.0 Off |                    0 |
| N/A   56C    P0   109W / 250W |   4810MiB / 32480MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  Off  | 00000000:0D:00.0 Off |                    0 |
| N/A   42C    P0   176W / 250W |   4808MiB / 32480MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE...  Off  | 00000000:13:00.0 Off |                    0 |
| N/A   43C    P0   172W / 250W |   4810MiB / 32480MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-PCIE...  Off  | 00000000:83:00.0 Off |                    0 |
| N/A   56C    P0   197W / 250W |   4804MiB / 32480MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-PCIE...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   58C    P0   168W / 250W |   4810MiB / 32480MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-PCIE...  Off  | 00000000:8E:00.0 Off |                    0 |
| N/A   43C    P0    64W / 250W |   4810MiB / 32480MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-PCIE...  Off  | 00000000:91:00.0 Off |                    0 |
| N/A   42C    P0   157W / 250W |   4808MiB / 32480MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+

It seems that this problem is caused by the PyTorch version. This is my running environment:

pytorch1.3.1-py36-cuda10.0-cudnn7.0

closed time in 2 months

shadowyzy

issue comment facebookresearch/moco

Pre-train time too long.

See also #33

I believe your issue is not particularly related to this repo. Please try running the official PyTorch ImageNet training code, on which this repo is based, to check the data loading speed of your environment. See #5 for how a similar issue was addressed.
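A minimal sketch of isolating data-loading throughput, assuming a standard torchvision ImageFolder setup; the path, batch size, and worker count below are placeholders, not settings taken from this repo:

```python
# Sketch: measure pure data-loading throughput by iterating the loader
# without any model forward pass.
import time
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("/path/to/imagenet/train", transform)
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=32)

start = time.time()
for i, (images, _) in enumerate(loader):
    if i == 100:                          # time the first 100 batches
        break
print(f"{100 * 256 / (time.time() - start):.1f} images/s")
```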

shadowyzy

comment created time in 2 months

issue closed facebookresearch/moco

Pre-trained models for ResNet50 2x and 4x width

Hi! Thanks for this great code repo. Would it be possible to make available the pre-trained models for ResNet-50 2x width and 4x width? These models were used in the original MoCo paper, but it requires a lot of resources to train such wide models.

closed time in 2 months

jlindsey15

issue comment facebookresearch/moco

Pre-trained models for ResNet50 2x and 4x width

We have no plans to release those models at this time, as they are outdated compared with the MoCo v2 implementation.

jlindsey15

comment created time in 2 months

issue closed facebookresearch/moco

Can't use one GPU?

What should I do if I want to use only one or two GPUs on a single server?

closed time in 2 months

HymEric

issue comment facebookresearch/moco

Can't use one GPU?

We exploit the fact that, by default, BN is split across multiple GPUs, with mean/std computed independently on each. To run on one GPU, you may implement BN split along the N (batch) dimension to mimic this effect (see NaiveBatchNorm in Detectron2). You also need to change the lr (e.g., linearly) if you change the batch size to fit memory. To run on 2 GPUs, try --lr 0.0075 --batch-size 64.
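For illustration, the linear scaling rule behind these numbers, taking the repo's defaults of lr 0.03 at batch size 256 as the baseline:

```python
# Linear lr scaling sketch: defaults are lr = 0.03 at batch size 256, so a
# smaller batch (to fit fewer GPUs / less memory) scales lr proportionally.
base_lr, base_batch = 0.03, 256
batch_size = 64                              # e.g. what fits on 2 GPUs
lr = base_lr * batch_size / base_batch       # -> 0.0075, matching --lr 0.0075
print(lr)
```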

HymEric

comment created time in 2 months

issue closed facebookresearch/moco

How to evaluate during training

I notice the final R@1 after 200 epochs in the README is 60%, but there is no code to evaluate the model in the repo, only training accuracy.

Can you tell me how to evaluate the performance during training?

closed time in 2 months

zjcs

issue comment facebookresearch/moco

How to evaluate during training

I am not aware of a reliable way of validating during training. You may consider the kNN classifier adopted in the InstDisc paper.
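A minimal sketch of such a kNN monitor, simplified to a plain majority vote over cosine similarities rather than the temperature-weighted version in InstDisc (not part of this repo):

```python
# Sketch of a kNN monitor: classify each held-out sample by a majority vote
# over its k nearest training samples in feature space. Features are assumed
# to be L2-normalized (N, D) tensors from the frozen encoder; labels are 1-D
# LongTensors.
import torch

@torch.no_grad()
def knn_accuracy(train_feats, train_labels, val_feats, val_labels, k=20):
    sims = val_feats @ train_feats.t()                   # cosine similarities
    topk_idx = sims.topk(k, dim=1).indices               # (N_val, k)
    preds = train_labels[topk_idx].mode(dim=1).values    # majority vote per row
    return (preds == val_labels).float().mean().item()
```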

zjcs

comment created time in 2 months

issue closed facebookresearch/moco

The unsupervised training method in the README breaks

Hi,

Thanks for releasing the code!!! I think the launching method in the README should be updated a bit. I ran it like this:

python main_moco.py -a resnet50 --lr 0.03 --batch-size 256 --world-size 1 --rank 0 /data2/zzy/imagenet 

And I got the error of:

Traceback (most recent call last):
  File "main_moco.py", line 402, in <module>
    main()
  File "main_moco.py", line 133, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "main_moco.py", line 186, in main_worker
    raise NotImplementedError("Only DistributedDataParallel is supported.")
NotImplementedError: Only DistributedDataParallel is supported.

I think the rank is not correctly assigned. Did I miss anything useful?

closed time in 3 months

CoinCheung

issue comment facebookresearch/moco

The unsupervised training method in the README breaks

Please follow the command line in the README:

python main_moco.py \
  -a resnet50 \
  --lr 0.03 \
  --batch-size 256 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

Specifically, note --dist-url 'tcp://localhost:10001' --multiprocessing-distributed.

CoinCheung

comment created time in 3 months

issue closed facebookresearch/moco

Can synchronized batch norm (SyncBN) be used to avoid cheating? Is shuffling BN a must?

Ditto. I kept wondering about SyncBN vs ShuffleBN as to whether the former can effectively prevent cheating. SimCLR appears to be using SyncBN (referred to as "Global BN").

SyncBN works out of the box in PyTorch, whereas shuffling BN requires a bit more hacking. Does the fact that shuffling BN was chosen mean that it is better? (Or that SyncBN wasn't ready at the time MoCo was designed?)

closed time in 3 months

w-hc

issue comment facebookresearch/moco

Can synchronized batch norm (SyncBN) be used to avoid cheating? Is shuffling BN a must?

SyncBN is not sufficient in the case of MoCo: the keys in the queue are still from different batches, so the BN stats in the current batch can serve as a signature to tell which subset the positive key may be in.

Actually, in a typical multi-GPU setting, ShuffleBN is faster than SyncBN. ShuffleBN only requires AllGather twice in the entire network (actually, the input shuffle can be waived if implemented in data loader). SyncBN requires communication per layer.
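For context, converting a model to SyncBN in PyTorch replaces every BN layer, which is where the per-layer communication comes from; a minimal sketch, assuming an initialized process group:

```python
# Sketch: SyncBN wraps every BatchNorm layer in the network, so batch
# statistics are all-reduced across the process group at every BN layer,
# whereas ShuffleBN needs only two AllGather calls in total, as noted above.
import torch
import torchvision

model = torchvision.models.resnet50()
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
# (forward passes in training mode now require an initialized process group)
```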

w-hc

comment created time in 3 months
