If you are wondering where the data on this site comes from, please visit https://api.github.com/users/xptree/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Jiezhong Qiu (xptree) | Tsinghua University, Beijing | jiezhongqiu.com

laekov/fastmoe 253

A fast MoE impl for PyTorch

xptree/DeepInf 226

DeepInf: Social Influence Prediction with Deep Learning

xptree/NetMF 161

Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

xptree/NetSMF 109

NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

kenchan0226/keyphrase-generation-rl 81

Code for the ACL 19 paper "Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards"

xptree/k-vim 3

Vim configuration

xptree/BlockBERT 2

Blockwise Self-Attention for Long Document Understanding

xptree/Co-occurrence-Concentration 2

NeurIPS 2020: A Matrix Chernoff Bound for Markov Chains and its Application to Co-occurrence Matrices

started google/guava

started time in 13 hours

issue opened laekov/fastmoe

magic number (256) in CUDA functions

There is a magic number (256) in both of the CUDA functions moe_cuda_local_scatter_impl and moe_cuda_local_gather_impl. I cannot understand what it means and am not sure whether it is a potential bug in fastmoe. Is it related to hardware parameters?

Related code: batch_scatter_kernel<scalar_t> <<<batch_size, 256, 0, smgr->stream(0)>>>(in_feat, d_pos, input, input_buf);
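
For context, a hedged reading of that launch configuration: in CUDA's <<<grid, block>>> syntax the second argument is the number of threads per block, so 256 is most likely a chosen thread-block size rather than a hardware-derived constant. The Python sketch below is hypothetical (the function name and the scatter direction are assumptions, not fastmoe's actual code); it only illustrates what a launch of batch_size blocks with 256 threads each typically computes.

    # Hypothetical sketch of the <<<batch_size, 256>>> launch semantics.
    # 256 = threads per block (blockDim.x); batch_size = number of blocks (gridDim.x).
    def scatter_rows_sketch(inp, pos, in_feat, threads_per_block=256):
        batch_size = len(pos)
        out = [[0.0] * in_feat for _ in range(batch_size)]
        for block_idx in range(batch_size):        # one CUDA block per output row
            for tid in range(threads_per_block):   # the 256 threads of that block
                # each thread covers columns tid, tid + 256, tid + 512, ...
                for col in range(tid, in_feat, threads_per_block):
                    out[block_idx][col] = inp[pos[block_idx]][col]
        return out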

created time in 3 days

issue closed laekov/fastmoe

Expert capacity

Hi

I was just wondering whether the fastmoe implementation uses the concept of expert capacity as described in the Switch Transformer paper. In other words, if we have 8 tokens and 4 experts, the expert capacity would be 2 (without considering the capacity factor). In this scenario, if more than 2 tokens get assigned to a given expert, do the extra tokens get dropped as in the Switch Transformer formulation, or do they still get processed in fastmoe?
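
For readers skimming the thread, the arithmetic in the question can be made concrete. The sketch below is a generic illustration of Switch-Transformer-style capacity and token dropping, not fastmoe's implementation; the function names are made up for this example.

    import math

    def expert_capacity(num_tokens, num_experts, capacity_factor=1.0):
        # Switch Transformer: capacity = (tokens / experts) * capacity_factor
        return math.ceil(num_tokens / num_experts * capacity_factor)

    cap = expert_capacity(8, 4)   # 8 tokens, 4 experts, factor 1.0 -> capacity 2

    def route_with_drops(assignments, num_experts, cap):
        # assignments[i] is the expert the gate chose for token i
        kept, dropped, load = [], [], [0] * num_experts
        for token, e in enumerate(assignments):
            if load[e] < cap:
                load[e] += 1
                kept.append(token)
            else:
                dropped.append(token)   # overflow tokens are dropped, not processed
        return kept, dropped

    # Three tokens routed to expert 0 with capacity 2: the third one is dropped.
    print(route_with_drops([0, 0, 0, 1, 2, 3, 1, 2], num_experts=4, cap=cap))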

closed time in 4 days

david-macleod

issue comment laekov/fastmoe

Expert capacity

OK thanks for the quick reply!

david-macleod

comment created time in 4 days

issue comment laekov/fastmoe

Expert capacity

This feature is still being developed. In our design, it will not get processed, similar to Switch.

david-macleod

comment created time in 4 days

issue opened laekov/fastmoe

Expert capacity

Hi

I was just wondering whether the fastmoe implementation uses the concept of expert capacity as described in the Switch Transformer paper. In other words, if we have 8 tokens and 4 experts, the expert capacity would be 2 (without considering the capacity factor). In this scenario, if more than 2 tokens get assigned to a given expert, do the extra tokens get dropped as in the Switch Transformer formulation, or do they still get processed in fastmoe?

created time in 4 days

issue closed THUDM/GCC

About downstream datasets

Hello, I want to run the code on the Cora and Citeseer datasets, but I found no downstream datasets with those names. Could you please provide the code for generating the downstream datasets, or would you mind sharing the Cora and Citeseer files you have generated? Thanks a million!

closed time in 5 days

flyz1

issue comment THUDM/GCC

About downstream datasets

OK, thank you very much for your answer!

flyz1

comment created time in 5 days

push event laekov/fastmoe

Rich Ho

commit sha 38b334cc26002c3ce06db7662b8fec8863203b7d

test switch gate

view details

push time in 5 days

issue comment laekov/fastmoe

Does fastmoe support distributed training with multiple machines?

The installed fastmoe version is v0.1.2.

I have just solved the problem by switching the environment to pytorch1.8 + cuda10.2 + nccl2.7.8. Maybe it was cuda10.1 that caused the failure.

Thanks for your help!

ododoyo

comment created time in 5 days

push event laekov/fastmoe

Rich Ho

commit sha ddfaaf49858d0f270411bfee537897c8241ef07f

gshard gate test

view details

push time in 5 days

issue comment laekov/fastmoe

Does fastmoe support distributed training with multiple machines?

I was reminded that fastmoe v0.1.1 does have bugs when running with NCCL across machines, and they are fixed in v0.1.2. Is your fastmoe v0.1.1 or v0.1.2?

ododoyo

comment created time in 5 days

issue comment THUDM/GCC

About downstream datasets

Hi @flyz1,

Thanks for your interest. This work is not intended to run directly on datasets like Cora and Citeseer because of their attributes (please see the discussion in the paper, Section 2.1 Vertex Similarity and Section 3.3 GCC Design, Q1). Combining GCC with a conventional GCN would be interesting but is beyond the scope of this work.

Best,

flyz1

comment created time in 5 days

issue opened THUDM/GCC

About downstream datasets

Hello, I want to run the code on the Cora and Citeseer datasets, but I found no downstream datasets with those names. Could you please provide the code for generating the downstream datasets, or would you mind sharing the Cora and Citeseer files you have generated? Thanks a million!

created time in 5 days

push event laekov/fastmoe

Rick Ho

commit sha 5a0ba8352ca420e08c2ee6e974da513a522af6a0

add test but cannot pass

view details

push time in 5 days

issue comment laekov/fastmoe

Does fastmoe support distributed training with multiple machines?

I think the fastmoe model trained on one machine works well, and torch.distributed is also fine across machines. I will rebuild my environment to match yours and try it again. Thanks for your replies~

ododoyo

comment created time in 6 days

issue comment laekov/fastmoe

Does fastmoe support distributed training with multiple machines?

I do not see anything strange in your description of your environment. Does it run correctly within one machine? Can you check whether simply initializing torch.distributed works across machines in your environment?
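
As a concrete way to run that check, the following is a minimal sketch. It assumes the usual env:// setup driven by a launcher (python -m torch.distributed.launch --use_env on older PyTorch, or torchrun on newer releases); the file name and the rendezvous address are placeholders to adapt to your cluster. It only initializes an NCCL process group and does one all_reduce, with no fastmoe code involved.

    # minimal_dist_check.py -- generic sanity check, not part of fastmoe
    # Example launch on each machine (adjust node count, master address, and port):
    #   python -m torch.distributed.launch --use_env --nnodes=2 --node_rank=<0 or 1> \
    #       --nproc_per_node=1 --master_addr=<master-ip> --master_port=29500 \
    #       minimal_dist_check.py
    import os
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")            # reads rank/world size from env vars
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    x = torch.ones(1, device="cuda")
    dist.all_reduce(x)                                 # should equal world_size on every rank
    print(f"rank {dist.get_rank()}: all_reduce result = {x.item()}")
    dist.destroy_process_group()

If this script already crashes with SIGSEGV across machines, the problem is in the NCCL/CUDA setup rather than in fastmoe itself.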

ododoyo

comment created time in 6 days

issue opened THUDM/P-tuning

Some questions about P-tuning

1. Hello, I would like to ask: in P-tuning, how is the position of [Mask] among the [unused] tokens determined? Is it chosen manually? If not, how is it decided? 2. The original paper says that an anchor word is used when the amount of data is small; for example, when predicting the capital of Britain, adding a [capital] token among the several [unused] tokens works better. How is the position at which this [capital] token should be inserted determined?

created time in 6 days

issue comment laekov/fastmoe

Does fastmoe support distributed training with multiple machines?

I tested it with torch1.7 + cuda10.1 + nccl2.7.8:

  • the Python version is 3.7, managed by Anaconda
  • torch is installed via pip
  • the NCCL lib path is added and fastmoe is installed with USE_NCCL=1 python setup.py install, using gcc/g++ 7.3.1

I modified the fastmoe code for customized use by just adding some arguments to functions. The modified distributed.py only uses the torch.distributed package.

Maybe there is something wrong with my environment. Could it be an inconsistency between the gcc versions used to compile PyTorch and fastmoe? Should I install PyTorch from source with the same gcc version?
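
One quick way to compare toolchains is to print the build configuration of the installed PyTorch and check the GCC and CUDA versions it was compiled with against the ones used for fastmoe. This is a generic diagnostic snippet, not a fastmoe utility.

    # Print the compiler / CUDA / NCCL versions the installed PyTorch was built with,
    # then compare them with `gcc --version` and the CUDA toolkit used to build fastmoe.
    import torch

    print("torch", torch.__version__, "| built with CUDA", torch.version.cuda)
    print("NCCL", torch.cuda.nccl.version())
    print(torch.__config__.show())   # build settings, including the GCC version used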

ododoyo

comment created time in 6 days

issue comment laekov/fastmoe

Does fastmoe support distributed training with multiple machines?

I tested it with torch1.7 + cuda10.1 + nccl2.7.8:

  • the Python version is 3.7, managed by Anaconda
  • torch is installed via pip
  • the NCCL lib path is added and fastmoe is installed with USE_NCCL=1 python setup.py install
  • the gcc/g++ version is 7.3.1; maybe there is something wrong with my environment. Could it be an inconsistency between the gcc versions used to compile PyTorch and fastmoe?
ododoyo

comment created time in 6 days

issue comment laekov/fastmoe

Does fastmoe support distributed training with multiple machines?

I have tested fastmoe on up to 8 machines without such a problem. Can you provide some details about your environment and how you execute your script on multiple machines?

ododoyo

comment created time in 6 days

issue comment THUDM/Chinese-Transformer-XL

When will a model trained on product datasets be available?

For fine-tuning, where "prompt is the context for generation and text is the generated content", can adjacent sentences, or a sentence shifted right by one position, be used as input? Or, in a shopping scenario, can the titles of two items a user interacted with consecutively be put in?

sizhongyibanhts

comment created time in 6 days

issue opened THUDM/Chinese-Transformer-XL

When will a model trained on product datasets be available?

  • For domains such as e-commerce and culture/entertainment, when will a model be released?

  • I plan to use it for product-name matching in search and recommendation, and I am not sure whether data such as Baidu Baike is suitable.

created time in 6 days

started tostercx/GTAO_Booster_PoC

started time in 6 days

issue opened laekov/fastmoe

Does fastmoe support distributed training with multiple machines?

Hi there,

I have installed fastmoe, and a model with distributed experts can be trained successfully on a single machine. However, when the experts are distributed across multiple machines, the torch.distributed subprocess dies with <Signals.SIGSEGV: 11>. Have you experimented with fastmoe in distributed training across multiple machines?

many thanks~

created time in 7 days

issue opened THUDM/P-tuning

gpt2-medium LAMA

Hi, I have just used the default params to p-tune gpt2-medium on the LAMA task, and the results are as follows: best dev_hit@1: 51.8, best test_hit@1: 44.5. I have some confusions about these results. (1) It seems that there is a gap between the dev results and the test results. Are the dev set and the test set from the same distribution? Is it possible to provide the scripts for generating the train/dev/test sets, as well as the original dataset? (2) The result reported in the paper is 46.5, which is close to the best test_hit@1. Are the results in the paper based on the test set? It would be very nice if shell scripts were provided to reproduce the results in the paper.

created time in 7 days

started Lucas2012/EvolvingGraphicalPlanner

started time in 7 days

issue comment laekov/fastmoe

What features would you like to see in FastMoE v0.2?

Can MoE be applied to CNN-style networks?
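
As a rough illustration of the idea (this is a made-up sketch, not something fastmoe provides, and ConvMoE is a hypothetical module), a per-image gate can route inputs to a set of expert convolutions, which is one straightforward way to put a mixture of experts into a CNN.

    import torch
    import torch.nn as nn

    class ConvMoE(nn.Module):
        """Toy mixture of convolutional experts with a top-1, per-image gate."""
        def __init__(self, channels, num_experts=4):
            super().__init__()
            self.experts = nn.ModuleList(
                [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_experts)]
            )
            self.gate = nn.Linear(channels, num_experts)

        def forward(self, x):                           # x: (N, C, H, W)
            scores = self.gate(x.mean(dim=(2, 3)))      # gate on globally pooled features
            top1 = scores.argmax(dim=1)                 # one expert per image
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = top1 == e
                if mask.any():
                    out[mask] = expert(x[mask])         # only the chosen expert runs
            return out

    # NOTE: hard argmax routing gives the gate no gradient; a real MoE layer would
    # weight expert outputs by the softmax gate scores or use noisy top-k routing.
    # Example: ConvMoE(64)(torch.randn(8, 64, 32, 32)) routes each image to one expert.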

laekov

comment created time in 8 days

pull request comment laekov/fastmoe

Bias improvement #15

@zjujh1995 Hi, could you try running it again? I ran my Megatron now with this fix and it worked, but I was getting code=14 instead of code=13 as you specified.

Thanks for your effort. The problem disappeared with the help of your recent commits.

TiagoMAntunes

comment created time in 8 days

fork qibinc/fast-transformers-1

Pytorch library for fast transformer implementations

fork in 8 days