If you are wondering where this site's data comes from, please visit https://api.github.com/users/Sengxian/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Sengxian · DCST, Tsinghua University · Wuhan, China · https://www.sengxian.com/

laekov/fastmoe 253

A fast MoE impl for PyTorch

Sengxian/SoulKnightHelper 18

A multiplayer helper tool for Soul Knight (supports Android and iOS)

Sengxian/BaiduLibrary 2

A simple Baidu Tieba library, easy to use and connected to Dama2.

Sengxian/SYZOJ-UI 2

A theme for SYZOJ

saiblo/simple-sandbox-wrapper 1

A simple-sandbox Python wrapper that additionally supports freeze, thaw, and kill operations.

Hunter-CH3/Tabnet 0

A web mock implementation for an HCI project.

saiblo/simple-sandbox-daemon 0

A node daemon for simple-sandbox

Sengxian/code-front 0

front-end of code.insekai.com

issue closed syzoj/syzoj

Judging: a submission that should have been WA was judged AC

[Screenshot attachment: QQ图片20210129175659]

closed time in 16 hours

FZ-c

Pull request review comment THUDM/cogdl

[WIP] New Interface for Encoding Paper

 def add_span(token_type_id, token_ids, is_mask=False):
             num_spans,
         )
+    def encode_paper(

The code written here is a bit redundant. Could you make it simpler? Maybe you can generalize over the different types of entities.

sofyc

comment created time in a day
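
To illustrate the suggestion above: "generalize different types of entities" presumably means handling authors, venue, affiliations, and concepts with one shared loop rather than a near-identical block per entity type. The sketch below is hypothetical; encode_one and the group names are placeholders, not the actual encode_paper code.

# Hypothetical sketch: iterate over a mapping from entity kind to its values
# and delegate to one shared encoding routine instead of repeating code.
def encode_entity_groups(encode_one, entity_groups):
    for kind, values in entity_groups.items():
        for value in values:
            encode_one(kind, value)

# Example usage with placeholder data:
encode_entity_groups(
    lambda kind, value: print(kind, value),
    {"AUTHOR": ["Jacob Devlin"], "VENUE": ["naacl"], "FOS": ["language model"]},
)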

Pull request review comment THUDM/cogdl

[WIP] New Interface for Encoding Paper

+from cogdl import oagbert
+
+
+def test_encode_paper():
+    tokenizer, model = oagbert("oagbert-v2")
+    title = 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'
+    abstract = 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation...'
+    authors = ['Jacob Devlin', 'Ming-Wei Chang', 'Kenton Lee', 'Kristina Toutanova']
+    venue = 'north american chapter of the association for computational linguistics'
+    affiliations = ['Google']
+    concepts = ['language model', 'natural language inference', 'question answering']
+    # encode paper
+    paper_info = model.encode_paper(
+        title=title, abstract=abstract, venue=venue, authors=authors, concepts=concepts, affiliations=affiliations, reduction="max"
+    )
+
+    assert len(paper_info) == 5
+    assert paper_info['text'][0]['type'] == 'Text'
+    assert len(paper_info['authors']) == 4
+    assert len(paper_info['venue'][0]['token_ids']) == 9
+    assert tuple(paper_info['text'][0]['sequence_output'].shape) == (1, 43, 768)

I suggest squeezing the first dimension.

sofyc

comment created time in a day
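
For context on the comment above: the test asserts that sequence_output has shape (1, 43, 768), and squeezing the leading batch dimension would give (43, 768). A minimal PyTorch sketch with a stand-in tensor, not the actual encode_paper output:

import torch

# Stand-in for paper_info['text'][0]['sequence_output']: one example, 43 tokens, 768 dims.
sequence_output = torch.randn(1, 43, 768)

# Squeezing the first (batch) dimension yields one embedding per token.
squeezed = sequence_output.squeeze(0)
assert squeezed.shape == (43, 768)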

Pull request review comment THUDM/cogdl

[WIP] New Interface for Encoding Paper

 def add_span(token_type_id, token_ids, is_mask=False):
             num_spans,
         )
+    def encode_paper(
+            self,
+            title="",
+            abstract="",
+            venue="",
+            authors=[],
+            concepts=[],
+            affiliations=[],
+            decode_span_type="FOS",
+            decode_span_length=0,
+            max_seq_length=512,
+            mask_propmt_text="",
+            reduction="cls",

I suggest using "first" instead of "cls".

sofyc

comment created time in a day
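
For context on the reduction argument discussed above: "cls"/"first" would take the hidden state at the first token position, while "max" pools element-wise over the token dimension. A minimal sketch with a stand-in hidden-state tensor; this is not cogdl's implementation.

import torch

# Stand-in hidden states: (num_tokens, hidden_size).
hidden_states = torch.randn(43, 768)

first_pooled = hidden_states[0]               # "cls" / "first": the first token's embedding
max_pooled = hidden_states.max(dim=0).values  # "max": element-wise max over tokens

assert first_pooled.shape == (768,) and max_pooled.shape == (768,)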

Pull request review comment THUDM/cogdl

[WIP] New Interface for Encoding Paper

+from cogdl import oagbert
+
+
+def test_encode_paper():
+    tokenizer, model = oagbert("oagbert-v2")
+    title = 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'
+    abstract = 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation...'
+    authors = ['Jacob Devlin', 'Ming-Wei Chang', 'Kenton Lee', 'Kristina Toutanova']
+    venue = 'north american chapter of the association for computational linguistics'
+    affiliations = ['Google']
+    concepts = ['language model', 'natural language inference', 'question answering']
+    # encode paper
+    paper_info = model.encode_paper(
+        title=title, abstract=abstract, venue=venue, authors=authors, concepts=concepts, affiliations=affiliations, reduction="max"
+    )
+
+    assert len(paper_info) == 5
+    assert paper_info['text'][0]['type'] == 'Text'

The type in paper_info should be all uppercase, in line with the rest of the code.

sofyc

comment created time in a day

Pull request review comment THUDM/cogdl

[WIP] New Interface for Encoding Paper

 def _convert_token_ids_to_text(self, token_ids):
             return self.tokenizer.convert_tokens_to_string(self.tokenizer.convert_ids_to_tokens(token_ids))
     def print_oag_instance(
-        self,
-        input_ids,
-        token_type_ids,
-        input_masks,
-        masked_lm_labels,
-        position_ids,
-        position_ids_second,
-        predictions=None,
+            self,

Maybe we use different code formatting settings. I suggest not changing the indentation here.

sofyc

comment created time in a day

PR opened THUDM/cogdl

encode_paper

Description


Checklist

Please feel free to remove inapplicable items for your PR.

  • [ ] The PR title starts with [$CATEGORY] (such as [Model], [Doc], [Feature], [Bugfix])
  • [ ] Changes are complete (i.e. I finished coding on this PR)
  • [ ] All changes have test coverage
  • [ ] Code is well-documented
  • [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • [ ] Related issue is referred in this PR
+350 -72

0 comments

4 changed files

pr created time in a day

push event saiblo/saiblo-public-cdn

prnake

commit sha 02ea3780281f6b7e9512cf2d49b1e37a6b32bb04

update: 2.7.6

view details

push time in 2 days

created tag saiblo/saiblo-public-cdn

tag 2.7.6

created time in 2 days

issue closed laekov/fastmoe

magic number (256) in CUDA functions

There is a magic number (256) in both of the CUDA functions moe_cuda_local_scatter_impl and moe_cuda_local_gather_impl. I cannot understand what it means and am not sure whether it's a potential bug in fastmoe. Is it related to hardware parameters?

Related code: batch_scatter_kernel<scalar_t> <<<batch_size, 256, 0, smgr->stream(0)>>>(in_feat, d_pos, input, input_buf);

closed time in 2 days

zjujh1995

issue comment laekov/fastmoe

magic number (256) in CUDA functions

You can find the meaning of this number in any CUDA programming tutorial, e.g. https://developer.nvidia.com/blog/easy-introduction-cuda-c-and-c/. The number defines how many device threads per block execute the kernel in parallel.

zjujh1995

comment created time in 2 days
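
To make the explanation above concrete: in a launch such as batch_scatter_kernel<<<batch_size, 256, ...>>>, the first launch parameter is the number of thread blocks and the 256 is the number of threads per block, so batch_size * 256 threads run the kernel body in parallel. Below is a rough pure-Python emulation of that launch configuration, illustrative only and unrelated to fastmoe's actual kernel logic.

# Emulate a CUDA launch <<<grid_dim, block_dim>>> on the CPU to show what the
# "magic number" 256 controls: the number of threads within each block.
def emulate_launch(grid_dim, block_dim, kernel):
    for block_idx in range(grid_dim):        # blockIdx.x on the GPU
        for thread_idx in range(block_dim):  # threadIdx.x on the GPU
            kernel(block_idx, thread_idx)

launched = []
emulate_launch(grid_dim=4, block_dim=256, kernel=lambda b, t: launched.append((b, t)))
# 4 blocks * 256 threads per block = 1024 logical threads execute the kernel body.
assert len(launched) == 4 * 256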

push event saiblo/saiblo-public-cdn

prnake

commit sha 48edcf272f04bcf80d73dec54b85838414cbcf74

update: 2.7.5

view details

push time in 2 days

created tag saiblo/saiblo-public-cdn

tag 2.7.5

created time in 2 days

issue opened laekov/fastmoe

magic number (256) in CUDA functions

There is a magic number (256) in both of the CUDA functions moe_cuda_local_scatter_impl and moe_cuda_local_gather_impl. I cannot understand what it means and am not sure whether it's a potential bug in fastmoe. Is it related to hardware parameters?

Related code: batch_scatter_kernel<scalar_t> <<<batch_size, 256, 0, smgr->stream(0)>>>(in_feat, d_pos, input, input_buf);

created time in 3 days

push event saiblo/saiblo-public-cdn

prnake

commit sha e337be4c1e5ae435cad25112b8071f6d6d0e9820

update: 2.7.4

view details

push time in 3 days

created tag saiblo/saiblo-public-cdn

tag 2.7.4

created time in 3 days

issue closed laekov/fastmoe

Expert capacity

Hi

I was just wondering if the fastmoe implementation uses the concept of expert capacity as described in the Switch Transformer paper. In other words, if we have 8 tokens and 4 experts, the expert capacity would be 2 (without considering the capacity factor). So in this scenario, if more than 2 tokens get assigned to a given expert, do the extra tokens get dropped as in the Switch Transformer formulation, or do they still get processed in fastmoe?

closed time in 4 days

david-macleod
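
To spell out the arithmetic in the question above: under the Switch Transformer formulation, expert capacity = ceil(tokens / experts * capacity_factor), so 8 tokens over 4 experts with a capacity factor of 1 gives a capacity of 2, and tokens routed to an expert beyond its capacity are dropped. A minimal sketch of that formulation follows; this is not fastmoe's code.

import math

def switch_capacity_drop(assignments, num_experts, capacity_factor=1.0):
    # Expert capacity as in the Switch Transformer: tokens / experts * factor.
    capacity = math.ceil(len(assignments) / num_experts * capacity_factor)
    load = [0] * num_experts
    kept, dropped = [], []
    for token, expert in enumerate(assignments):
        if load[expert] < capacity:
            load[expert] += 1
            kept.append(token)
        else:
            dropped.append(token)  # over-capacity tokens are not processed
    return capacity, kept, dropped

# 8 tokens, 4 experts -> capacity 2; expert 0 receives 3 tokens, so one is dropped.
capacity, kept, dropped = switch_capacity_drop([0, 0, 0, 1, 1, 2, 2, 3], num_experts=4)
assert capacity == 2 and dropped == [2]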

issue comment laekov/fastmoe

Expert capacity

OK thanks for the quick reply!

david-macleod

comment created time in 4 days

issue comment laekov/fastmoe

Expert capacity

This feature is still being developed. In our design, such tokens will not get processed, similar to Switch.

david-macleod

comment created time in 4 days

issue opened laekov/fastmoe

Expert capacity

Hi

I was just wondering if the fastmoe implementation uses the concept of expert capacity as described in the Switch Transformer paper. In other words, if we have 8 tokens and 4 experts, the expert capacity would be 2 (without considering the capacity factor). So in this scenario, if more than 2 tokens get assigned to a given expert, do the extra tokens get dropped as in the Switch Transformer formulation, or do they still get processed in fastmoe?

created time in 4 days

push event laekov/fastmoe

Rich Ho

commit sha 38b334cc26002c3ce06db7662b8fec8863203b7d

test switch gate

view details

push time in 5 days

push event saiblo/saiblo-public-cdn

prnake

commit sha 5197a8677c5cc311c7f5f8db1fc00ecc6ac6e9a1

update: 2.7.3

view details

push time in 5 days

created tag saiblo/saiblo-public-cdn

tag 2.7.3

created time in 5 days

issue comment laekov/fastmoe

Does fastmoe support distributed training with multiple machine?

The installed fastmoe version is v0.1.2.

I have just solved the problem by switching the environment to PyTorch 1.8 + CUDA 10.2 + NCCL 2.7.8. Maybe it was CUDA 10.1 that caused the failure.

Thanks for your help!

ododoyo

comment created time in 5 days

push event saiblo/saiblo-public-cdn

prnake

commit sha 74971616f0accd9d217ad110d78d484a3a3b2cab

update: 2.7.2

view details

push time in 5 days

created tag saiblo/saiblo-public-cdn

tag 2.7.2

created time in 5 days

push event laekov/fastmoe

Rich Ho

commit sha ddfaaf49858d0f270411bfee537897c8241ef07f

gshard gate test

view details

push time in 5 days

issue comment laekov/fastmoe

Does fastmoe support distributed training with multiple machine?

I am reminded that fastmoe v0.1.1 does have bugs when running with NCCL across machines, and they are fixed in v0.1.2. Is your fastmoe v0.1.1 or v0.1.2?

ododoyo

comment created time in 5 days

push event laekov/fastmoe

Rick Ho

commit sha 5a0ba8352ca420e08c2ee6e974da513a522af6a0

add test but cannot pass

view details

push time in 5 days