If you are wondering where the data on this site comes from, please visit https://api.github.com/users/wcshin-git/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Woncheol Shin (wcshin-git), Seoul, swc1905@kaist.ac.kr. Studying medical AI for an M.S. in the Graduate School of AI at KAIST (2020–present).

jiyounglee-0523/reinforcement-learning-stanford 0

🕹️ CS234: Reinforcement Learning, Winter 2019 | YouTube videos 👉

wcshin-git/Awesome-Text-to-Image 0

A Survey on Text-to-Image Generation/Synthesis.

wcshin-git/chexpert-labeler 0

CheXpert NLP tool to extract observations from radiology reports.

wcshin-git/DALLE-pytorch-forked 0

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

wcshin-git/Effective-Python 0

Learn and summarize 'Effective Python'

wcshin-git/Effective_Python 0

Source code for the Korean edition of Effective Python, 2nd Edition (파이썬 코딩의 기술 개정2판).

wcshin-git/GNNPapers 0

Must-read papers on graph neural networks (GNN)

wcshin-git/graphtransformer 0

Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.

wcshin-git/minGPT 0

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

fork wcshin-git/Awesome-Text-to-Image

A Survey on Text-to-Image Generation/Synthesis.

fork in a month

started SeanNaren/minGPT

started time in 2 months

issue comment lucidrains/performer-pytorch

FastAttention doesn't give results in agreement with standard attention?

Hi @simonaxelrod, I tried the comparison above using the SLiM Performer code, which was written by the original 'Performer' authors. It is also written in PyTorch, so it was easy to try.

import torch
import numpy as np
from slim_performer_model import MultiHeadAttention

batch = 1
num_nodes = 24  # seq_len
feat_dim = 64
n_heads = 1

num_its = 5
errs = []

for _ in range(num_its):
    # fast (FAVOR+) attention
    x = torch.randn((batch, num_nodes, feat_dim))  # x: [B, seq_len, feat_dim]
    attn = MultiHeadAttention(feature_type='favor+', n_heads=n_heads, hidden_dim=feat_dim, compute_type='iter')
    rfs = attn.sample_rfs(x.device)  # [n_heads, feat_dim, feat_dim]
    fast = attn.full_forward(x, rfs)  # x: [B, seq_len, feat_dim] -> fast: [B, seq_len, feat_dim]

    # '_get_original_qkv' is a helper I added to retrieve the same Q, K, V
    # (it is not in the original 'MultiHeadAttention')
    Q, K, V = attn._get_original_qkv(x)  # each: [B, seq_len, feat_dim]

    # standard attention
    A = torch.einsum('bid, bjd -> bij', Q, K) / feat_dim ** 0.5  # [B, seq_len, seq_len]
    A = torch.nn.Softmax(dim=-1)(A)
    slow = torch.einsum('bij, bjd -> bid', A, V)  # [B, seq_len, feat_dim]

    # relative error in percent
    err = (abs(slow - fast).mean() / abs(slow).mean() * 100).item()
    errs.append(err)

mean_err = np.mean(errs)
std_err = np.std(errs)

print("Error is (%.2f +/- %.2f)%%" % (mean_err, std_err))  # Error is (130.53 +/- 2.27)%

But the error is (130.53 +/- 2.27)%. I don't know why we're getting such large errors... @lucidrains, @simonaxelrod, what do you think? Is this normal?
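One thing that might explain this (a guess, not something confirmed in this thread): SLiM Performer targets autoregressive models, so if full_forward with compute_type='iter' computes causal attention, comparing it against the bidirectional softmax baseline above would give exactly this kind of large discrepancy. A causally masked baseline, sketched below (causal_standard_attention is a hypothetical helper, not part of either repo), would make the two directly comparable:

import torch

def causal_standard_attention(Q, K, V):
    # Q, K, V: [B, seq_len, feat_dim]
    d = Q.shape[-1]
    A = torch.einsum('bid, bjd -> bij', Q, K) / d ** 0.5  # [B, seq_len, seq_len]
    seq_len = A.shape[-1]
    causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=A.device), diagonal=1)
    A = A.masked_fill(causal_mask, float('-inf'))  # block attention to future positions
    A = torch.softmax(A, dim=-1)
    return torch.einsum('bij, bjd -> bid', A, V)  # [B, seq_len, feat_dim]

If the error drops to a few percent with this baseline, the gap comes from the masking rather than from the FAVOR+ approximation itself.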

simonaxelrod

comment created time in 3 months

issue comment lucidrains/DALLE-pytorch

horovod and OpenAI's Pretrained VAE are incompatible.

Yes, horovod works with taming: it ran fine with horovodrun -np 1 python train_dalle.py --image_text_folder path/to/data --taming --distributed_backend horovod. And OpenAI's dVAE also works when I don't use horovod.

wcshin-git

comment created time in 3 months

push event wcshin-git/DALLE-pytorch-forked

Romain Beaumont

commit sha 2827c2a1f9247639030a3defd7626c7ad73e0d50

Fix for VQGAN after update of the taming-transformers dependency (#329). In v3 of their paper, the taming-transformers authors changed the implementation of the VectorQuantizer; see https://github.com/CompVis/taming-transformers/commit/04c8ad6c0fa4650e3d600faee793515c4aa2658c#diff-92ba76a137f5007f09ae7afd810f7bda5bb8f85d2089658122061ee458155c62R31 (VectorQuantizer2 is imported as VectorQuantizer). As a consequence, the shape of the indices returned by self.model.encode changed slightly. This commit adapts to this change.

view details

Phil Wang

commit sha 01e402e4001d8075004c85b07b12429b8a01e822

0.14.1

view details

push time in 3 months

issue comment lucidrains/DALLE-pytorch

horovod and OpenAI's Pretrained VAE are incompatible.

Oh, really? I'm using an RTX 3090, and when I try to use horovod with OpenAI's dVAE, I get an error like that...

wcshin-git

comment created time in 3 months

issue comment google-research/google-research

Performer: `jnp.max` used in `nonnegative_softmax_kernel_feature_creator`

Hi Jinwoo! Thanks for the clear explanations! Applied to Q: line 102 makes Q', but when calculating attention later, Q' is also in the denominator, so it doesn't affect the result. BTW, I think line 106 also does not affect the results. We are using the maximum value over all elements (specifically, along axis=last_dims_t + attention_dims_t) of the data_dash tensor, and K' is also in the denominator (Q'((K')^T 1)) when calculating attention later. Am I mistaken?
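To make the cancellation argument above concrete, here is a minimal NumPy sketch (illustrative only; favor_attention, Q_prime, and K_prime are stand-ins, not the google-research code). In FAVOR+ the output is D^(-1) (Q' ((K')^T V)) with D = diag(Q' ((K')^T 1)), so multiplying Q' or K' by a global constant such as exp(-max(...)) cancels between numerator and denominator:

import numpy as np

rng = np.random.default_rng(0)
L, m, d = 6, 16, 4            # seq_len, number of random features, value dim
Q_prime = rng.random((L, m))  # stand-ins for the positive feature maps Q', K'
K_prime = rng.random((L, m))
V = rng.random((L, d))

def favor_attention(Qp, Kp, V):
    numerator = Qp @ (Kp.T @ V)                            # Q'((K')^T V),  [L, d]
    denominator = Qp @ (Kp.T @ np.ones((Kp.shape[0], 1)))  # Q'((K')^T 1),  [L, 1]
    return numerator / denominator

base = favor_attention(Q_prime, K_prime, V)
# rescale Q' and K' by arbitrary global constants, as subtracting a max inside exp would do
rescaled = favor_attention(Q_prime * np.exp(-3.0), K_prime * np.exp(-1.5), V)
print(np.allclose(base, rescaled))  # True: the constants cancel between numerator and denominator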

wcshin-git

comment created time in 3 months

issue opened lucidrains/DALLE-pytorch

horovod and OpenAI's Pretrained VAE are incompatible.

Hi! I think horovod and OpenAI's Pretrained VAE are incompatible. When I ran horovodrun -np 1 python train_dalle.py --image_text_folder path/to/data --distributed_backend horovod, the following error occurred.

[0]<stderr>:Traceback (most recent call last):
[0]<stderr>:  File "train_dalle.py", line 482, in <module>
[0]<stderr>:    (distr_dalle, distr_opt, distr_dl, distr_scheduler) = distr_backend.distribute(
[0]<stderr>:  File "/home/wcshin/DALLE-pytorch-forked/dalle_pytorch/distributed_backends/distributed_backend.py", line 145, in distribute
[0]<stderr>:    return self._distribute(
[0]<stderr>:  File "/home/wcshin/DALLE-pytorch-forked/dalle_pytorch/distributed_backends/horovod_backend.py", line 49, in _distribute
[0]<stderr>:    self.backend_module.broadcast_parameters(
[0]<stderr>:  File "/home/wcshin/anaconda3/envs/tmp_dalle/lib/python3.8/site-packages/horovod/torch/functions.py", line 53, in broadcast_parameters
[0]<stderr>:    handle = broadcast_async_(p, root_rank, name)
[0]<stderr>:  File "/home/wcshin/anaconda3/envs/tmp_dalle/lib/python3.8/site-packages/horovod/torch/mpi_ops.py", line 718, in broadcast_async_
[0]<stderr>:    return _broadcast_async(tensor, tensor, root_rank, name)
[0]<stderr>:  File "/home/wcshin/anaconda3/envs/tmp_dalle/lib/python3.8/site-packages/horovod/torch/mpi_ops.py", line 623, in _broadcast_async
[0]<stderr>:    function = _check_function(_broadcast_function_factory, tensor)
[0]<stderr>:  File "/home/wcshin/anaconda3/envs/tmp_dalle/lib/python3.8/site-packages/horovod/torch/mpi_ops.py", line 87, in _check_function
[0]<stderr>:    raise ValueError('Tensor is required to be contiguous.')
[0]<stderr>:ValueError: Tensor is required to be contiguous.

Thank you!
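For anyone hitting the same error, here is a possible (untested) workaround sketch: the traceback suggests that some of the pretrained dVAE's parameter tensors are not contiguous when horovod broadcasts them, so forcing them to be contiguous before the broadcast might get past this point. The dalle variable and where the call goes are hypothetical; only broadcast_parameters itself is horovod's actual API.

import horovod.torch as hvd

def make_parameters_contiguous(model):
    # Replace each non-contiguous parameter tensor with a contiguous copy, in place.
    for param in model.parameters():
        if not param.data.is_contiguous():
            param.data = param.data.contiguous()

# hypothetical usage, before the backend broadcasts the model:
# make_parameters_contiguous(dalle)                          # dalle wraps OpenAI's pretrained dVAE
# hvd.broadcast_parameters(dalle.state_dict(), root_rank=0)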

created time in 3 months

push event wcshin-git/DALLE-pytorch-forked

r.beaumont

commit sha 9525be9672d3b0d9e55646f0b1f3898efc2f85bc

Save dalle.pt with deepspeed. Also a small fix for the dalle output file: somehow I removed the .pt addition in a previous PR. Also save once initially to make sure saving is working.

view details

Phil Wang

commit sha 1335a1b383f5d2b34f1fc95d45f6fc30ad0376d4

Merge pull request #277 from wcshin-git/main fix horovod and lr decay issue

view details

robvanvolt

commit sha 37afafdb30bc7613a11564dcd3f744bf2f4fead9

Added support for webdataset

view details

robvanvolt

commit sha 6f60c6bd52d1800f474184548cd67c10615bb9af

Removed unnecessary comments.

view details

robvanvolt

commit sha 80508ffc6222e8c124de761762e24abbd34557c4

Added WebDataset to setup.py

view details

Romain Beaumont

commit sha 58c6035c43d54f670755dba8e769ab31213c75b0

add if for deepspeed optimizations

view details

Romain Beaumont

commit sha 914715a3e03987878d0584859fa69dfcbc8664a5

Some changes in comments for deepspeed saving to improve clarity

view details

afiaka87

commit sha e26ee585fd77c7295639194381bf0465a7b6cc66

Easily enable apex O1 amp from train_dalle.py. CogView sidestepped needing to "tame" 16-bit precision by just using "O2" (almost 16-bit) precision. I tried implementing O2 and I think it's rather possible, but there are some casts from Long to Float which need to happen for it to work. The O1 case works very well as is, however. In my case (RTX 2070) I go from ~16 samples per second to ~40 samples per second; using fp16 instead I get around ~50 samples. So it's faster, but not necessarily by much. I assume this benchmark changes drastically when the comms latency of a multi-node system needs to be considered, and my intuition is that sending half as many bits across the wire can make something like optimizer offloading and allreduce more effective. tl;dr: automatic mixed precision is slower than fp16, but you'll be able to see generations during training and won't have to deal with potential divergence issues caused by training in fp16.

view details

afiaka87

commit sha 29fa3b538875a45362b7af4fb54dd5dcd7580fef

Include section about apex amp in readme.

view details

afiaka87

commit sha dfb1bc0b2fa123de591bf15b901bd910185148b8

Update README.md

view details

afiaka87

commit sha 19cbf169efa4e137da548d69ad315e6a0f5ea506

Fix section regard 16 bit precision

view details

afiaka87

commit sha 4babfda687b9a12d1e019e753969c2f70e141bb0

Create install_apex.sh

view details

afiaka87

commit sha 135b1c27012e5bac052aca29239e66bcc760c89e

Just give users a bash script instead

view details

afiaka87

commit sha ae3a895f223920249bc6e574c18c35c9f9838a83

Fix tabs mixed with spaces

view details

afiaka87

commit sha df3ddea1e4fdf8a530bde6a0a6e0f308b4f6fbb1

Fix typo

view details

Phil Wang

commit sha 80996978cbb7390f981a0832d972e4d8f5bae945

Merge pull request #256 from rom1504/deepspeed_fix Deepspeed fix : save the normal model too

view details

Phil Wang

commit sha 69375005aad0ac84aec0ead23fede6640e0eb5a4

Merge pull request #284 from afiaka87/patch-13 Easily enable apex O1 amp from train_dalle.py

view details

r.beaumont

commit sha 6ea2a7f83186eff6ea9fbe313f2e931abee9df5a

Add an option to keep only N deepspeed checkpoints. Very useful to avoid filling up the disk with hundreds of GB of checkpoints.

view details

afiaka87

commit sha 7779bd1770e898558cfddbdf2a0f67c68ad8f918

Expose DeepSpeeds built-in gradient accumulation

view details

Phil Wang

commit sha 77da3edf4e76ba614fdf8bb57c7455ede104858c

Merge pull request #289 from afiaka87/patch-14 Expose DeepSpeeds built-in gradient accumulation

view details

push time in 3 months

issue opened google-research/google-research

Performer: `jnp.max` used in `nonnegative_softmax_kernel_feature_creator`

Hi! Thanks for the amazing work! I have one question about nonnegative_softmax_kernel_feature_creator. I don't understand why the jnp.max terms are used in nonnegative_softmax_kernel_feature_creator, as shown below. I cannot find the corresponding equations in the original paper. Are they just for normalization? Is there a mathematical background for them? Thank you!

https://github.com/google-research/google-research/blob/c249ee982c9ca3bb0cca4788758435c87c71fc7d/performer/fast_attention/jax/fast_attention.py#L101-L109
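For context, here is a rough NumPy sketch of the positive softmax-kernel features from the Performer paper (softmax_kernel_features is a stand-in, not the linked jax function): the features have the form exp(omega^T x - ||x||^2 / 2) / sqrt(m), and subtracting a max inside the exp only rescales them by the global constant exp(-max), a standard numerical-stability trick that drops out once the attention is normalized.

import numpy as np

def softmax_kernel_features(x, omega, stabilize=True):
    # x: [L, d] (queries or keys); omega: [m, d] random projection vectors
    proj = x @ omega.T                                          # omega^T x for every position, [L, m]
    half_sq_norm = 0.5 * np.sum(x * x, axis=-1, keepdims=True)  # ||x||^2 / 2, [L, 1]
    logits = proj - half_sq_norm
    if stabilize:
        logits = logits - logits.max()                          # plays the same role as the jnp.max terms
    return np.exp(logits) / np.sqrt(omega.shape[0])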

created time in 3 months

issue comment lucidrains/performer-pytorch

FastAttention doesn't give results in agreement with standard attention?

All right, I'm curious too :)

simonaxelrod

comment created time in 3 months

issue comment lucidrains/performer-pytorch

FastAttention doesn't give results in agreement with standard attention?

@simonaxelrod Does this kind of large error occur when experimenting with the original code written in Jax from Google?

simonaxelrod

comment created time in 3 months

fork wcshin-git/Effective-Python

Learn and summarize 'Effective Python'

fork in 3 months

fork wcshin-git/080235

Source code for the Korean edition of Effective Python, 2nd Edition (파이썬 코딩의 기술 개정2판).

fork in 3 months

PR opened baeseongsu/test

beta
+1 -0

0 comment

1 changed file

pr created time in 3 months

push event wcshin-git/test

wcshin

commit sha 053dabb6d62e295f330b7110517dcf0bc074fcfd

beta

view details

push time in 3 months

PR opened baeseongsu/test

second edit
+1 -0

0 comment

1 changed file

pr created time in 3 months

push event wcshin-git/test

wcshin

commit sha aa7684c033362d972f8700008131c5a958e768a5

second edit

view details

push time in 3 months

fork wcshin-git/minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

fork in 3 months