If you are wondering where the data on this site comes from, please visit https://api.github.com/users/jon-tow/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

EleutherAI/lm-evaluation-harness 123

A framework for few-shot evaluation of autoregressive language models.

jon-tow/cs224n 50

Solutions to CS224n: Natural Language Processing with Deep Learning assignments.

jon-tow/best-download 0

URL downloader supporting checkpointing and continuous checksumming.

jon-tow/impossible-tic-tac-toe 0

A minimax implementation for deciding the best moves in a tic-tac-toe game.

jon-tow/jon-tow.github.io 0

My personal website

jon-tow/lispp 0

The millionth lisp interpreter.

jon-tow/maze-solver 0

A visualization of the basic uninformed search algorithms, depth-first and breadth-first.

jon-tow/mesh-transformer-jax 0

Model parallel transformers in JAX and Haiku

PR opened EleutherAI/lm-evaluation-harness

Implement `TruthfulQA`

NOTES

  • The official implementation caps the number of generated tokens at 50 (`max_tokens=50`). There is currently no clean way to do this from within a `Task`.

  • We default to the same BLEU, ROUGE, and BLEURT metric packages as the original paper, which adds the extra dependencies `t5` and `bleurt` (a usage sketch follows below).
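For reference only, a minimal sketch of how those two packages are typically invoked; the checkpoint path and example strings are placeholders, and the task's actual wiring may differ:

```python
# Sketch only: illustrates the `t5` and `bleurt` metric packages this task pulls in.
# Assumes a BLEURT checkpoint has already been downloaded locally.
from bleurt import score as bleurt_score
from t5.evaluation import metrics as t5_metrics

predictions = ["The Earth orbits the Sun."]           # model completions (placeholder)
references = ["The Earth revolves around the Sun."]   # gold answers (placeholder)

# Corpus-level BLEU and ROUGE from the `t5` package (targets first, then predictions).
bleu = t5_metrics.bleu(references, predictions)["bleu"]
rouges = t5_metrics.rouge(references, predictions)

# BLEURT scores each (reference, candidate) pair and returns one float per pair.
scorer = bleurt_score.BleurtScorer("path/to/bleurt/checkpoint")  # path is a placeholder
bleurt_scores = scorer.score(references=references, candidates=predictions)

print(bleu, rouges, bleurt_scores[0])
```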

TODOS

  • Add support for the automatic metrics 'GPT-judge' and 'GPT-info', which predict human judgments of truthfulness and informativeness, respectively, via fine-tuned GPT-3 models. NOTE: This requires access to the corresponding OpenAI Completion API engines, which the authors understandably do not expose. They do, however, provide the data used to fine-tune GPT-3 into GPT-judge and GPT-info (see https://github.com/sylinrl/TruthfulQA#Fine-tuning-GPT-3-for-evaluation), so we could try reproducing them; a rough sketch of how such a judge would be queried follows below.
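As a starting point, here is a rough sketch of how a GPT-judge-style truth score could be queried. It assumes the pre-1.0 `openai` Python client, and the model name is a hypothetical placeholder (the authors' fine-tuned judge engines are private, so this cannot run as-is); the prompt format mirrors the TruthfulQA reference implementation.

```python
import math

import openai  # pre-1.0 client assumed (openai.Completion API)

# Hypothetical placeholder; the authors' real fine-tuned GPT-judge model is not public.
GPT_JUDGE_MODEL = "curie:ft-your-org:gpt-judge"


def gpt_judge_truth_score(question: str, answer: str) -> float:
    """Return the judge's probability that `answer` is a truthful answer to `question`."""
    # TruthfulQA-style judge prompt: the model completes "True:" with " yes" or " no".
    prompt = f"Q: {question}\nA: {answer}\nTrue:"
    response = openai.Completion.create(
        model=GPT_JUDGE_MODEL,
        prompt=prompt,
        max_tokens=1,
        temperature=0,
        logprobs=2,
    )
    # Top log-probabilities for the single generated token.
    top_logprobs = response["choices"][0]["logprobs"]["top_logprobs"][0]
    # Probability mass assigned to " yes" (0.0 if it is not among the top tokens).
    return math.exp(top_logprobs[" yes"]) if " yes" in top_logprobs else 0.0
```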
+369 -1

0 comments

3 changed files

pr created time in 2 days

push event jon-tow/lm-evaluation-harness

Jonathan Tow

commit sha 17c47812f3cb3666411d9f92c4078184d355c968

Implement `TruthfulQA`

view details

push time in 2 days

push event jon-tow/lm-evaluation-harness

Jonathan Tow

commit sha edca8d45307bd6991f1e927dad7fc96bfdcb0cf8

Implement `TruthfulQA`

view details

push time in 2 days

push event jon-tow/lm-evaluation-harness

Jonathan Tow

commit sha 687a8d4aad8244c4e623b00320996f6392d06ad1

Implement `TruthfulQA`

view details

push time in 2 days

push event jon-tow/lm-evaluation-harness

Jonathan Tow

commit sha 2e5a9dbdff9d2493fed7d1278f4535419e17a70a

Implement `TruthfulQA`

view details

push time in 2 days

push event jon-tow/lm-evaluation-harness

Jonathan Tow

commit sha d2bebbfc1ed061f314e333464561c070191d1003

Implement `TruthfulQA`

view details

push time in 2 days

push event jon-tow/lm-evaluation-harness

Jonathan Tow

commit sha 2a1f9dd265a8f03f63968ace34bf13e8965bec89

Implement `TruthfulQA`

view details

push time in 3 days

push event jon-tow/lm-evaluation-harness

Jonathan Tow

commit sha 6f22d2b5c66d86d4a0296f726f1a020a4dce0a4a

Implement `TruthfulQA`

view details

push time in 3 days

create branch jon-tow/lm-evaluation-harness

branch: truthfulqa

created branch time in 3 days

push event jon-tow/lm-evaluation-harness

Charles Foster

commit sha 3d432b1a265f476fe467f38377ad5f346bb31dec

Merge pull request #4 from EleutherAI/master
Update cfoster0 fork

view details

Charles Foster

commit sha f48b119d33c81755619beed45aca218e03387811

Skeleton of SQuADv2. Not yet tested.

view details

Charles Foster

commit sha 10faacda74a0121781b24c1b73bbe9903b0dd752

Added SQuADv2 to __init__.py. Not yet tested.

view details

Charles Foster

commit sha bba6e0e9711e74f6ea17800879007e104e303d76

Passes tests, except for NotImplementedError for request type greedy_until.

view details

Charles Foster

commit sha eb4c8407af4a66649e62f72a3357c3075253a714

Removed unnecessary import.

view details

Leo Gao

commit sha 7b649ded6e0106415c1af692790b3045a3c6673a

Fixes to make greedy_until work

view details

Leo Gao

commit sha 4a64031cedce06de737bd75bd0a6e45464aee3ab

Undo change to test

view details

Charles Foster

commit sha 232c9ab6b9815c4490ac3e9511bba75d89190cdd

Fixes SQuAD v2 metric computation.

view details

Charles Foster

commit sha 884c29fb352a9fc213d1cf739482b7ac3e93c7ce

Bring SQuAD fork up to date with EAI upstream

view details

Charles Foster

commit sha 5be42b4ddfbb1a8010536a8cfdd541b78907a2ea

SQuAD fixed to use loglikelihood API to calculate the probability of an unanswerable question.

view details

Charles Foster

commit sha 538be6da5d129d8970fa8585f92a4cd9df4eee02

Merge pull request #7 from cfoster0/greedyuntil
Fork update and long-overdue SQuAD fixes

view details

Charles Foster

commit sha 14dd29c442552102d5bc6e4337ef31dda59a24fd

Fixed calling of loglikelihood within SQuAD task

view details

Leo Gao

commit sha 5643f20f6f8b2870cac820bc1f1f9fade6bcb726

Update superglue winograd has_test_docs

view details

Leo Gao

commit sha caba51e18093d9d41262e2ae7e6082e89fbd3add

Merge branch 'master' of github.com:EleutherAI/lm_evaluation_harness

view details

Leo Gao

commit sha cbc5c9c8c2193af3f9d7b876272f926fa7009031

squad: fix aggregation

view details

Leo Gao

commit sha 8de855340c61b1a2c8ada0a80490429139a18360

Rename task

view details

Leo Gao

commit sha 4b133dca18252c8e10e6c13e2b6ce8e502917615

Merge branch 'master' of github.com:EleutherAI/lm_evaluation_harness into cfsquad
# Conflicts:
#   lm_eval/tasks/squad.py

view details

Leo Gao

commit sha f984c88e500d03c5566a3819da3eec9a50f3a2d9

Merge pull request #140 from cfoster0/master
Implement SQuADv2 evaluation

view details

Leo Gao

commit sha 42659c342b72b31c098be915009bfcb285479d96

Rename hendrycks ethics and math

view details

Leo Gao

commit sha 5aa601f3d406cbb099d5437ec3909539f79f7e34

Merge branch 'master' of github.com:EleutherAI/lm_evaluation_harness

view details

push time in 10 days

push event jon-tow/lm-evaluation-harness

Jonathan Tow

commit sha f299be8f142d7f32f4798ffc6e257aafc0708e40

Add missing colon to function call

view details

push time in 12 days

push event jon-tow/lm-evaluation-harness

sdtblck

commit sha 4cd4b05cd56103d38f6513feb79032b11eef3238

Create lambada_multilingual.py

view details

sdtblck

commit sha a92d47eb7515b11f2dd7e2652a56f65c928e0ed4

Update __init__.py

view details

sdtblck

commit sha 5b008dc053c5493c731c4150370cb59e746731f3

Update lambada_multilingual.py

view details

sdtblck

commit sha 40bdb0c46b61128e71f9b37c95600f10aca1297e

Update __init__.py

view details

Jonathan Tow

commit sha 69c8345673fa2102bbba919df2b58e11493d909f

Fix `MuTual` loglikelihood request bug

view details

Leo Gao

commit sha 6b6303d4cb3fae65a16a6ae2ec6d6453c2b1c9f5

Stable file order and don't lower

view details

Leo Gao

commit sha 0fec45555a4ad4a53782c84bc08d4e9d9212058f

Add mutual v1 test data

view details

Leo Gao

commit sha 11f0e6d8eaa269a031aa407116d3862198bc5c96

Indicate that it's machine translated in task name and comments

view details

sdtblck

commit sha cf5823cff6ac6cb150a541d7052872b39416ebca

Merge branch 'master' into lambada-multilingual

view details

Leo Gao

commit sha 3473ea80ae808dea477a8ab71120b4a0f22d0191

Merge pull request #206 from jon-tow/mutual-fix
Fix `MuTual` loglikelihood request bug

view details

Leo Gao

commit sha a10b05c576d90313babb8f71801ae6834fc668d5

Add missing json import

view details

Leo Gao

commit sha db85d5aea6d86999114ad767c079c709910fda7f

Add missing functools import

view details

sdtblck

commit sha 75519897b9aa454999f880fd84d9280876012913

Don't download lambada if it already exists

view details

Leo Gao

commit sha af0b1e26a8a938509a4034fa84784a492e10095d

Merge pull request #209 from EleutherAI/sdtblck-patch-1
Don't download lambada if it already exists

view details

Stella Biderman

commit sha 93ea091e7d24df77529b131e9f8db6d263eecdb8

Updated readme for clarity

view details

Leo Gao

commit sha f6e7ae258ca08b381ae1aaba692fd68340717430

Update main.py

view details

Leo Gao

commit sha ba3fa9c87e1fa00c9ad7a79fa1739ecc318a04a7

Update README.md

view details

Leo Gao

commit sha adec7faab69641f018527e9127abcdf3a87e560c

Update README.md

view details

Leo Gao

commit sha 592b2a23d208da02dcc7988d86282e9ca0fc2bc9

Merge pull request #210 from EleutherAI/StellaAthena-patch-1
Updated readme for clarity

view details

Leo Gao

commit sha e35386d9ab3ed2f400da718a26fc8ce6c0595a7f

Fix version issue

view details

push time in 12 days

fork jon-tow/mesh-transformer-jax

Model parallel transformers in JAX and Haiku

fork in 18 days

started EleutherAI/vqgan-clip

started time in a month

started ClashLuke/revlib

started time in a month

started haofanwang/awesome-mlp-papers

started time in a month

Pull request review comment EleutherAI/lm-evaluation-harness

Fixes DROP implementation

```diff
 def _load_docs(self, docs):
                     "id": qa["query_id"],
                     "passage": doc["passage"],
                     "question": qa["question"],
-                    "answers": self.get_answers(qa["answer"]),
+                    "answers": self.get_answers(qa),
                 }

     @classmethod
-    def get_answers(cls, answers):
-        # NOTE: We wrap every non-`list` answer into a list for uniformity.
-        if answers["number"] != "":
-            return [str(answers["number"])]
-        if answers["spans"] != []:
-            return answers["spans"]
-        return [" ".join([answers["date"]["day"],
-                          answers["date"]["month"],
-                          answers["date"]["year"]]).strip()]
+    def get_answers(cls, qa):
+        answers = []
+        answers_set = set()
+
+        candidates = [qa["answer"]] + qa.get("validated_answers", [])
+        for candidate in candidates:
+            answer = cls.parse_answer(candidate)
+            if answer in answers_set:
+                continue
+            answers_set.add(answer)
+            answers.append(answer)
+
+        return answers
```

They don't seem to "unique-ify" the candidates here. You can remove this class method and instead create a local variable in `_load_docs`, e.g. `candidate_answers = [qa["answer"]] + qa.get("validated_answers", [])`, and then set the doc's "answers" entry to a list comprehension that calls `parse_answer` over the candidate answers.
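A minimal sketch of what that suggestion could look like inside `_load_docs` (illustrative only; the variable names are made up and `parse_answer` is assumed to remain as defined in this PR):

```python
# Inside `_load_docs`, for each question/answer record `qa` of a passage `doc`:
candidate_answers = [qa["answer"]] + qa.get("validated_answers", [])
processed_doc = {
    "id": qa["query_id"],
    "passage": doc["passage"],
    "question": qa["question"],
    # Keep every candidate; the official DROP evaluation does not de-duplicate them.
    "answers": [self.parse_answer(candidate) for candidate in candidate_answers],
}
```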

silentv0x

comment created time in a month

Pull request review event

started EleutherAI/knowledge-neurons

started time in 2 months

started dzryk/antarctic-captions

started time in 2 months

fork jon-tow/CommonLoopUtils

CLU lets you write beautiful training loops in JAX.

fork in 2 months

started steggie3/goose-dataset

started time in 3 months