If you are wondering where the data of this site comes from, please visit https://api.github.com/users/andyljones/events. GitMemory does not store any data, but only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Andy Jones andyljones London andyljones.com indie AI safety researcher

andyljones/coolgpus 184

GPU fan control for headless Linux

andyljones/boardlaw 14

Scaling scaling laws with board games.

andyljones/aljpy 4

Andy's common tools

andyljones/driving-test-booker 4

Spots cancellations in the UK driving test booking schedule

andyljones/house-price-map 3

Plots out commute times and house prices

andyljones/ctci-cpp 1

Learning C++ by way of Cracking the Coding Interview problems

andyljones/cule 1

A conda-installable version of NVIDIA's CuLE

andyljones/flatfinder3 1

My third-generation flat finding tool

delete branch andyljones/zonotable

delete branch : dependabot/npm_and_yarn/modules/translators/lodash-4.17.21

delete time in 11 hours

started andyljones/coolgpus

started time in 13 hours

created repository tam-borine/LM_interpretability

created time in 15 hours

PR opened andyljones/zonotable

Bump lodash from 4.17.19 to 4.17.21 in /modules/translators

Bumps lodash from 4.17.19 to 4.17.21.

Commits

  • f299b52 Bump to v4.17.21 (https://github.com/lodash/lodash/commit/f299b52f39486275a9e6483b60a410e06520c538)
  • c4847eb Improve performance of toNumber, trim and trimEnd on large input strings (https://github.com/lodash/lodash/commit/c4847ebe7d14540bb28a8b932a9ce1b9ecbfee1a)
  • 3469357 Prevent command injection through _.template's variable option (https://github.com/lodash/lodash/commit/3469357cff396a26c363f8c1b5a91dde28ba4b1c)
  • ded9bc6 Bump to v4.17.20. (https://github.com/lodash/lodash/commit/ded9bc66583ed0b4e3b7dc906206d40757b4a90a)
  • 63150ef Documentation fixes. (https://github.com/lodash/lodash/commit/63150ef7645ac07961b63a86490f419f356429aa)
  • 00f0f62 test.js: Remove trailing comma. (https://github.com/lodash/lodash/commit/00f0f62a979d2f5fa0287c06eae70cf9a62d8794)
  • 846e434 Temporarily use a custom fork of lodash-cli. (https://github.com/lodash/lodash/commit/846e434c7a5b5692c55ebf5715ed677b70a32389)
  • 5d046f3 Re-enable Travis tests on 4.17 branch. (https://github.com/lodash/lodash/commit/5d046f39cbd27f573914768e3b36eeefcc4f1229)
  • aa816b3 Remove /npm-package. (https://github.com/lodash/lodash/commit/aa816b36d402a1ad9385142ce7188f17dae514fd)
  • See full diff in compare view: https://github.com/lodash/lodash/compare/4.17.19...4.17.21

Maintainer changes

This version was pushed to npm by bnjmnt4n (https://www.npmjs.com/~bnjmnt4n), a new releaser for lodash since your current version.

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

+3 -3

0 comment

1 changed file

pr created time in 16 hours

started andyljones/boardlaw

started time in a day

issue comment andyljones/boardlaw

\lambda constant wrong ?

Well, I ran some experiments on Connect 4:

  • Baseline: a standard run with cpuct=2, the good formula, and 100 iterations of 32k games. The baseline is able to win against a perfect player when it starts. At the end of the run, new policy vs old policy is always around 100% victory for whichever one starts (that's expected, as the first player can force a win at Connect 4). It also achieves 6-7 errors on solved positions.
  • Same run but with cpuct=1 (and in general "too low cpuct"): it usually ends with many draws. I didn't test it against a perfect player. Around 10-11% errors on test sets.
  • "Your formula" with cpuct=1/16 or cpuct=1/4: it ends in the same way as too low cpuct.

If you follow the best run, the dynamic is always the same:

    1. first there are many wins, the ratio of games won by the starting player increases, and the length of games increases
    2. then you see more and more draws, and the length of games keeps increasing
    3. then the nets seem to find the best strategy, the ratio of games won by the first player increases again, and game length tends to go down

It's possible that the bad formula only slows things down, so that you see only phase 2 after 100 iterations; maybe with a longer run it would evolve to phase 3. But usually you also see the loss getting very low, which probably reduces exploration too much, so it gets harder and harder to escape phase 2 (with the same parameters). By contrast, with cpuct=2 the loss is around 1 at the end and stays around this value (whereas with low cpuct it goes down to 0.7, and it also drops much faster than with cpuct=2, as expected), so maybe you can counter this by exploring more. I wonder if it could work to use epsilon-greedy instead of sampling the policy, as done in https://arxiv.org/pdf/2012.11045.pdf. After all, what makes the policy good is only how good your Q-values are, whatever way you achieve that. You can imagine completely decoupling the search policy: one algorithm produces a policy, which is then used with a different algorithm like standard cpuct. I don't know if that's clear ;)
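To make the contrast in the comment above concrete, here is a minimal sketch of the two action-selection schemes it mentions: sampling an action from a policy distribution, and epsilon-greedy selection over Q-values. This is an illustration only. It is not taken from boardlaw or from the linked paper, and the function names and the example values are hypothetical.

```python
import random


def sample_from_policy(policy, rng):
    """Pick an action index with probability proportional to policy[i]."""
    return rng.choices(range(len(policy)), weights=policy, k=1)[0]


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


if __name__ == "__main__":
    rng = random.Random(0)
    policy = [0.2, 0.5, 0.3]      # hypothetical search policy over 3 actions
    q_values = [0.1, 0.9, 0.3]    # hypothetical Q-value estimates
    print(sample_from_policy(policy, rng))
    print(epsilon_greedy(q_values, epsilon=0.1, rng=rng))
```

With epsilon-greedy, the amount of exploration is set directly by epsilon rather than by how peaked the policy has become, which is one way to read the suggestion of "exploring more" once the loss gets very low.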

fabricerosay

comment created time in 2 days

issue comment andyljones/boardlaw

\lambda constant wrong ?

I don't know the effect; I think it is hard to predict, so I've launched a run to test your configuration. The lower lambda is, the less you explore, so for large N your formula explores more. But if you do the maths, the classical formula with c=2 gives a higher lambda than yours with c=1/64 for 64 rollouts. The dynamic of lambda is also completely different. If you take into account A (the number of actions), then for the classical formula lambda increases until N=A and then drops towards 0, whereas for your formula lambda keeps increasing. So with the classical formula the dynamic is closer to the usual Monte Carlo pattern: first explore, then exploit. Yet I realized that Grill et al.'s version of PUCT is rather different from the usual PUCT (as "local" visits do not enter into the formula). It's strange, but it seems to work. I tried many kinds of implementations, and one strange thing is that many of them were flawed in some way, but the flaws were hard to spot because the implementations still somehow worked.
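The claims about how lambda behaves can be checked with a short derivation. The sketch below treats N as a continuous variable (an assumption made only for the calculus) and uses the labels "classical" and "linear" as shorthand for the square-root formula and the formula without the square root; those labels are not from the original discussion.

```latex
\begin{align*}
\lambda_{\text{classical}}(N) &= c\,\frac{\sqrt{N}}{N + A},
&
\frac{d\lambda_{\text{classical}}}{dN} &= c\,\frac{A - N}{2\sqrt{N}\,(N + A)^{2}},
\\[4pt]
\lambda_{\text{linear}}(N) &= c\,\frac{N}{N + A},
&
\frac{d\lambda_{\text{linear}}}{dN} &= c\,\frac{A}{(N + A)^{2}} > 0 .
\end{align*}
```

The classical derivative is positive for N < A, zero at N = A, and negative for N > A, so lambda peaks at N = A and then decays towards 0 as N grows. The linear form increases for every N and approaches c. At N = 64, the classical formula with c = 2 gives 16/(64 + A), while the linear formula with c = 1/64 gives 1/(64 + A), so the classical value is 16 times larger for any A.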

fabricerosay

comment created time in 2 days

issue opened andyljones/boardlaw

\lambda constant wrong ?

Hello, I believe the formula for \lambda in the policy calculation is wrong (line 95 in the MCTS cpp and the CUDA MCTS):

    p.lambda_n = m.c_puct[b]*float(N)/float(N +A);

It should be:

    p.lambda_n = m.c_puct[b]*sqrt(float(N))/float(N +A);

I think that's why you found it working with cpuct as small as 1/16, as the small constant compensates for the float(N) factor being too big. The change should also alter the whole dynamic of the PUCT algorithm.
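As a rough numerical illustration of the two expressions quoted above, the following Python sketch evaluates both. The function names are hypothetical, and the value A = 7 is an assumption (the number of columns in Connect 4), not something stated in the issue. The two formulas differ by a factor of sqrt(N), so a constant of c/sqrt(N) in the first reproduces a constant of c in the second at that particular N.

```python
import math


def lambda_linear(c_puct: float, n: int, a: int) -> float:
    """Expression quoted from the existing code: c * N / (N + A)."""
    return c_puct * n / (n + a)


def lambda_sqrt(c_puct: float, n: int, a: int) -> float:
    """Expression proposed in the issue: c * sqrt(N) / (N + A)."""
    return c_puct * math.sqrt(n) / (n + a)


if __name__ == "__main__":
    A = 7  # assumption: 7 actions, as in Connect 4
    for n in (1, 4, 16, 64, 256):
        print(n, lambda_linear(1 / 16, n, A), lambda_sqrt(2.0, n, A))
```

For example, at N = 64 the first expression with c = 1/4 and the second with c = 2 give the same value, which is consistent with the issue's point that a much smaller constant is needed to offset the missing square root.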

created time in 3 days


started andyljones/coolgpus

started time in 6 days

started andyljones/coolgpus

started time in 8 days

started andyljones/coolgpus

started time in 9 days

fork jcl4/coolgpus

GPU fan control for headless Linux

fork in 9 days

started andyljones/coolgpus

started time in 10 days

started andyljones/coolgpus

started time in 10 days

fork cxz/coolgpus

GPU fan control for headless Linux

fork in 10 days

started andyljones/coolgpus

started time in 10 days

fork dirtysalt/kindlegen-64

64-bit kindlegen for OSX 10.14 and above

fork in 12 days

started andyljones/reinforcement-learning-discord-wiki

started time in 12 days


fork zeta1999/megastep

megastep helps you build 1-million FPS reinforcement learning environments on a single GPU

https://andyljones.com/megastep

fork in 13 days

started andyljones/coolgpus

started time in 13 days

started andyljones/reinforcement-learning-discord-wiki

started time in 14 days

started andyljones/boardlaw

started time in 14 days

started andyljones/coolgpus

started time in 16 days

started andyljones/reinforcement-learning-discord-wiki

started time in 16 days

started andyljones/reinforcement-learning-discord-wiki

started time in 16 days