GPU fan control for headless Linux
Scaling scaling laws with board games.
andyljones/char-rnn-experiments 5
Exploring RNNs
Andy's common tools
andyljones/driving-test-booker 4
Spots cancellations in the UK driving test booking schedule
Plots out commute times and house prices
andyljones/andyljones.github.io 1
Personal blog
Learning C++ by way of Cracking the Coding Interview problems
A conda-installable version of NVIDIA's CuLE
My third-generation flat-finding tool
delete branch andyljones/zonotable
delete branch : dependabot/npm_and_yarn/modules/translators/lodash-4.17.21
delete time in 11 hours
started andyljones/coolgpus
started time in 13 hours
PR opened andyljones/zonotable
Bumps lodash from 4.17.19 to 4.17.21. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/lodash/lodash/commit/f299b52f39486275a9e6483b60a410e06520c538"><code>f299b52</code></a> Bump to v4.17.21</li> <li><a href="https://github.com/lodash/lodash/commit/c4847ebe7d14540bb28a8b932a9ce1b9ecbfee1a"><code>c4847eb</code></a> Improve performance of <code>toNumber</code>, <code>trim</code> and <code>trimEnd</code> on large input strings</li> <li><a href="https://github.com/lodash/lodash/commit/3469357cff396a26c363f8c1b5a91dde28ba4b1c"><code>3469357</code></a> Prevent command injection through <code>_.template</code>'s <code>variable</code> option</li> <li><a href="https://github.com/lodash/lodash/commit/ded9bc66583ed0b4e3b7dc906206d40757b4a90a"><code>ded9bc6</code></a> Bump to v4.17.20.</li> <li><a href="https://github.com/lodash/lodash/commit/63150ef7645ac07961b63a86490f419f356429aa"><code>63150ef</code></a> Documentation fixes.</li> <li><a href="https://github.com/lodash/lodash/commit/00f0f62a979d2f5fa0287c06eae70cf9a62d8794"><code>00f0f62</code></a> test.js: Remove trailing comma.</li> <li><a href="https://github.com/lodash/lodash/commit/846e434c7a5b5692c55ebf5715ed677b70a32389"><code>846e434</code></a> Temporarily use a custom fork of <code>lodash-cli</code>.</li> <li><a href="https://github.com/lodash/lodash/commit/5d046f39cbd27f573914768e3b36eeefcc4f1229"><code>5d046f3</code></a> Re-enable Travis tests on <code>4.17</code> branch.</li> <li><a href="https://github.com/lodash/lodash/commit/aa816b36d402a1ad9385142ce7188f17dae514fd"><code>aa816b3</code></a> Remove <code>/npm-package</code>.</li> <li>See full diff in <a href="https://github.com/lodash/lodash/compare/4.17.19...4.17.21">compare view</a></li> </ul> </details> <details> <summary>Maintainer changes</summary> <p>This version was pushed to npm by <a href="https://www.npmjs.com/~bnjmnt4n">bnjmnt4n</a>, a new releaser for lodash since your current version.</p> </details> <br />
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
<details> <summary>Dependabot commands and options</summary> <br />
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
- @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
- @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
- @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the Security Alerts page.
</details>
pr created time in 16 hours
created branch andyljones/zonotable
branch : dependabot/npm_and_yarn/modules/translators/lodash-4.17.21
created branch time in 16 hours
started andyljones/boardlaw
started time in a day
issue comment andyljones/boardlaw
Well, I did some experiments on Connect 4: a baseline standard run with cpuct=2, the good formula, and 100 iterations of 32k games. The baseline is able to win against a perfect player when it starts; at the end of the run, new policy vs old policy is always around 100% victory for the one that starts (that's intended, as the first player can force a win at Connect 4), and it achieves 67 errors on solved positions. The same run with cpuct=1 (and in general "too low cpuct") usually ends with many draws; I didn't test it against the perfect player, but it gets around 10-11% errors on the test sets. "Your formula" with cpuct=1/16 or cpuct=1/4 ends the same way as too-low cpuct. If you follow the best run, the dynamics are always the same:

- first, many wins: the ratio of games won by the starting player increases, and game length increases

- then you see more and more draws, and game length increases further

- then the nets seem to find the best strategy: the ratio of games won by the first player increases again, and game length tends to come down
It's possible that the bad formula only drags things out and you only see phase 2 after 100 iterations; maybe if the run were longer, it would evolve to phase 3. But usually you also see the loss getting very low, probably lowering exploration too much, hence it gets harder and harder to escape phase 2 (with the same parameters). By contrast, with cpuct=2 the loss is around 1 at the end and stays around that value (whereas with low cpuct it goes down to 0.7, and also drops much faster than with cpuct=2, as expected), so maybe you can counter this by exploring more. I wonder if it could work to use epsilon-greedy instead of sampling the policy, as done in https://arxiv.org/pdf/2012.11045.pdf. After all, what makes the policy good is only how good your q-values are, whatever the way you achieve that. You could imagine completely decoupling the search policy: one algorithm produces a policy, which is then used by a different algorithm like standard cpuct. I don't know if that's clear ;)
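For concreteness, a minimal sketch of what epsilon-greedy selection over the search q-values could look like; the function and parameter names here are hypothetical, not the boardlaw API:

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical sketch: with probability eps pick a uniform random action
// (explore), otherwise pick the argmax of the current q estimates (exploit).
int select_action(const std::vector<float>& q, float eps, std::mt19937& rng) {
    std::uniform_real_distribution<float> unif(0.0f, 1.0f);
    if (unif(rng) < eps) {
        // explore: uniformly random action index
        std::uniform_int_distribution<int> pick(0, int(q.size()) - 1);
        return pick(rng);
    }
    // exploit: greedy on the q values
    return int(std::max_element(q.begin(), q.end()) - q.begin());
}
```

With eps=0 this is pure greedy selection, which makes the "the policy is only as good as the q values" point explicit: the sampling scheme and the value estimation are decoupled.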
comment created time in 2 days
issue comment andyljones/boardlaw
I don't know the effect, and I think it is hard to predict, so I've launched a run to test your configuration. The lower lambda is, the less you explore, so for large N your formula explores more; but if you do the maths, the classical formula with c=2 gives a higher lambda than yours with c=1/64 at 64 rollouts. The dynamics of lambda are completely different, though. If you take A (the number of actions) into account, then for the classical formula lambda increases until N=A and then drops towards 0, whereas for your formula lambda keeps increasing. So for the classical formula the dynamics are closer to the usual Monte Carlo pattern: first explore, then exploit. Yet I realized that the Grill et al. version of PUCT is rather different from usual PUCT (as "local" visits do not enter into the formula); it's strange, but it seems to work. I did many kinds of implementations, and one strange thing is that many of them were flawed in some way, but it was hard to spot as they still somehow worked.
comment created time in 2 days
issue opened andyljones/boardlaw
Hello,
I believe the formula for \lambda in the policy calculation is wrong (line 95 in mcts.cpp and the CUDA mcts):
p.lambda_n = m.c_puct[b]*float(N)/float(N + A);
It should be
p.lambda_n = m.c_puct[b]*sqrt(float(N))/float(N + A);
I think that's why you found it working with cpuct as small as 1/16: the small constant compensates for the float(N) factor, which is too big. It should also change the whole dynamics of the PUCT algorithm.
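For what it's worth, here is a small standalone sketch contrasting the two schedules (helper names are mine, not boardlaw's). The sqrt form c·sqrt(N)/(N+A) peaks at N=A and then decays towards 0, while the linear form c·N/(N+A) grows monotonically towards c, which is why a much smaller c partially compensates for it:

```cpp
#include <cmath>

// Hypothetical helpers: N is the total visit count at the node,
// A is the number of actions, c is the cpuct constant.
float lambda_sqrt(float c, int N, int A) {
    // c*sqrt(N)/(N+A) -- the Grill et al. style schedule
    return c * std::sqrt(float(N)) / float(N + A);
}

float lambda_linear(float c, int N, int A) {
    // c*N/(N+A) -- the schedule currently on line 95
    return c * float(N) / float(N + A);
}
```

Evaluating both over a range of N makes the difference in dynamics easy to see: the sqrt schedule explores early and then anneals, while the linear schedule keeps lambda near c for large N.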
created time in 3 days
started andyljones/coolgpus
started time in 6 days
started andyljones/coolgpus
started time in 8 days
started andyljones/coolgpus
started time in 9 days
started andyljones/coolgpus
started time in 10 days
started andyljones/coolgpus
started time in 10 days
started andyljones/coolgpus
started time in 10 days
64-bit kindlegen for OSX 10.14 and above
fork in 12 days
started andyljones/reinforcement-learning-discord-wiki
started time in 12 days
fork zeta1999/megastep
megastep helps you build 1-million FPS reinforcement learning environments on a single GPU
https://andyljones.com/megastep
fork in 13 days
started andyljones/coolgpus
started time in 13 days
started andyljones/reinforcement-learning-discord-wiki
started time in 14 days
started andyljones/boardlaw
started time in 14 days
started andyljones/coolgpus
started time in 16 days
started andyljones/reinforcement-learning-discord-wiki
started time in 16 days
started andyljones/reinforcement-learning-discord-wiki
started time in 16 days