Sean Gillen (sgillen), PhD student at UCSB.

sgillen/fractal_rl 6

Code for CORL 2020 paper: Explicitly Encouraging Low Fractional Dimensional Trajectories Via Reinforcement Learning.

sgillen/fractal_mesh 2

Code to analyze policies trained with a reinforcement learning method that encourages low fractional dimension trajectories.

sgillen/LC-Simulation 2

A collection of Matlab scripts and Mathematica notebooks that simulate nematic liquid crystals.

sgillen/seagul 2

A utility library for research in robotics and deep reinforcement learning.

sgillen/aq_bars 1

Interfacing a Novint Falcon with a LynxMotion AL5D arm for haptic teleoperation. (It's a fist bumping robot)

sgillen/408i 0

Quick and dirty code to drive a robot that follows a ball.

sgillen/ARS 0

An implementation of the Augmented Random Search algorithm

push event sgillen/modern_art

Sean Gillen

commit sha 0c982b89f509f6f50605e1d05349ff410c292ec1

add a working model

view details

push time in 3 days

push event sgillen/modern_art

sgillen

commit sha 7ce7e9488ad684d7652516e5f38fe213f4c61bd4

add preprocessed data set and notebook to train on it, much faster now

view details

push time in 3 days

push event tyozmen/n_link_arm

sgillen

commit sha 03f8036e3069d18a7d5b09339de77b18f1ef6572

ARS works, PPO learns something but performance is bad

view details

push time in 4 days

push event tyozmen/n_link_arm

sgillen

commit sha e59258cf92c8b2f2e64748a7265f82958ef5bffd

combine n_link_arm envs into one, refactor to remove eval (5x speedup), move ARS to seperate folder

view details

push time in 5 days

Pull request review comment Stable-Baselines-Team/stable-baselines3-contrib

ARS

Code under review:

import copy
import io
import pathlib
import time
import warnings
from typing import Any, Dict, Optional, Type, Union

import gym
import numpy as np
import torch as th
import torch.nn.utils
from stable_baselines3.common.base_class import BaseAlgorithm
from stable_baselines3.common.policies import BasePolicy
from stable_baselines3.common.type_aliases import GymEnv, MaybeCallback, Schedule
from stable_baselines3.common.utils import get_schedule_fn, safe_mean

from sb3_contrib.ars.policies import ARSPolicy


class ARS(BaseAlgorithm):
    """
    Augmented Random Search: https://arxiv.org/abs/1803.07055

    :param policy: The policy to train, can be an instance of ARSPolicy, or a string
    :param env: The environment to train on, may be a string if registred with gym
    :param n_delta: How many random pertubations of the policy to try at each update step.
    :param n_top: How many of the top delta to use in each update step. Default is n_delta
    :param alpha: Float or schedule for the step size
    :param sigma: Float or schedule for the exploration noise
    :param policy_kwargs: Keyword arguments to pass to the policy on creation
    :param policy_base: Base class to use for the policy
    :param tensorboard_log: String with the directory to put tensorboard logs:
    :param seed: Random seed for the training
    :param verbose: Verbosity level: 0 no output, 1 info, 2 debug
    :param device: Torch device to use for training, defaults to "cpu"
    :param _init_setup_model: Whether or not to build the network at the creation of the instance
    :param alive_bonus_offset: Constant added to the reward at each step, a value of -1 is used in the original paper
    """

    def __init__(
        self,
        policy: Union[str, Type[ARSPolicy]],
        env: Union[GymEnv, str],
        n_delta: int = 64,
        n_top: Optional[int] = None,
        alpha: Union[float, Schedule] = 0.05,
        sigma: Union[float, Schedule] = 0.05,
        policy_kwargs: Optional[Dict[str, Any]] = None,
        policy_base: Type[BasePolicy] = ARSPolicy,
        tensorboard_log: Optional[str] = None,
        seed: Optional[int] = None,
        verbose: int = 0,
        device: Union[th.device, str] = "cpu",
        _init_setup_model: bool = True,
        alive_bonus_offset: float = 0,
        zero_policy: bool = True,
    ):

        super().__init__(
            policy,
            env,
            learning_rate=0.0,
            tensorboard_log=tensorboard_log,
            policy_base=policy_base,
            policy_kwargs=policy_kwargs,
            verbose=verbose,
            device=device,
            supported_action_spaces=(gym.spaces.Box,),
            support_multi_env=True,
        )

        self.n_delta = n_delta
        self.alpha = get_schedule_fn(alpha)
        self.sigma = get_schedule_fn(sigma)

        if n_top is None:
            n_top = n_delta
        self.n_top = n_top

        self.n_workers = None  # We check this at training time ... I guess??

        if policy_kwargs is None:
            policy_kwargs = {}
        self.policy_kwargs = policy_kwargs

        self.seed = seed
        self.alive_bonus_offset = alive_bonus_offset

        self.zero_policy = zero_policy
        self.theta = None  # Need to call init model to initialize theta

        if (
            _init_setup_model
        ):  # TODO ... what do I do if this is false? am i guaranteed that someone will call this before training?
            self._setup_model()

    @classmethod  # Override just to change the default device argument to "cpu"
    def load(
        cls,
        path: Union[str, pathlib.Path, io.BufferedIOBase],
        env: Optional[GymEnv] = None,
        device: Union[th.device, str] = "cpu",
        custom_objects: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> "BaseAlgorithm":
        return super().load(path, env, device, custom_objects, **kwargs)

    def _setup_model(self) -> None:
        self.set_random_seed(self.seed)
        self.rng = np.random.default_rng(self.seed)

        self.policy = self.policy_class(self.observation_space, self.action_space, **self.policy_kwargs)
        self.dtype = self.policy.parameters().__next__().dtype  # This seems sort of like a hack
        self.theta = th.nn.utils.parameters_to_vector(self.policy.parameters()).detach().numpy()

        if self.zero_policy:
            self.theta = np.zeros_like(self.theta)
            theta_tensor = th.tensor(self.theta, requires_grad=False, dtype=self.dtype)
            th.nn.utils.vector_to_parameters(theta_tensor, self.policy.parameters())

        self.n_params = len(self.theta)
        self.policy = self.policy.to(self.device)

    def _collect_rollouts(self, policy_deltas, callback):
        with th.no_grad():
            batch_steps = 0
            theta_idx = 0

            # Generate 2*n_delta candidate policies by adding noise to the current theta
            candidate_thetas = np.concatenate([self.theta + policy_deltas, self.theta - policy_deltas])

It might, yes. I generally prefer to keep things in numpy where possible (because I don't need the gradient information, and multiprocessing with torch tensors has gotchas), but there is no good reason to in this context.
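For illustration only, here is a minimal sketch of the alternative being discussed, generating the mirrored candidates directly in torch; the function name and the sigma scaling are assumptions, not the PR's code:

```python
import torch as th


def mirrored_candidates(theta: th.Tensor, n_delta: int, sigma: float) -> th.Tensor:
    # Sample n_delta Gaussian perturbations and return the 2 * n_delta mirrored
    # candidates theta + sigma * delta and theta - sigma * delta, staying in torch
    # the whole time (no round trip through numpy).
    deltas = th.randn(n_delta, theta.numel(), dtype=theta.dtype)
    return th.cat([theta + sigma * deltas, theta - sigma * deltas], dim=0)
```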

sgillen

comment created time in 5 days

Pull request review event

pull request comment Stable-Baselines-Team/stable-baselines3-contrib

ARS

That'd be nice. A simple table of the final agent performances side by side (with the variances included) would be the best case. We can then determine whether it is close enough or something needs tweaking. Again, repeating the experiments on the environments the original paper used is enough (or on a subset of them).

OK, I'll make the changes @araffin already suggested, and then rerun my own implementation along with the reference implementation on the v2 Mujoco envs.
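As a rough illustration of what such a side-by-side table could look like, here is a small helper (hypothetical names, no real numbers) that prints final-return mean and standard deviation per environment and per implementation:

```python
from typing import Dict, List

import numpy as np


def summarize_runs(results: Dict[str, Dict[str, List[float]]]) -> None:
    # results maps env name -> implementation name -> final returns, one per seed.
    # Prints one row per env with "mean +/- std" for each implementation.
    for env_name, by_impl in results.items():
        row = "  ".join(
            f"{impl}: {np.mean(returns):9.1f} +/- {np.std(returns):7.1f}"
            for impl, returns in by_impl.items()
        )
        print(f"{env_name:<15} {row}")
```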

sgillen

comment created time in 5 days

pull request comment Stable-Baselines-Team/stable-baselines3-contrib

ARS

I would disagree with that as adding Discrete support should be fairly easy, see https://github.com/osigaud/stable-baselines3/blob/64baf2125dec791c20c41f4b39ed1dc2ad914010/stable_baselines3/cem/policies.py#L60

Will do

Also, you are not squashing the output for continuous actions (which means that they will be clipped automatically instead). Is there a reason? (btw, if you do, you need to update the call to the parent class)

No good reason; I think I was changing this while debugging performance and never changed it back. The original paper does not squash, though, and I believe it may have a small effect on the learning performance. I can check whether this is true.
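For reference, a toy sketch (hypothetical, not the PR's policy code) contrasting the two options for a linear policy with a Box action space: relying on automatic clipping versus squashing with tanh and rescaling:

```python
import numpy as np


def clipped_action(w: np.ndarray, obs: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    # Current behaviour as described above: the raw linear output is clipped.
    return np.clip(w @ obs, low, high)


def squashed_action(w: np.ndarray, obs: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    # Squashing instead: tanh maps into (-1, 1), then rescale into [low, high].
    return low + 0.5 * (np.tanh(w @ obs) + 1.0) * (high - low)
```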

I saw that you overloaded load to default to cpu? Why is that?

This was changed to pass some of the existing tests. I think ARS should default to CPU because it prefers linear policies or small MLPs. So when one creates a policy object it defaults to CPU, but loading defaults to "auto", which searches for a CUDA device; this causes a test to fail and I think is not what the user wants. Now that I think about it, though, I might only need this for the policy, not for the agent?

And can't you do everything with torch? (removing the need of numpy)

With the approach I've landed on I certainly can.

I know you are trying to have an efficient implementation by using n_envs and the candidates in parallel, but I think it would be clearer (and less prone to errors) to use evaluate_policy as in the CEM implementation.

Yes, after going through the trouble of making a parallel version I sort of agree; it won't take long to make the simple version and profile it to see how much performance that leaves on the table.
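A rough sketch of the simpler serial approach being discussed, using evaluate_policy to score one candidate at a time; the loop structure and names here are assumptions, not the eventual implementation:

```python
import numpy as np
import torch as th
import torch.nn.utils
from stable_baselines3.common.evaluation import evaluate_policy


def evaluate_candidates(model, candidate_thetas: np.ndarray, n_eval_episodes: int = 1) -> np.ndarray:
    # Load each candidate parameter vector into the policy and roll it out serially.
    returns = np.zeros(len(candidate_thetas))
    for i, theta in enumerate(candidate_thetas):
        theta_tensor = th.as_tensor(theta, dtype=th.float32)
        th.nn.utils.vector_to_parameters(theta_tensor, model.policy.parameters())
        mean_return, _ = evaluate_policy(model.policy, model.env, n_eval_episodes=n_eval_episodes)
        returns[i] = mean_return
    return returns
```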

sgillen

comment created time in 5 days

Pull request review comment Stable-Baselines-Team/stable-baselines3-contrib

ARS

Code under review (the same ARS hunk shown in the previous review comment, continuing into _collect_rollouts):

    def _collect_rollouts(self, policy_deltas, callback):
        with th.no_grad():
            batch_steps = 0
            theta_idx = 0

            # Generate 2*n_delta candidate policies by adding noise to the current theta
            candidate_thetas = np.concatenate([self.theta + policy_deltas, self.theta - policy_deltas])
            candidate_returns = np.zeros(candidate_thetas.shape[0])  # returns == sum of rewards
            self.ep_info_buffer = []

            callback.on_rollout_start()
            while theta_idx < candidate_returns.shape[0]:
                policy_list = []

                # We are using a vecenv with n_envs==n_workers. We batch our candidate theta evaluations into vectors
                # of length n_workers
                for _ in range(self.n_workers):
                    theta_tensor = th.tensor(candidate_thetas[theta_idx], dtype=self.dtype)
                    th.nn.utils.vector_to_parameters(theta_tensor, self.policy.parameters())
                    policy_list.append(copy.deepcopy(self.policy))

During profiling I saw that almost all the time is spent in env.step or policy.predict. ARS typically uses linear policies, so the amount of data here is actually fairly small. Preallocating the list would help a bit, perhaps just initializing with policy_list = [copy.deepcopy(policy) for _ in range(n_workers)]; IIRC list comprehensions avoid the extra copies that append can run into.
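A small sketch of the suggested change (names hypothetical): build the per-worker copies with a list comprehension first, then load each candidate's parameters into its own copy:

```python
import copy

import torch as th
import torch.nn.utils


def make_candidate_policies(base_policy, candidate_thetas, dtype=th.float32) -> list:
    # Pre-build the copies in one expression instead of appending inside the loop,
    # then write each candidate's parameter vector into its own copy. The deepcopy
    # calls still dominate, so this is a modest win at best.
    policies = [copy.deepcopy(base_policy) for _ in candidate_thetas]
    for policy, theta in zip(policies, candidate_thetas):
        th.nn.utils.vector_to_parameters(th.as_tensor(theta, dtype=dtype), policy.parameters())
    return policies
```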

sgillen

comment created time in 5 days

Pull request review event

issue opened sbhackerspace/sbhackerspace.github.io

Update to Wiki

Hello, I've updated the wiki slightly to include information about the Prusa printers and the machine shop. I can't open a pull request for the Wiki so I've opened this issue instead.

Machine-Shop.md (Replaces Bridgeport-Mill.md)

# Machine Shop
We have a machine shop; it contains:

- CF3 22HP CNC (which can be run on request).
- An Atlas lathe
- Bridgeport series I 2hp mill
- Central machinery 30'' shear press 
- Central machinery band saw 
- Craftsman 12'' band saw 
- Drill press
- Assorted bench grinders
- Assorted hand tools

3D-printers.md

# 3D Printers

The Hackerspace has several 3D printers:
* Mendel Max 2 - unknown operational state
* Ultimaker 2 - Up and running
* Peopoly Moai - Up and running
* ZCorp Zprinter 310 - Tested, needs binder
* Prusa i3 Original - 2 Printers up and running

## Ultimaker 2
The Ultimaker 2 uses 1.75mm filament. It can be accessed at ultimaker.internal.sbhackerspace.com and runs OctoPrint to handle slicing and printing. OctoPrint requires a login using your SBHX credentials.

## Peopoly Moai
The Peopoly Moai is an SLA printer and uses a UV cure resin to print. Please send any files for printing to info@sbhackerspace.com and they will be printed on Saturdays. 

## Prusa i3
The Prusa i3 uses 1.75mm filament. It can be accessed on the SBHX network at http://i3.internal.sbhackerspace.com and runs OctoPrint to handle slicing and printing. OctoPrint requires a login using your SBHX credentials.

created time in 6 days

PR opened Stable-Baselines-Team/stable-baselines3-contrib

ARS

Description

WIP pull request for ARS. Still some work to be done, but ready for some feedback before moving forward.

A few questions that I would like feedback on:

  • I don't plan to support CNN policies for ARS. It is certainly possible to make ARS support them, but that's not really what ARS was meant for. Do you think that is alright for now?
  • I only support Box spaces for actions and observations. The only other one that I think makes sense to add is dict observations, which I will probably add, but if other spaces should be included let me know.
  • I validate user hyper-parameters when learn is called, and will automatically adjust n_delta if it is not a multiple of n_workers (see the sketch after this list). I could support the case where this condition is not met, but it would considerably complicate the code to support a use case that doesn't really provide a benefit and makes training slower.
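A minimal sketch of the kind of adjustment described in the last point; the PR does not say whether it rounds up or down, so this hypothetical helper simply rounds up:

```python
def round_up_to_multiple(n_delta: int, n_workers: int) -> int:
    # Round n_delta up to the nearest multiple of n_workers so that every batch
    # of candidate evaluations fills the vec env exactly.
    remainder = n_delta % n_workers
    return n_delta if remainder == 0 else n_delta + (n_workers - remainder)
```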

Critical feedback on the code quality is definitely welcome. I'm not super happy with the code right now; it seems more complicated than it needs to be, but I think it's what makes the most sense for sb3 vec envs. Also, I put a lot of time into trying several approaches to make ARS fast with SubprocVecEnvs using many envs, but ultimately I find it's considerably faster to run single threaded (a 1-env DummyVecEnv) and run multiple seeds in parallel. Perhaps in the future I can make an MPI version, along with PPO.

Context


Types of changes


  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation (update in the documentation)

Checklist:


  • [x] I've read the CONTRIBUTION guide (required)
  • [x] The functionality/performance matches that of the source (required for new training algorithms or training-related features).
  • [x] I have updated the tests accordingly (required for a bug fix or a new feature). - CNN-related tests fail, and I probably want to add some more.
  • [x] I have included an example of using the feature (required for new features).
  • [x] I have included baseline results (required for new training algorithms or training-related features). - See Baseline reward curves below
  • [x] I have updated the documentation accordingly. - Results and comments section TODO
  • [ ] I have updated the changelog accordingly (required).
  • [x] I have reformatted the code using make format (required)
  • [x] I have checked the codestyle using make check-codestyle and make lint (required)
  • [x] I have ensured make pytest and make type both pass. (required)

Baseline Reward Curves

The original paper uses the v1 versions of the Mujoco envs. Getting back to the old v1 envs would require an ancient version of gym and mujoco_py, which IIRC have breaking changes and are not compatible with sb3. But I did train on v2 using the hyperparameters from the original paper, using my fork of the zoo, with each environment trained on 8 random seeds. Results are marginally worse than the paper, but seem fine to me given the environment version difference and the fact that only 3 seeds are presented for the optimized params in the paper. However, if you'd like, it should be possible to compare against the paper's original implementation on the newer envs.

I can also train on v2 for Walker, Ant, and Humanoid; they just take a bit longer, and I'm not sure what the differences between v1 and v2 are there. However, perhaps it would instead make sense to ignore Mujoco and run Optuna on the Bullet and Box2D envs before merging, since I want to do this in any case. I have several workstations to put to this task, but it will still take quite some time.

(Attached reward curve plots: Results_Swimmer, Results_Hopper, Results_HalfCheetah)

+502 -4

0 comment

10 changed files

pr created time in 6 days

push event sgillen/rl-baselines3-zoo

sgillen

commit sha b04adbfa0e95d165c66d62d7ac9164ba9e5e3b70

update hyper params to include alive bonus and n_timesteps from paper

view details

push time in 6 days

push event sgillen/stable-baselines3-contrib

sgillen

commit sha 2acea5f99b1520ff4f83e61b8127e21acb221c2b

add module docs

view details

sgillen

commit sha d97c804d4d124a4a982eb817d4f27957fce55ce7

run formatter

view details

sgillen

commit sha 6705e5d4067412d0a084c1411fa12e5c4da51fd2

fix load and rerun formatter

view details

push time in 6 days

push event sgillen/stable-baselines3-contrib

sgillen

commit sha fa3f2f923594681122e2843a203e80fd5bc5d986

remove callback from self, remove torch multiprocessing

view details

push time in 6 days

push event sgillen/stable-baselines3-contrib

sgillen

commit sha f9b019da724a8ae798695a177803fd961ebbfb07

break out dump logs

view details

sgillen

commit sha d749cca2073de48b0bc3759a4e5f82a114d9baf5

rollback so there are now predict workers, some refactoring

view details

push time in 6 days

create branch sgillen/stable-baselines3

branch : feat/do_rollout

created branch time in 11 days

fork sgillen/stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

https://stable-baselines3.readthedocs.io

fork in 11 days

issue comment google/brax

Replacing gym's Mujoco envs with brax envs

@erikfrey I agree with @benelot's list on what to prioritize. They will probably impact training, making the environment slightly harder if anything, but also closer to the original. The contact reward might lead to more pleasing gaits, but it's hard to say.

vwxyzjn

comment created time in 11 days

push event sgillen/stable-baselines3-contrib

sgillen

commit sha 7a4c7811e25c9c965684da99ab1ad948874c4cf2

debug and comment

view details

push time in 11 days

issue comment google/brax

Replacing gym's Mujoco envs with brax envs

@vwxyzjn good to hear about the normalization wrapper. I agree that the normalization and clipping should all be done on the brax side. This makes things awkward with respect to saving and loading environments / agents, since it will make brax a special case for gym, sb3, etc. Relatedly, I also think that if the brax envs aren't going to be extremely fast, it would be better to just use pybullet.

vwxyzjn

comment created time in 12 days

issue comment google/brax

Replacing gym's Mujoco envs with brax envs

Ok, I was a bit busier than I expected this week, but as promised I did start comparing the ant environments this evening. Here is a notebook I was using that may be useful to anyone else who wants to compare and tweak the envs.

With regards to the observations:

  1. I believe all the state position and velocity information matches up. For Mujoco it seems to be: z + quaternion for the torso (5), 8 joint angles, dxyz/drot (6) for the torso, and 8 more joint velocities, which matches exactly what brax has.
  2. The contact information is where big differences appear. Brax ignores the contact forces for some internal bodies that are present in the Mujoco model, which accounts for the difference in observation size (the brax team was already aware of this).
  3. I'm not sure what the ordering for the contact forces is in brax. It doesn't match what mujoco does (see the notebook linked above) and it also doesn't seem to match up with the bodies in env.sys.body_idx.keys().
  4. No matter the ordering, the magnitudes of the forces and moments are substantially different, but that may be because of the difference in mass mentioned below.

If the goal is to make as faithful a representation of the Mujoco envs as possible (which IMO it shouldn't necessarily be), then we will at least need to address the following:

  1. The mj ant starts life suspended 0.75 m in the air, the brax ant at 0.5 m.
  2. mj adds a relatively large amount of random noise to its initial state on reset.
  3. Inertial parameters for the two envs are different. Does brax have a way to infer an inertia from geometry? This is what mj does.
  4. Torque limits appear different: 300 in brax vs 150 in mj (units? That would be a lot of N*m).
  5. We will need to find out which brax integrator settings are closest to RK4 with dt = 0.01.
  6. May also need to tune friction parameters, which will probably need to be done empirically.

TLDR: For the ant, the difference in observations is in the ordering and number of contact forces. To make them match exactly we would need to reorder the existing forces and insert some dummy, zeroed elements into the observation. We should also probably adjust the mass, inertia, and torque limits.
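A hypothetical sketch of the observation adaptation described in the TLDR; the index arrays are placeholders, not the real brax-to-MuJoCo mappings:

```python
import numpy as np


def adapt_brax_obs(brax_obs: np.ndarray, reorder_idx: np.ndarray,
                   target_size: int, zero_idx: np.ndarray) -> np.ndarray:
    # Reorder the existing (e.g. contact-force) entries and insert zeroed dummy
    # elements so the observation matches the MuJoCo layout. Assumes
    # len(reorder_idx) == target_size - len(zero_idx).
    out = np.zeros(target_size, dtype=brax_obs.dtype)
    keep = np.setdiff1d(np.arange(target_size), zero_idx)
    out[keep] = brax_obs[reorder_idx]
    return out
```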

vwxyzjn

comment created time in 12 days

push event sgillen/stable-baselines3-contrib

sgillen

commit sha ea2a2f9c819a810d4f80d26ba71903f29c0d7441

add a few docs and tests, bugfixes for ARS

view details

push time in 13 days

create branch sgillen/rl-baselines3-zoo

branch : feat/ars

created branch time in 13 days

fork sgillen/rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.

https://stable-baselines3.readthedocs.io

fork in 13 days

create branch sgillen/stable-baselines3-contrib

branch : feat/ars

created branch time in 13 days

delete branch sgillen/stable-baselines3-contrib

delete branch : ars

delete time in 13 days

create branch sgillen/stable-baselines3-contrib

branch : ars

created branch time in 13 days

fork sgillen/stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code

https://sb3-contrib.readthedocs.io

fork in 13 days