A while ago I posted about a Reinforcement Learning framework I developed using Evolutionary Strategies. However, at the time, I was unable to open-source it because it was made for a project. Now I have finally gotten approval and released it to the public.
Skip the explanation: https://github.com/ob-trading/portable-es
The name comes from the fact that it's low on dependencies and easy to distribute across a compute cluster.
It's a framework based on a paper by OpenAI, though by now we have extended it well beyond the functionality described in the paper. We also have a roadmap of planned features, such as ESAC.
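For readers new to ES, the core update is a single Monte Carlo gradient estimate. The formula below is a sketch that assumes the referenced paper is OpenAI's "Evolution Strategies as a Scalable Alternative to Reinforcement Learning" (Salimans et al., 2017). Here $F$ is the fitness (e.g. episode return), $\theta$ the flattened network parameters, $\sigma$ the noise scale, $n$ the population size and $\alpha$ the learning rate:

```latex
\nabla_{\theta}\, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\left[F(\theta + \sigma \epsilon)\right]
  \approx \frac{1}{n \sigma} \sum_{i=1}^{n} F(\theta + \sigma \epsilon_i)\, \epsilon_i,
\qquad
\theta \leftarrow \theta + \frac{\alpha}{n \sigma} \sum_{i=1}^{n} F(\theta + \sigma \epsilon_i)\, \epsilon_i
```

In practice the raw returns are usually normalized or rank-transformed before being plugged into this update.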
Show me the money!
While I can't share results from our internal project, we have provided benchmarks on common OpenAI Gym environments using different optimizers:
CartPole-v1 (GRU, 367 parameters, 25 epochs):
1. Novograd v1 (Nov 2020; 497.6, σ=18.76)
2. AdaBelief (Oct 2020; 484.1, σ=41.00)
3. Adam v2 (Oct 2020; 220.9, σ=82.99)
4. AdaMM v2 (Oct 2020; 129.0, σ=112.14)
5. Adam v1 (Sep 2020; 97.6, σ=61.92)
6. AdaMM v1 (Sep 2020; 68.4, σ=39.49)
7. RAdam v1 (Oct 2020; 48.6, σ=52.29)
Acrobot-v1 (GRU, 603 parameters, 25 epochs):
1. AdaBelief (Oct 2020; -82.1, σ=26.26)
2. Adam v2 (Oct 2020; -84.9, σ=34.51)
3. Novograd v1 (Nov 2020; -86.3, σ=42.03)
4. AdaMM v2 (Oct 2020; -102.0, σ=73.68)
5. RAdam v1 (Oct 2020; -211.6, σ=182.32)
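The optimizers in these rankings can be swapped freely because, in the usual ES setup, the population produces a gradient estimate that any first-order optimizer can consume. The following is a minimal NumPy sketch, not the framework's code (the class names and the `step(theta, grad)` interface are hypothetical), of Adam and AdaBelief applied as gradient ascent. It shows that the two differ only in their second-moment term:

```python
import numpy as np


class Adam:
    """Adam used for gradient ascent on an ES fitness-gradient estimate."""

    def __init__(self, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.s = None  # first- and second-moment estimates
        self.t = 0

    def _second_moment(self, grad):
        # Adam: running average of the squared gradient.
        return self.beta2 * self.s + (1 - self.beta2) * grad**2

    def step(self, theta, grad):
        if self.m is None:
            self.m = np.zeros_like(theta)
            self.s = np.zeros_like(theta)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.s = self._second_moment(grad)
        m_hat = self.m / (1 - self.beta1**self.t)
        s_hat = self.s / (1 - self.beta2**self.t)
        # "+" rather than "-": ES maximizes fitness instead of minimizing a loss.
        return theta + self.lr * m_hat / (np.sqrt(s_hat) + self.eps)


class AdaBelief(Adam):
    def _second_moment(self, grad):
        # AdaBelief: running average of (grad - m)^2, i.e. how far the new
        # gradient deviates from the momentum ("belief"), plus eps.
        return self.beta2 * self.s + (1 - self.beta2) * (grad - self.m) ** 2 + self.eps
```

Because AdaBelief scales by the deviation from the running mean instead of the raw gradient magnitude, it takes larger steps when consecutive ES estimates agree and smaller ones when they are noisy.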
Also check out the original article at https://blog.devdroplets.com/diving-into-evolutionary-strategies/. It contains a lot of background information and some cool visuals, all created with this framework.
The main goal of the project is an efficient evolutionary strategy (ES) implementation that is easy to use and easy to 'hack'.
The main advantage of ES is that it's very easy to distribute across CPUs and networks, because the source of progress is an RNG: workers can exchange random seeds instead of full tensors. However, it wouldn't be very useful if the framework were hard to use, which is why I made sure the setup is relatively simple and extendable. Workers can also be added to or removed from a setup dynamically; this feature is provided by the distributed-worker library.
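A minimal sketch of why seeds make distribution cheap. It assumes a plain NumPy setup and uses hypothetical function names (`perturbation`, `worker_evaluate`, `master_update`); it is not the API of portable-es or distributed-worker. Each worker returns only a `(seed, reward)` pair, and the coordinator regenerates every perturbation locally from its seed:

```python
import numpy as np


def perturbation(seed: int, size: int) -> np.ndarray:
    # Any process can regenerate exactly the same noise from the seed alone.
    return np.random.default_rng(seed).standard_normal(size)


def worker_evaluate(theta, seed, sigma, fitness):
    """Worker side: evaluate one perturbed copy and report only (seed, reward)."""
    return seed, fitness(theta + sigma * perturbation(seed, theta.size))


def master_update(theta, results, sigma, alpha):
    """Coordinator side: rebuild each perturbation from its seed and step theta."""
    seeds, rewards = zip(*results)
    rewards = np.asarray(rewards, dtype=float)
    # Standardize returns so the step size does not depend on the reward scale.
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad = sum(r * perturbation(s, theta.size) for s, r in zip(seeds, rewards))
    return theta + alpha * grad / (len(seeds) * sigma)


def fitness(x):
    # Toy fitness with its optimum at x = 3 (stand-in for an episode return).
    return -np.sum((x - 3.0) ** 2)


theta, next_seed = np.zeros(5), 0
for epoch in range(300):
    seeds = range(next_seed, next_seed + 50)
    next_seed += 50
    # In a cluster these calls would run on remote workers.
    results = [worker_evaluate(theta, s, 0.1, fitness) for s in seeds]
    theta = master_update(theta, results, 0.1, 0.01)
```

Whatever the model size, each result message here is just two numbers, which is what makes it cheap to spread the population over many machines.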
We've optimized training time by using random number generator (RNG) seeds to transfer network data. We even tried caching RNG outputs, but (de)serializing them took longer than simply running the RNG again; that's how fast it is.
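The size difference between a seed and the data it stands for is easy to see. This sketch uses NumPy, a hypothetical `noise_from_seed` helper and an arbitrarily chosen 1M-parameter vector. It pickles a seed and the noise vector that seed expands to, then checks that regenerating from the seed is bit-identical:

```python
import pickle

import numpy as np


def noise_from_seed(seed: int, n_params: int) -> np.ndarray:
    # A fresh generator per seed: the same seed gives bit-identical noise
    # on any machine running the same NumPy version.
    return np.random.default_rng(seed).standard_normal(n_params, dtype=np.float32)


N_PARAMS = 1_000_000
seed = 12345
noise = noise_from_seed(seed, N_PARAMS)

seed_msg = pickle.dumps(seed)    # what a worker actually needs to send
noise_msg = pickle.dumps(noise)  # what it would send without seeds

print(f"seed: {len(seed_msg)} bytes, noise: {len(noise_msg):,} bytes")
```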
Another focus was reusing PyTorch primitives: PyTorch is one of the most popular deep learning frameworks, so there is a lot to gain from being compatible with it. This also made CUDA support trivial, since PyTorch supports CUDA natively.
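As an illustration of working directly with PyTorch primitives, here is a sketch (not the framework's code; `perturb_model` is a hypothetical helper). It perturbs an ordinary `nn.Module` using the `torch.nn.utils.parameters_to_vector` / `vector_to_parameters` utilities and a seeded `torch.Generator`:

```python
import torch
from torch import nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters


def perturb_model(model: nn.Module, seed: int, sigma: float) -> torch.Tensor:
    """Add seeded Gaussian noise to all parameters of `model` in place.

    Returns the noise so the caller can reuse it in the ES update.
    """
    # CPU generator; a CUDA model would need a generator on its own device.
    gen = torch.Generator().manual_seed(seed)
    with torch.no_grad():
        theta = parameters_to_vector(model.parameters())
        noise = torch.randn(theta.numel(), generator=gen)
        vector_to_parameters(theta + sigma * noise, model.parameters())
    return noise


# Any existing PyTorch architecture works unchanged, e.g. a small GRU.
model = nn.GRU(input_size=4, hidden_size=8)
original = parameters_to_vector(model.parameters()).detach().clone()
noise = perturb_model(model, seed=42, sigma=0.05)
```

For a GPU model the same idea applies with `torch.Generator(device=...)` and a matching `device=` argument to `torch.randn`.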
The one non-critical part of our implementation is the TensorBoard integration, which is enabled by default. This is mainly meant to speed up development when starting out with the project; we may make it more modular in the future.
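One possible way to make logging modular is sketched below. It is purely hypothetical and not part of the framework: the `ScalarLogger` protocol, the three logger classes and `train` are illustrative names. The idea is to program against a tiny logging interface and import TensorBoard only when that backend is actually chosen:

```python
from typing import List, Optional, Protocol, Tuple


class ScalarLogger(Protocol):
    def log_scalar(self, tag: str, value: float, step: int) -> None: ...


class NullLogger:
    """Discards everything: a zero-dependency default."""

    def log_scalar(self, tag: str, value: float, step: int) -> None:
        pass


class MemoryLogger:
    """Keeps scalars in memory, e.g. for tests or notebooks."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float, int]] = []

    def log_scalar(self, tag: str, value: float, step: int) -> None:
        self.records.append((tag, value, step))


class TensorBoardLogger:
    """Imports TensorBoard only when this logger is actually constructed."""

    def __init__(self, log_dir: Optional[str] = None) -> None:
        from torch.utils.tensorboard import SummaryWriter

        self.writer = SummaryWriter(log_dir=log_dir)

    def log_scalar(self, tag: str, value: float, step: int) -> None:
        self.writer.add_scalar(tag, value, global_step=step)


def train(n_epochs: int, logger: Optional[ScalarLogger] = None) -> None:
    logger = logger or NullLogger()
    for epoch in range(n_epochs):
        mean_reward = float(epoch)  # placeholder for a real ES epoch's result
        logger.log_scalar("reward/mean", mean_reward, epoch)
```

With this split, TensorBoard becomes an optional dependency instead of a default one.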
- Dynamic distribution
  - Add/remove worker nodes over networks
  - No additional code required
  - Infinitely distributable
- CPU & GPU support
- Easy setup
  - Get going in less time than a regular PyTorch setup
  - Reuse PyTorch models/architectures
- Optimized for quick training
- Very simple implementation
- Extendable grid search
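The framework's own grid-search interface isn't shown here. As a generic illustration of what an extendable grid search does, this sketch expands a dictionary of hyperparameter lists into one config per combination. The keys and values are chosen for illustration only:

```python
import itertools
from typing import Any, Dict, Iterator, List


def grid(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one config dict per combination of the values listed in `space`."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical search space; extending the search means adding a key.
space = {
    "optimizer": ["Adam", "AdaBelief", "Novograd"],
    "sigma": [0.05, 0.1],
    "population": [64, 128],
}

for config in grid(space):
    print(config)  # a real setup would launch one training run per config
```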