
Giuseppe Bonaccorso

Artificial Intelligence – Machine Learning – Data Science


OpenAI-Gym Cartpole-v0 LSTM experiment with Keras (Theano)

08/14/2016 – Artificial Intelligence, Deep Learning, Keras, Machine Learning, Neural Networks, Python, Software – 5 Comments

OpenAI-Gym evaluation page: https://gym.openai.com/evaluations/eval_JxPKNwd1QjaofWkaE4aLfQ.

The whole Gist with the full Python code is embedded below. It has been developed and tested with Theano and GPU support, but it can easily run with CPU-only support as well. Any comments or suggestions are welcome!
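As a quick orientation, here is a minimal, hypothetical sketch of this kind of setup (not the actual Gist code): a small Keras LSTM that maps a single CartPole-v0 observation (4 features) to a probability distribution over the 2 discrete actions. The layer sizes, loss and optimizer are illustrative assumptions.

```python
# Minimal sketch (not the actual Gist code): a Keras LSTM policy for CartPole-v0.
# Layer sizes, loss and optimizer are illustrative assumptions.
import numpy as np
import gym
from keras.models import Sequential
from keras.layers import LSTM, Dense

env = gym.make('CartPole-v0')

model = Sequential()
model.add(LSTM(8, input_shape=(1, 4)))        # one timestep, 4 observation features
model.add(Dense(2, activation='softmax'))     # probabilities of the two discrete actions
model.compile(loss='categorical_crossentropy', optimizer='adam')

observation = env.reset()
# The LSTM expects a (batch, timesteps, features) tensor
action = int(np.argmax(model.predict(observation.reshape(1, 1, 4))))
observation, reward, done, info = env.step(action)
```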

See also:

CIFAR-10 image classification with Keras ConvNet – Giuseppe Bonaccorso

CIFAR-10 is a small-image (32 x 32) dataset made up of 60,000 images subdivided into 10 main categories. Check the web page cited in that post's reference list for further information and to download the whole set.



5 thoughts on “OpenAI-Gym Cartpole-v0 LSTM experiment with Keras (Theano)”

  1. YongDuk
    08/19/2016 at 4:55

    A comment or maybe a question.
    Your setting is as follows:
    training_threshold=1.5
    step_threshold=0.5
    At line 55, if prev_norm > 1.5, then prev_norm cannot be smaller than 0.5, so the if-statement at line 59 is always false, and therefore the code always executes line 62.
    The code always chooses the inverse of the previous action whenever prev_norm > 1.5. So the LSTM learns the inverse action only when norm(prev_observation) > 1.5, and otherwise it does not have to learn anything.

    Maybe my understanding is wrong and I missed something.
    Above all, I found your algorithm very interesting and could not help asking why you chose those two thresholds, 1.5 and 0.5.

    In addition, may I ask about the theoretical background of the approach? Any related reference would also be very welcome!
    Thanks.

    • Giuseppe Bonaccorso
      08/19/2016 at 10:41

      First of all, thanks for your comment. Your observation is correct, and it is due to the fact that I’ve made several hyperparameter adjustments. My initial idea is based on the concept that a reward (+1) is given only while the pole is almost vertical, with no gradation until it exceeds the limits. So I decided to create a feedback mechanism (the LSTM) that preserves the target state, keeping the oscillations under a very strict threshold.

      Indeed, the LSTM has little to learn. The important corrections happen only when a state is approaching (or has exceeded) a certain limit (in terms of positions and speeds). So the learning steps happen only in those cases and don’t reinforce an output that is already correct (a rough sketch of this mechanism is included after the comments below). Of course, this is a toy problem with several unusual restrictions (for example, on the reward), and my approach is certainly not the best one. I’m still studying other non-conventional ways to optimize this kind of problem.

      There’s no theoretical background beyond what is already known about LSTMs and reinforcement learning. I’d like to write a short paper showing some results, but I’m still collecting data.

  2. Kurt Peek
    08/20/2016 at 16:24

    Hi Giuseppe,

    Impressive result! According to the OpenAI Gym evaluation (https://gym.openai.com/evaluations/eval_JxPKNwd1QjaofWkaE4aLfQ), your algorithm takes 0 episodes to solve – that is, it manages to balance the cart-pole on the first try, without ever falling.

    Do I interpret this result correctly, and if so, can you perhaps explain how this works at a high level? How can the system ‘learn’ without ever receiving any (negative) reinforcement?

    • Giuseppe Bonaccorso
      08/20/2016 at 16:56

      Hi Kurt,

      With the current (and default, after some tuning) parameters, corrections happen only after a negative reinforcement. The system develops a high level of inertia, keeping the norm of each observation close to a very small value (less than 1.5 by default). Of course, it’s fine if the cart stops in a position different from 0 and makes no further movements (which means that the two speed components are almost null, while there’s a constant “bias” due to the position).

      Considering that the real goal is to reduce any oscillation (so both speed components must be as small as possible), the LSTM trains its memory to reduce the magnitude of all movements. If you stop the monitor and let the system evolve for a longer time, you can see that during the first attempts it can easily fail after 500-600 steps (so the episode is solved, but the cart may reach one of the boundaries); starting from the second or third attempt, however, it remains still in a stable position indefinitely.

      Try changing the values of both parameters: it can be interesting to compare the different results! (A sketch of such a run is included after the comments below.)

      • Giuseppe Bonaccorso
        08/20/2016 at 17:01

        An addendum: I’d like to test this solution with different starting conditions (unfortunately I can’t…). In those cases, I’m afraid the learning would become much slower, and it might even be impossible to reach convergence while keeping the “done” flag on.

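For readers following the first exchange in the comments above, here is a rough, hypothetical sketch of the correction mechanism as described there: the LSTM is trained only when the norm of the previous observation exceeds training_threshold, and in that case the target is the inverse of the previous action. The function and variable names are assumptions and are not taken from the Gist; the step_threshold branch is omitted because, as noted in the first comment, it never fires with the default values.

```python
# Hypothetical sketch of the correction mechanism discussed in the comments
# (not the original code). The LSTM is trained only when the previous
# observation norm exceeds training_threshold; the target is then the
# inverse of the previous action.
import numpy as np

training_threshold = 1.5

def next_action(model, prev_observation, prev_action):
    prev_norm = np.linalg.norm(prev_observation)
    x = prev_observation.reshape(1, 1, 4)
    if prev_norm > training_threshold:
        # The state has drifted too far: learn (and take) the inverse action
        target = np.zeros((1, 2))
        target[0, 1 - prev_action] = 1.0
        model.train_on_batch(x, target)
        return 1 - prev_action
    # Otherwise follow the model's current prediction without any training step
    return int(np.argmax(model.predict(x)))
```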
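The second exchange suggests running the system beyond the monitored episodes and comparing different parameter values. Below is a self-contained, hypothetical sketch of such a measurement loop; the trivial lean-following policy is only a stand-in for the LSTM controller and is not part of the original experiment.

```python
# Hypothetical measurement loop (not the original evaluation script): run a few
# attempts back to back and record how long each lasts and the final observation
# norm. The simple lean-following policy is only a stand-in for the LSTM controller.
import gym
import numpy as np

env = gym.make('CartPole-v0')

for attempt in range(3):
    observation = env.reset()
    done, steps = False, 0
    while not done and steps < 2000:
        # Stand-in policy: push the cart toward the side the pole is leaning to
        action = 1 if observation[2] > 0 else 0
        observation, reward, done, info = env.step(action)
        steps += 1
    print('Attempt %d: %d steps, final observation norm %.3f'
          % (attempt + 1, steps, np.linalg.norm(observation)))
```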
