Reinforcement learning – applications and challenges

In the last blog we covered the basics of Reinforcement Learning (RL), its common rules and its terminology. In this edition, we focus on its real-world applications, the associated challenges and its relevance in the world of Data Science.

Challenges in RL

As discussed in the last blog, an agent acts in the environment according to its policy, and that policy is learned through a reward mechanism that the user defines. Defining a good reward mechanism is a fairly complex task. If the rewards are long-term, each short-term action of the agent, whether right or wrong, has only a very small effect on the overall result, so it is hard to tell which actions actually earned the reward. This is known as the credit assignment problem, and it is one of the biggest challenges researchers face when solving a problem with RL: optimizing the whole system under delayed rewards requires a lot of data. However, researchers have come up with various short-term reward mechanisms and many custom optimizers that help reach an optimum faster.
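To make the delayed-reward issue concrete, here is a minimal sketch of discounted returns, one common way of spreading a late reward back over the earlier actions. The function name, the discount factor `gamma` and the toy episode are assumptions made for this illustration and do not come from any specific library.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: nine steps with no reward, then a single reward of 1.0.
episode_rewards = [0.0] * 9 + [1.0]
returns = discounted_returns(episode_rewards, gamma=0.9)
print([round(g, 3) for g in returns])
```

Every step in this toy episode receives some share of the final reward, whether or not the action taken there actually helped. The earliest step receives the smallest share. Separating useful actions from irrelevant ones therefore takes many episodes, which is one reason RL needs so much data.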

Applications of RL

Despite the challenge of defining a good reward mechanism, RL is often a strong choice for high-dimensional control problems and similar industrial applications.
Some of the applications of RL are:


  1. Games
     The ability to play games is usually considered a sign of intelligence, and deep RL has been applied successfully to a wide range of games, in some cases with nothing but raw pixels as input and no information about the game's dynamics. From perfect-information board games such as Go and chess to imperfect-information games such as poker, RL has beaten experts in their respective domains. It has also made its mark in video games, from classic Atari titles to complex online games such as DOTA. Even in complex multiplayer settings, the AI can cooperate with human players to beat opponents. However, RL can be very sample-inefficient: for instance, it took about 45,000 years of simulated gameplay for RL to learn to play DOTA.

  2. Recommendation Systems
     The ability of RL to deal with complex information structures has made it a potential solution for recommendation systems. Alibaba has implemented RL on its Taobao e-commerce platform, and Facebook uses RL to choose relevant notifications to send to its users. All of this rests on a simple and intuitive idea: reinforce the content that is likely to generate better engagement from the user.

  3. Humanoids and Robots
     From navigation to automated motor tasks, RL has shown promising results, at least in simulated environments. Robots with human-like interaction still lie far in the future. However, task-specific robots have been built successfully for industrial applications, and the future of RL in this space looks promising: Microsoft recently acquired a startup working on deep RL for industrial systems.

  4. Natural Language Processing
     RL can specialize in a specific task by repeating it over and over, so in NLP it has yielded significant results for goal-oriented chatbots. These chatbots aim to answer a directed query in a specific domain; for instance, they can help users book a ticket or find a reservation. RL has also been applied to other NLP areas such as machine translation and text generation.

  5. Finance
     Millions of data points are generated each second in the world of finance, and even veterans of the financial industry find it difficult to predict trends consistently. Can an algorithm with no knowledge of market dynamics, working from data alone, come up with a sequential, dynamic strategy for choosing relevant stocks? Published research suggests it can: scientific papers have reported profitable strategies in niche markets. RL can further be extended to areas such as option pricing and portfolio optimization.
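The recommendation idea described above, reinforcing content that is likely to generate better engagement, can be sketched with an epsilon-greedy multi-armed bandit, the simplest one-step special case of RL. This is only an illustration and not the method used by any of the companies mentioned. The class name, the click probabilities and the seeds are all hypothetical.

```python
import random


class EpsilonGreedyRecommender:
    """Toy epsilon-greedy bandit: each arm is one candidate content item."""

    def __init__(self, n_items: int, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.counts = [0] * n_items    # how often each item was shown
        self.values = [0.0] * n_items  # running mean reward per item
        self.rng = random.Random(seed)

    def recommend(self) -> int:
        # Explore a random item with probability epsilon, else exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, item: int, reward: float) -> None:
        # Incremental mean: shift the estimate towards the observed reward.
        self.counts[item] += 1
        self.values[item] += (reward - self.values[item]) / self.counts[item]


# Hypothetical click probabilities, made up for this illustration.
true_click_rates = [0.05, 0.20, 0.60]
recommender = EpsilonGreedyRecommender(n_items=len(true_click_rates), epsilon=0.1, seed=42)
env_rng = random.Random(7)  # simulated user behaviour

for _ in range(5000):
    item = recommender.recommend()
    clicked = 1.0 if env_rng.random() < true_click_rates[item] else 0.0
    recommender.update(item, clicked)

print(recommender.counts)
print([round(v, 2) for v in recommender.values])
```

In this simulation the item with the highest click rate ends up being shown most often, because every click nudges its estimated value upwards. Real recommendation systems also have to account for user context and longer-term effects, which is where full RL methods come in.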

About the Author

Abhijeet is a Data Scientist at Sigmoid. He mainly works in the domain of Recommendation Engines, Time Series Forecasting, Reinforcement Learning and Computer Vision.
