
Understanding Reinforcement Learning in five minutes

Reinforcement learning (RL) is an area of Machine Learning (ML) concerned with taking suitable actions to maximize reward in a particular situation. The goal of reinforcement learning algorithms is to find the best possible action to take in a specific situation. Much like a person, the learner (called the agent) is rewarded for good choices, penalized for bad choices, and learns from each choice. RL tries to mimic the way humans learn new things: not from a teacher, but through interaction with the environment. In the end, the RL agent learns to achieve a goal in an uncertain, potentially complex environment.
How does one learn cycling? How does a baby learn to walk? How do we become better at doing something with more practice? Let us explore learning to cycle to illustrate the idea behind RL.
Did somebody tell you how to cycle or give you steps to follow? Or did you learn it by spending hours watching videos of people cycling? All of these will surely give you an idea about cycling, but will they be enough to actually get you cycling? The answer is no. You learn to cycle only by cycling (action). You go through trial and error (practice), collecting positive experiences (positive rewards) and negative experiences (negative rewards or punishments), before getting your balance and control right (maximum reward or best outcome). This analogy of how our brain learns cycling applies to reinforcement learning: through trials, errors, and rewards, the agent finds the best course of action.
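To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy agent choosing among three options with unknown average payoffs (a so-called multi-armed bandit). This is only one simple illustration, not a method described in this article: the function name run_bandit, the payoff values, the noise level, and the 10% exploration rate are all assumptions chosen for the example.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which option pays best on average."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # the agent's current guess of each option's value
    counts = [0] * n_arms        # how many times each option has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random option.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the option that currently looks best.
            action = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a noisy scalar reward (hypothetical payoffs).
        reward = rng.gauss(true_means[action], 0.5)

        # Nudge the running average for the chosen option toward the new reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 1.5])
print("estimated values:", [round(e, 2) for e in estimates])
print("times each option was tried:", counts)
```

Nobody tells the agent which option is best. It only sees the rewards of the actions it actually takes, and over many trials it should settle mostly on the option with the highest average payoff.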
The major components of RL are as detailed below:
Instead of simply scanning the datasets to find a mathematical equation that can reproduce historical outcomes like other Machine Learning techniques, reinforcement learning is focused on discovering the optimal actions that will lead to the desired outcome.
There is no supervisor to tell the model how well it is doing. The RL agent receives only a scalar reward and has to figure out for itself how good the action was.
Feedback is delayed. The agent gets an instant reward for an action, but the long-term effect of that action is known only later. A move in chess, for example, may seem good at the time it is made but turn out to be a bad long-term move as the game progresses.
Time matters (sequential). People who are familiar with supervised and unsupervised learning will know that the sequence in which data is used for training does not matter for the outcome. In RL, however, the action and reward at the current state influence future states and actions, so the timing and sequence of data matter.
The agent's actions affect the subsequent data it receives.
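The delayed-feedback point can be illustrated with the discounted return, a standard way in the RL literature to sum up a sequence of rewards into a single number. The sketch below is an assumption-laden illustration: the function name discounted_return, the discount factor gamma = 0.9, and the two reward sequences are made up for the example.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards over time, weighting a reward k steps ahead by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical reward sequences (one number per time step).
tempting_move = [1.0, 0.0, 0.0, -10.0]   # small gain now, large loss later
patient_move = [0.0, 0.0, 0.0, 1.0]      # nothing now, modest gain later

print(discounted_return(tempting_move))
print(discounted_return(patient_move))
```

Judged by the instant reward alone, the first sequence looks better. Judged by the return over the whole sequence, it is clearly worse, which is the sense in which a chess move can look good now and still be a bad long-term move.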
The types of problems that reinforcement learning solves are often beyond human capabilities, and many are beyond the reach of other ML techniques as well. Besides, RL does not need a pre-collected dataset to learn from, because the agent generates its own experience by interacting with the environment. This is a great advantage for problems where data availability or data collection is an issue.
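As a rough sketch of what learning purely from interaction can look like, the example below uses tabular Q-learning, one well-known RL algorithm (named here only as an illustration, not as the method used in any system mentioned in this article). The five-cell corridor environment, the step and train functions, and all parameter values are hypothetical and chosen for brevity.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[state][action_index]: the agent's estimate of long-term reward.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Explore at random sometimes, and whenever the estimates are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted best estimate of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q_table = train()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

No dataset is supplied anywhere in this sketch. Every number in the table comes from the agent's own actions and the rewards the environment returned for them.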
Reinforcement Learning applications
RL is the darling of ML researchers now. It is advancing at an incredible pace, solving business and industrial problems and garnering a lot of attention due to its potential. Going forward, RL is likely to be core to many organizations’ AI strategies.
Reinforcement Learning
Reinforcement Learning is core to GAVS’ AI strategy and is being actively pursued to power the IP-led AIOps platform, ZIF. We have had our first success with RL: developing an RL agent for automated log rotation on servers.
References:
Reinforcement Learning: An Introduction second edition by Richard S. Sutton and Andrew G. Barto
https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf
Gireesh is a part of the projects run in collaboration with IIT Madras for developing AI solutions and algorithms. His interests include Data Science, Machine Learning, Financial markets, and Geo-politics. He believes that he is competing against himself to become better than who he was yesterday. He aspires to become a well-recognized subject matter expert in the field of Artificial Intelligence.