Epsilon and learning rate decay in epsilon greedy q learning
I understand that epsilon marks the trade-off between exploration and exploitation. At the beginning, you want epsilon to be high so that you explore broadly and learn about the environment. As you learn about future rewards, epsilon should decay so that you can exploit the higher Q-values you've found. However, should the learning rate (alpha) also decay with time in a stochastic environment? The posts on SO that I've seen only discuss epsilon decay. How do we set epsilon and alpha so that the Q-values converge?
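As background to the question: in a stochastic environment, Q-learning's convergence guarantee does require the learning rate to decay, specifically the Robbins-Monro conditions (the sum of the alphas diverges while the sum of their squares converges), while epsilon only needs to decay slowly enough that every state-action pair keeps being visited. A minimal sketch of two common schedules (the specific constants and the exponential form for epsilon are illustrative choices, not a prescription):

```python
import math

def epsilon_schedule(t, eps_start=1.0, eps_min=0.05, decay_rate=0.001):
    """Exponential epsilon decay: explore heavily early, then settle
    at a small floor so exploration never fully stops."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay_rate * t)

def alpha_schedule(t, alpha0=0.5):
    """Harmonic learning-rate decay alpha_t = alpha0 / (1 + t).
    This satisfies the Robbins-Monro conditions (sum alpha_t = inf,
    sum alpha_t^2 < inf), which the Q-learning convergence proof
    assumes in stochastic environments."""
    return alpha0 / (1.0 + t)
```

In practice, alpha is often decayed per state-action pair (e.g. alpha = 1 / N(s, a), where N counts visits) rather than on a global step counter, since different parts of the state space are visited at different rates.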
machine-learning reinforcement-learning q-learning decay
edited Nov 7 at 22:35
...