Epsilon and learning rate decay in epsilon greedy q learning























I understand that epsilon marks the trade-off between exploration and exploitation. At the beginning, you want epsilon to be high so that you take big leaps and learn things. As you learn about future rewards, epsilon should decay so that you can exploit the higher Q-values you've found.



However, does our learning rate also decay with time in a stochastic environment? The posts on SO that I've seen only discuss epsilon decay.



How do we set our epsilon and alpha such that values converge?
































































Tags: machine-learning, reinforcement-learning, q-learning, decay














      edited Nov 7 at 22:35

























asked Nov 7 at 22:00 by Matt
























1 Answer

















accepted, 1 upvote











"At the beginning, you want epsilon to be high so that you take big leaps and learn things"




I think you have confused epsilon with the learning rate. The behavior you describe is actually governed by the learning rate.

Learning rate decay

The learning rate controls how big a step you take toward the optimal policy. In simple Q-learning, it is how much you update the Q-value at each step:



Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))



A higher alpha means you update your Q-values in bigger steps. As the agent learns, you should decay alpha to stabilize the updates, so that the Q-values eventually converge to an optimal policy.
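As a minimal sketch of this update with a decaying learning rate (tabular Q-learning; the names and the 1/visit-count schedule are illustrative assumptions, not something from the original post):

    import numpy as np

    # Illustrative tabular Q-learning update. Here alpha = 1 / visit_count,
    # so the step size shrinks each time a (state, action) pair is revisited.
    def q_update(Q, visits, s, a, r, s_next, gamma=0.99):
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]                  # decaying learning rate
        td_target = r + gamma * np.max(Q[s_next])   # bootstrapped target
        Q[s, a] += alpha * (td_target - Q[s, a])    # step of size alpha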



Epsilon decay

Epsilon is used when selecting actions based on the Q-values we already have. For example, with a purely greedy policy (epsilon = 0) we always pick the action with the highest Q-value in the current state. That hurts exploration: the agent can easily get stuck in a local optimum.

Therefore we introduce randomness via epsilon. For example, with epsilon = 0.3 we select a random action with probability 0.3, regardless of the Q-values.
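A minimal sketch of epsilon-greedy action selection (again with assumed, illustrative names):

    import numpy as np

    rng = np.random.default_rng()

    def epsilon_greedy(Q, s, epsilon):
        # Explore: with probability epsilon, pick a uniformly random action.
        if rng.random() < epsilon:
            return int(rng.integers(Q.shape[1]))
        # Exploit: otherwise pick the highest-valued action in state s.
        return int(np.argmax(Q[s]))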



You can find more details on the epsilon-greedy policy here.



In conclusion, the learning rate governs how big a step you take, and epsilon governs how randomly you act. As learning goes on, both should be decayed so that the agent stabilizes and exploits the learned policy, which converges to an optimal one.
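As one concrete (but illustrative, not canonical) way to decay both per episode: exponential decay with a floor is common in practice, while the theoretical convergence guarantee for tabular Q-learning requires the learning rates to satisfy the Robbins-Monro conditions (their sum diverges, their squared sum is finite), which 1/t-style schedules satisfy. The constants below are arbitrary choices:

    n_episodes = 500                                        # illustrative
    epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995
    alpha,   alpha_min,   alpha_decay   = 0.5, 0.01, 0.999

    for episode in range(n_episodes):
        # ... run one episode, acting with epsilon_greedy() and
        # updating Q with steps of size alpha ...
        epsilon = max(epsilon_min, epsilon * epsilon_decay)  # explore less over time
        alpha   = max(alpha_min,   alpha * alpha_decay)      # smaller, stabler updates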





































answered Nov 8 at 7:03 by Vishma Dias






























                     
