Tree search based Game AI: How to avoid AI 'wandering'/'procrastination' with sparse rewards?












0















My game AI makes use of an algorithm that searches all possible future states based on the moves I can make (minimax / monte carlo esque). It evaluates these states using a scoring system, picks the highest scored final state and follows it.



This works well in most situations, but awfully when rewards are sparse. For example: there's a desirable collectable object that's 3 tiles to the right of me. The natural solution would be to go right->right->right.



But, my algorithm searches 6 turns deep. And it will will find many paths that eventually collect the object, including ones that take longer than 3 turns. It might for example find a path that's: up->right->down->right->right->down, collecting the object on turn 5 instead.



Since in both cases, the final leaf nodes detect the object as collected, it doesn't naturally prefer one or the other. So, instead of going right on turn 1, it might go up, or down, or left. This behavior will be repeated exactly on the next turn, so that it basically ends up dancing randomly in front of the collectable object, only luck will make it step on it.



That's clearly suboptimal and I want to fix it, but have run out of ideas how to handle this appropriately. Are there any solutions for this issue or is there any theoretical work that deals with handling this issue?



Solutions I've tried:




  • Make it value object collection more on earlier turns. While this works, to beat evaluator 'noise', the difference between turns must be quite high. Turn 1 must be rated higher than 2, turn 2 rated higher than 3, etc. The difference between turn 1 and 6 needs to be so high that it ends up making the behavior extremely greedy, which is not desirable in most situations. In an environment with multiple objects, it might end up choosing the path that grabs an object on turn 1, instead of the much better path that can grab objects on turn 5 and 6.


  • Assign the object as a target and value distance to it. If not done on a turn to turn basis, the original problem persists. If done on a turn to turn basis, the difference in importance required per turn once again makes it too greedy. This method also decreases flexibility and causes other issues. Target selection is not trivial and kind of ruins the point of a minimax style algorithm


  • Going much deeper in my searches so that it can always find a second object. This would cost so much computing power that I'd have to make concessions, like pruning paths much more aggressively. If I do so, I'll be back at the same problem since I won't know how to get it to prefer pruning the 5 turn version over the 3 turn version.


  • Give extra value to the plans laid out last turn. If it can at least follow the suboptimal path, there wouldn't be as much of an issue. Unfortunately, this once again has to be a pretty strong effect for it to work reliably, making it follow sub-optimal paths in all scenarios, hurting overall performance.











share|improve this question



























    0















    My game AI makes use of an algorithm that searches all possible future states based on the moves I can make (minimax / monte carlo esque). It evaluates these states using a scoring system, picks the highest scored final state and follows it.



    This works well in most situations, but awfully when rewards are sparse. For example: there's a desirable collectable object that's 3 tiles to the right of me. The natural solution would be to go right->right->right.



    But, my algorithm searches 6 turns deep. And it will will find many paths that eventually collect the object, including ones that take longer than 3 turns. It might for example find a path that's: up->right->down->right->right->down, collecting the object on turn 5 instead.



    Since in both cases, the final leaf nodes detect the object as collected, it doesn't naturally prefer one or the other. So, instead of going right on turn 1, it might go up, or down, or left. This behavior will be repeated exactly on the next turn, so that it basically ends up dancing randomly in front of the collectable object, only luck will make it step on it.



    That's clearly suboptimal and I want to fix it, but have run out of ideas how to handle this appropriately. Are there any solutions for this issue or is there any theoretical work that deals with handling this issue?



    Solutions I've tried:




    • Make it value object collection more on earlier turns. While this works, to beat evaluator 'noise', the difference between turns must be quite high. Turn 1 must be rated higher than 2, turn 2 rated higher than 3, etc. The difference between turn 1 and 6 needs to be so high that it ends up making the behavior extremely greedy, which is not desirable in most situations. In an environment with multiple objects, it might end up choosing the path that grabs an object on turn 1, instead of the much better path that can grab objects on turn 5 and 6.


    • Assign the object as a target and value distance to it. If not done on a turn to turn basis, the original problem persists. If done on a turn to turn basis, the difference in importance required per turn once again makes it too greedy. This method also decreases flexibility and causes other issues. Target selection is not trivial and kind of ruins the point of a minimax style algorithm


    • Going much deeper in my searches so that it can always find a second object. This would cost so much computing power that I'd have to make concessions, like pruning paths much more aggressively. If I do so, I'll be back at the same problem since I won't know how to get it to prefer pruning the 5 turn version over the 3 turn version.


    • Give extra value to the plans laid out last turn. If it can at least follow the suboptimal path, there wouldn't be as much of an issue. Unfortunately, this once again has to be a pretty strong effect for it to work reliably, making it follow sub-optimal paths in all scenarios, hurting overall performance.











    share|improve this question

























      0












      0








      0








      My game AI makes use of an algorithm that searches all possible future states based on the moves I can make (minimax / monte carlo esque). It evaluates these states using a scoring system, picks the highest scored final state and follows it.



      This works well in most situations, but awfully when rewards are sparse. For example: there's a desirable collectable object that's 3 tiles to the right of me. The natural solution would be to go right->right->right.



      But, my algorithm searches 6 turns deep. And it will will find many paths that eventually collect the object, including ones that take longer than 3 turns. It might for example find a path that's: up->right->down->right->right->down, collecting the object on turn 5 instead.



      Since in both cases, the final leaf nodes detect the object as collected, it doesn't naturally prefer one or the other. So, instead of going right on turn 1, it might go up, or down, or left. This behavior will be repeated exactly on the next turn, so that it basically ends up dancing randomly in front of the collectable object, only luck will make it step on it.



      That's clearly suboptimal and I want to fix it, but have run out of ideas how to handle this appropriately. Are there any solutions for this issue or is there any theoretical work that deals with handling this issue?



      Solutions I've tried:




      • Make it value object collection more on earlier turns. While this works, to beat evaluator 'noise', the difference between turns must be quite high. Turn 1 must be rated higher than 2, turn 2 rated higher than 3, etc. The difference between turn 1 and 6 needs to be so high that it ends up making the behavior extremely greedy, which is not desirable in most situations. In an environment with multiple objects, it might end up choosing the path that grabs an object on turn 1, instead of the much better path that can grab objects on turn 5 and 6.


      • Assign the object as a target and value distance to it. If not done on a turn to turn basis, the original problem persists. If done on a turn to turn basis, the difference in importance required per turn once again makes it too greedy. This method also decreases flexibility and causes other issues. Target selection is not trivial and kind of ruins the point of a minimax style algorithm


      • Going much deeper in my searches so that it can always find a second object. This would cost so much computing power that I'd have to make concessions, like pruning paths much more aggressively. If I do so, I'll be back at the same problem since I won't know how to get it to prefer pruning the 5 turn version over the 3 turn version.


      • Give extra value to the plans laid out last turn. If it can at least follow the suboptimal path, there wouldn't be as much of an issue. Unfortunately, this once again has to be a pretty strong effect for it to work reliably, making it follow sub-optimal paths in all scenarios, hurting overall performance.











      share|improve this question














      My game AI makes use of an algorithm that searches all possible future states based on the moves I can make (minimax / monte carlo esque). It evaluates these states using a scoring system, picks the highest scored final state and follows it.



      This works well in most situations, but awfully when rewards are sparse. For example: there's a desirable collectable object that's 3 tiles to the right of me. The natural solution would be to go right->right->right.



      But, my algorithm searches 6 turns deep. And it will will find many paths that eventually collect the object, including ones that take longer than 3 turns. It might for example find a path that's: up->right->down->right->right->down, collecting the object on turn 5 instead.



      Since in both cases, the final leaf nodes detect the object as collected, it doesn't naturally prefer one or the other. So, instead of going right on turn 1, it might go up, or down, or left. This behavior will be repeated exactly on the next turn, so that it basically ends up dancing randomly in front of the collectable object, only luck will make it step on it.



      That's clearly suboptimal and I want to fix it, but have run out of ideas how to handle this appropriately. Are there any solutions for this issue or is there any theoretical work that deals with handling this issue?



      Solutions I've tried:




      • Make it value object collection more on earlier turns. While this works, to beat evaluator 'noise', the difference between turns must be quite high. Turn 1 must be rated higher than 2, turn 2 rated higher than 3, etc. The difference between turn 1 and 6 needs to be so high that it ends up making the behavior extremely greedy, which is not desirable in most situations. In an environment with multiple objects, it might end up choosing the path that grabs an object on turn 1, instead of the much better path that can grab objects on turn 5 and 6.


      • Assign the object as a target and value distance to it. If not done on a turn to turn basis, the original problem persists. If done on a turn to turn basis, the difference in importance required per turn once again makes it too greedy. This method also decreases flexibility and causes other issues. Target selection is not trivial and kind of ruins the point of a minimax style algorithm


      • Going much deeper in my searches so that it can always find a second object. This would cost so much computing power that I'd have to make concessions, like pruning paths much more aggressively. If I do so, I'll be back at the same problem since I won't know how to get it to prefer pruning the 5 turn version over the 3 turn version.


      • Give extra value to the plans laid out last turn. If it can at least follow the suboptimal path, there wouldn't be as much of an issue. Unfortunately, this once again has to be a pretty strong effect for it to work reliably, making it follow sub-optimal paths in all scenarios, hurting overall performance.








      artificial-intelligence minimax game-ai monte-carlo-tree-search






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 13 '18 at 11:52









      AnythingElseAnythingElse

      396




      396
























          1 Answer
          1






          active

          oldest

          votes


















          0














          When weighting the outcome of the last step of your move, are you calculating in the number of moves needed to pick up an object?



          I presume, you are quantifying each step of your move actions, giving a +1 if the step results in the picking up of an object. This means that in 3 steps, I can pick up the object with your above example, and get a +1 state of the play field, but I can also do this with 4-5-6-x steps, getting the same +1 state. If only a single object is reachable in the depth you are searching, your algorithm will likely select one of the random +1 states, giving you the above behaviour.



          This could be solved, by quantifying with a negative score, each of the moves the AI must make. Thus, getting the object in 3 moves, will result in a -2, but getting the object in 6 moves, will result in -5. This way, the AI will clearly know, that it is preferable to get the object in the least amount of moves, ie, 3.






          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53280469%2ftree-search-based-game-ai-how-to-avoid-ai-wandering-procrastination-with-sp%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            When weighting the outcome of the last step of your move, are you calculating in the number of moves needed to pick up an object?



            I presume, you are quantifying each step of your move actions, giving a +1 if the step results in the picking up of an object. This means that in 3 steps, I can pick up the object with your above example, and get a +1 state of the play field, but I can also do this with 4-5-6-x steps, getting the same +1 state. If only a single object is reachable in the depth you are searching, your algorithm will likely select one of the random +1 states, giving you the above behaviour.



            This could be solved, by quantifying with a negative score, each of the moves the AI must make. Thus, getting the object in 3 moves, will result in a -2, but getting the object in 6 moves, will result in -5. This way, the AI will clearly know, that it is preferable to get the object in the least amount of moves, ie, 3.






            share|improve this answer




























              0














              When weighting the outcome of the last step of your move, are you calculating in the number of moves needed to pick up an object?



              I presume, you are quantifying each step of your move actions, giving a +1 if the step results in the picking up of an object. This means that in 3 steps, I can pick up the object with your above example, and get a +1 state of the play field, but I can also do this with 4-5-6-x steps, getting the same +1 state. If only a single object is reachable in the depth you are searching, your algorithm will likely select one of the random +1 states, giving you the above behaviour.



              This could be solved, by quantifying with a negative score, each of the moves the AI must make. Thus, getting the object in 3 moves, will result in a -2, but getting the object in 6 moves, will result in -5. This way, the AI will clearly know, that it is preferable to get the object in the least amount of moves, ie, 3.






              share|improve this answer


























                0












                0








                0







                When weighting the outcome of the last step of your move, are you calculating in the number of moves needed to pick up an object?



                I presume, you are quantifying each step of your move actions, giving a +1 if the step results in the picking up of an object. This means that in 3 steps, I can pick up the object with your above example, and get a +1 state of the play field, but I can also do this with 4-5-6-x steps, getting the same +1 state. If only a single object is reachable in the depth you are searching, your algorithm will likely select one of the random +1 states, giving you the above behaviour.



                This could be solved, by quantifying with a negative score, each of the moves the AI must make. Thus, getting the object in 3 moves, will result in a -2, but getting the object in 6 moves, will result in -5. This way, the AI will clearly know, that it is preferable to get the object in the least amount of moves, ie, 3.






                share|improve this answer













                When weighting the outcome of the last step of your move, are you calculating in the number of moves needed to pick up an object?



                I presume, you are quantifying each step of your move actions, giving a +1 if the step results in the picking up of an object. This means that in 3 steps, I can pick up the object with your above example, and get a +1 state of the play field, but I can also do this with 4-5-6-x steps, getting the same +1 state. If only a single object is reachable in the depth you are searching, your algorithm will likely select one of the random +1 states, giving you the above behaviour.



                This could be solved, by quantifying with a negative score, each of the moves the AI must make. Thus, getting the object in 3 moves, will result in a -2, but getting the object in 6 moves, will result in -5. This way, the AI will clearly know, that it is preferable to get the object in the least amount of moves, ie, 3.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Jan 8 at 0:28









                Adam BaranyaiAdam Baranyai

                1,44611031




                1,44611031






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53280469%2ftree-search-based-game-ai-how-to-avoid-ai-wandering-procrastination-with-sp%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    這個網誌中的熱門文章

                    Post-Redirect-Get with Spring WebFlux and Thymeleaf

                    Xamarin.form Move up view when keyboard appear

                    JBPM : POST request for execute process go wrong