Recommended Algorithim for time based clustering












4














I am not very knowledgeable on time based clustering and wondering if any algorithms are well suited for my use case.



I have a set of exertion data (range from 0-500) and I want to cluster them along time intervals.



My problem is that I want to find point the points of time where there is major exertion differences on the time interval. I will know exactly how many grouping their should be (e.g. 5 separate clusters) but wont know where one ends and the next one starts.



Is there a good algorithm to apply in this case? I was looking at K-Means but it appears to be very good at clustering disregarding the time and I am more looking for the boundaries looking at exertion data.










share|improve this question





























    4














    I am not very knowledgeable on time based clustering and wondering if any algorithms are well suited for my use case.



    I have a set of exertion data (range from 0-500) and I want to cluster them along time intervals.



    My problem is that I want to find point the points of time where there is major exertion differences on the time interval. I will know exactly how many grouping their should be (e.g. 5 separate clusters) but wont know where one ends and the next one starts.



    Is there a good algorithm to apply in this case? I was looking at K-Means but it appears to be very good at clustering disregarding the time and I am more looking for the boundaries looking at exertion data.










    share|improve this question



























      4












      4








      4







      I am not very knowledgeable on time based clustering and wondering if any algorithms are well suited for my use case.



      I have a set of exertion data (range from 0-500) and I want to cluster them along time intervals.



      My problem is that I want to find point the points of time where there is major exertion differences on the time interval. I will know exactly how many grouping their should be (e.g. 5 separate clusters) but wont know where one ends and the next one starts.



      Is there a good algorithm to apply in this case? I was looking at K-Means but it appears to be very good at clustering disregarding the time and I am more looking for the boundaries looking at exertion data.










      share|improve this question















      I am not very knowledgeable on time based clustering and wondering if any algorithms are well suited for my use case.



      I have a set of exertion data (range from 0-500) and I want to cluster them along time intervals.



      My problem is that I want to find point the points of time where there is major exertion differences on the time interval. I will know exactly how many grouping their should be (e.g. 5 separate clusters) but wont know where one ends and the next one starts.



      Is there a good algorithm to apply in this case? I was looking at K-Means but it appears to be very good at clustering disregarding the time and I am more looking for the boundaries looking at exertion data.







      algorithm k-means data-science hierarchical-clustering






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 12 at 5:39









      Kromster

      4,77164886




      4,77164886










      asked Nov 11 at 22:25









      mornindew

      5363721




      5363721
























          1 Answer
          1






          active

          oldest

          votes


















          0














          I think you could get good results from a dynamic program. For each interval [i, j), let C(i, j) be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r) be the minimum loss for up to k clusters of elements [0, r), we have equations



          L(1, r) = C(0, r)
          L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).


          If there are O(1) values of k needed, evaluating these equations with memoization takes O(n^2) time and O(n) space where n is the number of samples.



          A plausible first choice for C(i, j) would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3) time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s from its greatest value to its least, so the overall algorithm would still be O(n^2).






          share|improve this answer





















            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53253857%2frecommended-algorithim-for-time-based-clustering%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            I think you could get good results from a dynamic program. For each interval [i, j), let C(i, j) be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r) be the minimum loss for up to k clusters of elements [0, r), we have equations



            L(1, r) = C(0, r)
            L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).


            If there are O(1) values of k needed, evaluating these equations with memoization takes O(n^2) time and O(n) space where n is the number of samples.



            A plausible first choice for C(i, j) would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3) time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s from its greatest value to its least, so the overall algorithm would still be O(n^2).






            share|improve this answer


























              0














              I think you could get good results from a dynamic program. For each interval [i, j), let C(i, j) be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r) be the minimum loss for up to k clusters of elements [0, r), we have equations



              L(1, r) = C(0, r)
              L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).


              If there are O(1) values of k needed, evaluating these equations with memoization takes O(n^2) time and O(n) space where n is the number of samples.



              A plausible first choice for C(i, j) would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3) time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s from its greatest value to its least, so the overall algorithm would still be O(n^2).






              share|improve this answer
























                0












                0








                0






                I think you could get good results from a dynamic program. For each interval [i, j), let C(i, j) be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r) be the minimum loss for up to k clusters of elements [0, r), we have equations



                L(1, r) = C(0, r)
                L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).


                If there are O(1) values of k needed, evaluating these equations with memoization takes O(n^2) time and O(n) space where n is the number of samples.



                A plausible first choice for C(i, j) would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3) time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s from its greatest value to its least, so the overall algorithm would still be O(n^2).






                share|improve this answer












                I think you could get good results from a dynamic program. For each interval [i, j), let C(i, j) be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r) be the minimum loss for up to k clusters of elements [0, r), we have equations



                L(1, r) = C(0, r)
                L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).


                If there are O(1) values of k needed, evaluating these equations with memoization takes O(n^2) time and O(n) space where n is the number of samples.



                A plausible first choice for C(i, j) would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3) time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s from its greatest value to its least, so the overall algorithm would still be O(n^2).







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 12 at 13:01









                David Eisenstat

                36.6k73373




                36.6k73373






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53253857%2frecommended-algorithim-for-time-based-clustering%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    這個網誌中的熱門文章

                    Xamarin.form Move up view when keyboard appear

                    Post-Redirect-Get with Spring WebFlux and Thymeleaf

                    Anylogic : not able to use stopDelay()