Cluster analysis algorithm for identifying line clusters on a map

I have a reasonably large set of (r,g,b)-colored data points with (x,y)-coordinates that looks like this:



[Image: scatter plot of the colored (x, y) data points]



Before committing them to my database, I'd like to automatically identify all point clusters (most of which look like lines) and assign each colored point a category according to the cluster it belongs to.



According to the scikit-learn roadmap, I should be using either MeanShift or Gaussian mixture models, but I'd like to know whether there is a solution that also takes into account that nearby points with similar colors are more likely to belong to the same cluster.
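
A minimal sketch of one way to encode "nearby points with similar colors", assuming the data sits in NumPy arrays `xy` (shape (n, 2)) and `rgb` (shape (n, 3), channels 0 to 255): append the scaled color channels to the coordinates, so that the Euclidean distance between two points stays small only when they are close in both position and color, then pass the combined features to scikit-learn's `MeanShift`. The helper names `spatial_color_features` and `mean_shift_labels`, and the knobs `spatial_scale`, `color_weight` and `bandwidth`, are hypothetical and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import MeanShift


def spatial_color_features(xy, rgb, spatial_scale=1.0, color_weight=1.0):
    """Concatenate scaled (x, y) and scaled (r, g, b) into one feature matrix.

    spatial_scale and color_weight are hypothetical tuning knobs. Distance in
    the combined space is sqrt((dxy / spatial_scale)**2 + (color_weight * drgb)**2),
    so two points must be close in BOTH position and color to be neighbours.
    """
    xy = np.asarray(xy, dtype=float) / spatial_scale
    # Map each channel from 0..255 to 0..color_weight
    rgb = np.asarray(rgb, dtype=float) / 255.0 * color_weight
    return np.hstack([xy, rgb])


def mean_shift_labels(features, bandwidth):
    """Cluster the combined features with scikit-learn's MeanShift (fixed bandwidth)."""
    return MeanShift(bandwidth=bandwidth).fit_predict(features)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two groups that overlap in space but differ in color (red vs. blue)
    xy = rng.normal(0.0, 0.05, size=(40, 2))
    rgb = np.array([[255, 0, 0]] * 20 + [[0, 0, 255]] * 20)
    with_color = mean_shift_labels(spatial_color_features(xy, rgb), bandwidth=1.0)
    no_color = mean_shift_labels(spatial_color_features(xy, rgb, color_weight=0.0), bandwidth=1.0)
    print("clusters with color:", len(set(with_color)))
    print("clusters without color:", len(set(no_color)))
```

On this toy data the red and blue groups occupy the same spot, so they should only separate when color contributes to the distance. The same feature matrix could equally be handed to `sklearn.mixture.GaussianMixture` or any other scikit-learn clusterer.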



I have access to a GPU, so any kind of solution is welcome, even one based on deep learning.





I tried @mcdowella's answer and it worked surprisingly well. I ran it on the higher-dimensional version of these points (which were generated through t-SNE) using the HDBSCAN Robust Single Linkage implementation, and it approximated many of the lines without any parameter tuning.



[Image: line clusters found by HDBSCAN's Robust Single Linkage]
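
The hdbscan package exposes this algorithm as `hdbscan.RobustSingleLinkage`, with parameters named `cut`, `k`, `alpha` and `gamma`. The sketch below is not that implementation; it is a simplified, dense NumPy/SciPy version of the same idea, meant only to show what those parameters do. Each point gets a core distance (distance to its `k`-th nearest neighbor), every pairwise distance is replaced by `max(core(a), core(b), d(a, b) / alpha)`, ordinary single linkage is cut at height `cut`, and clusters with fewer than `gamma` points are marked as noise (-1). The function name is hypothetical, and building the full distance matrix needs O(n²) memory, so it only suits modest samples.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def robust_single_linkage_labels(X, cut, k=5, alpha=np.sqrt(2), gamma=5):
    """Simplified robust single linkage on a dense distance matrix (hypothetical helper).

    Returns one label per row of X; -1 marks points in clusters smaller than gamma.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = squareform(pdist(X))  # full (n, n) Euclidean distance matrix
    # Core distance: distance to the k-th nearest neighbour (column 0 is the point itself)
    core = np.sort(dist, axis=1)[:, min(k, n - 1)]
    # Robust distance: max(core(a), core(b), d(a, b) / alpha)
    robust = np.maximum(dist / alpha, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(robust, 0.0)
    # Ordinary single linkage on the robust distances, cut at height `cut`
    Z = linkage(squareform(robust, checks=False), method="single")
    raw = fcluster(Z, t=cut, criterion="distance")
    # Relabel: clusters with at least gamma members get ids 0, 1, ...; the rest become -1
    labels = np.full(n, -1)
    next_id = 0
    for cluster_id, size in zip(*np.unique(raw, return_counts=True)):
        if size >= gamma:
            labels[raw == cluster_id] = next_id
            next_id += 1
    return labels


if __name__ == "__main__":
    xs = np.arange(0.0, 10.5, 0.5)  # 21 evenly spaced x positions
    line_a = np.column_stack([xs, np.zeros_like(xs)])
    line_b = np.column_stack([xs, np.full_like(xs, 10.0)])
    outlier = np.array([[5.0, 40.0]])
    X = np.vstack([line_a, line_b, outlier])
    print(robust_single_linkage_labels(X, cut=3.0))
```

With two parallel 21-point lines and one distant point, each line should come out as a single cluster even though it is much longer than `cut` (single linkage chains neighboring points), while the isolated point is labeled -1.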

python algorithm machine-learning scikit-learn deep-learning

edited Nov 10 at 20:30

asked Nov 10 at 17:56

Ruan

I don't think this is the right place to ask these kinds of questions. Maybe the Statistics Stack Exchange would be more appropriate?
– Mitchel Paulin
Nov 10 at 18:01

1 Answer

I would try single-linkage clustering (https://en.wikipedia.org/wiki/Single-linkage_clustering). It tends to follow lines, which is sometimes even a disadvantage for people who want nice, compact, rounded clusters and get straggling spaghetti instead (there is a nice picture on p. 7 of https://www.stat.cmu.edu/~cshalizi/350/lectures/08/lecture-08.pdf).

        answered Nov 10 at 18:30

        mcdowella
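
The single-linkage suggestion above can be tried with SciPy's hierarchical clustering. A minimal sketch, assuming `xy`/`rgb` arrays like those described in the question: color channels are scaled by a hypothetical `color_weight` and appended to the coordinates, so points with very different colors stay apart even where their lines cross. `cut` is a hypothetical distance threshold (in map units once color is added); choosing it is the main tuning decision. The function name is also hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def single_linkage_labels(xy, rgb, cut, color_weight=1.0):
    """Single-linkage clustering on (x, y) plus scaled (r, g, b).

    color_weight is a hypothetical knob: 0 ignores color entirely, larger
    values make a color difference count for more distance.
    """
    xy = np.asarray(xy, dtype=float)
    rgb = np.asarray(rgb, dtype=float) / 255.0 * color_weight
    features = np.hstack([xy, rgb])
    # Agglomerative clustering where the distance between two clusters is
    # the distance between their closest pair of points
    Z = linkage(features, method="single", metric="euclidean")
    # Stop merging once the closest pair between two clusters is farther than `cut`
    return fcluster(Z, t=cut, criterion="distance")


if __name__ == "__main__":
    t = np.arange(-5.0, 5.5, 0.5)  # 21 samples per line
    red_line = np.column_stack([t, np.zeros_like(t)])   # horizontal line
    blue_line = np.column_stack([np.zeros_like(t), t])  # vertical line crossing it at the origin
    xy = np.vstack([red_line, blue_line])
    rgb = np.array([[255, 0, 0]] * len(t) + [[0, 0, 255]] * len(t))
    print("with color:", len(set(single_linkage_labels(xy, rgb, cut=1.0))))
    print("without color:", len(set(single_linkage_labels(xy, rgb, cut=1.0, color_weight=0.0))))
```

Here a red horizontal line and a blue vertical line cross at the origin. With color included they should come out as two clusters, each following its own line; with `color_weight=0.0` single linkage chains straight through the crossing point and returns one cluster, which is the "straggling spaghetti" behavior the answer mentions.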