POSTGIS limiting a distance query by a threshold












1















I am looking to get pairwise distances between rows from two tables and only return the pairs that are within a threshold distance. The tables have large numbers of objects so performance is a concern.



I found an example of getting the closest distances at
PostGIS minimum distance between two large sets of points



The code there looks like



SELECT 
a.id, nn.id AS id_nn,
a.geom, nn.geom_closest,
ST_Distance_Sphere(a.geom, nn.geom_closest) AS min_dist
FROM
table_a AS a
CROSS JOIN LATERAL
(SELECT
b.id,
b.geom AS geom_closest
FROM table_b b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS nn;


I'm terrible with SQL and I understand that the LIMIT 1 is taking the closest when they are ordered.



How do I modify this to give all pairs less than a threshold? I tried using a WHERE clause to limit it to within a value



SELECT 
a.id, nn.id AS id_nn,
a.wkb_geometry, nn.geom_closest,
ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) AS min_dist
FROM
mammography21 AS a
CROSS JOIN LATERAL
(SELECT
b.gid as id,
b.wkb_geometry AS geom_closest
FROM cartographic_boundary_us_zcta_2016 b
ORDER BY a.wkb_geometry <-> b.wkb_geometry) AS nn
WHERE ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) <= 10.0;


but this gives the error:
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Even it had worked, I would guess it is an inefficient approach. How should I be trying to do this query?










share|improve this question























  • You can use ST_DWithin in the WHERE clause to limit the search to a certain distance.

    – thibautg
    Nov 22 '18 at 8:15
















1















I am looking to get pairwise distances between rows from two tables and only return the pairs that are within a threshold distance. The tables have large numbers of objects so performance is a concern.



I found an example of getting the closest distances at
PostGIS minimum distance between two large sets of points



The code there looks like



SELECT 
a.id, nn.id AS id_nn,
a.geom, nn.geom_closest,
ST_Distance_Sphere(a.geom, nn.geom_closest) AS min_dist
FROM
table_a AS a
CROSS JOIN LATERAL
(SELECT
b.id,
b.geom AS geom_closest
FROM table_b b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS nn;


I'm terrible with SQL and I understand that the LIMIT 1 is taking the closest when they are ordered.



How do I modify this to give all pairs less than a threshold? I tried using a WHERE clause to limit it to within a value



SELECT 
a.id, nn.id AS id_nn,
a.wkb_geometry, nn.geom_closest,
ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) AS min_dist
FROM
mammography21 AS a
CROSS JOIN LATERAL
(SELECT
b.gid as id,
b.wkb_geometry AS geom_closest
FROM cartographic_boundary_us_zcta_2016 b
ORDER BY a.wkb_geometry <-> b.wkb_geometry) AS nn
WHERE ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) <= 10.0;


but this gives the error:
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Even it had worked, I would guess it is an inefficient approach. How should I be trying to do this query?










share|improve this question























  • You can use ST_DWithin in the WHERE clause to limit the search to a certain distance.

    – thibautg
    Nov 22 '18 at 8:15














1












1








1








I am looking to get pairwise distances between rows from two tables and only return the pairs that are within a threshold distance. The tables have large numbers of objects so performance is a concern.



I found an example of getting the closest distances at
PostGIS minimum distance between two large sets of points



The code there looks like



SELECT 
a.id, nn.id AS id_nn,
a.geom, nn.geom_closest,
ST_Distance_Sphere(a.geom, nn.geom_closest) AS min_dist
FROM
table_a AS a
CROSS JOIN LATERAL
(SELECT
b.id,
b.geom AS geom_closest
FROM table_b b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS nn;


I'm terrible with SQL and I understand that the LIMIT 1 is taking the closest when they are ordered.



How do I modify this to give all pairs less than a threshold? I tried using a WHERE clause to limit it to within a value



SELECT 
a.id, nn.id AS id_nn,
a.wkb_geometry, nn.geom_closest,
ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) AS min_dist
FROM
mammography21 AS a
CROSS JOIN LATERAL
(SELECT
b.gid as id,
b.wkb_geometry AS geom_closest
FROM cartographic_boundary_us_zcta_2016 b
ORDER BY a.wkb_geometry <-> b.wkb_geometry) AS nn
WHERE ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) <= 10.0;


but this gives the error:
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Even it had worked, I would guess it is an inefficient approach. How should I be trying to do this query?










share|improve this question














I am looking to get pairwise distances between rows from two tables and only return the pairs that are within a threshold distance. The tables have large numbers of objects so performance is a concern.



I found an example of getting the closest distances at
PostGIS minimum distance between two large sets of points



The code there looks like



SELECT 
a.id, nn.id AS id_nn,
a.geom, nn.geom_closest,
ST_Distance_Sphere(a.geom, nn.geom_closest) AS min_dist
FROM
table_a AS a
CROSS JOIN LATERAL
(SELECT
b.id,
b.geom AS geom_closest
FROM table_b b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS nn;


I'm terrible with SQL and I understand that the LIMIT 1 is taking the closest when they are ordered.



How do I modify this to give all pairs less than a threshold? I tried using a WHERE clause to limit it to within a value



SELECT 
a.id, nn.id AS id_nn,
a.wkb_geometry, nn.geom_closest,
ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) AS min_dist
FROM
mammography21 AS a
CROSS JOIN LATERAL
(SELECT
b.gid as id,
b.wkb_geometry AS geom_closest
FROM cartographic_boundary_us_zcta_2016 b
ORDER BY a.wkb_geometry <-> b.wkb_geometry) AS nn
WHERE ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) <= 10.0;


but this gives the error:
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Even it had worked, I would guess it is an inefficient approach. How should I be trying to do this query?







postgis






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 22 '18 at 3:16









R RommelR Rommel

63




63













  • You can use ST_DWithin in the WHERE clause to limit the search to a certain distance.

    – thibautg
    Nov 22 '18 at 8:15



















  • You can use ST_DWithin in the WHERE clause to limit the search to a certain distance.

    – thibautg
    Nov 22 '18 at 8:15

















You can use ST_DWithin in the WHERE clause to limit the search to a certain distance.

– thibautg
Nov 22 '18 at 8:15





You can use ST_DWithin in the WHERE clause to limit the search to a certain distance.

– thibautg
Nov 22 '18 at 8:15












1 Answer
1






active

oldest

votes


















1














The ordering was used to get the closest point only. If you want more than one point, you don't really care about ordering the points by distance anymore.



Putting the distance validation is the right thing to do... but beware where you do it. Your query is taking too much time because for every point, it computes the distance to every other point (the lateral join) and then it filters the results to keep only the nearby ones.



Since you need more than just 1 point, the later join can be removed. As suggested by @thibautg, st_Dwithin is preferable as it makes use of the spatial index.



At last, you might want to cast your data to geography to compute the distance in meters (it depends on your data CRS). If it is the case, you would need a spatial index on the geographies too.



SELECT 
a.id, nn.id AS id_nn,
a.geom srcGeom, nn.geom nearGeom,
ST_DistanceSphere(a.geom, nn.geom) AS near_dist
FROM
mammography21 AS a,
cartographic_boundary_us_zcta_2016 nn
WHERE ST_DWithin(a.geom, nn.geom, 10);





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53423373%2fpostgis-limiting-a-distance-query-by-a-threshold%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    The ordering was used to get the closest point only. If you want more than one point, you don't really care about ordering the points by distance anymore.



    Putting the distance validation is the right thing to do... but beware where you do it. Your query is taking too much time because for every point, it computes the distance to every other point (the lateral join) and then it filters the results to keep only the nearby ones.



    Since you need more than just 1 point, the later join can be removed. As suggested by @thibautg, st_Dwithin is preferable as it makes use of the spatial index.



    At last, you might want to cast your data to geography to compute the distance in meters (it depends on your data CRS). If it is the case, you would need a spatial index on the geographies too.



    SELECT 
    a.id, nn.id AS id_nn,
    a.geom srcGeom, nn.geom nearGeom,
    ST_DistanceSphere(a.geom, nn.geom) AS near_dist
    FROM
    mammography21 AS a,
    cartographic_boundary_us_zcta_2016 nn
    WHERE ST_DWithin(a.geom, nn.geom, 10);





    share|improve this answer




























      1














      The ordering was used to get the closest point only. If you want more than one point, you don't really care about ordering the points by distance anymore.



      Putting the distance validation is the right thing to do... but beware where you do it. Your query is taking too much time because for every point, it computes the distance to every other point (the lateral join) and then it filters the results to keep only the nearby ones.



      Since you need more than just 1 point, the later join can be removed. As suggested by @thibautg, st_Dwithin is preferable as it makes use of the spatial index.



      At last, you might want to cast your data to geography to compute the distance in meters (it depends on your data CRS). If it is the case, you would need a spatial index on the geographies too.



      SELECT 
      a.id, nn.id AS id_nn,
      a.geom srcGeom, nn.geom nearGeom,
      ST_DistanceSphere(a.geom, nn.geom) AS near_dist
      FROM
      mammography21 AS a,
      cartographic_boundary_us_zcta_2016 nn
      WHERE ST_DWithin(a.geom, nn.geom, 10);





      share|improve this answer


























        1












        1








        1







        The ordering was used to get the closest point only. If you want more than one point, you don't really care about ordering the points by distance anymore.



        Putting the distance validation is the right thing to do... but beware where you do it. Your query is taking too much time because for every point, it computes the distance to every other point (the lateral join) and then it filters the results to keep only the nearby ones.



        Since you need more than just 1 point, the later join can be removed. As suggested by @thibautg, st_Dwithin is preferable as it makes use of the spatial index.



        At last, you might want to cast your data to geography to compute the distance in meters (it depends on your data CRS). If it is the case, you would need a spatial index on the geographies too.



        SELECT 
        a.id, nn.id AS id_nn,
        a.geom srcGeom, nn.geom nearGeom,
        ST_DistanceSphere(a.geom, nn.geom) AS near_dist
        FROM
        mammography21 AS a,
        cartographic_boundary_us_zcta_2016 nn
        WHERE ST_DWithin(a.geom, nn.geom, 10);





        share|improve this answer













        The ordering was used to get the closest point only. If you want more than one point, you don't really care about ordering the points by distance anymore.



        Putting the distance validation is the right thing to do... but beware where you do it. Your query is taking too much time because for every point, it computes the distance to every other point (the lateral join) and then it filters the results to keep only the nearby ones.



        Since you need more than just 1 point, the later join can be removed. As suggested by @thibautg, st_Dwithin is preferable as it makes use of the spatial index.



        At last, you might want to cast your data to geography to compute the distance in meters (it depends on your data CRS). If it is the case, you would need a spatial index on the geographies too.



        SELECT 
        a.id, nn.id AS id_nn,
        a.geom srcGeom, nn.geom nearGeom,
        ST_DistanceSphere(a.geom, nn.geom) AS near_dist
        FROM
        mammography21 AS a,
        cartographic_boundary_us_zcta_2016 nn
        WHERE ST_DWithin(a.geom, nn.geom, 10);






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 22 '18 at 14:29









        JGHJGH

        3,55441126




        3,55441126
































            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53423373%2fpostgis-limiting-a-distance-query-by-a-threshold%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            這個網誌中的熱門文章

            Tangent Lines Diagram Along Smooth Curve

            Yusuf al-Mu'taman ibn Hud

            Zucchini