Combining new ArangoSearch views and graph traversals

I've read through the ArangoDB 3.4 docs and the ArangoSearch view tutorial, but I'm still unclear on whether and how views can be combined with graph traversals. The tutorial has an example of a graph/view join; however, what I need is simply to filter the candidate pool resulting from a traversal with a view-based text search. For example:



"for i in 2..2 outbound start_doc edges1, inbound edges2 [filter by view] return i"



The initial 2-hop traversal from the "start_doc" vertex will result in a much smaller candidate pool than the entire collection. I want to then perform a text search on this candidate pool using a configured view (probably "text_en" analyzer).



Would I just define the view expression after the traversal? Or would I need to use a "union_distinct" function to combine the traversal and the search results? (This seems like it would be very inefficient given a potentially very large result set from the view.)



Thanks!

  • You could try something like: LET pools = (FOR i IN 2..2 OUTBOUND startDoc edges1, INBOUND edges2 RETURN i) FOR pool IN pools FOR v IN view FILTER pool.something == v.something AND [additional filters] RETURN v

    – camba1
    Nov 15 '18 at 1:00

  • @camba1 - Yes, that's the general join pattern; however, after some testing I found this to be very slow. The view filtering is executed for each result in the "pools" result. I have tested using an intersection (not "union" as I said initially) and it is far more performant: "for doc in intersection(([graph traversal]),([search expression])) return doc" This seems to cost roughly the sum of the two queries. I'm just not sure if this is the best way when the search expression could return very large result sets (with a large collection and a low-selectivity query).

    – Dale
    Nov 15 '18 at 15:44
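
To illustrate the cost difference described in the comment above, here is a minimal Python sketch (not AQL, and not a description of how ArangoDB executes either query internally). It assumes the join pattern re-scans the view results once per traversal result, and it models the intersection as one hash-set build plus one probe per traversal result. The document keys and result sizes are made up for illustration.

```python
def nested_loop_join(pool, view_hits):
    """Join pattern: for every pool document, scan all view results."""
    comparisons = 0
    matches = []
    for p in pool:
        for v in view_hits:
            comparisons += 1
            if p == v:
                matches.append(p)
    return matches, comparisons


def hash_intersection(pool, view_hits):
    """Intersection pattern: build one hash set, then probe it once per pool document."""
    hit_ids = set(view_hits)                      # one pass over the view results
    matches = [p for p in pool if p in hit_ids]   # one pass over the pool
    return matches, len(view_hits) + len(pool)


# Hypothetical sizes: a small traversal pool and a large, low-selectivity search result.
pool = [f"docs/{i}" for i in range(0, 100)]
view_hits = [f"docs/{i}" for i in range(50, 10_050)]

joined, join_cost = nested_loop_join(pool, view_hits)
intersected, intersect_cost = hash_intersection(pool, view_hits)
```

With these assumed sizes the join performs 1,000,000 comparisons while the intersection touches 10,100 items, in line with the "roughly the sum of the two queries" observation. The intersection still has to materialize the full search result, which is the concern raised for low-selectivity queries.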

  • It should be more efficient to search for the text in the view, then start the traversal from the matches (but it may not be applicable to your use case). Edge indexes can be utilized for any starting vertices, but the ArangoSearch view's inverted index can only be queried in its entirety. It can't benefit from previous filters or traversals that leave only a subset of documents remaining.

    – CoDEmanX
    Nov 19 '18 at 12:18

  • @CoDEmanX - The issue in my case is that the text view result set may be very large, and many (most) of those results would be invalid starting points for the subsequent traversal. I feel like there is no universally performant approach, since the total cost is a function of two operations whose costs are unknown a priori. At this point I am considering indexing additional fields to de-normalize and flatten the structure of the model enough to bound the size of the text view search results.

    – Dale
    Nov 26 '18 at 14:20
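
The trade-off discussed in the last two comments can be sketched on a toy graph. This is a Python model under stated assumptions, not ArangoDB code: the edge list and texts are invented, a plain substring test stands in for the view's inverted index, a single directed edge set stands in for edges1/edges2, and the traversal is a simple level-by-level expansion.

```python
# Toy data (all hypothetical): directed edges and a text attribute per vertex.
EDGES = [("a", "b"), ("b", "c"), ("b", "d"), ("a", "e"), ("e", "d"),
         ("x", "y"), ("y", "z")]
TEXT = {"b": "graph database", "c": "graph search", "d": "plain text",
        "y": "graph theory", "z": "graph index"}


def build_adjacency(edges, reverse=False):
    """Map each vertex to the vertices one hop away (optionally against edge direction)."""
    adjacency = {}
    for source, target in edges:
        if reverse:
            source, target = target, source
        adjacency.setdefault(source, set()).add(target)
    return adjacency


def vertices_at_depth(adjacency, start, depth):
    """Return the vertices reachable from start in exactly `depth` hops."""
    frontier = {start}
    for _ in range(depth):
        frontier = {n for v in frontier for n in adjacency.get(v, ())}
    return frontier


def traverse_then_filter(start, term, depth=2):
    """Traverse first, then text-check only the candidate pool."""
    candidates = vertices_at_depth(build_adjacency(EDGES), start, depth)
    matches = {v for v in candidates if term in TEXT.get(v, "")}
    return matches, len(candidates)  # second value: documents text-checked


def search_then_traverse(start, term, depth=2):
    """Search first, then walk backwards from every hit to see whether it reaches start."""
    hits = {v for v, text in TEXT.items() if term in text}
    inbound = build_adjacency(EDGES, reverse=True)
    matches = {v for v in hits if start in vertices_at_depth(inbound, v, depth)}
    return matches, len(hits)  # second value: traversals started


pool_result, docs_checked = traverse_then_filter("a", "graph")
search_result, traversals_started = search_then_traverse("a", "graph")
```

Both strategies return the same vertex here, but the first text-checks 2 documents while the second starts 4 traversals, 3 of which are dead ends. Which side is cheaper depends on the selectivity of the search versus the fan-out of the traversal, which is the point made above about there being no universally performant approach.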

  • Can you filter out invalid starting points perhaps? Or do you mean by "invalid" that the search returns a lot of documents which have no connected edges at all? That shouldn't be much of a problem. If you know of another system that supports a combined index for fulltext and graphs as you desire, let me know.

    – CoDEmanX
    Nov 27 '18 at 15:46

arangodb

asked Nov 13 '18 at 19:21
Dale
