Combining new ArangoSearch views and graph traversals

I've read through the ArangoDB 3.4 docs and the ArangoSearch view tutorial, but I'm still unclear on whether and how views can be combined with graph traversals. There is an example of a graph/view join in the tutorial; however, all I need to do is filter the candidate pool resulting from a traversal with a view-based text search. For example:

"for i in 2..2 outbound start_doc edges1, inbound edges2 [filter by view] return i"

The initial 2-hop traversal from the "start_doc" vertex will result in a much smaller candidate pool than the entire collection. I then want to perform a text search on this candidate pool using a configured view (probably with the "text_en" analyzer).

Would I just define the view expression after the traversal? Or would I need to use a "union_distinct" function to combine the traversal and the search results? (This seems like it would be very inefficient given a potentially very large result set from the view.)

Thanks!

asked Nov 13 '18 at 19:21

Dale

698

You could try something like: LET pools = (FOR i IN 2..2 OUTBOUND startDoc edges1, INBOUND edges2 RETURN i) FOR pool IN pools FOR v IN view FILTER pool.something == v.something AND [additional filters] RETURN v

– camba1
Nov 15 '18 at 1:00

@camba1 - Yes, that's the general join pattern; however, after some testing I found this to be very slow. The view filtering is executed for each result in the "pools" result. I have tested using an intersection (not "union" as I said initially) and it is far more performant: "for doc in intersection(([graph traversal]),([search expression])) return doc". The cost seems to be roughly the sum of the costs of the two queries. I'm just not sure if this is the best way when the search expression could return very large result sets (with a large collection and a low-selectivity query).

– Dale
Nov 15 '18 at 15:44
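
For readers who want to try the intersection pattern described in the comment above, here is a minimal AQL sketch. It is not the query from the thread: the view name "myView", the attribute "text", and the bind parameters @start_doc and @phrase are assumptions made for illustration, and it assumes the view links the collections the traversal ends in. It intersects document IDs instead of whole documents, which is a design choice of this sketch, not something the thread confirms.

```aql
// Sketch only. Assumed names: myView, doc.text, @start_doc, @phrase.
// Candidate pool: IDs of the vertices exactly two hops from the start vertex.
LET candidates = (
  FOR v IN 2..2 OUTBOUND @start_doc edges1, INBOUND edges2
    RETURN DISTINCT v._id
)
// Text search over the whole view, independent of the traversal.
LET matches = (
  FOR doc IN myView
    SEARCH ANALYZER(PHRASE(doc.text, @phrase), "text_en")
    RETURN doc._id
)
// Keep only the documents that appear in both result sets.
FOR id IN INTERSECTION(candidates, matches)
  RETURN DOCUMENT(id)
```

Both subqueries run once each, which matches the "sum of the two costs" observation. The "matches" array still grows with the number of search hits, so the concern about low-selectivity searches remains.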

It should be more efficient to search for the text in the view, then start the traversal from the matches (but it may not be applicable to your use case). Edge indexes can be utilized for any starting vertices, but the ArangoSearch view inverted index can only be queried in its entirety. It can't benefit from previous filters or traversals that leave only a subset of documents remaining.

– CoDEmanX
Nov 19 '18 at 12:18
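
The search-first approach suggested above could look roughly like the following sketch. The view name "myView", the attribute "text", and the bind parameters @phrase and @start_id are assumptions, not names from the thread. Because the original traversal runs outward from the start vertex, this sketch reverses the edge directions and checks whether the start vertex is reachable from each search match.

```aql
// Sketch only. Assumed names: myView, doc.text, @phrase, @start_id.
FOR doc IN myView
  SEARCH ANALYZER(PHRASE(doc.text, @phrase), "text_en")
  // Walk the two hops backwards: the original OUTBOUND edges1 becomes
  // INBOUND here, and the original INBOUND edges2 becomes OUTBOUND.
  LET hits = (
    FOR v IN 2..2 INBOUND doc edges1, OUTBOUND edges2
      FILTER v._id == @start_id
      LIMIT 1
      RETURN 1
  )
  FILTER LENGTH(hits) > 0
  RETURN doc
```

This runs one traversal per search match, so it is likely to pay off only when the search is selective. That is the case the asker goes on to say does not hold for their data.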

@CoDEmanX - The issue in my case is that the text view result set may be very large and many (most) of those results would be invalid starting points for the subsequent traversal. I feel like there is no universally performant approach, since the total cost depends on two operations whose costs are unknown a priori. At this point I am considering indexing additional fields to de-normalize and flatten the structure of the model enough to bound the size of the text view search results.

– Dale
Nov 26 '18 at 14:20

Can you filter out invalid starting points perhaps? Or do you mean by "invalid" that the search returns a lot of documents which have no connected edges at all? That shouldn't be much of a problem. If you know of another system that supports a combined index for fulltext and graphs as you desire, let me know.

– CoDEmanX
Nov 27 '18 at 15:46

arangodb
