More efficiently compute transitive closures of each dependents while incrementally building the directed...












9















I need to answer the question: given a node in a dependency graph, group its dependents by their own transitive dependents which would be impacted by a particular start node.



In other words, given a node in a dependency graph, find the set of sets of direct dependents which transitively have common dependents that derive from that particular starting node.



e.g. given the pseudo code:



let a = 1
let b = 2
let c = a + b
let d = a + b
let e = a
let f = a + e
let g = c + d


You could compute this graph:



Graph diagram



If we used a as the start node we can see that of the dependents of a, both c and d have a dependent of g. And f has a dependent of e and a.



Notice that a has no impact on b at all, so it should not be taken into account when deciding how to group the dependents of a.



Using a as the start node, we'd want to get this grouped sets of dependents:



groups = {{c, d}, {e, f}}


c and d have direct or transitive downstream relations, and e and f together as well. But, for example, e and f have no dependent (downstream) relation at all with c or d either directly or indirectly (transitively). And b doesn't derive from a directly or indirectly, so it should not have any impact on deciding our grouping.



Also keep in mind that this graph is small for simplicity. It could be that transitive dependents happen much further down the subgraph than this example happens to have.





I did a bunch of paper research and indeed there are numerous solutions, however they don't have the performance characteristics I'm looking for. The graph is incrementally created over time, and at each stage I need to be able to answer this question so traversing the entire graph each time is a dealbreaker.



I think I have a major advantage that isn't referenced in the various approaches I could find: I have full control over the creation of the graph and the dependents are added in reverse topological order, so the graph is correctly sorted. With that in mind, I considered the obvious solution of computing the answer incrementally (dynamic programming).



I figured a bitmask would be a fast way to store and look up what dependents a given node has. When a dependent is added to a node, I would update that node's mask to include the bits of that dependent (which itself includes its dependents and so on)



let maskCounter = 0;

class Node {
constructor(name) {
this.name = name;
this.dependents = ;
this.mask = 1 << maskCounter;
maskCounter++;
}

addDependent(dependent) {
// Now our mask contains the bits representing all of
// its direct and transitive dependents
this.mask = this.mask | dependent.mask;

// Need to see if this dependent has a transitive
// dependent of its own that exists in one of the groups
for (const group of this.dependents) {
const result = group.mask & dependent.mask;

if (result) {
group.mask |= dependent.mask;
group.values.push(dependent);
return;
}
}

// If reached, this dependent has no transitive dependents
// of its own with any of this node's other dependents.
// That's confusing, huh?
this.dependents.push({
mask: dependent.mask,
values: [dependent]
});
}
}


The dependents would, however, need to be added in reverse order up the graph so that the graph sorted correctly and the top of the graph contains the masks of all its dependents.



const a = new Node('a');
const b = new Node('b');
const c = new Node('c');
const d = new Node('d');
const e = new Node('e');
const f = new Node('f');
const g = new Node('g');

b.addDependent(c);
b.addDependent(d);
c.addDependent(g);
d.addDependent(g);
e.addDependent(f);
a.addDependent(c);
a.addDependent(d);
a.addDependent(e);
a.addDependent(f);


The bitmasks would incrementally look like this:



b = b 00000010 | c 00000100
b = b 00000110 | d 00001000
c = c 00000100 | g 01000000
d = d 00001000 | g 01000000
e = e 00010000 | f 00100000
a = a 00000001 | c 01000100
a = a 01000101 | d 01001000
a = a 01001101 | e 00110000
a = a 01111101 | f 00100000
===========================
a = 01111101


At the end a has a mask of 01111101, each bit represents each of its downstream transitive dependents. Notice that the second to last bit is not flipped, that's the bit for b which does not depend on a at all.



If we look at the resulting value of a.dependents we see:



[
{ values: [c, d], mask: 0b00110000 },
{ values: [e, f], mask: 0b01001100 }
]


which provides the answer we're looking for, ultimately a set of sets. a.dependents.map(group => group.values)--this an array aka list but it's being used as a set for simplicity.



Here's a JSBin: https://jsbin.com/jexofip/edit?js,console



This works, and CPU-wise is acceptable because I'll very frequently need to know the grouped dependents but dependents change far less often.



The example above uses JavaScript for simplicity of demo, which uses 32-bit signed integers for bitwise operations so we can only create 31 unique nodes. We could use an arbitrary precision integer (e.g. BigInt) to create a "unlimited" number of nodes, but the problem is memory usage.



Because each node needs their own unique bit flipped, I think the memory usage is:



N * (N + 1) / 2 = bits      (where N = number of nodes)

e.g. 10,000 nodes is about 6.25 megabytes!


That's excluding any platform overhead for using arbitrary precision integers (or a similar custom data structure).



In my use case, it would be common to have 10k+ and in fact 100k+ is probable in some cases (625 MB !!!) and it's also possible for new nodes to be created indefinitely, using an infinite amount of memory, so this solution isn't practical because there is no easy way to "garbage collect" no longer used mask bits from nodes that drop off the graph--it's of course possible, but it's the traditional GC problem I'd like to avoid if possible.



Side note: depending on the size and depth of the graph, this might not actually perform well either. Even though bitwise operations themselves are relatively fast, doing it on on a BigInt of 100,000 bits for each node at the top of the graph is not. So completely rethinking my approach is also welcome.





Ultimately trading memory for CPU is the usual give and take, but I'm wondering if perhaps I'm missing another approach that strikes a better balance or requires significantly less memory?



There may be other unique considerations I haven't thought of that could be used.



School me!










share|improve this question

























  • This does not clearly define you want--output as a function of input. It's not obvious from your example & "group its dependents by their own transitive dependents" is not clear.

    – philipxy
    Nov 17 '18 at 4:56













  • @philipxy sorry about that. I edited it a bit to hopefully clarify. Given a node in a dependency graph, group its dependents by their own transitive dependents. E.g. here are the two groups in the example [[b, c], [e, f]. I've achieved that the with the example code using bitmasks, however my goal is to find an approach that doesn't use nearly as much memory and hopefully is similar (or better) in speed. Does that make sense?

    – jayphelps
    Nov 17 '18 at 5:00











  • @philipxy just realized I replied with the exact same phrase you mentioned isn't clear. I'm not sure how to reword, I'll give it some thought. Thanks!

    – jayphelps
    Nov 17 '18 at 5:01






  • 1





    "the dependents are added in reverse topological order" and "Also, the graph can also contain cycles" don't make sense together. Topological order only make sense in a directed acyclic graph. We need a graph theoretically-consistent specification of this problem before we can know how to answer it.

    – user2357112
    Nov 21 '18 at 23:36






  • 1





    How sparse or dense are you expecting this graph to be? How many nodes do you need to perform this dependency grouping on, and how frequently, relative to graph updates? Do you know the nodes of interest in advance?

    – user2357112
    Nov 21 '18 at 23:40
















9















I need to answer the question: given a node in a dependency graph, group its dependents by their own transitive dependents which would be impacted by a particular start node.



In other words, given a node in a dependency graph, find the set of sets of direct dependents which transitively have common dependents that derive from that particular starting node.



e.g. given the pseudo code:



let a = 1
let b = 2
let c = a + b
let d = a + b
let e = a
let f = a + e
let g = c + d


You could compute this graph:



Graph diagram



If we used a as the start node we can see that of the dependents of a, both c and d have a dependent of g. And f has a dependent of e and a.



Notice that a has no impact on b at all, so it should not be taken into account when deciding how to group the dependents of a.



Using a as the start node, we'd want to get this grouped sets of dependents:



groups = {{c, d}, {e, f}}


c and d have direct or transitive downstream relations, and e and f together as well. But, for example, e and f have no dependent (downstream) relation at all with c or d either directly or indirectly (transitively). And b doesn't derive from a directly or indirectly, so it should not have any impact on deciding our grouping.



Also keep in mind that this graph is small for simplicity. It could be that transitive dependents happen much further down the subgraph than this example happens to have.





I did a bunch of paper research and indeed there are numerous solutions, however they don't have the performance characteristics I'm looking for. The graph is incrementally created over time, and at each stage I need to be able to answer this question so traversing the entire graph each time is a dealbreaker.



I think I have a major advantage that isn't referenced in the various approaches I could find: I have full control over the creation of the graph and the dependents are added in reverse topological order, so the graph is correctly sorted. With that in mind, I considered the obvious solution of computing the answer incrementally (dynamic programming).



I figured a bitmask would be a fast way to store and look up what dependents a given node has. When a dependent is added to a node, I would update that node's mask to include the bits of that dependent (which itself includes its dependents and so on)



let maskCounter = 0;

class Node {
constructor(name) {
this.name = name;
this.dependents = ;
this.mask = 1 << maskCounter;
maskCounter++;
}

addDependent(dependent) {
// Now our mask contains the bits representing all of
// its direct and transitive dependents
this.mask = this.mask | dependent.mask;

// Need to see if this dependent has a transitive
// dependent of its own that exists in one of the groups
for (const group of this.dependents) {
const result = group.mask & dependent.mask;

if (result) {
group.mask |= dependent.mask;
group.values.push(dependent);
return;
}
}

// If reached, this dependent has no transitive dependents
// of its own with any of this node's other dependents.
// That's confusing, huh?
this.dependents.push({
mask: dependent.mask,
values: [dependent]
});
}
}


The dependents would, however, need to be added in reverse order up the graph so that the graph sorted correctly and the top of the graph contains the masks of all its dependents.



const a = new Node('a');
const b = new Node('b');
const c = new Node('c');
const d = new Node('d');
const e = new Node('e');
const f = new Node('f');
const g = new Node('g');

b.addDependent(c);
b.addDependent(d);
c.addDependent(g);
d.addDependent(g);
e.addDependent(f);
a.addDependent(c);
a.addDependent(d);
a.addDependent(e);
a.addDependent(f);


The bitmasks would incrementally look like this:



b = b 00000010 | c 00000100
b = b 00000110 | d 00001000
c = c 00000100 | g 01000000
d = d 00001000 | g 01000000
e = e 00010000 | f 00100000
a = a 00000001 | c 01000100
a = a 01000101 | d 01001000
a = a 01001101 | e 00110000
a = a 01111101 | f 00100000
===========================
a = 01111101


At the end a has a mask of 01111101, each bit represents each of its downstream transitive dependents. Notice that the second to last bit is not flipped, that's the bit for b which does not depend on a at all.



If we look at the resulting value of a.dependents we see:



[
{ values: [c, d], mask: 0b00110000 },
{ values: [e, f], mask: 0b01001100 }
]


which provides the answer we're looking for, ultimately a set of sets. a.dependents.map(group => group.values)--this an array aka list but it's being used as a set for simplicity.



Here's a JSBin: https://jsbin.com/jexofip/edit?js,console



This works, and CPU-wise is acceptable because I'll very frequently need to know the grouped dependents but dependents change far less often.



The example above uses JavaScript for simplicity of demo, which uses 32-bit signed integers for bitwise operations so we can only create 31 unique nodes. We could use an arbitrary precision integer (e.g. BigInt) to create a "unlimited" number of nodes, but the problem is memory usage.



Because each node needs their own unique bit flipped, I think the memory usage is:



N * (N + 1) / 2 = bits      (where N = number of nodes)

e.g. 10,000 nodes is about 6.25 megabytes!


That's excluding any platform overhead for using arbitrary precision integers (or a similar custom data structure).



In my use case, it would be common to have 10k+ and in fact 100k+ is probable in some cases (625 MB !!!) and it's also possible for new nodes to be created indefinitely, using an infinite amount of memory, so this solution isn't practical because there is no easy way to "garbage collect" no longer used mask bits from nodes that drop off the graph--it's of course possible, but it's the traditional GC problem I'd like to avoid if possible.



Side note: depending on the size and depth of the graph, this might not actually perform well either. Even though bitwise operations themselves are relatively fast, doing it on on a BigInt of 100,000 bits for each node at the top of the graph is not. So completely rethinking my approach is also welcome.





Ultimately trading memory for CPU is the usual give and take, but I'm wondering if perhaps I'm missing another approach that strikes a better balance or requires significantly less memory?



There may be other unique considerations I haven't thought of that could be used.



School me!










share|improve this question

























  • This does not clearly define you want--output as a function of input. It's not obvious from your example & "group its dependents by their own transitive dependents" is not clear.

    – philipxy
    Nov 17 '18 at 4:56













  • @philipxy sorry about that. I edited it a bit to hopefully clarify. Given a node in a dependency graph, group its dependents by their own transitive dependents. E.g. here are the two groups in the example [[b, c], [e, f]. I've achieved that the with the example code using bitmasks, however my goal is to find an approach that doesn't use nearly as much memory and hopefully is similar (or better) in speed. Does that make sense?

    – jayphelps
    Nov 17 '18 at 5:00











  • @philipxy just realized I replied with the exact same phrase you mentioned isn't clear. I'm not sure how to reword, I'll give it some thought. Thanks!

    – jayphelps
    Nov 17 '18 at 5:01






  • 1





    "the dependents are added in reverse topological order" and "Also, the graph can also contain cycles" don't make sense together. Topological order only make sense in a directed acyclic graph. We need a graph theoretically-consistent specification of this problem before we can know how to answer it.

    – user2357112
    Nov 21 '18 at 23:36






  • 1





    How sparse or dense are you expecting this graph to be? How many nodes do you need to perform this dependency grouping on, and how frequently, relative to graph updates? Do you know the nodes of interest in advance?

    – user2357112
    Nov 21 '18 at 23:40














9












9








9








I need to answer the question: given a node in a dependency graph, group its dependents by their own transitive dependents which would be impacted by a particular start node.



In other words, given a node in a dependency graph, find the set of sets of direct dependents which transitively have common dependents that derive from that particular starting node.



e.g. given the pseudo code:



let a = 1
let b = 2
let c = a + b
let d = a + b
let e = a
let f = a + e
let g = c + d


You could compute this graph:



Graph diagram



If we used a as the start node we can see that of the dependents of a, both c and d have a dependent of g. And f has a dependent of e and a.



Notice that a has no impact on b at all, so it should not be taken into account when deciding how to group the dependents of a.



Using a as the start node, we'd want to get this grouped sets of dependents:



groups = {{c, d}, {e, f}}


c and d have direct or transitive downstream relations, and e and f together as well. But, for example, e and f have no dependent (downstream) relation at all with c or d either directly or indirectly (transitively). And b doesn't derive from a directly or indirectly, so it should not have any impact on deciding our grouping.



Also keep in mind that this graph is small for simplicity. It could be that transitive dependents happen much further down the subgraph than this example happens to have.





I did a bunch of paper research and indeed there are numerous solutions, however they don't have the performance characteristics I'm looking for. The graph is incrementally created over time, and at each stage I need to be able to answer this question so traversing the entire graph each time is a dealbreaker.



I think I have a major advantage that isn't referenced in the various approaches I could find: I have full control over the creation of the graph and the dependents are added in reverse topological order, so the graph is correctly sorted. With that in mind, I considered the obvious solution of computing the answer incrementally (dynamic programming).



I figured a bitmask would be a fast way to store and look up what dependents a given node has. When a dependent is added to a node, I would update that node's mask to include the bits of that dependent (which itself includes its dependents and so on)



let maskCounter = 0;

class Node {
constructor(name) {
this.name = name;
this.dependents = ;
this.mask = 1 << maskCounter;
maskCounter++;
}

addDependent(dependent) {
// Now our mask contains the bits representing all of
// its direct and transitive dependents
this.mask = this.mask | dependent.mask;

// Need to see if this dependent has a transitive
// dependent of its own that exists in one of the groups
for (const group of this.dependents) {
const result = group.mask & dependent.mask;

if (result) {
group.mask |= dependent.mask;
group.values.push(dependent);
return;
}
}

// If reached, this dependent has no transitive dependents
// of its own with any of this node's other dependents.
// That's confusing, huh?
this.dependents.push({
mask: dependent.mask,
values: [dependent]
});
}
}


The dependents would, however, need to be added in reverse order up the graph so that the graph sorted correctly and the top of the graph contains the masks of all its dependents.



const a = new Node('a');
const b = new Node('b');
const c = new Node('c');
const d = new Node('d');
const e = new Node('e');
const f = new Node('f');
const g = new Node('g');

b.addDependent(c);
b.addDependent(d);
c.addDependent(g);
d.addDependent(g);
e.addDependent(f);
a.addDependent(c);
a.addDependent(d);
a.addDependent(e);
a.addDependent(f);


The bitmasks would incrementally look like this:



b = b 00000010 | c 00000100
b = b 00000110 | d 00001000
c = c 00000100 | g 01000000
d = d 00001000 | g 01000000
e = e 00010000 | f 00100000
a = a 00000001 | c 01000100
a = a 01000101 | d 01001000
a = a 01001101 | e 00110000
a = a 01111101 | f 00100000
===========================
a = 01111101


At the end a has a mask of 01111101, each bit represents each of its downstream transitive dependents. Notice that the second to last bit is not flipped, that's the bit for b which does not depend on a at all.



If we look at the resulting value of a.dependents we see:



[
{ values: [c, d], mask: 0b00110000 },
{ values: [e, f], mask: 0b01001100 }
]


which provides the answer we're looking for, ultimately a set of sets. a.dependents.map(group => group.values)--this an array aka list but it's being used as a set for simplicity.



Here's a JSBin: https://jsbin.com/jexofip/edit?js,console



This works, and CPU-wise is acceptable because I'll very frequently need to know the grouped dependents but dependents change far less often.



The example above uses JavaScript for simplicity of demo, which uses 32-bit signed integers for bitwise operations so we can only create 31 unique nodes. We could use an arbitrary precision integer (e.g. BigInt) to create a "unlimited" number of nodes, but the problem is memory usage.



Because each node needs their own unique bit flipped, I think the memory usage is:



N * (N + 1) / 2 = bits      (where N = number of nodes)

e.g. 10,000 nodes is about 6.25 megabytes!


That's excluding any platform overhead for using arbitrary precision integers (or a similar custom data structure).



In my use case, it would be common to have 10k+ and in fact 100k+ is probable in some cases (625 MB !!!) and it's also possible for new nodes to be created indefinitely, using an infinite amount of memory, so this solution isn't practical because there is no easy way to "garbage collect" no longer used mask bits from nodes that drop off the graph--it's of course possible, but it's the traditional GC problem I'd like to avoid if possible.



Side note: depending on the size and depth of the graph, this might not actually perform well either. Even though bitwise operations themselves are relatively fast, doing it on on a BigInt of 100,000 bits for each node at the top of the graph is not. So completely rethinking my approach is also welcome.





Ultimately trading memory for CPU is the usual give and take, but I'm wondering if perhaps I'm missing another approach that strikes a better balance or requires significantly less memory?



There may be other unique considerations I haven't thought of that could be used.



School me!










share|improve this question
















I need to answer the question: given a node in a dependency graph, group its dependents by their own transitive dependents which would be impacted by a particular start node.



In other words, given a node in a dependency graph, find the set of sets of direct dependents which transitively have common dependents that derive from that particular starting node.



e.g. given the pseudo code:



let a = 1
let b = 2
let c = a + b
let d = a + b
let e = a
let f = a + e
let g = c + d


You could compute this graph:



Graph diagram



If we used a as the start node we can see that of the dependents of a, both c and d have a dependent of g. And f has a dependent of e and a.



Notice that a has no impact on b at all, so it should not be taken into account when deciding how to group the dependents of a.



Using a as the start node, we'd want to get this grouped sets of dependents:



groups = {{c, d}, {e, f}}


c and d have direct or transitive downstream relations, and e and f together as well. But, for example, e and f have no dependent (downstream) relation at all with c or d either directly or indirectly (transitively). And b doesn't derive from a directly or indirectly, so it should not have any impact on deciding our grouping.



Also keep in mind that this graph is small for simplicity. It could be that transitive dependents happen much further down the subgraph than this example happens to have.





I did a bunch of paper research and indeed there are numerous solutions, however they don't have the performance characteristics I'm looking for. The graph is incrementally created over time, and at each stage I need to be able to answer this question so traversing the entire graph each time is a dealbreaker.



I think I have a major advantage that isn't referenced in the various approaches I could find: I have full control over the creation of the graph and the dependents are added in reverse topological order, so the graph is correctly sorted. With that in mind, I considered the obvious solution of computing the answer incrementally (dynamic programming).



I figured a bitmask would be a fast way to store and look up what dependents a given node has. When a dependent is added to a node, I would update that node's mask to include the bits of that dependent (which itself includes its dependents and so on)



let maskCounter = 0;

class Node {
constructor(name) {
this.name = name;
this.dependents = ;
this.mask = 1 << maskCounter;
maskCounter++;
}

addDependent(dependent) {
// Now our mask contains the bits representing all of
// its direct and transitive dependents
this.mask = this.mask | dependent.mask;

// Need to see if this dependent has a transitive
// dependent of its own that exists in one of the groups
for (const group of this.dependents) {
const result = group.mask & dependent.mask;

if (result) {
group.mask |= dependent.mask;
group.values.push(dependent);
return;
}
}

// If reached, this dependent has no transitive dependents
// of its own with any of this node's other dependents.
// That's confusing, huh?
this.dependents.push({
mask: dependent.mask,
values: [dependent]
});
}
}


The dependents would, however, need to be added in reverse order up the graph so that the graph sorted correctly and the top of the graph contains the masks of all its dependents.



const a = new Node('a');
const b = new Node('b');
const c = new Node('c');
const d = new Node('d');
const e = new Node('e');
const f = new Node('f');
const g = new Node('g');

b.addDependent(c);
b.addDependent(d);
c.addDependent(g);
d.addDependent(g);
e.addDependent(f);
a.addDependent(c);
a.addDependent(d);
a.addDependent(e);
a.addDependent(f);


The bitmasks would incrementally look like this:



b = b 00000010 | c 00000100
b = b 00000110 | d 00001000
c = c 00000100 | g 01000000
d = d 00001000 | g 01000000
e = e 00010000 | f 00100000
a = a 00000001 | c 01000100
a = a 01000101 | d 01001000
a = a 01001101 | e 00110000
a = a 01111101 | f 00100000
===========================
a = 01111101


At the end a has a mask of 01111101, each bit represents each of its downstream transitive dependents. Notice that the second to last bit is not flipped, that's the bit for b which does not depend on a at all.



If we look at the resulting value of a.dependents we see:



[
{ values: [c, d], mask: 0b00110000 },
{ values: [e, f], mask: 0b01001100 }
]


which provides the answer we're looking for, ultimately a set of sets. a.dependents.map(group => group.values)--this an array aka list but it's being used as a set for simplicity.



Here's a JSBin: https://jsbin.com/jexofip/edit?js,console



This works, and CPU-wise is acceptable because I'll very frequently need to know the grouped dependents but dependents change far less often.



The example above uses JavaScript for simplicity of demo, which uses 32-bit signed integers for bitwise operations so we can only create 31 unique nodes. We could use an arbitrary precision integer (e.g. BigInt) to create a "unlimited" number of nodes, but the problem is memory usage.



Because each node needs their own unique bit flipped, I think the memory usage is:



N * (N + 1) / 2 = bits      (where N = number of nodes)

e.g. 10,000 nodes is about 6.25 megabytes!


That's excluding any platform overhead for using arbitrary precision integers (or a similar custom data structure).



In my use case, it would be common to have 10k+ and in fact 100k+ is probable in some cases (625 MB !!!) and it's also possible for new nodes to be created indefinitely, using an infinite amount of memory, so this solution isn't practical because there is no easy way to "garbage collect" no longer used mask bits from nodes that drop off the graph--it's of course possible, but it's the traditional GC problem I'd like to avoid if possible.



Side note: depending on the size and depth of the graph, this might not actually perform well either. Even though bitwise operations themselves are relatively fast, doing it on on a BigInt of 100,000 bits for each node at the top of the graph is not. So completely rethinking my approach is also welcome.





Ultimately trading memory for CPU is the usual give and take, but I'm wondering if perhaps I'm missing another approach that strikes a better balance or requires significantly less memory?



There may be other unique considerations I haven't thought of that could be used.



School me!







graph-theory dataflow directed-graph transitive-dependency transitive-closure






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 28 '18 at 1:59







jayphelps

















asked Nov 17 '18 at 4:28









jayphelpsjayphelps

10.8k22237




10.8k22237













  • This does not clearly define you want--output as a function of input. It's not obvious from your example & "group its dependents by their own transitive dependents" is not clear.

    – philipxy
    Nov 17 '18 at 4:56













  • @philipxy sorry about that. I edited it a bit to hopefully clarify. Given a node in a dependency graph, group its dependents by their own transitive dependents. E.g. here are the two groups in the example [[b, c], [e, f]. I've achieved that the with the example code using bitmasks, however my goal is to find an approach that doesn't use nearly as much memory and hopefully is similar (or better) in speed. Does that make sense?

    – jayphelps
    Nov 17 '18 at 5:00











  • @philipxy just realized I replied with the exact same phrase you mentioned isn't clear. I'm not sure how to reword, I'll give it some thought. Thanks!

    – jayphelps
    Nov 17 '18 at 5:01






  • 1





    "the dependents are added in reverse topological order" and "Also, the graph can also contain cycles" don't make sense together. Topological order only make sense in a directed acyclic graph. We need a graph theoretically-consistent specification of this problem before we can know how to answer it.

    – user2357112
    Nov 21 '18 at 23:36






  • 1





    How sparse or dense are you expecting this graph to be? How many nodes do you need to perform this dependency grouping on, and how frequently, relative to graph updates? Do you know the nodes of interest in advance?

    – user2357112
    Nov 21 '18 at 23:40



















  • This does not clearly define you want--output as a function of input. It's not obvious from your example & "group its dependents by their own transitive dependents" is not clear.

    – philipxy
    Nov 17 '18 at 4:56













  • @philipxy sorry about that. I edited it a bit to hopefully clarify. Given a node in a dependency graph, group its dependents by their own transitive dependents. E.g. here are the two groups in the example [[b, c], [e, f]. I've achieved that the with the example code using bitmasks, however my goal is to find an approach that doesn't use nearly as much memory and hopefully is similar (or better) in speed. Does that make sense?

    – jayphelps
    Nov 17 '18 at 5:00











  • @philipxy just realized I replied with the exact same phrase you mentioned isn't clear. I'm not sure how to reword, I'll give it some thought. Thanks!

    – jayphelps
    Nov 17 '18 at 5:01






  • 1





    "the dependents are added in reverse topological order" and "Also, the graph can also contain cycles" don't make sense together. Topological order only make sense in a directed acyclic graph. We need a graph theoretically-consistent specification of this problem before we can know how to answer it.

    – user2357112
    Nov 21 '18 at 23:36






  • 1





    How sparse or dense are you expecting this graph to be? How many nodes do you need to perform this dependency grouping on, and how frequently, relative to graph updates? Do you know the nodes of interest in advance?

    – user2357112
    Nov 21 '18 at 23:40

















This does not clearly define you want--output as a function of input. It's not obvious from your example & "group its dependents by their own transitive dependents" is not clear.

– philipxy
Nov 17 '18 at 4:56







This does not clearly define you want--output as a function of input. It's not obvious from your example & "group its dependents by their own transitive dependents" is not clear.

– philipxy
Nov 17 '18 at 4:56















@philipxy sorry about that. I edited it a bit to hopefully clarify. Given a node in a dependency graph, group its dependents by their own transitive dependents. E.g. here are the two groups in the example [[b, c], [e, f]. I've achieved that the with the example code using bitmasks, however my goal is to find an approach that doesn't use nearly as much memory and hopefully is similar (or better) in speed. Does that make sense?

– jayphelps
Nov 17 '18 at 5:00





@philipxy sorry about that. I edited it a bit to hopefully clarify. Given a node in a dependency graph, group its dependents by their own transitive dependents. E.g. here are the two groups in the example [[b, c], [e, f]. I've achieved that the with the example code using bitmasks, however my goal is to find an approach that doesn't use nearly as much memory and hopefully is similar (or better) in speed. Does that make sense?

– jayphelps
Nov 17 '18 at 5:00













@philipxy just realized I replied with the exact same phrase you mentioned isn't clear. I'm not sure how to reword, I'll give it some thought. Thanks!

– jayphelps
Nov 17 '18 at 5:01





@philipxy just realized I replied with the exact same phrase you mentioned isn't clear. I'm not sure how to reword, I'll give it some thought. Thanks!

– jayphelps
Nov 17 '18 at 5:01




1




1





"the dependents are added in reverse topological order" and "Also, the graph can also contain cycles" don't make sense together. Topological order only make sense in a directed acyclic graph. We need a graph theoretically-consistent specification of this problem before we can know how to answer it.

– user2357112
Nov 21 '18 at 23:36





"the dependents are added in reverse topological order" and "Also, the graph can also contain cycles" don't make sense together. Topological order only make sense in a directed acyclic graph. We need a graph theoretically-consistent specification of this problem before we can know how to answer it.

– user2357112
Nov 21 '18 at 23:36




1




1





How sparse or dense are you expecting this graph to be? How many nodes do you need to perform this dependency grouping on, and how frequently, relative to graph updates? Do you know the nodes of interest in advance?

– user2357112
Nov 21 '18 at 23:40





How sparse or dense are you expecting this graph to be? How many nodes do you need to perform this dependency grouping on, and how frequently, relative to graph updates? Do you know the nodes of interest in advance?

– user2357112
Nov 21 '18 at 23:40












3 Answers
3






active

oldest

votes


















1





+200









Storing the 'reachable' nodes for each node as a bitmask and doing a bitwise AND certainly sounds hard to beat computationally. If the main issue with this is the high memory usage then perhaps this could be seen as a memory compression issue.



If the bitmasks are very sparse (lots of zeros) there is a chance they will compress down to a much smaller size.



I image you'll want to find a compression library that could de-compress the bitmasks as a stream. That way you could do the bitwise AND as it decompresses - allowing you to avoid storing the fully decompresses bitmask.






share|improve this answer



















  • 1





    I originally wanted to avoid compression because I naively thought that it would be slower (decompression usually is) but I discovered Roaring Bitmaps roaringbitmap.org and did some profiling and discover it is in fact faster in many cases where I have an extremely large graph. Presumably because a very large BigInt requires more memory alloc and checking the intersection between two of them has to check every bit and they're often very sparse, whereas Roaring Bitmaps doesn't (complex implementation, I recommend reading about it). Very cool! Thank you!

    – jayphelps
    Nov 29 '18 at 0:07





















3





+50









The relation you want to group by is not an equivalence relation. For example, consider this dependency graph:



a->bcd, bc->e, cd->f



Here, b and c have a common dependent, and so do c and d, but there are no common dependent between b and d. In this case, you probably want to have b, c and d in the same group. However, it gets trickier with this case:



a->bd, bc->e, cd->f



Here, a doesn't depend on c, so you may want to have b and d in separate groups, now that you don't need to care about c. However, there is a class of algorithms which will group b and d together in this case: algorithms which maintain grouping of all nodes, and use this as a basis for grouping new node's direct descendants.



One such algorithm uses a disjoint-set structure to efficiently track which nodes are connected. In my example, before processing a, the algorithm will have nodes b, c, d, e, and f all in the same set, so they will be grouped together.



Here is an implementation:






function find(node) {
return node.parent == null ? node : (node.parent = find(node.parent));
}

function merge(a, b) {
a = find(a);
b = find(b);
if (a.rank < b.rank) {
a.parent = b;
} else {
b.parent = a;
if (a.rank == b.rank) {
++a.rank;
}
}
}

class Node {
constructor(name, dependents) {
this.name = name;
this.parent = null;
this.rank = 0;
let depMap = new Map();
for (let d of dependents) {
let dset = find(d);
if (!depMap.has(dset)) {
depMap.set(dset, );
}
depMap.get(dset).push(d);
}
output += name + ': ' + Array.from(depMap.values()).map(a => '{' + a.join(', ') + '}').join(', ') + 'n';
for (let d of depMap.keys()) {
// or: for (let d of dependents) {
merge(this, d);
}
}

toString() {
return this.name;
}
}

let output = '';
const f = new Node('f', );
const e = new Node('e', [f]);
const d = new Node('d', );
const c = new Node('c', [d]);
const b = new Node('b', [d]);
const a = new Node('a', [b, c, e, f]);
document.getElementById('output').textContent = output;

<pre id=output></pre>








share|improve this answer


























  • Thanks for the thorough answer! I want to clarify a few things to make sure we're on the same page: In the first example, my code would produce [[b, c, d]] all grouped together, which is correct for my use case. In the second example my code would produce: [[b], [d]] which is also correct because a change to a has no impact on c and so b and d have no transitive relation in regards to a--they do have a transitive relation in general via c as you mention, but that relation isn't applicable in this use case.

    – jayphelps
    Nov 19 '18 at 18:32













  • I think that's a very interesting point that I failed to note in my question, that it's not just any transitive relation, it's only those which would be impacted by a. Not sure of the term for that. So with this in mind, can you clarify your comment "This, however, means that algorithms that maintain grouping of all vertices processed so far won't work"?

    – jayphelps
    Nov 19 '18 at 18:35











  • Of course correct me if I'm wrong or missing your point!

    – jayphelps
    Nov 19 '18 at 18:36






  • 1





    @jayphelps I edited my answer to make it more clear. My algorithm doesn't always produce the exact result that you need (sometimes it puts nodes in the same group when they should be in different groups), but it is much more efficient than your bitmask-based algorithm, so this may be an acceptable trade-off.

    – abacabadabacaba
    Nov 19 '18 at 19:14











  • Gotcha. Thanks again! Unfortunately I believe it is not an acceptable tradeoff for my use case.

    – jayphelps
    Nov 19 '18 at 22:16



















1














If it's a directed acyclic graph, you can perform topological sorting of the nodes, and that seems like a good basis for subsequent steps. Toposorting itself can be done efficiently. There are implementations in FRP-inspired libraries such as my crosslink or paldepind's flyd



Also, check out this answer.






share|improve this answer
























  • Thanks for the link! I may be misreading but it looks like the linked answer is effectively what I’m doing already, is it not? I’m not sorting because the graph is being built up incrementally already in correct topological order.

    – jayphelps
    Nov 17 '18 at 6:49













  • ... looks like this referred approach can handle circularities too cs.hut.fi/~enu/thesis.html, based on Tarjan's algorithm for strong component detection en.wikipedia.org/wiki/…

    – Robert Monfera
    Nov 17 '18 at 6:49











  • @jayphelps I'm not sure, take a look at the time and space complexity in the above link with Tarján's; of course O(whatever) doesn't make constant factors unimportant

    – Robert Monfera
    Nov 17 '18 at 6:53











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53348217%2fmore-efficiently-compute-transitive-closures-of-each-dependents-while-incrementa%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes









1





+200









Storing the 'reachable' nodes for each node as a bitmask and doing a bitwise AND certainly sounds hard to beat computationally. If the main issue with this is the high memory usage then perhaps this could be seen as a memory compression issue.



If the bitmasks are very sparse (lots of zeros) there is a chance they will compress down to a much smaller size.



I image you'll want to find a compression library that could de-compress the bitmasks as a stream. That way you could do the bitwise AND as it decompresses - allowing you to avoid storing the fully decompresses bitmask.






share|improve this answer



















  • 1





    I originally wanted to avoid compression because I naively thought that it would be slower (decompression usually is) but I discovered Roaring Bitmaps roaringbitmap.org and did some profiling and discover it is in fact faster in many cases where I have an extremely large graph. Presumably because a very large BigInt requires more memory alloc and checking the intersection between two of them has to check every bit and they're often very sparse, whereas Roaring Bitmaps doesn't (complex implementation, I recommend reading about it). Very cool! Thank you!

    – jayphelps
    Nov 29 '18 at 0:07


















1





+200









Storing the 'reachable' nodes for each node as a bitmask and doing a bitwise AND certainly sounds hard to beat computationally. If the main issue with this is the high memory usage then perhaps this could be seen as a memory compression issue.



If the bitmasks are very sparse (lots of zeros) there is a chance they will compress down to a much smaller size.



I image you'll want to find a compression library that could de-compress the bitmasks as a stream. That way you could do the bitwise AND as it decompresses - allowing you to avoid storing the fully decompresses bitmask.






share|improve this answer



















  • 1





    I originally wanted to avoid compression because I naively thought that it would be slower (decompression usually is) but I discovered Roaring Bitmaps roaringbitmap.org and did some profiling and discover it is in fact faster in many cases where I have an extremely large graph. Presumably because a very large BigInt requires more memory alloc and checking the intersection between two of them has to check every bit and they're often very sparse, whereas Roaring Bitmaps doesn't (complex implementation, I recommend reading about it). Very cool! Thank you!

    – jayphelps
    Nov 29 '18 at 0:07
















1





+200







1





+200



1




+200





Storing the 'reachable' nodes for each node as a bitmask and doing a bitwise AND certainly sounds hard to beat computationally. If the main issue with this is the high memory usage then perhaps this could be seen as a memory compression issue.



If the bitmasks are very sparse (lots of zeros) there is a chance they will compress down to a much smaller size.



I image you'll want to find a compression library that could de-compress the bitmasks as a stream. That way you could do the bitwise AND as it decompresses - allowing you to avoid storing the fully decompresses bitmask.






share|improve this answer













Storing the 'reachable' nodes for each node as a bitmask and doing a bitwise AND certainly sounds hard to beat computationally. If the main issue with this is the high memory usage then perhaps this could be seen as a memory compression issue.



If the bitmasks are very sparse (lots of zeros) there is a chance they will compress down to a much smaller size.



I image you'll want to find a compression library that could de-compress the bitmasks as a stream. That way you could do the bitwise AND as it decompresses - allowing you to avoid storing the fully decompresses bitmask.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 28 '18 at 9:01









AshleyAshley

543212




543212








  • 1





    I originally wanted to avoid compression because I naively thought that it would be slower (decompression usually is) but I discovered Roaring Bitmaps roaringbitmap.org and did some profiling and discover it is in fact faster in many cases where I have an extremely large graph. Presumably because a very large BigInt requires more memory alloc and checking the intersection between two of them has to check every bit and they're often very sparse, whereas Roaring Bitmaps doesn't (complex implementation, I recommend reading about it). Very cool! Thank you!

    – jayphelps
    Nov 29 '18 at 0:07
















  • 1





    I originally wanted to avoid compression because I naively thought that it would be slower (decompression usually is) but I discovered Roaring Bitmaps roaringbitmap.org and did some profiling and discover it is in fact faster in many cases where I have an extremely large graph. Presumably because a very large BigInt requires more memory alloc and checking the intersection between two of them has to check every bit and they're often very sparse, whereas Roaring Bitmaps doesn't (complex implementation, I recommend reading about it). Very cool! Thank you!

    – jayphelps
    Nov 29 '18 at 0:07










1




1





I originally wanted to avoid compression because I naively thought that it would be slower (decompression usually is) but I discovered Roaring Bitmaps roaringbitmap.org and did some profiling and discover it is in fact faster in many cases where I have an extremely large graph. Presumably because a very large BigInt requires more memory alloc and checking the intersection between two of them has to check every bit and they're often very sparse, whereas Roaring Bitmaps doesn't (complex implementation, I recommend reading about it). Very cool! Thank you!

– jayphelps
Nov 29 '18 at 0:07







I originally wanted to avoid compression because I naively thought that it would be slower (decompression usually is) but I discovered Roaring Bitmaps roaringbitmap.org and did some profiling and discover it is in fact faster in many cases where I have an extremely large graph. Presumably because a very large BigInt requires more memory alloc and checking the intersection between two of them has to check every bit and they're often very sparse, whereas Roaring Bitmaps doesn't (complex implementation, I recommend reading about it). Very cool! Thank you!

– jayphelps
Nov 29 '18 at 0:07















3





+50









The relation you want to group by is not an equivalence relation. For example, consider this dependency graph:



a->bcd, bc->e, cd->f



Here, b and c have a common dependent, and so do c and d, but there are no common dependent between b and d. In this case, you probably want to have b, c and d in the same group. However, it gets trickier with this case:



a->bd, bc->e, cd->f



Here, a doesn't depend on c, so you may want to have b and d in separate groups, now that you don't need to care about c. However, there is a class of algorithms which will group b and d together in this case: algorithms which maintain grouping of all nodes, and use this as a basis for grouping new node's direct descendants.



One such algorithm uses a disjoint-set structure to efficiently track which nodes are connected. In my example, before processing a, the algorithm will have nodes b, c, d, e, and f all in the same set, so they will be grouped together.



Here is an implementation:






function find(node) {
return node.parent == null ? node : (node.parent = find(node.parent));
}

function merge(a, b) {
a = find(a);
b = find(b);
if (a.rank < b.rank) {
a.parent = b;
} else {
b.parent = a;
if (a.rank == b.rank) {
++a.rank;
}
}
}

class Node {
constructor(name, dependents) {
this.name = name;
this.parent = null;
this.rank = 0;
let depMap = new Map();
for (let d of dependents) {
let dset = find(d);
if (!depMap.has(dset)) {
depMap.set(dset, );
}
depMap.get(dset).push(d);
}
output += name + ': ' + Array.from(depMap.values()).map(a => '{' + a.join(', ') + '}').join(', ') + 'n';
for (let d of depMap.keys()) {
// or: for (let d of dependents) {
merge(this, d);
}
}

toString() {
return this.name;
}
}

let output = '';
const f = new Node('f', );
const e = new Node('e', [f]);
const d = new Node('d', );
const c = new Node('c', [d]);
const b = new Node('b', [d]);
const a = new Node('a', [b, c, e, f]);
document.getElementById('output').textContent = output;

<pre id=output></pre>








share|improve this answer


























  • Thanks for the thorough answer! I want to clarify a few things to make sure we're on the same page: In the first example, my code would produce [[b, c, d]] all grouped together, which is correct for my use case. In the second example my code would produce: [[b], [d]] which is also correct because a change to a has no impact on c and so b and d have no transitive relation in regards to a--they do have a transitive relation in general via c as you mention, but that relation isn't applicable in this use case.

    – jayphelps
    Nov 19 '18 at 18:32













  • I think that's a very interesting point that I failed to note in my question, that it's not just any transitive relation, it's only those which would be impacted by a. Not sure of the term for that. So with this in mind, can you clarify your comment "This, however, means that algorithms that maintain grouping of all vertices processed so far won't work"?

    – jayphelps
    Nov 19 '18 at 18:35











  • Of course correct me if I'm wrong or missing your point!

    – jayphelps
    Nov 19 '18 at 18:36






  • 1





    @jayphelps I edited my answer to make it more clear. My algorithm doesn't always produce the exact result that you need (sometimes it puts nodes in the same group when they should be in different groups), but it is much more efficient than your bitmask-based algorithm, so this may be an acceptable trade-off.

    – abacabadabacaba
    Nov 19 '18 at 19:14











  • Gotcha. Thanks again! Unfortunately I believe it is not an acceptable tradeoff for my use case.

    – jayphelps
    Nov 19 '18 at 22:16
















3





+50









The relation you want to group by is not an equivalence relation. For example, consider this dependency graph:



a->bcd, bc->e, cd->f



Here, b and c have a common dependent, and so do c and d, but there are no common dependent between b and d. In this case, you probably want to have b, c and d in the same group. However, it gets trickier with this case:



a->bd, bc->e, cd->f



Here, a doesn't depend on c, so you may want to have b and d in separate groups, now that you don't need to care about c. However, there is a class of algorithms which will group b and d together in this case: algorithms which maintain grouping of all nodes, and use this as a basis for grouping new node's direct descendants.



One such algorithm uses a disjoint-set structure to efficiently track which nodes are connected. In my example, before processing a, the algorithm will have nodes b, c, d, e, and f all in the same set, so they will be grouped together.



Here is an implementation:






function find(node) {
return node.parent == null ? node : (node.parent = find(node.parent));
}

function merge(a, b) {
a = find(a);
b = find(b);
if (a.rank < b.rank) {
a.parent = b;
} else {
b.parent = a;
if (a.rank == b.rank) {
++a.rank;
}
}
}

class Node {
constructor(name, dependents) {
this.name = name;
this.parent = null;
this.rank = 0;
let depMap = new Map();
for (let d of dependents) {
let dset = find(d);
if (!depMap.has(dset)) {
depMap.set(dset, );
}
depMap.get(dset).push(d);
}
output += name + ': ' + Array.from(depMap.values()).map(a => '{' + a.join(', ') + '}').join(', ') + 'n';
for (let d of depMap.keys()) {
// or: for (let d of dependents) {
merge(this, d);
}
}

toString() {
return this.name;
}
}

let output = '';
const f = new Node('f', );
const e = new Node('e', [f]);
const d = new Node('d', );
const c = new Node('c', [d]);
const b = new Node('b', [d]);
const a = new Node('a', [b, c, e, f]);
document.getElementById('output').textContent = output;

<pre id=output></pre>








share|improve this answer


























  • Thanks for the thorough answer! I want to clarify a few things to make sure we're on the same page: In the first example, my code would produce [[b, c, d]] all grouped together, which is correct for my use case. In the second example my code would produce: [[b], [d]] which is also correct because a change to a has no impact on c and so b and d have no transitive relation in regards to a--they do have a transitive relation in general via c as you mention, but that relation isn't applicable in this use case.

    – jayphelps
    Nov 19 '18 at 18:32













  • I think that's a very interesting point that I failed to note in my question, that it's not just any transitive relation, it's only those which would be impacted by a. Not sure of the term for that. So with this in mind, can you clarify your comment "This, however, means that algorithms that maintain grouping of all vertices processed so far won't work"?

    – jayphelps
    Nov 19 '18 at 18:35











  • Of course correct me if I'm wrong or missing your point!

    – jayphelps
    Nov 19 '18 at 18:36






  • 1





    @jayphelps I edited my answer to make it more clear. My algorithm doesn't always produce the exact result that you need (sometimes it puts nodes in the same group when they should be in different groups), but it is much more efficient than your bitmask-based algorithm, so this may be an acceptable trade-off.

    – abacabadabacaba
    Nov 19 '18 at 19:14











  • Gotcha. Thanks again! Unfortunately I believe it is not an acceptable tradeoff for my use case.

    – jayphelps
    Nov 19 '18 at 22:16














3





+50







3





+50



3




+50





The relation you want to group by is not an equivalence relation. For example, consider this dependency graph:



a->bcd, bc->e, cd->f



Here, b and c have a common dependent, and so do c and d, but there are no common dependent between b and d. In this case, you probably want to have b, c and d in the same group. However, it gets trickier with this case:



a->bd, bc->e, cd->f



Here, a doesn't depend on c, so you may want to have b and d in separate groups, now that you don't need to care about c. However, there is a class of algorithms which will group b and d together in this case: algorithms which maintain grouping of all nodes, and use this as a basis for grouping new node's direct descendants.



One such algorithm uses a disjoint-set structure to efficiently track which nodes are connected. In my example, before processing a, the algorithm will have nodes b, c, d, e, and f all in the same set, so they will be grouped together.



Here is an implementation:






function find(node) {
return node.parent == null ? node : (node.parent = find(node.parent));
}

function merge(a, b) {
a = find(a);
b = find(b);
if (a.rank < b.rank) {
a.parent = b;
} else {
b.parent = a;
if (a.rank == b.rank) {
++a.rank;
}
}
}

class Node {
constructor(name, dependents) {
this.name = name;
this.parent = null;
this.rank = 0;
let depMap = new Map();
for (let d of dependents) {
let dset = find(d);
if (!depMap.has(dset)) {
depMap.set(dset, );
}
depMap.get(dset).push(d);
}
output += name + ': ' + Array.from(depMap.values()).map(a => '{' + a.join(', ') + '}').join(', ') + 'n';
for (let d of depMap.keys()) {
// or: for (let d of dependents) {
merge(this, d);
}
}

toString() {
return this.name;
}
}

let output = '';
const f = new Node('f', );
const e = new Node('e', [f]);
const d = new Node('d', );
const c = new Node('c', [d]);
const b = new Node('b', [d]);
const a = new Node('a', [b, c, e, f]);
document.getElementById('output').textContent = output;

<pre id=output></pre>








share|improve this answer















The relation you want to group by is not an equivalence relation. For example, consider this dependency graph:



a->bcd, bc->e, cd->f



Here, b and c have a common dependent, and so do c and d, but there are no common dependent between b and d. In this case, you probably want to have b, c and d in the same group. However, it gets trickier with this case:



a->bd, bc->e, cd->f



Here, a doesn't depend on c, so you may want to have b and d in separate groups, now that you don't need to care about c. However, there is a class of algorithms which will group b and d together in this case: algorithms which maintain grouping of all nodes, and use this as a basis for grouping new node's direct descendants.



One such algorithm uses a disjoint-set structure to efficiently track which nodes are connected. In my example, before processing a, the algorithm will have nodes b, c, d, e, and f all in the same set, so they will be grouped together.



Here is an implementation:






function find(node) {
return node.parent == null ? node : (node.parent = find(node.parent));
}

function merge(a, b) {
a = find(a);
b = find(b);
if (a.rank < b.rank) {
a.parent = b;
} else {
b.parent = a;
if (a.rank == b.rank) {
++a.rank;
}
}
}

class Node {
constructor(name, dependents) {
this.name = name;
this.parent = null;
this.rank = 0;
let depMap = new Map();
for (let d of dependents) {
let dset = find(d);
if (!depMap.has(dset)) {
depMap.set(dset, );
}
depMap.get(dset).push(d);
}
output += name + ': ' + Array.from(depMap.values()).map(a => '{' + a.join(', ') + '}').join(', ') + 'n';
for (let d of depMap.keys()) {
// or: for (let d of dependents) {
merge(this, d);
}
}

toString() {
return this.name;
}
}

let output = '';
const f = new Node('f', );
const e = new Node('e', [f]);
const d = new Node('d', );
const c = new Node('c', [d]);
const b = new Node('b', [d]);
const a = new Node('a', [b, c, e, f]);
document.getElementById('output').textContent = output;

<pre id=output></pre>








function find(node) {
return node.parent == null ? node : (node.parent = find(node.parent));
}

function merge(a, b) {
a = find(a);
b = find(b);
if (a.rank < b.rank) {
a.parent = b;
} else {
b.parent = a;
if (a.rank == b.rank) {
++a.rank;
}
}
}

class Node {
constructor(name, dependents) {
this.name = name;
this.parent = null;
this.rank = 0;
let depMap = new Map();
for (let d of dependents) {
let dset = find(d);
if (!depMap.has(dset)) {
depMap.set(dset, );
}
depMap.get(dset).push(d);
}
output += name + ': ' + Array.from(depMap.values()).map(a => '{' + a.join(', ') + '}').join(', ') + 'n';
for (let d of depMap.keys()) {
// or: for (let d of dependents) {
merge(this, d);
}
}

toString() {
return this.name;
}
}

let output = '';
const f = new Node('f', );
const e = new Node('e', [f]);
const d = new Node('d', );
const c = new Node('c', [d]);
const b = new Node('b', [d]);
const a = new Node('a', [b, c, e, f]);
document.getElementById('output').textContent = output;

<pre id=output></pre>





function find(node) {
return node.parent == null ? node : (node.parent = find(node.parent));
}

function merge(a, b) {
a = find(a);
b = find(b);
if (a.rank < b.rank) {
a.parent = b;
} else {
b.parent = a;
if (a.rank == b.rank) {
++a.rank;
}
}
}

class Node {
constructor(name, dependents) {
this.name = name;
this.parent = null;
this.rank = 0;
let depMap = new Map();
for (let d of dependents) {
let dset = find(d);
if (!depMap.has(dset)) {
depMap.set(dset, );
}
depMap.get(dset).push(d);
}
output += name + ': ' + Array.from(depMap.values()).map(a => '{' + a.join(', ') + '}').join(', ') + 'n';
for (let d of depMap.keys()) {
// or: for (let d of dependents) {
merge(this, d);
}
}

toString() {
return this.name;
}
}

let output = '';
const f = new Node('f', );
const e = new Node('e', [f]);
const d = new Node('d', );
const c = new Node('c', [d]);
const b = new Node('b', [d]);
const a = new Node('a', [b, c, e, f]);
document.getElementById('output').textContent = output;

<pre id=output></pre>






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 19 '18 at 19:04

























answered Nov 19 '18 at 17:58









abacabadabacabaabacabadabacaba

2,3641615




2,3641615













  • Thanks for the thorough answer! I want to clarify a few things to make sure we're on the same page: In the first example, my code would produce [[b, c, d]] all grouped together, which is correct for my use case. In the second example my code would produce: [[b], [d]] which is also correct because a change to a has no impact on c and so b and d have no transitive relation in regards to a--they do have a transitive relation in general via c as you mention, but that relation isn't applicable in this use case.

    – jayphelps
    Nov 19 '18 at 18:32













  • I think that's a very interesting point that I failed to note in my question, that it's not just any transitive relation, it's only those which would be impacted by a. Not sure of the term for that. So with this in mind, can you clarify your comment "This, however, means that algorithms that maintain grouping of all vertices processed so far won't work"?

    – jayphelps
    Nov 19 '18 at 18:35











  • Of course correct me if I'm wrong or missing your point!

    – jayphelps
    Nov 19 '18 at 18:36






  • 1





    @jayphelps I edited my answer to make it more clear. My algorithm doesn't always produce the exact result that you need (sometimes it puts nodes in the same group when they should be in different groups), but it is much more efficient than your bitmask-based algorithm, so this may be an acceptable trade-off.

    – abacabadabacaba
    Nov 19 '18 at 19:14











  • Gotcha. Thanks again! Unfortunately I believe it is not an acceptable tradeoff for my use case.

    – jayphelps
    Nov 19 '18 at 22:16



















  • Thanks for the thorough answer! I want to clarify a few things to make sure we're on the same page: In the first example, my code would produce [[b, c, d]] all grouped together, which is correct for my use case. In the second example my code would produce: [[b], [d]] which is also correct because a change to a has no impact on c and so b and d have no transitive relation in regards to a--they do have a transitive relation in general via c as you mention, but that relation isn't applicable in this use case.

    – jayphelps
    Nov 19 '18 at 18:32













  • I think that's a very interesting point that I failed to note in my question, that it's not just any transitive relation, it's only those which would be impacted by a. Not sure of the term for that. So with this in mind, can you clarify your comment "This, however, means that algorithms that maintain grouping of all vertices processed so far won't work"?

    – jayphelps
    Nov 19 '18 at 18:35











  • Of course correct me if I'm wrong or missing your point!

    – jayphelps
    Nov 19 '18 at 18:36






  • 1





    @jayphelps I edited my answer to make it more clear. My algorithm doesn't always produce the exact result that you need (sometimes it puts nodes in the same group when they should be in different groups), but it is much more efficient than your bitmask-based algorithm, so this may be an acceptable trade-off.

    – abacabadabacaba
    Nov 19 '18 at 19:14











  • Gotcha. Thanks again! Unfortunately I believe it is not an acceptable tradeoff for my use case.

    – jayphelps
    Nov 19 '18 at 22:16

















Thanks for the thorough answer! I want to clarify a few things to make sure we're on the same page: In the first example, my code would produce [[b, c, d]] all grouped together, which is correct for my use case. In the second example my code would produce: [[b], [d]] which is also correct because a change to a has no impact on c and so b and d have no transitive relation in regards to a--they do have a transitive relation in general via c as you mention, but that relation isn't applicable in this use case.

– jayphelps
Nov 19 '18 at 18:32







Thanks for the thorough answer! I want to clarify a few things to make sure we're on the same page: In the first example, my code would produce [[b, c, d]] all grouped together, which is correct for my use case. In the second example my code would produce: [[b], [d]] which is also correct because a change to a has no impact on c and so b and d have no transitive relation in regards to a--they do have a transitive relation in general via c as you mention, but that relation isn't applicable in this use case.

– jayphelps
Nov 19 '18 at 18:32















I think that's a very interesting point that I failed to note in my question, that it's not just any transitive relation, it's only those which would be impacted by a. Not sure of the term for that. So with this in mind, can you clarify your comment "This, however, means that algorithms that maintain grouping of all vertices processed so far won't work"?

– jayphelps
Nov 19 '18 at 18:35





I think that's a very interesting point that I failed to note in my question, that it's not just any transitive relation, it's only those which would be impacted by a. Not sure of the term for that. So with this in mind, can you clarify your comment "This, however, means that algorithms that maintain grouping of all vertices processed so far won't work"?

– jayphelps
Nov 19 '18 at 18:35













Of course correct me if I'm wrong or missing your point!

– jayphelps
Nov 19 '18 at 18:36





Of course correct me if I'm wrong or missing your point!

– jayphelps
Nov 19 '18 at 18:36




1




1





@jayphelps I edited my answer to make it more clear. My algorithm doesn't always produce the exact result that you need (sometimes it puts nodes in the same group when they should be in different groups), but it is much more efficient than your bitmask-based algorithm, so this may be an acceptable trade-off.

– abacabadabacaba
Nov 19 '18 at 19:14





@jayphelps I edited my answer to make it more clear. My algorithm doesn't always produce the exact result that you need (sometimes it puts nodes in the same group when they should be in different groups), but it is much more efficient than your bitmask-based algorithm, so this may be an acceptable trade-off.

– abacabadabacaba
Nov 19 '18 at 19:14













Gotcha. Thanks again! Unfortunately I believe it is not an acceptable tradeoff for my use case.

– jayphelps
Nov 19 '18 at 22:16





Gotcha. Thanks again! Unfortunately I believe it is not an acceptable tradeoff for my use case.

– jayphelps
Nov 19 '18 at 22:16











1














If it's a directed acyclic graph, you can perform topological sorting of the nodes, and that seems like a good basis for subsequent steps. Toposorting itself can be done efficiently. There are implementations in FRP-inspired libraries such as my crosslink or paldepind's flyd



Also, check out this answer.






share|improve this answer
























  • Thanks for the link! I may be misreading but it looks like the linked answer is effectively what I’m doing already, is it not? I’m not sorting because the graph is being built up incrementally already in correct topological order.

    – jayphelps
    Nov 17 '18 at 6:49













  • ... looks like this referred approach can handle circularities too cs.hut.fi/~enu/thesis.html, based on Tarjan's algorithm for strong component detection en.wikipedia.org/wiki/…

    – Robert Monfera
    Nov 17 '18 at 6:49











  • @jayphelps I'm not sure, take a look at the time and space complexity in the above link with Tarján's; of course O(whatever) doesn't make constant factors unimportant

    – Robert Monfera
    Nov 17 '18 at 6:53
















1














If it's a directed acyclic graph, you can perform topological sorting of the nodes, and that seems like a good basis for subsequent steps. Toposorting itself can be done efficiently. There are implementations in FRP-inspired libraries such as my crosslink or paldepind's flyd



Also, check out this answer.






share|improve this answer
























  • Thanks for the link! I may be misreading but it looks like the linked answer is effectively what I’m doing already, is it not? I’m not sorting because the graph is being built up incrementally already in correct topological order.

    – jayphelps
    Nov 17 '18 at 6:49













  • ... looks like this referred approach can handle circularities too cs.hut.fi/~enu/thesis.html, based on Tarjan's algorithm for strong component detection en.wikipedia.org/wiki/…

    – Robert Monfera
    Nov 17 '18 at 6:49











  • @jayphelps I'm not sure, take a look at the time and space complexity in the above link with Tarján's; of course O(whatever) doesn't make constant factors unimportant

    – Robert Monfera
    Nov 17 '18 at 6:53














1












1








1







If it's a directed acyclic graph, you can perform topological sorting of the nodes, and that seems like a good basis for subsequent steps. Toposorting itself can be done efficiently. There are implementations in FRP-inspired libraries such as my crosslink or paldepind's flyd



Also, check out this answer.






share|improve this answer













If it's a directed acyclic graph, you can perform topological sorting of the nodes, and that seems like a good basis for subsequent steps. Toposorting itself can be done efficiently. There are implementations in FRP-inspired libraries such as my crosslink or paldepind's flyd



Also, check out this answer.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 17 '18 at 6:37









Robert MonferaRobert Monfera

8261911




8261911













  • Thanks for the link! I may be misreading but it looks like the linked answer is effectively what I’m doing already, is it not? I’m not sorting because the graph is being built up incrementally already in correct topological order.

    – jayphelps
    Nov 17 '18 at 6:49













  • ... looks like this referred approach can handle circularities too cs.hut.fi/~enu/thesis.html, based on Tarjan's algorithm for strong component detection en.wikipedia.org/wiki/…

    – Robert Monfera
    Nov 17 '18 at 6:49











  • @jayphelps I'm not sure, take a look at the time and space complexity in the above link with Tarján's; of course O(whatever) doesn't make constant factors unimportant

    – Robert Monfera
    Nov 17 '18 at 6:53



















  • Thanks for the link! I may be misreading but it looks like the linked answer is effectively what I’m doing already, is it not? I’m not sorting because the graph is being built up incrementally already in correct topological order.

    – jayphelps
    Nov 17 '18 at 6:49













  • ... looks like this referred approach can handle circularities too cs.hut.fi/~enu/thesis.html, based on Tarjan's algorithm for strong component detection en.wikipedia.org/wiki/…

    – Robert Monfera
    Nov 17 '18 at 6:49











  • @jayphelps I'm not sure, take a look at the time and space complexity in the above link with Tarján's; of course O(whatever) doesn't make constant factors unimportant

    – Robert Monfera
    Nov 17 '18 at 6:53

















Thanks for the link! I may be misreading but it looks like the linked answer is effectively what I’m doing already, is it not? I’m not sorting because the graph is being built up incrementally already in correct topological order.

– jayphelps
Nov 17 '18 at 6:49







Thanks for the link! I may be misreading but it looks like the linked answer is effectively what I’m doing already, is it not? I’m not sorting because the graph is being built up incrementally already in correct topological order.

– jayphelps
Nov 17 '18 at 6:49















... looks like this referred approach can handle circularities too cs.hut.fi/~enu/thesis.html, based on Tarjan's algorithm for strong component detection en.wikipedia.org/wiki/…

– Robert Monfera
Nov 17 '18 at 6:49





... looks like this referred approach can handle circularities too cs.hut.fi/~enu/thesis.html, based on Tarjan's algorithm for strong component detection en.wikipedia.org/wiki/…

– Robert Monfera
Nov 17 '18 at 6:49













@jayphelps I'm not sure, take a look at the time and space complexity in the above link with Tarján's; of course O(whatever) doesn't make constant factors unimportant

– Robert Monfera
Nov 17 '18 at 6:53





@jayphelps I'm not sure, take a look at the time and space complexity in the above link with Tarján's; of course O(whatever) doesn't make constant factors unimportant

– Robert Monfera
Nov 17 '18 at 6:53


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53348217%2fmore-efficiently-compute-transitive-closures-of-each-dependents-while-incrementa%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







這個網誌中的熱門文章

Tangent Lines Diagram Along Smooth Curve

Yusuf al-Mu'taman ibn Hud

Zucchini