What is the Apache Beam way to handle 'routing'
Using Apache Beam I am doing computations, and if they succeed I'd like to write the output to one sink, while if there is a failure I'd like to write it to another sink.
Is there any way to handle metadata- or content-based routing in Apache Beam?
I've used Apache Camel extensively, so in my mind, based on the outcome of a previous transform, I should route a message to a different sink using a router (perhaps determined by a metadata flag I set on the message header). Is there an analogous capability in Apache Beam, or would I instead just have a sequential transform that inspects the PCollection and handles writing to the sinks within the transform?
Ideally I'd like this logic (written verbosely for attempted clarity):
result = my_pcollections | 'compute_stuff' >> beam.Map(my_compute_func)
result | ([success_failure_router]
          | 'success_sink' >> beam.io.WriteToText('/path/to/file')
          | 'failure_sink' >> beam.io.WriteStringsToPubSub('mytopic'))
However, I suspect the 'Beam' way of handling this is:
result = my_pcollections | 'compute_stuff' >> beam.Map(my_compute_func)
result | 'write_results_appropriately' >> write_results_appropriately(result)
...
def write_results_appropriately(result):
    if result == ...:
        ...  # success, write to file
    else:
        ...  # failure, write to topic
Thanks,
Kevin
google-cloud-dataflow apache-beam
asked Nov 8 at 18:39
Kevin
1 Answer
High-level:
I am not sure of the specifics of the Python API in this case, but at a high level it looks like this:
- ParDos support multiple outputs;
- outputs are identified by the tag you give them (e.g. "correct-elements", "invalid-elements");
- in your main ParDo you write to multiple outputs, choosing the output using your criteria;
- each output is represented by a separate PCollection;
- then you get the separate PCollections representing the tagged outputs from your ParDo;
- then you apply different sinks to each of the tagged PCollections.
For details, see the "Additional outputs" section of the programming guide:
https://beam.apache.org/documentation/programming-guide/#additional-outputs
answered Nov 8 at 19:45
Anton