Python Apache Beam: date value out of range
up vote
1
down vote
favorite
Applying this or this example to build my program, each time I try to insert to Big Query, I have this error:
OverflowError: date value out of range [while running 'Format']
My Beam Pipeline is this:
Bigquery = (transformation
| 'Format' >> beam.ParDo(FormatBigQueryoFn())
| 'Write to BigQuery' >> beam.io.Write(beam.io.BigQuerySink(
'XXXX',
schema=TABLE_SCHEMA,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
)))
In class FormatBigQueryoFn is where it should be the logic of the window data time
The code of exmple 1:
def timestamp2str(t, fmt='%Y-%m-%d %H:%M:%S.000'):
"""Converts a unix timestamp into a formatted string."""
return datetime.fromtimestamp(t).strftime(fmt)
class TeamScoresDict(beam.DoFn):
"""Formats the data into a dictionary of BigQuery columns with their values
Receives a (team, score) pair, extracts the window start timestamp, and
formats everything together into a dictionary. The dictionary is in the format
{'bigquery_column': value}
"""
def process(self, team_score, window=beam.DoFn.WindowParam):
team, score = team_score
start = timestamp2str(int(window.start))
yield {
'team': team,
'total_score': score,
'window_start': start,
'processing_time': timestamp2str(int(time.time()))
}
The code of example 2:
class FormatDoFn(beam.DoFn):
def process(self, element, window=beam.DoFn.WindowParam):
ts_format = '%Y-%m-%d %H:%M:%S.%f UTC'
window_start = window.start.to_utc_datetime().strftime(ts_format)
window_end = window.end.to_utc_datetime().strftime(ts_format)
return [{'word': element[0],
'count': element[1],
'window_start':window_start,
'window_end':window_end}]
What could be wrong in my pipeline?
EDIT:
If I print, for example, the window.start i get:
Timestamp(-9223372036860)
python google-cloud-dataflow apache-beam
add a comment |
up vote
1
down vote
favorite
Applying this or this example to build my program, each time I try to insert to Big Query, I have this error:
OverflowError: date value out of range [while running 'Format']
My Beam Pipeline is this:
Bigquery = (transformation
| 'Format' >> beam.ParDo(FormatBigQueryoFn())
| 'Write to BigQuery' >> beam.io.Write(beam.io.BigQuerySink(
'XXXX',
schema=TABLE_SCHEMA,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
)))
In class FormatBigQueryoFn is where it should be the logic of the window data time
The code of exmple 1:
def timestamp2str(t, fmt='%Y-%m-%d %H:%M:%S.000'):
"""Converts a unix timestamp into a formatted string."""
return datetime.fromtimestamp(t).strftime(fmt)
class TeamScoresDict(beam.DoFn):
"""Formats the data into a dictionary of BigQuery columns with their values
Receives a (team, score) pair, extracts the window start timestamp, and
formats everything together into a dictionary. The dictionary is in the format
{'bigquery_column': value}
"""
def process(self, team_score, window=beam.DoFn.WindowParam):
team, score = team_score
start = timestamp2str(int(window.start))
yield {
'team': team,
'total_score': score,
'window_start': start,
'processing_time': timestamp2str(int(time.time()))
}
The code of example 2:
class FormatDoFn(beam.DoFn):
def process(self, element, window=beam.DoFn.WindowParam):
ts_format = '%Y-%m-%d %H:%M:%S.%f UTC'
window_start = window.start.to_utc_datetime().strftime(ts_format)
window_end = window.end.to_utc_datetime().strftime(ts_format)
return [{'word': element[0],
'count': element[1],
'window_start':window_start,
'window_end':window_end}]
What could be wrong in my pipeline?
EDIT:
If I print, for example, the window.start i get:
Timestamp(-9223372036860)
python google-cloud-dataflow apache-beam
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
Applying this or this example to build my program, each time I try to insert to Big Query, I have this error:
OverflowError: date value out of range [while running 'Format']
My Beam Pipeline is this:
Bigquery = (transformation
| 'Format' >> beam.ParDo(FormatBigQueryoFn())
| 'Write to BigQuery' >> beam.io.Write(beam.io.BigQuerySink(
'XXXX',
schema=TABLE_SCHEMA,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
)))
In class FormatBigQueryoFn is where it should be the logic of the window data time
The code of exmple 1:
def timestamp2str(t, fmt='%Y-%m-%d %H:%M:%S.000'):
"""Converts a unix timestamp into a formatted string."""
return datetime.fromtimestamp(t).strftime(fmt)
class TeamScoresDict(beam.DoFn):
"""Formats the data into a dictionary of BigQuery columns with their values
Receives a (team, score) pair, extracts the window start timestamp, and
formats everything together into a dictionary. The dictionary is in the format
{'bigquery_column': value}
"""
def process(self, team_score, window=beam.DoFn.WindowParam):
team, score = team_score
start = timestamp2str(int(window.start))
yield {
'team': team,
'total_score': score,
'window_start': start,
'processing_time': timestamp2str(int(time.time()))
}
The code of example 2:
class FormatDoFn(beam.DoFn):
def process(self, element, window=beam.DoFn.WindowParam):
ts_format = '%Y-%m-%d %H:%M:%S.%f UTC'
window_start = window.start.to_utc_datetime().strftime(ts_format)
window_end = window.end.to_utc_datetime().strftime(ts_format)
return [{'word': element[0],
'count': element[1],
'window_start':window_start,
'window_end':window_end}]
What could be wrong in my pipeline?
EDIT:
If I print, for example, the window.start i get:
Timestamp(-9223372036860)
python google-cloud-dataflow apache-beam
Applying this or this example to build my program, each time I try to insert to Big Query, I have this error:
OverflowError: date value out of range [while running 'Format']
My Beam Pipeline is this:
Bigquery = (transformation
| 'Format' >> beam.ParDo(FormatBigQueryoFn())
| 'Write to BigQuery' >> beam.io.Write(beam.io.BigQuerySink(
'XXXX',
schema=TABLE_SCHEMA,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
)))
In class FormatBigQueryoFn is where it should be the logic of the window data time
The code of exmple 1:
def timestamp2str(t, fmt='%Y-%m-%d %H:%M:%S.000'):
"""Converts a unix timestamp into a formatted string."""
return datetime.fromtimestamp(t).strftime(fmt)
class TeamScoresDict(beam.DoFn):
"""Formats the data into a dictionary of BigQuery columns with their values
Receives a (team, score) pair, extracts the window start timestamp, and
formats everything together into a dictionary. The dictionary is in the format
{'bigquery_column': value}
"""
def process(self, team_score, window=beam.DoFn.WindowParam):
team, score = team_score
start = timestamp2str(int(window.start))
yield {
'team': team,
'total_score': score,
'window_start': start,
'processing_time': timestamp2str(int(time.time()))
}
The code of example 2:
class FormatDoFn(beam.DoFn):
def process(self, element, window=beam.DoFn.WindowParam):
ts_format = '%Y-%m-%d %H:%M:%S.%f UTC'
window_start = window.start.to_utc_datetime().strftime(ts_format)
window_end = window.end.to_utc_datetime().strftime(ts_format)
return [{'word': element[0],
'count': element[1],
'window_start':window_start,
'window_end':window_end}]
What could be wrong in my pipeline?
EDIT:
If I print, for example, the window.start i get:
Timestamp(-9223372036860)
python google-cloud-dataflow apache-beam
python google-cloud-dataflow apache-beam
edited Nov 8 at 8:52
asked Nov 7 at 17:07
IoT user
808
808
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
The problem was I was reading the data from a file before to test it with Google Pub/Sub.
While I was reading the data from a file the elements doesnt have timestamp.
Is a must to have a timestamp in your element.
Pub/Sub attach this timestamp automatically.
From documentation:
The simplest form of windowing is using fixed time windows: given a timestamped PCollection which might be continuously updating, each window might capture (for example) all elements with timestamps that fall into a five minute interval.
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
The problem was I was reading the data from a file before to test it with Google Pub/Sub.
While I was reading the data from a file the elements doesnt have timestamp.
Is a must to have a timestamp in your element.
Pub/Sub attach this timestamp automatically.
From documentation:
The simplest form of windowing is using fixed time windows: given a timestamped PCollection which might be continuously updating, each window might capture (for example) all elements with timestamps that fall into a five minute interval.
add a comment |
up vote
1
down vote
accepted
The problem was I was reading the data from a file before to test it with Google Pub/Sub.
While I was reading the data from a file the elements doesnt have timestamp.
Is a must to have a timestamp in your element.
Pub/Sub attach this timestamp automatically.
From documentation:
The simplest form of windowing is using fixed time windows: given a timestamped PCollection which might be continuously updating, each window might capture (for example) all elements with timestamps that fall into a five minute interval.
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
The problem was I was reading the data from a file before to test it with Google Pub/Sub.
While I was reading the data from a file the elements doesnt have timestamp.
Is a must to have a timestamp in your element.
Pub/Sub attach this timestamp automatically.
From documentation:
The simplest form of windowing is using fixed time windows: given a timestamped PCollection which might be continuously updating, each window might capture (for example) all elements with timestamps that fall into a five minute interval.
The problem was I was reading the data from a file before to test it with Google Pub/Sub.
While I was reading the data from a file the elements doesnt have timestamp.
Is a must to have a timestamp in your element.
Pub/Sub attach this timestamp automatically.
From documentation:
The simplest form of windowing is using fixed time windows: given a timestamped PCollection which might be continuously updating, each window might capture (for example) all elements with timestamps that fall into a five minute interval.
edited Nov 8 at 9:16
answered Nov 8 at 9:11
IoT user
808
808
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53194377%2fpython-apache-beam-date-value-out-of-range%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown