Python Apache Beam: date value out of range

Using this example or this one as the basis for my program, I get the following error every time I try to insert into BigQuery:



OverflowError: date value out of range [while running 'Format']



My Beam pipeline is:



Bigquery = (transformation
    | 'Format' >> beam.ParDo(FormatBigQueryoFn())
    | 'Write to BigQuery' >> beam.io.Write(beam.io.BigQuerySink(
        'XXXX',
        schema=TABLE_SCHEMA,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
    )))


The FormatBigQueryoFn class is where the logic for formatting the window timestamps should go.



The code of example 1:



import time
from datetime import datetime

import apache_beam as beam


def timestamp2str(t, fmt='%Y-%m-%d %H:%M:%S.000'):
    """Converts a unix timestamp into a formatted string."""
    return datetime.fromtimestamp(t).strftime(fmt)


class TeamScoresDict(beam.DoFn):
    """Formats the data into a dictionary of BigQuery columns with their values.

    Receives a (team, score) pair, extracts the window start timestamp, and
    formats everything together into a dictionary. The dictionary is in the format
    {'bigquery_column': value}
    """

    def process(self, team_score, window=beam.DoFn.WindowParam):
        team, score = team_score
        start = timestamp2str(int(window.start))
        yield {
            'team': team,
            'total_score': score,
            'window_start': start,
            'processing_time': timestamp2str(int(time.time()))
        }


The code of example 2:



class FormatDoFn(beam.DoFn):
    def process(self, element, window=beam.DoFn.WindowParam):
        ts_format = '%Y-%m-%d %H:%M:%S.%f UTC'
        window_start = window.start.to_utc_datetime().strftime(ts_format)
        window_end = window.end.to_utc_datetime().strftime(ts_format)
        return [{'word': element[0],
                 'count': element[1],
                 'window_start': window_start,
                 'window_end': window_end}]


What could be wrong in my pipeline?



EDIT:



If I print, for example, window.start, I get:



Timestamp(-9223372036860)
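
For anyone wanting to reproduce the error without Beam, here is a minimal sketch using only the standard library. It assumes the printed value is a number of seconds since the Unix epoch, and it converts with plain datetime arithmetic (epoch plus a timedelta); the helper names seconds_to_utc_str and try_format are made up for this illustration and are not part of Beam.

```python
from datetime import datetime, timedelta

# The Unix epoch as a naive datetime, interpreted as UTC.
EPOCH = datetime(1970, 1, 1)


def seconds_to_utc_str(seconds, fmt='%Y-%m-%d %H:%M:%S.%f UTC'):
    """Format seconds since the Unix epoch as a UTC string."""
    return (EPOCH + timedelta(seconds=seconds)).strftime(fmt)


def try_format(seconds):
    """Return the formatted string, or a description of the overflow."""
    try:
        return seconds_to_utc_str(seconds)
    except OverflowError as exc:
        # datetime only covers years 1 through 9999.
        return 'OverflowError: %s' % exc


# A plausible event time formats fine.
print(try_format(1541610420))
# The window start from the question lies hundreds of thousands of
# years before year 1, so the addition overflows.
print(try_format(-9223372036860))
```

The second call fails because the value is roughly 292,000 years before 1970, which suggests the window was built from a placeholder timestamp and not from a real event time.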

      python google-cloud-dataflow apache-beam

      edited Nov 8 at 8:52
      asked Nov 7 at 17:07
      IoT user

          1 Answer
          accepted

The problem was that I was reading the data from a file for testing before switching to Google Pub/Sub.

Elements read from a file do not carry a timestamp of their own, so the window boundaries computed from them (such as the Timestamp(-9223372036860) shown in the question) fall far outside the range a Python datetime can represent.

Every element must have a timestamp for windowing to work.

Pub/Sub attaches this timestamp automatically.

From the documentation:

The simplest form of windowing is using fixed time windows: given a timestamped PCollection which might be continuously updating, each window might capture (for example) all elements with timestamps that fall into a five minute interval.
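
When testing from a file, one way around this is to take an event time from each record yourself. The sketch below is only an illustration under assumptions: it supposes each line of the file looks like team,score,unix_seconds (the actual file format is not shown in the question), and parse_record is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_record(line):
    """Split a 'team,score,unix_seconds' line into (record, event_time).

    The three-column layout is an assumption made for this sketch.
    """
    team, score, unix_seconds = line.strip().split(',')
    record = (team, int(score))
    return record, float(unix_seconds)


# Example: a made-up line carrying its own event time.
record, event_time = parse_record('blue,12,1541610420\n')
print(record, datetime.fromtimestamp(event_time, tz=timezone.utc).isoformat())
```

In a Beam pipeline, a step such as `beam.Map(lambda line: beam.window.TimestampedValue(*parse_record(line)))` placed before the windowing transform should then attach that event time to each element, so the windows get sensible start and end values.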

                edited Nov 8 at 9:16
                answered Nov 8 at 9:11
                IoT user