Add a date_loaded field when uploading a CSV to BigQuery



























Using Python, is there any way to add an extra field while loading a CSV file into BigQuery?
I'd like to add a date_loaded field with the current date.



This is the Google code example I have used:



# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'

dataset_ref = client.dataset(dataset_id)
job_config = bigquery.LoadJobConfig()
job_config.schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING')
]
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'

load_job = client.load_table_from_uri(
    uri,
    dataset_ref.table('us_states'),
    job_config=job_config)  # API request
print('Starting job {}'.format(load_job.job_id))

load_job.result()  # Waits for table load to complete.
print('Job finished.')

destination_table = client.get_table(dataset_ref.table('us_states'))
print('Loaded {} rows.'.format(destination_table.num_rows))









python google-bigquery

asked Nov 21 '18 at 12:49 by mez63
  • Do date-partitioned tables work for you? If not, maybe a better approach would be to use Apache Beam instead. If that still doesn't work, the only way out I see is to bring the data down locally, iterate over it, and add the date field. If you are working with lots of data, though, this is not recommended.

    – Willian Fuks, Nov 21 '18 at 15:16






  • ...or load it into a staging/tmp table in BigQuery, then hit it with SQL and add the date_loaded field as part of that SQL transform, writing the results to your main table. If you use an ingestion-based partitioned table, just be aware that it's in UTC unless you address the partition directly (cloud.google.com/bigquery/docs/…).

    – Graham Polley, Nov 22 '18 at 12:15
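A minimal sketch of the date-partitioning idea from the first comment, assuming a hypothetical table name us_states_partitioned and the same us-states schema as in the question: if the destination table is partitioned by ingestion time, BigQuery records the load day for you in the _PARTITIONTIME pseudo-column, so no extra CSV column is needed at all.

from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = client.dataset('my_dataset')               # placeholder dataset
table_ref = dataset_ref.table('us_states_partitioned')   # hypothetical table name

schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING'),
]

# Create the table partitioned by ingestion time (day granularity).
table = bigquery.Table(table_ref, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY)
client.create_table(table)

# Load the CSV as usual; every row lands in the current day's partition.
job_config = bigquery.LoadJobConfig()
job_config.schema = schema
job_config.skip_leading_rows = 1
job_config.source_format = bigquery.SourceFormat.CSV
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'
load_job = client.load_table_from_uri(uri, table_ref, job_config=job_config)
load_job.result()

The load date can then be read back with SELECT _PARTITIONTIME AS date_loaded, * FROM my_dataset.us_states_partitioned; as the second comment notes, _PARTITIONTIME is in UTC.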


















2 Answers
By modifying the Python example above to fit your case, you open and read the original CSV file on your local machine, add a column, and append a timestamp (from Python's datetime module) to the end of each line so that the new column is not empty.

You then write the resulting data to an output file, copy it to Google Cloud Storage (here with gsutil called through subprocess), and load it into BigQuery.

I hope this helps.



#Import the dependencies
import csv, datetime, subprocess
from google.cloud import bigquery

#Replace the values of these variables with the appropriate ones
#Name of the input csv file
csv_in_name = 'us-states.csv'
#Name of the output csv file, to avoid messing up the original
csv_out_name = 'out_file_us-states.csv'
#Name of the new column to be added
new_col_name = 'date_loaded'
#Type of the new column
col_type = 'DATETIME'
#Name of your bucket
bucket_id = 'YOUR BUCKET ID'
#Your dataset name
ds_id = 'YOUR DATASET ID'
#The destination table name
destination_table_name = 'TABLE NAME'

#Read the input csv and write the output csv with the extra column
with open(csv_in_name, 'r', newline='') as r_csvfile:
    with open(csv_out_name, 'w', newline='') as w_csvfile:

        dict_reader = csv.DictReader(r_csvfile, delimiter=',')
        #Add the new column to the existing field names
        fieldnames = dict_reader.fieldnames + [new_col_name]
        writer_csv = csv.DictWriter(w_csvfile, fieldnames, delimiter=',')
        writer_csv.writeheader()

        for row in dict_reader:
            #Put the timestamp after the last comma so that the column is not empty
            row[new_col_name] = datetime.datetime.now()
            writer_csv.writerow(row)

#Copy the file to your Google Storage bucket
subprocess.call('gsutil cp ' + csv_out_name + ' gs://' + bucket_id, shell=True)

client = bigquery.Client()

dataset_ref = client.dataset(ds_id)
job_config = bigquery.LoadJobConfig()
#Add the new column to the schema
job_config.schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING'),
    bigquery.SchemaField(new_col_name, col_type)
]
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
#Address of the output csv file in the bucket
uri = 'gs://' + bucket_id + '/' + csv_out_name
load_job = client.load_table_from_uri(
    uri,
    dataset_ref.table(destination_table_name),
    job_config=job_config)  # API request
print('Starting job {}'.format(load_job.job_id))

load_job.result()  # Waits for table load to complete.
print('Job finished.')

destination_table = client.get_table(dataset_ref.table(destination_table_name))
print('Loaded {} rows.'.format(destination_table.num_rows))
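As a side note on the gsutil step: the same upload can be done without shelling out, using the google-cloud-storage client instead of subprocess. A minimal sketch, assuming the bucket_id and csv_out_name variables from the script above:

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_id)
# Upload the modified CSV to gs://<bucket_id>/<csv_out_name>
bucket.blob(csv_out_name).upload_from_filename(csv_out_name)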





answered Feb 8 at 21:08 by Fim
You can keep loading your data exactly as you do now, but into a staging table, say old_table.

Once it is loaded, you can run something like:

bq --location=US query --destination_table mydataset.newtable --use_legacy_sql=false --replace=true 'select *, current_date() as date_loaded from mydataset.old_table'

This loads the content of old_table, with a new date_loaded column appended at the end, into mydataset.newtable. This way you get the new column without downloading the data locally or any of the extra processing.
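The same transform can also be run from Python with a query job instead of the bq CLI; a minimal sketch, assuming the mydataset.old_table and mydataset.newtable names from the command above:

from google.cloud import bigquery

client = bigquery.Client()

# Write the query result over mydataset.newtable (equivalent to --replace=true).
job_config = bigquery.QueryJobConfig()
job_config.destination = client.dataset('mydataset').table('newtable')
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

sql = 'SELECT *, CURRENT_DATE() AS date_loaded FROM `mydataset.old_table`'
query_job = client.query(sql, job_config=job_config, location='US')
query_job.result()  # wait for the query to finish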






answered Feb 9 at 2:17 by khan