Add a date_loaded field when uploading a CSV to BigQuery
Using Python, is there any way to add an extra field while loading a CSV file into BigQuery? I'd like to add a date_loaded field with the current date.
This is the Google code example I have used:
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'

dataset_ref = client.dataset(dataset_id)
job_config = bigquery.LoadJobConfig()
job_config.schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING')
]
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'

load_job = client.load_table_from_uri(
    uri,
    dataset_ref.table('us_states'),
    job_config=job_config)  # API request
print('Starting job {}'.format(load_job.job_id))

load_job.result()  # Waits for table load to complete.
print('Job finished.')

destination_table = client.get_table(dataset_ref.table('us_states'))
print('Loaded {} rows.'.format(destination_table.num_rows))
Tags: python, google-bigquery
asked Nov 21 '18 at 12:49 by mez63
Do date-partitioned tables work for you? If not, maybe a better approach would be to use Apache Beam instead. If that still doesn't work, the only way out I see is to bring the data local, iterate over it and add the date field. This is not recommended if you are working with lots of data, though. – Willian Fuks, Nov 21 '18 at 15:16
...or load it into a staging/tmp table in BigQuery, then hit it with SQL and add the date_loaded field as part of that SQL transform. Write the results to your main table. If you use an ingestion-time partitioned table, just be aware that it's in UTC unless you address the partition directly (cloud.google.com/bigquery/docs/…). – Graham Polley, Nov 22 '18 at 12:15
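For the partitioned-table route mentioned in the first comment, a minimal sketch (reusing the table and schema from the question): configure ingestion-time partitioning on the load job, and derive the load date from the _PARTITIONTIME pseudo-column instead of storing it in the CSV.
from google.cloud import bigquery

client = bigquery.Client()
table_ref = client.dataset('my_dataset').table('us_states')

job_config = bigquery.LoadJobConfig()
job_config.schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING')
]
job_config.skip_leading_rows = 1
job_config.source_format = bigquery.SourceFormat.CSV
# Ingestion-time partitioning: BigQuery stamps every load with the (UTC) load time.
job_config.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY)

uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'
client.load_table_from_uri(uri, table_ref, job_config=job_config).result()

# The load date is then available through the _PARTITIONTIME pseudo-column.
query = """
    SELECT name, post_abbr, DATE(_PARTITIONTIME) AS date_loaded
    FROM `my_dataset.us_states`
"""
for row in client.query(query).result():
    print(row.name, row.post_abbr, row.date_loaded)
As the second comment notes, the partition timestamp is in UTC, so date_loaded here is the UTC load date.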
2 Answers
By modifying this Python example to fit your issue, you open and read the original CSV file from your local machine, then edit it by appending a timestamp to the end of each line so the new column is never empty (Python's datetime module lets you format the timestamp with whatever date and time layout you need).
You then write the resulting data to an output file and copy it to Google Cloud Storage, using subprocess to run the gsutil command from the Python script.
I hope this helps.
# Import the dependencies
import csv, datetime, subprocess
from google.cloud import bigquery

# Replace the values of these variables with the appropriate ones
# Name of the input csv file
csv_in_name = 'us-states.csv'
# Name of the output csv file, to avoid messing up the original
csv_out_name = 'out_file_us-states.csv'
# Name of the new column to be added
new_col_name = 'date_loaded'
# Type of the new column
col_type = 'DATETIME'
# Name of your bucket
bucket_id = 'YOUR BUCKET ID'
# Your dataset name
ds_id = 'YOUR DATASET ID'
# The destination table name
destination_table_name = 'TABLE NAME'

# Read the input csv and write the output csv
with open(csv_in_name, 'r') as r_csvfile:
    with open(csv_out_name, 'w') as w_csvfile:
        dict_reader = csv.DictReader(r_csvfile, delimiter=',')
        # Add the new column to the existing ones
        fieldnames = dict_reader.fieldnames + [new_col_name]
        writer_csv = csv.DictWriter(w_csvfile, fieldnames, delimiter=',')
        writer_csv.writeheader()
        for row in dict_reader:
            # Put the timestamp after the last comma so that the column is not empty
            row[new_col_name] = datetime.datetime.now()
            writer_csv.writerow(row)

# Copy the file to your Google Storage bucket
subprocess.call('gsutil cp ' + csv_out_name + ' gs://' + bucket_id, shell=True)

client = bigquery.Client()
dataset_ref = client.dataset(ds_id)
job_config = bigquery.LoadJobConfig()
# Add the new column to the schema
job_config.schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING'),
    bigquery.SchemaField(new_col_name, col_type)
]
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
# URI of the output csv file in the bucket
uri = 'gs://' + bucket_id + '/' + csv_out_name

load_job = client.load_table_from_uri(
    uri,
    dataset_ref.table(destination_table_name),
    job_config=job_config)  # API request
print('Starting job {}'.format(load_job.job_id))

load_job.result()  # Waits for table load to complete.
print('Job finished.')

destination_table = client.get_table(dataset_ref.table(destination_table_name))
print('Loaded {} rows.'.format(destination_table.num_rows))
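If staging the file in Cloud Storage is not a requirement, the same rewritten CSV can be loaded straight from disk with load_table_from_file instead of the gsutil subprocess call; a minimal sketch, reusing csv_out_name, ds_id, destination_table_name and job_config from above:
client = bigquery.Client()
dataset_ref = client.dataset(ds_id)

# load_table_from_file expects the file opened in binary mode
with open(csv_out_name, 'rb') as source_file:
    load_job = client.load_table_from_file(
        source_file,
        dataset_ref.table(destination_table_name),
        job_config=job_config)  # API request

load_job.result()  # Waits for table load to complete.
destination_table = client.get_table(dataset_ref.table(destination_table_name))
print('Loaded {} rows.'.format(destination_table.num_rows))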
answered Feb 8 at 21:08 by Fim
You can keep loading your data the way you are loading it now, but into a table called old_table.
Once loaded, you can run something like:
bq --location=US query --destination_table mydataset.newtable --use_legacy_sql=false --replace=true 'select *, current_date() as date_loaded from mydataset.old_table'
This basically writes the content of old_table, with a new date_loaded column appended at the end, into newtable. This way you get the new column without downloading anything locally or any of the mess.
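The same transform can also be run from Python with the BigQuery client library rather than the bq CLI; a rough equivalent, assuming the dataset and table names from the command above:
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig()
job_config.destination = client.dataset('mydataset').table('newtable')
# Equivalent of --replace=true: overwrite the destination table if it exists
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job_config.use_legacy_sql = False

sql = 'SELECT *, CURRENT_DATE() AS date_loaded FROM `mydataset.old_table`'
client.query(sql, job_config=job_config).result()  # waits for the query to finish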
answered Feb 9 at 2:17 by khan