BigQuery: Atomically replace a date partition using DML
I often want to load one day's worth of data into a date-partitioned BigQuery table, replacing any data that's already there. I know how to do this for 'old-style' date-partitioned tables (the ones that have a _PARTITIONTIME pseudo-column) but don't know how to do this with the new-style date-partitioned tables (which use a normal date/timestamp column to specify the partitioning), because they don't allow one to use the $ decorator.

Let's say I want to do this on my_table. With old-style date-partitioned tables, I accomplished this using a load job that utilized the $ decorator and the WRITE_TRUNCATE write disposition -- e.g., I'd set the destination table to my_table$20181005.

However, I'm not sure how to perform the equivalent operation using DML. I find myself performing separate DELETE and INSERT statements. This isn't great: it increases complexity and the number of queries, and the operation isn't atomic.
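For concreteness, the two-statement version I currently run looks something like this (a sketch, using the table and column names described in this question):

```sql
-- Step 1: clear the partition (not atomic with step 2)
DELETE FROM my_table WHERE day = DATE '2018-10-05';
-- Step 2: reload it; a failure here leaves the partition empty
INSERT INTO my_table SELECT * FROM source_table WHERE day = DATE '2018-10-05';
```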
I want to know how to do this using the MERGE statement, to keep it all contained within a single, atomic operation. However, I can't wrap my head around MERGE's syntax and haven't found an example for this use case. Does anyone know how this should be done?

The ideal answer would be a DML statement that selects all columns from source_table and inserts them into the 2018-10-05 date partition of my_table, deleting any existing data in my_table's 2018-10-05 partition. We can assume that source_table and my_table have the same schema, and that my_table is partitioned on the day column, which is of type DATE.
sql google-bigquery
edited Nov 7 at 16:08
asked Nov 7 at 8:58
conradlee
1 Answer
"because they don't allow one to use the $ decorator"

But they do -- you can use the table_name$YYYYMMDD decorator when you load into a column-based partitioned table as well. For example, I made a partitioned table:

$ bq query --use_legacy_sql=false "CREATE TABLE tmp_elliottb.PartitionedTable (x INT64, y NUMERIC, date DATE) PARTITION BY date"

Then I loaded into a specific partition (single quotes keep the shell from expanding the $):

$ echo "1,3.14,2018-11-07" > row.csv
$ bq load 'tmp_elliottb.PartitionedTable$20181107' ./row.csv

I tried to load into the wrong partition for the input data, and received an error:

$ echo "1,3.14,2018-11-07" > row.csv
$ bq load 'tmp_elliottb.PartitionedTable$20181105' ./row.csv
Some rows belong to different partitions rather than destination partition 20181105

I then replaced the data for the partition (--replace gives WRITE_TRUNCATE semantics; without it, bq load appends):

$ echo "2,0.11,2018-11-07" > row.csv
$ bq load --replace 'tmp_elliottb.PartitionedTable$20181107' ./row.csv

Yes, you can use MERGE as a way of replacing a partitioned table's partition, but you can also use a load job.
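For completeness, one MERGE shape that does the atomic replacement the question asks for -- a sketch I haven't run against these exact tables, and which assumes the schemas are identical, since INSERT ROW copies every column:

```sql
MERGE my_table AS target
USING source_table AS source
ON FALSE  -- never match, so every row falls into a NOT MATCHED branch
WHEN NOT MATCHED AND source.day = DATE '2018-10-05' THEN
  INSERT ROW  -- insert all source rows for the target partition
WHEN NOT MATCHED BY SOURCE AND target.day = DATE '2018-10-05' THEN
  DELETE      -- drop the partition's existing rows in the same atomic statement
```

Because MERGE is a single DML statement, the delete and insert either both happen or neither does.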
Thanks! I went back and looked at why I thought this wasn't possible, and it was because of an error message I got from Airflow (which in this case doesn't accurately reflect the BigQuery API). This solution is much better than DML for my use-case because it handles the case where schemas are additively updated over time (and thus inconsistent), which I can't seem to handle with the DML because it requires me to explicitly list out all the columns I'm going to insert into a table.
– conradlee
Nov 8 at 10:11
answered Nov 7 at 18:18
Elliott Brossard
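If you drive these load jobs from a script, a small helper can build the decorator string for the destination table. This is a minimal sketch; the function name is mine, not a BigQuery API:

```python
from datetime import date


def partition_decorator(table: str, day: date) -> str:
    """Build a BigQuery partition decorator such as my_table$20181005.

    `table` may be a bare table name or a dataset-qualified one; the
    decorator is the table reference followed by $ and the day as YYYYMMDD.
    """
    return f"{table}${day.strftime('%Y%m%d')}"


# Example: destination for a WRITE_TRUNCATE load of the 2018-10-05 partition
destination = partition_decorator("my_table", date(2018, 10, 5))
```

The resulting string can then be passed as the destination table of a load job configured with WRITE_TRUNCATE.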