How to query a table partitioned on a column in AWS Athena that uses Presto



























I have created a table like this in AWS Athena:

CREATE EXTERNAL TABLE table (
  `timestamp` BIGINT,
  `id` STRING
) PARTITIONED BY (
  date_column STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://bucket/key'
TBLPROPERTIES ( 'parquet.compress'='SNAPPY', 'CrawlerSchemaDeserializerVersion'='1.0', 'CrawlerSchemaSerializerVersion'='1.0', 'classification'='parquet')


And after adding data, date_column looks like this:



date_column
date=2018102300
date=2018091500 //(so Sept 15, 2018)


I want to get data only for the month of September, but I am unable to frame the correct query.

So far I have this, which throws a date format error:



SELECT * FROM table 
where date_parse(date_column, 'date=%Y%m%d') >= date_parse('date=2018090100', 'date=%Y%m%d') and date_parse(date_column, 'date=%Y%m%d') < date_parse('date=2018100100', 'date=%Y%m%d')
































  • Why do you store "date=2018102300" instead of "2018102300"?

    – j.b.gorski
    Nov 17 '18 at 23:25






















sql amazon-athena prestodb
















asked Nov 16 '18 at 21:46









Atihska

























1 Answer
































The format string you are passing to date_parse() does not match the data. The values end with a two-digit hour (00), which '%Y%m%d' does not consume, so the format needs %H as well to produce a valid timestamp:

select date_parse('2018091500', '%Y%m%d%H') returns 2018-09-15 00:00:00.000
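
As a rough illustration of why the shorter format fails, here is a sketch in Python rather than Presto. Python's datetime.strptime is not date_parse, but it uses the same %Y, %m, %d and %H directives for these fields and also rejects input with leftover characters. The helper name parse_partition is hypothetical, and the sample value with the date= prefix is taken from the question.

```python
from datetime import datetime


def parse_partition(value: str, fmt: str = "date=%Y%m%d%H") -> datetime:
    """Parse a partition value such as 'date=2018091500' (hypothetical helper)."""
    return datetime.strptime(value, fmt)


# The full format consumes the prefix, the date, and the two-digit hour.
parsed = parse_partition("date=2018091500")

# Without the hour directive, the trailing '00' is left over and parsing fails.
try:
    parse_partition("date=2018091500", "date=%Y%m%d")
    short_format_ok = True
except ValueError:
    short_format_ok = False
```

The point is only that every character of the stored value has to be accounted for by the format string, including the literal prefix and the hour.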


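You can then rewrite your query to fetch the rows for September. If the stored values include the literal date= prefix shown in the question, keep it in the format string, and use a half-open range so that the later hours of September 30 are not excluded:

select * from table where date_parse(date_column, 'date=%Y%m%d%H') >= date '2018-09-01' and date_parse(date_column, 'date=%Y%m%d%H') < date '2018-10-01'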
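
Because the values are fixed-width and zero-padded, they sort as strings in the same order as the timestamps they encode, so the September filter can also be written as a half-open range on the raw column without parsing it. The sketch below checks that idea with Python's built-in sqlite3 module, not Athena. The events table and its rows are made up for illustration, and only the WHERE clause is meant to carry over.

```python
import sqlite3

# In-memory stand-in for the partitioned table (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT, date_column TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("a", "date=2018083123"),  # last hour of August
        ("b", "date=2018090100"),  # first hour of September
        ("c", "date=2018091500"),
        ("d", "date=2018093023"),  # last hour of September
        ("e", "date=2018100100"),  # first hour of October
    ],
)

# Half-open range: from the first hour of September (inclusive)
# up to the first hour of October (exclusive).
september = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM events
        WHERE date_column >= 'date=2018090100'
          AND date_column <  'date=2018100100'
        ORDER BY date_column
        """
    )
]
```

In Athena the same predicate would be applied to date_column directly. Whether that changes how much data is scanned is something to confirm against your own table.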















        answered Dec 21 '18 at 18:29









        bdcloud





























