Spark Streaming: Writing number of rows read from a Kafka topic











up vote
1
down vote

favorite












Spark streaming job is reading events from a busy kafka topic. To get a sense of how much data is coming in per trigger interval, I want to just output count of rows read from the topic. I tried multiple ways of doing that but could not figure it out.



Dataset<Row> stream = sparkSession.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", kafkaBootstrapServersString)
.option("subscribe", topic)
.option("startingOffsets", "latest")
.option("enable.auto.commit", false)
// .option("failOnDataLoss", false)
// .option("maxOffsetsPerTrigger", 10000)
.load();
stream.selectExpr("topic").agg(count("topic")).as("count");
//stream.selectExpr("topic").groupBy("topic").agg(count(col("topic")).as("count"));
stream.writeStream()
.format("console")
.option("truncate", false)
.trigger(Trigger.ProcessingTime("10 seconds"))
.start();









share|improve this question




















  • 1




    stream.selectExpr returns a new Dataset, which you're ignoring, so it's just writing what is being consumed
    – cricket_007
    Nov 9 at 16:07








  • 1




    Thank you very much. This fixed the issue. Somehow I missed it.
    – Himanshu Yadav
    Nov 9 at 17:12















up vote
1
down vote

favorite












Spark streaming job is reading events from a busy kafka topic. To get a sense of how much data is coming in per trigger interval, I want to just output count of rows read from the topic. I tried multiple ways of doing that but could not figure it out.



Dataset<Row> stream = sparkSession.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", kafkaBootstrapServersString)
.option("subscribe", topic)
.option("startingOffsets", "latest")
.option("enable.auto.commit", false)
// .option("failOnDataLoss", false)
// .option("maxOffsetsPerTrigger", 10000)
.load();
stream.selectExpr("topic").agg(count("topic")).as("count");
//stream.selectExpr("topic").groupBy("topic").agg(count(col("topic")).as("count"));
stream.writeStream()
.format("console")
.option("truncate", false)
.trigger(Trigger.ProcessingTime("10 seconds"))
.start();









share|improve this question




















  • 1




    stream.selectExpr returns a new Dataset, which you're ignoring, so it's just writing what is being consumed
    – cricket_007
    Nov 9 at 16:07








  • 1




    Thank you very much. This fixed the issue. Somehow I missed it.
    – Himanshu Yadav
    Nov 9 at 17:12













up vote
1
down vote

favorite









up vote
1
down vote

favorite











Spark streaming job is reading events from a busy kafka topic. To get a sense of how much data is coming in per trigger interval, I want to just output count of rows read from the topic. I tried multiple ways of doing that but could not figure it out.



Dataset<Row> stream = sparkSession.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", kafkaBootstrapServersString)
.option("subscribe", topic)
.option("startingOffsets", "latest")
.option("enable.auto.commit", false)
// .option("failOnDataLoss", false)
// .option("maxOffsetsPerTrigger", 10000)
.load();
stream.selectExpr("topic").agg(count("topic")).as("count");
//stream.selectExpr("topic").groupBy("topic").agg(count(col("topic")).as("count"));
stream.writeStream()
.format("console")
.option("truncate", false)
.trigger(Trigger.ProcessingTime("10 seconds"))
.start();









share|improve this question















Spark streaming job is reading events from a busy kafka topic. To get a sense of how much data is coming in per trigger interval, I want to just output count of rows read from the topic. I tried multiple ways of doing that but could not figure it out.



Dataset<Row> stream = sparkSession.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", kafkaBootstrapServersString)
.option("subscribe", topic)
.option("startingOffsets", "latest")
.option("enable.auto.commit", false)
// .option("failOnDataLoss", false)
// .option("maxOffsetsPerTrigger", 10000)
.load();
stream.selectExpr("topic").agg(count("topic")).as("count");
//stream.selectExpr("topic").groupBy("topic").agg(count(col("topic")).as("count"));
stream.writeStream()
.format("console")
.option("truncate", false)
.trigger(Trigger.ProcessingTime("10 seconds"))
.start();






java apache-spark apache-kafka spark-structured-streaming






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 9 at 16:07









cricket_007

78.3k1142109




78.3k1142109










asked Nov 9 at 14:37









Himanshu Yadav

5,87733116219




5,87733116219








  • 1




    stream.selectExpr returns a new Dataset, which you're ignoring, so it's just writing what is being consumed
    – cricket_007
    Nov 9 at 16:07








  • 1




    Thank you very much. This fixed the issue. Somehow I missed it.
    – Himanshu Yadav
    Nov 9 at 17:12














  • 1




    stream.selectExpr returns a new Dataset, which you're ignoring, so it's just writing what is being consumed
    – cricket_007
    Nov 9 at 16:07








  • 1




    Thank you very much. This fixed the issue. Somehow I missed it.
    – Himanshu Yadav
    Nov 9 at 17:12








1




1




stream.selectExpr returns a new Dataset, which you're ignoring, so it's just writing what is being consumed
– cricket_007
Nov 9 at 16:07






stream.selectExpr returns a new Dataset, which you're ignoring, so it's just writing what is being consumed
– cricket_007
Nov 9 at 16:07






1




1




Thank you very much. This fixed the issue. Somehow I missed it.
– Himanshu Yadav
Nov 9 at 17:12




Thank you very much. This fixed the issue. Somehow I missed it.
– Himanshu Yadav
Nov 9 at 17:12












1 Answer
1






active

oldest

votes

















up vote
2
down vote



accepted










Looks like you need



stream = stream.selectExpr("topic").agg(count("topic")).as("count");


And then you can print that






share|improve this answer





















    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53227798%2fspark-streaming-writing-number-of-rows-read-from-a-kafka-topic%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    2
    down vote



    accepted










    Looks like you need



    stream = stream.selectExpr("topic").agg(count("topic")).as("count");


    And then you can print that






    share|improve this answer

























      up vote
      2
      down vote



      accepted










      Looks like you need



      stream = stream.selectExpr("topic").agg(count("topic")).as("count");


      And then you can print that






      share|improve this answer























        up vote
        2
        down vote



        accepted







        up vote
        2
        down vote



        accepted






        Looks like you need



        stream = stream.selectExpr("topic").agg(count("topic")).as("count");


        And then you can print that






        share|improve this answer












        Looks like you need



        stream = stream.selectExpr("topic").agg(count("topic")).as("count");


        And then you can print that







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 9 at 18:13









        cricket_007

        78.3k1142109




        78.3k1142109






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53227798%2fspark-streaming-writing-number-of-rows-read-from-a-kafka-topic%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            這個網誌中的熱門文章

            Tangent Lines Diagram Along Smooth Curve

            Yusuf al-Mu'taman ibn Hud

            Zucchini