Get all text in an lxml node























I am using the following approach to get all the text (the actual text content, not the HTML markup) within an element node:



''.join(node.xpath('//div[@class="title_wrapper"]')[0].itertext())


Is there a cleaner approach to doing this?
































  • You can also try node.xpath('//div[@class="title_wrapper"]')[0].text_content()
    – Andersson
    Nov 8 at 12:35
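
A caveat on the suggestion above: as far as I know, text_content() is defined on elements parsed with lxml.html, not on plain lxml.etree elements. The sketch below therefore assumes the document is parsed as HTML; the sample markup and variable names are made up for illustration.

```python
from lxml import html

# Hypothetical sample markup, parsed with lxml.html so that the
# resulting elements provide text_content().
doc = html.fromstring(
    '<div><div class="title_wrapper">Some <span>more</span> text.</div></div>'
)

node = doc.xpath('//div[@class="title_wrapper"]')[0]

# text_content() returns the text of the element and all of its descendants.
text = node.text_content()
print(text)
```

With a tree built by etree.fromstring() or etree.parse(), itertext() or the XPath string functions are the safer choice.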















python lxml






asked Nov 7 at 18:51









David542

























1 Answer (accepted)










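You could use XPath's string() function, which returns the string value of the first matching node: the concatenation of all of its descendant text nodes.

If mixed content leaves you with large chunks of whitespace, you could use XPath's normalize-space() function instead. It strips leading and trailing whitespace and collapses each internal run of whitespace to a single space.

Here is an example of all three approaches (yours and my two):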



Python



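from lxml import etree

xml = """<doc>
<div class="title_wrapper">Some text. Some <span>more</span> text.
<span>Even <span>m<span>o</span>re</span> text!</span>
</div>
</doc>"""

tree = etree.fromstring(xml)

# 1. Your approach: join every text node under the first matching element.
print(''.join(tree.xpath('//div[@class="title_wrapper"]')[0].itertext()))

# 2. string(): string value of the first matching node, whitespace preserved.
print(tree.xpath('string(//div[@class="title_wrapper"])'))

# 3. normalize-space(): same text, trimmed and with whitespace runs collapsed.
print(tree.xpath('normalize-space(//div[@class="title_wrapper"])'))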


Output



Some text. Some more text.
Even more text!

Some text. Some more text.
Even more text!

Some text. Some more text. Even more text!
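
If you already hold the element, you can also evaluate normalize-space(.) relative to it. Keep in mind that indexing with [0] raises IndexError when nothing matches, so the sketch below guards against that. The helper name node_text and the sample markup are my own, not part of lxml.

```python
from lxml import etree


def node_text(tree, path):
    """Whitespace-normalized text of the first element matching `path`, or ''."""
    matches = tree.xpath(path)
    if not matches:
        # Indexing an empty result with [0] would raise IndexError.
        return ''
    # '.' is the context node, so only the matched element is considered.
    return matches[0].xpath('normalize-space(.)')


# Hypothetical sample document for illustration.
tree = etree.fromstring(
    '<doc><div class="title_wrapper">Some <span>more</span>\n  text.</div></doc>'
)

found = node_text(tree, '//div[@class="title_wrapper"]')
missing = node_text(tree, '//div[@class="no_such_class"]')
print(repr(found))
print(repr(missing))
```

Whether to return an empty string or raise for a missing element is a design choice; returning '' mirrors what the XPath string functions do for an empty node-set.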





answered Nov 7 at 19:55

Daniel Haley
