Get all text in an lxml node


I am using the following approach to get all the text (not the HTML, but the actual text it contains) within an element node:

```python
''.join(node.xpath('//div[@class="title_wrapper"]')[0].itertext())
```

Is there a cleaner approach to doing this?

asked Nov 7 at 18:51

David542


You can also try node.xpath('//div[@class="title_wrapper"]')[0].text_content()
– Andersson
Nov 8 at 12:35
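A minimal sketch of the `text_content()` route. It assumes the markup is parsed with `lxml.html`, because `text_content()` is defined on `lxml.html` elements and not on plain `lxml.etree` elements. The sample markup is made up for illustration.

```python
import lxml.html

# Hypothetical sample markup, for illustration only.
html = """<html><body>
<div class="title_wrapper">Some text. Some <span>more</span> text.</div>
</body></html>"""

# lxml.html.fromstring() returns an HtmlElement, which has text_content().
root = lxml.html.fromstring(html)

# xpath() always returns a list; take the first matching element.
div = root.xpath('//div[@class="title_wrapper"]')[0]

# text_content() returns all descendant text with the markup removed.
text = div.text_content()
print(text)
```

If the document was parsed with `lxml.etree` instead, the `itertext()` and XPath `string()` approaches remain available.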


1 Answer

Accepted answer:

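You could use XPath's string() function, which returns the string-value of a node. For an element, that is the concatenation of all of its descendant text nodes in document order.

If mixed content leaves large chunks of whitespace, you could use XPath's normalize-space() function instead. It strips leading and trailing whitespace and collapses each internal run of whitespace into a single space.

Here is an example of all three approaches (yours and my two):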

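Python

```python
from lxml import etree

xml = """<doc>
    <div class="title_wrapper">Some text. Some <span>more</span> text. 
    <span>Even <span>m<span>o</span>re</span> text!</span>
    </div>
</doc>"""

tree = etree.fromstring(xml)

# 1. Your approach: join every text fragment yielded by itertext().
print(''.join(tree.xpath('//div[@class="title_wrapper"]')[0].itertext()))

# 2. XPath string(): the string-value of the first matching element.
print(tree.xpath('string(//div[@class="title_wrapper"])'))

# 3. XPath normalize-space(): the same text with whitespace collapsed.
print(tree.xpath('normalize-space(//div[@class="title_wrapper"])'))
```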

Output

```
Some text. Some more text. 
    Even more text!
    
Some text. Some more text. 
    Even more text!
    
Some text. Some more text. Even more text!
```
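When you already hold the element (for example, inside a loop), you can call `string()` and `normalize-space()` with no argument. They then operate on the context node, which is the element you call `.xpath()` on. Here is a sketch using a trimmed-down version of the sample document above:

```python
from lxml import etree

xml = """<doc>
    <div class="title_wrapper">Some text. Some <span>more</span> text.
    <span>Even <span>m<span>o</span>re</span> text!</span>
    </div>
</doc>"""

tree = etree.fromstring(xml)
div = tree.xpath('//div[@class="title_wrapper"]')[0]

# With no argument, string() and normalize-space() use the context node (div).
raw = div.xpath('string()')
clean = div.xpath('normalize-space()')

print(repr(raw))
print(clean)
```

Here `raw` holds the same text as `''.join(div.itertext())`, and `clean` holds the whitespace-collapsed form.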

answered Nov 7 at 19:55

Daniel Haley

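One caveat applies when several elements match. In XPath 1.0, `string()` and `normalize-space()` applied to a node-set convert only the first node in document order, so `string(//div[@class="title_wrapper"])` silently ignores any later matches. Similarly, a path starting with `//` searches the whole document even when it is called on an element. Use `.//` to search only beneath that element. Here is a small sketch with two made-up matching divs:

```python
from lxml import etree

# Hypothetical document with two matching divs, one nested in <section>.
doc = etree.fromstring(
    '<doc>'
    '<div class="title_wrapper">First</div>'
    '<section><div class="title_wrapper">Second</div></section>'
    '</doc>'
)

# string() on a node-set only converts the first node.
first_only = doc.xpath('string(//div[@class="title_wrapper"])')

# To get every match, select the elements and convert each one.
all_texts = [d.xpath('string()') for d in doc.xpath('//div[@class="title_wrapper"]')]

section = doc.find('section')
# '//' starts from the document root, even when called on <section> ...
from_root = section.xpath('//div[@class="title_wrapper"]')
# ... while './/' only searches the descendants of <section>.
from_section = section.xpath('.//div[@class="title_wrapper"]')

print(first_only)
print(all_texts)
print(len(from_root), len(from_section))
```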
