Get all text in an lxml node
I am using the following approach to get all the text (the actual text content, not the HTML markup) within an element node:
''.join(node.xpath('//div[@class="title_wrapper"]')[0].itertext())
Is there a cleaner approach to doing this?
python lxml
You can also try node.xpath('//div[@class="title_wrapper"]')[0].text_content()
– Andersson
Nov 8 at 12:35
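One caveat on the comment above: as far as I know, text_content() is defined on lxml.html elements (HtmlElement), not on plain lxml.etree elements, so it only applies if the document was parsed with lxml.html. A minimal sketch, assuming that is the case; the sample markup and variable names are made up for illustration:

```python
from lxml import html

# Hypothetical sample markup; parsing with lxml.html yields HtmlElement
# objects, which provide text_content().
doc = html.fromstring(
    '<div><div class="title_wrapper">Some <span>more</span> text.</div></div>'
)
node = doc.xpath('//div[@class="title_wrapper"]')[0]

# The approach from the question: join every text node under the element.
via_itertext = ''.join(node.itertext())

# The approach from the comment: one method call, same text.
via_text_content = node.text_content()

print(via_text_content)
```

If the tree comes from lxml.etree instead (as in the answer below), itertext() or the XPath functions are the options to reach for.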
asked Nov 7 at 18:51
David542
1 Answer
accepted
You could use XPath's string() function.
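If you have large chunks of whitespace from mixed content, you could use XPath's normalize-space() function, which strips leading and trailing whitespace and collapses each internal run of whitespace to a single space.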
Example of all three (yours and my two)...
Python
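from lxml import etree

xml = """<doc>
<div class="title_wrapper">Some text. Some <span>more</span> text.
<span>Even <span>m<span>o</span>re</span> text!</span>
</div>
</doc>"""

tree = etree.fromstring(xml)

# 1. The question's approach: join every text node under the first match.
print(''.join(tree.xpath('//div[@class="title_wrapper"]')[0].itertext()))
# 2. XPath string(): the string value of the first matching node.
print(tree.xpath('string(//div[@class="title_wrapper"])'))
# 3. XPath normalize-space(): the same text with whitespace trimmed and collapsed.
print(tree.xpath('normalize-space(//div[@class="title_wrapper"])'))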
Output
Some text. Some more text.
Even more text!
Some text. Some more text.
Even more text!
Some text. Some more text. Even more text!
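Two details worth keeping in mind with the XPath versions: string() and normalize-space() applied to a node-set use only the first node in document order, and an expression starting with // searches the whole document even when called on a sub-element. If you already hold the element, a relative expression such as string(.) avoids both surprises. A small sketch of that idea; the helper name element_text and the sample XML are hypothetical, not part of lxml:

```python
from lxml import etree


def element_text(element, normalize=False):
    """Return the text of `element` and all of its descendants.

    Hypothetical helper for illustration. The "." keeps the XPath
    relative to `element` instead of searching the whole document.
    """
    expr = 'normalize-space(.)' if normalize else 'string(.)'
    return element.xpath(expr)


# Made-up sample document with a line break inside the mixed content.
xml = (
    '<doc><div class="title_wrapper">Some <span>more</span>\n   text.</div>'
    '<div class="other">Other</div></doc>'
)
tree = etree.fromstring(xml)
div = tree.xpath('//div[@class="title_wrapper"]')[0]

raw = element_text(div)                     # whitespace kept as-is
clean = element_text(div, normalize=True)   # whitespace trimmed and collapsed

print(repr(raw))
print(repr(clean))
```

Whether this counts as cleaner than ''.join(element.itertext()) is a matter of taste; the main gain is that whitespace normalization comes for free.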
answered Nov 7 at 19:55
Daniel Haley