Using BeautifulSoup to extract text from div
I am using the following snippet to parse a section of HTML from the link below. The divs I'm after look like this:
<div id="avg-price" class="price big-price">4.02</div>
<div id="best-price" class="price big-price">0.20</div>
<div id="worst-price" class="price big-price">15.98</div>
This is the code I am attempting to use:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://herf.io/bids?search=tatuaje%20tattoo')
soup = BeautifulSoup(r.text, 'html.parser')
avgPrice = soup.find("div", {"id": "avg-price"})
lowPrice = soup.find("div", {"id": "best-price"})
highPrice = soup.find("div", {"id": "worst-price"})
print(avgPrice)
print(lowPrice)
print(highPrice)
print("Average Price: {}".format(avgPrice))
print("Low Price: {}".format(lowPrice))
print("High Price: {}".format(highPrice))
However, the output does not include the price text inside the divs; the result looks like:
<div class="price big-price" id="avg-price"></div>
<div class="price big-price" id="best-price"></div>
<div class="price big-price" id="worst-price"></div>
Average Price: <div class="price big-price" id="avg-price"></div>
Low Price: <div class="price big-price" id="best-price"></div>
High Price: <div class="price big-price" id="worst-price"></div>
Any ideas? I'm sure I'm overlooking something small, but I'm at my wits' end right now.
python html parsing beautifulsoup
Selenium, because JavaScript. – Kamikaze_goldfish, Nov 8 at 2:51
The values are generated by executing some JS code and aren't included in r.text. If you can only use requests, make the same requests a browser does. – halfelf, Nov 8 at 2:53
asked Nov 8 at 2:43 by ramseys1990
3 Answers
Accepted answer (kcorlidy, answered Nov 8 at 5:33):
Of course you can — but only when the data doesn't have to be computed by JavaScript, and here it doesn't. For this website you can use Fiddler to figure out which URL the JavaScript loads the data from, and then fetch the JSON (or whatever it returns) directly. Below is an easy example I put together after using Fiddler to find out where the data comes from. Remember that you need to set verify=False when you use the Fiddler certificate.
import requests

with requests.Session() as se:
    se.headers = {
        "X-Requested-With": "XMLHttpRequest",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36",
        "Referer": "https://herf.io/bids?search=tatuaje%20tattoo",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Accept-Encoding": "gzip, deflate, br",
    }
    data = [
        "search=tatuaje+tattoo",
        "types=",
        "sites=",
    ]
    cookies = {
        "Cookie": "connect.sid=s%3ANYNh5s6LzCVWY8yE9Gra8lxj9OGHPAK_.vGiBmTXvfF4iDScBF94YOXFDmC80PQxY%2FX9FLQ23hYI",
    }
    url = "https://herf.io/bids/search/open"
    price = "https://herf.io/bids/search/stats"
    req = se.post(price, data="&".join(data), cookies=cookies, verify=False)
    print(req.text)
Output
{"bottomQuarter":4.4,"topQuarter":3.31,"median":3.8,"mean":4.03,"stddev":1.44,"moe":0.08,"good":2.59,"great":1.14,"poor":5.47,"bad":6.91,"best":0.2,"worst":15.98,"count":1121}
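Since the endpoint returns JSON, it is arguably cleaner to decode it (with requests, req.json() does this directly) and map the fields back to the page's three divs — mean → avg-price, best → best-price, worst → worst-price, a mapping inferred from the values shown above. A sketch using the standard-library json module on a trimmed copy of that response:

```python
import json

# Trimmed copy of the /bids/search/stats response shown above.
body = '{"mean":4.03,"best":0.2,"worst":15.98,"count":1121}'

stats = json.loads(body)  # with requests: stats = req.json()
print("Average Price: {}".format(stats["mean"]))
print("Low Price: {}".format(stats["best"]))
print("High Price: {}".format(stats["worst"]))
```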
this is a MUCH better way of doing it than what I was looking at! Thanks! – ramseys1990, Nov 8 at 5:59
Answer (Chris D'mello, answered Nov 8 at 2:52):
Try avgPrice.text (note: find() returns a single Tag, not a list, so there is no [0] to index). For the rest, do the same.
Problem is that he can't scrape the data in the first place. Unless you know how to scrape JavaScript-rendered data without Selenium? – Kamikaze_goldfish, Nov 8 at 3:01
Answer (MrBear, answered Nov 8 at 2:52, edited yesterday):
You can strip out the text using the .text attribute:
print("Average Price: {}".format(avgPrice.text))
print("Low Price: {}".format(lowPrice.text))
print("High Price: {}".format(highPrice.text))
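For what it's worth, the same element-versus-text distinction exists outside BeautifulSoup. A minimal sketch with the standard library's xml.etree.ElementTree on a static copy of one of the question's divs (used here only because it needs no third-party install; the mechanics mirror bs4's .text):

```python
import xml.etree.ElementTree as ET

# A static copy of one of the divs from the question.
div = ET.fromstring('<div id="avg-price" class="price big-price">4.02</div>')

print(div)       # the element object itself, not its contents
print(div.text)  # the text inside the tag: 4.02
```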