Using BeautifulSoup to extract text from div












I am using the following snippet to parse a section of HTML from the page linked in the code below. The divs look like this:



<div id="avg-price" class="price big-price">4.02</div>
<div id="best-price" class="price big-price">0.20</div>
<div id="worst-price" class="price big-price">15.98</div>


This is the code I am using:



import requests, urllib.parse
from bs4 import BeautifulSoup, element
r = requests.get('https://herf.io/bids?search=tatuaje%20tattoo')
soup = BeautifulSoup(r.text, 'html.parser')

avgPrice = soup.find("div", {"id": "avg-price"})
lowPrice = soup.find("div", {"id": "best-price"})
highPrice = soup.find("div", {"id": "worst-price"})

print(avgPrice)
print(lowPrice)
print(highPrice)
print("Average Price: {}".format(avgPrice))
print("Low Price: {}".format(lowPrice))
print("High Price: {}".format(highPrice))


However, the output does not include the prices inside the divs. The result looks like this:



<div class="price big-price" id="avg-price"></div>
<div class="price big-price" id="best-price"></div>
<div class="price big-price" id="worst-price"></div>
Average Price: <div class="price big-price" id="avg-price"></div>
Low Price: <div class="price big-price" id="best-price"></div>
High Price: <div class="price big-price" id="worst-price"></div>


Any ideas? I'm sure I'm overlooking something small, but I'm at my wits' end right now, haha.











python html parsing beautifulsoup







asked Nov 8 at 2:43 by ramseys1990








  • Use Selenium, because the values are generated by JavaScript.
    – Kamikaze_goldfish
    Nov 8 at 2:51

  • The values are generated by executing some JavaScript code, so they are not included in r.text. If you can only use requests, make the same requests the browser makes.
    – halfelf
    Nov 8 at 2:53
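A rough way to confirm halfelf's diagnosis is to check whether the prices seen in the browser occur anywhere in the HTML that requests received. The sketch below does plain string matching on the markup from the question's output; RAW_HTML, BROWSER_VALUES and missing_from_raw are illustrative names, and with a live response you would pass r.text instead of RAW_HTML. Substring matching is crude (another number on the page could happen to contain "0.20"), so treat the result as a hint rather than proof.

```python
from typing import List

# Markup that requests returned, copied from the question's output
RAW_HTML = """
<div class="price big-price" id="avg-price"></div>
<div class="price big-price" id="best-price"></div>
<div class="price big-price" id="worst-price"></div>
"""

# Prices visible in the browser, copied from the question
BROWSER_VALUES = ["4.02", "0.20", "15.98"]


def missing_from_raw(html: str, values: List[str]) -> List[str]:
    """Return the values that do not occur anywhere in the raw HTML string."""
    return [value for value in values if value not in html]


# Every price is missing here, so no HTML parser can find it in this response
print(missing_from_raw(RAW_HTML, BROWSER_VALUES))
```

If the list comes back empty for a live r.text, the values are in the static HTML and BeautifulSoup alone should be able to reach them.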

















3 Answers
Accepted answer

Of course you can scrape it without Selenium, but only when the data does not need to be calculated by JavaScript, and that is the case here!
On this site you can use Fiddler to find out which URL the JavaScript calls to load the data, and then request the JSON (or whatever format it returns) from that URL directly. Here is a simple example, written after using Fiddler to find out where the data comes from. Remember to set verify=False while the traffic goes through Fiddler, because requests will not trust Fiddler's certificate.



import requests

stats_url = "https://herf.io/bids/search/stats"  # returns the price statistics as JSON
open_url = "https://herf.io/bids/search/open"    # not used in this example

with requests.Session() as se:
    # Headers copied from the browser's XHR request (update keeps the Session defaults)
    se.headers.update({
        "X-Requested-With": "XMLHttpRequest",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36",
        "Referer": "https://herf.io/bids?search=tatuaje%20tattoo",
    })

    # Form fields; requests URL-encodes the dict and sets the form Content-Type itself
    data = {"search": "tatuaje tattoo", "types": "", "sites": ""}

    # Cookie name -> value (not {"Cookie": ...}); this session id has probably expired, use your own
    cookies = {"connect.sid": "s%3ANYNh5s6LzCVWY8yE9Gra8lxj9OGHPAK_.vGiBmTXvfF4iDScBF94YOXFDmC80PQxY%2FX9FLQ23hYI"}

    # verify=False only because Fiddler re-signs HTTPS traffic with its own certificate
    req = se.post(stats_url, data=data, cookies=cookies, verify=False)
    print(req.text)


Output

{"bottomQuarter":4.4,"topQuarter":3.31,"median":3.8,"mean":4.03,"stddev":1.44,"moe":0.08,"good":2.59,"great":1.14,"poor":5.47,"bad":6.91,"best":0.2,"worst":15.98,"count":1121}







answered Nov 8 at 5:33 by kcorlidy












  • This is a MUCH better way of doing it than what I was looking at! Thanks!
    – ramseys1990
    Nov 8 at 5:59
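Once the stats endpoint returns JSON, no HTML parsing is needed. The following is a minimal sketch that decodes the sample response from the accepted answer with the standard json module and prints the three figures the question asks for. The mapping of "mean", "best" and "worst" onto the avg-price, best-price and worst-price divs is an assumption: best and worst match the page exactly, while mean (4.03) differs slightly from the 4.02 in the question, possibly because the data changed between the two snapshots. STATS_JSON and summarize_prices are hypothetical names.

```python
import json

# Sample body copied from the accepted answer's output; a live request returns current figures
STATS_JSON = (
    '{"bottomQuarter":4.4,"topQuarter":3.31,"median":3.8,"mean":4.03,'
    '"stddev":1.44,"moe":0.08,"good":2.59,"great":1.14,"poor":5.47,'
    '"bad":6.91,"best":0.2,"worst":15.98,"count":1121}'
)


def summarize_prices(body: str) -> dict:
    """Map the stats JSON onto the three labels from the question.

    Assumed mapping: "mean" -> avg-price, "best" -> best-price, "worst" -> worst-price.
    """
    stats = json.loads(body)
    return {
        "Average Price": stats["mean"],
        "Low Price": stats["best"],
        "High Price": stats["worst"],
    }


for label, value in summarize_prices(STATS_JSON).items():
    # Two decimals, the way the page displays prices (0.2 -> 0.20)
    print("{}: {:.2f}".format(label, value))
```

With a live response, pass req.text to summarize_prices, or call req.json() to get the dict without the json.loads step.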


















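Try

soup.find_all("div", {"id": "avg-price"})[0].text

find_all() returns a list, so [0] picks the first match. On the single Tag returned by find(), use avgPrice.text instead; avgPrice[0] looks up an HTML attribute named 0 and raises a KeyError.

For the rest, do the same.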






answered Nov 8 at 2:52 by Chris D'mello












  • The problem is that he can't scrape the data in the first place. Unless you know how to scrape JavaScript-rendered data without Selenium?
    – Kamikaze_goldfish
    Nov 8 at 3:01
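Whether [0] is valid depends on whether the result came from find() or find_all(). Here is a minimal sketch on the rendered markup from the question (the values as they appear in the browser, not what requests returns), assuming bs4 is installed and using the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Rendered markup from the question (after the browser's JavaScript has run)
HTML = """
<div id="avg-price" class="price big-price">4.02</div>
<div id="best-price" class="price big-price">0.20</div>
<div id="worst-price" class="price big-price">15.98</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns a single Tag (or None), so read .text directly
avg_tag = soup.find("div", {"id": "avg-price"})
avg_from_find = avg_tag.text

# find_all() returns a list-like ResultSet, which is where [0] makes sense
avg_from_find_all = soup.find_all("div", {"id": "avg-price"})[0].text

# Indexing a Tag looks up an HTML attribute, so avg_tag[0] raises KeyError
try:
    avg_tag[0]
    indexing_tag_fails = False
except KeyError:
    indexing_tag_fails = True

print(avg_from_find, avg_from_find_all, indexing_tag_fails)
```

Against the HTML that requests actually returned, both forms would give an empty string, which is the separate JavaScript problem raised in the comments.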


















You can extract the text with the .text attribute:

print("Average Price: {}".format(avgPrice.text))
print("Low Price: {}".format(lowPrice.text))
print("High Price: {}".format(highPrice.text))





edited yesterday
answered Nov 8 at 2:52 by MrBear
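Because requests sees empty divs on this page, .text returns an empty string instead of raising an error, which makes the failure easy to miss. Below is a minimal sketch of a helper that returns None when the div is missing or empty, assuming bs4 is installed; price_text, RAW_HTML and RENDERED_HTML are hypothetical names, with the markup taken from the question.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Markup as requests received it (empty div, per the question's output)
RAW_HTML = '<div class="price big-price" id="avg-price"></div>'
# Markup after the browser's JavaScript has filled in the value
RENDERED_HTML = '<div id="avg-price" class="price big-price">4.02</div>'


def price_text(html: str, div_id: str) -> Optional[str]:
    """Return the stripped text of <div id=div_id>, or None if it is missing or empty."""
    tag = BeautifulSoup(html, "html.parser").find("div", {"id": div_id})
    if tag is None:
        # find() found no such div; calling .text on None would raise AttributeError
        return None
    text = tag.get_text(strip=True)
    # On this page an empty string is what requests sees before JavaScript runs
    return text or None


print(price_text(RENDERED_HTML, "avg-price"))  # prints 4.02
print(price_text(RAW_HTML, "avg-price"))       # prints None
```

Returning None makes the "JavaScript-filled" case explicit, so the caller can fall back to the JSON endpoint or a browser-based approach instead of silently printing blanks.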





























