Python scraping 'things to do' from tripadvisor



























From this page, I want to scrape the list 'Types of Things to Do in Miami' (you can find it near the end of the page). Here's what I have so far:



import requests
from bs4 import BeautifulSoup

# Define header to prevent errors
user_agent = "Mozilla/44.0.2 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/9.0.2"

headers = {'User-Agent': user_agent}

new_url = "https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html"
# Get response from url
response = requests.get(new_url, headers = headers)
# Encode response for parsing
html = response.text.encode('utf-8')
# Soupify response
soup = BeautifulSoup(html, "lxml")

tag_elements = soup.findAll("a", {"class":"attractions-attraction-overview-main-Pill__pill--23S2Q"})

# Iterate over tag_elements and extract strings
tags_list = []
for i in tag_elements:
    tags_list.append(i.string)


The problem is, I get values like 'Good for Couples (201)', 'Good for Big Groups (130)', and 'Good for Kids (100)', which come from the 'Commonly Searched For in Miami' area of the page, below the "Types of Things..." section. I also don't get some of the values I need, like "Traveler Resources (7)", "Day Trips (7)", etc. The class names for both lists ("Things to do..." and "Commonly searched...") are the same, and I'm matching on class in soup.findAll(), which I guess is the cause of the problem. What is the correct way to do this? Is there some other approach that I should take?
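One way to avoid picking up the 'Commonly Searched For' pills when both lists share a class is to anchor on the section heading text and restrict the search to that section's subtree. A minimal, self-contained sketch of the idea (the markup and class names below are stand-ins, not TripAdvisor's real ones):

```python
from bs4 import BeautifulSoup

# Stand-in markup: two sections that share the same pill class,
# mirroring the structure described in the question.
html = """
<div class="section">
  <div class="title">Types of Things to Do in Miami</div>
  <a class="pill">Tours (277)</a>
  <a class="pill">Day Trips (7)</a>
</div>
<div class="section">
  <div class="title">Commonly Searched For in Miami</div>
  <a class="pill">Good for Couples (201)</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Find the heading by its text, then search only within its parent section.
heading = soup.find("div", string="Types of Things to Do in Miami")
section = heading.find_parent("div", class_="section")
tags_list = [a.get_text() for a in section.find_all("a", class_="pill")]
print(tags_list)  # ['Tours (277)', 'Day Trips (7)']
```

The same pattern works on the real page once you identify the heading element and the container that wraps each list in the live HTML.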










      python web-scraping beautifulsoup tripadvisor






      asked Nov 23 '18 at 20:58









      Vishesh Shrivastav

          4 Answers






































          Getting only the contents under the Types of Things to Do in Miami header is a little bit tricky. To do so you need to define the selectors in an organized manner, as I did below. The following script should click on the See all button under that header. Once the click is initiated, the script will parse the relevant content you are looking for:



          from selenium import webdriver
          from selenium.webdriver.support import ui
          from bs4 import BeautifulSoup

          driver = webdriver.Chrome()
          wait = ui.WebDriverWait(driver, 10)
          driver.get("https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html")

          show_more = wait.until(lambda driver: driver.find_element_by_css_selector("[class='ui_container'] div:nth-of-type(1) .caret-down"))
          driver.execute_script("arguments[0].click();",show_more)
          soup = BeautifulSoup(driver.page_source,"lxml")
          items = [item.text for item in soup.select("[class='ui_container'] div:nth-of-type(1) a[href^='/Attractions-']")]
          print(items)
          driver.quit()


          The output it produces:



          ['Tours (277)', 'Outdoor Activities (255)', 'Boat Tours & Water Sports (184)', 'Shopping (126)', 'Nightlife (126)', 'Spas & Wellness (109)', 'Fun & Games (67)', 'Transportation (66)', 'Museums (61)', 'Sights & Landmarks (54)', 'Nature & Parks (54)', 'Food & Drink (27)', 'Concerts & Shows (25)', 'Classes & Workshops (22)', 'Zoos & Aquariums (7)', 'Traveler Resources (7)', 'Day Trips (7)', 'Water & Amusement Parks (5)', 'Casinos & Gambling (3)', 'Events (2)']
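If you need the category name and the count separately, the strings in a list like the one above can be split with a small regex (a sketch; it assumes every entry ends with a "(N)" count, as the output above does):

```python
import re

def split_tag(tag):
    # "Tours (277)" -> ("Tours", 277); assumes a trailing "(N)" count.
    m = re.fullmatch(r"(.+?)\s*\((\d+)\)", tag)
    return (m.group(1), int(m.group(2)))

print(split_tag("Tours (277)"))    # ('Tours', 277)
print(split_tag("Day Trips (7)"))  # ('Day Trips', 7)
```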





          answered Nov 24 '18 at 5:30 (edited Nov 24 '18 at 5:35) – SIM















































            This is pretty straightforward to do in the browser:



            filters = driver.execute_script("return [...document.querySelectorAll('.filterName a')].map(a => a.innerText)")





            answered Nov 23 '18 at 23:52 – pguardiario













































              Looks like you'll need to use Selenium. The problem is that the dropdown doesn't show the remaining options until after you click it.



              from selenium import webdriver
              from selenium.webdriver.chrome.options import Options
              from bs4 import BeautifulSoup
              from selenium.webdriver.common.by import By
              from selenium.webdriver.support.ui import WebDriverWait
              from selenium.webdriver.support import expected_conditions as EC

              options = Options()
              driver = webdriver.Chrome(options=options)
              driver.get('https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html')

              WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="component_3"]/div/div/div[12]/div[1]/div/div/div/div[1]/span')))


              driver.execute_script("arguments[0].scrollIntoView();", driver.find_element_by_xpath('//*[@id="component_3"]/div/div/div[12]/div[1]/div/div/div/div[1]/span'))
              driver.execute_script("arguments[0].click();", driver.find_element_by_xpath('//*[@id="component_3"]/div/div/div[12]/div[1]/div/div/div/div[1]/span'))


              html = driver.page_source
              soup = BeautifulSoup(html, 'lxml')

              items = soup.findAll('a', {'class':'attractions-attraction-overview-main-Pill__pill--23S2Q'})
               # You could extend this to grab each item's ['href'] as well as its text.

               for item in items:
                   print(item.get_text())


              driver.quit()
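The comment in the snippet above mentions grabbing each pill's href as well as its text; a minimal sketch of that on stand-in markup (the class name is copied from the answer, but the href value is hypothetical):

```python
from bs4 import BeautifulSoup

# Stand-in pill markup; the href value here is made up for illustration.
html = """
<a class="attractions-attraction-overview-main-Pill__pill--23S2Q"
   href="/Attractions-example.html">Tours (277)</a>
"""

soup = BeautifulSoup(html, "html.parser")
pairs = [(a.get_text(), a["href"])
         for a in soup.find_all(
             "a", class_="attractions-attraction-overview-main-Pill__pill--23S2Q")]
print(pairs)  # [('Tours (277)', '/Attractions-example.html')]
```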


















































                I think you need to click the show more link to see everything available, so use something like Selenium. This version includes waits to ensure all elements are present and the dropdown is clickable.



                from selenium import webdriver
                from selenium.webdriver.support.ui import WebDriverWait
                from selenium.webdriver.support import expected_conditions as EC
                from selenium.webdriver.common.by import By

                d = webdriver.Chrome()
                d.get("https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html")
                WebDriverWait(d,5).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".filter_list_0 div a")))
                WebDriverWait(d, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#taplc_attraction_filters_clarity_0 span.ui_icon.caret-down"))).click()
                tag_elements = WebDriverWait(d,5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".filter_list_0 div a")))
                tags_list = [i.text for i in tag_elements]
                print(tags_list)
                d.quit()









                Without Selenium I only get 15 items:



                import requests
                from bs4 import BeautifulSoup

                user_agent = "Mozilla/44.0.2 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/9.0.2"
                headers = {'User-Agent': user_agent}
                new_url = "https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html"
                response = requests.get(new_url, headers = headers)
                soup = BeautifulSoup(response.content, "lxml")
                tag_elements = soup.select('#component_3 > div > div > div:nth-of-type(12) > div:nth-of-type(1) > div > div a')

                tags_list = [i.text for i in tag_elements]
                print(tags_list)































                • The line WebDriverWait(d,5).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".filter_list_0 div a"))) raises a TimeoutException: Message: with no message displayed. I changed the timeout to 10 and then 20, but the result is the same.

                  – Vishesh Shrivastav
                  Nov 24 '18 at 2:36











                • Odd. What happens if you comment out that line and increase the wait on the next line to 10? You can always execute_script on the dropdown otherwise to move things along.

                  – QHarr
                  Nov 24 '18 at 5:52
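For context on the TimeoutException discussed above: WebDriverWait.until is essentially a poll loop, so raising the timeout only helps if the element eventually appears. A pure-Python stand-in of the polling pattern (a simplified sketch, no Selenium required):

```python
import time

class TimeoutException(Exception):
    pass

def wait_until(condition, timeout=5, poll=0.1):
    # Simplified version of WebDriverWait.until: call the predicate until
    # it returns a truthy value or the deadline passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutException(f"condition not met within {timeout}s")

# Toy condition that only becomes truthy on the third poll.
calls = {"n": 0}
def ready():
    calls["n"] += 1
    return "element" if calls["n"] >= 3 else None

print(wait_until(ready, timeout=2))  # element
```

If the condition never becomes truthy, this raises just like Selenium's wait does, which is what the comment above is observing.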














                4 Answers
                4






                active

                oldest

                votes








                4 Answers
                4






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                1














                To get only the contents within Types of Things to Do in Miami headers is a little bit tricky. To do so you need to define the selectors in an organized manner like I did below. The following script should click on the See all buton under the aforesaid headers. Once the click is initiated, the script will parse the relevant content you look for:



                from selenium import webdriver
                from selenium.webdriver.support import ui
                from bs4 import BeautifulSoup

                driver = webdriver.Chrome()
                wait = ui.WebDriverWait(driver, 10)
                driver.get("https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html")

                show_more = wait.until(lambda driver: driver.find_element_by_css_selector("[class='ui_container'] div:nth-of-type(1) .caret-down"))
                driver.execute_script("arguments[0].click();",show_more)
                soup = BeautifulSoup(driver.page_source,"lxml")
                items = [item.text for item in soup.select("[class='ui_container'] div:nth-of-type(1) a[href^='/Attractions-']")]
                print(items)
                driver.quit()


                The output It produces:



                ['Tours (277)', 'Outdoor Activities (255)', 'Boat Tours & Water Sports (184)', 'Shopping (126)', 'Nightlife (126)', 'Spas & Wellness (109)', 'Fun & Games (67)', 'Transportation (66)', 'Museums (61)', 'Sights & Landmarks (54)', 'Nature & Parks (54)', 'Food & Drink (27)', 'Concerts & Shows (25)', 'Classes & Workshops (22)', 'Zoos & Aquariums (7)', 'Traveler Resources (7)', 'Day Trips (7)', 'Water & Amusement Parks (5)', 'Casinos & Gambling (3)', 'Events (2)']





                share|improve this answer






























                  1














                  To get only the contents within Types of Things to Do in Miami headers is a little bit tricky. To do so you need to define the selectors in an organized manner like I did below. The following script should click on the See all buton under the aforesaid headers. Once the click is initiated, the script will parse the relevant content you look for:



                  from selenium import webdriver
                  from selenium.webdriver.support import ui
                  from bs4 import BeautifulSoup

                  driver = webdriver.Chrome()
                  wait = ui.WebDriverWait(driver, 10)
                  driver.get("https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html")

                  show_more = wait.until(lambda driver: driver.find_element_by_css_selector("[class='ui_container'] div:nth-of-type(1) .caret-down"))
                  driver.execute_script("arguments[0].click();",show_more)
                  soup = BeautifulSoup(driver.page_source,"lxml")
                  items = [item.text for item in soup.select("[class='ui_container'] div:nth-of-type(1) a[href^='/Attractions-']")]
                  print(items)
                  driver.quit()


                  The output It produces:



                  ['Tours (277)', 'Outdoor Activities (255)', 'Boat Tours & Water Sports (184)', 'Shopping (126)', 'Nightlife (126)', 'Spas & Wellness (109)', 'Fun & Games (67)', 'Transportation (66)', 'Museums (61)', 'Sights & Landmarks (54)', 'Nature & Parks (54)', 'Food & Drink (27)', 'Concerts & Shows (25)', 'Classes & Workshops (22)', 'Zoos & Aquariums (7)', 'Traveler Resources (7)', 'Day Trips (7)', 'Water & Amusement Parks (5)', 'Casinos & Gambling (3)', 'Events (2)']





                  share|improve this answer




























                    1












                    1








                    1







                    To get only the contents within Types of Things to Do in Miami headers is a little bit tricky. To do so you need to define the selectors in an organized manner like I did below. The following script should click on the See all buton under the aforesaid headers. Once the click is initiated, the script will parse the relevant content you look for:



                    from selenium import webdriver
                    from selenium.webdriver.support import ui
                    from bs4 import BeautifulSoup

                    driver = webdriver.Chrome()
                    wait = ui.WebDriverWait(driver, 10)
                    driver.get("https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html")

                    show_more = wait.until(lambda driver: driver.find_element_by_css_selector("[class='ui_container'] div:nth-of-type(1) .caret-down"))
                    driver.execute_script("arguments[0].click();",show_more)
                    soup = BeautifulSoup(driver.page_source,"lxml")
                    items = [item.text for item in soup.select("[class='ui_container'] div:nth-of-type(1) a[href^='/Attractions-']")]
                    print(items)
                    driver.quit()


                    The output It produces:



                    ['Tours (277)', 'Outdoor Activities (255)', 'Boat Tours & Water Sports (184)', 'Shopping (126)', 'Nightlife (126)', 'Spas & Wellness (109)', 'Fun & Games (67)', 'Transportation (66)', 'Museums (61)', 'Sights & Landmarks (54)', 'Nature & Parks (54)', 'Food & Drink (27)', 'Concerts & Shows (25)', 'Classes & Workshops (22)', 'Zoos & Aquariums (7)', 'Traveler Resources (7)', 'Day Trips (7)', 'Water & Amusement Parks (5)', 'Casinos & Gambling (3)', 'Events (2)']





                    share|improve this answer















                    To get only the contents within Types of Things to Do in Miami headers is a little bit tricky. To do so you need to define the selectors in an organized manner like I did below. The following script should click on the See all buton under the aforesaid headers. Once the click is initiated, the script will parse the relevant content you look for:



                    from selenium import webdriver
                    from selenium.webdriver.support import ui
                    from bs4 import BeautifulSoup

                    driver = webdriver.Chrome()
                    wait = ui.WebDriverWait(driver, 10)
                    driver.get("https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html")

                    show_more = wait.until(lambda driver: driver.find_element_by_css_selector("[class='ui_container'] div:nth-of-type(1) .caret-down"))
                    driver.execute_script("arguments[0].click();",show_more)
                    soup = BeautifulSoup(driver.page_source,"lxml")
                    items = [item.text for item in soup.select("[class='ui_container'] div:nth-of-type(1) a[href^='/Attractions-']")]
                    print(items)
                    driver.quit()


                    The output It produces:



                    ['Tours (277)', 'Outdoor Activities (255)', 'Boat Tours & Water Sports (184)', 'Shopping (126)', 'Nightlife (126)', 'Spas & Wellness (109)', 'Fun & Games (67)', 'Transportation (66)', 'Museums (61)', 'Sights & Landmarks (54)', 'Nature & Parks (54)', 'Food & Drink (27)', 'Concerts & Shows (25)', 'Classes & Workshops (22)', 'Zoos & Aquariums (7)', 'Traveler Resources (7)', 'Day Trips (7)', 'Water & Amusement Parks (5)', 'Casinos & Gambling (3)', 'Events (2)']






                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Nov 24 '18 at 5:35

























                    answered Nov 24 '18 at 5:30









                    SIMSIM

                    10.9k31148




                    10.9k31148

























                        3














                        This is pretty straightforward to do in the browser:



                        filters = driver.execute_script("return [...document.querySelectorAll('.filterName a')].map(a => a.innerText)")





                        share|improve this answer




























                          3














                          This is pretty straightforward to do in the browser:



                          filters = driver.execute_script("return [...document.querySelectorAll('.filterName a')].map(a => a.innerText)")





                          share|improve this answer


























                            3












                            3








                            3







                            This is pretty straightforward to do in the browser:



                            filters = driver.execute_script("return [...document.querySelectorAll('.filterName a')].map(a => a.innerText)")





                            share|improve this answer













                            This is pretty straightforward to do in the browser:



                            filters = driver.execute_script("return [...document.querySelectorAll('.filterName a')].map(a => a.innerText)")






                            share|improve this answer












                            share|improve this answer



                            share|improve this answer










                            answered Nov 23 '18 at 23:52









                            pguardiariopguardiario

                            37k980118




                            37k980118























                                2














                                Looks like you'll need to use selenium. The problem is the dropdown doesn't show the remaining options until after you click it.



                                from selenium import webdriver
                                from selenium.webdriver.chrome.options import Options
                                from bs4 import BeautifulSoup
                                from selenium.webdriver.common.by import By
                                from selenium.webdriver.support.ui import WebDriverWait
                                from selenium.webdriver.support import expected_conditions as EC

                                options = Options()
                                driver = webdriver.Chrome(options=options)
                                driver.get('https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html')

                                WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="component_3"]/div/div/div[12]/div[1]/div/div/div/div[1]/span')))


                                driver.execute_script("arguments[0].scrollIntoView();", driver.find_element_by_xpath('//*[@id="component_3"]/div/div/div[12]/div[1]/div/div/div/div[1]/span'))
                                driver.execute_script("arguments[0].click();", driver.find_element_by_xpath('//*[@id="component_3"]/div/div/div[12]/div[1]/div/div/div/div[1]/span'))


                                html = driver.page_source
                                soup = BeautifulSoup(html, 'lxml')

                                items = soup.findAll('a', {'class':'attractions-attraction-overview-main-Pill__pill--23S2Q'})
                                #You could use this to not just get text but also the ['href'] too.

                                for item in items:
                                print(item.get_text())


                                driver.quit()





                                share|improve this answer




























                                  2














                                  Looks like you'll need to use selenium. The problem is the dropdown doesn't show the remaining options until after you click it.



                                  from selenium import webdriver
                                  from selenium.webdriver.chrome.options import Options
                                  from bs4 import BeautifulSoup
                                  from selenium.webdriver.common.by import By
                                  from selenium.webdriver.support.ui import WebDriverWait
                                  from selenium.webdriver.support import expected_conditions as EC

                                  options = Options()
                                  driver = webdriver.Chrome(options=options)
                                  driver.get('https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html')

                                  WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="component_3"]/div/div/div[12]/div[1]/div/div/div/div[1]/span')))


                                  driver.execute_script("arguments[0].scrollIntoView();", driver.find_element_by_xpath('//*[@id="component_3"]/div/div/div[12]/div[1]/div/div/div/div[1]/span'))
                                  driver.execute_script("arguments[0].click();", driver.find_element_by_xpath('//*[@id="component_3"]/div/div/div[12]/div[1]/div/div/div/div[1]/span'))


                                  html = driver.page_source
                                  soup = BeautifulSoup(html, 'lxml')

                                  items = soup.findAll('a', {'class':'attractions-attraction-overview-main-Pill__pill--23S2Q'})
                                  #You could use this to not just get text but also the ['href'] too.

                                  for item in items:
                                  print(item.get_text())


                                  driver.quit()





                                    answered Nov 23 '18 at 22:48

                                    Kamikaze_goldfish

                                        I think you need to click the "show more" to see all the available options, so use something like selenium. This version includes waits to ensure all elements are present and the dropdown is clickable.



                                        from selenium import webdriver
                                        from selenium.webdriver.support.ui import WebDriverWait
                                        from selenium.webdriver.support import expected_conditions as EC
                                        from selenium.webdriver.common.by import By

                                        d = webdriver.Chrome()
                                        d.get("https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html")
                                        WebDriverWait(d,5).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".filter_list_0 div a")))
                                        WebDriverWait(d, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#taplc_attraction_filters_clarity_0 span.ui_icon.caret-down"))).click()
                                        tag_elements = WebDriverWait(d,5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".filter_list_0 div a")))
                                        tags_list = [i.text for i in tag_elements]
                                        print(tags_list)
                                        d.quit()




                                        [screenshot of the resulting tags_list omitted]





                                        Without selenium I only get 15 items:



                                        import requests
                                        from bs4 import BeautifulSoup

                                        user_agent = "Mozilla/44.0.2 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/9.0.2"
                                        headers = {'User-Agent': user_agent}
                                        new_url = "https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html"
                                        response = requests.get(new_url, headers = headers)
                                        soup = BeautifulSoup(response.content, "lxml")
                                        tag_elements = soup.select('#component_3 > div > div > div:nth-of-type(12) > div:nth-of-type(1) > div > div a')

                                        tags_list = [i.text for i in tag_elements]
                                        print(tags_list)
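The reason a positional selector like `div:nth-of-type(12)` works where a class-based search fails is that it anchors on the section's position in the document rather than on a class both lists share. A minimal sketch with hypothetical, simplified markup (two sections whose links use the same class, like "Types of Things to Do" vs "Commonly Searched For" on the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two sibling sections whose <a> tags share a class.
html = """
<div id="component_3">
  <div><a class="pill">Tours (277)</a><a class="pill">Day Trips (7)</a></div>
  <div><a class="pill">Good for Couples (201)</a></div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
# A class-based search returns links from BOTH sections...
all_links = [a.get_text() for a in soup.select('a.pill')]
# ...while a positional selector scopes the search to the first section only.
first_section = [a.get_text() for a in soup.select('#component_3 > div:nth-of-type(1) a')]
print(all_links)
print(first_section)
```

On the real page the index is 12 rather than 1, but the principle is the same: scope by position, then match links inside that scope.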





                                        • The line WebDriverWait(d,5).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".filter_list_0 div a"))) results in a TimeoutException: Message: with no message displayed. I changed the time to 10 and 20 but it results in the same.

                                          – Vishesh Shrivastav
                                          Nov 24 '18 at 2:36











                                        • Odd. What happens if you comment out that line and increase the wait on the next line to 10? You can always execute_script on the dropdown otherwise to move things along.

                                          – QHarr
                                          Nov 24 '18 at 5:52


















                                        edited Nov 23 '18 at 22:52
                                        answered Nov 23 '18 at 21:59

                                        QHarr
