BeautifulSoup Practice Question (DS240623)

zk6290254938 · August 18, 2023, 4:07pm

https://quotes.toscrape.com/

Using the link above:
1. Scrape the HTML content.
2. Parse the HTML content.
3. Get the title.
4. Get the title's name.
5. Get the title's string.
6. Get the name of the title's parent.
7. Investigate the first hyperlink.
8. Extract the quotes using find() and find_all().
9. Extract all author names.
10. Extract all tags and return the output in array form.

dilip.dilip22 · August 19, 2023, 8:31am

import requests
from bs4 import BeautifulSoup

# 1. Scrape the HTML content
url = "https://quotes.toscrape.com/"
response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error
html_content = response.text

# 2. Parse the HTML content
soup = BeautifulSoup(html_content, "html.parser")

# 3-6. Title tag, its tag name, its string, and its parent's tag name
title = soup.title
title_name = title.name
title_string = title.string
title_parent_name = title.parent.name

# 7. First hyperlink on the page
first_hyperlink = soup.find("a")
first_hyperlink_text = first_hyperlink.text
first_hyperlink_href = first_hyperlink["href"]

# 8. Quotes: find() returns the first match, find_all() returns every match
first_quote = soup.find("span", class_="text").get_text()
quotes = soup.find_all("span", class_="text")
quote_texts = [quote.get_text() for quote in quotes]

# 9. Author names
authors = soup.find_all("small", class_="author")
author_names = [author.get_text() for author in authors]

# 10. Tags, as one list of tag names per quote
tags = soup.find_all("div", class_="tags")
tag_list = [tag.find_all("a", class_="tag") for tag in tags]
tag_names = [[tag_name.get_text() for tag_name in tag] for tag in tag_list]

print("Title:", title)
print("Title Name:", title_name)
print("Title String:", title_string)
print("Title Parent Name:", title_parent_name)
print("First Hyperlink Text:", first_hyperlink_text)
print("First Hyperlink Href:", first_hyperlink_href)
print("First Quote:", first_quote)
print("Quotes:", quote_texts)
print("Authors:", author_names)
print("Tags:", tag_names)
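
Task 8 asks for both find() and find_all(), and task 10 asks for the tags in array form. The answer above returns one list of tags per quote; if a single flat array is wanted instead, something like the sketch below might work. It runs on a small hand-written HTML sample rather than the live site, so SAMPLE_HTML and the helper name summarize_quotes are made up for illustration; the sample only assumes the same span.text and a.tag markup that the selectors above rely on.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the markup targeted by the selectors above.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"First quote."</span>
  <small class="author">Author One</small>
  <div class="tags">
    <a class="tag" href="/tag/life/">life</a>
    <a class="tag" href="/tag/love/">love</a>
  </div>
</div>
<div class="quote">
  <span class="text">"Second quote."</span>
  <small class="author">Author Two</small>
  <div class="tags">
    <a class="tag" href="/tag/love/">love</a>
  </div>
</div>
"""


def summarize_quotes(html):
    """Return (first quote, all quotes, flat list of tag names)."""
    soup = BeautifulSoup(html, "html.parser")

    # find() returns only the first matching element, or None if nothing matches.
    first = soup.find("span", class_="text")
    first_quote = first.get_text() if first is not None else None

    # find_all() returns every matching element as a list.
    all_quotes = [span.get_text() for span in soup.find_all("span", class_="text")]

    # Searching for the <a class="tag"> links directly gives one flat list,
    # with repeats kept when several quotes share a tag.
    flat_tags = [a.get_text() for a in soup.find_all("a", class_="tag")]

    return first_quote, all_quotes, flat_tags


first_quote, all_quotes, flat_tags = summarize_quotes(SAMPLE_HTML)
print("First quote:", first_quote)
print("All quotes:", all_quotes)
print("Flat tags:", flat_tags)
```

To run it against the real page, pass response.text from the answer above in place of SAMPLE_HTML. Wrapping flat_tags in sorted(set(...)) would drop the duplicates if only the distinct tag names matter.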