2024-09-09 web, development, javascript
Web Scraping with Cheerio
By O. Wolfson
In this tutorial, we'll use Axios and Cheerio in Node.js to scrape a website. As an example, we'll fetch quotes by Rodney Dangerfield from the AZQuotes website.
Prerequisites:
- Basic knowledge of JavaScript and Node.js
- Node.js installed on your computer
Step 1: Install required packages
First, create a new folder for your project and navigate to it using the command line. Then initialize a new Node.js project by running npm init -y (the -y flag accepts the default settings).
Next, install Axios and Cheerio by running npm install axios cheerio.
Step 2: Create a web scraping script
Create a new file called scrape_quotes.js to hold the scraping code.
The code consists of several functions:
- fetchQuotes(url): An async function that fetches the quotes from a single page. It uses Axios to download the page's HTML and Cheerio to parse it and extract the quotes. The function is tailored to the structure of the target site: to write one like it, inspect the page with your browser's developer tools to find the elements and attributes that hold the data, then target them with Cheerio's jQuery-like selectors.
- fetchAllQuotes(baseURL, totalPages): An async function that iterates through multiple pages and fetches the quotes from each one using fetchQuotes.
- saveQuotesToFile(quotes, fileName): A function that saves the quotes array to a JSON file with the specified file name.
- main(): An async function that fetches quotes from the AZQuotes website and saves them to a JSON file named quotes.json.
Step 3: Run the web scraping script
Run the web scraping script from your project folder with the command node scrape_quotes.js.
After the script finishes running, you should see a quotes.json file created in your project folder with the scraped quotes.
Step 4: Load quotes from the JSON file (optional)
If you want to load the saved quotes from the JSON file into a JavaScript object later, you can create another script called load_quotes.js.
This script defines a loadQuotesFromFile function that reads the JSON data from the specified file and parses it into a JavaScript object. The main function calls this function to load the quotes and logs them to the console.
Run the script with the command node load_quotes.js.
This should output the quotes as a JavaScript object in the console.
That's it! You've successfully created a web scraping script to fetch quotes using Cheerio and Node.js, and saved the quotes to a JSON file for later use.