How to Download Images from Google Images Using Puppeteer and Node.js

July 31, 2024

O Wolfson

In this article, we'll build a script that automates downloading images found through Google Images, using Puppeteer, Node.js, and a few helper functions. Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It's commonly used for web scraping, page automation, and running headless browsers.

Prerequisites

To follow along, you should have Node.js and npm installed on your system. You also need to install Puppeteer and Axios by running the following command:

bash
npm install puppeteer axios

Project Structure

Here's a quick overview of the files involved in this project:

  • index.js: The main script that handles the image downloading process.
  • search-google-images.js: A helper module to perform the Google Images search based on user input.

The Main Script (index.js)

This script launches a Puppeteer browser instance, opens the Google Images search results page, collects the links that point away from Google, visits each linked page, and downloads every image that is at least 800 by 600 pixels.

Step-by-Step Breakdown

  1. Import Necessary Modules:

    javascript
    const puppeteer = require("puppeteer");
    const fs = require("node:fs");
    const path = require("node:path");
    const axios = require("axios");
    const searchGoogleImages = require("./search-google-images");
    
  2. Function to Download Images:

    The downloadImage function uses Axios to stream and save images to the local file system.

    javascript
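    const downloadImage = async (url, filepath) => {
      // Request the image first, so a failed request does not leave an empty file behind.
      const response = await axios({
        url,
        method: "GET",
        responseType: "stream",
      });
      // Open the file only once the response stream is available, and attach
      // the handlers right away so a write error cannot go unhandled.
      const writer = fs.createWriteStream(filepath);
      response.data.pipe(writer);
      return new Promise((resolve, reject) => {
        writer.on("finish", resolve);
        writer.on("error", reject);
        response.data.on("error", reject);
      });
    };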
    
  3. Ensure Directory Existence:

    This utility function checks if a directory exists and creates it if it doesn't.

    javascript
    const ensureDirectoryExistence = (dir) => {
      if (!fs.existsSync(dir)) {
        fs.mkdirSync(dir, { recursive: true });
      }
    };
    
  4. Main Function:

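    The main function launches the Puppeteer browser, navigates to the Google Images search results page, and collects every link that does not point to Google. It then opens each of those links in a new tab, extracts the URLs of images that meet the minimum size, and downloads them into an images directory next to the script.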

    javascript
    (async () => {
      const browser = await puppeteer.launch({ headless: false });
      const page = await browser.newPage();
    
      const url = await searchGoogleImages();
      await page.goto(url, { waitUntil: "networkidle2" });
    
      const urls = await page.evaluate(() => {
        const anchors = Array.from(document.querySelectorAll("a"));
        return anchors
          .map((anchor) => anchor.href)
          .filter((href) => href && !href.includes("google"));
      });
    
      console.log("Filtered URLs found:", urls);
      fs.writeFileSync("filtered_urls.txt", urls.join("\n"), "utf-8");
      console.log("Filtered URLs have been saved to filtered_urls.txt");
    
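      const imagesDir = path.resolve(__dirname, "images");
      ensureDirectoryExistence(imagesDir);
    
      const MIN_WIDTH = 800;
      const MIN_HEIGHT = 600;
    
      for (let i = 0; i < urls.length; i++) {
        const imagePage = await browser.newPage();
        try {
          await imagePage.goto(urls[i], { waitUntil: "networkidle2" });
          const imageUrls = await imagePage.evaluate(
            (MIN_WIDTH, MIN_HEIGHT) => {
              const images = Array.from(document.querySelectorAll("img"));
              return images
                .filter(
                  (img) =>
                    img.naturalWidth >= MIN_WIDTH && img.naturalHeight >= MIN_HEIGHT
                )
                .map((img) => img.src)
                .filter((src) => src?.startsWith("http"));
            },
            MIN_WIDTH,
            MIN_HEIGHT
          );
    
          for (const imageUrl of imageUrls) {
            const imageFilename = path.basename(new URL(imageUrl).pathname);
            const imageFilepath = path.resolve(imagesDir, imageFilename);
            await downloadImage(imageUrl, imageFilepath);
            console.log(`Downloaded: ${imageFilepath}`);
          }
        } catch (error) {
          console.error(`Failed to process ${urls[i]}:`, error);
        } finally {
          await imagePage.close();
        }
      }
    
      await browser.close();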
    })();
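
The link filter in the main function keeps any href that does not contain the string "google". That also discards unrelated pages that merely mention Google in their path, and it lets through non-web links such as javascript: URLs. As a rough sketch of a stricter variant, the hypothetical filterExternalLinks helper below checks only the hostname, keeps http and https links, and removes duplicates. Its name and the blockedKeyword parameter are illustrative assumptions, not part of the original script, and it runs in Node.js on an array of hrefs rather than inside page.evaluate.

```javascript
// Sketch only: `filterExternalLinks` and `blockedKeyword` are hypothetical names.
const filterExternalLinks = (hrefs, blockedKeyword = "google") => {
  const seen = new Set();
  const result = [];
  for (const href of hrefs) {
    let parsed;
    try {
      parsed = new URL(href);
    } catch {
      continue; // skip empty or malformed hrefs
    }
    // Keep only regular web links (drops javascript:, mailto:, and similar).
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") continue;
    // Match on the hostname only, so "example.org/google-tips" is kept.
    if (parsed.hostname.includes(blockedKeyword)) continue;
    // Skip links that were already collected.
    if (seen.has(parsed.href)) continue;
    seen.add(parsed.href);
    result.push(parsed.href);
  }
  return result;
};
```

To try it in index.js, one option is to return every anchor.href from page.evaluate unfiltered and pass that array through this helper before the download loop.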
    

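Deriving the filename with path.basename(new URL(imageUrl).pathname) works for most URLs, but two images called photo.jpg on different sites would overwrite each other, and a URL whose path ends in a slash yields an empty name. One possible workaround, sketched here under the assumption that a short hash suffix in filenames is acceptable, is a hypothetical buildImageFilename helper. The helper name, the fallbackExt parameter, and the "image" default stem are illustrative choices, not part of the original script.

```javascript
const crypto = require("node:crypto");
const path = require("node:path");

// Sketch only: `buildImageFilename` and `fallbackExt` are hypothetical names.
const buildImageFilename = (imageUrl, fallbackExt = ".jpg") => {
  const { pathname } = new URL(imageUrl);
  // Take the last path segment and replace characters that are awkward in filenames.
  const base = path.posix.basename(pathname).replace(/[^A-Za-z0-9._-]/g, "_");
  const ext = path.posix.extname(base) || fallbackExt;
  // Fall back to a generic stem when the URL path has no usable last segment.
  const stem = path.posix.basename(base, path.posix.extname(base)) || "image";
  // A short hash of the full URL keeps same-named images from different sites apart.
  const hash = crypto.createHash("sha1").update(imageUrl).digest("hex").slice(0, 8);
  return `${stem}-${hash}${ext}`;
};
```

In the download loop, the result could replace imageFilename before it is joined with imagesDir. The same URL always maps to the same name, so re-running the script overwrites a file instead of duplicating it.
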
The Helper Module (search-google-images.js)

This module prompts the user for a search term, navigates to the Google Images search results page, and returns the final URL.

Step-by-Step Breakdown

  1. Import Necessary Modules:

    javascript
    const puppeteer = require("puppeteer");
    const readline = require("node:readline");
    
  2. Function to Get User Input:

    This function prompts the user for a search term.

    javascript
    const getUserInput = (query) => {
      const rl = readline.createInterface({
        input: process.stdin,
        output: process.stdout,
      });
      return new Promise((resolve) =>
        rl.question(query, (ans) => {
          rl.close();
          resolve(ans);
        })
      );
    };
    
  3. Search Google Images:

    This function launches a Puppeteer browser, navigates to the Google Images search results page based on the user input, and returns the final URL.

    javascript
    const searchGoogleImages = async () => {
      const searchTerm = await getUserInput("Enter the search term: ");
    
      const browser = await puppeteer.launch({ headless: false });
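      const page = await browser.newPage();
    
      const searchUrl = `https://www.google.com/search?tbm=isch&q=${encodeURIComponent(searchTerm)}`;
      await page.goto(searchUrl, { waitUntil: "networkidle2" });
    
      const finalUrl = page.url();
      console.log("Final URL:", finalUrl);
    
      await browser.close();
    
      return finalUrl;
    };
    
    module.exports = searchGoogleImages;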
    
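
The URL construction inside searchGoogleImages can also be pulled out into a small pure function, which makes it easy to check without launching a browser. The sketch below assumes a hypothetical buildImageSearchUrl helper. It uses the same Google Images URL template as the module, while the trimming and the empty-input check are additions for illustration.

```javascript
// Sketch only: `buildImageSearchUrl` is a hypothetical helper name.
const buildImageSearchUrl = (searchTerm) => {
  const term = String(searchTerm).trim();
  if (term === "") {
    // Assumption: an empty query is treated as a user error rather than sent to Google.
    throw new Error("Search term must not be empty");
  }
  // encodeURIComponent escapes spaces, ampersands, and other reserved characters.
  return `https://www.google.com/search?tbm=isch&q=${encodeURIComponent(term)}`;
};

console.log(buildImageSearchUrl("red panda"));
// prints "https://www.google.com/search?tbm=isch&q=red%20panda"
```

Encoding matters here because a raw term such as "cats & dogs" would otherwise split into two query parameters at the ampersand.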

Conclusion

In this article, we've walked through a Node.js script that uses Puppeteer to run a Google Images search, follow the result links, and download the high-resolution images it finds. The script can be customized and extended for other web scraping and automation tasks, and the combination of Puppeteer and Node.js offers a flexible way to interact with web pages programmatically.

Feel free to experiment with the code and adapt it for your own projects! Happy coding!

