How to use Puppeteer: installation and quick start

Contents

1. How to use Puppeteer: installation and quick start

1.1 What is Puppeteer and what is it used for

1.2 How to Install Puppeteer

1.2.1 How to install Puppeteer on Debian, Ubuntu, Linux Mint, Kali Linux and derivatives

1.2.2 How to install Puppeteer on Arch Linux, Manjaro, BlackArch and derivatives

1.2.3 How to update Puppeteer

1.3 How to Run Puppeteer

1.4 How to take a screenshot of a page using Puppeteer

1.5 How to change the size of the web browser window in Puppeteer

1.6 How to take a screenshot of the entire page in Puppeteer

1.7 How to save a screenshot in Puppeteer as a JPG

1.8 How to save a web page to PDF in Puppeteer

1.9 How to change User Agent in Puppeteer

1.10 How to emulate phones and other mobile devices in Puppeteer

1.11 How to take a screenshot in WEBP (Web/P image) format in Puppeteer. What screenshot formats does Puppeteer support

1.12 How to take a screenshot of a portion of the screen in Puppeteer

1.13 How to take a screenshot of a specific element in Puppeteer (by ID, by class name, by tag name)

2. Interacting with DOM in Puppeteer: how to get HTML code and extract various tags (text, images, links)

3. Advanced interacting with DOM in Puppeteer: disabling JavaScript, loading HTML without visiting the site, error handling, delaying and scrolling the page

1.1 What is Puppeteer and what is it used for

Puppeteer is a JavaScript library that provides a high-level API for controlling Chrome or Firefox via DevTools Protocol or WebDriver BiDi. By default, Puppeteer works headless (without a visible user interface).

This is the official description, which is not easy to understand. In practical terms, Puppeteer lets you program interaction with web pages as if they were opened in a regular browser, except that every action is written as a JavaScript script and everything runs from the console, without opening a browser window. With JavaScript code you can program a variety of actions: extract text or specific tags, click buttons, enter data into text fields, and log in to sites (as if you entered your login and password and then clicked the “Login” button). You can also get and set cookies, change the size of the browser window, the User Agent string and many other settings, execute arbitrary JavaScript code on the page, and print the result to the console or save it to a file.

In short, with Puppeteer you can do everything you can do in a regular web browser, but every action can be automated.

Strictly speaking, Puppeteer is not a browser itself: it controls a real Chrome or Firefox, by default in headless mode, that is, without opening a graphical window. So Puppeteer is not an imitation of a web browser or something similar to one; the pages are loaded and rendered by a full-fledged browser with all the features of Chrome or Firefox.

We will start with trivial examples and gradually move on to more complex and more practical uses of Puppeteer. This guide is split into several parts.

1.2 How to Install Puppeteer

1.2.1 How to install Puppeteer on Debian, Ubuntu, Linux Mint, Kali Linux and derivatives

Start by installing npm, the Node.js package manager:

sudo apt install npm

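Then run the following command. It installs Puppeteer into the node_modules folder of the current directory (Node.js will find it from scripts in this folder and its subfolders) and downloads a compatible version of Chrome for Puppeteer to use: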

npm i puppeteer

1.2.2 How to install Puppeteer on Arch Linux, Manjaro, BlackArch and derivatives

Start by installing npm, the Node.js package manager:

sudo pacman -S npm

Then run the following command:

npm i puppeteer

1.2.3 How to update Puppeteer

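To update Puppeteer on any distribution, run the following command in the folder where you installed it:

npm update

npm update stays within the version range recorded in package.json, so it will not move to a new major version of Puppeteer. To install the latest release regardless, run:

npm i puppeteer@latest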
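To check that Node.js can actually find the package from the folder where your scripts will live, you can run a tiny script such as the one below (save it, for example, as check.js and run node check). It is a minimal sketch: isInstalled() is our own hypothetical helper, not part of Puppeteer, and it relies only on the standard require.resolve() lookup, which searches node_modules next to the script and then in every parent folder.

```javascript
// check.js: a minimal sketch, not part of Puppeteer.
// isInstalled() is a hypothetical helper built on require.resolve(), which
// searches node_modules next to this script and then in every parent folder.
function isInstalled(packageName) {
	try {
		require.resolve(packageName);
		return true;
	} catch (err) {
		if (err.code === 'MODULE_NOT_FOUND') {
			return false; // Node could not find the package from this folder
		}
		throw err; // a different problem, e.g. a broken package.json
	}
}

console.log('puppeteer found:', isInstalled('puppeteer'));
```

If it prints false, run npm i puppeteer in this folder or in one of its parent folders.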

1.3 How to Run Puppeteer

The code snippets shown later in this guide should be saved to a file with the .js extension and run using node.

I'll start by creating a folder for tests and changing into it. You can choose any other folder name or work directly in your home folder:

mkdir -p /home/mial/bin/tests/puppeteer
cd /home/mial/bin/tests/puppeteer

You can choose any name for the .js file: instructions and examples usually use index.js, but anything else works too. Unlike some other guides, we will not create a new project with the following command:

npm init

For our purposes, this is not necessary.

When running a file, the .js extension is optional, so the following two commands are equivalent:

node index.js
node index
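All examples in this guide follow the same pattern: launch the browser, open a page, do something, close the browser. If a step in the middle throws an error (for example, a site that does not respond), the line that closes the browser is never reached. Below is a minimal sketch of a wrapper that always closes the browser. withPage() is a hypothetical helper of our own, not a Puppeteer API; it takes the launch function as a parameter, so with Puppeteer you would pass () => puppeteer.launch().

```javascript
// Hypothetical helper (not a Puppeteer API): launches a browser, gives `task`
// a fresh page and always closes the browser, even if `task` throws.
// `launch` must return a browser object, e.g. () => puppeteer.launch().
async function withPage(launch, task) {
	const browser = await launch();
	try {
		const page = await browser.newPage();
		return await task(page);
	} finally {
		await browser.close(); // runs on success and on error
	}
}

// Usage with Puppeteer:
//   const puppeteer = require('puppeteer');
//   withPage(() => puppeteer.launch(), async (page) => {
//   	await page.goto('https://w-e-b.site/?act=client-tls-fingerprinting');
//   	await page.screenshot({ path: 'screenshots/ja4.png' });
//   }).catch((err) => {
//   	console.error(err);
//   	process.exitCode = 1; // report failure to the shell
//   });
```

Passing the launch function in, rather than calling puppeteer.launch() inside, also makes the helper easy to try out with a fake browser object.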

1.4 How to take a screenshot of a page using Puppeteer

Puppeteer allows you to get HTML code, as well as any DOM element – we will do all this later. But we will start with a visual example – getting a screenshot of a web page. This will help you understand how close Puppeteer's capabilities are to a regular web browser.

Create a directory called “screenshots” – screenshots will be saved there:

mkdir screenshots

Now create a file called screenshots.js and copy the following content into it:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
  
	await page.goto('https://w-e-b.site/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/ja4.png' });
  
	await browser.close();
}

run();

Run the file like this:

node screenshots.js

After the script finishes running, a screenshot will appear in the “screenshots” directory.

I'm sure you can easily figure out how to change the URL and file name in the code above – you can experiment with different websites.
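If you experiment with many sites, it is handy to keep every screenshot instead of overwriting the same file. The sketch below is a hypothetical helper, screenshotPath(), that creates the directory (so the separate mkdir step becomes optional) and adds a UTC timestamp to the file name. The naming format is our own choice, not something Puppeteer requires.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: creates `dir` if needed and returns a path such as
// screenshots/ja4-2024-12-01T10-15-30.png (time in UTC), so that repeated
// runs do not overwrite each other. `date` defaults to the current time.
function screenshotPath(dir, name, ext = 'png', date = new Date()) {
	fs.mkdirSync(dir, { recursive: true }); // no error if it already exists
	// Drop milliseconds and replace ':' because Windows forbids it in file names
	const stamp = date.toISOString().slice(0, 19).replace(/:/g, '-');
	return path.join(dir, `${name}-${stamp}.${ext}`);
}

// Usage inside run():
//   await page.screenshot({ path: screenshotPath('screenshots', 'ja4') });
```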

1.5 How to change the size of the web browser window in Puppeteer

If you ran my example, you may have noticed that only part of the web page was captured, while the main content stayed out of view. This is because, by default, Puppeteer uses a viewport (the visible area of the browser window) of 800x600 pixels, which is too small for this page. You can set any viewport size you need.

Create a file screenshots-viewport.js and copy the following content into it:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
  
	// Landscape alternative: await page.setViewport({width: 3440, height: 1440});
	await page.setViewport({width: 1440, height: 3440});
	await page.goto('https://w-e-b.site/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/ja4-big.png' });
  
	await browser.close();
}

run();

Run the file like this:

node screenshots-viewport

As you have probably guessed, the “width” and “height” properties set the viewport size in pixels. You can set them to any values you like and experiment.
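page.setViewport() changes the size for one page only. puppeteer.launch() also accepts a defaultViewport option (800x600 when not set) that applies to every new page the browser opens. A minimal sketch, assuming you want the same size for all pages: launchOptionsWithViewport() is a hypothetical helper that only builds and checks the options object.

```javascript
// Hypothetical helper: builds the options object for puppeteer.launch() so
// that every page opened by the browser starts with this viewport (pixels).
function launchOptionsWithViewport(width, height) {
	for (const [name, value] of [['width', width], ['height', height]]) {
		if (!Number.isInteger(value) || value <= 0) {
			throw new RangeError(`${name} must be a positive integer, got ${value}`);
		}
	}
	return { defaultViewport: { width, height } };
}

// Usage inside run():
//   const browser = await puppeteer.launch(launchOptionsWithViewport(1440, 3440));
//   const page = await browser.newPage(); // already 1440x3440
```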

1.6 How to take a screenshot of the entire page in Puppeteer

The example above shows how to change the size of the web browser window and, accordingly, the area of the page captured in a screenshot. But you do not have to adjust the window size for each site: there is an option that captures the entire page regardless of its size, and also regardless of the selected browser window size.

Create a file screenshot-fullpage.js and copy the following code into it:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
  
	await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/ja4-full.png', fullPage: true });
  
	await browser.close();
}

run();

Note the added option:

fullPage: true

Run the file like this:

node screenshot-fullpage

You will get a screenshot of the entire page, however long or wide it is.

Note: The selected browser window size can affect how the page will look, since many websites change their design depending on the window size.

In the following example, we set the browser window wider than the default settings, which makes the site look different in the screenshot:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.setViewport({ width: 2560, height: 1440 });

	await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/ja4-full-wide.png', fullPage: true });

	await browser.close();
}

run();

1.7 How to save a screenshot in Puppeteer as a JPG

To save a screenshot as a JPG instead of a PNG as shown above, simply change the file extension in the “path” option of page.screenshot():

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
  
	await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/ja4-full.jpg', fullPage: true });
  
	await browser.close();
}

run();

If desired, you can set the image quality for JPG files. PNG has no quality setting because, unlike JPG, it is a lossless format: it stores images without any loss of quality.

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.setViewport({ width: 2560, height: 1440 });

	await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/ja4-high-quality.jpg', quality: 100, fullPage: true });

	await browser.close();
}

run();

The image quality is set with the “quality” setting. Its maximum value is 100 (highest quality and largest file size) and its minimum is 0. The following example uses a low value of 10:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.setViewport({ width: 2560, height: 1440 });

	await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/ja4-low-quality.jpg', quality: 10, fullPage: true });

	await browser.close();
}

run();
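Because the allowed range is 0 to 100, it can be useful to check the value before starting the browser, so that a typo fails immediately instead of after the page has loaded. A small sketch under that assumption: lossyScreenshotOptions() is a hypothetical helper that builds the object passed to page.screenshot() and accepts only the lossy formats jpeg and webp, since PNG has no quality setting. Requiring a whole number is this sketch's own choice.

```javascript
// Hypothetical helper: builds options for page.screenshot() in a lossy format
// and rejects bad values before any browser is started.
function lossyScreenshotOptions(filePath, quality, type = 'jpeg') {
	if (type !== 'jpeg' && type !== 'webp') {
		// PNG is lossless, so a quality value makes no sense for it
		throw new TypeError(`quality applies to jpeg and webp only, not ${type}`);
	}
	if (!Number.isInteger(quality) || quality < 0 || quality > 100) {
		throw new RangeError(`quality must be an integer from 0 to 100, got ${quality}`);
	}
	return { path: filePath, type, quality, fullPage: true };
}

// Usage inside run():
//   await page.screenshot(lossyScreenshotOptions('screenshots/ja4-low-quality.jpg', 10));
```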

1.8 How to save a web page to PDF in Puppeteer

You can also save a web page as a PDF. The entire page is saved, regardless of its width and height (long pages are split across several PDF pages). The PDF file also has a text layer, so you can select and copy text and click links (they open in your default web browser).

Create a pdf.js file and copy the following code into it:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
  
	await page.goto('https://suip.biz/?act=client-tls-fingerprinting', {
		waitUntil: 'networkidle2',
	});
	await page.pdf({
		path: 'ja4.pdf',
		format: 'letter',
	});
  
	await browser.close();
}

run();

Run the file as follows:

node pdf

Note the “format” setting: you can choose a different paper size. The list of available values is here: https://pptr.dev/api/puppeteer.paperformat
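page.pdf() accepts more options than path and format. The sketch below groups a few of them, landscape, printBackground (include background colours and images, which are left out by default) and margin, in a hypothetical helper pdfOptions(). The default values (letter paper, portrait, 1cm margins) are only an example.

```javascript
// Hypothetical helper: groups commonly used page.pdf() options.
// format, landscape, printBackground and margin are real page.pdf() options;
// the defaults chosen here (letter, portrait, 1cm margins) are just an example.
function pdfOptions(filePath, { format = 'letter', landscape = false, margin = '1cm' } = {}) {
	return {
		path: filePath,
		format,
		landscape,
		printBackground: true, // include background colours and images
		margin: { top: margin, right: margin, bottom: margin, left: margin },
	};
}

// Usage inside run(), after page.goto():
//   await page.pdf(pdfOptions('ja4.pdf', { format: 'a4', landscape: true }));
```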

1.9 How to change User Agent in Puppeteer

You may have noticed in the screenshots above that the site received the following User Agent string:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36

The “HeadlessChrome” token gives us away: it tells the server that the page was opened by a headless browser, such as the one Puppeteer controls.

You can set the User Agent in Puppeteer to any value. Rather than googling User Agent examples, I recommend taking the real User Agent string of your own web browser. To find it, open the following page: https://suip.biz/?act=my-user-agent

For example, I got the following string for the current version of Chrome on Windows:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36

Now create a file user-agent.js with the following content:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	const customUserAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36';
  
	await page.setViewport({width: 1440, height: 3440});
	await page.setUserAgent(customUserAgent);
	await page.goto('https://w-e-b.site/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/ua-spoofed.png' });
  
	await browser.close();
}

run();

Run the file like this:

node user-agent

Now the User Agent no longer gives us away.
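A hard-coded string like the one above has a drawback: when Puppeteer's bundled Chrome is updated, the version in your string falls behind. One alternative, sketched below under the assumption that only the “HeadlessChrome” token needs to change, is to take the browser's own User Agent from browser.userAgent() and replace that token. withoutHeadless() is a hypothetical helper; note that this keeps your real operating system (Linux in my case) instead of pretending to be Windows.

```javascript
// Hypothetical helper: turns the headless User Agent into the regular one by
// replacing the "HeadlessChrome/" token; the version and OS stay unchanged.
function withoutHeadless(userAgent) {
	return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

const headless = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36';
console.log(withoutHeadless(headless));
// prints: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36

// Usage inside run():
//   await page.setUserAgent(withoutHeadless(await browser.userAgent()));
```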

1.10 How to emulate phones and other mobile devices in Puppeteer

The Page.emulate() method is a shorthand for calling two methods: Page.setUserAgent() and Page.setViewport().

That is, this method will set the User Agent string and screen size specific to the selected device. The list of devices you can choose from is at this link: https://pptr.dev/api/puppeteer.knowndevices

This method resizes the page. Many websites do not expect a phone's screen size to change, so call it before navigating to the page.

Create an iphone.js file with the following content:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	
	await page.emulate(puppeteer.KnownDevices['iPhone 15 Pro']);
	await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/iphone.png', fullPage: true });
  
	await browser.close();
}

run();

And run it like this:

node iphone

The resulting screenshot shows the page as iPhone 15 Pro users see it.

Note: at the time of writing, neither of the examples in the official documentation works, while the example above does.
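page.emulate() is not limited to the KnownDevices list: it accepts any object with userAgent and viewport fields. The sketch below builds such an object with a hypothetical helper, makeDevice(). The field names follow Puppeteer's viewport options, while the sizes and the User Agent in the usage comment are placeholders rather than data for a real phone.

```javascript
// Hypothetical helper: builds a device description in the shape page.emulate()
// accepts ({ userAgent, viewport }). All values passed in are up to you.
function makeDevice({ userAgent, width, height, scale = 3, landscape = false }) {
	return {
		userAgent,
		viewport: {
			width: landscape ? height : width, // swap sides in landscape mode
			height: landscape ? width : height,
			deviceScaleFactor: scale, // physical pixels per CSS pixel
			isMobile: true, // take <meta name="viewport"> into account
			hasTouch: true, // report touch support to the page
			isLandscape: landscape,
		},
	};
}

// Usage inside run() (placeholder values, not a real device):
//   await page.emulate(makeDevice({ userAgent: 'Mozilla/5.0 (Linux; Android 14) ...', width: 412, height: 915 }));
```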

1.11 How to take a screenshot in WEBP (Web/P image) format in Puppeteer. What screenshot formats does Puppeteer support

Puppeteer supports three screenshot formats:

  • png
  • jpeg
  • webp

You can specify the screenshot type either with the file extension or with the “type” setting. Like JPEG, WebP supports the “quality” setting.

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.setViewport({ width: 2560, height: 1440 });

	await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
	await page.screenshot({ path: 'screenshots/ja4-high-quality.webp', quality: 100, type: 'webp', fullPage: true });

	await browser.close();
}

run();
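When the type is not given, the format is chosen from the file extension. The sketch below, screenshotTypeFor(), is our own illustration of that mapping (not Puppeteer's source code); it can be used to set “type” explicitly, or to fail early on a file name with an unexpected extension.

```javascript
const path = require('node:path');

// Our own illustration of the extension-to-format mapping (not Puppeteer's
// source code): .png -> png, .jpg/.jpeg -> jpeg, .webp -> webp.
function screenshotTypeFor(filePath) {
	switch (path.extname(filePath).toLowerCase()) {
		case '.png':
			return 'png';
		case '.jpg':
		case '.jpeg':
			return 'jpeg';
		case '.webp':
			return 'webp';
		default:
			throw new Error(`Unsupported screenshot extension: ${filePath}`);
	}
}

// Usage inside run():
//   const file = 'screenshots/ja4-high-quality.webp';
//   await page.screenshot({ path: file, type: screenshotTypeFor(file), quality: 100 });
```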

1.12 How to take a screenshot of a portion of the screen in Puppeteer

Puppeteer's page.screenshot() method can capture a specific area defined by coordinates. The coordinates are measured in pixels from the top left corner of the page, and you need four values:

  • x: The horizontal coordinate
  • y: The vertical coordinate
  • width: How wide the area should be
  • height: How tall the area should be

Put these values into a clip object (replace the numbers in the example below with your own), then pass that object to the screenshot() method to capture only the specified area:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.setViewport({ width: 2560, height: 1440 });

	await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
	const clip = {
		x: 850,
		y: 1400,
		width: 900,
		height: 700
	};
	await page.screenshot({ path: 'screenshots/ja4-clip.png', clip: clip });

	await browser.close();
}

run();
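The clip values above are tuned to one page at one viewport size; on another page the rectangle may extend past the edge of the document. A small hypothetical helper, clampClip(), can shrink the rectangle to the page size before the screenshot is taken. Reading the page size with page.evaluate() in the usage comment is one possible approach, assumed here rather than required.

```javascript
// Hypothetical helper: shrinks a clip rectangle so that it stays inside a page
// of the given size. Returns null if no part of the rectangle is on the page.
function clampClip(clip, pageWidth, pageHeight) {
	const x = Math.max(0, clip.x);
	const y = Math.max(0, clip.y);
	const right = Math.min(pageWidth, clip.x + clip.width);
	const bottom = Math.min(pageHeight, clip.y + clip.height);
	if (right <= x || bottom <= y) {
		return null; // the rectangle lies completely outside the page
	}
	return { x, y, width: right - x, height: bottom - y };
}

// Usage inside run():
//   const [w, h] = await page.evaluate(() => [
//   	document.documentElement.scrollWidth,
//   	document.documentElement.scrollHeight,
//   ]);
//   const safe = clampClip({ x: 850, y: 1400, width: 900, height: 700 }, w, h);
//   if (safe) await page.screenshot({ path: 'screenshots/ja4-clip.png', clip: safe });
```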

1.13 How to take a screenshot of a specific element in Puppeteer (by ID, by class name, by tag name)

This task already involves interacting with the DOM (HTML) of a web page and using selectors; we will cover all this later. For now, I just want to demonstrate the flexibility and capabilities of Puppeteer. For example, the following code captures only the part of the web page that is relevant to me:

const puppeteer = require('puppeteer');

async function run() {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.setViewport({ width: 2560, height: 1440 });

	await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
	
	const element = await page.$$('pre');
	await element[1].screenshot({ path: 'screenshots/element.png' });

	await browser.close();
}

run();

In the previous example, I used the tag name “pre” as a selector. page.$$() returns an array of all matching elements, and since there are two such tags on the page, I explicitly picked the second one (numbering starts from 0):

const element = await page.$$('pre');
await element[1].screenshot({ path: 'screenshots/element.png' });
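If the page has fewer “pre” tags than expected, element[1] is undefined and the script fails with a rather cryptic TypeError. A hypothetical helper, pickNth(), sketched below, checks the index first and reports how many elements the selector actually matched.

```javascript
// Hypothetical helper: returns the element at `index` from the array that
// page.$$() resolves to, or throws a readable error if there are fewer matches.
function pickNth(elements, index, selector) {
	if (!Number.isInteger(index) || index < 0 || index >= elements.length) {
		throw new Error(
			`Selector "${selector}" matched ${elements.length} element(s); index ${index} is out of range`
		);
	}
	return elements[index];
}

// Usage inside run():
//   const pre = pickNth(await page.$$('pre'), 1, 'pre');
//   await pre.screenshot({ path: 'screenshots/element.png' });
```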

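You can also select a single element by its unique ID or by its class name. page.$() returns the first matching element (or null if nothing matches); the selectors below are placeholders, so replace them with ones from your page:

const byId = await page.$('#unique-element-id');
await byId.screenshot({ path: 'screenshots/element-id.png' });
const byClass = await page.$('.some-class-name');
await byClass.screenshot({ path: 'screenshots/element-class.png' });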

Continue reading: Interacting with DOM in Puppeteer: how to get HTML code and extract various tags (text, images, links)