How to use Puppeteer: installation and quick start
Contents
1. How to use Puppeteer: installation and quick start
1.1 What is Puppeteer and what is it used for
1.2.1 How to install Puppeteer on Debian, Ubuntu, Linux Mint, Kali Linux and derivatives
1.2.2 How to install Puppeteer on Arch Linux, Manjaro, BlackArch and derivatives
1.4 How to take a screenshot of a page using Puppeteer
1.5 How to change the size of the web browser window in Puppeteer
1.6 How to take a screenshot of the entire page in Puppeteer
1.7 How to save a screenshot in Puppeteer as a JPG
1.8 How to save a web page to PDF in Puppeteer
1.9 How to change User Agent in Puppeteer
1.10 How to emulate phones and other mobile devices in Puppeteer
1.12 How to take a screenshot of a portion of the screen in Puppeteer
1.13 How to take a screenshot of a specific element in Puppeteer (by ID, by class name, by tag name)
3. Advanced interacting with DOM in Puppeteer: disabling JavaScript, loading HTML without visiting the site, error handling, delaying and scrolling the page
1.1 What is Puppeteer and what is it used for
Puppeteer is a JavaScript library that provides a high-level API for controlling Chrome or Firefox via DevTools Protocol or WebDriver BiDi. By default, Puppeteer works headless (without a visible user interface).
This is the official description, and it is not the easiest to understand. In practical terms, Puppeteer lets you script interaction with web pages as if they were opened in a regular browser: every action is described in JavaScript, and you work with websites entirely from the console, without opening a browser window. With JavaScript code you can extract text or specific tags, click buttons, fill in text fields, and log in to sites (as if you had typed your login and password and clicked the “Login” button). You can get and set cookies, change the size of the browser window, change the User Agent string, and adjust many other settings. You can also execute arbitrary JavaScript on the page and print the result to the console or save it to a file.
In short, with Puppeteer you can do everything that you can with a regular web browser, but at the same time you have unlimited possibilities for automating interaction with websites and web pages.
Puppeteer drives headless builds of Chrome and Firefox that keep the full functionality of the regular browsers. In other words, Puppeteer does not imitate a web browser: it controls a real Chrome or Firefox that simply runs without opening a graphical window.
We will start with trivial examples and gradually move on to more complex and more practical uses of Puppeteer. This guide has several parts.
1.2 How to Install Puppeteer
1.2.1 How to install Puppeteer on Debian, Ubuntu, Linux Mint, Kali Linux and derivatives
Start by installing the JavaScript package manager – npm:
sudo apt install npm
Then run the following command:
npm i puppeteer
1.2.2 How to install Puppeteer on Arch Linux, Manjaro, BlackArch and derivatives
Start by installing the JavaScript package manager – npm:
sudo pacman -S npm
Then run the following command:
npm i puppeteer
1.2.3 How to update Puppeteer
To update Puppeteer on any distribution, run the following command:
npm update
1.3 How to Run Puppeteer
The code snippets shown later in this guide should be saved to a file with the .js extension and run using node.
I'll start by creating a folder for tests and going into it – you can choose any other folder name or work directly in the home one – it's all up to you:
mkdir -p /home/mial/bin/tests/puppeteer
cd /home/mial/bin/tests/puppeteer
You can choose any name for the file with the .js extension. Instructions and examples usually use index.js, but any other name works. Some other guides begin by creating a new project with the following command:
npm init
For our purposes, this is not necessary.
When running the file, you can include the extension or omit it. The following two commands are equivalent:
node index.js
node index
1.4 How to take a screenshot of a page using Puppeteer
Puppeteer allows you to get HTML code, as well as any DOM element – we will do all this later. But we will start with a visual example – getting a screenshot of a web page. This will help you understand how close Puppeteer's capabilities are to a regular web browser.
Create a directory called “screenshots” – screenshots will be saved there:
mkdir screenshots
Now create a file called screenshots.js and copy the following content into it:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://w-e-b.site/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ja4.png' });
  await browser.close();
}

run();
Run the file like this:
node screenshots.js
After the script finishes running, a screenshot will appear in the “screenshots” directory.
I'm sure you can easily figure out how to change the URL and file name in the code above – you can experiment with different websites.
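If page.goto() or page.screenshot() throws (for example, on a network error), browser.close() is never reached and the headless browser process may keep running. Below is a minimal sketch of one way to guard against that. It assumes a hypothetical helper named withBrowser (not part of Puppeteer) that accepts any object with a launch() method, such as the puppeteer module itself:

```javascript
// Hypothetical helper (not part of Puppeteer): launches a browser,
// runs `task` with it, and closes the browser even if `task` throws.
async function withBrowser(launcher, task) {
  const browser = await launcher.launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Example task: the same screenshot as in the script above.
async function takeScreenshot(browser) {
  const page = await browser.newPage();
  await page.goto('https://w-e-b.site/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ja4.png' });
}
```

To use it in a script, call withBrowser(require('puppeteer'), takeScreenshot). Passing the launcher as a parameter is only a design choice that keeps the helper easy to try out without a real browser.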
1.5 How to change the size of the web browser window in Puppeteer
If you ran my example, you might have noticed that only part of the web page was captured, while the main content remained out of view. This is because by default Puppeteer gives each page an 800×600 viewport, and that is not enough for this page. You can set any viewport size you need.
Create a file screenshots-viewport.js and copy the following content into it:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  //await page.setViewport({width: 3440, height: 1440});
  await page.setViewport({width: 1440, height: 3440});
  await page.goto('https://w-e-b.site/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ja4-big.png' });
  await browser.close();
}

run();
Run the file like this:
node screenshots-viewport
As you have probably guessed, the “width” and “height” options passed to page.setViewport() set the width and height of the browser viewport in pixels. You can change them arbitrarily – feel free to experiment with this.
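To experiment with sizes without editing the file each time, the size could be read from a string such as "1440x3440". Here is a minimal sketch, assuming a hypothetical helper named parseViewport (not part of Puppeteer):

```javascript
// Hypothetical helper: turns a string such as "1440x3440" into the
// { width, height } object that page.setViewport() expects.
function parseViewport(text) {
  const match = /^(\d+)x(\d+)$/i.exec(String(text).trim());
  if (!match) {
    throw new Error(`Expected WIDTHxHEIGHT, got "${text}"`);
  }
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width === 0 || height === 0) {
    throw new Error('Width and height must be greater than 0');
  }
  return { width, height };
}

console.log(parseViewport('1440x3440'));
// → { width: 1440, height: 3440 }
```

In a script you could then write await page.setViewport(parseViewport(process.argv[2] || '1440x3440')) and pass the size on the command line, for example: node screenshots-viewport 1920x1080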
1.6 How to take a screenshot of the entire page in Puppeteer
The above shows how to change the size of the web browser window and, accordingly, the captured area of the page when creating a screenshot. But you do not have to change the size of the web browser screen for each site – there is an option that allows you to take a screenshot of the entire page regardless of the size of this web page, and also regardless of the selected browser window size.
Create a file screenshot-fullpage.js and copy the following code into it:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ja4-full.png', fullPage: true });
  await browser.close();
}

run();
Note that the option has been added:
fullPage: true
Run the file like this:
node screenshot-fullpage
You will get a screenshot of the entire page, regardless of its length, height, width, etc.
Note: The selected browser window size can affect how the page will look, since many websites change their design depending on the window size.
In the following example, we set the browser window wider than the default settings, which makes the site look different in the screenshot:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 2560, height: 1440 });
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ja4-full-wide.png', fullPage: true });
  await browser.close();
}

run();
1.7 How to save a screenshot in Puppeteer as a JPG
To save a screenshot as a JPG instead of a PNG, simply change the file extension in the “path” option passed to “page.screenshot()”:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ja4-full.jpg', fullPage: true });
  await browser.close();
}

run();
If desired, you can select the image quality for the JPG format (the PNG format does not support quality settings, since, unlike JPG, it is a lossless format, that is, it stores images without loss of quality).
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 2560, height: 1440 });
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ja4-high-quality.jpg', quality: 100, fullPage: true });
  await browser.close();
}

run();
The image quality is set using the “quality” setting. Its maximum value is 100 (meaning maximum quality and maximum file size). The minimum value is 0.
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 2560, height: 1440 });
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ja4-low-quality.jpg', quality: 10, fullPage: true });
  await browser.close();
}

run();
1.8 How to save a web page to PDF in Puppeteer
You can save a web page to PDF format. This will save the entire page, regardless of its width and height. The PDF file will also have a text layer, where you can select and copy text, as well as click on links (they will open in the default web browser).
Create a pdf.js file and copy the following code into it:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait until network activity has settled before printing the page
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting', {
    waitUntil: 'networkidle2',
  });
  await page.pdf({
    path: 'ja4.pdf',
    format: 'letter',
  });
  await browser.close();
}

run();
Run the file as follows:
node pdf
Note the “format” setting – you can choose a different value. You can see the list of available values here: https://pptr.dev/api/puppeteer.paperformat
Documentation:
- https://github.com/puppeteer/puppeteer/blob/main/examples/pdf.js
- https://pptr.dev/api/puppeteer.page.pdf
- https://pptr.dev/api/puppeteer.pdfoptions
- https://pptr.dev/api/puppeteer.paperformat
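page.pdf() accepts more settings than “path” and “format”. Here is a small sketch, assuming a hypothetical helper named pdfOptions (not part of Puppeteer). The landscape, printBackground and margin settings are described on the PDFOptions page linked above. The list of format names is copied by hand from the PaperFormat page, so check it against your installed version:

```javascript
// Paper format names as listed in the Puppeteer PaperFormat documentation
// (copied by hand; verify against your installed version).
const PAPER_FORMATS = [
  'letter', 'legal', 'tabloid', 'ledger',
  'a0', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6',
];

// Hypothetical helper: builds an options object for page.pdf().
function pdfOptions(file, format = 'a4', landscape = false) {
  const name = String(format).toLowerCase();
  if (!PAPER_FORMATS.includes(name)) {
    throw new Error(`Unknown paper format: ${format}`);
  }
  return {
    path: file,
    format: name,
    landscape,
    printBackground: true, // include CSS background colors and images
    margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
  };
}

console.log(pdfOptions('ja4.pdf', 'A4', true));
```

In pdf.js the call would then become await page.pdf(pdfOptions('ja4.pdf', 'a4', true)). The 1 cm margins are arbitrary illustrative values.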
1.9 How to change User Agent in Puppeteer
You may have noticed that the screenshots above show the following User Agent string:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36
The “HeadlessChrome” string betrays us: the server can tell that we are using a headless browser.
You can change the User Agent in Puppeteer to any value. Rather than googling User Agent examples, I recommend taking the real User Agent string of your own web browser. To find out your User Agent, go to the following page: https://suip.biz/?act=my-user-agent
For example, I got the following string for the current version of Chrome on Windows:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36
Now create a file user-agent.js with the following content:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const customUserAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36';
  await page.setViewport({width: 1440, height: 3440});
  // Set the User Agent before navigating so the first request already uses it
  await page.setUserAgent(customUserAgent);
  await page.goto('https://w-e-b.site/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ua-spoofed.png' });
  await browser.close();
}

run();
Run the file like this:
node user-agent
Now the User Agent in the screenshot no longer looks suspicious.
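Instead of hard-coding a string, another option is to take the browser's own User Agent and remove the headless marker, so that the version number always matches the real browser. Here is a minimal sketch, assuming a hypothetical helper named stripHeadless (not part of Puppeteer):

```javascript
// Hypothetical helper: removes the "Headless" marker from a User Agent string.
function stripHeadless(userAgent) {
  return userAgent.replace('HeadlessChrome', 'Chrome');
}

// The headless User Agent string shown earlier in this section.
const original = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36';

console.log(stripHeadless(original));
```

In a script this could be used as await page.setUserAgent(stripHeadless(await browser.userAgent())) before page.goto(), where browser.userAgent() returns the browser's original User Agent.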
1.10 How to emulate phones and other mobile devices in Puppeteer
The Page.emulate() method is a shorthand for calling two methods: Page.setUserAgent() and Page.setViewport().
That is, this method will set the User Agent string and screen size specific to the selected device. The list of devices you can choose from is at this link: https://pptr.dev/api/puppeteer.knowndevices
This method resizes the page. Many websites do not expect a phone to change its screen size, so call emulate() before navigating to the page.
Create an iphone.js file with the following content:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Emulate the device before navigation so the site serves its mobile layout
  await page.emulate(puppeteer.KnownDevices['iPhone 15 Pro']);
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/iphone.png', fullPage: true });
  await browser.close();
}

run();
And run it like this:
node iphone
The resulting screenshot shows the page as iPhone 15 Pro users see it.
Note: at the time of writing, both examples in the official documentation do not work, while the example above does.
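Because Page.emulate() combines a User Agent and a viewport, you can also pass it a device description of your own. Here is a minimal sketch, assuming a hypothetical helper named makeDevice and made-up screen values. The object shape (userAgent plus viewport) follows the entries in KnownDevices:

```javascript
// Hypothetical helper: builds a device description for page.emulate().
function makeDevice(userAgent, width, height, deviceScaleFactor = 2) {
  return {
    userAgent,
    viewport: {
      width,
      height,
      deviceScaleFactor, // physical pixels per CSS pixel
      isMobile: true,    // take the meta viewport tag into account
      hasTouch: true,    // enable touch events
      isLandscape: width > height,
    },
  };
}

// Illustrative values only, not a real device profile.
const myPhone = makeDevice(
  'Mozilla/5.0 (Linux; Android 14) AppleWebKit/537.36 (KHTML, like Gecko) ' +
  'Chrome/131.0.0.0 Mobile Safari/537.36',
  412,
  915
);

console.log(myPhone.viewport);
```

In iphone.js you would then call await page.emulate(myPhone) instead of passing an entry from puppeteer.KnownDevices.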
1.11 How to take a screenshot in WEBP (Web/P image) format in Puppeteer. What screenshot formats does Puppeteer support
Puppeteer supports three screenshot formats:
- png
- jpeg
- webp
You can specify the screenshot type either using the file extension or using the “type” setting. WEBP files support the "quality" setting.
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 2560, height: 1440 });
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
  await page.screenshot({ path: 'screenshots/ja4-high-quality.webp', quality: 100, type: 'webp', fullPage: true });
  await browser.close();
}

run();
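Since the type can be given explicitly, one option is to derive it from the file name and attach “quality” only where it applies. Here is a minimal sketch, assuming a hypothetical helper named screenshotOptions (not part of Puppeteer):

```javascript
const path = require('path');

// Maps file extensions to the values accepted by the "type" setting.
const TYPES = { '.png': 'png', '.jpg': 'jpeg', '.jpeg': 'jpeg', '.webp': 'webp' };

// Hypothetical helper: builds an options object for page.screenshot().
function screenshotOptions(file, quality) {
  const type = TYPES[path.extname(file).toLowerCase()];
  if (!type) {
    throw new Error(`Unsupported screenshot extension: ${file}`);
  }
  const options = { path: file, type, fullPage: true };
  // PNG is lossless, so "quality" is only added for jpeg and webp.
  if (type !== 'png' && quality !== undefined) {
    if (!Number.isInteger(quality) || quality < 0 || quality > 100) {
      throw new RangeError('quality must be an integer from 0 to 100');
    }
    options.quality = quality;
  }
  return options;
}

console.log(screenshotOptions('screenshots/ja4.webp', 80));
```

The result can be passed straight to the screenshot call, for example await page.screenshot(screenshotOptions('screenshots/ja4.jpg', 90)).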
1.12 How to take a screenshot of a portion of the screen in Puppeteer
With Puppeteer's screenshot capture method, you can specify a specific area using coordinates. You must define the coordinates of the area you want to capture, which are based on the pixel position from the top left corner of the page. You will need four values:
- x: The horizontal coordinate
- y: The vertical coordinate
- width: How wide the area should be
- height: How tall the area should be
Once you have chosen these values, you can define the clipping region for the screenshot. Replace the values below with your own.
Then pass that object as the “clip” option of the screenshot() method to capture only the specified area.
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 2560, height: 1440 });
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
  // Region to capture, in pixels from the top left corner of the page
  const clip = { x: 850, y: 1400, width: 900, height: 700 };
  await page.screenshot({ path: 'screenshots/ja4-clip.png', clip: clip });
  await browser.close();
}

run();
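If you measured the area as two opposite corners rather than as a corner plus a size, the four values can be computed. Here is a minimal sketch, assuming a hypothetical helper named clipFromCorners (not part of Puppeteer):

```javascript
// Hypothetical helper: converts two opposite corners of a rectangle
// into the { x, y, width, height } object used by the clip option.
function clipFromCorners(x1, y1, x2, y2) {
  const clip = {
    x: Math.min(x1, x2),
    y: Math.min(y1, y2),
    width: Math.abs(x2 - x1),
    height: Math.abs(y2 - y1),
  };
  if (clip.width === 0 || clip.height === 0) {
    throw new Error('The clip region must have a non-zero size');
  }
  return clip;
}

// Top left (850, 1400) and bottom right (1750, 2100) describe the same
// region as the clip object in the example above.
console.log(clipFromCorners(850, 1400, 1750, 2100));
// → { x: 850, y: 1400, width: 900, height: 700 }
```

The corners can be given in either order, since the helper takes the smaller coordinate as the origin.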
1.13 How to take a screenshot of a specific element in Puppeteer (by ID, by class name, by tag name)
This question already relates to interaction with the DOM (HTML) of a web page, as well as the use of selectors – we will consider all this, but later. For now, I just want to demonstrate the flexibility and capabilities of Puppeteer. For example, the following code will grab only the data that is relevant to me from a web page:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 2560, height: 1440 });
  await page.goto('https://suip.biz/?act=client-tls-fingerprinting');
  // page.$$() returns an array of all elements matching the selector
  const element = await page.$$('pre');
  await element[1].screenshot({ path: 'screenshots/element.png' });
  await browser.close();
}

run();
In the previous example, I used the tag name “pre” as a selector, and since there are two such tags on the page, I explicitly indicated that I am interested in the second tag (numbering starts from 0):
const element = await page.$$('pre');
await element[1].screenshot({ path: 'screenshots/element.png' });
You can also take screenshots of elements by class name or by unique id:
const element = await page.$('#unique-element-id');
await element.screenshot({ path: 'element.png' });
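Note that page.$() resolves to null when nothing matches the selector, so calling screenshot() on the result would fail with a confusing error. Here is a minimal sketch of a guard, assuming a hypothetical helper named screenshotElement (not part of Puppeteer) that works with any CSS selector, including class names:

```javascript
// Hypothetical helper: screenshots the first element matching `selector`.
// `page` is a Puppeteer Page (or any object with a compatible $() method).
async function screenshotElement(page, selector, file) {
  const element = await page.$(selector);
  if (element === null) {
    // page.$() resolves to null when nothing matches the selector.
    throw new Error(`No element matches selector "${selector}"`);
  }
  await element.screenshot({ path: file });
  return file;
}
```

For a class name the call would look like await screenshotElement(page, '.some-class', 'screenshots/by-class.png'), where .some-class is a placeholder for a class that exists on your page.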
Continue reading: Interacting with DOM in Puppeteer: how to get HTML code and extract various tags (text, images, links)