How to turn web pages into PDFs with Puppeteer and NodeJS – Bestgamingpro

As a web developer, you may have wanted to generate a PDF of a web page to share with your clients, use in presentations, or offer as a new feature in your web app. Whatever your reason, Puppeteer, Google's Node API for headless Chrome and Chromium, makes the task quite simple.

In this tutorial, we'll see how to convert web pages into PDFs with Puppeteer and Node.js. Let's start with a quick introduction to what Puppeteer is.

What's Puppeteer, and why is it awesome?

In Google's own words, Puppeteer is "a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol."

What's a headless browser?

If you are unfamiliar with the term, a headless browser is simply a browser without a GUI. Apart from that, it is just another browser: it knows how to render HTML pages and run JavaScript. Because there is no window to click around in, you interact with a headless browser from the command line or programmatically, for example over the DevTools Protocol as Puppeteer does.

Although Puppeteer runs Chrome or Chromium in headless mode by default, you can configure it to use a regular, non-headless browser instead.
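A minimal sketch of that switch, assuming a recent Puppeteer version: the headless launch option decides whether a browser window is shown. The helper launchBrowser and its showWindow flag are hypothetical names, and the puppeteer module is passed in as an argument only so the helper is easy to test with a stand-in.

```javascript
// Hypothetical helper: launches Chrome/Chromium either headless (the default)
// or with a visible window, which is handy when debugging a script.
// The puppeteer module is injected instead of required here, so the
// helper can be tested without a real browser.
function launchBrowser(puppeteer, { showWindow = false } = {}) {
  return puppeteer.launch({
    headless: !showWindow, // false opens a normal, visible browser window
  });
}

module.exports = { launchBrowser };
```

For example, await launchBrowser(require('puppeteer'), { showWindow: true }) opens a visible window so you can watch what the script does.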

What can you do with Puppeteer?

Puppeteer's powerful browser capabilities make it an ideal candidate for web app testing and web scraping.

Here are a few use cases where Puppeteer gives web developers exactly the functionality they need:

  • Generate PDFs and screenshots of web pages
  • Automate form submission
  • Scrape web pages
  • Run automated UI tests while keeping the test environment up to date
  • Generate pre-rendered content for Single Page Applications (SPAs)

Set up the project environment

You can use Puppeteer on the backend or the frontend to generate PDFs. In this tutorial, we're using a Node backend for the task.

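Initialize npm (npm init -y) and set up a basic Express server (npm install express) to get started with the tutorial.

Make sure to install the Puppeteer npm package with the following command before you start. Installing it also downloads a compatible browser build for Puppeteer to drive.

npm install puppeteer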

Convert web pages to PDF

Now we get to the exciting part of the tutorial. With Puppeteer, we only need a few lines of code to convert web pages into PDFs.

First, create a browser instance using Puppeteer's launch function, puppeteer.launch().

Then, create a new page instance with browser.newPage() and navigate to the given URL with page.goto().

We have set the waitUntil option to networkidle0. With networkidle0, Puppeteer considers navigation finished once there have been no network connections for at least 500 ms. This is a heuristic for deciding whether the site has finished loading. It isn't exact, and Puppeteer offers other options (load, domcontentloaded, and networkidle2), but it is one of the most reliable in most cases.
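The navigation step might look like the following sketch. It assumes a browser already created with puppeteer.launch(); openPage is a hypothetical helper name, while newPage(), goto() and the waitUntil option are Puppeteer APIs.

```javascript
// Opens a new tab in an already-launched Puppeteer browser and navigates to `url`.
// waitUntil: 'networkidle0' resolves once there have been no network
// connections for at least 500 ms. Other accepted values include 'load',
// 'domcontentloaded' and 'networkidle2' (at most 2 connections for 500 ms).
async function openPage(browser, url) {
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  return page;
}

module.exports = { openPage };
```

For example, const page = await openPage(await require('puppeteer').launch(), 'https://example.com'). If a page keeps long-polling connections open, networkidle0 may never settle before the navigation timeout; networkidle2 is the usual fallback in that case.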

Finally, we create the PDF from the loaded page content and save it to our system.

The print-to-PDF function, page.pdf(), is quite sophisticated and allows for a lot of customization, which is fantastic. Here are some of the options we used:

  • printBackground: When this option is set to true, Puppeteer includes any background colors or images used on the web page in the PDF.
  • path: Specifies where to save the generated PDF file. You can also keep the PDF in memory to avoid writing to disk.
  • format: Sets the paper format to one of the supported options: Letter, A4, A3, A2, etc.
  • margin: Specifies the margins of the generated PDF.
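As a sketch of how those options fit together, assuming a Page object from the navigation step: the A4 format and 1 cm margins are example values, and buildPdfOptions and savePdf are hypothetical helper names. Leaving out path is what keeps the PDF in memory instead of writing it to disk.

```javascript
// Builds the options object passed to page.pdf(), mirroring the options above.
// 'A4' and the 1cm margins are example values.
function buildPdfOptions(outputPath) {
  const options = {
    printBackground: true, // include CSS background colors and images
    format: 'A4', // other formats include 'Letter', 'Legal', 'A3', ...
    margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
  };
  // With a path, Puppeteer also writes the file to disk;
  // without one, page.pdf() only returns the PDF bytes in memory.
  if (outputPath) options.path = outputPath;
  return options;
}

// Renders the current page to PDF and returns the bytes as a Node Buffer.
// Buffer.from() handles both a Buffer and a Uint8Array, since recent
// Puppeteer versions may return a Uint8Array from page.pdf().
async function savePdf(page, outputPath) {
  return Buffer.from(await page.pdf(buildPdfOptions(outputPath)));
}

module.exports = { buildPdfOptions, savePdf };
```

Calling await savePdf(page, 'page.pdf') writes a file, while await savePdf(page) just returns the bytes.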

When PDF creation is done, close the browser with browser.close().
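Putting the steps together, a minimal end-to-end sketch could look like this. urlToPdf is a hypothetical name, and the puppeteer module is passed in as an argument (rather than required at the top) purely so the function can be exercised with a stand-in. The try/finally makes sure the browser is closed even when navigation or rendering fails.

```javascript
// Converts the page at `url` into a PDF file at `outputPath`.
// The try/finally guarantees browser.close() runs even if goto() or pdf()
// throws, so no browser process is left running in the background.
async function urlToPdf(puppeteer, url, outputPath) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf({
      path: outputPath, // write the PDF to disk
      format: 'A4',
      printBackground: true,
      margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' },
    });
  } finally {
    await browser.close();
  }
}

module.exports = { urlToPdf };
```

Inside an async function, await urlToPdf(require('puppeteer'), 'https://example.com', 'example.pdf') produces example.pdf in the working directory.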

Build an API to generate and serve PDFs from URLs

With what we've learned so far, we can now create a new endpoint that receives a URL as a query string parameter and sends the generated PDF back to the client.

Here is the code:
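A minimal sketch of such an endpoint, assuming Express and a recent Puppeteer version: makePdfHandler is a hypothetical helper that returns an Express-style (req, res) handler reading the target query parameter. It rejects anything that is not an http or https URL, so a request cannot ask the server to render local file:// paths.

```javascript
// Hypothetical factory for an Express-style handler: GET /pdf?target=<url>.
// Renders the target page with Puppeteer and sends the PDF bytes back
// in the response without ever writing them to disk.
function makePdfHandler(puppeteer) {
  return async function pdfHandler(req, res) {
    let url;
    try {
      url = new URL(req.query.target); // throws on a missing or malformed URL
    } catch {
      return res.status(400).send('Query parameter "target" must be a valid URL');
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') {
      return res.status(400).send('Only http and https URLs are supported');
    }

    const browser = await puppeteer.launch();
    try {
      const page = await browser.newPage();
      await page.goto(url.href, { waitUntil: 'networkidle0' });
      // No `path` option, so the PDF only lives in memory.
      const pdf = Buffer.from(await page.pdf({ format: 'A4', printBackground: true }));
      res.set('Content-Type', 'application/pdf');
      res.send(pdf);
    } catch (err) {
      res.status(500).send('Could not generate PDF');
    } finally {
      await browser.close();
    }
  };
}

module.exports = { makePdfHandler };
```

To serve it, register the handler on an Express app, for example app.get('/pdf', makePdfHandler(require('puppeteer'))) followed by app.listen(3000). Launching a fresh browser per request keeps the sketch simple; a busier service would likely reuse one browser instance across requests.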

If you start the server and visit the /pdf route with a target query parameter containing the URL you want to convert, the server returns the generated PDF directly without ever storing it on disk.

Example URL: http://localhost:3000/pdf?target=https://google.com
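To call the endpoint from another Node.js script, a sketch like the following could work. It assumes Node.js 18 or later for the global fetch and a server running on localhost:3000; buildPdfRequestUrl and downloadPdf are hypothetical names. Encoding the target with encodeURIComponent matters once the target URL carries its own query string, since its ? and & would otherwise be read as part of the /pdf request.

```javascript
const fs = require('fs/promises');

// Builds the request URL for the /pdf endpoint. encodeURIComponent
// protects target URLs that contain their own '?', '&' or '#'.
function buildPdfRequestUrl(serverBase, target) {
  return `${serverBase}/pdf?target=${encodeURIComponent(target)}`;
}

// Downloads the generated PDF, writes it to `outputPath`, and returns its size.
// Relies on the global fetch available in Node.js 18 and later.
async function downloadPdf(serverBase, target, outputPath) {
  const response = await fetch(buildPdfRequestUrl(serverBase, target));
  if (!response.ok) {
    throw new Error(`Server responded with status ${response.status}`);
  }
  const bytes = Buffer.from(await response.arrayBuffer());
  await fs.writeFile(outputPath, bytes);
  return bytes.length;
}

module.exports = { buildPdfRequestUrl, downloadPdf };
```

For example, await downloadPdf('http://localhost:3000', 'https://google.com', 'google.pdf') saves the PDF produced by the example URL above.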

This generates the PDF shown in the following image: