Create a PDF from HTML with Open-Source Software

Let's say you want to write a document or a book. You can use Microsoft Word or LibreOffice Writer to write the content and export it into a PDF. If you want to manage your work with git, you can resort to writing the document in LaTeX instead. But there is another option that works well with git and doesn't require any LaTeX knowledge — write the document in HTML, style it with CSS, and then generate a PDF from it using one of the many open-source tools.

This guide demonstrates how to create a document with HTML and convert it into a PDF using free, open-source tools. I use Fedora Linux, but the tools that I use should also work on macOS and possibly on Windows.

Created
June 26, 2023
A title image showing a PDF document on top of a text editor with HTML code

Motivation

Years ago, I wrote my resume in LibreOffice Writer and saved it in the .odt (ODF Text Document) format. I used git to manage versions of my resume, and I quickly found that binary .odt files aren't ideal for a typical git workflow. So, I looked for ways to save my resume in a git-friendly text format.

I came up with these options: LibreOffice's flat-XML .fodt format, LaTeX, and HTML with CSS.

I didn't give the .fodt format much thought, and I didn't know LaTeX well enough to design my own layout. So I went with HTML and CSS, which I am familiar with.

Step 1: Create a Document in HTML

First, create a document in HTML. In a text editor of your choice (such as VS Code or GNOME Text Editor), write the content of the document. Annotate the text with HTML tags. Save the file with a .html suffix.

In this guide, I use a simplified version of my resume as an example. I wrote the HTML content in a resume.html file:

resume.html
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>David Sebek's Resume</title>
        <link rel="stylesheet" href="resume.css">
    </head>

    <body>
        <header>
            <h1>David Sebek</h1>
            <p>email@example.org | (123) 456-7890 | linkedin.com/in/davidsebek</p>
        </header>

        <section>
            <h2>Education</h2>

            <h3>Georgia Institute of Technology <span>Atlanta, GA, USA</span></h3>
            <p>M.S. in Computer Science, Computing Systems specialization <span>December 2021</span></p>
            <ul>
                <li>GPA: 4.0/4.0</li>
                <li>Naumann-Etienne Foundation scholarship recipient</li>
            </ul>

            <h3>Czech Technical University in Prague, Faculty of Information Technology <span>Prague, Czech Republic</span></h3>
            <p>Bc. in Computer Security and Information Technology <span>June 2019</span></p>
            <ul>
                <li>Graduated with Honors</li>
                <li>Thesis: Comparison of software defined storage with a classical enterprise disk array</li>
                <li>Studied for one year at the Milwaukee School of Engineering, Milwaukee, WI, USA</li>
            </ul>
        </section>

        <section>
            <h2>Experience</h2>

            <h3>EDJX, Inc. <span>Raleigh, NC, USA</span></h3>
            <p>Software Engineer <span>August 2021&ndash;January 2023</span></p>
            <ul>
                <li>Designed and implemented EDJX C++ SDK</li>
                <li>Worked on implementation of data streaming in Rust backend and C++ and Rust apps</li>
                <li>Fixed UEFI boot freeze problem in FreeBSD-based EdjOS</li>
                <li>Improved, simplified, and automated the build of EdjOS</li>
                <li>Created tutorials and sample C++ and Rust apps for EDJX platform</li>
            </ul>

            <h3>Zonky s.r.o. <span>Prague, Czech Republic</span></h3>
            <p>Java backend developer <span>July 2019&ndash;November 2019</span></p>
            <ul>
                <li>Worked on Java backend, refactored existing code, implemented new features</li>
            </ul>
        </section>

        <section>
            <h2>Skills</h2>
            <dl class="skills">

                <dt>Programming</dt>
                    <dd>C, C++, Rust, Java; Bash scripting; git; Dynamic and static code analysis (Valgrind, GDB, IDA)</dd>
                    <dd>HTML, CSS</dd>

                <dt>Systems</dt>
                    <dd>Linux (Fedora, CentOS, Debian), Unix (FreeBSD, macOS), Windows; Virtualization (KVM, QEMU, VirtualBox); Linux containers (Podman, Docker); WebAssembly (Wasm)</dd>
                    <dd>Networking, router configuration; Web server configuration (Nginx, Apache)</dd>

                <dt>Languages</dt>
                    <dd>English, Czech</dd>
            </dl>
        </section>

        <section>
            <h2>Projects</h2>

            <h3>Bachelor Project</h3>
            <p>Comparison of software defined storage with a&nbsp;classical enterprise disk array <span>Prague, 2019</span></p>
            <ul>
                <li>Built a Ceph storage cluster, and compared its architecture, performance, and security with 3PAR</li>
            </ul>

            <h3>Open-source Contributions</h3>
            <dl class="os-contributions">

                <dt>Haiku OS</dt>
                    <dd>Implemented TRIM support for SCSI and SATA storage, revised and extended the fstrim utility</dd>

                <dt>FreeBSD</dt>
                    <dd>Fixed UEFI boot freeze problem, my patch was merged into FreeBSD 12-STABLE</dd>

                <dt>Linux</dt>
                    <dd>Identified why the TRIM mode of an external USB hard drive was detected incorrectly</dd>
                    <dd>Reporting bugs encountered in Fedora Linux</dd>
            </dl>
        </section>

    </body>
</html>

Style the HTML document with CSS. The @page rule configures print-specific aspects of the page, such as its size, its margins, and the content of margin boxes (used here for page numbers).

I wrote the CSS code in a separate file named resume.css:

resume.css
@page {
    size: letter portrait;
    margin: 0.6in 0.75in 0.6in;

    /* Page number - for illustration, not needed on a resume */
    @bottom-center {
        content: "Page " counter(page) " of " counter(pages);
    }
}

html {
    font-size: 10pt;
    font-family: "Open Sans", sans-serif;
}

body {
    margin: 0;
    line-height: 1.4;
}

p {
    margin: 0;
}

header {
    text-align: center;
}

header h1 {
    font-size: 24pt;
    font-weight: bold;
    margin-top: 0;
    margin-bottom: 0.04in;
}

section h2 {
    font-size: 1rem;
    text-transform: uppercase;
    font-weight: bold;
    margin-top: 0.1in;
    margin-bottom: 0.02in;
    border-bottom: 0.8pt solid black;
}

section h3 {
    display: flex;
    justify-content: space-between;
    font-size: 1rem;
    font-weight: bold;
    margin-top: 0.01in;
    margin-bottom: 0;
}

section h3:first-of-type {
    margin-top: 0;
}

section p {
    display: flex;
    justify-content: space-between;
}

section ul {
    list-style: none;
    padding-left: 0;
    margin-top: 0;
    margin-bottom: 0;
}

section ul li {
    margin-left: 3em;
}

section ul li::before {
    display: inline-block;
    content: "\2022";
    font-weight: bold;
    text-align: center;
    width: 3em;
    margin-left: -3em;
}

section dl {
    display: grid;
    grid-template-columns: max-content auto;
    margin: 0;
    padding: 0;
}

section dl dt {
    grid-column: 1;
    margin: 0;
    padding: 0;
}

section dl.skills dt {
    font-weight: bold;
}

section dl.skills dt::after {
    content: ":";
}

section dl dd {
    grid-column: 2;
    align-self: end;
    margin: 0;
    padding: 0;
    margin-left: 1.5em;
}

section dl dd::before {
    display: inline-block;
    content: "\2022";
    font-weight: bold;
    text-align: center;
    width: 1.5em;
    margin-left: -1.5em;
}
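
To see how much room the @page rule above leaves for content, the arithmetic can be sketched in Python. This is only an illustration, not part of the build. It assumes that "letter" means 8.5in by 11in and that the three-value margin shorthand is read as top, left and right, bottom. The name content_box is made up for this example.

```python
from fractions import Fraction

# US Letter in inches (assumption: "size: letter portrait" from resume.css).
PAGE_WIDTH_IN = Fraction("8.5")
PAGE_HEIGHT_IN = Fraction(11)


def content_box(margin_top, margin_sides, margin_bottom):
    """Return (width, height) in inches left inside the page margins."""
    width = PAGE_WIDTH_IN - 2 * margin_sides
    height = PAGE_HEIGHT_IN - margin_top - margin_bottom
    return width, height


# margin: 0.6in 0.75in 0.6in  ->  top 0.6, left/right 0.75, bottom 0.6
width, height = content_box(Fraction("0.6"), Fraction("0.75"), Fraction("0.6"))
print(f"content area: {float(width)}in x {float(height)}in")
```

Exact fractions are used so that values such as 0.6 do not pick up floating-point noise before they are printed.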

You can open the HTML file in a web browser. Note that the web browser will most likely ignore some or all @page rules.

A screenshot of the Firefox web browser with the resume.html file open

Step 2: Convert the Document into PDF

Use one of the methods below to export the HTML document into a PDF. The website https://print-css.rocks lists additional options and is worth checking out.

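I recommend installing the tools in a container and building the document from there. The command below creates a new container. It does not mount a host directory, so the document files need to be made available inside the container separately (for example, with podman cp or a volume mount):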

podman run --rm -it --name html-to-pdf registry.fedoraproject.org/fedora

My document uses the Open Sans font, so I make sure that the font is installed:

dnf install open-sans-fonts.noarch

If you don't use Linux, no problem. The tools that I use are open-source, and many of them are multi-platform.

Option 1: Print from a Web Browser

The easiest way to generate a PDF from the HTML document is to open the HTML document in a web browser and print the document. Choose PDF as the destination instead of a printer.

This option is easy, but the web browser most likely doesn't support advanced @page CSS rules. Also, different web browsers may calculate spacing and dimensions differently: a page printed from Chrome may look slightly different from the same page printed from Firefox.

A screenshot of the Firefox web browser with resume.html open and a print dialog shown

Option 2: Print from Chrome or Chromium in Headless Mode

This method is similar to the previous one. The Chrome and Chromium web browsers can run in headless mode, which lets us print pages from the command line. For more command-line arguments, see this summary.

Install Chrome or Chromium:

flatpak install flathub org.chromium.Chromium

Print HTML into PDF:

flatpak run org.chromium.Chromium --headless --disable-gpu --run-all-compositor-stages-before-draw --print-to-pdf-no-header --print-to-pdf=resume.pdf ./resume.html
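
If the PDF is built from a script, the same command can be assembled as an argument list. The sketch below only builds and prints the command. The function name chromium_pdf_command and its browser parameter are hypothetical, and the flags are copied from the command above rather than checked against any particular Chromium version.

```python
import shlex
from pathlib import Path


def chromium_pdf_command(html_path, pdf_path,
                         browser=("flatpak", "run", "org.chromium.Chromium")):
    """Build the headless-print command from this section as an argument list."""
    return [
        *browser,
        "--headless",
        "--disable-gpu",
        "--run-all-compositor-stages-before-draw",
        "--print-to-pdf-no-header",
        f"--print-to-pdf={pdf_path}",
        str(html_path),
    ]


cmd = chromium_pdf_command(Path("resume.html"), Path("resume.pdf"))
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids quoting problems when a file name contains spaces.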

This method lets us use the latest HTML and CSS features, but the resulting PDF doesn't contain a document outline or page numbers.

Option 3: wkhtmltopdf (Not Recommended)

wkhtmltopdf used to be a popular HTML-to-PDF converter, but it is no longer actively maintained. See https://wkhtmltopdf.org/status.html for the current status. The official version is statically linked against an old, patched Qt library and doesn't support newer HTML/CSS features such as flex or grid layout. The version shipped in Fedora's repositories supports flex layout, but the document measurements are way off, no document outline is generated, and the @page properties seem to be ignored. I also ran into TTF font rendering bugs.

Install wkhtmltopdf from Fedora repositories:

dnf install wkhtmltopdf

Or download and install the upstream version from https://wkhtmltopdf.org/downloads.html:

curl -L -O 'https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6.1-3/wkhtmltox-0.12.6.1-3.fedora37.x86_64.rpm'
dnf install ./wkhtmltox-0.12.6.1-3.fedora37.x86_64.rpm

Generate a PDF:

wkhtmltopdf --encoding utf8 --page-size Letter --disable-smart-shrinking -B 0.79in -L 0.79in -R 0.79in -T 0.79in --allow resume.css ./resume.html ./resume.pdf

Option 4: WeasyPrint

WeasyPrint uses its own HTML rendering engine. In my tests, a PDF generated by WeasyPrint was visually identical to a PDF exported from LibreOffice, and WeasyPrint supports page numbers. However, it does not support some modern CSS features, such as grid layout.

Install WeasyPrint using Fedora's packages:

dnf install weasyprint

Or install the upstream package with pip. See the official installation instructions for details.

dnf install python-pip pango
pip install weasyprint

Generate a PDF:

weasyprint ./resume.html ./resume.pdf
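
WeasyPrint can also be called from Python instead of the command line. The following is a minimal sketch, assuming WeasyPrint is installed in the active Python environment. The helper names pdf_path_for and render_pdf are made up for this example, while HTML(filename=...) and write_pdf() come from WeasyPrint's Python API.

```python
from pathlib import Path


def pdf_path_for(html_path):
    """resume.html -> resume.pdf, next to the source file."""
    return Path(html_path).with_suffix(".pdf")


def render_pdf(html_path):
    """Render one HTML file to a PDF using WeasyPrint's Python API."""
    # Imported lazily so this module loads even where WeasyPrint is missing.
    from weasyprint import HTML

    target = pdf_path_for(html_path)
    HTML(filename=str(html_path)).write_pdf(str(target))
    return target


print(pdf_path_for("resume.html"))  # call render_pdf("resume.html") to build
```

Because the document is loaded by file name, the relative link to resume.css should resolve the same way as in the command-line call.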

Option 5: Paged.js

Paged.js uses the Chromium web browser to render the document, which means that it most likely supports the latest HTML and CSS features. Spacing between text elements in the exported PDF differs very slightly from LibreOffice and WeasyPrint, which I think is a "feature" of the Chromium web browser.

Install Paged.js:

dnf install chromium  # installs needed libraries
dnf install nodejs npm
npm install -g pagedjs-cli pagedjs

Generate a PDF:

pagedjs-cli --blockRemote --browserArgs --no-sandbox --outline-tags "h1,h2,h3,dt" ./resume.html -o ./resume.pdf
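
To get a rough idea of which elements the --outline-tags list above selects, the document can be scanned with Python's standard HTML parser. This is only an approximation for previewing: it collects the text of h1, h2, h3, and dt elements in document order and does not reproduce how pagedjs-cli actually builds or nests the outline. The class and function names are made up for this sketch.

```python
from html.parser import HTMLParser

OUTLINE_TAGS = ("h1", "h2", "h3", "dt")  # same list as --outline-tags above


class OutlineCollector(HTMLParser):
    """Collect (tag, text) pairs for the tags selected for the outline."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._open_tag = None  # outline tag we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._open_tag is None and tag in OUTLINE_TAGS:
            self._open_tag, self._text = tag, []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._text.append(data)  # includes text of nested <span> elements

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            # Collapse runs of whitespace so each entry is a single line.
            text = " ".join("".join(self._text).split())
            self.entries.append((tag, text))
            self._open_tag = None


def outline(html_text):
    collector = OutlineCollector()
    collector.feed(html_text)
    collector.close()
    return collector.entries


sample = """
<h1>David Sebek</h1>
<section>
  <h2>Education</h2>
  <h3>Georgia Institute of Technology <span>Atlanta, GA, USA</span></h3>
  <p>M.S. in Computer Science</p>
</section>
"""
for tag, text in outline(sample):
    print(f"{tag}: {text}")
```

For the real document, read resume.html into a string and pass it to outline() instead of the sample.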

The Result

Paged.js gives the best results for my use case. Because it uses the Chromium web browser, it supports modern HTML and CSS features. It also handles @page properties well, including page numbers.

Another option worth considering is WeasyPrint. It doesn't support all modern HTML and CSS features, but the proportions of the rendered page matched a PDF exported from LibreOffice perfectly.

The PDF generated by Paged.js looks like this:

A picture of the PDF document rendered by Paged.js

Appendix

As a bonus, here are some tools and commands that I found useful while writing this guide.

Escape HTML Tags

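This method, which I found on Server Fault, lets me easily escape text before I copy-paste it into an HTML document. Both commands read from standard input: the first escapes HTML special characters, and the second reverses the escaping: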

python3 -c 'import html,sys; print(html.escape(sys.stdin.read()), end="")'
python3 -c 'import html,sys; print(html.unescape(sys.stdin.read()), end="")'
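
The same two functions can be tried on a short string to see what they do. The sample text below is made up for illustration.

```python
import html

raw = '<a href="x">Tom & Jerry</a>'

escaped = html.escape(raw)  # escapes &, <, > and, by default, quotes
print(escaped)

# unescape() reverses the operation, so the round trip returns the input.
print(html.unescape(escaped) == raw)
```

The first line of output is the escaped markup, which can be pasted into an HTML document and will be displayed as literal text.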

Convert PDF into an Image

ImageMagick is a great tool that can convert a PDF document into a PNG image. ImageMagick supports a plethora of other image formats. Generating an image with a higher resolution and then downscaling it to the desired resolution gives nice results.

convert -density 384 -quality 100 -alpha remove resume.pdf resume_384dpi.png
convert -scale 25% resume_384dpi.png resume_96dpi.png
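
The numbers in the two commands fit together as follows: rendering at 384 DPI and then scaling to 25% gives an image at an effective 96 DPI. The sketch below works out the pixel sizes, assuming a US Letter page (8.5in by 11in) as set in resume.css. The function name pixels is made up, and ImageMagick's own rounding is not modeled.

```python
LETTER_IN = (8.5, 11)  # assumption: the PDF page is US Letter, as in resume.css


def pixels(density_dpi, scale=1.0, page_in=LETTER_IN):
    """Pixel size of one page rendered at density_dpi, then scaled."""
    return tuple(round(side * density_dpi * scale) for side in page_in)


print(pixels(384))        # full-resolution render
print(pixels(384, 0.25))  # after -scale 25%
print(384 * 0.25)         # effective DPI of the downscaled image
```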