Using Web Technologies To Print A Book

I recently finished the second draft of my first novel and needed a way to prepare a decent-looking PDF to print and send to people so they could write all over it with red pens. Web technologies and a few Linux tools made this a fairly painless process.

The workflow goes:

  1. Write the book in Markdown
  2. Generate HTML from that Markdown
  3. Style the HTML with CSS
  4. Make the cover(s) with HTML and CSS
  5. Generate PDFs from that HTML + CSS
  6. Combine those PDFs into one

And here’s what you’ll need:

  • A book written in Markdown
  • MultiMarkdown, Pandoc, or similar
  • A CSS stylesheet or two
  • wkhtmltopdf
  • Ghostscript
  • A scripting language (I used Ruby)

The result will be a single PDF with numbered pages and an unnumbered front cover. Adding an unnumbered back cover is left as an exercise.

Step 1: Write the book in Markdown

I can’t help you much with this one. Writing a book is hard. But I believe in your abilities.

One issue I encountered with mine was needing three modes for the body text:

  1. the normal mode
  2. one for text a character had written, which would be indented and in a different font
  3. a variant on (2) where line breaks needed to be preserved

So you might want to keep this in mind. I ended up using Markdown’s blockquote syntax for mode 2 and its code block syntax for mode 3. This made it easy to target those blocks with CSS.

Another issue that came up was section breaks: how to format breaks in the text without using chapter or sub-section headers. In the Markdown, I used a single % character on a line by itself. Once the HTML is generated, it can be piped through sed to add a custom CSS class, replacing <p>%</p> with <p class="section-break">%</p>.
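If you'd rather not shell out to sed for this, the same replacement is a one-liner in Ruby. This is only a sketch: the method name mark_section_breaks is made up for illustration, and it assumes the Markdown processor emits the marker as exactly <p>%</p>.

```ruby
# Tag standalone "%" paragraphs so the stylesheet can target them.
# Only an exact "<p>%</p>" is replaced; "<p>50%</p>" is left alone.
def mark_section_breaks(html)
  html.gsub('<p>%</p>', '<p class="section-break">%</p>')
end

puts mark_section_breaks("<p>One.</p>\n<p>%</p>\n<p>Two.</p>")
```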

Step 2: Generate HTML from that Markdown

I used MultiMarkdown but Pandoc would also make a great choice.

Depending on the way your book is split into files, you might want to start writing a build script. Here’s an example:

parts = [
  "Talitha",
  "Imal",
  "Aunauf",
  "Empress",
  "Astronauts",
]
# Convert each part to HTML, then post-process the result with sed
parts.each do |part|
  system("multimarkdown -s ../#{part}/ | sed -E 's/_([^_]+)_/<em>\\1<\\/em>/g' | sed -E 's/<h1 .+<\\/h1>//g' | sed 's/<p>%<\\/p>/<p class=\"section-break\">%<\\/p>/g' > output-#{part}.html")
end

Those sed commands (1) add intra-word italics (like Salinger does), (2) remove redundant h1 headers (I added one to each file/chapter for reasons I can’t remember right now), and (3) add the section-break class to the section breaks.
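For comparison, here is roughly what the first two sed expressions would look like as plain Ruby. The method names are hypothetical. One difference to keep in mind: sed works line by line, while Ruby's gsub sees the whole string, so an underscore pair can match across a line break here.

```ruby
# Rough in-process equivalents of the first two sed expressions (a sketch).

# _text_ becomes <em>text</em>, including in the middle of a word.
def emphasize_underscores(html)
  html.gsub(/_([^_]+)_/, '<em>\1</em>')
end

# Remove any <h1 ...>...</h1> that sits on a single line. Like the sed
# version, this only matches h1 tags that carry attributes (note the space
# after "h1"), and "." does not cross line breaks.
def strip_h1(html)
  html.gsub(/<h1 .+<\/h1>/, '')
end
```

Both take and return a string, so they can be chained: strip_h1(emphasize_underscores(html)).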

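To get the page numbers right, you’ll want the HTML for every chapter in the same file. The loop above produces a different file for each chapter, but you could instead replace the output redirection with something like >> combined.html, which appends each chapter to one file. Since >> appends, delete the old combined.html before each build or it will keep growing.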
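Another option is to keep the per-chapter files and join them in Ruby afterwards. A minimal sketch, assuming the chapter files already exist; combine_html is a name I made up for this example:

```ruby
# Concatenate per-chapter HTML files into one file, in the order given.
# Opening with "w" truncates any previous output, so rebuilds start clean.
def combine_html(chapter_paths, combined_path)
  File.open(combined_path, "w") do |out|
    chapter_paths.each { |path| out.write(File.read(path)) }
  end
end
```

With the parts array from the build script, the call would look like combine_html(parts.map { |part| "output-#{part}.html" }, "combined.html").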

Step 3: Style the HTML with CSS

Unless you add custom classes, the HTML generated from the Markdown should not include classes, so your CSS will mostly need to target tag names: h1, p, blockquote, etc.

If you want a page break between chapters, the chapter titles need a consistent target (I used h2 tags) with this rule applied to it: page-break-before: always;
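To make that concrete, here is a starter stylesheet kept as a heredoc in Ruby so the build script can write it out. Treat every value as a placeholder: the fonts, indents, and centering are my assumptions, not the stylesheet actually used for the book. The selectors follow the conventions described earlier (h2 for chapter titles, blockquote and code blocks for the two extra text modes, the section-break class).

```ruby
# Illustrative starter stylesheet; all values are placeholders.
STYLESHEET = <<~CSS
  /* Start every chapter on a new page; chapter titles are h2 here. */
  h2 { page-break-before: always; }

  /* Text a character has written (Markdown blockquote). */
  blockquote { margin-left: 2em; font-family: sans-serif; }

  /* Same, but with line breaks preserved (Markdown code block). */
  pre { margin-left: 2em; white-space: pre-wrap; }
  pre code { font-family: sans-serif; }

  /* Section breaks tagged during post-processing. */
  p.section-break { text-align: center; }
CSS

# Write the stylesheet where wkhtmltopdf will look for it.
def write_stylesheet(path = "style.css")
  File.write(path, STYLESHEET)
end
```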

Step 4: Make the cover(s) with HTML and CSS

To make the front cover, follow the same process as with the book’s body: make the HTML, style it with CSS. You could use Markdown for this but the HTML might be simple enough that writing it by hand is an agreeable option.

Step 5: Generate PDFs from that HTML + CSS

You’ll want a version of wkhtmltopdf built with patched Qt, because builds without the patch are missing features such as headers and footers, which the page numbers depend on. If the version packaged for your distribution doesn’t have the patched Qt, download and install it yourself. You can check for the patch with the -V option:

$ wkhtmltopdf -V
wkhtmltopdf 0.12.4
$ wkhtmltox/bin/wkhtmltopdf -V
wkhtmltopdf 0.12.4 (with patched qt)
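A build script could make the same check before doing any work. This sketch only inspects the version string, using the two outputs shown above as its model; patched_qt? is a hypothetical helper name.

```ruby
# True when the -V output advertises the patched Qt build.
def patched_qt?(version_output)
  version_output.downcase.include?("with patched qt")
end

# Usage in a build script (runs the binary, so it is left commented out here):
#   abort "wkhtmltopdf lacks patched Qt" unless patched_qt?(`wkhtmltopdf -V`)
```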

You can specify page size, top, bottom, left and right margins, stylesheet, and, for page numbers, a footer file:

wkhtmltox/bin/wkhtmltopdf -s Letter -T 1in -B 1in -L 1in -R 1in --user-style-sheet style.css --footer-html footer.html combined.html body.pdf
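In a script, it can be tidier to build that command as an array and hand it to system, which skips the shell and its quoting rules. A sketch using only the flags from the command above; the method name and keyword arguments are my own invention:

```ruby
# Build the wkhtmltopdf argument list; the footer is optional so the same
# helper can serve both the body and the cover.
def wkhtmltopdf_args(input, output, footer: nil, binary: "wkhtmltox/bin/wkhtmltopdf")
  args = [binary, "-s", "Letter",
          "-T", "1in", "-B", "1in", "-L", "1in", "-R", "1in",
          "--user-style-sheet", "style.css"]
  args += ["--footer-html", footer] if footer
  args + [input, output]
end

# Usage (left commented out because it needs wkhtmltopdf installed):
#   system(*wkhtmltopdf_args("combined.html", "body.pdf", footer: "footer.html"))
```

One caveat: with the multi-argument form of system there is no shell, so a leading ~ in the binary path is not expanded. Wrap it in File.expand_path if you need that.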

A footer file should look something like:

<html>
<head>
<script>
function subst() {
  // wkhtmltopdf passes the page variables in the footer URL's query string
  var vars = {};
  var x = document.location.search.substring(1).split('&');
  for (var i in x) { var z = x[i].split('=', 2); vars[z[0]] = unescape(z[1]); }
  // Fill each value into every element whose class matches the variable name
  var x = ['frompage','topage','page','webpage','section','subsection','subsubsection'];
  for (var i in x) {
    var y = document.getElementsByClassName(x[i]);
    for (var j = 0; j < y.length; ++j) y[j].textContent = vars[x[i]];
  }
}
</script>
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body onload="subst()">
<span class="page"></span>
</body>
</html>

The JavaScript called onload chomps through the variables passed to the file during processing and fills their values into the elements with matching class names. You can style those elements in the CSS.
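If the JavaScript is hard to read, the parsing half of it amounts to decoding a query string into key/value pairs, since wkhtmltopdf hands the variables to the footer page as URL parameters. Here is that step in Ruby, purely as an illustration. The query string in the usage line is a made-up example, not real wkhtmltopdf output.

```ruby
require "uri"

# Decode "name=value&name=value" into a hash, the way the footer script
# builds its vars object before filling in the matching elements.
def footer_vars(query)
  URI.decode_www_form(query).to_h
end

vars = footer_vars("page=3&topage=120&section=Chapter%20One")
puts vars["page"]
```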

Do something similar (but leave out the footer) to generate the cover:

wkhtmltox/bin/wkhtmltopdf -s Letter -T 1in -B 1in -L 1in -R 1in --user-style-sheet style.css title.html title.pdf

Then combine the PDFs with Ghostscript:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true -r150 -sOutputFile=final.pdf title.pdf body.pdf
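The same array approach works for the Ghostscript call. This sketch reuses the exact flags from the command above; gs_merge_args is a hypothetical name. The input order is the page order, so the cover goes first.

```ruby
# Build the Ghostscript argument list that merges PDFs into one file.
# inputs are appended in order, which is also the page order of the result.
def gs_merge_args(output, inputs)
  ["gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
   "-dPDFSETTINGS=/default", "-dNOPAUSE", "-dQUIET", "-dBATCH",
   "-dDetectDuplicateImages", "-dCompressFonts=true", "-r150",
   "-sOutputFile=#{output}", *inputs]
end

# Usage (left commented out because it needs Ghostscript installed):
#   system(*gs_merge_args("final.pdf", ["title.pdf", "body.pdf"]))
```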

So, to put this all together in a build script:

$ cat make
#!/usr/bin/ruby
wkhtmltopdf_cmd = "~/wkhtmltox/bin/wkhtmltopdf -s Letter -T 1in -B 1in -L 1in -R 1in --user-style-sheet style.css"
parts = [
  "Talitha",
  "Imal",
  "Aunauf",
  "Empress",
  "Astronauts",
]
# Convert each part to HTML, then post-process the result with sed
parts.each do |part|
  system("multimarkdown -s ../#{part}/ | sed -E 's/_([^_]+)_/<em>\\1<\\/em>/g' | sed -E 's/<h1 .+<\\/h1>//g' | sed 's/<p>%<\\/p>/<p class=\"section-break\">%<\\/p>/g' > output-#{part}.html")
end
# Space-separated list of the generated HTML files, in reading order
htmls = parts.map { |part| "output-#{part}.html" }.join(" ")
# Body with page-number footer, then the cover, then merge the two PDFs
system("cat #{htmls} | #{wkhtmltopdf_cmd} --footer-html footer.html - body.pdf")
system("#{wkhtmltopdf_cmd} title.html title.pdf")
system("gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true -r150 -sOutputFile=final.pdf title.pdf body.pdf")
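One thing the script doesn't do is stop when a step fails: system just returns false (or nil if the command can't be run) and the next step carries on with stale files. A small wrapper fixes that. This is a sketch, and run! is a name I picked for it, not something from the original script.

```ruby
# Run a command and raise if it exits non-zero or cannot be started.
# Pass a single string to keep shell features like pipes and redirection,
# or separate arguments to bypass the shell entirely.
def run!(*cmd)
  ok = system(*cmd)
  raise "command failed: #{cmd.join(' ')}" unless ok
  true
end
```

Swapping each system(...) in the build script for run!(...) would make the build halt at the first broken step.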

You probably wouldn’t want to use this process for a final draft but it should work for all the ones you’re going to mark up anyway.