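The SAFE Stack documentation has become quite comprehensive. If I remember correctly, there was temporarily less information right after the move to V3, but it now seems to have even more material than before.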

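When something is unclear, checking the documentation looks likely to answer many questions faster than jumping straight into trial and error. Before reading any particular page carefully, I wanted to skim the whole set of documents on the site. And on top of that, if I am going to skim, I would rather flip through a PDF or a printout and scribble notes as I go.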

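So I made it possible to build a single PDF from multiple URIs. I took a detour along the way, but in the end wkhtmltopdf turned out to be very well made and did what I wanted. Pass wkhtmltopdf several URIs and it combines them into one PDF. It can also generate a table of contents with links, and you can specify a header and footer for every page. I had thought of wkhtmltopdf as little more than a way to automate the manual "print to PDF" step of a web browser, but it is far more capable and flexible than that.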
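As a rough illustration of the command structure wkhtmltopdf expects (global options first, then a `toc` object and one `page` object per URI, then the output file), here is a minimal sketch that builds the argument list as a Ruby array. The method name `wkhtmltopdf_argv` and its parameters are hypothetical; the option names are ones the script below also uses.

```ruby
# Hypothetical helper (not part of uri2pdf.rb): build the argument vector for
# one wkhtmltopdf run that merges several URIs into a single PDF.
def wkhtmltopdf_argv(uris, outfile, page_size: "A4")
  argv = ["wkhtmltopdf", "--page-size", page_size] # global options come first
  argv += ["toc", "--disable-dotted-lines"]        # TOC object and a TOC option
  uris.each { |uri| argv += ["page", uri] }        # one page object per URI
  argv << outfile                                  # output file comes last
end

argv = wkhtmltopdf_argv(["https://example.com/a", "https://example.com/b"], "out.pdf")
puts argv.join(" ")
# system(*argv) would run it; the array form bypasses the shell,
# so URIs containing "&" or spaces need no escaping.
```

The script itself builds a single command string and runs it through backticks, which is why it shell-escapes each URI with `shesc`; an argument array is one alternative that avoids quoting altogether.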

wkhtmltopdf provides all the important functionality. Invoking it is somewhat cumbersome, though, so I wrote a wrapper script.

uri2pdf.rb (a wrapper for wkhtmltopdf)

Source

#!/usr/bin/env ruby

CMD_NAME = File.basename $0

# DEBUG = true
DEBUG = false

def debug (opt_hash = {})
  return (DEBUG || opt_hash[:debug])
end

def abort_after_help (msg)
  puts msg
  puts
  puts @help_msg
  abort
end

require "pp"
require "optparse"
require "fileutils"  # FileUtils.remove_entry is used to clean up the temporary directory


def main ()
  pp [:argv_before_parse, ARGV]  if debug()
  opt_hash = parse_options(ARGV)
  pp [:opt_hash, opt_hash]       if debug(opt_hash)
  pp [:argv_after_parse,  ARGV]  if debug(opt_hash)

  # check commands
  puts "ToDo: check: wkhtmltopdf commands"  if debug(opt_hash)

  # check output file
  outfile = opt_hash[:outfile]
  if outfile.nil? then
    abort_after_help "#{CMD_NAME}: outfile is not specified"
  end
  unless FileTest.writable? File.dirname(outfile) then
    abort_after_help "#{CMD_NAME}: #{outfile}: cannot create file"
  end

  # check input and get its content
  infile = opt_hash[:infile]
  uris =
    if infile.nil? then
      ARGV.dup
    else
      File.readlines(infile).map { |x| x.strip }
    end
  uris.reject! { |x| x.empty? }
  if uris.empty? then
    abort_after_help "#{CMD_NAME}: URI is not specified"
  end
  # pp [:uris,  uris]  if debug(opt_hash)

  # create the PDF using a temporary working directory
  global_opts  = opt_hash[:wkhtmltopdf_global_opts]
  page_opts    = opt_hash[:wkhtmltopdf_page_opts]
  hdr_ftr_opts = opt_hash[:wkhtmltopdf_hdr_ftr_opts]
  toc_opts     = opt_hash[:wkhtmltopdf_toc_opts]

  require 'tmpdir'
  tmp_dir = nil
  begin
    tmp_dir = Dir.mktmpdir("#{CMD_NAME}_")

    # build command image
    cmd_img = "wkhtmltopdf"

    cache_dir_path = opt_hash[:cache_dir]
    if cache_dir_path.nil? || cache_dir_path.empty? then
      cache_dir_path = "#{tmp_dir}/cache"
    end
    Dir.mkdir(cache_dir_path)  unless FileTest.directory? cache_dir_path
    cmd_img += " --cache-dir #{shesc cache_dir_path}"
    cmd_img += " --load-error-handling abort"

    paper_size = opt_hash[:paper_size]
    if paper_size then
      cmd_img += " --page-size #{paper_size}"
    end

    orientation = opt_hash[:orientation]
    if orientation then
      cmd_img += " --orientation #{orientation}"
    end

    style = opt_hash[:style]
    if style then
      style_path = "#{tmp_dir}/style.css"
      File.write(style_path, "#{style}\n")
      cmd_img += " --user-style-sheet #{shesc style_path}"
    end

    cmd_img += " #{global_opts} #{page_opts} #{hdr_ftr_opts}"

    cmd_img += " toc #{toc_opts}"
    uris.each do |uri|
      escaped_uri = shesc uri
      cmd_img += " page #{escaped_uri}"
    end
    cmd_img += " #{shesc outfile}"

    # rerun wkhtmltopdf until the newest file in the cache stops changing
    max_try_count = 5
    cur_try_count = 0
    latest_cache = latest_file(cache_dir_path)
    loop do
      cur_try_count += 1
      command_run(opt_hash, cmd_img)
      prev_latest_cache = latest_cache
      pp [:prev_latest_cache, prev_latest_cache]  if debug(opt_hash)
      latest_cache = latest_file(cache_dir_path)
      pp [:latest_cache, latest_cache]  if debug(opt_hash)
      if latest_cache[:time] == prev_latest_cache[:time] then
        break
      else
        if cur_try_count < max_try_count then
          puts "TRY NEXT (#{cur_try_count})"  if debug(opt_hash)
        else
          abort "#{CMD_NAME}: cache did not stabilize after #{cur_try_count} tries"
        end
      end
    end
  ensure
    if tmp_dir then
      if opt_hash[:keep_tmpdir] then
        puts "keeping temporary directory: #{tmp_dir}"
      else
        FileUtils.remove_entry(tmp_dir, true)  # second argument: force
      end
    end
  end
end


class MyRuntimeError < RuntimeError
  def initialize (arg = nil)
    super
    @arg = arg
  end
  def name ()
    return "runtime error"
  end
  def desc ()
    return name + (@arg ? ": #{@arg}" : "")
  end
end

class InvalidNumberFormat < MyRuntimeError
  def name ()
    return "invalid number format"
  end
end

class InvalidCommandPath < MyRuntimeError
  def name ()
    return "invalid command path"
  end
end

def parse_options (argv)
  opt = OptionParser.new

  opt.summary_indent = " " * 2
  opt.summary_width = 36

  opt.banner = [
    "Usage:",
    "    #{File.basename($0)} -o FILE URI...",
    "    #{File.basename($0)} -o FILE -i FILE",
  ].join("\n")

  opt.separator ""
  opt.separator "Options:"

  infile_default = nil
  infile = infile_default
  infile_desc =  ["input file"]
  opt.on("-i", "--infile FILE", *infile_desc) do |x|
    infile = x
  end

  outfile_default = nil
  outfile = outfile_default
  outfile_desc =  ["output file"]
  opt.on("-o", "--outfile FILE", *outfile_desc) do |x|
    outfile = x
  end

  cache_dir_default = nil
  cache_dir = cache_dir_default
  cache_dir_desc = ["cache directory"]
  opt.on("-c", "--cache-dir DIR", *cache_dir_desc) do |x|
    cache_dir = x
  end

  keep_tmpdir_default = nil
  keep_tmpdir = keep_tmpdir_default
  keep_tmpdir_desc = ["keep temporary directory"]
  opt.on("-k", "--keep-tmpdir", *keep_tmpdir_desc) do
    keep_tmpdir = true
  end

  paper_size_default = nil
  paper_size = paper_size_default
  paper_size_desc = ["paper size: A4, B5, Letter, etc"]
  opt.on("--paper-size SIZE", *paper_size_desc) do |x|
    paper_size = x
  end

  orientation_default = nil
  orientation = orientation_default
  landscape_desc = ["landscape orientation"]
  opt.on("--landscape", *landscape_desc) do
    orientation = "landscape"
  end
  portrait_desc = ["portrait orientation"]
  opt.on("--portrait", *portrait_desc) do
    orientation = "portrait"
  end

  additional_style_default = nil
  additional_style = additional_style_default
  additional_style_desc = ["additional user style"]
  opt.on("-s", "--style STYLE", *additional_style_desc) do |x|
    additional_style = x
  end

  wkhtmltopdf_global_opts_default = ""
  wkhtmltopdf_global_opts = wkhtmltopdf_global_opts_default
  wkhtmltopdf_global_opts_desc =  ["wkhtmltopdf global options"]
  opt.on("-g", "--wkhtmltopdf-global-opts OPTS", *wkhtmltopdf_global_opts_desc) do |x|
    wkhtmltopdf_global_opts = x
  end

  wkhtmltopdf_page_opts_default = "--default-header"
  wkhtmltopdf_page_opts = wkhtmltopdf_page_opts_default
  wkhtmltopdf_page_opts_desc =  ["wkhtmltopdf page options (default: #{wkhtmltopdf_page_opts_default})"]
  opt.on("-p", "--wkhtmltopdf-page-opts OPTS", *wkhtmltopdf_page_opts_desc) do |x|
    wkhtmltopdf_page_opts = x
  end

  wkhtmltopdf_hdr_ftr_opts_default = ""
  wkhtmltopdf_hdr_ftr_opts = wkhtmltopdf_hdr_ftr_opts_default
  wkhtmltopdf_hdr_ftr_opts_desc =  ["wkhtmltopdf header and footer options"]
  opt.on("-r", "--wkhtmltopdf-hdr-ftr-opts OPTS", *wkhtmltopdf_hdr_ftr_opts_desc) do |x|
    wkhtmltopdf_hdr_ftr_opts = x
  end

  wkhtmltopdf_toc_opts_default = "--disable-dotted-lines"
  wkhtmltopdf_toc_opts = wkhtmltopdf_toc_opts_default
  wkhtmltopdf_toc_opts_desc =  ["wkhtmltopdf toc options"]
  opt.on("-t", "--wkhtmltopdf-toc-opts OPTS", *wkhtmltopdf_toc_opts_desc) do |x|
    wkhtmltopdf_toc_opts = x
  end

  debug_default = false
  debug = debug_default
  debug_desc = "debug mode (default: #{debug_default})"
  opt.on("-d", "--debug", debug_desc) do |v|
    debug = v
  end

  opt.separator ""

  @help_msg = opt.help

  begin
    opt.parse!(argv)
  rescue MyRuntimeError => evar
    puts "Error: #{evar.desc}"
    puts
    puts @help_msg
    exit 1
  rescue OptionParser::InvalidOption => evar
    # must precede ParseError, which is its superclass
    puts "Error: invalid option: #{evar.args.join(' ')}"
    puts
    puts @help_msg
    exit 1
  rescue OptionParser::ParseError => evar
    puts "Error: #{evar.message}"
    puts
    puts @help_msg
    exit 1
  rescue => evar
    puts "Error: unexpected (#{evar.inspect})"
    abort
  end

  opt_hash = {
    :infile => infile,
    :outfile => outfile,
    :cache_dir => cache_dir,
    :keep_tmpdir => keep_tmpdir,
    :paper_size => paper_size,
    :orientation => orientation,
    :style => additional_style,
    :wkhtmltopdf_global_opts => wkhtmltopdf_global_opts,
    :wkhtmltopdf_page_opts => wkhtmltopdf_page_opts,
    :wkhtmltopdf_hdr_ftr_opts => wkhtmltopdf_hdr_ftr_opts,
    :wkhtmltopdf_toc_opts => wkhtmltopdf_toc_opts,
    :debug => debug,
  }

  return opt_hash
end


require 'find'
def latest_file (dir_path)
  latest_file_info = {:name => nil, :time => Time.utc(1970,1,1,0,0,0)}
  Find.find(dir_path) do |file|
    next  unless FileTest.file? file
    tm = File.mtime(file)
    if tm > latest_file_info[:time] then
      latest_file_info = {:name => file, :time => tm}
    end
  end
  return latest_file_info
end


require 'shellwords'

def shesc (s, allow_nil: false)
  return nil  if allow_nil && s.nil?
  raise "Unexpected class: #{s.class}"  unless s.is_a? String
  return (if s.empty? then s else Shellwords.shellescape s end)
end


def command_run (opt_hash, *cmd_img)
  (result, status) = command_status_and_output_of(opt_hash, *cmd_img)
  check_exitstatus(opt_hash, cmd_img, status)
end


def command_output_of (opt_hash, *cmd_img)
  (result, status) = command_status_and_output_of(opt_hash, *cmd_img)
  check_exitstatus(opt_hash, cmd_img, status)
  return result
end


def command_status_of (opt_hash, *cmd_img)
  (result, status) = command_status_and_output_of(opt_hash, *cmd_img)
  return status
end


def command_status_and_output_of (opt_hash, *cmd_img)
  # pp [:cmd_img, cmd_img]  if debug(opt_hash)
  result = `#{cmd_img.join(' ')}`
  return [result, $?.exitstatus]
end


def check_exitstatus (opt_hash, cmd_img, exitstatus)
  unless exitstatus == 0 then
    abort "#{CMD_NAME}: command #{cmd_img.inspect} failed (exit status #{exitstatus})"
  end
end


main
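The retry loop in `main` reruns wkhtmltopdf until the newest modification time in the cache directory stops changing, on the assumption that an unchanged newest mtime means nothing new was fetched. Below is a minimal sketch of that idea pulled out into a standalone helper, with the command passed in as a block so it can be exercised without wkhtmltopdf. `newest_mtime` and `run_until_cache_stable` are hypothetical names, not methods of uri2pdf.rb.

```ruby
require "find"

# Newest modification time of any regular file under dir (nil if there is none).
def newest_mtime(dir)
  times = []
  Find.find(dir) { |path| times << File.mtime(path) if File.file?(path) }
  times.max
end

# Hypothetical helper: call the block (e.g. one wkhtmltopdf run) repeatedly
# until the newest mtime in cache_dir stops changing.
# Returns the number of runs; raises if max_tries runs were not enough.
def run_until_cache_stable(cache_dir, max_tries: 5)
  latest = newest_mtime(cache_dir)
  1.upto(max_tries) do |count|
    yield
    previous, latest = latest, newest_mtime(cache_dir)
    return count if latest == previous
  end
  raise "cache did not stabilize after #{max_tries} runs"
end
```

In uri2pdf.rb the block would correspond to `command_run(opt_hash, cmd_img)`.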

Help

uri2pdf: outfile is not specified

Usage:
    uri2pdf -o FILE URI...
    uri2pdf -o FILE -i FILE

Options:
  -i, --infile FILE                    input file
  -o, --outfile FILE                   output file
  -c, --cache-dir DIR                  cache directory
  -k, --keep-tmpdir                    keep temporary directory
      --paper-size SIZE                paper size: A4, B5, Letter, etc
      --landscape                      landscape orientation
      --portrait                       portrait orientation
  -s, --style STYLE                    additional user style
  -g, --wkhtmltopdf-global-opts OPTS   wkhtmltopdf global options
  -p, --wkhtmltopdf-page-opts OPTS     wkhtmltopdf page options (default: --default-header)
  -r, --wkhtmltopdf-hdr-ftr-opts OPTS  wkhtmltopdf header and footer options
  -t, --wkhtmltopdf-toc-opts OPTS      wkhtmltopdf toc options
  -d, --debug                          debug mode (default: false)
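With `-i`, the script reads the URI list from a file: each line is stripped of surrounding whitespace and blank lines are dropped (URIs given on the command line go through the same blank-line filter). A minimal standalone sketch of that parsing step; `read_uri_list` is a hypothetical name, not a method in uri2pdf.rb.

```ruby
require "tempfile"

# Hypothetical helper mirroring how uri2pdf.rb reads the -i file:
# one URI per line, surrounding whitespace stripped, blank lines ignored.
def read_uri_list(path)
  File.readlines(path, chomp: true).map(&:strip).reject(&:empty?)
end

# Example: a list file with a blank line and stray indentation.
Tempfile.create("uri_list") do |f|
  f.write("https://example.com/a\n\n  https://example.com/b  \n")
  f.flush
  p read_uri_list(f.path)
end
```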

Usage examples

Example 1: A PDF for a 7-inch tablet

uri2pdf \
  --paper-size A6 \
  --landscape \
  -p "--header-font-size 9 \
      --header-left '[title]' \
      --header-right '[page]/[toPage]' \
      --header-line \
      --footer-line \
      --footer-font-size 9 \
      --footer-left '[webpage]' \
      --margin-top 0.9cm \
      --margin-bottom 0.9cm \
      --margin-left 0.5cm \
      --margin-right 0.5cm" \
  -s '* { line-height: 200%; }' \
  -c ./cache \
  -i __URI_LIST_FILE__ \
  -o __OUTPUT_PDF_FILE__
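The `-p` value is one shell argument that bundles several wkhtmltopdf options. Because the script pastes it into its command string unescaped, the shell splits it into separate words again when the command runs, and the inner single quotes keep `[title]` and `[page]/[toPage]` as single, unglobbed tokens (unquoted, `[title]` would be a bracket glob pattern). Ruby's `Shellwords.split` follows the same POSIX-style word splitting and quote removal (it does not do glob expansion), so it is a handy way to check how such a string will be tokenized. The option string below is a shortened version of the one above.

```ruby
require "shellwords"

# Shortened version of the -p string from Example 1.
page_opts = "--header-font-size 9 --header-left '[title]' --header-right '[page]/[toPage]'"

# Split it the way a POSIX shell would (word splitting plus quote removal).
tokens = Shellwords.split(page_opts)
tokens.each { |t| puts t }
```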

Example 2: A PDF for printing on A4

The command sets the paper size to A5, but for my eyesight the result is just right when printed on A4. For people with good eyesight the text will probably be too large.

uri2pdf \
  --paper-size A5 \
  --portrait \
  -p "--header-font-size 9 \
      --header-left '[title]' \
      --header-right '[page]/[toPage]' \
      --header-line \
      --footer-line \
      --footer-font-size 9 \
      --footer-left '[webpage]' \
      --margin-top 0.9cm \
      --margin-bottom 0.9cm \
      --margin-left 1.2cm \
      --margin-right 1.2cm" \
  -s '* { line-height: 200%; }' \
  -c ./cache \
  -i __URI_LIST_FILE__ \
  -o __OUTPUT_PDF_FILE__
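Why an A5 layout reads comfortably on A4: ISO 216 sizes share a 1:√2 aspect ratio, and each size is the next larger one halved, so printing an A5 page scaled to fit A4 (assuming the print dialog's fit-to-page scaling) enlarges everything by about √2 ≈ 1.41. A quick check with the standard dimensions (A5 = 148 × 210 mm, A4 = 210 × 297 mm), which also shows roughly what the 9 pt header font becomes:

```ruby
# ISO 216 paper sizes in millimetres (width, height), portrait.
A5 = [148, 210]
A4 = [210, 297]

# Fit-to-page scale factor: the smaller of the width and height ratios.
scale = [A4[0].fdiv(A5[0]), A4[1].fdiv(A5[1])].min

header_pt = 9 # --header-font-size used in Example 2
printf("scale: %.3f (sqrt(2) = %.3f)\n", scale, Math.sqrt(2))
printf("9 pt header prints at about %.1f pt\n", header_pt * scale)
```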