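Formatting multiple web pages into a clean PDF
The SAFE Stack documentation (link) is quite thorough. If I remember correctly, information was temporarily sparse right after the version moved to V3, but it now feels more extensive than it ever was.
Rather than jumping straight into trial and error on things I don't understand, it looks like many of them can be solved by consulting the documentation. Before reading the pages I care about carefully, I wanted to skim the whole documentation on the site. And on top of that, if I'm going to skim, I'd like to do it by flipping through a PDF or a printout and jotting notes on it.
So I made it possible to build a PDF from multiple URIs. I took a detour along the way, but in the end wkhtmltopdf (link) turned out to be very well made and did what I wanted. Pass wkhtmltopdf multiple URIs and it combines them into a single PDF. It can also generate a linked table of contents, and it lets you specify headers and footers for each page. Until now I had thought of wkhtmltopdf as merely something that automates the manual step of printing to PDF from a web browser, but it is far more capable and flexible.
wkhtmltopdf provides the important functionality. Its usage gets a little cumbersome, though, so I wrote a wrapper script.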
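To give a feel for the kind of command line this boils down to, here is a minimal sketch in Ruby. The helper name `build_wkhtmltopdf_args` and the example.com URLs are made up for illustration and are not part of the script; the flags (`toc`, `page`, `--header-left`, `--header-right`, `--footer-left`) are the ones used elsewhere in this post.

```ruby
require "shellwords"

# Hypothetical helper (not part of uri2pdf.rb): assemble the argument list
# for one wkhtmltopdf run that merges several pages into a single PDF.
def build_wkhtmltopdf_args(uris, outfile)
  args = ["wkhtmltopdf"]
  # Header/footer options are given before the objects, as the script does.
  args += ["--header-left", "[title]", "--header-right", "[page]/[toPage]"]
  args += ["--footer-left", "[webpage]"]
  args << "toc"                              # linked table of contents
  uris.each { |uri| args += ["page", uri] }  # one page object per URI
  args << outfile                            # the output file comes last
  args
end

args = build_wkhtmltopdf_args(
  ["https://example.com/docs/intro", "https://example.com/docs/faq?lang=en&v=3"],
  "docs.pdf"
)
# Shellwords.join escapes each argument so the line can be pasted into a shell.
puts Shellwords.join(args)
```

Keeping the arguments as an array until the last moment means characters such as `?`, `&`, and `[` in URIs or header templates are escaped in one place.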
uri2pdf.rb (a wrapper for wkhtmltopdf)
Source
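#!/usr/bin/env ruby

CMD_NAME = File.basename $0

# DEBUG = true
DEBUG = false

# True when debugging is enabled globally or via the parsed options.
def debug(opt_hash = {})
  return (DEBUG || opt_hash[:debug])
end

# Print a message followed by the help text, then exit with failure.
def abort_after_help(msg)
  puts msg
  puts
  puts @help_msg
  abort
end

require "pp"
require "optparse"
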
def main
  pp [:argv_before_parse, ARGV] if debug()

  opt_hash = parse_options(ARGV)
  pp [:opt_hash, opt_hash] if debug(opt_hash)
  pp [:argv_after_parse, ARGV] if debug(opt_hash)

  # check commands
  puts "ToDo: check: wkhtmltopdf commands" if debug(opt_hash)

  # check output file
  outfile = opt_hash[:outfile]
  if outfile.nil? then
    abort_after_help "#{CMD_NAME}: outfile is not specified"
  end
  unless FileTest.writable? File.dirname(outfile) then
    abort_after_help "#{CMD_NAME}: #{outfile}: can not create file"
  end

  # check input and get its content (URIs from the list file, else from ARGV)
  infile = opt_hash[:infile]
  uris =
    if infile.nil? then
      ARGV.dup
    else
      IO.readlines(infile).map { |x| x.strip }
    end
  uris.reject! { |x| x.empty? }
  if uris.empty? then
    abort_after_help "#{CMD_NAME}: URI is not specified"
  end
  # pp [:uris, uris] if debug(opt_hash)

  # create PDF files in temporary directory
  global_opts = opt_hash[:wkhtmltopdf_global_opts]
  page_opts = opt_hash[:wkhtmltopdf_page_opts]
  hdr_ftr_opts = opt_hash[:wkhtmltopdf_hdr_ftr_opts]
  toc_opts = opt_hash[:wkhtmltopdf_toc_opts]

  require 'tmpdir'
  require 'fileutils' # FileUtils.remove_entry is used during cleanup
  tmp_dir = nil
  begin
    tmp_dir = Dir.mktmpdir("#{CMD_NAME}_")

    # build command image
    cmd_img = "wkhtmltopdf"

    cache_dir_path = opt_hash[:cache_dir]
    if cache_dir_path.nil? || cache_dir_path.empty? then
      cache_dir_path = "#{tmp_dir}/cache"
    end
    Dir.mkdir(cache_dir_path) unless FileTest.directory? cache_dir_path
    # paths and user-supplied values are shell-escaped before interpolation
    cmd_img += " --cache-dir #{shesc(cache_dir_path)}"
    cmd_img += " --load-error-handling abort"

    paper_size = opt_hash[:paper_size]
    if paper_size then
      cmd_img += " --page-size #{shesc(paper_size)}"
    end

    orientation = opt_hash[:orientation]
    if orientation then
      cmd_img += " --orientation #{orientation}"
    end

    style = opt_hash[:style]
    if style then
      style_path = "#{tmp_dir}/style.css"
      File.open(style_path, 'w') { |io| io.puts style }
      cmd_img += " --user-style-sheet #{shesc(style_path)}"
    end

    # the option strings below are passed through to wkhtmltopdf as given
    cmd_img += " #{global_opts} #{page_opts} #{hdr_ftr_opts}"
    cmd_img += " toc #{toc_opts}"
    uris.each do |uri|
      escaped_uri = shesc uri
      cmd_img += " page #{escaped_uri}"
    end
    cmd_img += " #{shesc(outfile)}"

    # re-run the page fetch until the cache stops changing
    max_try_count = 5
    cur_try_count = 0
    latest_cache = latest_file(cache_dir_path)
    loop do
      cur_try_count += 1
      command_run(opt_hash, cmd_img)

      prev_latest_cache = latest_cache
      pp [:prev_latest_cache, prev_latest_cache] if debug(opt_hash)
      latest_cache = latest_file(cache_dir_path)
      pp [:latest_cache, latest_cache] if debug(opt_hash)

      if latest_cache[:time] == prev_latest_cache[:time] then
        break
      else
        if cur_try_count < max_try_count then
          puts "TRY NEXT (#{cur_try_count})" if debug(opt_hash)
        else
          abort "#{CMD_NAME}: exceeded max try count (#{cur_try_count})"
        end
      end
    end
  ensure
    if tmp_dir then
      if opt_hash[:keep_tmpdir] then
        puts "keep temporary directory: #{tmp_dir}"
      else
        FileUtils.remove_entry(tmp_dir, true) # second argument: force
      end
    end
  end
end

class MyRuntimeError < RuntimeError
  def initialize(arg = nil)
    super
    @arg = arg
  end

  def name
    return "runtime error"
  end

  def desc
    return name + (@arg ? ": #{@arg}" : "")
  end
end

class InvalidNumberFormat < MyRuntimeError
  def name
    return "invalid number format"
  end
end

class InvalidCommandPath < MyRuntimeError
  def name
    return "invalid command path"
  end
end

def parse_options(argv)
  opt = OptionParser.new

  opt.summary_indent = " " * 2
  opt.summary_width = 36

  opt.banner = [
    "Usage:",
    "  #{File.basename($0)} -o FILE URI...",
    "  #{File.basename($0)} -o FILE -i FILE",
  ].join("\n")

  opt.separator ""
  opt.separator "Options:"

  infile_default = nil
  infile = infile_default
  infile_desc = ["input file"]
  opt.on("-i", "--infile FILE", *infile_desc) do |x|
    infile = x
  end

  outfile_default = nil
  outfile = outfile_default
  outfile_desc = ["output file"]
  opt.on("-o", "--outfile FILE", *outfile_desc) do |x|
    outfile = x
  end

  cache_dir_default = nil
  cache_dir = cache_dir_default
  cache_dir_desc = ["cache directory"]
  opt.on("-c", "--cache-dir DIR", *cache_dir_desc) do |x|
    cache_dir = x
  end

  keep_tmpdir_default = nil
  keep_tmpdir = keep_tmpdir_default
  keep_tmpdir_desc = ["keep temporary directory"]
  opt.on("-k", "--keep-tmpdir", *keep_tmpdir_desc) do
    keep_tmpdir = true
  end

  paper_size_default = nil
  paper_size = paper_size_default
  paper_size_desc = ["paper size: A4, B5, Letter, etc"]
  opt.on("--paper-size SIZE", *paper_size_desc) do |x|
    paper_size = x
  end

  orientation_default = nil
  orientation = orientation_default
  landscape_desc = ["landscape orientation"]
  opt.on("--landscape", *landscape_desc) do
    orientation = "landscape"
  end
  portrait_desc = ["portrait orientation"]
  opt.on("--portrait", *portrait_desc) do
    orientation = "portrait"
  end

  additional_style_default = nil
  additional_style = additional_style_default
  additional_style_desc = ["additional user style"]
  opt.on("-s", "--style STYLE", *additional_style_desc) do |x|
    additional_style = x
  end

  wkhtmltopdf_global_opts_default = ""
  wkhtmltopdf_global_opts = wkhtmltopdf_global_opts_default
  wkhtmltopdf_global_opts_desc = ["wkhtmltopdf global options"]
  opt.on("-g", "--wkhtmltopdf-global-opts OPTS", *wkhtmltopdf_global_opts_desc) do |x|
    wkhtmltopdf_global_opts = x
  end

  wkhtmltopdf_page_opts_default = "--default-header"
  wkhtmltopdf_page_opts = wkhtmltopdf_page_opts_default
  wkhtmltopdf_page_opts_desc = ["wkhtmltopdf page options (default: #{wkhtmltopdf_page_opts_default})"]
  opt.on("-p", "--wkhtmltopdf-page-opts OPTS", *wkhtmltopdf_page_opts_desc) do |x|
    wkhtmltopdf_page_opts = x
  end

  wkhtmltopdf_hdr_ftr_opts_default = ""
  wkhtmltopdf_hdr_ftr_opts = wkhtmltopdf_hdr_ftr_opts_default
  wkhtmltopdf_hdr_ftr_opts_desc = ["wkhtmltopdf header and footer options"]
  opt.on("-r", "--wkhtmltopdf-hdr-ftr-opts OPTS", *wkhtmltopdf_hdr_ftr_opts_desc) do |x|
    wkhtmltopdf_hdr_ftr_opts = x
  end

  wkhtmltopdf_toc_opts_default = "--disable-dotted-lines"
  wkhtmltopdf_toc_opts = wkhtmltopdf_toc_opts_default
  wkhtmltopdf_toc_opts_desc = ["wkhtmltopdf toc options"]
  opt.on("-t", "--wkhtmltopdf-toc-opts OPTS", *wkhtmltopdf_toc_opts_desc) do |x|
    wkhtmltopdf_toc_opts = x
  end

  debug_default = false
  debug = debug_default
  debug_desc = "debug mode (default: #{debug_default})"
  opt.on("-d", "--debug", debug_desc) do |v|
    debug = v
  end

  opt.separator ""

  @help_msg = opt.help

  begin
    opt.parse!(argv)
  rescue MyRuntimeError => evar
    puts "Error: #{evar.desc}"
    puts
    puts @help_msg
    exit 1
  # InvalidOption is a subclass of ParseError, so it must be rescued first
  rescue OptionParser::InvalidOption => evar
    puts "Error: invalid option: #{evar.args.join(' ')}"
    puts
    puts @help_msg
    exit 1
  rescue OptionParser::ParseError => evar
    puts "Error: #{evar.message}"
    puts
    puts @help_msg
    exit 1
  rescue => evar
    puts "Error: unexpected (#{evar.inspect})"
    abort
  end

  opt_hash = {
    :infile => infile,
    :outfile => outfile,
    :cache_dir => cache_dir,
    :keep_tmpdir => keep_tmpdir,
    :paper_size => paper_size,
    :orientation => orientation,
    :style => additional_style,
    :wkhtmltopdf_global_opts => wkhtmltopdf_global_opts,
    :wkhtmltopdf_page_opts => wkhtmltopdf_page_opts,
    :wkhtmltopdf_hdr_ftr_opts => wkhtmltopdf_hdr_ftr_opts,
    :wkhtmltopdf_toc_opts => wkhtmltopdf_toc_opts,
    :debug => debug,
  }

  return opt_hash
end

require 'find'

# Return the name and mtime of the most recently modified file under dir_path.
def latest_file(dir_path)
  latest_file_info = {:name => nil, :time => Time.utc(1970,1,1,0,0,0)}
  Find.find(dir_path) do |file|
    next unless FileTest.file? file
    tm = File.mtime(file)
    if tm > latest_file_info[:time] then
      latest_file_info = {:name => file, :time => tm}
    end
  end
  return latest_file_info
end

require 'shellwords'

# Shell-escape a string; an empty string is returned unchanged.
def shesc(s, allow_nil: false)
  return nil if allow_nil && s.nil?
  raise "Unexpected class: #{s.class}" unless s.is_a? String
  return (if s.empty? then s else Shellwords.shellescape s end)
end

# Run a command and abort if it exits with a non-zero status.
def command_run(opt_hash, *cmd_img)
  _result, status = command_status_and_output_of(opt_hash, *cmd_img)
  check_exitstatus(opt_hash, cmd_img, status)
end

# Run a command, abort on failure, and return its standard output.
def command_output_of(opt_hash, *cmd_img)
  result, status = command_status_and_output_of(opt_hash, *cmd_img)
  check_exitstatus(opt_hash, cmd_img, status)
  return result
end

# Run a command and return only its exit status.
def command_status_of(opt_hash, *cmd_img)
  _result, status = command_status_and_output_of(opt_hash, *cmd_img)
  return status
end

# Run a command through the shell and return [stdout, exit status].
def command_status_and_output_of(opt_hash, *cmd_img)
  # pp [:cmd_img, cmd_img] if debug(opt_hash)
  result = `#{cmd_img.join(' ')}`
  return [result, $?.exitstatus]
end

def check_exitstatus(opt_hash, cmd_img, exitstatus)
  unless exitstatus == 0 then
    cmd_name = File.basename $0
    abort "#{cmd_name}: command #{cmd_img.inspect} failed (#{exitstatus})"
  end
end

main
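
The retry loop in `main` is easier to see in isolation. The following is only a sketch of the same idea, not code taken from the script: `run_until_stable` is a hypothetical helper, and a simple counter stands in for the cache directory's newest modification time.

```ruby
# Hypothetical helper: call the block until the value returned by `probe`
# is the same before and after a run, or give up after max_tries runs.
# Returns the number of runs needed, or nil if the value never settled.
def run_until_stable(max_tries:, probe:)
  previous = probe.call
  1.upto(max_tries) do |attempt|
    yield attempt
    current = probe.call
    return attempt if current == previous
    previous = current
  end
  nil
end

# Stand-in for the cache: it "changes" during the first two runs only.
state = 0
attempts = run_until_stable(max_tries: 5, probe: -> { state }) do |n|
  state += 1 if n <= 2
end
puts attempts
```

In the script the probe corresponds to `latest_file(cache_dir_path)[:time]` and the block to the wkhtmltopdf run, so the final run is the one that produced the PDF without touching the cache.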
Help
uri2pdf: outfile is not specified

Usage:
  uri2pdf -o FILE URI...
  uri2pdf -o FILE -i FILE

Options:
  -i, --infile FILE                    input file
  -o, --outfile FILE                   output file
  -c, --cache-dir DIR                  cache directory
  -k, --keep-tmpdir                    keep temporary directory
      --paper-size SIZE                paper size: A4, B5, Letter, etc
      --landscape                      landscape orientation
      --portrait                       portrait orientation
  -s, --style STYLE                    additional user style
  -g, --wkhtmltopdf-global-opts OPTS   wkhtmltopdf global options
  -p, --wkhtmltopdf-page-opts OPTS     wkhtmltopdf page options (default: --default-header)
  -r, --wkhtmltopdf-hdr-ftr-opts OPTS  wkhtmltopdf header and footer options
  -t, --wkhtmltopdf-toc-opts OPTS      wkhtmltopdf toc options
  -d, --debug                          debug mode (default: false)
Usage examples
Example 1: Create a PDF for a 7-inch tablet
uri2pdf \
  --paper-size A6 \
  --landscape \
  -p "--header-font-size 9 \
      --header-left '[title]' \
      --header-right '[page]/[toPage]' \
      --header-line \
      --footer-line \
      --footer-font-size 9 \
      --footer-left '[webpage]' \
      --margin-top 0.9cm \
      --margin-bottom 0.9cm \
      --margin-left 0.5cm \
      --margin-right 0.5cm" \
  -s '* {line-height: 200%};' \
  -c ./cache \
  -i __URI_LIST_FILE__ \
  -o __OUTPUT_PDF_FILE__
Example 2: Create a PDF for printing on A4
The command sets the paper size to A5, but with my eyesight, printing that on A4 comes out just right. For people with good eyes the text will probably be too large.
uri2pdf \
  --paper-size A5 \
  --portrait \
  -p "--header-font-size 9 \
      --header-left '[title]' \
      --header-right '[page]/[toPage]' \
      --header-line \
      --footer-line \
      --footer-font-size 9 \
      --footer-left '[webpage]' \
      --margin-top 0.9cm \
      --margin-bottom 0.9cm \
      --margin-left 1.2cm \
      --margin-right 1.2cm" \
  -s '* {line-height: 200%};' \
  -c ./cache \
  -i __URI_LIST_FILE__ \
  -o __OUTPUT_PDF_FILE__
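
For reference, the file passed with `-i` is read as one URI per line, with surrounding whitespace stripped and blank lines ignored. A small sketch of that parsing follows; `read_uri_list` is a hypothetical name and the example.com URLs are placeholders, not pages from the actual documentation.

```ruby
require "tempfile"

# Hypothetical helper mirroring how the script treats the -i file:
# one URI per line, surrounding whitespace stripped, blank lines dropped.
def read_uri_list(path)
  File.readlines(path, chomp: true).map(&:strip).reject(&:empty?)
end

# Write a throwaway list file and read it back.
uris = Tempfile.create("uri_list") do |f|
  f.puts "https://example.com/docs/intro"
  f.puts ""
  f.puts "  https://example.com/docs/quickstart  "
  f.flush
  read_uri_list(f.path)
end
p uris
```

The order of the lines is preserved, so the list file also determines the page order in the resulting PDF.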