I've added a command line option that can fix existing PDF files that you have generated with Simple Scan. To use it, run the following:
simple-scan --fix-pdf ~/Documents/*.pdf
It should be safe to run this on all PDF documents, but PLEASE BACK UP FIRST. It copies each existing document to DocumentName.pdf~ before replacing it with the fixed version, so you still have the originals in case anything goes wrong.
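For the manual backup, something like the following sketch would do. It is written for Python 3 and is not part of Simple Scan; the function name backup_pdfs and the backup directory are hypothetical, chosen only for illustration.

```python
import os
import shutil


def backup_pdfs(paths, backup_dir):
    """Copy each file in paths into backup_dir and return the copies' paths.

    Hypothetical helper: this is a manual safety copy, separate from the
    DocumentName.pdf~ file that simple-scan writes on its own.
    """
    os.makedirs(backup_dir, exist_ok=True)
    copies = []
    for path in paths:
        target = os.path.join(backup_dir, os.path.basename(path))
        shutil.copy2(path, target)  # copy2 also keeps the file timestamps
        copies.append(target)
    return copies
```

For example, backup_pdfs(glob.glob(os.path.expanduser('~/Documents/*.pdf')), os.path.expanduser('~/pdf-backup')) would copy every PDF in ~/Documents before you run the fix (this needs import glob as well).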
If you can't wait for the next simple-scan release, you can also run this Python 2 program (e.g. python fixpdf.py broken.pdf > fixed.pdf):
import sys
import re

lines = file (sys.argv[1]).readlines ()
xref_offset = int(lines[-2])
xref_offset = 0
for (n, line) in enumerate (lines):
    # Fix PDF header and binary comment
    if (n == 0 or n == 1) and line.startswith ('%%'):
        xref_offset -= 1
        line = line[1:]

    # Fix xref format
    match = re.match ('(\d\d\d\d\d\d\d\d\d\d) 0000 n\n', line)
    if match != None:
        offset = int (match.groups ()[0])
        line = '%010d 00000 n \n' % (offset + xref_offset)

    # Fix xref offset
    if n == len(lines) - 2:
        line = '%d\n' % (int (line) + xref_offset)

    # Fix EOF marker
    if n == len(lines) - 1 and line.startswith ('%%%%'):
        line = line[2:]

    print line,
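To make the xref step easier to follow, here is a small sketch of that one fix in isolation, written for Python 3 (3.5 or later, for bytes formatting). A cross-reference entry in a PDF must be exactly 20 bytes long: a 10-digit offset, a space, a 5-digit generation number, a space, n or f, and a two-character end of line. The broken entries matched by the program above have a 4-digit generation number and no space before the newline. The names fix_xref_entry and shift are made up for this example, and the shift of -2 assumes both header lines were shortened by one byte each.

```python
import re

# Shape of the malformed entry, taken from the program above:
# 10-digit offset, 4-digit generation number, no space before the newline.
BROKEN_ENTRY = re.compile(rb'(\d{10}) 0000 n\n')


def fix_xref_entry(line, shift=0):
    """Rewrite a malformed xref entry as a 20-byte one.

    line is one line of the file as bytes; shift is added to the stored
    offset to account for bytes removed earlier in the file. Lines that
    do not look like a broken entry are returned unchanged.
    """
    match = BROKEN_ENTRY.match(line)
    if match is None:
        return line
    offset = int(match.group(1)) + shift
    return b'%010d 00000 n \n' % offset


fixed = fix_xref_entry(b'0000000017 0000 n\n', shift=-2)
print(fixed)       # b'0000000015 00000 n \n'
print(len(fixed))  # 20
```

Working on bytes rather than text avoids decoding errors on the binary image data inside the PDF.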
4 comments:
Hey, I just wanted to let you know that even my dad (a 50-and-something years old non-tech dad!) loves Simple Scan! ..and he's grateful he doesn't need my help anymore when scanning documents :)
Thanks for the efforts!
I <3 Simple Scan. No seriously, it rocks. No nonsense, automatic file name generation. Perfect tool.
Just noticed two lines in your Python code:
xref_offset = int(lines[-2])
xref_offset = 0
I am a Python newbie, but I guess something is wrong with these two lines.
Should the second one be something like:
offset = 0
Enchanter - good catch! The first xref_offset line shouldn't be there, but it still works correctly.