TIFF Directory at offset 0x7c (124)
Subfile Type: multi-page document (2 = 0x2)
Image Width: 2550 Image Length: 3300
Resolution: 300, 300 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
Thresholding: bilevel art scan
FillOrder: lsb-to-msb
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 16
Planar Configuration: single image plane
ImageDescription:
Make: DOCUNET Germany
Model: SCA
Software: DACS Toolkit II
DateTime: 2009:09:21 16:28:03
Tag 33948: 0,0,0,0,64,0,0,0,136,0,0,0,136,0,0,0,12,0,0,0,64,0,0,0,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,72,0,0,0,72,0,0,0,64,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,72,0,0,0,2,0,0,0,150,207,62,19,11,6,216,7,239,4,2,10,8,78,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Tag 33949: 0
Group 4 Options: (0 = 0x0)
That is, of course, the info for just one page of this test file. However, I do receive many of these on my console:
Perhaps I am off on the speed of I/O between a disk drive and the rest of the system, but I am very sure that a SATA2 drive and a dual-core processor running a 64-bit distribution should be able to process a 16-page document a little faster than hours.
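For a sense of scale, here is a rough back-of-the-envelope sketch of how much raw pixel data is involved, using the width, length, bit depth and resolution from the tiffinfo output above and the 16-page count. It assumes each row is padded to a whole byte, and it only counts the unpacked 1-bit bitmap, not whatever internal representation the converter actually uses, so treat the numbers as an estimate only.

```python
# Values taken from the tiffinfo output and the post; the helper name is illustrative.
WIDTH_PX = 2550
HEIGHT_PX = 3300
BITS_PER_SAMPLE = 1
SAMPLES_PER_PIXEL = 1
DPI = 300
PAGES = 16


def uncompressed_page_bytes(width, height, bits_per_sample, samples_per_pixel):
    """Size of one page as a packed bitmap, assuming rows are padded to whole bytes."""
    bits_per_row = width * bits_per_sample * samples_per_pixel
    bytes_per_row = (bits_per_row + 7) // 8  # round up to the next full byte
    return bytes_per_row * height


page_bytes = uncompressed_page_bytes(
    WIDTH_PX, HEIGHT_PX, BITS_PER_SAMPLE, SAMPLES_PER_PIXEL
)
total_bytes = page_bytes * PAGES

print(f"Page size: {WIDTH_PX / DPI} x {HEIGHT_PX / DPI} inches")
print(f"One page, uncompressed: {page_bytes} bytes")
print(f"{PAGES} pages, uncompressed: {total_bytes} bytes "
      f"({total_bytes / 2**20:.1f} MiB)")
```

That works out to roughly 1 MB per page and about 16 MiB for the whole document as a packed bitmap, which is why raw disk throughput alone seems unlikely to explain a run time measured in hours.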
As for the command syntax used, there is a method to my madness: it is the command line OpenKM uses when a TIFF document is uploaded, just before the document is submitted to Tesseract-OCR for processing. I assume the point of the syntax is to optimize the document for OCR. However, it is not working, perhaps because of the "unknown field with tag" errors or the very long processing time.
I am using the current stable build from the website, 6.5.6-5, and I also tried the version that can be installed with apt-get. Both behaved the same way, though I might retest with the older version next.