PAID: Crop dashed area in image, modify it slightly, OCR a small part of it
Posted: 2017-06-17T20:58:12-07:00
I need a program/script that crops a shipping label off the first page of a PDF file, modifies it slightly, saves it, and gives the tracking number back through OCR.
* The labels come from 3 shipping carriers. The program/script needs to identify the carrier by itself and handle each one. The layout for a given carrier is always the same, though two of the carriers provide the label in both a normal and a smaller size.
* The shipping label will be on the first page of the PDF, but its position can vary slightly. It has a dashed border that can be looked for to find it.
* The shipping label should be cropped to the dashed border.
* The shipping label should be rotated 90 degrees clockwise, so it's upright.
* A given area needs to be made pure white. (Removing the return address, no OCR needed, just a rectangle of pixel coordinates.)
* A small part of the shipping address needs to be removed. OCR will be needed to locate a certain phrase, which marks the part to remove. The phrase will always be within a defined area of pixel coordinates, but its exact position inside that area can vary.
* The resulting image needs to be saved as a JPEG.
* The shipping address needs to be OCR'ed and returned as output of the program. It's always within defined pixel coordinates.
* The tracking number needs to be OCR'ed and returned as output of the program. It's always within defined pixel coordinates.
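To make the non-OCR steps above concrete, here is a rough pure-Python sketch of the geometry only: find the dashed border, crop to it, rotate 90 degrees clockwise, and whiten a rectangle. It is not tied to ImageMagick and is not meant as the required implementation. The image is assumed to be a list of rows of grayscale values (0 = black, 255 = white), and every name and threshold here (`find_border_box`, `min_fraction`, and so on) is made up for illustration. The border test is deliberately naive and could be fooled by dense text or barcodes on a real page.

```python
from typing import List, Tuple

Image = List[List[int]]            # rows of grayscale pixels, 0 = black, 255 = white
Box = Tuple[int, int, int, int]    # (left, top, right, bottom), inclusive


def find_border_box(img: Image, dark: int = 128, min_fraction: float = 0.3) -> Box:
    """Locate the outermost rows/columns that look like a dashed line.

    A dashed line is dark over only part of its length, so a row or column
    counts as "border" when at least min_fraction of its pixels are dark.
    Both thresholds are assumptions that would need tuning on real scans.
    """
    height, width = len(img), len(img[0])
    rows = [y for y in range(height)
            if sum(p < dark for p in img[y]) >= min_fraction * width]
    cols = [x for x in range(width)
            if sum(img[y][x] < dark for y in range(height)) >= min_fraction * height]
    if not rows or not cols:
        raise ValueError("no dashed border found")
    return cols[0], rows[0], cols[-1], rows[-1]


def crop(img: Image, box: Box) -> Image:
    """Return a copy of the pixels inside box (edges included)."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def rotate_cw(img: Image) -> Image:
    """Rotate 90 degrees clockwise: the left column becomes the top row."""
    return [list(col) for col in zip(*img[::-1])]


def whiten(img: Image, box: Box) -> None:
    """Set every pixel inside box to pure white, in place."""
    left, top, right, bottom = box
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            img[y][x] = 255
```

A caller would chain these as `upright = rotate_cw(crop(page, find_border_box(page)))` and then call `whiten(upright, return_address_box)`, where `return_address_box` is the fixed rectangle for that carrier, given in the coordinates of the rotated label.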
I don't care how you implement this (ImageMagick command arguments, a script, your favorite language's API) as long as it works with the current version of ImageMagick. If your solution is compiled, I need the source. It must run on Linux. If you don't use Linux, I'd budge and accept something that works on Windows, as long as there's nothing overly Windows-specific that couldn't be ported to Linux. (I don't know enough about ImageMagick to know if there are any big platform differences.)
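For the "ImageMagick command arguments" route, something along these lines might be a starting point. It is only a sketch: it assumes the dashed border has already been located by some other step, so the crop box is passed in, and that the whitening rectangle is given in the coordinates of the rotated label. The function name, the box format, and the default DPI and JPEG quality are all made up for illustration. ImageMagick does not do OCR itself, so the address and tracking number would have to go through a separate tool.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def label_command(pdf_path: str, out_path: str, label_box: Box, white_box: Box,
                  dpi: int = 300, quality: int = 92,
                  executable: str = "magick") -> List[str]:
    """Build an ImageMagick argument list for the crop/rotate/whiten/save steps.

    label_box is the dashed-border area on the rasterized page; white_box is
    the rectangle to blank out, measured on the label after rotation.
    """
    x, y, w, h = label_box
    wx, wy, ww, wh = white_box
    return [
        executable,
        "-density", str(dpi),                        # rasterization DPI, must precede the input
        f"{pdf_path}[0]",                            # [0] selects the first page only
        "-background", "white", "-alpha", "remove",  # flatten transparency, JPEG has no alpha
        "-crop", f"{w}x{h}+{x}+{y}", "+repage",      # crop to the label, reset the canvas offset
        "-rotate", "90",                             # positive angles rotate clockwise
        "-fill", "white",
        "-draw", f"rectangle {wx},{wy} {wx + ww - 1},{wy + wh - 1}",
        "-quality", str(quality),
        out_path,
    ]
```

The list can be handed to `subprocess.run(cmd, check=True)`. On an ImageMagick 6 install the executable would be `convert` instead of `magick`.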
PM me to request a set of sample files, which includes the starting PDFs your program needs to take and examples of how the output image should look. They contain confidential addresses, so I can't just post them publicly.
The sample files come with a slightly more detailed explanation of what needs to be done, which makes more sense once you see the files.