How to slice up image of text

Posted: 2017-03-24T12:08:46-07:00
by damonm
I have an image of text that I am trying to slice into individual images automatically. Can someone help me determine the best way to cut out blocks of text that are surrounded by blank space?

To be more specific, I have multiple JPGs (hundreds of pages) that have patient data grouped together in columns over multiple lines, and each patient entry is separated by one blank line. I am trying to create a batch process that creates a new image for each patient's information so that I can send that information to the patient. Note that, although it doesn't show in my example below, there are large spaces (columns) across the page that need to remain. I'm hoping there's a way to recognize the blank line between records and crop at that point. The number of records varies from page to page. The data is currently formatted as below:

File Number Name Address Patient Data Amount Due
Date Address 2 More Patient Data Other info
City, State Zip Yet More Patient Data
------------------------------------------------------> This line would be a blank space <----------------------
File Number Name Address Patient Data Amount Due
Date Address 2 More Patient Data Other info
City, State Zip Yet More Patient Data
-----------------------------------------------------> This line would be a blank space <------------------------


I greatly appreciate any help you may have to automate this process!

Re: How to slice up image of text

Posted: 2017-03-24T12:37:25-07:00
by snibgo
Please state your platform, and version number of IM.

My "Subimage rectangles" page shows some methods for chopping images into rectangles. Fred has some bash scripts for similar work.

For specific guidance, you need to provide a sample image, but do not include real patient information.

Re: How to slice up image of text

Posted: 2017-03-24T15:05:55-07:00
by damonm
I applied a pretty heavy blur to the image before posting. You can see a sample of what I'm talking about here: http://i1204.photobucket.com/albums/bb4 ... equeaz.jpg

Re: How to slice up image of text

Posted: 2017-03-24T15:19:39-07:00
by fmw42
So what exactly are you trying to extract? If you average (-scale) the image down to one column and save it in txt format, the output will show you where the blank lines are (pure white). You will have sections of dark separated by a few pixels of white. Use the white areas as the y locations at which to crop your image. You will have to write a script to extract those coordinates and compute the cropping from them. You want the span of y values between successive white regions.
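
For readers who want to see the coordinate step spelled out, here is a minimal Python sketch of the idea described above. It is not fmw42's script. It assumes you have already reduced the page to one average gray value per pixel row (for example, by parsing the txt output mentioned above). The function name `find_strips`, the `white` threshold of 250, and the `min_gap` of 3 rows are all hypothetical choices for illustration.

```python
def find_strips(row_means, white=250, min_gap=3):
    """Return (top, bottom) row ranges of dark content split by white gaps.

    row_means: one average gray value (0-255) per pixel row, top to bottom.
    white:     rows whose mean is at or above this count as blank (assumed).
    min_gap:   a blank run must be at least this many rows tall to split
               two strips; shorter runs (normal line spacing) are ignored.
    """
    strips = []
    start = None      # first row of the strip currently being built
    last_dark = None  # most recent non-blank row seen so far
    for y, mean in enumerate(row_means):
        if mean >= white:
            continue  # blank row: nothing to record yet
        if start is None:
            start = y  # first dark row on the page
        elif y - last_dark - 1 >= min_gap:
            # The blank run just passed is tall enough: close the strip.
            strips.append((start, last_dark))
            start = y
        last_dark = y
    if start is not None:
        strips.append((start, last_dark))  # close the final strip
    return strips


if __name__ == "__main__":
    # Toy profile: two text lines one blank row apart, then a taller gap.
    profile = [255] * 2 + [100] * 3 + [255] + [120] * 2 + [255] * 4 + [90] * 3
    print(find_strips(profile))
```

The `min_gap` parameter matters because the lines of text within one record are also separated by a little white. Only a blank run as tall as the gap between records should trigger a cut, so the value would need tuning against the real scans.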

Re: How to slice up image of text

Posted: 2017-03-24T15:34:27-07:00
by damonm
If I were to describe doing this exercise with a paper and scissors, I would cut the page into (horizontal) strips with one person's info on each strip. In the example I posted, ignoring the header, I would end up with 11 strips. I want 11 jpegs or gifs, but I'm not sure how to batch this.

I didn't post in that area, but I can. Depending on the rate, I'd be willing to pay someone to write the script to do it.

Here's what I'd like to end up with: http://i1204.photobucket.com/albums/bb4 ... yyxbcp.jpg

Re: How to slice up image of text

Posted: 2017-03-24T15:53:44-07:00
by snibgo
This does what I think you want:

call %PICTBAT%rectSubimages page1_zpsylequeaz.jpg subs.tiff White 2 0.1c
The script is shown on the page I referenced. It makes 41 outputs, such as:
Image

In essence, it uses "-connected-components" to find the areas that are at least 2% darker than white, and that have an area (in pixels) of at least 0.1% of the total image.

It is a Windows BAT script, but readily translated to bash.

It works fine on the blurred image. It would need modifying to get the cropping areas from the blurred image, but then to actually crop the unblurred image.

Re: How to slice up image of text

Posted: 2017-03-24T15:55:11-07:00
by damonm
I just modified the previous post, but just so everyone reading sees it, this is ideally what I'd like to get out of a script: http://i1204.photobucket.com/albums/bb4 ... yyxbcp.jpg

Re: How to slice up image of text

Posted: 2017-03-24T16:03:20-07:00
by damonm
That's interesting @snibgo... I assume that's one "strip" all smashed together?

Re: How to slice up image of text

Posted: 2017-03-24T16:05:44-07:00
by snibgo
damonm wrote:...I would cut the page into (horizontal) strips with one person's info on each strip.
Oh, okay, I misunderstood.

For just horizontal strips: first trim off the black mark down the right side. Then use:

call %PICTBAT%guillotine page1_zpsylequeaz.jpg subs_XX.png White 2 . 1
You still haven't said what version IM you use, on what platform.

Re: How to slice up image of text

Posted: 2017-03-24T16:21:52-07:00
by damonm
Oh, I apologize. I'm using ImageMagick 7.0.5-q16 on Windows.

So I think the function you gave me will change as a result, right?

Re: How to slice up image of text

Posted: 2017-03-24T16:36:52-07:00
by snibgo
Yes, forget about my post above that calls rectSubimages. Instead:

%IM%convert page1_zpsylequeaz.jpg -crop 95x100%%+0+0 pat_crp.png

call %PICTBAT%guillotine pat_crp.png subs_XX.png White 2 . 1
This crops off the junk at the right-hand side. You could also crop off the headings etc. Then it calls guillotine.bat, which writes png_0.png to png_13.png.

png_12.png is:
Image

If you look at the script, guillotine.bat, it calls guilFind to find the crop parameters from %1. It then uses those parameters to call guilChop that chops image %1. If you change that second %1 to %7, you can call guillotine.bat with seven parameters. The seventh will be the actual file of text that you want cropped.
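
The idea of finding crop parameters on one image and applying them to a different one can be sketched in Python as follows. This is not guillotine.bat, guilFind, or guilChop. It is an illustration that assumes the strip boundaries are already known as (top, bottom) row pairs and that both images have the same dimensions. The function name `crop_commands` and the output name pattern are hypothetical. The only ImageMagick features used are the `magick` command, `-crop` with a WxH+X+Y geometry, and `+repage`.

```python
def crop_commands(strips, width, source, pattern="strip_{:02d}.png"):
    """Build one ImageMagick v7 command per horizontal strip.

    strips:  (top, bottom) pixel-row pairs, inclusive, found on any image
             with the same dimensions as `source` (e.g. a blurred copy).
    width:   full image width in pixels, so each strip spans the page.
    source:  the file to actually cut (e.g. the unblurred original).
    pattern: hypothetical output naming scheme, numbered from 0.
    """
    commands = []
    for index, (top, bottom) in enumerate(strips):
        height = bottom - top + 1            # inclusive row range
        geometry = f"{width}x{height}+0+{top}"
        commands.append(
            ["magick", source, "-crop", geometry, "+repage",
             pattern.format(index)]
        )
    return commands


if __name__ == "__main__":
    # Boundaries found on one image, commands built for another file.
    for cmd in crop_commands([(2, 7), (12, 14)], 800, "page.png"):
        print(" ".join(cmd))
```

Each returned list can be handed to `subprocess.run` as is, which avoids shell quoting problems on Windows. Keeping boundary detection separate from cropping is what allows the boundaries to come from one file and the pixels from another.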

Re: How to slice up image of text

Posted: 2017-03-24T16:46:15-07:00
by damonm
Alright, so now let me reveal my true ImageMagick ignorance: where do I put these commands? When I was messing with this earlier, I was doing it in PowerShell using things like "magick ScannedImage.jpg -shave 60x40+0+40 Cropped.jpg", but the commands you gave me don't work.

Re: How to slice up image of text

Posted: 2017-03-24T17:06:32-07:00
by snibgo
I don't use PowerShell. I don't know how to call BAT scripts from within PowerShell.

My scripts were written for IM v6. As you are using v7, you should prefix each IM command with "magick".

Re: How to slice up image of text

Posted: 2017-03-24T17:08:48-07:00
by damonm
OK, great, let me give it a try!

Re: How to slice up image of text

Posted: 2017-03-24T17:24:10-07:00
by damonm
I tried to do this, but I don't have any batch files in the ImageMagick directory. Where do I find the guillotine.bat file?