I have many JPGs with randomly placed words and sentences; the background is white and the text is black. I want to use ImageMagick to find all the lines of words, isolate them, and rotate each one so that it is horizontal. There is plenty of white space between the items (lines). Sometimes there is an image or graphic among the words too; ideally ImageMagick would ignore those, but it wouldn't matter much if they get isolated as well.
This is how it could look:
data:image/s3,"s3://crabby-images/54338/54338d7a4f5a8deb0ec13a5c3582ac103ed98220" alt="Image"
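To make the isolation step concrete, here is a rough sketch of what I have in mind. It is plain Python rather than ImageMagick or PHP, only to keep it short, and it assumes the page has already been thresholded into rows of 0 (white) and 1 (black). The names `isolate_items` and `bounding_box` and the `gap` parameter are made up for illustration. Dark pixels that lie within `gap` pixels of each other are merged into one item, so the letters of a line stick together while lines separated by more white space stay apart.

```python
from collections import deque


def isolate_items(pixels, gap=2):
    """Group dark pixels into items.

    `pixels` is a list of rows holding 0 (white) or 1 (black).
    Two dark pixels belong to the same item when they are at most
    `gap` pixels apart horizontally and vertically (hypothetical
    parameter; it must be smaller than the white space between lines).
    """
    dark = {
        (x, y)
        for y, row in enumerate(pixels)
        for x, value in enumerate(row)
        if value
    }
    items = []
    while dark:
        seed = dark.pop()
        queue = deque([seed])
        blob = [seed]
        # Breadth-first search over every dark pixel within reach.
        while queue:
            cx, cy = queue.popleft()
            for dx in range(-gap, gap + 1):
                for dy in range(-gap, gap + 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in dark:
                        dark.remove(neighbour)
                        queue.append(neighbour)
                        blob.append(neighbour)
        items.append(blob)
    return items


def bounding_box(blob):
    """Return (left, top, right, bottom) of one item, all inclusive."""
    xs = [x for x, _ in blob]
    ys = [y for _, y in blob]
    return min(xs), min(ys), max(xs), max(ys)
```

Each returned item is a list of `(x, y)` coordinates, so it could be cropped out by its bounding box and deskewed on its own. A graphic would simply come out as one more item.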
Method 1
After isolating the words, I was thinking I could rotate them like this:
1. Somehow find the corners of the text.
2. Make a rectangle from the corner points. All lines are roughly rectangular in shape, and they won't differ that much in length.
3. Make a smaller horizontal rectangle.
4. Rotate the first rectangle until it completely fills the small rectangle.
data:image/s3,"s3://crabby-images/a647f/a647fdb31c5252b68d0b3761299062a91aee04e7" alt="Image"
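A rough sketch of how Method 1 might be computed, again in plain Python and under my own assumptions: instead of literally fitting one rectangle inside another, it looks for the smallest rectangle that encloses the dark pixels of one item and reads the tilt off its long side. This is the standard minimum-area bounding rectangle idea (the best rectangle always shares a side with the convex hull), not something ImageMagick does for me as far as I know. `convex_hull` and `min_area_rect` are illustrative names.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points):
    """Return (angle_degrees, width, height) of the smallest enclosing
    rectangle, where width is the long side and the angle is the tilt
    of that long side, normalised to the range [-90, 90)."""
    hull = convex_hull(points)
    best = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        # Rotate the hull by -theta so this edge becomes horizontal.
        c, s = math.cos(-theta), math.sin(-theta)
        xs = [x * c - y * s for x, y in hull]
        ys = [x * s + y * c for x, y in hull]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if best is None or w * h < best[0]:
            if w < h:
                # The edge was a short side; report the long side instead.
                theta, w, h = theta + math.pi / 2, h, w
            best = (w * h, theta, w, h)
    _, theta, w, h = best
    angle = (math.degrees(theta) + 90) % 180 - 90
    return angle, w, h
```

Rotating the item by the negative of the returned angle should make it horizontal. The angle is measured in the same coordinate system as the input points, so the sign convention needs checking against whatever tool does the actual rotation, and this cannot tell whether the text ends up upside down.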
Update:
Method 2
Maybe it's easier to rotate until ImageMagick finds the minimum possible height, or until the height drops below a set number of pixels.
This is just an idea that might work. I am very open to ANY suggestions on how to accomplish this. If it can be done with only the command line, great; otherwise I'm most comfortable with PHP.
When this process is done, I'm going to run OCR with Tesseract on the text.
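To show what I mean by Method 2, here is a minimal brute-force sketch in plain Python. It assumes the dark pixels of one isolated item are available as `(x, y)` coordinates, tries every angle in fixed steps, and keeps the one that gives the lowest bounding-box height. `height_at`, `best_rotation`, and the `step` parameter are names I made up for illustration.

```python
import math


def height_at(points, degrees):
    """Height of the axis-aligned bounding box after rotating the
    points by `degrees` around the origin."""
    a = math.radians(degrees)
    ys = [x * math.sin(a) + y * math.cos(a) for x, y in points]
    return max(ys) - min(ys)


def best_rotation(points, step=0.5):
    """Try every angle in [-90, 90) in `step`-degree increments and
    return the one that makes the item as flat as possible."""
    candidates = [-90 + i * step for i in range(int(180 / step))]
    return min(candidates, key=lambda d: height_at(points, d))
```

For a line of text tilted by 30 degrees, `best_rotation` should come back with about -30, which is the rotation to apply to straighten it. The answer is only as fine as `step`, and a stop-early variant ("height under a set number of pixels") would just return the first candidate whose `height_at` is below that limit.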
Thank you all, and thank you for this fantastic forum :)