Need to Extract Hindi Text from PDF(Image) File

codebox
Posts: 5
Joined: 2017-11-15T01:46:10-07:00

Need to Extract Hindi Text from PDF(Image) File

Post by codebox »

I have a PDF file from which I need to extract only the text in the Hindi font. The PDF appears to be an image. Please advise how to extract this text into a text or Excel file.

Sample File
https://www.dropbox.com/s/kxbgp3cxb606i ... e.pdf?dl=0

Thanks
Bonzo
Posts: 2971
Joined: 2006-05-20T08:08:19-07:00
Location: Cambridge, England

Re: Need to Extract Hindi Text from PDF(Image) File

Post by Bonzo »

Have you tried dedicated OCR software?

I tried part of your first page on http://www.i2ocr.com/free-online-hindi-ocr. It was a bit slow and not 100% correct, but I would think you could edit the output. I doubt any OCR software would be 100% accurate.

Personally, unless you have hundreds to do, I would type it out manually; by the time you have checked that the OCR results are correct, you could have typed it yourself.
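
As one illustration of dedicated OCR software that runs locally, the sketch below builds a command line for the open-source Tesseract engine with its Hindi language data (`hin`). Tesseract is not mentioned in this thread and is an assumption here, as are the file names `page.png` and `page_text` and the helper name `tesseract_command`. It also assumes the Hindi language data is installed.

```python
def tesseract_command(image_path, output_base, lang="hin"):
    """Build the argument list for: tesseract <image> <output_base> -l <lang>.

    Tesseract writes the recognized text to <output_base>.txt.
    The file names passed in are hypothetical examples.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


if __name__ == "__main__":
    # Only print the command; run it yourself (for example with
    # subprocess.run(cmd, check=True)) once Tesseract is installed.
    cmd = tesseract_command("page.png", "page_text")
    print(" ".join(cmd))
```

The output would still need proofreading, since accuracy on Hindi text is unlikely to be perfect.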
codebox
Posts: 5
Joined: 2017-11-15T01:46:10-07:00

Re: Need to Extract Hindi Text from PDF(Image) File

Post by codebox »

Yes, I tried that, both before posting to this forum and again after your reply.
I got the error "Invalid Input Image Type".

I chose "Hindi" as the input language.

Thanks
Bonzo
Posts: 2971
Joined: 2006-05-20T08:08:19-07:00
Location: Cambridge, England

Re: Need to Extract Hindi Text from PDF(Image) File

Post by Bonzo »

I did not download your whole file. I took a screen capture with the Microsoft Snipping Tool, which saved it as a PNG.
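
If the error comes from uploading the PDF itself, which seems likely but is not confirmed here, the pages could be converted to PNG with ImageMagick instead of capturing each one by hand. A minimal sketch, assuming ImageMagick 6's `convert` command (`magick` in version 7) with Ghostscript installed for reading PDFs. The file names, the 300 dpi density, and the helper name `pdf_to_png_command` are illustrative and not taken from this thread.

```python
def pdf_to_png_command(pdf_path, out_pattern, density=300):
    """Build the argument list for: convert -density <dpi> <pdf> <pattern>.

    -density is given before the input so the PDF is rasterized at that
    resolution. A printf-style pattern such as page-%03d.png yields one
    numbered PNG per page.
    """
    return ["convert", "-density", str(density), pdf_path, out_pattern]


if __name__ == "__main__":
    # Only print the command; run it in a shell or via subprocess
    # once ImageMagick and Ghostscript are available.
    cmd = pdf_to_png_command("sample.pdf", "page-%03d.png")
    print(" ".join(cmd))
```

A higher density generally gives the OCR step more detail to work with, at the cost of larger files.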
codebox
Posts: 5
Joined: 2017-11-15T01:46:10-07:00

Re: Need to Extract Hindi Text from PDF(Image) File

Post by codebox »

OK. I will try it using a PNG file. Thanks.