Need to Extract Hindi Text from PDF(Image) File
Posted: 2018-10-13T07:17:23-07:00
by codebox
I have a PDF file from which I need to extract only the Hindi text. The PDF seems to be an image. Please guide me on how to extract this into a Text/Excel file.
Sample File
https://www.dropbox.com/s/kxbgp3cxb606i ... e.pdf?dl=0
Thanks
Re: Need to Extract Hindi Text from PDF(Image) File
Posted: 2018-10-13T09:00:27-07:00
by Bonzo
Have you tried dedicated OCR software?
I tried part of your first page on http://www.i2ocr.com/free-online-hindi-ocr. It was a bit slow and the result was not 100% correct, but I think you could edit the output. I doubt any OCR software would be 100% accurate.
Personally, unless you have hundreds of these to do, I would type it out manually: by the time you have checked that the results are correct, you could have typed it yourself.
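As an offline alternative to the online service, the open-source Tesseract OCR engine can read Devanagari if its Hindi language data (`hin`) is installed. The sketch below is an assumption and was not tried in this thread. It builds one `tesseract` command per page image and runs them only when `run=True`. The helper names and the file names such as `page-0.png` are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: str, lang: str = "hin") -> list[str]:
    """Build a Tesseract CLI call that writes <image name without extension>.txt."""
    # Tesseract takes an output *base* name and appends ".txt" itself.
    out_base = str(Path(image).with_suffix(""))
    return ["tesseract", image, out_base, "-l", lang]


def ocr_images(images: list[str], lang: str = "hin", run: bool = False) -> list[list[str]]:
    """Return the command for each image; execute them only when run=True."""
    commands = [tesseract_command(img, lang) for img in images]
    if run:
        if shutil.which("tesseract") is None:
            raise FileNotFoundError("tesseract is not on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical page images: print the commands without running them.
    for cmd in ocr_images(["page-0.png", "page-1.png"]):
        print(" ".join(cmd))
```

Passing `lang="hin+eng"` asks Tesseract to use both Hindi and English models, which may help if the pages mix scripts. As noted above, the output will still need proofreading.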
Re: Need to Extract Hindi Text from PDF(Image) File
Posted: 2018-10-13T09:53:26-07:00
by codebox
Yes, I tried that before posting to this forum, and again after your reply.
I got the error "Invalid Input Image Type".
I chose "Hindi" as the input language.
Thanks
Re: Need to Extract Hindi Text from PDF(Image) File
Posted: 2018-10-13T11:33:02-07:00
by Bonzo
I did not download your whole file. I took a screen capture with the Microsoft Snipping Tool, which saved it as a PNG.
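Instead of taking screen captures page by page, ImageMagick can rasterize every page of the PDF to its own PNG, provided Ghostscript is installed for PDF reading. This is a minimal sketch under that assumption. The helper name and the `page-%d.png` output pattern are hypothetical, and 300 DPI is only a common starting point for OCR, not a requirement.

```python
from __future__ import annotations

import subprocess


def pdf_to_png_command(pdf: str, density: int = 300,
                       pattern: str = "page-%d.png",
                       program: str = "magick") -> list[str]:
    """Build an ImageMagick call that writes one PNG per PDF page."""
    # -density must come *before* the PDF so Ghostscript renders at that DPI.
    # "%d" in the output name is replaced by the page index (0, 1, 2, ...).
    # ImageMagick 7 installs "magick"; on ImageMagick 6 pass program="convert".
    return [program, "-density", str(density), pdf, pattern]


def pdf_to_png(pdf: str, **kwargs) -> None:
    """Run the conversion; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(pdf_to_png_command(pdf, **kwargs), check=True)


if __name__ == "__main__":
    # Show the command for a hypothetical file without executing it.
    print(" ".join(pdf_to_png_command("sample.pdf")))
```

The resulting PNG files can then be uploaded to an image-only OCR service or passed to a local OCR engine.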
Re: Need to Extract Hindi Text from PDF(Image) File
Posted: 2018-10-13T22:56:07-07:00
by codebox
OK, I will try it using a PNG file. Thanks