OCR for ancient Greek: Difference between revisions

Revision as of 17:38, 3 August 2012

Tesseract is an ongoing open-source OCR project from Google.
The Gamera toolkit for analysing and scanning complex texts includes some experiments with polytonic Greek.
Bruce Robertson reports some preliminary results from a survey of techniques: http://www.heml.org/RobertsonGreekOCR/
Federico Boschetti has been experimenting with adapting and training Google's OCR engine Tesseract for ancient Greek texts: http://www.himeros.eu/ (related paper: http://www.perseus.tufts.edu/~ababeu/ecdl2009-preprint.pdf)
The commercial OCR software Anagnostis (€585) can handle ancient Greek, though apparently with poor results.
ABBYY FineReader can be made to work with ancient Greek, but only after extensive training.
Google Docs can now run OCR on uploaded documents in a variety of languages. Specifying "Greek" and uploading a PDF yields some results (image uploads seem not to work). Quality is roughly on par with Google Books OCR of printed ancient Greek.

AccessTEI is a manual text-keying service for TEI members that can handle ancient Greek.

@@ Line 1: / Line 1: @@
+* [http://code.google.com/p/tesseract-ocr/ Tesseract] is an ongoing Google open source project for OCR.
+* The [http://gamera.informatik.hsnr.de/ Gamera] toolkit for analysing and scanning complex texts includes some experiments with polytonic Greek
 * Bruce Robertson reports on some preliminary results of a survey of techniques: http://www.heml.org/RobertsonGreekOCR/
 * Federico Boschetti has been experimenting with adapting/training Google's OCR engine [http://code.google.com/p/tesseract-ocr/ tesseract] to ancient Greek texts: http://www.himeros.eu/ ([http://www.perseus.tufts.edu/~ababeu/ecdl2009-preprint.pdf related paper])
 * The commercial OCR software [http://www.ideatech-online.com/index.php?option=com_content&task=view&id=23&Itemid=27 Anagnostis] (€585) can handle ancient Greek, though apparently poorly
 * [http://finereader.abbyy.com/ ABBYY FineReader] can be made to work with ancient Greek with extensive training
-* The [http://gamera.informatik.hsnr.de/ Gamera] toolkit for analysing and scanning complex texts includes some experiments with polytonic Greek
 * Google Docs now allows you to have it do [http://googledocs.blogspot.com/2011/02/optical-character-recognition-ocr-in-34.html OCR on uploaded documents in a variety of languages], and you can get some results by specifying "Greek" and uploading a PDF (images seem not to work). Quality is about on the level of Google Books OCR of printed ancient Greek.