So it learns to read Akkadian, Chinese, or Hebrew correctly
and to display it properly. The first attempt comes out all
wrong, but as you teach it a few sentences it starts to
understand what it is supposed to read, and it gets better
and better.
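To make the "teach it a few sentences" loop concrete, here is a rough sketch in Python. It is not the actual software: the class name CorrectionLearner and its methods are made up for illustration, and it assumes the simplest possible case, where the OCR line and your corrected line have the same length so they can be compared character by character. A real system would need proper alignment and a real recognition model.

```python
from collections import Counter, defaultdict


class CorrectionLearner:
    """Hypothetical toy learner: remembers, for each character the OCR
    produced, which character the user most often wanted instead."""

    def __init__(self):
        # counts[seen][wanted] = how often `seen` should have been `wanted`
        self.counts = defaultdict(Counter)

    def teach(self, ocr_line, corrected_line):
        # Assumption: only equal-length pairs are used, aligned position
        # by position. Lines of different length are skipped.
        if len(ocr_line) != len(corrected_line):
            return
        for seen, wanted in zip(ocr_line, corrected_line):
            self.counts[seen][wanted] += 1

    def apply(self, ocr_line):
        # Replace each character with the correction seen most often so far;
        # characters never seen in a lesson are left unchanged.
        out = []
        for ch in ocr_line:
            if ch in self.counts:
                out.append(self.counts[ch].most_common(1)[0][0])
            else:
                out.append(ch)
        return "".join(out)


learner = CorrectionLearner()
before = learner.apply("wor1d")   # nothing learned yet, returned unchanged
learner.teach("he11o", "hello")   # the user corrects one line
after = learner.apply("wor1d")    # the learned "1" -> "l" fix is now applied
```

The point is only the shape of the loop: read, get corrected, read again with the correction remembered.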
The OCR software asks you questions not only about lexical
grammar but also about "understanding": semantics,
comprehension (like negation), and even pragmatics (storing
knowledge of language in different specialized fields).
You can choose where to start. See the Friedberg Genizah
website (and Hachi Garsinan). You'll have to know Hebrew or
Aramaic to understand what you are seeing, but I suppose
there are other sites like it for manuscripts, showing the
transliteration alongside the image and letting you edit
while seeing where you are.
It also searches the web for similar texts (and asks whether
they are actually similar and useful), or accepts such texts
from you, while it is learning, if it's getting things wrong.
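The "similar texts" step could look something like the sketch below. It leaves out the web search entirely and assumes the candidate texts are already in hand as plain strings; the function name rank_similar and the 0.5 threshold are my own placeholders. It uses difflib.SequenceMatcher from the Python standard library to score similarity, which is just one simple measure among many.

```python
from difflib import SequenceMatcher


def rank_similar(transcription, candidates, threshold=0.5):
    """Hypothetical helper: score each candidate text against the current
    transcription and return (score, text) pairs, best match first.
    Candidates scoring below `threshold` are dropped."""
    scored = []
    for text in candidates:
        score = SequenceMatcher(None, transcription, text).ratio()
        if score >= threshold:
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


candidates = [
    "in the beginning god created the heaven and the earth",
    "a completely unrelated shopping list for tuesday",
]
noisy = "in the begining god creatd the heavn and the earth"
matches = rank_similar(noisy, candidates)
# Each surviving match would then be shown to the user with the question
# "is this actually similar and useful?" before it is used for learning.
```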
I'm mentioning this as a supposedly half-baked idea.
Actually, I'm in the process of doing this. Well, only at the
very beginning. Anyone who wishes to join is welcome.