Turn all images into words.
The IBD file is a computer-generated, human-assisted JSON text file that will be useful for OCR and web searches. It describes an image and its parts, including both low-level and high-level descriptions of what can be seen.
In any case, it includes one or more SVG depictions. These can be at different levels of detail.
For text written in the image, the IBD describes what the letters look like: the diagonal and horizontal lines, curves and dots, and their color and thickness. If the language and font have been detected, it has that info as well. If the text has a known source, it can have that too, with a URL.
If the text is understood, then the transliteration will be given. If only some of the letters are understood, those will be given. If there are doubts, the possibilities will be described.
For graphic parts of the image, there will be a rough description of the edges detected, the shapes seen in the image, and the colors and "texture".
For example, describing a sun smiley in the top left corner:
1. Drawing: "Sun"
1.1 Yellow circle. Location: X35, Y50 (out of W800xH1200). Size: radius 32px. Border: 3px black. Fill: yellow.
1.2 Smile inside circle 1.1, etc.
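The sun example above could be stored as JSON and turned into one of the SVG depictions roughly as follows. This is only a sketch: the text does not define an IBD schema, so every field name here ("parts", "location", "border" and so on) is a hypothetical choice, and treating the location as the circle's center is an assumption.

```python
import json

# Hypothetical IBD entry for the sun smiley; field names are assumptions.
sun = {
    "id": "1",
    "type": "drawing",
    "label": "Sun",
    "image_size": {"width": 800, "height": 1200},
    "parts": [
        {
            "id": "1.1",
            "shape": "circle",
            "location": {"x": 35, "y": 50},  # assumed to be the center
            "radius": 32,
            "border": {"width": 3, "color": "black"},
            "fill": "yellow",
        },
        {"id": "1.2", "shape": "smile", "inside": "1.1"},
    ],
}


def circle_to_svg(part, image_size):
    """Render one circle part as a standalone SVG string."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{image_size["width"]}" height="{image_size["height"]}">'
        f'<circle cx="{part["location"]["x"]}" cy="{part["location"]["y"]}" '
        f'r="{part["radius"]}" fill="{part["fill"]}" '
        f'stroke="{part["border"]["color"]}" '
        f'stroke-width="{part["border"]["width"]}"/>'
        f'</svg>'
    )


ibd_json = json.dumps(sun, indent=2)  # the searchable text form
svg = circle_to_svg(sun["parts"][0], sun["image_size"])  # one SVG depiction

print(ibd_json)
print(svg)
```

Because the description is ordinary JSON, a plain text search for "yellow" or "circle" would find this drawing without looking at the pixels.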
If the objects in the image are known, a description of them will be listed. For example: A bearded man smiling, wearing a hat. A yellow smiley. Smiling Moshe Flam [aka Pashute] standing in park at Hebrew University (according to caption given with the image).
Low level: The sloppy handwritten letter A would be described as follows:
Section: Center of image. Dimensions: W40px, H300px. Location: X367, Y220
Rows of text: 7. Columns: 2.
Parts: 1. /, 2. \ or |, 3. - or / (middle height)
Connections: /\; /-\;
Understood: A (probably not H)
Text parts: 3 sentences.
Line 1: A fox jumped over
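One way to get from the "Parts" list to "Understood: A (probably not H)" is to compare the observed strokes, including their doubtful alternatives, against a stroke template per letter. The sketch below is an illustration only: the TEMPLATES table and the scoring rule are assumptions, not something the IBD description specifies.

```python
from itertools import permutations

# Hypothetical stroke templates for two candidate letters.
TEMPLATES = {
    "A": ["/", "\\", "-"],
    "H": ["|", "|", "-"],
}


def score(observed, template):
    """Fraction of template strokes that can be matched to observed parts.

    `observed` is a list of sets; each set holds the possible readings
    of one stroke (e.g. {"\\", "|"} for a stroke that is "\\ or |").
    """
    if len(observed) != len(template):
        return 0.0
    best = 0
    # Try every ordering of the template strokes and keep the best fit.
    for perm in permutations(template):
        hits = sum(1 for options, stroke in zip(observed, perm) if stroke in options)
        best = max(best, hits)
    return best / len(template)


# The sloppy letter from the example: "/", "\ or |", "- or /".
observed = [{"/"}, {"\\", "|"}, {"-", "/"}]

ranking = sorted(
    ((score(observed, template), letter) for letter, template in TEMPLATES.items()),
    reverse=True,
)
print(ranking)
```

With these assumed templates, A matches all three strokes while H matches only two, which agrees with the reading given in the example.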
We can then give our IBD program any manuscript and show it some part that we clearly understand. The program then shows us where the descriptors of that part are, and we can connect them with text. The rest could then be deciphered more easily, using text tools rather than image recognition.
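As a rough illustration of this text-tool workflow, the sketch below takes one glyph we already understand and labels every other glyph in the manuscript whose descriptor has the same "Connections" signature. The one-line-per-glyph format and the function name label_matching are hypothetical; a real IBD file would be JSON with a schema the text does not define.

```python
import re

# Hypothetical flattened descriptors, one glyph per line.
DESCRIPTORS = r"""
glyph=1 x=367 y=220 connections=/\;/-\;
glyph=2 x=410 y=221 connections=|-|;
glyph=3 x=455 y=219 connections=/\;/-\;
"""

LINE = re.compile(r"glyph=(\d+) x=(\d+) y=(\d+) connections=(\S+)")


def label_matching(descriptors, known_signature, label):
    """Return (glyph id, x, y, label) for every glyph sharing the signature."""
    found = []
    for line in descriptors.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(4) == known_signature:
            found.append((int(m.group(1)), int(m.group(2)), int(m.group(3)), label))
    return found


# We understand glyph 1 to be "A"; reuse its signature across the manuscript.
matches = label_matching(DESCRIPTORS, r"/\;/-\;", "A")
print(matches)
```

Exact string equality is the simplest possible comparison; sloppy handwriting would likely need a fuzzier match over the descriptors.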
In other words, all images will now become words.