Turn all images into words.
This computer-generated, human-assisted JSON file will be useful for OCR and web searches. The IBD (Image Breakdown Descriptor) text file describes an image and its parts, including both low-level and high-level descriptions of what can be seen.
In any case, it includes one or more SVG depictions, which can be at different levels of detail.
For text written in the image, the IBD describes what the letters look like: the diagonal and horizontal lines, curves, and dots, and their color and thickness. If the language and font have been detected, it has that info as well. If the text has a known source, it can have that too, with a URL.
If the text is understood, then the transliteration will be given. If only some of the letters are understood, those will be given. If there are doubts, the possibilities will be described.
For graphic parts of the image, there will be a rough description of the detected edges, the shapes seen in the image, the colors, and the "texture".
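As a minimal sketch, one such IBD record might be laid out as below, written as a Python dict that would be serialized to JSON. Every field name and the nesting are illustrative assumptions, not a fixed schema:

```python
import json

# A hypothetical IBD (Image Breakdown Descriptor) record. All field names
# and the nesting are illustrative assumptions, not part of any standard.
ibd = {
    "image": {"width": 800, "height": 1200},
    "svg_depictions": [                         # one or more SVGs, by detail level
        {"detail_level": "low", "svg": "<svg>...</svg>"},
        {"detail_level": "high", "svg": "<svg>...</svg>"},
    ],
    "text_parts": [{
        "strokes": ["/", "\\ or |", "- or /"],  # lines, curves and dots
        "color": "black",
        "thickness_px": 3,
        "language": "en",                       # only if detected
        "font": None,                           # only if detected
        "source": {"url": None},                # only if the text has a known source
        "transliteration": "A",                 # only if understood
        "alternatives": ["H"],                  # possibilities, if in doubt
    }],
    "graphic_parts": [{
        "edges": "closed curve, roughly circular",
        "shapes": ["circle", "arc"],
        "colors": ["yellow", "black"],
        "texture": "flat fill",
    }],
    "objects": [                                # high-level, if recognized
        "A bearded man smiling, wearing a hat",
    ],
}

print(json.dumps(ibd, indent=2))
```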
For example (describing a sun smiley in the top left corner):
1. Drawing: "Sun"
1.1 Yellow circle. Location: X35, Y50 (out of W800xH1200). Size: radius 32px. Border: 3px black. Fill: yellow.
1.2 Smile inside circle 1.1, etc.
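That descriptor maps almost line for line onto one of the record's SVG depictions. A sketch of the translation in Python, using only the numbers given in the example above (the smile's arc coordinates are my own guess):

```python
# Render the "Sun" descriptor above as an SVG depiction. The circle's
# position, radius, and border come from the example; the smile path is
# an assumed placeholder.
def sun_smiley_svg() -> str:
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="800" height="1200">'
        # 1.1: yellow circle at (35, 50), radius 32px, 3px black border
        '<circle cx="35" cy="50" r="32" fill="yellow" '
        'stroke="black" stroke-width="3"/>'
        # 1.2: smile inside circle 1.1, drawn as a curved path
        '<path d="M 20 58 Q 35 72 50 58" fill="none" '
        'stroke="black" stroke-width="3"/>'
        '</svg>'
    )

print(sun_smiley_svg())
```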
If the objects in the image are known, a description of them will be listed. For example: a bearded man smiling, wearing a hat; a yellow smiley; smiling Moshe Flam [aka pashute] standing in a park at Hebrew University (according to the caption given with the image).
Low level: The sloppy handwritten letter A would be described as follows:
Section: Center of image. Dimensions: W40px, H300px. Location: X367, Y220.
Rows of text: 7. Columns: 2.
Parts: 1. /, 2. \ or |, 3. - or / (middle height)
Connections: /\; /-\;
Understood: A (probably not H)
Text parts: 3 sentences.
Line 1: A fox jumped over
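Those parts and connections can then be matched against a table of known letter shapes. A minimal sketch of that matching; the pattern table and the scoring rule are illustrative assumptions:

```python
# Match a low-level stroke descriptor against known letter patterns.
# The table below is an illustrative assumption; real tables would be
# per-script and per-hand.
LETTER_PATTERNS = {
    "A": {"parts": {"/", "\\", "-"}, "connections": {"/\\", "/-\\"}},
    "H": {"parts": {"|", "-"}, "connections": {"|-|"}},
}

def guess_letter(parts: set, connections: set) -> list:
    """Rank candidate letters by how many stroke features they share."""
    scores = []
    for letter, pattern in LETTER_PATTERNS.items():
        score = (len(parts & pattern["parts"])
                 + len(connections & pattern["connections"]))
        scores.append((score, letter))
    scores.sort(reverse=True)
    return [letter for score, letter in scores if score > 0]

# The sloppy handwritten A from the example above:
print(guess_letter({"/", "\\", "-"}, {"/\\", "/-\\"}))  # A first, H a distant second
```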
We can then give our IBD program any manuscript and show it some part which we do understand clearly. The program then shows us where the descriptors of that part are, and we can connect them with text. The rest could then be deciphered easily, not by image recognition but by using text tools.
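Once the descriptors are serialized as text, "deciphering" becomes dictionary lookup and string search rather than image processing. A sketch under the (hypothetical) assumption that each glyph's stroke descriptor occupies one line of text:

```python
# Decipher a manuscript by text search over its descriptor file instead of
# image recognition. Hypothetical format: one stroke descriptor per glyph,
# one glyph per line.
def decipher(descriptor_lines: list, key: dict) -> str:
    """key maps stroke descriptors a human has already identified to letters;
    glyphs with no match stay as '?' for a later pass."""
    return "".join(key.get(line, "?") for line in descriptor_lines)

key = {"parts:/\\- conn:/\\;/-\\": "A", "parts:|\\| conn:|-|": "H"}
manuscript = ["parts:/\\- conn:/\\;/-\\", "parts:|\\| conn:|-|", "parts:O"]
print(decipher(manuscript, key))  # -> AH?
```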
In other words, all images will now become words.
-- pashute, Jun 06 2023

https://xkcd.com/1897/ [hippo, Jun 06 2023]

This one is even funnier: https://xkcd.com/1732/ - it got me to watch the movie 300, and realize what the movie parody "Sparta" was all about. [pashute, Jun 08 2023]

Three-minute movie: https://www.youtube...watch?v=I8RprTU0hXY (details noticed) [pashute, Jun 08 2023]

Sp. Descriptor
... unless there's a subtle pun that I missed.
-- pertinax, Jun 06 2023

If the software doing this job suspects that an image contains, for example, a tractor, it can easily verify this by taking the target image, some pictures known to contain tractors, and some pictures known not to contain tractors, and presenting these as a captcha test to 100 random people who are trying to access some secure website (see the sketch after these annotations).
-- hippo, Jun 06 2023

Indeed - see link.
-- hippo, Jun 06 2023

I know a person whose job is to sit and describe TV programmes for blind viewers. It's actually very difficult. I think a complete description would be impossible, since context and cultural references will explode out of control. And if the description is not complete, then there needs to be selection. And once you have selection, the data is no longer impartial and sensibly searchable or categorisable.
-- pocmloc, Jun 06 2023
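A minimal sketch of [hippo]'s crowd-verification scheme above; the ask_human callback, the grid layout, and the majority threshold are all assumptions:

```python
# Verify a suspected label ("tractor") by mixing the target image into a
# captcha grid with known positives and negatives, then trusting the crowd.
import random

def crowd_verify(target, known_positives, known_negatives, ask_human,
                 n_people=100):
    """ask_human(image) -> True if a visitor clicks the image as 'a tractor'.
    Returns True if a majority of visitors say the target contains one."""
    grid = [target] + known_positives + known_negatives
    yes_votes = 0
    for _ in range(n_people):
        random.shuffle(grid)                  # each visitor sees a shuffled grid
        clicked = [img for img in grid if ask_human(img)]
        yes_votes += target in clicked
    return yes_votes > n_people / 2           # simple majority; threshold is a guess
```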
Yesterday I sat behind a blind student (his first time in class; he looks and dresses like Bin Laden) at a (fantastic) lesson on "Cuneiform and the beginning of writing" at Hebrew University! It was an experience. After a question I asked about one shape, he asked me to draw it on his back. So here's another app: an SVG drawing robot on a blind man's hand or back.

I hope I never need it, but since I'm losing my eyesight bit by bit due to diabetes, who knows?
-- pashute, Jun 08 2023

Adding SVG.
-- pashute, Jun 08 2023

Thanks [A1]!
The Muhammad Ali Midjourney blunder was great. It shows that the tool is NOT describing the images using any algorithm, but rather using human input gathered from the web.
That information, too, will be stored. A good test would be to find the original image given its text description, as in [A1]'s linked video on Midjourney.
Of course, as [pocmloc] said, it will never be complete.
As time advances, there will be better and better descriptors for the different kinds of "standard" objects that the AI notices.
There's a movie about the Holocaust and the Jews of a town in Europe, in which each time another detail is noticed (the buttons on the clothing, the sign over a shop, the hats on the kids' heads, the girls' presence, the height of the camera, the trees on the street, the doors of the hall people are entering), more and more memories about this community, the people in it, and the occasion being filmed are slowly discovered.
Image breakdown descriptor files would have an open standard for description, and the actual descriptions would be stored in repositories. (Grammarly is marking "descriptions" but not giving a suggestion why. Any suggestions why?)
-- pashute, Jun 08 2023

That's curious - but I haven't used Grammarly, and don't know why it does what it does. "Descriptions" seems fine to me, as a non-automated grammar pedant.
Wait; I think I can guess; I have heard from time to time that some people think it's bad style to use several cognate words in close proximity to each other (in this case, "descriptor", "description" and "descriptions"). Fortunately, those people are wrong.
-- pertinax, Jun 08 2023