Descriptive Cognitive Encoding   (+3, -1)
Video, images and audio stored and retrieved as linked memory scripts

The idea is to store a video as a description detailed enough to recreate it: the objects in each scene, plus links to similar images, audio bits, or other videos. Those linked items are kept as "master keynotes", anchors to start from or to end up at.

A single picture for each scene can be enough and even less is needed if we have an image from another scene or another movie from which we can reconstruct the whole thing.

For example, the face of each actor in a play, as seen from different angles, could be stored once. We could then reconstruct the whole scene from the description of the play plus a few parameters: the location of the heads on the screen, the angle at which each photo was taken, the words being spoken, the expressions, the position or actions of the hands and feet, etc. If needed, some memorable features could be added, such as the normal eyebrow position and how it changes during anger or surprise.
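
A minimal sketch of the "stored once" part, assuming images are kept as raw bytes and referenced from scene descriptions by a short hash. The `AssetStore` class and the field names (`head_xy`, `angle_deg`, `expression`) are made up for illustration and are not part of any existing format.

```python
import hashlib


class AssetStore:
    """Keeps each image (or audio clip) once, keyed by a hash of its bytes."""

    def __init__(self):
        self._assets = {}

    def add(self, data: bytes) -> str:
        # Identical bytes always give the same key, so duplicates are not stored twice.
        key = hashlib.sha256(data).hexdigest()[:12]
        self._assets.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._assets[key]

    def __len__(self) -> int:
        return len(self._assets)


store = AssetStore()

# Placeholder bytes standing in for a real photo of the actor's face.
face_front = store.add(b"actor A, face, front view")
face_again = store.add(b"actor A, face, front view")

# Two scenes link to the same stored face and differ only in their parameters.
scene_1 = {"actor": "A", "face": face_front, "head_xy": [0.3, 0.4],
           "angle_deg": 0, "expression": "surprise"}
scene_2 = {"actor": "A", "face": face_again, "head_xy": [0.6, 0.5],
           "angle_deg": 0, "expression": "anger"}
```

Both scenes carry only a key and a few numbers; the face itself sits in the store once.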

This would make a "script" file that humans can read and understand. Together with the linked images or video sections, it is enough to reconstruct the original image or movie in good detail. Enclosed with each image are extra details, getting more and more particular.

Unlike a standard movie scene script, in the Descriptive Cognitive Encoding script, there is more detail about what we see and where. Instead of data in two dimensions on the display, it would save the perceived 3-dimensional distance to the objects, what they are, where they are moving to, and where the camera is.

After a first run of encoding, the result could be compared with the original to assess what was missing, or at least what was noticeably missing, and the script could then be corrected automatically.
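
A rough sketch of that compare-and-correct pass, assuming a frame can be treated as a flat list of brightness values between 0 and 1, and that "noticeably missing" means a difference above some threshold. The function names and the threshold value are illustrative assumptions only.

```python
def find_corrections(original, reconstructed, threshold=0.1):
    """Return {index: original_value} for every sample that is noticeably off."""
    return {
        i: o
        for i, (o, r) in enumerate(zip(original, reconstructed))
        if abs(o - r) > threshold
    }


def apply_corrections(reconstructed, corrections):
    """Patch the reconstruction with the stored corrections."""
    fixed = list(reconstructed)
    for i, value in corrections.items():
        fixed[i] = value
    return fixed


original = [0.0, 0.5, 1.0, 0.25]
reconstructed = [0.0, 0.45, 0.2, 0.25]   # what the first decoding run produced

corrections = find_corrections(original, reconstructed)
fixed = apply_corrections(reconstructed, corrections)
```

Only the corrections would need to be saved alongside the script. Small differences (like the second sample here) are left alone as not noticeable.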

Unlike a 3D modeling file, it would be in natural language (with an equivalent in JSON format) that a human can read and understand, aided by links that can be opened and viewed.
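
To show what "natural language with an equivalent in JSON" might look like, here is a small sketch in which one scene description produces both forms. All the class and field names (`SceneObject`, `distance_m`, `moving_to`, `image_link`) and the sample values are invented for the example.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class SceneObject:
    name: str
    distance_m: float   # perceived distance from the camera, in metres
    moving_to: str      # where the object is heading
    image_link: str     # link to a stored reference image


@dataclass
class Scene:
    camera: str
    objects: list = field(default_factory=list)

    def to_text(self) -> str:
        """Human-readable form of the scene."""
        lines = [f"The camera is {self.camera}."]
        for o in self.objects:
            lines.append(
                f"{o.name} is about {o.distance_m} m away, "
                f"moving {o.moving_to} (see {o.image_link})."
            )
        return " ".join(lines)

    def to_json(self) -> str:
        """Machine-readable equivalent of the same scene."""
        return json.dumps(asdict(self), indent=2)


scene = Scene(
    camera="at stage left, eye level",
    objects=[SceneObject("Hamlet", 4.0, "towards the throne", "faces/hamlet_front")],
)

print(scene.to_text())
print(scene.to_json())
```

Both outputs come from the same data, so the text a human reads and the JSON a program reads cannot drift apart.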

A similar audio file would be built from sound features such as samples of the musical instruments (taken from the movie itself) and the voices with their typical prosodies. Besides the text, or the chords and notes, it would hold a bit of information about the way the sound was made, for example the temperament and accent, which would allow a fairly good reconstruction of the original "from memory".
-- pashute, Jan 14 2024

edited for clarity (I hope) and corrected the English.

Please tell me if it is unclear. And explain why you are throwing fishbones at it.
-- pashute, Jan 14 2024


I can imagine enough description and sample files for an AI to recreate the movie. Why do this, though?

More innovative movies might be less "compressible", i.e. they would require more words and samples to fully recreate the movie.
-- sninctown, Jan 14 2024


A "script" may be (1) a short passage of code, to be interpreted at run-time by a computer, or (2) a (probably longer) passage of text to be interpreted by a group of performance artists.

//perceived 3-dimensional distance to the objects// is normally added by the director, and is called "blocking". The absence from the script of this level of detail helps the human participants to imagine that they are very special, important creative people, and not on the verge of being replaced by software. Bless.

"Descriptive encoding" is the kind of thing XML is for (more than JSON). If you tried to do this in JSON, then you'd likely have to reinvent more wheels in the code that processed it. But that's not the important thing here.

The important thing is, you need to clarify whether you want this to work as a script in sense #1 or in sense #2. The same format is unlikely to work well in both roles. If you're going for sense #1, then the element of human-readability is not going to add much. And if you're going for sense #2, then the element of computer-readability will be either ignored or resented by the human cast.
-- pertinax, Jan 15 2024


