halfbakery: Why not imagine it in a way that works?
This is a combination of two existing AI technologies.
The first one is computer vision & scene understanding. That is, if you're looking at a picture of a beach, recognize features and create a textual description such as "Three palm trees line the beachfront. The sun is 27 degrees over the horizon. Cumulonimbus clouds are seen above horizon." The level of detail in the description can vary.
The second portion of this is a system that has pre-defined 3D models for a lot of common things such as trees, clouds, people, etc.
The system first takes the picture, creates a textual description of it, then hands this over to the second component which tries to reconstruct it. There will, of course, be ambiguities which the second AI system will have to resolve, thus creating a photorealistic rendering that _could_ have been the original thing but is actually a re-interpretation of it.
[Think about how many different pictures there are of famous historical events]
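A minimal sketch of the two-stage flow described above, in Python. Everything in it is a stand-in for illustration: `MODEL_LIBRARY`, `describe`, and `reconstruct` are hypothetical names, the "vision" stage is replaced by a hand-written list of detected objects, and the "rendering" stage only picks positions. The point it tries to show is that the text description keeps what is in the scene but not exactly where, so the second stage has to invent the missing details.

```python
import random
from dataclasses import dataclass

# Hypothetical library of pre-defined 3D models (file names are made up).
MODEL_LIBRARY = {"palm tree": "palm_tree.obj", "cloud": "cloud.obj"}


@dataclass
class SceneObject:
    kind: str
    x: float  # horizontal position in the frame, 0.0 to 1.0


def describe(detected):
    """Stage 1 stand-in: turn (kind, count) detections into a text description."""
    return ". ".join(f"{count} {kind}" for kind, count in detected) + "."


def reconstruct(description, seed):
    """Stage 2 stand-in: rebuild a scene from text, guessing unstated positions."""
    rng = random.Random(seed)
    scene = []
    for phrase in description.rstrip(".").split(". "):
        count_text, kind = phrase.split(" ", 1)
        if kind not in MODEL_LIBRARY:
            continue  # nothing in the library to draw this with
        for _ in range(int(count_text)):
            # The description does not say where each object is, so pick a spot.
            scene.append(SceneObject(kind, round(rng.random(), 2)))
    return scene


description = describe([("palm tree", 3), ("cloud", 2)])
version_a = reconstruct(description, seed=1)
version_b = reconstruct(description, seed=2)
```

Both `version_a` and `version_b` contain three palm trees and two clouds, but at different positions: two scenes that each _could_ have been the original.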
Computational Heraldry
http://mustard.tapo.../xq/xhtml.xq?id=115 Automatic generation of coat of arms from a specialized medieval language called "blazon" [cowtamer, Apr 09 2009]
yes, think about that. wowie!
Hey, if the 2nd part is baked, I'd love to have the actual link. I've heard there's some research into it, but have not seen anything about it outside a graduate level Computational Linguistics class I took once. Parsing a sentence to create a scene is still a non-trivial AI problem as far as I know.
Could be a useful bandwidth reducing tool for video, depending on the use and the extent of the object library.
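A rough back-of-envelope for the bandwidth point, as a Python snippet. The frame size (640x480, 3 bytes per pixel) is an assumption picked for illustration, and the comparison is against an uncompressed frame, so it overstates the saving relative to any real video codec. It also ignores that the receiver needs the object library already installed.

```python
# Example description taken from the idea text above.
description = (
    "Three palm trees line the beachfront. "
    "The sun is 27 degrees over the horizon. "
    "Cumulonimbus clouds are seen above horizon."
)

# Assumed frame format: 640x480 pixels, 3 bytes (RGB) per pixel, uncompressed.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 640, 480, 3

raw_frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
text_bytes = len(description.encode("utf-8"))
ratio = raw_frame_bytes / text_bytes

print(f"raw frame: {raw_frame_bytes} bytes, description: {text_bytes} bytes")
print(f"the description is roughly {ratio:.0f}x smaller")
```

The more detailed the description, the smaller this ratio gets, which is the "depending on the use" part.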
The second part of the problem is more complicated than having an open source 3D library. It's the problem of picking the appropriate shape and deducing from the description (and the rest of the scene that has already been constructed) where it should go. The scene may have to be revised as the description is parsed.
Consider the simple example:
"There is an office chair behind the desk. The chair is red and has 6 wheels. The desk is in front of the window."
The system would have to deduce:
* That the chair is between the window and the desk
* That the desk is probably an office desk (and not a school desk)
* That the setting is probably an office (as opposed to an outdoor scene) and might have other office-appropriate items
* That the desk has an opening where the wheels are visible to the observer ("behind" is a relative concept that has no meaning without the location of an observer)
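The first deduction in the list can be sketched as an ordering problem along the observer's line of sight. This is only a toy in Python: the relation triples are hand-written rather than parsed, `depth_order` and `WALL_FIXTURES` are hypothetical names, and the rule that a window sits on a wall behind everything else is an assumed bit of world knowledge, not something stated in the description. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# (a, relation, b) triples, as a sentence parser might emit them (hand-written here).
RELATIONS = [
    ("chair", "behind", "desk"),
    ("desk", "in front of", "window"),
]

# Assumed prior: these objects are attached to a wall, so they sit at the back.
WALL_FIXTURES = {"window"}


def depth_order(relations, wall_fixtures=frozenset()):
    """Return objects ordered from nearest the observer to farthest."""
    # Maps each object to the set of objects that must be nearer than it.
    nearer_than = {}
    for a, relation, b in relations:
        near, far = (b, a) if relation == "behind" else (a, b)
        nearer_than.setdefault(near, set())
        nearer_than.setdefault(far, set()).add(near)
    # Apply the prior: every non-fixture object is nearer than a wall fixture.
    for fixture in wall_fixtures:
        if fixture in nearer_than:
            for other in nearer_than:
                if other not in wall_fixtures:
                    nearer_than[fixture].add(other)
    return list(TopologicalSorter(nearer_than).static_order())


order = depth_order(RELATIONS, WALL_FIXTURES)
print(order)
```

With the wall prior the result is desk, chair, window, which puts the chair between the desk and the window. Without it, the two sentences only say that the chair and the window are both behind the desk, so either order of those two is consistent with the text. That is exactly the kind of ambiguity the reconstruction step has to settle.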
I thought the idea was about translation, not interpretation...
My daughter told me she saw that in action, baked by an Israeli kid inventor. The camera translates text inside an image in real time, matching the texture and font size and replacing it with text in another language.
This is now pretty much baked. A year or two ago I read of a machine learning-based system where an image would be converted to an internal representation and back, for the purpose of compression. However, the output image is not the same as the input one. It contains all the same things, but not in quite the same arrangement, and not with quite the same details. Unfortunately, I can't find the article I remember again, just ones that almost exactly reproduce the input image, which is better for compression but worse for reinterpretation.
The only difference between what I remember seeing and this idea is that the internal representation isn't text (instead presumably being some vector of coefficients that is proprietary to the individual trained instance of the system), but it wouldn't be too hard to use image-to-text and text-to-image neural networks instead.