One of the problems with conventional video conferencing
is that each person sees the other from the point of view of
the other person's camera. When each camera is directly
above the screen, and each person looks at the image of
the other, each person sees the other looking downwards.
This
idea solves that by having each person's computer
convert the 2D video stream to a 3D video stream, then
immediately convert that 3D stream back to 2D, but from a
modified point of view.
The easiest way to make the 2D->3D conversion is for each
computer to have a second video camera aimed at the
user; this doesn't need to be *part of* the computer, it
could easily be a camera-phone on a tripod, or even just
propped up next to the computer. The two image streams
form a binocular pair, from which depth data can be
derived by stereo triangulation.
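The triangulation step can be sketched in a few lines. This is a minimal illustration, assuming both cameras are calibrated and horizontally aligned, and that we already know the pixel disparity of a matched feature between the two views; the focal length, baseline, and disparity values below are made-up numbers, not figures from the text.

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Classic stereo triangulation: depth = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("feature must appear shifted between the two views")
    return focal_px * baseline_m / disparity_px

# A face feature seen 40 px apart by two cameras 0.12 m apart,
# with an 800 px focal length, sits 2.4 m from the cameras.
print(depth_from_disparity(800.0, 0.12, 40.0))  # → 2.4
```

In practice the hard part is finding the matching features in the two streams, but that is standard stereo-vision machinery.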
Each person's computer does two things with this 3D video
stream...
First, it locates the position of the face of the person
looking at the screen, and continually streams this position
to the other computer.
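The streamed position is tiny compared to the video itself, so it could ride along as a simple side channel. A hypothetical sketch of the message format, with the actual transport (socket, WebRTC data channel, etc.) elided:

```python
import json
import time

def encode_face_position(x: float, y: float, z: float) -> bytes:
    """Serialise the viewer's head position (metres, in camera
    coordinates) as a small timestamped JSON packet."""
    msg = {"t": time.time(), "face": {"x": x, "y": y, "z": z}}
    return json.dumps(msg).encode("utf-8")

# Head 2 cm right, 5 cm below, 60 cm from the screen:
packet = encode_face_position(0.02, -0.05, 0.60)
print(json.loads(packet)["face"]["z"])  # → 0.6
```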
Second, it converts the 3D stream to 2D, using the
streamed position data from the other computer as the
perspective from which to view the scene.
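The 3D-to-2D step amounts to re-rendering the scene from a virtual camera placed at the remote viewer's streamed head position. A minimal sketch, assuming a simple pinhole model looking straight down the z axis (a real renderer would also rotate the view toward the subject, and rasterise a full point cloud rather than single points):

```python
def project(point, eye, focal_px=800.0):
    """Project a 3D point (metres) to pixel offsets from the image
    centre, as seen from the virtual eye position `eye`."""
    dx, dy, dz = (p - e for p, e in zip(point, eye))
    if dz <= 0:
        raise ValueError("point is behind the virtual camera")
    return (focal_px * dx / dz, focal_px * dy / dz)

# A point 1 m in front of the eye, offset 0.1 m to the right:
print(project((0.1, 0.0, 1.0), (0.0, 0.0, 0.0)))  # → (80.0, 0.0)
```

As the viewer's head moves, `eye` moves with it, so the rendered perspective shifts accordingly.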
This perspective video stream is then transmitted over the
internet, and displayed just like any ordinary video.
If we want to remove or replace the background, for
privacy or for fun, it's easy, since we can chop out parts of
the 3D model which are significantly further from the
camera than the user's face.
This idea was inspired by the Instagram Hyperlapse app,
which converts a 2D video stream to a 3D intermediate
form, then changes it back to 2D for output.
Prior art is [link], which, instead of actually making a 3D
stream, creates a mask to divide the 2D stream into
foreground (the user) and background, and then
recomposes them with altered parallax, based on the face
position of the viewer.