part-stereo-multiview-video: Add a section of open design questions

Jan Schmidt 2015-05-30 01:03:46 +10:00
parent 4882cb9f37
commit b7fd8ffb76


@@ -37,7 +37,7 @@ encoded video for decoders to apply onto the raw video buffers they decode.
*If there ever is a need to transport multiview info for encoded data the
same system below for raw video or some variation should work*
### Encoded Video: Properties that need to be encoded into caps
1. multiview-mode (called "Channel Layout" in bug 611157)
    * Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo
      (switches between mono and stereo - mp4 can do this; see the caps sketch below)
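For illustration, a minimal sketch of how such a field might look on raw
video caps, assuming a plain string field named multiview-mode
("side-by-side" is just one plausible value):

    #include <gst/gst.h>

    /* Sketch: raw video caps carrying a multiview-mode field */
    static GstCaps *
    make_multiview_caps (void)
    {
      return gst_caps_new_simple ("video/x-raw",
          "format", G_TYPE_STRING, "RGBA",
          "width", G_TYPE_INT, 1920,
          "height", G_TYPE_INT, 1080,
          "multiview-mode", G_TYPE_STRING, "side-by-side",
          NULL);
    }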
@@ -85,10 +85,10 @@ same system below for raw video or some variation should work*
Buffer representation for raw video
-----------------------------------
* Transported as normal video buffers with extra metadata
* The caps define the overall buffer width/height, with helper functions to
  extract the individual views for packed formats (see the sketch below)
* pixel-aspect-ratio adjusted if needed to double the overall width/height
* video sinks that don't know about multiview extensions yet will show the packed view as-is.
  For frame-sequence outputs, things might look weird, but just adding multiview-mode to the sink caps
  can disallow those transports.
* _row-interleaved_ packing is actually just side-by-side memory layout with half frame width, twice
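As a concrete illustration of the packed layouts above, here is a hedged
sketch (not existing API) of a helper computing the rectangle each view
occupies inside a packed frame, assuming mode names following the
GST_VIDEO_MULTIVIEW_MODE_* convention used elsewhere in this document:

    #include <gst/video/video.h>

    /* Sketch: pixel rectangle of one view inside a frame-packed buffer.
     * view 0 is the left/primary view, view 1 the right view. */
    static void
    get_view_rect (GstVideoMultiviewMode mode, gint frame_w, gint frame_h,
        guint view, gint * x, gint * y, gint * w, gint * h)
    {
      switch (mode) {
        case GST_VIDEO_MULTIVIEW_MODE_SIDE_BY_SIDE:
          *w = frame_w / 2; *h = frame_h;
          *x = view * *w; *y = 0;
          break;
        case GST_VIDEO_MULTIVIEW_MODE_TOP_BOTTOM:
          *w = frame_w; *h = frame_h / 2;
          *x = 0; *y = view * *h;
          break;
        default: /* mono and other modes: the whole frame */
          *x = *y = 0; *w = frame_w; *h = frame_h;
          break;
      }
    }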
@@ -129,6 +129,24 @@ not all are wanted.
* Logical labels/names and mapping to GstVideoMeta numbers
  * Standard view labels LEFT/RIGHT, and non-standard ones (strings)
    typedef enum {
      GST_VIDEO_MULTIVIEW_VIEW_LEFT  = 1,
      GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2
    } GstVideoMultiviewViewLabel;

    typedef struct {
      guint view_label;              /* GstVideoMultiviewViewLabel or custom */
      guint meta_id;                 /* id of the GstVideoMeta for this view */
      gpointer padding[GST_PADDING]; /* reserved for future expansion */
    } GstVideoMultiviewViewInfo;

    typedef struct {
      guint n_views;
      GstVideoMultiviewViewInfo *view_info;
    } GstVideoMultiviewMeta;
The meta is optional, and probably only useful later for MVC.
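A hedged usage sketch for the proposed meta (hypothetical, since none of
this exists yet): look up the GstVideoMeta id carrying a particular view:

    /* Hypothetical helper: find the GstVideoMeta id for a view label */
    static gboolean
    find_view_meta_id (const GstVideoMultiviewMeta * mv, guint view_label,
        guint * meta_id)
    {
      guint i;

      for (i = 0; i < mv->n_views; i++) {
        if (mv->view_info[i].view_label == view_label) {
          *meta_id = mv->view_info[i].meta_id;
          return TRUE;
        }
      }
      return FALSE;
    }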
Outputting stereo content
-------------------------
The initial implementation for output will be stereo content in glimagesink
@@ -143,11 +161,14 @@ The initial implementation for output will be stereo content in glimagesink
## Other elements for handling multiview content
* videooverlay interface extensions
  * __Q__: Should this be a new interface?
* Element message to communicate the presence of stereoscopic information to the app
  * App needs to be able to override the input interpretation, i.e. set multiview-mode and multiview-flags
  * Most videos I've seen are side-by-side or top-bottom with no frame-packing metadata
* New API for the app to set rendering options for stereo/multiview content
  * This might be best implemented as a **multiview GstContext**, so that
    the pipeline can share app preferences for content interpretation and
    downmixing to mono for output, or in the sink, and have those passed as
    far upstream/downstream as possible (see the sketch after this list).
* Converter element
  * convert different view layouts
  * Render to anaglyphs of different types (magenta/green, red/blue, etc) and output as mono
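A minimal sketch of the GstContext idea, using existing GstContext API but
with an invented context type name and preference field:

    #include <gst/gst.h>

    /* Sketch: share app multiview preferences with the whole pipeline.
     * The context type string and the field name are placeholders. */
    static void
    set_multiview_preferences (GstElement * pipeline)
    {
      GstContext *context =
          gst_context_new ("gst.video.multiview.preferences", TRUE);
      GstStructure *s = gst_context_writable_structure (context);

      gst_structure_set (s, "downmix-to-mono", G_TYPE_BOOLEAN, TRUE, NULL);
      gst_element_set_context (pipeline, context);
      gst_context_unref (context);
    }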
@@ -168,8 +189,90 @@ Things to do to implement MVC handling
4. generating SEI in H.264 encoder
5. Support for MPEG2 MVC extensions
## Relevant bugs
[bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser
[bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support
[bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams
## Other Information
[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D)
## Open Questions
### Background
### Representation for GstGL
When uploading raw video frames to GL textures, the goal is to implement:
1. Keep the views packed in a single GL texture, exactly as they arrive in
   the incoming GstMemory, with the multiview-mode in the caps describing
   the layout, and
2. Split packed frames into separate GL textures when uploading, and
attach multiple GstGLMemory's to the GstBuffer. The multiview-mode and
multiview-flags fields in the caps should change to reflect the conversion
from one incoming GstMemory to multiple GstGLMemory, and change the
width/height in the output info as needed.
This is (currently) targeted as 2 render passes - upload as normal
to a single stereo-packed RGBA texture, and then unpack into 2
smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as
2 GstGLMemory attached to one buffer. We can optimise the upload later
to go directly to 2 textures for common input formats.
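A hedged sketch of the caps rewrite implied by that unpack step, assuming
"separated" is the caps string for GST_VIDEO_MULTIVIEW_MODE_SEPARATED:

    #include <gst/gst.h>

    /* Sketch: side-by-side packed caps -> separated per-view caps.
     * The per-view width is half the packed width. */
    static GstCaps *
    unpacked_caps (const GstCaps * packed)
    {
      GstCaps *out = gst_caps_copy (packed);
      gint width = 0;

      gst_structure_get_int (gst_caps_get_structure (out, 0), "width", &width);
      gst_caps_set_simple (out,
          "multiview-mode", G_TYPE_STRING, "separated",
          "width", G_TYPE_INT, width / 2, NULL);
      return out;
    }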
Separate output textures have a few advantages:
* Filter elements can more easily apply filters in several passes to each
texture without fundamental changes to our filters to avoid mixing pixels
from separate views.
* Centralises the sampling of input video frame packings in the upload code,
which makes adding new packings in the future easier.
* Sampling multiple textures to generate various output frame-packings
for display is conceptually simpler than converting from any input packing
to any output packing.
* In implementations that support quad buffers, having separate textures
makes it trivial to do GL_LEFT/GL_RIGHT output
For either option, we'll need new glsink output API to pass more
information to applications about multiple views for the draw signal/callback.
I don't know if it's desirable to support *both* methods of representing
views. If so, that should be signalled in the caps too. That could be a
new multiview-mode for passing views in separate GstMemory objects
attached to a GstBuffer, which would not be GL specific.
### Overriding frame packing interpretation
Most sample videos available are frame packed, with no metadata
to say so. How should we override that interpretation?
* Simple answer: Use capssetter + new properties on playbin to
override the multiview fields
*Basically implemented in playbin, using a pad probe. Needs more work for completeness*
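For reference, a minimal sketch of that pad-probe approach using existing
API (the hard-coded "side-by-side" value stands in for whatever the app
chooses):

    #include <gst/gst.h>

    /* Sketch: rewrite CAPS events in-flight to override multiview fields.
     * Attach with:
     *   gst_pad_add_probe (pad, GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM,
     *       override_multiview_probe, NULL, NULL);
     */
    static GstPadProbeReturn
    override_multiview_probe (GstPad * pad, GstPadProbeInfo * info,
        gpointer user_data)
    {
      GstEvent *event = GST_PAD_PROBE_INFO_EVENT (info);
      GstCaps *caps, *new_caps;

      if (GST_EVENT_TYPE (event) != GST_EVENT_CAPS)
        return GST_PAD_PROBE_OK;

      gst_event_parse_caps (event, &caps);
      new_caps = gst_caps_copy (caps);
      gst_caps_set_simple (new_caps,
          "multiview-mode", G_TYPE_STRING, "side-by-side", NULL);

      /* Replace the event seen by downstream with the rewritten one */
      gst_event_unref (event);
      GST_PAD_PROBE_INFO_DATA (info) = gst_event_new_caps (new_caps);
      gst_caps_unref (new_caps);

      return GST_PAD_PROBE_OK;
    }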
### Adding extra GstVideoMeta to buffers
There should be one GstVideoMeta for the entire video frame in packed
layouts, and one GstVideoMeta per GstGLMemory when views are attached
to a GstBuffer separately. This should be done by the buffer pool,
which knows the layout from the caps.
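A short sketch of that, using the existing gst_buffer_add_video_meta() API
(the pool is assumed to have derived a per-view GstVideoInfo from the caps):

    #include <gst/video/video.h>

    /* Sketch: one GstVideoMeta per view for separately-attached views */
    static void
    add_view_metas (GstBuffer * buffer, const GstVideoInfo * view_info,
        guint n_views)
    {
      guint i;

      for (i = 0; i < n_views; i++) {
        gst_buffer_add_video_meta (buffer, GST_VIDEO_FRAME_FLAG_NONE,
            GST_VIDEO_INFO_FORMAT (view_info),
            GST_VIDEO_INFO_WIDTH (view_info),
            GST_VIDEO_INFO_HEIGHT (view_info));
      }
    }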
### videooverlay interface extensions
GstVideoOverlay needs (strawman prototypes are sketched after this list):
* A way to announce the presence of multiview content when it is
detected/signalled in a stream.
* A way to tell applications which output methods are supported/available
* A way to tell the sink which output method it should use
* Possibly a way to tell the sink to override the input frame
interpretation / caps - depends on the answer to the question
above about how to model overriding input interpretation.
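Purely as a strawman, hypothetical prototypes for those extensions might
look like this (every name below is invented, nothing here exists):

    /* Hypothetical GstVideoOverlay additions */
    guint    gst_video_overlay_get_supported_multiview_modes (GstVideoOverlay * overlay);
    gboolean gst_video_overlay_set_multiview_output_mode     (GstVideoOverlay * overlay,
                                                              guint mode);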
### What's implemented
* Caps handling
* gst-plugins-base libgstvideo pieces
* playbin caps overriding
* conversion elements - glstereomix, gl3dconvert (needs a rename),
  glstereosplit (see the example pipeline below).
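A hedged example pipeline exercising those elements (the sink_%u request-pad
naming on glstereomix is an assumption here):

    #include <gst/gst.h>

    /* Sketch: combine two mono test streams into a stereo pair */
    static GstElement *
    make_stereo_test_pipeline (GError ** error)
    {
      return gst_parse_launch (
          "videotestsrc pattern=ball ! glupload ! mix.sink_0 "
          "videotestsrc ! glupload ! mix.sink_1 "
          "glstereomix name=mix ! glimagesink", error);
    }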
### Possible future enhancements
* Make GLupload split to separate textures at upload time?
  * Needs new API to extract multiple textures from the upload. Currently it only outputs 1 result RGBA texture.
* Make GLdownload able to take 2 input textures, pack them and colorconvert / download as needed.
  * Currently done by packing then downloading, which is unacceptable overhead for RGBA download
* Think about how we integrate GLstereo - do we need to do anything special,
or can the app just render to stereo/quad buffers if they're available?