When we negotiate with downstream, We should use the intersected
caps of input and output to decide the alignment and stream format.
The current code just uses the input caps which may lack the stream
format.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/1837>