When we negotiate with downstream, We should use the intersected caps of input and output to decide the alignment and stream format. The current code just uses the input caps which may lack the stream format. Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/1837>