The RTP payload seems to be required, as it carries the frame count information. Also, the second argument passed to gst_rtp_base_payload_allocate_output_buffer was incorrect. Strangely, some devices such as the Shanling MP4 and Sony XM3 still work without this, while others such as the Sony XM4 do not.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/1797>