Default CUDA memory allocation will cause implicit global
synchronization. This stream ordered allocation can avoid it
since memory allocation and free operations are asynchronous
and executed in the associated cuda stream context
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7427>