If we receive video buffers with non-perfect timestamps, the
caption buffers' timestamps might fall in the interval between
the end of one video buffer and the start of the next one.
Make our criteria for dropping that the caption buffer has
a timestamp older than the end of the previous video buffer,
not older than the start of the new one, unless of course
this is the first video buffer.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1207>