Otherwise we will get it again later for output. However, this frame will
never actually be output, so the timestamps would shift.
This is especially bad when handling a live stream whose first
frames are not keyframes: we would output the first keyframe with the
timestamp of the first frame, and everything would arrive at the sink
too late.
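
The effect described above can be illustrated with a minimal sketch. It is not the decoder's actual code: it assumes a simple FIFO of pending input timestamps, and every name in it (PendingQueue, pending_push, pending_pop, first_output_pts, keep_stale) is hypothetical. Frames that arrive before the first keyframe never produce output; if their timestamps stay queued, the first real output frame picks up the oldest one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a FIFO of timestamps for frames handed to the decoder. */
#define MAX_PENDING 16

typedef struct {
  uint64_t pts[MAX_PENDING];
  size_t head;
  size_t count;
} PendingQueue;

static void pending_push(PendingQueue *q, uint64_t pts) {
  assert(q->count < MAX_PENDING);
  q->pts[(q->head + q->count) % MAX_PENDING] = pts;
  q->count++;
}

static uint64_t pending_pop(PendingQueue *q) {
  assert(q->count > 0);
  uint64_t pts = q->pts[q->head];
  q->head = (q->head + 1) % MAX_PENDING;
  q->count--;
  return pts;
}

/* Returns the timestamp assigned to the first frame that is actually output,
 * or UINT64_MAX if no keyframe is seen.
 * keep_stale != 0 models the problem case: frames that will never be output
 * still leave their timestamps in the queue. */
static uint64_t first_output_pts(const uint64_t *pts, const int *is_key,
                                 size_t n, int keep_stale) {
  PendingQueue q = {0};
  for (size_t i = 0; i < n; i++) {
    if (!is_key[i]) {
      /* Non-keyframe before the first keyframe: never output. */
      if (keep_stale)
        pending_push(&q, pts[i]);
      continue;
    }
    /* First keyframe: it is output with the oldest pending timestamp. */
    pending_push(&q, pts[i]);
    return pending_pop(&q);
  }
  return UINT64_MAX;
}
```

With input timestamps 0, 40 and 80 where only the last frame is a keyframe, first_output_pts returns 0 when the stale entries are kept and 80 when they are dropped. In the first case the keyframe is presented 80 units behind its real position.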