We wait to parse a minimum number of frames (10, arbitrarily) before
emiting bitrate tags so that our early estimates are not wildly
inaccurate for streams that start with a silence. If the stream ends
before that, we just emit the tags anyway.
While it _would_ be nicer to be specify the threshold to start pushing
the tags in terms of duration, this would introduce more complexity than
this merits.
https://bugzilla.gnome.org/show_bug.cgi?id=614991