Increasing from 1 to 2 threads on a ThinkPad X60s decreased encode time in one test from ~24 s to ~19 s (roughly 20% faster), so this is quite useful. Ideally, 0 should be the default and mean "pick automatically", e.g. by matching the number of CPU cores or some similar heuristic.
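
A minimal sketch of how the "0 means automatic" default could be resolved, written in Python for illustration only. The helper name `resolve_thread_count` is hypothetical and not part of any existing code here. It also assumes that the logical CPU count reported by the OS is an acceptable stand-in for "number of CPU cores".

```python
import os


def resolve_thread_count(requested: int = 0) -> int:
    """Hypothetical helper: turn a user-supplied thread setting into a real count.

    0 means "choose automatically"; any positive value is used as given.
    """
    if requested < 0:
        raise ValueError("thread count must be >= 0")
    if requested == 0:
        # os.cpu_count() reports logical CPUs and may return None when the
        # count cannot be determined, so fall back to a single thread.
        return os.cpu_count() or 1
    return requested
```

With this, an explicit setting such as `resolve_thread_count(2)` is passed through unchanged, while the default of 0 yields whatever the machine reports (2 would be expected on a dual-core machine, but that depends on the hardware and OS).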