CPU waits are more expensive and are only required if the CPU is ever going to access the data. GPU waits perform inter-context synchronisation and are cheaper as they don't require CPU intervention.