A little self-update here.
1. It seems to be enough to just flush the TLB entries with the iothread lock held, since the CPUs are stopped at this point.
2. The upstream version is not vulnerable to the bug because of the following call path: ram_save_setup -> memory_global_dirty_log_start -> memory_region_transaction_commit -> tcg_commit -> CPU_FOREACH(cpu) tlb_flush(cpu, 1).
3. QEMU versions 2.0.0 (including Ubuntu's package) and 2.3.1 are vulnerable to the bug. I will report these as appropriate.
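
To illustrate points 1 and 2, here is a toy model in C of "flush every CPU's TLB while the iothread lock is held and the CPUs are stopped". It is only a sketch and not QEMU code: ToyCPU, toy_iothread_locked, toy_tlb_fill, toy_tlb_flush, toy_flush_all_tlbs and toy_valid_entries are all hypothetical names made up for this example, and the real tlb_flush and CPU_FOREACH work on QEMU's own CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NUM_CPUS 2
#define TOY_TLB_SIZE 4

/* Hypothetical stand-in for a vCPU and its software TLB (not QEMU's CPUState). */
typedef struct {
    bool stopped;                  /* true once the vCPU has been paused */
    bool tlb_valid[TOY_TLB_SIZE];  /* one "entry is cached" flag per slot */
} ToyCPU;

ToyCPU toy_cpus[TOY_NUM_CPUS];
bool toy_iothread_locked;          /* assumption: models holding the iothread lock */

/* Mark every TLB slot of one CPU as holding a (possibly stale) translation. */
void toy_tlb_fill(ToyCPU *cpu)
{
    assert(cpu != NULL);
    for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
        cpu->tlb_valid[i] = true;
    }
}

/* Drop every cached translation of one CPU. */
void toy_tlb_flush(ToyCPU *cpu)
{
    assert(cpu != NULL);
    memset(cpu->tlb_valid, 0, sizeof cpu->tlb_valid);
}

/*
 * Flush all CPUs, mirroring the "CPU_FOREACH(cpu) tlb_flush(cpu, 1)" step.
 * Returns the number of CPUs flushed, or -1 if the lock is not held or
 * some CPU is still running (in which case nothing is flushed).
 */
int toy_flush_all_tlbs(void)
{
    if (!toy_iothread_locked) {
        return -1;
    }
    for (size_t i = 0; i < TOY_NUM_CPUS; i++) {
        if (!toy_cpus[i].stopped) {
            return -1;
        }
    }
    for (size_t i = 0; i < TOY_NUM_CPUS; i++) {
        toy_tlb_flush(&toy_cpus[i]);
    }
    return TOY_NUM_CPUS;
}

/* Count the TLB slots that are still valid across all CPUs. */
size_t toy_valid_entries(void)
{
    size_t n = 0;
    for (size_t i = 0; i < TOY_NUM_CPUS; i++) {
        for (size_t j = 0; j < TOY_TLB_SIZE; j++) {
            n += toy_cpus[i].tlb_valid[j] ? 1 : 0;
        }
    }
    return n;
}
```

In this model, stale entries survive only if the flush is skipped, which is roughly the situation in the versions that lack the tcg_commit step on the migration setup path.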