Great!
Do you have an estimate of the possible performance gain from
introducing a direct pointer to mmu_map for memory reads?
I have two ideas for future experimentation.
One trick would avoid spending another register on a global
variable: keep two copies of CPUState (one for privileged mode and
one for user mode), then make mmu_map.add_read the first member of
the struct. This would require copying the guest registers on every
user/supervisor switch, but the performance gain might justify it.
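
To make the first idea concrete, here is a rough sketch of the layout
and the switch cost. Everything in it is hypothetical: the Sketch*
names, the table sizes and the register count are made up for
illustration and are not the real CPUState definition. It only shows
that with add_read at offset 0 the env pointer doubles as the table
address, and that a mode switch has to copy the guest registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names and sizes here are hypothetical, chosen for illustration. */
#define SKETCH_NB_REGS     32
#define SKETCH_MAP_ENTRIES 16

typedef struct SketchMMUMap {
    uintptr_t add_read[SKETCH_MAP_ENTRIES];   /* must stay first */
    uintptr_t add_write[SKETCH_MAP_ENTRIES];
} SketchMMUMap;

typedef struct SketchCPUState {
    SketchMMUMap mmu_map;                     /* must stay first */
    uint32_t regs[SKETCH_NB_REGS];            /* guest registers */
} SketchCPUState;

/* add_read sits at offset 0, so the env pointer is also its address. */
_Static_assert(offsetof(SketchCPUState, mmu_map) == 0 &&
               offsetof(SketchMMUMap, add_read) == 0,
               "add_read must be the first member");

enum { SKETCH_SUPERVISOR = 0, SKETCH_USER = 1 };

/* One copy per mode; each copy carries its own mmu_map tables. */
static SketchCPUState sketch_env[2];

/* Switch modes: copy the guest registers into the other copy and
 * return it. The memcpy is the extra cost mentioned above. */
static SketchCPUState *sketch_switch_mode(SketchCPUState *cur, int mode)
{
    assert(mode == SKETCH_SUPERVISOR || mode == SKETCH_USER);
    SketchCPUState *next = &sketch_env[mode];
    if (next != cur) {
        memcpy(next->regs, cur->regs, sizeof(next->regs));
    }
    return next;
}
```

Whether the copy on every switch is cheap enough depends on how often
the guest changes mode, so this would need measuring.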
Another idea: if we could align add_read/add_write on a 64 KiB
boundary, the "addi" could be removed.
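
A small sketch of the arithmetic behind the second idea. I am
assuming here that the "addi" in question supplies the low 16 bits of
the table address after a "lis" has loaded the high 16 bits. That is
my reading, not something taken from the generated code. The helper
names are hypothetical. The point is that a 64 KiB-aligned address
has a zero low half, so nothing is left for the "addi" to add.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two 16-bit immediates that a
 * lis/addi pair would need to build a 32-bit address. */
typedef struct SketchAddrParts {
    uint16_t hi;   /* immediate for "lis" (loaded into the upper half) */
    int32_t  lo;   /* sign-extended 16-bit immediate for "addi" */
} SketchAddrParts;

static SketchAddrParts sketch_split_addr(uint32_t addr)
{
    SketchAddrParts p;
    /* "addi" sign-extends its immediate, so interpret the low half
     * as a signed 16-bit value ... */
    p.lo = (int32_t)(addr & 0xFFFFu);
    if (p.lo >= 0x8000) {
        p.lo -= 0x10000;
    }
    /* ... and compensate in the high half. */
    p.hi = (uint16_t)((addr - (uint32_t)p.lo) >> 16);
    /* Sanity check: the two parts rebuild the original address. */
    assert((((uint32_t)p.hi << 16) + (uint32_t)p.lo) == addr);
    return p;
}

/* The "addi" is only needed when the low half is non-zero. */
static int sketch_needs_addi(uint32_t addr)
{
    return sketch_split_addr(addr).lo != 0;
}

/* Round an address up to the next 64 KiB boundary. */
static uint32_t sketch_align_up_64k(uint32_t addr)
{
    return (addr + 0xFFFFu) & ~(uint32_t)0xFFFFu;
}
```

For example, 0x12345678 needs both instructions, while the same table
moved up to 0x12350000 needs only the "lis". The price is up to 64 KiB
of padding per table.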