Hello,
I haven't been following this thread too closely, so apologies if any of this is duplicate information.
The multi-core feature you are talking about can be enabled by passing "-o synth.cpu-cores=2" to use 2 CPU cores. When I initially implemented this, it did indeed increase the number of voices, but your mileage may vary: I haven't used it in some time, and I know David Henningsson did some pretty significant refactoring of the synth not long after that. I don't think it adds latency, though there will be increased scheduling overhead, which means you very likely won't get a doubling of the simultaneous voice count.
If you are indeed maxing out the CPU, you could try other things to reduce CPU consumption. If you don't need the built-in reverb or chorus effects, you could disable them (-R0 and -C0 respectively). You could also reduce the sample rate (as it seems you have already tried, using the '-r SAMPLERATE' option), but as you also noted, this will increase the latency if the buffer size/count values stay the same, since the same number of sample frames then takes longer to play out.
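To put the options above in one place, here is a rough sketch in Python that just assembles a command line from them and prints it. The helper name fluidsynth_args, the SoundFont path "example.sf2", and the 22050 sample rate are placeholders of mine, not anything from your setup, and I haven't tested this combination on your hardware.

```python
import shlex


def fluidsynth_args(cores=2, reverb=False, chorus=False,
                    sample_rate=None, soundfont="example.sf2"):
    """Build a fluidsynth argument list from the options discussed above.

    The function name and the soundfont default are hypothetical; only the
    flags themselves come from the discussion.
    """
    args = ["fluidsynth", "-o", f"synth.cpu-cores={cores}"]
    if not reverb:
        args.append("-R0")  # disable the built-in reverb
    if not chorus:
        args.append("-C0")  # disable the built-in chorus
    if sample_rate is not None:
        args += ["-r", str(sample_rate)]  # lower rate = less CPU, more latency
    args.append(soundfont)  # placeholder SoundFont path
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(fluidsynth_args(sample_rate=22050)))
```

Printing rather than executing is deliberate: you can check the line and adjust the values before trying it.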
With regard to the number of buffers and the buffer size, those mostly affect latency. They may affect CPU usage as well, but I would expect larger and more numerous buffers to lower CPU usage rather than raise it. I doubt this has much effect on CPU usage, though. It sounds like you need to patch your driver, as was already mentioned. The hardware may have its own limitations too; if those are not acceptable, you may need to try a different audio interface (USB maybe?).
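To make the relationship between buffer size, buffer count, sample rate and latency concrete, here is a small back-of-the-envelope calculation in Python. The numbers (64 frames per buffer, 16 buffers) are arbitrary example values I picked, not necessarily what your setup uses, and the result covers only the buffering itself, not whatever the driver or hardware adds on top.

```python
def buffer_latency_ms(buffer_size, buffer_count, sample_rate):
    """Approximate buffering latency in milliseconds.

    buffer_size  -- frames per buffer
    buffer_count -- number of buffers
    sample_rate  -- frames per second
    """
    total_frames = buffer_size * buffer_count
    return 1000.0 * total_frames / sample_rate


if __name__ == "__main__":
    # Same buffer settings at two sample rates: halving the rate doubles
    # the time the same number of frames takes to play out.
    for rate in (44100, 22050):
        print(rate, "Hz:", round(buffer_latency_ms(64, 16, rate), 1), "ms")
```

So if you halve the sample rate and want to keep the latency where it was, you would need to halve the buffer size or the buffer count as well.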
It seems that the ARM architecture at least has an FPU, which is good.
Best regards,
Element Green