The base characters themselves certainly fit. However, if one wishes to
operate on syllables (made by combining consonants in the base
character set), the number of these syllables can exceed 256.
Here is a short example of just one of the issues that come up when
treating characters, rather than syllables, as the base unit in Hindi.
Take, for example, the conjunct "kra", क्र. This is represented
linguistically, and in UTF-8, as क + ् + र (U0915 + U094D + U0930).
It makes no sense to swap the "halant" (U094D) with the "ka" or the
"ra", as that creates a completely different conjunct, and is not a
mistake that would typically be made. As you suggest, I could just
include "kra" in the encoding, but, in many Indian languages, the
256 available slots are not sufficient for all such conjuncts.
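
To make the "kra" example concrete, here is a minimal Python sketch that
lists the code points of the conjunct and shows what a character-level
swap does to it. The helper names describe and swap are my own, purely
for illustration; they are not part of any library or of the encoding
under discussion.

```python
import unicodedata

# The three code points of the conjunct "kra": KA + halant (virama) + RA.
KA, HALANT, RA = "\u0915", "\u094D", "\u0930"
kra = KA + HALANT + RA


def describe(text):
    """Return one 'U+XXXX NAME' string per code point (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def swap(text, i, j):
    """Return text with the code points at positions i and j exchanged."""
    chars = list(text)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


for line in describe(kra):
    print(line)

# One syllable for the reader, but three code points and nine UTF-8 bytes.
print(len(kra), "code points,", len(kra.encode("utf-8")), "UTF-8 bytes")

# Character-level swaps involving the halant no longer spell "kra".
print(describe(swap(kra, 0, 1)))  # halant swapped with "ka"
print(describe(swap(kra, 1, 2)))  # halant swapped with "ra"
```

The point of the sketch is only that a character-level operation can
break up a unit that a reader perceives as a single syllable.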