discuss-gnustep

Re: Getting UTF-8 value of string occasionally fails


From: Pete French
Subject: Re: Getting UTF-8 value of string occasionally fails
Date: Wed, 13 Oct 2004 13:14:18 +0100

> Should U+D800 return a valid UTF-8 string? Is this a bug in base or
> correct behavior?

No, it should not: unpaired surrogates are not allowed. We are using UTF-16
internally, as the original OpenStep spec was written before the characters
outside the 16-bit range were defined (so a unichar is 16 bits). But UTF-8
is not allowed to encode the surrogates individually; a surrogate pair is
formed back into a single character, which is then converted.
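
To make the pairing rule concrete, here is a rough sketch in plain C. It is
not the code in base, and the function names are made up for illustration.
It checks a buffer of 16-bit units for unpaired surrogates and shows how a
valid pair is folded back into one codepoint before any UTF-8 conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GNUstep function): returns 1 if the UTF-16
 * buffer contains no unpaired surrogates, 0 otherwise. */
int utf16_is_well_formed(const uint16_t *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint16_t u = s[i];

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* A high surrogate must be followed by a low surrogate. */
            if (i + 1 >= len || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return 0;
            i += 2;
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* A low surrogate with no high surrogate before it. */
            return 0;
        } else {
            i++;
        }
    }
    return 1;
}

/* Hypothetical helper: combine a valid high/low surrogate pair into the
 * single codepoint it represents (always in U+10000..U+10FFFF). */
uint32_t utf16_combine(uint16_t hi, uint16_t lo)
{
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                       | (uint32_t)(lo - 0xDC00));
}
```

With this sketch a string holding only U+D800 is rejected, while the pair
0xD834 0xDD1E is accepted and combines to U+1D11E.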

> If it's my fault, suggestions for a workaround would be appreciated.

You need to make sure that the string contains a complete codepoint. If
you are talking about codepoints outside the basic 16-bit character set, then
your string needs to contain two 16-bit words (a surrogate pair) to equate to
a single codepoint.
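
As a sketch of that workaround, again in plain C with a made-up function
name, this splits a codepoint into the one or two 16-bit words the string
needs to hold. It refuses lone surrogates and values above U+10FFFF.

```c
#include <stdint.h>

/* Hypothetical helper (not a GNUstep function): encode one codepoint as
 * UTF-16. Returns the number of 16-bit units written to out (1 or 2), or
 * 0 if cp is a surrogate value or lies beyond U+10FFFF. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x10000) {
        /* Fits in a single 16-bit word. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Outside the 16-bit range: split the remaining 20 bits in two. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

The resulting words can then be handed to something like
+stringWithCharacters:length: so that the string holds the whole pair.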

-bat.



