Re: Internationalisation and fonts

discuss-gnustep


Re: Internationalisation and fonts - a suggestion

From:	Richard Frith-Macdonald
Subject:	Re: Internationalisation and fonts - a suggestion
Date:	Tue, 5 Aug 2003 14:03:15 +0100


On Tuesday, August 5, 2003, at 12:26 PM, Pete French wrote:

> > and typed `ls`. I tracked it down to a single file, which had a
> > LATIN SMALL LETTER A WITH ACUTE in seemingly malformed UTF-8. Once
> > GWorkspace hit this, it went crazy and started missing some of the
>
> Now that might explain things. I certainly have a lot of interesting garbage
> in my home directory. I was also looking at the internal implementation
> of the UTF8 translation code last night and it doesn't seem that robust (see
> the bug I posted for starters). It should be simple enough to recode that to
> skip garbage UTF8 sequences rather than barfing over the whole string and
> returning nil, which might help with this problem, maybe?

But that would be *very* wrong. Conversion to/from character sets needs
to fail if the conversion is not possible, rather than trying to guess
what the correct results are. If we skip unintelligible rubbish while
converting, the application has no way of telling that there is a
problem... we have to fail when there is rubbish in the string, so the
application can do something about the problem.
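To make the "fail rather than guess" policy concrete, here is a minimal C sketch of a strict UTF-8 check. The function name `utf8_is_valid` is hypothetical and this is not the GNUstep conversion code; it only illustrates the policy. Any malformed input, for instance a lone byte 0xE1 (which is how á appears when a filename is written in ISO-8859-1 rather than UTF-8), makes the whole check fail instead of being silently dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GNUstep code: returns 1 if buf[0..len) is
 * well-formed UTF-8 and 0 otherwise. Bad bytes are never skipped or
 * repaired; the caller decides what to do about the failure. */
static int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        uint32_t cp, min;   /* decoded value, smallest legal value for n */
        size_t n;           /* number of continuation bytes expected */

        if (c < 0x80) { i++; continue; }                 /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;      /* stray continuation byte or invalid lead byte */

        if (len - i <= n) return 0;                      /* truncated */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;   /* not 10xxxxxx */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;       /* overlong, beyond U+10FFFF, or a surrogate */
        i += n + 1;
    }
    return 1;
}
```

A caller that gets 0 back can report the problem or retry with another encoding, which is exactly the choice the application loses if the converter quietly skips bytes.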

For instance, with regard to your bug report ... it is certainly true
that GNUstep only supports the Unicode base plane (the Basic Multilingual
Plane, U+0000 to U+FFFF) until someone wants to change that ... but it's
not a bug in the utf8 conversion. Rather, the Unicode strings in GNUstep
are ucs2, so the conversion code is written to reject utf8 data
containing characters which can't be represented in ucs2.
In other words, we need to audit all the Unicode support and update it
to work with a utf16 internal format rather than a ucs2 format (and do
so in a way compatible with MacOS-X) before we can change the character
set conversion code. Doing it the other way round would merely introduce
a lot of subtle bugs in place of a simple limitation.
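The ucs2 limitation can be sketched the same way. The hypothetical `utf8_to_ucs2` below is an illustration, not GNUstep's converter: it decodes UTF-8 into 16-bit units and returns -1 both for malformed input and for any well-formed character above U+FFFF. A 4-byte UTF-8 sequence always encodes such a character, and ucs2 has no single 16-bit value for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decode UTF-8 into UCS-2 (one uint16_t per character).
 * Returns the number of units written, or -1 if the input is malformed,
 * contains a character above U+FFFF, or does not fit in out[0..out_cap).
 * Nothing is skipped or guessed. */
static long utf8_to_ucs2(const unsigned char *in, size_t len,
                         uint16_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL || len == 0);
    assert(out != NULL || out_cap == 0);
    while (i < len) {
        unsigned char c = in[i];
        uint32_t cp, min;
        size_t n;

        if (c < 0x80)                { n = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) return -1; /* 4-byte form: > U+FFFF */
        else return -1;                          /* invalid lead byte */

        if (len - i <= n) return -1;             /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((in[i + k] & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;                           /* overlong or surrogate */
        if (o == out_cap) return -1;             /* output buffer full */
        out[o++] = (uint16_t)cp;
        i += n + 1;
    }
    return (long)o;
}
```

Note that the 4-byte branch rejects input that is perfectly valid UTF-8; that is the "simple limitation" described above, and it is deliberate rather than a decoding bug.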

I think, when the original OpenStep spec was written, utf16 did not exist
and Unicode had a 16-bit representation for *all* characters... so the
NSString API presumes that a Unicode character is a single 16-bit value.
With Apple using a variable-length (utf16) representation for modern
Unicode, we want to make sure that a future GNUstep maps that API to a
utf16 internal representation in the same way that Apple does.

For example, if we have a string containing a single utf16 character
occupying 4 bytes, and we use the -length and -characterAtIndex: methods,
will the string appear to contain two characters or one?
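In utf16, a character above U+FFFF is stored as a surrogate pair: two 16-bit units, 4 bytes in total. The sketch below, in plain C with hypothetical helper names, encodes such a character and counts it both ways. For reference, Apple's NSString documentation describes -length as the number of UTF-16 code units, so on Apple's side the answer is two, and -characterAtIndex: returns each surrogate half separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: encode one Unicode code point as UTF-16.
 * Writes one or two 16-bit units to out[] and returns how many were
 * written, or 0 if cp is not a valid Unicode scalar value. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: same as ucs2 */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Count code points in a UTF-16 buffer, treating a high+low surrogate
 * pair as a single character. */
static size_t utf16_char_count(const uint16_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++, count++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;                                 /* skip the low half */
    }
    return count;
}
```

For U+1D11E (MUSICAL SYMBOL G CLEF), `utf16_encode` produces the two units 0xD834 0xDD1E: a unit-based -length sees two characters, while `utf16_char_count` sees one.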

I'd welcome anyone volunteering to do the coding for that move from ucs2
to utf16 (and enhancing NSCharacterSet to support ucs4 instead of ucs2).
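On the NSCharacterSet side, the core change is that membership must cover code points up to U+10FFFF rather than 65536 values. The sketch below assumes a lazily allocated bitmap per 65536-code-point plane; the type and function names are hypothetical, and this is neither GNUstep's nor Apple's implementation. A flat bitmap would need 0x110000 bits (136 KiB), while this layout allocates 8 KiB only for planes that actually contain a member.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CS_MAX_CP   0x10FFFFu        /* highest Unicode code point */
#define CS_PLANES   17               /* planes 0..16 */
#define CS_PLANE_SZ (0x10000 / 8)    /* 8192 bytes = one bit per code point */

/* Hypothetical ucs4 character set: one bitmap per plane, NULL if empty. */
typedef struct {
    uint8_t *planes[CS_PLANES];
} CharSet32;

/* Add cp to the set; returns 1 on success, 0 if cp is out of range or
 * the plane bitmap could not be allocated. */
static int cs_add(CharSet32 *cs, uint32_t cp)
{
    assert(cs != NULL);
    if (cp > CS_MAX_CP) return 0;
    unsigned plane = cp >> 16;
    if (cs->planes[plane] == NULL) {
        cs->planes[plane] = calloc(CS_PLANE_SZ, 1);
        if (cs->planes[plane] == NULL) return 0;
    }
    uint32_t off = cp & 0xFFFF;
    cs->planes[plane][off >> 3] |= (uint8_t)(1u << (off & 7));
    return 1;
}

/* Return 1 if cp is a member, 0 otherwise (including out-of-range cp). */
static int cs_contains(const CharSet32 *cs, uint32_t cp)
{
    assert(cs != NULL);
    if (cp > CS_MAX_CP) return 0;
    const uint8_t *bits = cs->planes[cp >> 16];
    if (bits == NULL) return 0;
    uint32_t off = cp & 0xFFFF;
    return (bits[off >> 3] >> (off & 7)) & 1;
}

/* Release every allocated plane and reset the set to empty. */
static void cs_free(CharSet32 *cs)
{
    for (int p = 0; p < CS_PLANES; p++) {
        free(cs->planes[p]);
        cs->planes[p] = NULL;
    }
}
```

Since most real character sets touch only the base plane, the per-plane layout keeps the common case close to the size of a ucs2 bitmap.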

Current Thread


Internationalisation and fonts - a suggestion, Pete French, 2003/08/03
- Re: Internationalisation and fonts - a suggestion, Alexander Malmberg, 2003/08/03
  - Re: Internationalisation and fonts - a suggestion, Pete French, 2003/08/04
- Re: Internationalisation and fonts - a suggestion, MJ Ray, 2003/08/03
  - Re: Internationalisation and fonts - a suggestion, Alexander Malmberg, 2003/08/03
  - Message not available
    - Re: Internationalisation and fonts - a suggestion, MJ Ray, 2003/08/04
- Re: Internationalisation and fonts - a suggestion, Pete French, 2003/08/04
  - Re: Internationalisation and fonts - a suggestion, Christopher Culver, 2003/08/04
    - Re: Internationalisation and fonts - a suggestion, Pete French, 2003/08/05
    - Re: Internationalisation and fonts - a suggestion, Richard Frith-Macdonald <=
    - Re: Internationalisation and fonts - a suggestion, Pete French, 2003/08/05
