_DefaultStringEncoding
From: Bruno Haible
Subject: _DefaultStringEncoding
Date: Fri, 17 Oct 2003 16:14:07 +0200
User-agent: KMail/1.5

Hi,
NSString._DefaultStringEncoding is determined as the value of GetDefEncoding()
in Unicode.m.
I have three questions about it.
1) Why are the possible values of GNUSTEP_STRING_ENCODING names like
{ "NSISOLatin1StringEncoding", "NSJapaneseEUCStringEncoding", ... }
and not the widely known and standardized names
{ "ISO-8859-1", "EUC-JP", ... }?
This makes it needlessly hard for users.
2) Why does gnustep-base-1.8.0/Documentation/Base.gsdoc say that the value
of GNUSTEP_STRING_ENCODING
"may be any of the 8-bit encodings supported by your system
(excluding multi-byte encodings)" ?
I've set it to NSUTF8StringEncoding and the Hello world program displays
its greeting message (in German, non-ASCII of course) just fine.
3) If GNUSTEP_STRING_ENCODING is not set, why is the default value
(set in Unicode.m:580) ISO-8859-1? On POSIX systems, all programs
are expected to interpret file names and file contents according to
the encoding given by the current locale (nl_langinfo (CODESET)).
IMO this codeset should be taken and transformed into the GNUstep
specific equivalent name. I'm using a de_DE.UTF-8 locale and all
my local files are UTF-8 encoded.
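
To make the suggestion in 3) concrete, here is a minimal sketch in C of what such a lookup could look like. It is not the code in Unicode.m. The helper names gnustep_name_for_codeset() and default_encoding_from_locale() are hypothetical, and the table contains only the three names mentioned in this mail. It assumes a POSIX system that provides nl_langinfo (CODESET).

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table only: the left column holds standard codeset names,
   the right column the GNUstep-style names mentioned in this mail. */
static const struct {
    const char *codeset;
    const char *gnustep_name;
} encoding_map[] = {
    { "ISO-8859-1", "NSISOLatin1StringEncoding" },
    { "EUC-JP",     "NSJapaneseEUCStringEncoding" },
    { "UTF-8",      "NSUTF8StringEncoding" },
};

/* Hypothetical helper: map a codeset name to a GNUstep-style name.
   Returns NULL if the codeset is not in the table. */
const char *gnustep_name_for_codeset(const char *codeset)
{
    size_t i;

    if (codeset == NULL)
        return NULL;
    for (i = 0; i < sizeof encoding_map / sizeof encoding_map[0]; i++) {
        if (strcmp(codeset, encoding_map[i].codeset) == 0)
            return encoding_map[i].gnustep_name;
    }
    return NULL;
}

/* Hypothetical helper: derive the default from the current locale,
   falling back to ISO-8859-1 when the codeset is unknown. */
const char *default_encoding_from_locale(void)
{
    const char *name;

    /* Adopt the user's locale for character classification; without
       this call nl_langinfo() reports the codeset of the "C" locale. */
    setlocale(LC_CTYPE, "");
    name = gnustep_name_for_codeset(nl_langinfo(CODESET));
    return name != NULL ? name : "NSISOLatin1StringEncoding";
}
```

The comparison is an exact strcmp(), so spelling variants that some systems report (for example "ISO8859-1" or "utf8") would need to be normalized first. Note also that setlocale() changes process-wide state; a library might prefer to rely on the application having called it already.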
The situation for URLs is different; for files read from arbitrary
URLs the following heuristic makes sense:
- If the content is valid UTF-8, then assume it is UTF-8.
- Otherwise assume it is ISO-8859-1.
The reason why this heuristic works well in practice is that normal
human-written ISO-8859-1 texts have a ~ 99.8% probability of being
invalid UTF-8.
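
The heuristic above can be sketched in C as follows. This is only an illustration under my own assumptions, not existing GNUstep code: the function names is_valid_utf8() and guess_encoding() are hypothetical, and the validity check follows the usual UTF-8 rules (no overlong forms, no surrogates, nothing above U+10FFFF). It works because an ISO-8859-1 accented letter such as 0xFC is either not a legal UTF-8 lead byte or would have to be followed by bytes in the range 0x80..0xBF, which rarely happens in ordinary text.

```c
#include <stddef.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t n;           /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80) {                       /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {  /* 2-byte sequence */
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {      /* 3-byte sequence */
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if (c >= 0xF0 && c <= 0xF4) {  /* 4-byte sequence */
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                         /* not a legal lead byte */
        }

        if (len - i < n + 1)
            return 0;                         /* truncated sequence */
        for (k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                     /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min)
            return 0;                         /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                         /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;                         /* beyond Unicode range */
        i += n + 1;
    }
    return 1;
}

/* The heuristic itself: valid UTF-8 is taken as UTF-8,
   everything else is assumed to be ISO-8859-1. */
const char *guess_encoding(const unsigned char *buf, size_t len)
{
    return is_valid_utf8(buf, len) ? "UTF-8" : "ISO-8859-1";
}
```

Pure ASCII input is reported as UTF-8, which is harmless since ASCII decodes identically under both encodings.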
Bruno