Re: NSString lowercaseString
From: Sebastian Reitenbach
Subject: Re: NSString lowercaseString
Date: Wed, 01 Aug 2012 13:20:09 +0200
User-agent: SOGoMail 1.3.17
On Wednesday, August 1, 2012 11:49 CEST, David Chisnall <theraven@sucs.org>
wrote:
> On 1 Aug 2012, at 09:50, Sebastian Reitenbach wrote:
>
> > I "enhanced" my test program a bit, and compared output when running on
> > Linux and OpenBSD:
> >
> > #import <Foundation/Foundation.h>
> >
> >
> > int main(int argc, char *argv[]) {
> > NSLog(@"Lowercase: %@", [[NSString stringWithString:@"TöÖst"]
> > lowercaseString]);
> >
> > }
>
> On closer inspection, there is a bug here, but not where you think it is...
>
> Try this test case:
>
> $ cat tolower.m
> #import <Foundation/Foundation.h>
> #import <wctype.h>
>
>
> int main(int argc, char *argv[]) {
> [NSAutoreleasePool new];
> NSString *l = [@"TöÖst" lowercaseString];
> NSLog(@"Lowercase: %@", l);
> NSLog(@"Lowercase: %s", [l UTF8String]);
> for (int i=0 ; i<[l length] ; i++)
> {
> int c = [l characterAtIndex: i];
> NSLog(@"%c %d", c,c);
> }
> }
> $ clang tolower.m -lgnustep-base
> $ ./a.out
> 2012-07-31 19:23:44.810 a.out[69751] Lowercase: t??st
> 2012-07-31 19:23:44.813 a.out[69751] Lowercase: tööst
> 2012-07-31 19:23:44.813 a.out[69751] t 116
> 2012-07-31 19:23:44.813 a.out[69751] ? 246
> 2012-07-31 19:23:44.814 a.out[69751] ? 246
> 2012-07-31 19:23:44.814 a.out[69751] s 115
> 2012-07-31 19:23:44.814 a.out[69751] t 116
I had to change it slightly to compile with gcc; the results still differ between the two systems:
#import <wctype.h>
#import <Foundation/Foundation.h>
int main(int argc, char *argv[]) {
    [NSAutoreleasePool new];
    int i;
    NSString *l = [@"TöÖst" lowercaseString];
    NSLog(@"Lowercase: %@", l);
    NSLog(@"Lowercase: %s", [l UTF8String]);
    /* Print each 16-bit character and its numeric value. */
    for (i = 0; i < [l length]; i++)
    {
        int c = [l characterAtIndex: i];
        NSLog(@"%c %d", c, c);
    }
    return 0;
}
On Linux:
sre@sre:~> LC_CTYPE='en_EN.UTF-8' ./lowercase2
2012-08-01 13:16:59.692 lowercase2[22437] Lowercase: töÖst
2012-08-01 13:16:59.694 lowercase2[22437] Lowercase: töÃst
2012-08-01 13:16:59.694 lowercase2[22437] t 116
2012-08-01 13:16:59.694 lowercase2[22437] � 195
2012-08-01 13:16:59.694 lowercase2[22437] � 182
2012-08-01 13:16:59.694 lowercase2[22437] � 195
2012-08-01 13:16:59.694 lowercase2[22437] � 150
2012-08-01 13:16:59.694 lowercase2[22437] s 115
2012-08-01 13:16:59.694 lowercase2[22437] t 116
sre@sre:~> LC_CTYPE='de_DE.UTF-8' ./lowercase2
2012-08-01 13:17:12.791 lowercase2[22441] Lowercase: töÃst
2012-08-01 13:17:12.792 lowercase2[22441] Lowercase: töÃst
2012-08-01 13:17:12.792 lowercase2[22441] t 116
2012-08-01 13:17:12.792 lowercase2[22441] Ã 195
2012-08-01 13:17:12.792 lowercase2[22441] ¶ 182
2012-08-01 13:17:12.792 lowercase2[22441] Ã 195
2012-08-01 13:17:12.792 lowercase2[22441] 150
2012-08-01 13:17:12.792 lowercase2[22441] s 115
2012-08-01 13:17:12.792 lowercase2[22441] t 116
On OpenBSD:
$ LC_CTYPE='en_EN.UTF-8' ./lowercase2
2012-08-01 13:18:25.497 lowercase2[5619] Lowercase: t��st
2012-08-01 13:18:25.502 lowercase2[5619] Lowercase: tööst
2012-08-01 13:18:25.502 lowercase2[5619] t 116
2012-08-01 13:18:25.502 lowercase2[5619] � 246
2012-08-01 13:18:25.502 lowercase2[5619] � 246
2012-08-01 13:18:25.502 lowercase2[5619] s 115
2012-08-01 13:18:25.502 lowercase2[5619] t 116
$ LC_CTYPE='de_DE.UTF-8' ./lowercase2
2012-08-01 13:18:32.743 lowercase2[16814] Lowercase: tööst
2012-08-01 13:18:32.744 lowercase2[16814] Lowercase: tööst
2012-08-01 13:18:32.744 lowercase2[16814] t 116
2012-08-01 13:18:32.745 lowercase2[16814] ö 246
2012-08-01 13:18:32.745 lowercase2[16814] ö 246
2012-08-01 13:18:32.745 lowercase2[16814] s 115
2012-08-01 13:18:32.745 lowercase2[16814] t 116
>
> The error appears to be in converting the 16-bit unicode string that is the
> result of lowercaseString for display. Note the values that are being
> returned in characterAtIndex: - these are the correct unicode values, but
> attempting to display them is failing because the terminal is expecting
> UTF-8, not UCS16 (and 246 is not a valid 8-bit UTF-8 character). It seems
> that NSLog is just truncating the string, rather than translating it into the
> string locale that the terminal expects.
>
> David
>
> -- Sent from my STANTEC-ZEBRA