discuss-gnustep

Re: NSString lowercaseString


From: Sebastian Reitenbach
Subject: Re: NSString lowercaseString
Date: Wed, 01 Aug 2012 13:20:09 +0200
User-agent: SOGoMail 1.3.17

On Wednesday, August 1, 2012 11:49 CEST, David Chisnall <theraven@sucs.org> 
wrote:

> On 1 Aug 2012, at 09:50, Sebastian Reitenbach wrote:
>
> > I "enhanced" my test program a bit, and compared output when running on 
> > Linux and OpenBSD:
> >
> > #import <Foundation/Foundation.h>
> >
> >
> > int main(int argc, char *argv[]) {
> > NSLog(@"Lowercase: %@", [[NSString stringWithString:@"TöÖst"] 
> > lowercaseString]);
> >
> > }
>
> On closer inspection, there is a bug here, but not where you think it is...
>
> Try this test case:
>
> $ cat tolower.m
> #import <Foundation/Foundation.h>
> #import <wctype.h>
>
>
> int main(int argc, char *argv[]) {
>       [NSAutoreleasePool new];
>       NSString *l = [@"TöÖst" lowercaseString];
>       NSLog(@"Lowercase: %@", l);
>       NSLog(@"Lowercase: %s", [l UTF8String]);
>       for (int i=0 ; i<[l length] ; i++)
>       {
>               int c = [l characterAtIndex: i];
>               NSLog(@"%c %d", c,c);
>       }
> }
> $ clang tolower.m  -lgnustep-base
> $ ./a.out
> 2012-07-31 19:23:44.810 a.out[69751] Lowercase: t??st
> 2012-07-31 19:23:44.813 a.out[69751] Lowercase: tööst
> 2012-07-31 19:23:44.813 a.out[69751] t 116
> 2012-07-31 19:23:44.813 a.out[69751] ? 246
> 2012-07-31 19:23:44.814 a.out[69751] ? 246
> 2012-07-31 19:23:44.814 a.out[69751] s 115
> 2012-07-31 19:23:44.814 a.out[69751] t 116

I had to change it slightly to compile with gcc; the results still differ:

#import <wctype.h>
#import <Foundation/Foundation.h>


int main(int argc, char *argv[]) {
        [NSAutoreleasePool new];
        int i;  /* declared outside the for statement: the change made for gcc */
        NSString *l = [@"TöÖst" lowercaseString];
        NSLog(@"Lowercase: %@", l);
        NSLog(@"Lowercase: %s", [l UTF8String]);
        /* print each 16-bit unit of the string as a raw char and as a number */
        for (i = 0; i < [l length]; i++)
        {
                int c = [l characterAtIndex: i];
                NSLog(@"%c %d", c, c);
        }
        return 0;
}

On Linux:
sre@sre:~> LC_CTYPE='en_EN.UTF-8' ./lowercase2
2012-08-01 13:16:59.692 lowercase2[22437] Lowercase: töÖst
2012-08-01 13:16:59.694 lowercase2[22437] Lowercase: töÃst
2012-08-01 13:16:59.694 lowercase2[22437] t 116
2012-08-01 13:16:59.694 lowercase2[22437] � 195
2012-08-01 13:16:59.694 lowercase2[22437] � 182
2012-08-01 13:16:59.694 lowercase2[22437] � 195
2012-08-01 13:16:59.694 lowercase2[22437] � 150
2012-08-01 13:16:59.694 lowercase2[22437] s 115
2012-08-01 13:16:59.694 lowercase2[22437] t 116
sre@sre:~> LC_CTYPE='de_DE.UTF-8' ./lowercase2
2012-08-01 13:17:12.791 lowercase2[22441] Lowercase: töÃst
2012-08-01 13:17:12.792 lowercase2[22441] Lowercase: töÃst
2012-08-01 13:17:12.792 lowercase2[22441] t 116
2012-08-01 13:17:12.792 lowercase2[22441] Ã 195
2012-08-01 13:17:12.792 lowercase2[22441] ¶ 182
2012-08-01 13:17:12.792 lowercase2[22441] Ã 195
2012-08-01 13:17:12.792 lowercase2[22441]  150
2012-08-01 13:17:12.792 lowercase2[22441] s 115
2012-08-01 13:17:12.792 lowercase2[22441] t 116

On OpenBSD:
$ LC_CTYPE='en_EN.UTF-8' ./lowercase2
2012-08-01 13:18:25.497 lowercase2[5619] Lowercase: t��st
2012-08-01 13:18:25.502 lowercase2[5619] Lowercase: tööst
2012-08-01 13:18:25.502 lowercase2[5619] t 116
2012-08-01 13:18:25.502 lowercase2[5619] � 246
2012-08-01 13:18:25.502 lowercase2[5619] � 246
2012-08-01 13:18:25.502 lowercase2[5619] s 115
2012-08-01 13:18:25.502 lowercase2[5619] t 116
$ LC_CTYPE='de_DE.UTF-8' ./lowercase2
2012-08-01 13:18:32.743 lowercase2[16814] Lowercase: tööst
2012-08-01 13:18:32.744 lowercase2[16814] Lowercase: tööst
2012-08-01 13:18:32.744 lowercase2[16814] t 116
2012-08-01 13:18:32.745 lowercase2[16814] ö 246
2012-08-01 13:18:32.745 lowercase2[16814] ö 246
2012-08-01 13:18:32.745 lowercase2[16814] s 115
2012-08-01 13:18:32.745 lowercase2[16814] t 116


>
> The error appears to be in converting the 16-bit Unicode string that is the
> result of lowercaseString for display.  Note the values returned by
> characterAtIndex: these are the correct Unicode values, but attempting to
> display them fails because the terminal expects UTF-8, not UTF-16 (and a
> lone byte 246 is not a valid UTF-8 sequence).  It seems that NSLog is just
> truncating each 16-bit unit to 8 bits, rather than translating the string
> into the encoding that the terminal expects.
>
> David
>
> -- Sent from my STANTEC-ZEBRA





