octave-maintainers

Re: [changeset] Asian Characters and strchr()


From: Ben Abbott
Subject: Re: [changeset] Asian Characters and strchr()
Date: Wed, 11 Mar 2009 20:06:46 +0800


On Mar 11, 2009, at 5:30 PM, Jaroslav Hajek wrote:

On Wed, Mar 11, 2009 at 9:14 AM, Ben Abbott <address@hidden> wrote:

On Mar 11, 2009, at 4:03 PM, Jaroslav Hajek wrote:

On Wed, Mar 11, 2009 at 8:53 AM, Ben Abbott <address@hidden> wrote:

On Mar 11, 2009, at 3:33 PM, Jaroslav Hajek wrote:

On Wed, Mar 11, 2009 at 8:24 AM, Ben Abbott <address@hidden> wrote:

I noticed that fileparts gives an error when the full file name
contains Asian characters.

octave:209> fileparts ("System/Library/Fonts/junk.ttf")
error: subscript indices must be either positive integers or logicals.
error: called from:
error:   /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/strings/strchr.m at line 40, column 19
error:   /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/miscellaneous/fileparts.m at line 30, column 10

It appears that there is a simple fix for strchr, but it will depend
upon the ASCII equivalent for Asian fonts.

I'm seeing negative values.

fullfile = "System/Library/Fonts/junk.ttf";
octave:211> double(fullfile)
ans =

 Columns 1 through 16:

    83   121   115   116   101   109    47    76   105    98   114    97   114   121    47    70

 Columns 17 through 32:

   111   110   116   115    47   -27  -115  -114   -26  -106  -121   -25   -69  -122   -23   -69

 Columns 33 through 37:

  -111    46   116   116   102

Can anyone tell me what the permissible range for integer values of
Asian characters is?

I think a char->double conversion is supposed to yield nonnegative
values, so this seems buggy.

I'm planning to patch strchr, any reason I shouldn't do that?


I don't think there's a bug in strchr. This is clearly caused by the
negative values.

For Asian fonts the values are 16-bit ... unsigned or signed I don't know.

No, they're not. See your own example. Octave has no support for UTF-8
strings, so unless "char" is more than 8 bits, the result will be an
8-bit number. Thus, "strchr" won't search for Japanese characters (but
this does not matter here, since you only need to find an ASCII
character).
Currently, the sign of the char -> double conversion is left up to the
compiler, which I don't think is good. I think we should guarantee it
to be nonnegative, the same as Matlab does. Shall I make a patch, or do
you wish to do it?
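The compiler dependence described above can be sketched in C++ as follows. This is only an illustration under assumed names (`codes_plain` and `codes_nonnegative` are hypothetical); it is not the patch from this thread, which is not shown here. Whether plain `char` is signed is implementation-defined in C++, so a direct cast of a byte at or above 0x80 may come out negative, while casting through `unsigned char` first always yields a value in 0..255.

```cpp
#include <string>
#include <vector>

// Convert each char to double with a plain cast. For bytes >= 0x80 the
// sign of the result depends on whether char is signed on this compiler.
inline std::vector<double> codes_plain(const std::string& s)
{
  std::vector<double> out;
  for (char c : s)
    out.push_back(static_cast<double>(c));
  return out;
}

// Go through unsigned char first, so every result is in 0..255
// regardless of the compiler's choice for plain char.
inline std::vector<double> codes_nonnegative(const std::string& s)
{
  std::vector<double> out;
  for (char c : s)
    out.push_back(static_cast<double>(static_cast<unsigned char>(c)));
  return out;
}
```

For the byte 0xE5, `codes_nonnegative` gives 229 on every platform, whereas `codes_plain` gives -27 where `char` is signed and 229 where it is unsigned.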

OK, I hadn't considered how many bits Octave was using for characters.

In any event, please do make a patch (I'm not competent enough in C++ to do
it myself).

In the meantime, I'll avoid using fileparts and strchr when there may be
Asian characters present.

Ben


Fix is uploaded.

regards


It works as expected!

Thanks

Ben



