Re: 8-bit char problem

octave-maintainers

From:	John W. Eaton
Subject:	Re: 8-bit char problem
Date:	Fri, 11 Oct 2002 11:37:16 -0500

On 11-Oct-2002, Paul Kienzle <address@hidden> wrote:

| Mine too :-(  From the toascii man page, I see that it is supposed to strip
| the top bit.

Yes, I just saw that too.  So I don't think we should change the
behavior of toascii.  But I've just made some changes that make Octave
behave as follows:

  octave:1> toascii (setstr ([-100, 100, 200, 300]))
  warning: range error for conversion to character value
  ans =

      0  100   72    0

  octave:2> abs (setstr ([-100, 100, 200, 300]))
  warning: range error for conversion to character value
  ans =

      0  100  200    0

(AFAIK, abs was originally the recommended way to convert a string to
ASCII in Matlab; maybe now they say to use double).

I'm open to suggestions for better ways to handle out-of-range values
than converting them to zero, but I'm afraid that anything else will
not be easy.
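
To make the two results above concrete, here is a minimal sketch in C of the
behavior described: values outside 0..255 become zero (with an error flag
standing in for the warning), and the toascii step strips the top bit. The
names to_char_value and strip_top_bit are hypothetical helpers for
illustration, not Octave's actual code.

```c
/* Hypothetical helper: convert a numeric value to a character code.
   Anything outside 0..255 (including NaN) maps to 0 and sets *range_error,
   which stands in for the "range error" warning. */
unsigned char to_char_value(double x, int *range_error)
{
    if (!(x >= 0.0 && x <= 255.0)) {
        *range_error = 1;
        return 0;
    }
    return (unsigned char) x;
}

/* Strip the top bit of an 8-bit value, as the toascii man page describes. */
int strip_top_bit(unsigned char c)
{
    return c & 0x7f;
}
```

Under these assumptions, 200 survives the conversion as 200 (the abs result)
and becomes 72 once the top bit is stripped (the toascii result), while -100
and 300 both become 0.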

Matlab doesn't have this "problem" because character matrices are
stored as arrays of double values with a special flag set.  So setstr
simply sets the flag and abs unsets it, which would preserve
out-of-range values, except that it also checks for negative values
and converts those to zero (with a warning).  I'm not sure why they
don't trap large values, since they don't seem too useful in strings.
For example, I see this weird behavior:

  >> fprintf ('%s\n', setstr (100))
  d
  >> fprintf ('%s\n', setstr (400))
  4.000000e+02

Does this make any sense?
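
For comparison, here is a rough model in C of the storage scheme I described
for Matlab: the values stay doubles and "string-ness" is only a flag. The
struct and the functions model_setstr and model_abs are hypothetical, and
putting the negative-value check in the setstr step is my assumption; this is
a guess at the idea, not Matlab's real internals.

```c
#include <stddef.h>

/* Hypothetical model: the data are always doubles; is_string is only a tag. */
struct tagged_array {
    double *data;
    size_t len;
    int is_string;
};

/* Model of setstr: set the flag and convert negative values to zero.
   Returns how many values were converted (each would trigger a warning).
   Large values such as 300 or 400 are left untouched. */
size_t model_setstr(struct tagged_array *a)
{
    size_t converted = 0;
    for (size_t i = 0; i < a->len; i++) {
        if (a->data[i] < 0.0) {
            a->data[i] = 0.0;
            converted++;
        }
    }
    a->is_string = 1;
    return converted;
}

/* Model of abs: clear the flag and take absolute values. */
void model_abs(struct tagged_array *a)
{
    for (size_t i = 0; i < a->len; i++) {
        if (a->data[i] < 0.0)
            a->data[i] = -a->data[i];
    }
    a->is_string = 0;
}
```

In this model a value like 400 passes through unchanged, which is at least
consistent with fprintf later finding a number where it expected a character.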

| Perhaps I will write a function double() to convert from characters to
| unsigned(?) numbers, but I don't need it for now.

Yes, we should probably have a double function for compatibility.

| Yes it will be painful because the system string functions and the
| consumers of charMatrix use char, and we don't want to replace them all, so
| casts will be sprinkled everywhere.

Maybe there aren't that many places where it really matters.
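
The casts in question come from plain char being signed on many platforms.
A small C sketch of the issue follows; the function names are made up for
illustration.

```c
#include <limits.h>

/* Nonzero if plain char is a signed type on this platform. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Read a byte stored in a plain char as a code in 0..255,
   regardless of whether char is signed. */
int char_to_code(char c)
{
    return (unsigned char) c;
}

/* Without the cast the result is platform dependent: bytes with the
   top bit set come out negative where char is signed. */
int char_to_int_uncast(char c)
{
    return c;
}
```

Only the places that turn a char into a number (or index a table with one)
need the cast; code that just passes strings along to the system functions
can stay as it is.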

| I wonder how much more work it would be to support unicode?

I have no idea since I'm not really up to date on things like this.
But clues from others would be helpful.

Thanks,

jwe

Current Thread

8-bit char problem, Paul Kienzle, 2002/10/10
- 8-bit char problem, John W. Eaton, 2002/10/10
  - Re: 8-bit char problem, Paul Kienzle, 2002/10/11
    - Re: 8-bit char problem, John W. Eaton <=
    - Re: 8-bit char problem, Paul Kienzle, 2002/10/11
    - Re: 8-bit char problem, John W. Eaton, 2002/10/11