Can somebody explain to me what u32tochar in /lib/sh/unicode.c is trying
From: John Kearney
Subject: Can somebody explain to me what u32tochar in /lib/sh/unicode.c is trying to do?
Date: Sun, 19 Feb 2012 23:07:45 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20120129 Thunderbird/10.0
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Can somebody explain to me what u32tochar is trying to do?
It seems like dangerous code?
From the context, I'm guessing it is making a Hail Mary pass at
converting UTF-32 to a multibyte encoding (a non-UTF-8 one).
int
u32tochar (x, s)
     unsigned long x;
     char *s;
{
  int l;

  l = (x <= UCHAR_MAX) ? 1 : ((x <= USHORT_MAX) ? 2 : 4);
  if (x <= UCHAR_MAX)
    s[0] = x & 0xFF;
  else if (x <= USHORT_MAX)     /* assume unsigned short = 16 bits */
    {
      s[0] = (x >> 8) & 0xFF;
      s[1] = x & 0xFF;
    }
  else
    {
      s[0] = (x >> 24) & 0xFF;
      s[1] = (x >> 16) & 0xFF;
      s[2] = (x >> 8) & 0xFF;
      s[3] = x & 0xFF;
    }
  /* s[l] = '\0'; Overwrite Buffer? */
  return l;
}
There are a couple of problems with that, though.
Firstly, UTF-32 doesn't map directly onto non-UTF multibyte locales, so
you need a translation mechanism.
Secondly, typical CJK encodings are state-based (ISO-2022-JP, for
example), so multibyte sequences need to be escaped. Extended Unix Code
would need an encoding somewhat like UTF-8. In fact, any variable-length
multibyte encoding is going to need some context to recover the
information, so what this function emits is unparsable.
What it is actually doing is taking a UTF-32 value and, depending on its
size, encoding it as UTF-32 big-endian, UTF-16 big-endian, UTF-8 (only
for values up to 0x7f, where it coincides with ASCII), or an American
extended-ASCII code page (values between 0x80 and 0xff). Choosing among
these should, however, depend on LC_CTYPE, not on some arbitrary size
check.
So this function just seems plain crazy?
I think that all it can safely do is this:
int
utf32tomb (x, s)
     unsigned long x;
     char *s;
{
  if (x <= 0x7f)        /* x >= 0x80 is locale-specific */
    {
      s[0] = x & 0xFF;
      return 1;
    }
  else
    return 0;
}
Regarding the naming convention: u32 reads as "unsigned 32-bit".
It might be a good idea to rename all the UTF-32 functions from u32 to
utf32. I think that would save a lot of confusion in the code as to
what is going on.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQEcBAEBAgAGBQJPQXKxAAoJEKUDtR0WmS054sgH/R+qWtds9MMeN/y4n98wk83l
MAOVBXAn+m8IUf31VtSZ7nqEccJHDPDRMkg21sYNlozsxPVwCYOGZd7LL8wxlwEl
70mRu9cAQOXIAeF9b8ao0/nz6e6nC6FTk03FDhDo+V8RWt9MiQHF4YWRCCmSdmQv
GDM88XyXuQZaBwIHrXeCXRvuXTN8K5BrdbVFJ7OHRUytKNE6OccUDz/iaPCoPy5f
SehHTLJ6AqpYy7NgapyALTvo3/FlVUDc7vtYbCDF5Q0EMIlvjgEQ9Y7vJlKtuAop
9Up32sQSy8red6frOgZmvA5GLeD7Lp/gvfp/U5fQWIZTKKLgBee2mYVqPlLOKw4=
=nHdc
-----END PGP SIGNATURE-----