Re: [Bug-gnulib] addition: linebreak.h, linebreak.c


From: Paul Eggert
Subject: Re: [Bug-gnulib] addition: linebreak.h, linebreak.c
Date: 07 Apr 2003 13:22:26 -0700
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Bruno Haible <address@hidden> writes:

> I generally assume that 'unsigned int' serves the same purpose as
> 'uint32_t'. Do you know a platform where 'unsigned int' isn't usable?

That depends on what you mean by "usable", but on some larger
word-oriented systems 'unsigned int' is 64 bits, so one cannot assume
that it stores only 32-bit values.  This is common on Crays.  For
example, in <http://www.globus.org/data_conversion/function_reference.html>
the GLOBUS_DC_FORMAT_CRAYT3E entry shows one environment where one
should use unsigned short int (and not unsigned int) to represent long
strings of 32-bit quantities.
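
As a minimal sketch of the first point (not code from linebreak.c): if 'unsigned int' may be wider than 32 bits, arithmetic that is supposed to wrap at 2^32 needs an explicit mask. The helper names below are hypothetical, and the sketch assumes 'unsigned int' has at least 32 bits.

```c
#include <limits.h>

/* Hypothetical helper: add two values modulo 2^32.  On a host where
   'unsigned int' is 64 bits the sum would not wrap by itself, so the
   result is masked explicitly.  */
static unsigned int
add_mod32 (unsigned int a, unsigned int b)
{
  return (a + b) & 0xffffffffU;
}

/* Hypothetical helper: nonzero if 'unsigned int' holds exactly the
   values 0 .. 2^32-1, zero if it is narrower or wider.  */
static int
uint_is_exactly_32_bits (void)
{
  return UINT_MAX == 0xffffffffU;
}
```

On a host with a 64-bit 'unsigned int', uint_is_exactly_32_bits () returns 0 while add_mod32 still yields a 32-bit result.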

> I don't want to change the header file (-> and have everyone
> recompile their code) once a new encoding has to be added.

I don't see why recompilation would be needed, if the encoding is
represented as (say) an integer.  Yes, the function to convert an
encoding from string to integer would need to be updated and
recompiled, but it's already the case that the library needs to be
updated and recompiled whenever you add an encoding.  The client code
wouldn't need to be changed or recompiled simply because an encoding
got added.
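
A rough sketch of that arrangement follows. The names encoding_from_name, ENC_UNKNOWN, and the table contents are made up for illustration and are not part of the proposed linebreak.h. Only the table inside the library changes when an encoding is added; the header would declare just the function.

```c
#include <string.h>

/* Hypothetical "not found" result.  */
enum { ENC_UNKNOWN = -1 };

/* This table lives in the library, not in the header.  Adding an
   encoding means appending a name here and rebuilding the library.  */
static const char *const encoding_names[] = { "UTF-8", "UTF-16", "UCS-4" };

/* Convert an encoding name to a small integer, or ENC_UNKNOWN.  */
int
encoding_from_name (const char *name)
{
  size_t i;
  for (i = 0; i < sizeof encoding_names / sizeof encoding_names[0]; i++)
    if (strcmp (name, encoding_names[i]) == 0)
      return (int) i;
  return ENC_UNKNOWN;
}
```

Client code would call encoding_from_name once and pass the resulting integer around, so it never sees the list of encodings at compile time.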


> > I was confused by the prefixes u8, u16, and u32.  At first I thought
> > they meant "unsigned integer of width 8 bits", etc.
> > How about changing the prefixes to utf8, utf16, and ucs4, respectively?
> 
> It'd be possible, but what's the gain?

To avoid confusion with prefixes that are already widely used with the
meaning that I intuitively ascribed to them.  For example:

http://oprofile.sourceforge.net/srcdoc/op__types_8h.html
http://osdev.berlios.de/osd-fs.html
http://download.baltimore.com/keytools/docs/v50/crypto/c-docs/html/uttypes.h.html

These are just the first few Google hits that I found.  I didn't find
any interpretations of those abbreviations other than the usual ones.


> I do assume that 'unsigned char' has at least 8 bits, 'unsigned short'
> has at least 16 bits, and 'unsigned int' has at least 32 bits.

Those are valid assumptions, but I was concerned about the opposite
direction, e.g. the Cray T3E where 'unsigned short' is the only 32-bit
unsigned integer type.  On such a host, one should use 'unsigned
short' to represent long strings of 32-bit quantities.
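
One way to act on this, sketched with C89 <limits.h> only: pick the narrowest standard unsigned type that has at least 32 bits. The typedef name ucs4_unit is hypothetical.

```c
#include <limits.h>

/* Choose the narrowest unsigned type with at least 32 bits.  */
#if USHRT_MAX >= 0xffffffff
typedef unsigned short ucs4_unit;   /* the T3E-like case described above */
#elif UINT_MAX >= 0xffffffff
typedef unsigned int ucs4_unit;
#else
typedef unsigned long ucs4_unit;    /* always at least 32 bits in ISO C */
#endif

/* Number of storage bits in one ucs4_unit on this host.  */
static int
ucs4_unit_bits (void)
{
  return (int) (sizeof (ucs4_unit) * CHAR_BIT);
}
```

On most current hosts this selects 'unsigned int'; where 'unsigned short' already has 32 bits, long arrays take half the space they would with a 64-bit 'unsigned int'.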

Cray tends to be the most extreme porting case here, among actively
used hosts.  Different Cray hardware implementations have different
assumptions, and one can find currently-used Cray hosts where unsigned
short is 16, 32, or 64 bits.  For example, the newest model, the Cray
X1, does have a 16-bit short, but it is quite slow, so the usual
typedefs for it are "typedef short int_least16_t;" and "typedef int
int_fast16_t;".
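
The least/fast split can be used as in the following sketch, which assumes a C99 <stdint.h> is available. The function count_equal is a made-up example: bulk data is stored in the compact 'least' type, while counters and other scalars use the 'fast' type.

```c
#include <stdint.h>

/* Count how many of the first N elements of V equal KEY.
   N must fit in 16 bits, i.e. be at most 32767.  */
int_fast16_t
count_equal (const int_least16_t *v, int_fast16_t n, int_least16_t key)
{
  int_fast16_t i;
  int_fast16_t count = 0;
  for (i = 0; i < n; i++)
    if (v[i] == key)
      count++;
  return count;
}
```

With typedefs like the ones quoted above, the array would occupy 16 bits per element while the loop variables use the faster 'int'.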

I regularly get bug reports from Cray users of the GNU applications
that I help maintain, and I'd rather have library code that works on
Crays.



