bug#11082: 24.0.94; u.glyphless member in struct glyph does not fit in 32 bits
From: Eli Zaretskii
Subject: bug#11082: 24.0.94; u.glyphless member in struct glyph does not fit in 32 bits
Date: Sat, 24 Mar 2012 09:01:04 +0200
> Date: Sat, 24 Mar 2012 14:23:28 +0900
> From: YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp>
>
> In dispextern.h:
>
> struct glyph
> {
>   (snip)
>   /* A union of sub-structures for different glyph types. */
>   union
>   {
>     (snip)
>     /* Sub-stretch for type == GLYPHLESS_GLYPH. */
>     struct
>     {
>       /* Value is an enum of the type glyphless_display_method. */
>       unsigned method : 2;
>       /* 1 iff this glyph is for a character of no font. */
>       unsigned for_no_font : 1;
>       /* Length of acronym or hexadecimal code string (at most 8). */
>       unsigned len : 4;
>       /* Character to display. Actually we need only 22 bits. */
>       unsigned ch : 26;
>     } glyphless;
>
>     /* Used to compare all bit-fields above in one step. */
>     unsigned val;
>   } u;
> };
>
> The member `u.glyphless' above requires at least 33 bits and does not
> fit in the size (32 bits) of `u.val' on many environments. As a
> result, equality with respect to the `u.val' member (e.g., used in
> GLYPH_EQUAL_P) does not necessarily mean the equality of glyphless
> glyphs.
?? Isn't the size of a union defined by its widest member? If so, we
just end up wasting some storage here, but we should never truncate a
bit field. Do you have an actual test case that shows this kind of
bug?
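
A small sketch may help keep the two points apart: the union is as large as its widest member, so no bit-field is truncated in storage, but a comparison through `val' looks at only `sizeof (unsigned)' bytes. This is an illustration under stated assumptions, not the Emacs code: `union glyph_u', `GLYPHLESS_BITS', `val_compare_is_incomplete', `make_glyphless' and `glyphless_equal' are hypothetical names, and the placement of bit-fields is implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical stand-in for the `u' member quoted above (not the real
   struct glyph).  Bit-field layout is implementation-defined.  */
union glyph_u
{
  struct
  {
    unsigned method : 2;
    unsigned for_no_font : 1;
    unsigned len : 4;
    unsigned ch : 26;
  } glyphless;
  unsigned val;
};

/* Bits requested by the four bit-fields: 2 + 1 + 4 + 26 = 33.  */
enum { GLYPHLESS_BITS = 2 + 1 + 4 + 26 };

/* Nonzero if the bit-fields need more bits than `val' provides, so
   that comparing `val' alone cannot cover all of them.  */
static int
val_compare_is_incomplete (void)
{
  return GLYPHLESS_BITS > (int) (sizeof (unsigned) * CHAR_BIT);
}

/* Build a zero-initialized glyphless value for character CH.  */
static union glyph_u
make_glyphless (unsigned ch)
{
  union glyph_u u;
  assert (ch < (1u << 26));   /* CH must fit in the 26-bit field.  */
  memset (&u, 0, sizeof u);
  u.glyphless.ch = ch;
  return u;
}

/* Field-by-field comparison that does not depend on the layout.  */
static int
glyphless_equal (const union glyph_u *a, const union glyph_u *b)
{
  return (a->glyphless.method == b->glyphless.method
          && a->glyphless.for_no_font == b->glyphless.for_no_font
          && a->glyphless.len == b->glyphless.len
          && a->glyphless.ch == b->glyphless.ch);
}
```

On an ABI with a 32-bit `unsigned' where a bit-field may not straddle a storage unit, `ch' would presumably start a second unit. In that case `a.val == b.val' could hold for two values built by `make_glyphless' with different characters, while `glyphless_equal' would still report the difference.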
> According to the comment above, it seems to be OK to shorten the
> length of `u.glyphless.ch' member from 26 to 25. Could someone
> confirm this?
Confirmed. From the ELisp manual:
To support this multitude of characters and scripts, Emacs closely
follows the "Unicode Standard". The Unicode Standard assigns a unique
number, called a "codepoint", to each and every character. The range
of codepoints defined by Unicode, or the Unicode "codespace", is
`0..#x10FFFF' (in hexadecimal notation), inclusive. Emacs extends this
range with codepoints in the range `#x110000..#x3FFFFF', which it uses
for representing characters that are not unified with Unicode and "raw
8-bit bytes" that cannot be interpreted as characters. Thus, a
character codepoint in Emacs is a 22-bit integer number.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I would actually suggest using 22 bits for this field, to avoid
confusion in the future.
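
As a rough sketch of that suggestion (not a tested patch): with `ch' narrowed to 22 bits, the four fields ask for 2 + 1 + 4 + 22 = 29 bits, which fits in a 32-bit `unsigned', and the largest codepoint from the manual excerpt, #x3FFFFF, is exactly 2^22 - 1. The names `union glyph_u22', `SKETCH_MAX_CHAR' and `make_glyphless22' are made up for this illustration, and the compile-time check assumes a C11 compiler.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Largest character codepoint per the manual excerpt above.
   The constant name is hypothetical.  */
enum { SKETCH_MAX_CHAR = 0x3FFFFF };

/* Hypothetical variant of the `u' member with `ch' narrowed to 22 bits.  */
union glyph_u22
{
  struct
  {
    unsigned method : 2;
    unsigned for_no_font : 1;
    unsigned len : 4;
    unsigned ch : 22;
  } glyphless;
  unsigned val;
};

/* The bit-fields now request 29 bits; check that `val' can hold them.  */
_Static_assert (2 + 1 + 4 + 22 <= sizeof (unsigned) * CHAR_BIT,
                "glyphless bit-fields must fit in val");

/* A 22-bit field holds exactly the codepoints 0..#x3FFFFF.  */
_Static_assert (SKETCH_MAX_CHAR == (1 << 22) - 1,
                "maximum codepoint is 2^22 - 1");

/* Build a zero-initialized value from METHOD, LEN and CH.  */
static union glyph_u22
make_glyphless22 (unsigned method, unsigned len, unsigned ch)
{
  union glyph_u22 u;
  assert (ch <= SKETCH_MAX_CHAR);
  memset (&u, 0, sizeof u);
  u.glyphless.method = method;
  u.glyphless.len = len;
  u.glyphless.ch = ch;
  return u;
}
```

If the compiler packs all four fields into a single `unsigned', which is the common behavior but not something the C standard guarantees, then a comparison through `val' would cover every field.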