Linux collation problems
From: Valeriy E. Ushakov
Subject: Linux collation problems
Date: Thu, 9 Dec 1999 07:34:22 +0300
Damn, it seems that the collation rules shipped with Linux (with
glibc, I guess) simply ignore TAB, so when the index is written
out the order is (example from slides):
12&1377.all.10\t0\t00100
12&1377.all.1\t0\t00014
12&1377.all.11\t0\t00110
but when lout looks for an entry it will only use the key (before the
first tab), so the order is:
12&1377.all.1
12&1377.all.10
12&1377.all.11
Originally, lout simply used strcmp. I added a simple fix to use
strcoll, but it relies on TAB (the field separator in .li files) being
collated before any other character that can be part of a tag, because
lout sorts whole lines in .li files.
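
A minimal sketch of a key-only comparison, so that the result no
longer depends on where TAB collates. This is not lout's actual code:
the names li_key_dup and li_key_coll are hypothetical, and the sketch
assumes the key is just the text before the first TAB.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the key (the text before the first TAB) of a .li-style line.
   The caller frees the result. Returns NULL on allocation failure. */
char *li_key_dup(const char *line)
{
    size_t len = strcspn(line, "\t");   /* length up to TAB or end of string */
    char *key = malloc(len + 1);

    if (key != NULL) {
        memcpy(key, line, len);
        key[len] = '\0';
    }
    return key;
}

/* Compare two lines by key only, using the current LC_COLLATE rules.
   Everything after the first TAB is ignored. */
int li_key_coll(const char *a, const char *b)
{
    char *ka = li_key_dup(a);
    char *kb = li_key_dup(b);
    int r;

    assert(ka != NULL && kb != NULL);   /* sketch: no graceful OOM handling */
    r = strcoll(ka, kb);
    free(ka);
    free(kb);
    return r;
}
```

Used as the comparison when sorting, this should give the writer and
the lookup code the same notion of order, whatever the locale does
with TAB.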
But even if this particular problem is fixed (and I *did* quickly
brute-force a fix to verify this) and only the relevant keys are
compared (first and second fields in .li files), there's still a
problem with the Linux collation order. The period also seems to be
ignored for the purpose of collation. But the period is used heavily
in the lout docs for index entry keys. So consider these index keys:
aligned.columns
aligned.displays
aligned.equations
aligneddisplay.
which under en_US collation order will be sorted as:
aligned.columns
aligneddisplay.
aligned.displays
aligned.equations
with correspondingly disastrous results for index entries. But
(surprise, surprise) under en_UK they will be sorted ok.
Grr, it sucks.
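
One way to sidestep the locale entirely is to sort index keys in
plain byte order with strcmp, where '.' compares lower than any
letter. A minimal sketch, not necessarily the fix lout should adopt;
the names cmp_bytes and sort_keys_bytewise are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: compare two C strings byte by byte, ignoring the locale. */
int cmp_bytes(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;

    return strcmp(a, b);
}

/* Sort an array of n string pointers in plain byte order. */
void sort_keys_bytewise(const char **keys, size_t n)
{
    assert(keys != NULL || n == 0);
    qsort(keys, n, sizeof keys[0], cmp_bytes);
}
```

With the four keys above this yields aligned.columns,
aligned.displays, aligned.equations, aligneddisplay. regardless of
LC_COLLATE, at the price of ignoring language-specific ordering.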
Sure, I have a reputation as a "BSD bastard" to maintain :-), but,
partisanship aside, I do find this collation ordering very
unintuitive, to put it mildly. Bear in mind that sort(1) abides by
LC_COLLATE, so under, say, en_US, it will happily produce:
Robertson, Peter
Roberts, Oscar
which everyone with a recent glibc is invited to verify.
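
For checking this from C rather than through sort(1), here is a
small sketch that sorts strings with strcoll under a named locale.
The names cmp_coll and sort_under_locale are hypothetical, and which
locale names are installed (e.g. "en_US") depends on the system.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: compare two C strings under the current LC_COLLATE. */
int cmp_coll(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;

    return strcoll(a, b);
}

/* Sort names under the named locale. Returns 1 on success, or 0 if
   the locale is not available (the array is then left untouched). */
int sort_under_locale(const char *locale, const char **names, size_t n)
{
    assert(locale != NULL && names != NULL);
    if (setlocale(LC_COLLATE, locale) == NULL)
        return 0;
    qsort(names, n, sizeof names[0], cmp_coll);
    return 1;
}
```

Under the "C" locale the two names come out as Roberts, Oscar then
Robertson, Peter, since ',' sorts before 'o' in byte order. What an
en_US locale produces depends on the collation tables your libc
ships.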
SY, Uwe
--
address@hidden | Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/ | Ist zu Grunde gehen