Re: [Groff] man file character encoding.

From: Colin Watson
Subject: Re: [Groff] man file character encoding.
Date: Fri, 27 Sep 2013 10:30:19 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Sep 26, 2013 at 09:58:04PM +0200, Erwin Waterlander wrote:
> I'm curious to know how man-db determines the encoding of the man
> page. I cannot find that information. Would you like to explain how
> man-db does the encoding detection?

Certainly.  man-db contains a table of the typical legacy encodings for
each of a number of known languages (I'm happy to add to those, but
since new translation efforts tend to start with UTF-8 these days, it's
a closed set and I haven't had to extend it since 2008 when I synced up
with Fedora).  There is generally only one such encoding per language.
UTF-8 is a strict enough encoding that for reasonable volumes of text it
is usually possible to distinguish automatically between it and a legacy
encoding, simply by trying to decode as UTF-8 and falling back to the
legacy encoding if that fails.  manconv does this job; it is more or
less like iconv except that it can take a priority order of possible
input encodings.
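
As a rough illustration of the "try UTF-8, then fall back" idea described
above, here is a minimal Python sketch.  It is not manconv's actual code:
the function name, the small language table, and its entries are
assumptions made up for the example, and the real table in man-db may
differ.

```python
# Hypothetical, illustrative subset of a language -> legacy encoding table.
# These entries are assumptions for the sketch, not man-db's real table.
LEGACY_ENCODINGS = {
    "de": "ISO-8859-1",
    "ja": "EUC-JP",
    "ru": "KOI8-R",
}


def decode_with_fallback(data: bytes, encodings=("UTF-8", "ISO-8859-1")):
    """Try each candidate encoding in priority order.

    Returns (decoded_text, encoding_used).  UTF-8 is strict, so a page in
    a legacy 8-bit encoding will usually fail to decode as UTF-8 and fall
    through to the next candidate.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


def decode_page(data: bytes, lang: str):
    """Decode a page for a language: UTF-8 first, then its legacy encoding."""
    legacy = LEGACY_ENCODINGS.get(lang, "ISO-8859-1")
    return decode_with_fallback(data, ("UTF-8", legacy))
```

For example, decode_page(b"caf\xe9", "de") fails the UTF-8 attempt and is
decoded as ISO-8859-1, whereas the same word encoded as UTF-8 is accepted
on the first try.  Very short inputs that are pure ASCII decode identically
either way, which is why the approach needs a reasonable volume of text to
be informative.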

There are cases where this system fails, and in such cases you can store
manual pages in directories with an explicit encoding tag attached (e.g.
"/usr/share/man/<ll>_<CC>.<encoding>/man1"), or put an explicit
Emacs-style coding tag at the top of the file.  In practice this is
rarely necessary.
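
To make the coding-tag option concrete, here is a small Python sketch that
looks for an Emacs-style "-*- coding: NAME -*-" tag on the first line of a
page.  The function name and the regular expression are assumptions for
illustration; the exact syntax that man-db accepts may be stricter or
looser than this.

```python
import re

# Matches an Emacs-style file-variable line such as:
#   '\" -*- coding: UTF-8 -*-
# The pattern is an illustrative assumption, not man-db's own parser.
CODING_TAG = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9._-]+).*?-\*-")


def coding_from_first_line(data: bytes):
    """Return the encoding named on the first line, or None if absent."""
    first_line = data.split(b"\n", 1)[0]
    match = CODING_TAG.search(first_line)
    if match is None:
        return None
    return match.group(1).decode("ascii")
```

A caller could check this first and only fall back to guessing when it
returns None.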

> The reason I work with Federico's man is that I often work on Cygwin
> when I don't have Linux at hand. Cygwin does not have man-db
> available. Soon I will get a Russian translation of my program
> (dos2unix), which makes this problem relevant again for me. Three years
> ago I saw this problem coming. At that time I tested also on Fedora
> 12, which was still using Federico's man. I didn't notice that
> Fedora changed to man-db in the meantime.

Ah, yes.  I corresponded at one point with somebody who might be
interested in porting man-db to Cygwin, but it never came to anything.
I would be ecstatic if somebody could help with such a port, as I don't
use Windows myself.

I use Gnulib extensively, which deals with a lot of portability
problems, but not everything.  The main effort will be in porting
libpipeline to deal with Windows-style process creation and supervision;
after that I expect that it will just be a matter of various minor fixes
for Unix-specific assumptions I've made.  You don't have to come with a
complete patch; I'd be willing to accept incremental changes that make
the job easier for the next person, or even "this general pattern of
things you're doing is Unix-specific; you need to use this pattern
instead to be portable to Cygwin".


Colin Watson                                       address@hidden