Re: [Groff] man file character encoding.


From: Colin Watson
Subject: Re: [Groff] man file character encoding.
Date: Fri, 27 Sep 2013 10:30:19 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Sep 26, 2013 at 09:58:04PM +0200, Erwin Waterlander wrote:
> I'm curious to know how man-db determines the encoding of the man
> page. I cannot find that information. Would you like to explain how
> man-db does the encoding detection?

Certainly.  man-db contains a table of the typical legacy encodings for
each of a number of known languages (I'm happy to add to those, but
since new translation efforts tend to start with UTF-8 these days, it's
a closed set and I haven't had to extend it since 2008 when I synced up
with Fedora).  There is generally only one such encoding per language.
UTF-8 is a strict enough encoding that for reasonable volumes of text it
is usually possible to distinguish automatically between it and a legacy
encoding, simply by trying to decode as UTF-8 and falling back to the
legacy encoding if that fails.  manconv does this job; it is more or
less like iconv except that it can take a priority order of possible
input encodings.
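
As a rough illustration of the try-UTF-8-then-fall-back idea (this is
not manconv's actual code, only a minimal Python sketch; the function
name and the default encoding list are made up for the example):

```python
def decode_with_priority(data: bytes, encodings=("UTF-8", "ISO-8859-1")):
    """Decode data with the first candidate encoding that accepts it.

    Hypothetical helper: candidates are tried in priority order, and the
    first one that decodes without error wins.  UTF-8 goes first because
    it is strict enough to reject most legacy-encoded text.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
    raise ValueError("none of the candidate encodings could decode the input")


if __name__ == "__main__":
    russian = "Привет"
    # Valid UTF-8 is accepted by the first candidate.
    print(decode_with_priority(russian.encode("utf-8"), ("UTF-8", "KOI8-R")))
    # KOI8-R bytes are not valid UTF-8, so the legacy encoding is used.
    print(decode_with_priority(russian.encode("koi8_r"), ("UTF-8", "KOI8-R")))
```

The legacy encoding has to come last in the list: single-byte encodings
such as ISO-8859-1 accept any byte sequence, so putting one first would
hide the UTF-8 case entirely.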

There are cases where this system fails, and in such cases you can store
manual pages in directories with an explicit encoding tag attached (e.g.
"/usr/share/man/<ll>_<CC>.<encoding>/man1"), or put an explicit
Emacs-style coding tag at the top of the file.  In practice this is
rarely necessary.
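
To make the two explicit hints a bit more concrete, here is a hedged
Python sketch of how one might look for them.  It is not taken from
man-db: the helper names and regular expressions are assumptions, the
tag is assumed to have the usual Emacs form "-*- coding: NAME -*-" on
the first line, and the directory check simply scans every path
component for something shaped like "<ll>_<CC>.<encoding>".

```python
import re
from pathlib import PurePosixPath

# Assumed tag shape: "-*- coding: NAME -*-" somewhere on the first line.
_TAG_RE = re.compile(r"-\*-.*?coding:\s*([A-Za-z0-9._-]+).*?-\*-")
# Assumed directory shape: "<ll>[_<CC>].<encoding>", e.g. "ja_JP.eucJP".
_DIR_RE = re.compile(r"^[a-z]{2,3}(?:_[A-Z]{2})?\.([A-Za-z0-9-]+)$")


def encoding_from_tag(first_line: str):
    """Return the encoding named in an Emacs-style coding tag, or None."""
    match = _TAG_RE.search(first_line)
    return match.group(1) if match else None


def encoding_from_dir(path: str):
    """Return the encoding suffix of a locale-named directory, or None."""
    for part in PurePosixPath(path).parent.parts:
        match = _DIR_RE.match(part)
        if match:
            return match.group(1)
    return None


def explicit_encoding(path: str, first_line: str):
    """Prefer the in-file tag, then the directory name (an assumed order)."""
    return encoding_from_tag(first_line) or encoding_from_dir(path)
```

When neither hint is present the function returns None, and the caller
would go back to the automatic detection described earlier.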

> The reason I work with Federico's man is that I often work on Cygwin
> when I don't have Linux at hand. Cygwin does not have man-db
> available. Soon I will get a Russian translation of my program
> (dos2unix), which makes this problem relevant again for me. Three years
> ago I saw this problem coming. At that time I also tested on Fedora
> 12, which was still using Federico's man. I didn't notice that
> Fedora changed to man-db in the meantime.

Ah, yes.  I corresponded at one point with somebody who might be
interested in porting man-db to Cygwin, but it never came to anything.
I would be ecstatic if somebody could help with such a port, as I don't
use Windows myself.

I use Gnulib extensively, which deals with a lot of portability
problems, but not everything.  The main effort will be in porting
libpipeline to deal with Windows-style process creation and supervision;
after that I expect that it will just be a matter of various minor fixes
for Unix-specific assumptions I've made.  You don't have to come with a
complete patch; I'd be willing to accept incremental changes that make
the job easier for the next person, or even "this general pattern of
things you're doing is Unix-specific; you need to use this pattern
instead to be portable to Cygwin".

Cheers,

-- 
Colin Watson                                       address@hidden
