bug-gnu-utils

Re: [sharutils] unwieldy msgids, unnecessary reformatting


From: Bruce Korb
Subject: Re: [sharutils] unwieldy msgids, unnecessary reformatting
Date: Sun, 13 Jan 2013 14:07:15 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130105 Thunderbird/17.0.2

Hi Benno,

On 01/13/13 13:05, Benno Schulenberg wrote:
> 
> (I've reincluded the list in the CC.)

I omitted the entire list because I'm expecting a somewhat boring
discussion.  I'd be more interested in translator feedback, because
you-all are more directly affected.

> On Thu, Jan 10, 2013, at 20:47, Bruce Korb wrote:
>> With short fragments, it is easier to translate, but these short
>> fragments get woven into usage text in ways that are disparaged
>> by the docs I've read for i18n text.

The "short fragments" are the long option names and the short
(~40 character) description, e.g. here is the real source that
describes the "level-of-compression" option:

> flag = {
>     name        = level-of-compression;
>     value       = g;
>     arg-type    = number;
>     arg-name    = LEVEL;
>     arg-range   = '1->9';
>     arg-default = 9;
>     descrip     = 'pass @file{LEVEL} for compression';
>     doc = <<- _EODoc_
>       Some compression programs allow for a level of compression.  The
>       default is @code{9}, but this option allows you to specify something
>       else.  This value is used by @command{gzip}, @command{bzip2} and
>       @command{xz}, but not @command{compress}.
>       _EODoc_;
> };

The option line in the long usage appears as:
   -g, --level-of-compression=num pass LEVEL for compression
That string does not appear anywhere in the source.
It gets pulled together and formatted from the "g", "level-of-compression",
"number", "1->9" and "pass @file{LEVEL} for compression" strings.
So, to make something translatable, I build a preliminary program,
have it emit the help, capture that help text and compile it into
the final program.  Those strings plus the "doc" string also show
up in the man page and the Texinfo documentation.
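To illustrate why such glued-together lines resist translation, here is a minimal C sketch of assembling the option line above from its fragments. The function name format_opt_line and the 30-column width (OPT_COL_WIDTH) are assumptions for illustration, not AutoOpts' actual code. Note that a translator only ever sees the individual fragments, never the assembled line, and the padding is computed in bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed width of the option column; the real AutoOpts value may differ. */
#define OPT_COL_WIDTH 30

/* Build one long-usage line from separate option fragments, e.g.
 *   "   -g, --level-of-compression=num pass LEVEL for compression"
 * The function name and signature are hypothetical. */
static void
format_opt_line(char *buf, size_t bufsz, char flag, char const *name,
                char const *arg, char const *desc)
{
    char   lhs[128];
    size_t len;
    int    pad;

    assert(buf != NULL && bufsz > 0);

    /* Left-hand part: "-g, --level-of-compression=num" */
    snprintf(lhs, sizeof lhs, "-%c, --%s=%s", flag, name, arg);

    /* Pad to a fixed column.  strlen() counts bytes, not display
     * columns, so a fragment containing multibyte (e.g. UTF-8) text
     * would shift the description column. */
    len = strlen(lhs);
    pad = len < OPT_COL_WIDTH ? (int)(OPT_COL_WIDTH - len) : 0;

    /* Three leading spaces, padded left part, one space, description. */
    snprintf(buf, bufsz, "   %s%*s %s", lhs, pad, "", desc);
}
```

Called with 'g', "level-of-compression", "num" and "pass LEVEL for compression", this reproduces the usage line quoted above.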

>> Specifically, little bits
>> of the usage are emitted with the expectation that a consistent
>> amount of horizontal space is used.  That works IFF the source
>> language is the display language.
> 
> When a certain indentation needs to be maintained, this is the
> responsibility of the translator.  Half of the time I use a slightly
> different indentation than the original, and use it consistently.
>
>> My solution for this woven text problem is to build a version
>> without a combined usage text, print the help with bit-at-a-time
>> text and suck that output into a combined usage string and
>> rebuild.  In the rebuild, the short strings will never be
>> used.
> 
> In the source code I see for example:
> 
>     static char const shar_opt_strs[10449] =" [[[enormous string]]] ";

That is intermediate (generated) source.  I obviously do not hand-edit a 10K string.

> In my opinion this is madness...  If you want to add a space
> or a word somewhere, you have to figure out and change fifty
> indexes by hand... !

At the top of that file, you will see:
> /*   -*- buffer-read-only: t -*- vi: set ro:
>  *  
>  *  DO NOT EDIT THIS FILE   (shar-opts.c)
>  *  
>  *  It has been AutoGen-ed  January 11, 2013 at 11:39:24 AM by AutoGen 5.17.2pre7

so if you want to add a space, do not do it in that file.

> Is it AutoOpts that requires that the help text be provided as a
> single huge character array?

AutoOpts itself only requires the strings associated with each option
and with the program as a whole.  On the theory that gluing all these
strings together would be untranslatable and/or would sometimes not
yield an aesthetically pleasing help string, I provided a way to
override the computed usage text by supplying, _as an alternative_,
the entire usage text as a single string.  What I am proposing here
is to emit this long usage a paragraph at a time.
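A rough sketch of what "a paragraph at a time" could look like in generated code. The table usage_paras (one msgid per paragraph, with placeholder text) and the function emit_usage are hypothetical, and _() is an identity stand-in for the usual gettext() macro so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for gettext(); a real build would include <libintl.h>
 * and map _() to gettext().  Assumed here so the sketch is standalone. */
#define _(s) (s)

/* Hypothetical generated table: the long usage text split into
 * paragraphs.  Each entry would be a separate msgid and keeps its own
 * trailing blank line, so concatenating the entries reproduces the
 * original text exactly. */
static char const *const usage_paras[] = {
    "Usage:  prog [OPTION]... [FILE]...\n\n",
    "   -g, --level-of-compression=num pass LEVEL for compression\n\n",
    "Copyright (C) 1994-2013 Free Software Foundation, Inc., all rights reserved.\n",
};

/* Emit the long usage one translated paragraph at a time.
 * Returns the number of bytes written. */
static size_t
emit_usage(FILE *fp)
{
    size_t total = 0;
    size_t i;

    assert(fp != NULL);
    for (i = 0; i < sizeof usage_paras / sizeof usage_paras[0]; i++) {
        char const *p = _(usage_paras[i]);   /* one lookup per paragraph */
        fputs(p, fp);
        total += strlen(p);
    }
    return total;
}
```

Each paragraph is still translated as a whole unit, so the translator can adjust its indentation and wording freely.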

>> Since there is only one source for both texts, getting this to
>> work depends upon coming up with a paragraph splitting algorithm
>> that would split out an exactly matching paragraph.  A desirable
>> goal, but might not be easy to do.  Please suggest an algorithm
>> while I try to puzzle one out, too.  (e.g. separate on every
>> double newline and every line starting with white space.
>> Does that yield something more usable?)
> 
> If the help text needs to be a single huge string, why not "add"
> (concatenate) many small gettextized strings?

The pieces of the help text are derived from too many sources.
Gluing together little strings is strongly discouraged for
translatable text.  Therefore, I am suggesting that the monster
string be split up according to a well-defined algorithm: start
a new "paragraph" whenever a non-empty line follows an empty line
(i.e. two consecutive line breaks) or starts with a few space
characters.  I *think* that yields something wieldy.
I could also split it into one string per line.  That is likely
somewhat easier for me, but it seems like it would make the
translation task a bit harder.  For example, there would be no
guarantee that every line is unique, and the same line of text
might need to be translated differently in different contexts.
I do think splitting on "paragraphs" would make the translation
effort easier, but I would take whatever suggestion you make.
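The splitting rule can be sketched in C as follows. The names split_paragraphs, struct para_span and record are hypothetical, and "a few" spaces is assumed to mean at least two (MIN_INDENT). Blank lines stay attached to the preceding paragraph, so the pieces cover the text byte for byte, which is what the exact-match requirement needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: "a few" leading spaces means at least two. */
#define MIN_INDENT 2

/* One paragraph: byte offset and length within the source text. */
struct para_span {
    size_t off;
    size_t len;
};

/* Nonzero if the line at p begins with at least MIN_INDENT spaces. */
static int
is_indented(char const *p)
{
    int i;

    for (i = 0; i < MIN_INDENT; i++)
        if (p[i] != ' ')
            return 0;
    return 1;
}

/* Store a span if there is room; always bump the count. */
static void
record(struct para_span *out, size_t max, size_t *count,
       char const *text, char const *start, char const *end)
{
    if (*count < max) {
        out[*count].off = (size_t)(start - text);
        out[*count].len = (size_t)(end - start);
    }
    (*count)++;
}

/* Split text into "paragraphs": a new one starts at any non-empty line
 * that follows an empty line, or that begins with MIN_INDENT spaces.
 * Returns the total number of paragraphs; at most max spans are
 * stored in out. */
static size_t
split_paragraphs(char const *text, struct para_span *out, size_t max)
{
    char const *start = text;   /* start of current paragraph */
    char const *line  = text;   /* start of current line */
    int         prev_empty = 0;
    size_t      count = 0;

    assert(text != NULL);
    while (*line != '\0') {
        char const *nl   = strchr(line, '\n');
        char const *next = nl ? nl + 1 : line + strlen(line);
        int         empty = (*line == '\n');

        if (line != start && !empty && (prev_empty || is_indented(line))) {
            record(out, max, &count, text, start, line);
            start = line;
        }
        prev_empty = empty;
        line = next;
    }
    if (line != start)              /* flush the last paragraph */
        record(out, max, &count, text, start, line);
    return count;
}
```

Under this rule every indented option line becomes a paragraph (and msgid) of its own, while ordinary prose paragraphs stay together.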

>     hugestring = _("shar (GNU sharutils) 4.13.3\n")
>     + _("Copyright (C) 1994-2013 Free Software Foundation, Inc., all rights reserved.\n")
>     + _("This is free software. It is licensed for use, modification and\n")
>     + ...
>
> I have no idea how to actually do this, but it should be possible,
> and then let the program itself work out what the lengths of all
> these substrings are (if you actually need the indexes).

The indexes are a relatively unimportant implementation detail.
To produce libraries that need as few fixups as possible at load
time, I wrote functions that assemble massive text strings, plus
#define-d values that reference offsets into that huge table.  I
could instead emit static global strings that go by the names used
in the #defines.  That would eliminate all the offset stuff, but
then the linker/loader would have more fixup work to do.
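For readers unfamiliar with the offset trick, here is a minimal sketch of the general technique, not AutoOpts' actual generated code: all strings live in one array, and descriptor tables store integer offsets into it rather than pointers. The names opt_strs, struct opt_desc, opt_str, find_opt and the LOC_*_OFF macros are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All option strings packed into one array, separated by NUL bytes. */
static char const opt_strs[] =
    "level-of-compression\0"        /* offset  0 */
    "pass LEVEL for compression\0"  /* offset 21 */
    "LEVEL";                        /* offset 48 */

/* Hypothetical generated names for the offsets. */
#define LOC_NAME_OFF  0
#define LOC_DESC_OFF 21
#define LOC_ARG_OFF  48

/* A descriptor that stores offsets, not pointers.  The table holds only
 * integers, so a shared library needs no load-time relocations for it;
 * in position-independent code, a table of char const * members would
 * need one relocation per pointer. */
struct opt_desc {
    char           flag;
    unsigned short name_off;
    unsigned short desc_off;
    unsigned short arg_off;
};

static struct opt_desc const opt_table[] = {
    { 'g', LOC_NAME_OFF, LOC_DESC_OFF, LOC_ARG_OFF },
};

/* Turn an offset back into a string at run time. */
static char const *
opt_str(unsigned short off)
{
    assert(off < sizeof opt_strs);
    return opt_strs + off;
}

/* Look up a descriptor by long option name. */
static struct opt_desc const *
find_opt(char const *name)
{
    size_t i;

    for (i = 0; i < sizeof opt_table / sizeof opt_table[0]; i++)
        if (strcmp(opt_str(opt_table[i].name_off), name) == 0)
            return &opt_table[i];
    return NULL;
}
```

The price is that the offsets must be computed by the generator, which is exactly the bookkeeping nobody should do by hand.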


