bug-gnulib

quoting in gnulib


From: Karl Berry
Subject: quoting in gnulib
Date: Sun, 10 Jul 2005 20:23:28 -0400

rms said ok (a while ago, sorry) to our text about quote characters,
which I'll append below for one final check.  But before I check it in,
we should add the promised documentation to Gnulib about the quote and
quotearg modules.

So I wrote the section below, basically just formatting a message Paul
sent to the list back on May 29.  Seems fine as far as it goes, but we
need to say something about locales.  Is it true that if some
environment variable (LANG?) is set to something (e.g., fr_FR?), a
"translation" of ` and ' will be used?  Looking at quotearg.h, it seems
so.  Do any translations in fact do this?

Thanks,
k

--
(gnulib.texi text)

@node Quoting
@section Quoting

@cindex Quoting
@findex quote
@findex quotearg

Gnulib provides @samp{quote} and @samp{quotearg} modules to help with
quoting text, such as file names, in messages to the user.  Here's an
example of using @samp{quote}:

@example
#include <quote.h>
 ...
  error (0, errno, _("cannot change owner of %s"), quote (fname));
@end example

This differs from

@example
  error (0, errno, _("cannot change owner of `%s'"), fname);
@end example

@noindent in that @code{quote} escapes unusual characters in
@code{fname}, e.g., @samp{'} and control characters like @samp{\n}.

@findex quote_n
However, a caveat: @code{quote} reuses the storage that it returns.
Hence if you need more than one thing quoted at the same time, you
need to use @code{quote_n}.
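
To make the storage caveat concrete, here is a minimal self-contained
sketch.  The names @code{toy_quote}, @code{toy_quote_n} and
@code{toy_results_alias} are hypothetical stand-ins, not Gnulib code:
they do no escaping and use fixed-size slots, and only mimic the
buffer reuse described above so that the example compiles without
Gnulib.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for quote and quote_n; NOT the Gnulib code.
   They model only the storage behavior, using fixed-size static slots. */
enum { TOY_SLOTS = 4, TOY_SLOT_SIZE = 256 };

static const char *
toy_quote_n (int n, const char *arg)
{
  static char slot[TOY_SLOTS][TOY_SLOT_SIZE];
  assert (0 <= n && n < TOY_SLOTS);
  /* snprintf truncates over-long arguments and always NUL-terminates. */
  snprintf (slot[n], sizeof slot[n], "`%s'", arg);
  return slot[n];
}

/* Like the caveat above: every call reuses slot 0. */
static const char *
toy_quote (const char *arg)
{
  return toy_quote_n (0, arg);
}

/* Return 1 if two consecutive toy_quote results share storage,
   i.e. the second call overwrote the first result. */
static int
toy_results_alias (void)
{
  const char *a = toy_quote ("src");
  const char *b = toy_quote ("dst");
  return a == b;
}
```

In this sketch, passing two @code{toy_quote} results to one message
would print the second argument twice, whereas
@code{toy_quote_n (0, src)} and @code{toy_quote_n (1, dst)} stay
distinct; the same pattern applies to @code{quote} and @code{quote_n}.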

@findex quotearg_alloc
Also, because of that shared storage, the @samp{quote} module is not
suited to multithreaded applications.  In that case, you have to use
@code{quotearg_alloc}, defined in the @samp{quotearg} module, which is
decidedly less convenient.
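
The following sketch suggests why an allocating interface suits
threads better, and why it is less convenient.  The function
@code{toy_quote_alloc} is a hypothetical stand-in, not
@code{quotearg_alloc} itself: it does no escaping and takes no quoting
options.  It only illustrates the ownership pattern, in which each call
returns fresh storage that the caller must free.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in, NOT quotearg_alloc: return a newly allocated
   quoted copy of ARG, or NULL if memory is exhausted.  No static
   storage is involved, so concurrent callers cannot overwrite each
   other's results.  The caller owns the result and must free it. */
static char *
toy_quote_alloc (const char *arg)
{
  size_t size = strlen (arg) + 3;   /* two quote characters plus NUL */
  char *result = malloc (size);
  if (result != NULL)
    snprintf (result, size, "`%s'", arg);
  return result;
}
```

The inconvenience is visible at the call site: the result cannot simply
be passed inline to a message function, since it has to be kept in a
variable and freed afterwards.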

--
(standards.texi text)

@node Character set
@section Character set
@cindex character set
@cindex encodings
@cindex ASCII characters
@cindex non-ASCII characters

Sticking to the ASCII character set (plain text, 7-bit characters) is
preferred in GNU source code comments, text documents, and other
contexts, unless there is good reason to do something else because of
the application domain.  For example, if source code deals with the
French Revolutionary calendar, it is OK if its literal strings contain
accented characters in month names like ``Flor@'eal''.  Also, it is OK
to use non-ASCII characters to represent proper names of contributors in
change logs (@pxref{Change Logs}).

If you need to use non-ASCII characters, you should normally stick with
one encoding, as one cannot in general mix encodings reliably.


@node Quote characters
@section Quote characters
@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for quotation
characters in messages to users: preferably 0x60 (@samp{`}) for left
quotes and 0x27 (@samp{'}) for right quotes.  It is OK, but not
required, to use locale-specific quotes in other locales.

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} and
@code{quotearg} modules provide a reasonably straightforward way to
support locale-specific quote characters, as well as taking care of
other issues, such as quoting a filename that itself contains a quote
character.  See the Gnulib documentation for usage details.

In any case, the documentation for your program should clearly specify
how it does quoting, if that differs from the preferred method of @samp{`}
and @samp{'}.  This is especially important if the output of your
program is ever likely to be parsed by another program.

Quotation characters are a difficult area in the computing world at this
time: there are no true left or right quote characters in ASCII, or even
Latin1; the @samp{`} character we use was standardized as a grave
accent.  Moreover, Latin1 is still not universally usable.

Unicode contains the unambiguous quote characters required, and its
common encoding UTF-8 is upward compatible with ASCII@.  However,
Unicode and UTF-8 are not universally well-supported, either.

This may change over the next few years, and then we will revisit
this.
