bug-gnulib


Re: quote characters in stds


From: James Youngman
Subject: Re: quote characters in stds
Date: Tue, 7 Jun 2005 14:36:54 +0100
User-agent: Mutt/1.3.28i

Karl writes:

> Unicode contains the unambiguous quote characters required, and its
> common encoding UTF-8 is upward compatible with ASCII.

It might be worth pointing out that all valid ASCII files are valid
UTF-8 files, but not all valid Latin-1 files are valid UTF-8 files.

Specifically, some byte values that Latin-1 uses for ordinary characters
are used in UTF-8 as the leading bytes of multibyte sequences (for
example 0xE8, which in Latin-1 is an e with a grave accent).  Unicode is
a superset of Latin-1, but that doesn't mean that you can load a Latin-1
file as if it were UTF-8.
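A minimal Python sketch of that asymmetry (Python and the helper name
is_valid_utf8 are assumptions for illustration; nothing here comes from
the standards text): ASCII bytes decode cleanly as UTF-8, the Latin-1
encoding of "près" does not, and yet every Latin-1 byte maps to the
Unicode code point with the same number.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Every ASCII byte string is also a valid UTF-8 byte string.
ascii_bytes = "plain text".encode("ascii")
print(is_valid_utf8(ascii_bytes))      # True

# In Latin-1, 'è' (U+00E8) is the single byte 0xE8 ...
latin1_bytes = "pr\u00e8s".encode("latin-1")
print(latin1_bytes)                    # b'pr\xe8s'
# ... which UTF-8 reads as the start of a 3-byte sequence; 's' is not
# a continuation byte, so decoding fails.
print(is_valid_utf8(latin1_bytes))     # False

# Unicode is still a superset of Latin-1: each Latin-1 byte b decodes
# to code point U+00b, but UTF-8 spells code points above 0x7F in 2+ bytes.
print(all(ord(bytes([b]).decode("latin-1")) == b for b in range(256)))  # True
print("\u00e8".encode("utf-8"))        # b'\xc3\xa8'
```

The only safe way to treat such a file as UTF-8 is to decode it as
Latin-1 first and re-encode it, rather than reinterpreting its bytes.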

It might be worth considering this wording change...

> Unicode contains the unambiguous quote characters required, and its
> common encoding UTF-8 is upward compatible with ASCII.
> However, you can't process a Latin-1 encoded file as if it were
> UTF-8, because some Latin-1 character codes are used to begin
> multibyte character sequences in UTF-8.
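To make the proposed wording concrete, a second hedged Python sketch:
the helper utf8_role is hypothetical, and the byte ranges follow
RFC 3629 (0xC0, 0xC1 and 0xF5-0xFF never appear in well-formed UTF-8).
It classifies a byte by its role in UTF-8 and applies that to a few
Latin-1 letters.

```python
def utf8_role(byte: int) -> str:
    """Hypothetical helper: the role a byte value plays in UTF-8 (RFC 3629)."""
    if byte <= 0x7F:
        return "single byte (ASCII)"
    if byte <= 0xBF:
        return "continuation byte"
    if byte in (0xC0, 0xC1) or byte >= 0xF5:
        return "never valid"
    if byte <= 0xDF:
        return "lead byte of a 2-byte sequence"
    if byte <= 0xEF:
        return "lead byte of a 3-byte sequence"
    return "lead byte of a 4-byte sequence"

# Latin-1 letters whose one-byte codes begin multibyte UTF-8 sequences.
for ch in "\u00c9\u00e8\u00f1":        # 'É', 'è', 'ñ'
    b = ch.encode("latin-1")[0]
    print(f"{ch} = 0x{b:02X}: {utf8_role(b)}")
```

This prints 0xC9 as a 2-byte lead, 0xE8 as a 3-byte lead and 0xF1 as a
4-byte lead, so most accented Latin-1 letters are exactly the
"character codes used to begin multibyte character sequences".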

... though this is sort of drifting away from the main point of a
section on quote characters and into guidance on handling character
encoding systems.

James.




