help-source-highlight

Re: [Help-source-highlight] Unicode files ?


From: Dario Teixeira
Subject: Re: [Help-source-highlight] Unicode files ?
Date: Fri, 2 Apr 2010 07:03:04 -0700 (PDT)

Hi,

> the html might bring also bad encoding in the head, but I
> guess it is also due to the fact that source-highlight reads
> two bytes, which in unicode represent a single character,
> and interprets them as two characters instead of one. 
> This is unicode, am I right?  Sorry for my ignorance,
> but with unicode in a text file every character is
> represented by two bytes, right?

Nope. There is not one standard Unicode encoding, but several.  The most
common one is UTF-8, a variable-length encoding where each Unicode
character takes from 1 to 4 bytes (the original design allowed up to 6, but
RFC 3629 restricted it to 4).  Another variable-length encoding is UTF-16,
where each character occupies either 2 or 4 bytes.  The only fixed-length
encoding that covers all of Unicode is UTF-32 (UCS-4), where each character
requires 4 bytes.
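
To make those byte counts concrete, here is a minimal C++ sketch that
returns how many bytes a single code point needs in each encoding.  The
function names are made up for illustration and are not part of
Source-highlight; the sketch also assumes the input is a valid code point
and does not reject surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to encode one code point in UTF-8 (1 to 4, per RFC 3629).
// Hypothetical helper; surrogate code points are not rejected here.
std::size_t utf8_length(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);       // highest valid Unicode code point
    if (cp < 0x80) return 1;      // ASCII range
    if (cp < 0x800) return 2;     // e.g. accented Latin letters
    if (cp < 0x10000) return 3;   // rest of the Basic Multilingual Plane
    return 4;                     // supplementary planes
}

// Bytes needed in UTF-16: one 16-bit unit inside the BMP,
// a surrogate pair (two units) outside it.
std::size_t utf16_length(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}

// Bytes needed in UTF-32: always one 32-bit unit.
std::size_t utf32_length(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    return 4;
}
```

For example, utf8_length(0x20AC) (the euro sign) gives 3, while
utf16_length(0x20AC) gives 2 and utf32_length(0x20AC) gives 4.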
 
> I'd like to try with wstring and see whether this solves
> something.

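I haven't used C++ in a long time, but isn't wstring based on wchar_t,
whose size is platform-dependent (typically 2 bytes on Windows and 4 bytes
on Linux)?  Where it is 2 bytes, it won't solve anything: no fixed-length
2-byte encoding can represent all of Unicode (the old UCS-2 covers only
the Basic Multilingual Plane).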
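
As a rough illustration of why bytes and characters differ, the sketch
below counts code points in a UTF-8 std::string by skipping continuation
bytes (those of the form 10xxxxxx).  The names are hypothetical, the input
is assumed to be valid UTF-8, and this is not how Source-highlight handles
strings; it only shows the mismatch.

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-8 encoded string.
// Every code point starts with exactly one byte that is NOT a
// continuation byte (10xxxxxx), so counting those bytes is enough.
// Assumption: the input is well-formed UTF-8; no validation is done.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Size of wchar_t in bytes on the platform this is compiled for.
// The C++ standard leaves this implementation-defined.
constexpr std::size_t wchar_size() {
    return sizeof(wchar_t);
}
```

With the two-byte UTF-8 sequence "\xC3\xA9" (the letter e with an acute
accent), std::string::size() reports 2 while count_code_points() reports 1.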

Lorenzo, I think we can give you a hand in implementing this.  However,
if you read through this entire thread you will notice that the best
course of action is dependent on a crucial piece of information which
you are the most qualified person to provide: we need a list of the
manipulations that Source-highlight applies to strings.

Hope that helps!
Best regards,
Dario Teixeira







