Ulf Ochsenfahrt wrote:
Yes, but UTF-8 is a _multi-byte_ encoding.
If you see an LF byte, you don't know whether this is a single-byte LF
or part of a multi-byte sequence.
Yes you do, because every byte of a multi-byte sequence in UTF-8 has
the high bit set. If you see 0x0A in a UTF-8 stream, you can be certain
it is an LF and not part of a multi-byte sequence.
http://en.wikipedia.org/wiki/Utf-8#Description
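
A quick way to convince yourself, as a minimal Python sketch (the helper
names every_byte_has_high_bit and split_on_lf are made up for this
illustration and are not part of monotone):

```python
def every_byte_has_high_bit(code_point):
    """True if each byte of the UTF-8 encoding of code_point has bit 7 set."""
    return all(b & 0x80 for b in chr(code_point).encode("utf-8"))


def split_on_lf(data):
    """Split UTF-8 bytes on raw 0x0A bytes, without decoding first."""
    return data.split(b"\x0a")


# Sample text mixing 1-, 2- and 3-byte characters, separated by LF.
# The escapes spell "naive" with a diaeresis, three kanji, and a euro sign.
sample = "na\u00efve\n\u65e5\u672c\u8a9e\n\u20ac".encode("utf-8")

# Splitting at the byte level never cuts through a character,
# so every piece still decodes cleanly.
decoded = [piece.decode("utf-8") for piece in split_on_lf(sample)]

# Check every non-ASCII code point below the surrogate range.
all_high = all(every_byte_has_high_bit(cp) for cp in range(0x80, 0xD800))
```

Since every byte of a multi-byte sequence is 0x80 or above, a byte-wise
search for 0x0A can only ever match a real LF.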
Brian May wrote:
"Daniel" == Daniel Lakeland <address@hidden> writes:
Daniel> Consider languages like Python that have the ability to
Daniel> create multiline strings, now the \r or \n characters are
Daniel> part of the string. Converting them changes the behavior
Daniel> and meaning of the program. This is very tricky.
Any code that relies on this behaviour is very dodgy IMHO.
Well, I'd certainly agree it isn't platform-independent code. But where
is it written that monotone should not support checking in "dodgy" code?
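
To make the risk concrete, here is a small Python sketch of a naive
line-ending conversion. It is only an illustration under my own
assumptions: to_lf and to_crlf are hypothetical helpers, and this is not
how monotone does or would do it.

```python
def to_lf(data):
    """Naive conversion: rewrite every CRLF pair as a bare LF."""
    return data.replace(b"\r\n", b"\n")


def to_crlf(data):
    """Naive conversion: rewrite every LF as a CRLF pair."""
    return data.replace(b"\n", b"\r\n")


# A file whose bytes deliberately mix CRLF, a bare LF and a bare CR.
original = b"a\r\nb\nc\r"

# Check in with conversion to LF, check out with conversion to CRLF.
round_tripped = to_crlf(to_lf(original))

# The bare LF comes back as CRLF, so the bytes are no longer the same.
changed = round_tripped != original
```

Whether the mixed endings were "dodgy" or deliberate, the file that comes
back out is not the file that went in.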
larry