From: Richard Levitte - VMS Whacker
Subject: [Monotone-devel] Re: Looking at the code affected in bug 9752 leaves a weird taste...
Date: Wed, 18 Aug 2004 08:50:56 +0200 (CEST)

In message <address@hidden> on Tue, 17 Aug 2004 22:19:47 -0400, "graydon hoare" 
<address@hidden> said:

graydon> however, that was a conservative consensus, and it has
graydon> clearly failed in practise. so let's work out what it ought
graydon> to do, not worry too much about what the current (wrong) code
graydon> does.

Agreed.

graydon> > Then, looking at the function split_into_lines(), it's
graydon> > obvious that it hasn't been designed with CRLF in mind.
graydon> > For example, with the line "foo\r\n", it will end up with
graydon> > the tokens, "foo" and "", interpreted as two lines.  This
graydon> > is the actual cause of the trouble detected in bug 9752.
graydon> > Actually, the boost token functions aren't enough to be
graydon> > able to detect CRLF, LF and CR line endings, so something
graydon> > like boost::char_separator, but taking a vector of strings
graydon> > instead of a string of separators is needed.
graydon> 
graydon> good. or better yet, since this is a relatively
graydon> well-understood task, perhaps we can just code it as a manual
graydon> loop over the string.

Oh man, and here I was so proud of the boost::string_separator class I
created...  :-)
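
For reference, a manual loop of the kind suggested might look roughly
like the sketch below.  The name split_into_lines() comes from this
thread, but the signature is an assumption and not necessarily what
transforms.cc uses.  It treats "\r\n", "\n" and a lone "\r" each as a
single line break, so "foo\r\n" comes out as one line rather than two.

```cpp
#include <string>
#include <vector>

// Hypothetical signature; monotone's real split_into_lines() may differ.
// Splits on "\r\n", "\n", or a lone "\r", treating each as one line break.
std::vector<std::string> split_into_lines(const std::string& in)
{
  std::vector<std::string> out;
  std::string::size_type begin = 0;
  for (std::string::size_type i = 0; i < in.size(); ++i)
    {
      if (in[i] != '\r' && in[i] != '\n')
        continue;
      out.push_back(in.substr(begin, i - begin));
      // Swallow the LF of a CRLF pair so it isn't seen as a second break.
      if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n')
        ++i;
      begin = i + 1;
    }
  // A trailing fragment without a terminator still counts as a line.
  if (begin < in.size())
    out.push_back(in.substr(begin));
  return out;
}
```

Empty lines in the middle of the input are preserved; only the
terminator of the last line is not turned into an extra empty token.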

graydon> keep in mind that transforms.cc is some of the oldest code in
graydon> monotone,

Ah :-)

graydon> > This takes me back to get_linesep_conv.  Why, exactly,
graydon> > would it be able to specify what the line separator should
graydon> > be in the database?  Why isn't get_system_linesep enough,
graydon> > especially if it can be given a file name as argument?
graydon> 
graydon> probably the decision was made as a mirror of the charset
graydon> conversion choice: to avoid forcing users to store their
graydon> files in monotone in a particular encoding, if the external
graydon> encoding will always differ.
graydon>
graydon> that's actually important with charsets, because charset
graydon> coding can be lossy: if I force everyone to use UTF-8
graydon> internally, they might actually lose data when going from
graydon> system -> monotone -> system. the same is not really true for
graydon> line separators though. or perhaps it is. I don't actually
graydon> know. anyone familiar with this want to shed some light?

The trouble comes when synchronising two databases for which the hooks
look different.  If the internal encoding of a file differs between the
two databases, yet the file looks the same to the users on each system,
there will be a problem.

And in this discussion, we're completely forgetting about the
possibility of storing binary files.  If those get converted for any
reason, they are trashed.
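
A small sketch of how that happens, assuming a naive scheme that stores
LF internally and writes the system separator on checkout.  Both
helpers are hypothetical names made up for illustration, not monotone
functions.

```cpp
#include <string>

// Hypothetical helper: collapse every CRLF pair to LF before storing.
std::string to_internal(const std::string& in)
{
  std::string out;
  for (std::string::size_type i = 0; i < in.size(); ++i)
    {
      if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n')
        continue;               // drop the CR of a CRLF pair
      out += in[i];
    }
  return out;
}

// Hypothetical helper: expand every LF to the given separator on checkout.
std::string to_external(const std::string& in, const std::string& linesep)
{
  std::string out;
  for (std::string::size_type i = 0; i < in.size(); ++i)
    {
      if (in[i] == '\n')
        out += linesep;
      else
        out += in[i];
    }
  return out;
}
```

Feeding the eight-byte PNG signature (89 50 4E 47 0D 0A 1A 0A) through
this pair gives back seven bytes with an LF separator and nine with
CRLF; neither matches the original.  A text file that uses CRLF
consistently survives the same round trip on a CRLF system, which is
why the damage is easy to miss.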

One way to solve this problem is not to bother with it at all, i.e.
view all files as binary for internal representation.  It looks to me
like this is already the case in parts of the code, for example in
calculate_ident(), but I haven't looked too thoroughly in that code, so
I don't really know...  Anyhow, if we keep files in the database as
binary blobs and only bother with line separators and character
conversion for visual purposes (for example to display the diff
between files), we're quite safe.
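
As a rough illustration of that split (the type and function names are
made up here, not taken from monotone): the stored bytes are never
touched, and line endings are normalised only on a copy handed to
whatever does the displaying.

```cpp
#include <string>

// Hypothetical type: the blob exactly as it was checked in.
struct stored_file
{
  std::string raw;
};

// Hypothetical helper: returns a copy with CRLF and lone CR turned into
// LF, for diff or display only.  The stored bytes are left unchanged.
std::string text_for_display(const stored_file& f)
{
  std::string out;
  const std::string& raw = f.raw;
  for (std::string::size_type i = 0; i < raw.size(); ++i)
    {
      if (raw[i] == '\r')
        {
          out += '\n';
          // Skip the LF of a CRLF pair; it has already been emitted.
          if (i + 1 < raw.size() && raw[i + 1] == '\n')
            ++i;
        }
      else
        out += raw[i];
    }
  return out;
}
```

Anything that identifies or transfers the file would keep working on
raw, so two databases with different hooks would still agree on the
content.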

I just realised that I haven't looked at all at how files are actually
stored in the database.  Are they converted at all before being
stored, or am I following a red herring?

-----
Please consider sponsoring my work on free software.
See http://www.free.lp.se/sponsoring.html for details.

-- 
Richard Levitte                         address@hidden
                                        http://richard.levitte.org/



