
[Monotone-devel] A(nother) concrete proposal for line endings


From: Larry Hastings
Subject: [Monotone-devel] A(nother) concrete proposal for line endings
Date: Sun, 10 Dec 2006 14:10:57 -0800
User-agent: Thunderbird 1.5.0.8 (Windows/20061025)



In yet another attempt to shed light on the topic's dark corners, here is a suggestion on yet another way to handle line ending conversion.  Note that, with my usual bad form, I am not volunteering to undertake this; I'm suggesting all this merely as a means of helping nudge monotone along, to help the final solution emerge.  (And yes, in case it's not already clear, I'm definitely in the "mtn still needs to do EOL conversion" camp.)

It occurs to me that local eol convention transformation is similar to another oft-requested feature: RCS-style keyword substitution.  Both involve "transform the contents of a file from an internal canonical form from/to a local form on disk".  And there's a third kind of transformation, which Nathaniel tells me monotone theoretically already supports: character encoding transformation, where you store files in UTF-8 and convert to/from some other encoding on checkin/checkout.

Now that we have two, if not three, such transformations, I'll cite the adage "a computer scientist solves for exactly three classes of problem: 0, 1, and infinity" and suggest that monotone might enjoy a generalized mechanism for them.  I'm not a monotone hacker, so I don't know what form it should take, though I'll go ahead and suggest one anyway in an attempt to get the conversation rolling.

These transformations are not so common that new ones will come along every day, and we want them to be fast, so hard-coded-in-C++ is better than implement-with-Lua-hooks.  (Why do they need to be fast?  Because all monotone operations that examine local files, like "diff" and "status", will need to convert those files back to canonical form before examining them.)  Each transformation should have an associated file attribute with a well-defined default and, if desired, a per-user setting that also has a well-defined default.

And in response to Nathaniel's talk of pre- vs post-commit transformations, I assert that many of these transformations can fail, and if the transformation fails the checkin must fail too, so I further assert that most of the transformations must be pre-commit.
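
To make the pre-commit argument concrete, here is a minimal sketch in Python (not monotone code).  The names `TransformError` and `commit`, and the idea of a transformation as a plain function taking a path and the file's bytes, are all assumptions made for illustration.  The point is only the ordering: every file is converted to canonical form first, and nothing is stored unless all conversions succeed.

```python
class TransformError(Exception):
    """Raised by a transformation that cannot convert a file (hypothetical)."""


def commit(files, transforms, store):
    """Convert every file to canonical form, then store them all or none.

    files:      dict mapping path -> local file contents (bytes)
    transforms: list of functions (path, data) -> canonical data
    store:      dict standing in for the database
    """
    canonical = {}
    failures = []
    for path, data in files.items():
        try:
            for transform in transforms:
                data = transform(path, data)
            canonical[path] = data
        except TransformError as err:
            failures.append((path, str(err)))
    if failures:
        # Any failed transformation fails the whole checkin.
        raise TransformError(failures)
    store.update(canonical)
```

A post-commit design would have already written the revision by the time a failure showed up, which is why a transformation that can fail has to run before the commit.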

As an optional final measure, any conversion that can fail should also provide a backup transformation that never fails, so that mtn can ask the user "Conversion <x> failed on file <y>: <context-specific failure message>.  What do you want to do? [A]bort [R]etry [F]orce conversion".  That way, if a user accidentally adds naked lfs to a file that is supposed to be crlf (by, say, editing a file on Linux mounted from a coworker's Windows box), they can have mtn repair the file on the spot instead of hunting down the stray line endings by hand.

EOL conversion
    File attribute: mtn:eol
        Default value: verbatim
        Valid values:
            verbatim - store unchanged, don't care
            local - store as crs, convert to/from local eol convention, fail on checkin if file has anything besides the local eol convention
            cr - file must always have crs, "transformation" only means "fail on checkin if has anything besides crs"
            lf - file must always have lfs, "transformation" only means "fail on checkin if has anything besides lfs"
            crlf - file must always have crlfs, "transformation" only means "fail on checkin if has anything besides crlfs"
    Per-user setting: sets the "local" eol convention
        Default value: (established by the local eol convention for the platform)
        Valid values: verbatim cr lf crlf
    Note that if either the attribute or the per-user setting is "verbatim", the file contents are unchanged.
    Transformation must be done pre-commit, because it can fail.
    If transformation fails, fallback is to forcefully convert all cr/lf/crlfs to the desired eol.
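
The EOL rules above can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not monotone code: the function names (`eol_checkin`, `eol_checkout`, `force_eol`) and the `EolError` exception are hypothetical, and `STORED` follows the "store as crs" wording literally.

```python
import re

# Byte sequences for each named convention in the proposal.
EOLS = {"cr": b"\r", "lf": b"\n", "crlf": b"\r\n"}
ANY_EOL = re.compile(rb"\r\n|\r|\n")
STORED = "cr"  # assumption: taken from "store as crs"; change if desired


class EolError(Exception):
    """A file contains a line ending its mtn:eol attribute does not allow."""


def force_eol(data, eol):
    """Fallback that never fails: rewrite every cr/lf/crlf as `eol`."""
    return ANY_EOL.sub(lambda m: EOLS[eol], data)


def eol_checkin(data, attr, local, forced=False):
    """Local form -> stored form.  Raises EolError unless `forced`."""
    if "verbatim" in (attr, local):
        return data  # either setting being verbatim leaves the file alone
    expected = local if attr == "local" else attr
    if forced:
        data = force_eol(data, expected)
    for lineno, m in enumerate(ANY_EOL.finditer(data), 1):
        if m.group() != EOLS[expected]:
            raise EolError("line %d ends with %r, expected %s"
                           % (lineno, m.group(), expected))
    # Only "local" files are rewritten; cr/lf/crlf files are merely checked.
    return force_eol(data, STORED) if attr == "local" else data


def eol_checkout(data, attr, local):
    """Stored form -> local form.  Cannot fail."""
    if attr == "local" and local != "verbatim":
        return force_eol(data, local)
    return data
```

With `attr="crlf"`, a file containing a stray lf raises `EolError` on checkin; retrying with `forced=True` corresponds to the "[F]orce conversion" answer.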

RCS keyword expansion
    File attribute: mtn:rcs-keywords
       Default value: verbatim
       Valid values:
          verbatim - no change
          rcs - expand RCS keywords on checkout, contract RCS keywords on checkin
    Per-user setting: do you really want RCS keyword expansion?
       Default value: rcs
       Valid values: verbatim rcs
    Note that if either the attribute or the per-user setting is "verbatim", the file contents are unchanged.
    Conversion never fails, so it could theoretically be pre-commit or post-commit.
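
For illustration, expansion and contraction might look like the following Python sketch.  The function names and the `values` dictionary are hypothetical; where monotone would get the values from (revision id, author cert, and so on) is not specified here.  Only a handful of the standard RCS keywords are listed.

```python
import re

# A subset of the standard RCS keywords.
KEYWORDS = ("Author", "Date", "Id", "Revision")

# Matches both the contracted form "$Id$" and the expanded form "$Id: ... $".
KEYWORD_RE = re.compile(r"\$(%s)(?::[^$\n]*)?\$" % "|".join(KEYWORDS))


def contract_keywords(text):
    """Checkin direction: "$Id: anything $" becomes "$Id$"."""
    return KEYWORD_RE.sub(lambda m: "$%s$" % m.group(1), text)


def expand_keywords(text, values):
    """Checkout direction: "$Id$" becomes "$Id: <value> $".

    Keywords with no entry in `values` are left in contracted form.
    """
    def replace(match):
        name = match.group(1)
        if name not in values:
            return "$%s$" % name
        return "$%s: %s $" % (name, values[name])
    return KEYWORD_RE.sub(replace, text)
```

Both directions are plain substitutions that leave unrecognized text alone, which is why this conversion has no failure case.  The sketch assumes values never contain a `$` or a newline.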

Character set transformation
    File attribute: mtn:charset
       Default value: verbatim
       Valid values:
          verbatim - no change
          utf-8 - store internally as UTF-8, convert to the local character set on checkout and back on checkin
          ascii - enforces 7 bit characters!
          (I dunno what else goes here)
    Per-user setting: the local character set
       Valid values:
          verbatim
          utf-8
          utf-16
          ascii
          (I dunno what else goes here)
    Note that if either the attribute or the per-user setting is "verbatim", the file contents are unchanged.
    Transformation must be done pre-commit, because it can fail.
    If transformation fails, fallback is to throw away all untransformable characters.
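
A minimal Python sketch of the character set rules, using the standard codec machinery.  The function names and `CharsetError` are hypothetical; the attribute and per-user values are passed straight through as Python codec names, which happens to work for `utf-8`, `utf-16`, and `ascii`.

```python
class CharsetError(Exception):
    """A file contains characters that cannot be converted (hypothetical)."""


def _convert(data, source, target, forced):
    # "strict" raises on the first untransformable character;
    # "ignore" is the never-fails fallback that throws such characters away.
    errors = "ignore" if forced else "strict"
    try:
        return data.decode(source, errors=errors).encode(target, errors=errors)
    except UnicodeError as err:
        raise CharsetError(str(err))


def charset_checkin(data, attr, local, forced=False):
    """Local character set -> stored character set."""
    if "verbatim" in (attr, local):
        return data
    return _convert(data, local, attr, forced)


def charset_checkout(data, attr, local, forced=False):
    """Stored character set -> local character set."""
    if "verbatim" in (attr, local):
        return data
    return _convert(data, attr, local, forced)
```

With `attr="ascii"`, any byte outside the 7-bit range makes `charset_checkin` raise, which is the "enforces 7 bit characters" behavior; forcing the conversion silently drops those characters.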


Furthermore, monotone clearly wants a way of establishing "default" attributes for files.  I assert that for now monotone wants a database of file extensions -> default attributes, including establishing mime-types for files.  (Yep, I've been won over by the "files should have associated mime-types" camp.)  I suggest that this should be per-database, but easily extracted to a simple text format, swapped/edited, and re-introduced.  It should take the simplest form possible.  I don't know how it should be stored in the database, but its simple text format should be: each entry is one line listing one or more extensions (separated by whitespace), followed by n name/value attribute pairs, one per line; entries are separated by blank lines, and # starts a line comment.  Like so:
    # comment
    extension 1 [ extension 1b [ extension 1c [ ... ] ] ]
    default attribute 1
    default attribute 2
    ...
    default attribute n

    extension 2 [ extension 2b [ extension 2c [ ... ] ] ]
    default attribute 1
    default attribute 2
    ...
    default attribute n

A specific example:
    # C++ source files
    .cpp .cxx .c++ .cc
    mtn:mime-type = text/x-c++src
    mtn:eol = local
    mtn:rcs-keywords = rcs

    # Photoshop documents
    .psd
    mtn:mime-type = application/photoshop
    mtn:binary = true
(It's fine if mtn doesn't reconstitute comments when you extract its list of default attributes.)
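
A parser for this text format is short.  The Python sketch below is illustrative only: `parse_defaults` and `defaults_for` are hypothetical names, and it assumes that `#` comments occupy a whole line and that attribute lines always take the `name = value` form shown in the example.

```python
import os.path


def parse_defaults(text):
    """Parse the extension -> default attributes format into a dict.

    Returns {extension: {attribute name: value}}.
    """
    defaults = {}
    entry = []  # non-comment lines of the entry being read

    def flush():
        if not entry:
            return
        attrs = {}
        for line in entry[1:]:
            name, sep, value = line.partition("=")
            if not sep:
                raise ValueError("expected 'name = value', got %r" % line)
            attrs[name.strip()] = value.strip()
        for ext in entry[0].split():  # first line lists the extensions
            defaults.setdefault(ext, {}).update(attrs)
        entry.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # comments are dropped, not preserved
        if line:
            entry.append(line)
        else:
            flush()  # a blank line ends the current entry
    flush()
    return defaults


def defaults_for(filename, defaults):
    """Look up the default attributes for a file by its extension."""
    return defaults.get(os.path.splitext(filename)[1], {})
```

Fed the specific example above, `defaults_for("main.cc", ...)` would return the three C++ attributes, and a file with an unlisted extension would get an empty set of defaults.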

In the bright, glorious, Jetsons-like future of monotone "policy", the mechanism for setting default attributes for files is something that should be done by the policy machinery.  None of us have an inkling what that will be like, so it's not really an interesting topic yet.


How's that?


larry
