
From: James Youngman
Subject: Re: [Bug-cssc] bug-CSSC post from address@hidden requires approval
Date: Sun, 8 May 2011 21:27:10 +0100

On Sun, May 8, 2011 at 5:55 PM, Joerg Schilling
<address@hidden> wrote:
> James Youngman <address@hidden> wrote:
>
>> On Thu, May 5, 2011 at 12:07 PM, Joerg Schilling
>> <address@hidden> wrote:
>> > Also note that SCCS is expected to dump core in case you let it read a
>> > history file that contains a flag from outside the range 'a'..'z'.
>>
>> Expected?   I'm sure nobody has ever stated dumping core as a requirement.
>
> Maybe you misinterpret the word "expected" here.

Looks like it.

> People who know the SCCS source
> (and as this source was made available to anyone around 1994 with CSRG,
> everybody could check) expect SCCS to dump core in case flags outside 'a'..'z'
> are used.
>
>> > Introducing _new_
>> > flags from outside the range 'a'..'z' thus only makes sense if the history
>> > format is made incompatible in a way that causes historic versions of SCCS
>> > to abort with:
>>
>> One could argue that this is better.   Exiting with a zero status
>> while not producing the data in the g-file that the user wanted (e.g.
>> not expanding the extended keywords) violates the principle of least
>> surprise.
>>
>>
>> > get s.xx9
>> > ERROR [s.xx9]: format error at line 4 (co4)
>> >
>> > sccs help co4
>> > co4:
>> > "format error at line ..."
>> > The format of the SCCS file is logically invalid.  The error
>> > was discovered at the stated line.  See if you can find
>> > the problem with the prs command.  If not, do a "help stuck".
>> >
>> > /opt/schily/ccs/bin/val -T s.xx9
>> >    s.xx9: invalid delta table at line 4
>> >    s.xx9: corrupted SCCS file
>> >
>> > ^Ah41801
>> > ^As 00001/00001/00013
>> > ^Ad D 1.2 07/01/27 15:05:05 joerg 2 1
>> > ^AS test
>> > ^Ac man neu
>> > ^Ae
>>
>> Not really sure what point you're making here.   Flag settings are
>> introduced with ^Af.
>
> You correctly replied to the previous chunk. Why didn't you understand this
> one too? It is related, as it shows a method to abort historic SCCS
> implementations. Didn't you check which variations on a history file are
> accepted by SCCS and which are not?

No.  Because I have never needed to change the file format,
experimenting with varying the file's content to discover which
changes other implementations don't notice wasn't that useful.
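
(For readers following along: as I understand the s-file format, a flag
setting is a control line of the form "^Af <letter> [value]", for
example

    ^Af v
    ^Af q some project text

so the 'a'..'z' question above is about which letters may follow ^Af.)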


>> > Sun already worked on a lot of buffer overflow problems in SCCS and I fixed
>> > a lot of other such problems. I expect that the SCO source has no such fixes
>> > and that it is better to upgrade to more recent SCCS versions than to
>> > continue to use the SCO variant.
>>
>> It's probably more an issue for checking out historic source files.
>
> In these cases, the x flag is probably unimportant, and this specific SCO
> extension is less than 20 years old anyway.

I don't see it like that right now, since somebody actually asked me
to implement the flag for compatibility with the OpenServer
implementation.


>> > I have no idea how the checksum is computed in cases 4 & 5. It cannot be the
>> > standard SCCS algorithm on the plain s.file starting with line 2.
>>
>> I don't know either.   But a better checksum algorithm wouldn't be
>> such a bad plan.  The historic one is very weak.   Fortunately typical
>> source repositories are also very small.
>
> What is typical and what is small?

My experience is that data modification rates are the most important
factor, so a smaller but faster-mutating source base can be more at
risk than a larger, static one.  I've never seen data corruption in a
source repository, only in other kinds of data.  For that other data,
I've seen failures from both cksum (trivially easy to demonstrate) and
Adler32 (mostly much more reliable than any 16-bit scheme, but I've
seen it fail on double-bit flips in faulty network hardware).
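
To make concrete how weak the historic 16-bit byte-sum is, here is a
minimal sketch (my own illustration, not CSSC's code; I'm assuming the
classic scheme of summing every byte after the ^Ah line).  Any
rearrangement of the same bytes, such as two transposed lines, leaves
the sum unchanged:

    /* Sketch of the historic SCCS checksum: a 16-bit sum of the bytes
       after the ^Ah line.  Swapping whole lines preserves the sum, so
       that kind of corruption goes undetected. */
    #include <stdio.h>

    static unsigned sccs_cksum(const unsigned char *buf, size_t len)
    {
        unsigned sum = 0;
        for (size_t i = 0; i < len; i++)
            sum += buf[i];      /* plain byte sum... */
        return sum & 0xFFFF;    /* ...truncated to 16 bits */
    }

    int main(void)
    {
        const unsigned char a[] = "delta one\ndelta two\n";
        const unsigned char b[] = "delta two\ndelta one\n"; /* lines swapped */
        /* Prints the same value twice: the transposition is invisible. */
        printf("%u %u\n", sccs_cksum(a, sizeof a - 1),
                          sccs_cksum(b, sizeof b - 1));
        return 0;
    }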

> RCS has no checksum at all, and for this reason many RCS repositories are
> defective without their owners' knowledge. I personally know of no case where
> the historic checksum did not detect a problem, but you are correct, it should
> be enhanced.
>
>> > Compression is a really interesting feature, as it typically allows the
>> > size of a history file to be reduced to less than the size of the g-file.
>> > Compression is needed in order to be competitive with Mercurial (e.g. for
>> > a "clone" over the network).
>>
>> Overall file size is certainly a useful axis along which to optimise.
>>  But there are other important ones, for example the relationship
>> between the runtime and the total number of deltas (even non-ancestor
>> deltas).   For SCCS this is linear, but a redesigned file format could
>> reduce it.
>
> I am sure you are mistaken: the time to unpack any release from a SCCS history
> file depends only on the size of the history file, not really on the number
> of deltas. If you have 99999 deltas (which is expected to take more than 200
> years to create), then it may be a bit slower, but there is no relation to the
> higher time needed by RCS.

Performance in reading an SCCS file is trivially demonstrated to
depend linearly on the number of deltas, because you have to read the
whole delta table.  But since the metadata is normally smaller than
the data, the constant factor is almost always small enough that this
won't dominate.  However, as we read the file, we need to decide
whether any given I/E/D line means we should use the data lines we're
reading.  That decision for each control line is either going to be
O(1) but require O(N) setup (i.e. initialising and checking a bitmap)
or be O(ln N) (i.e. using some kind of tree).
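
A minimal sketch of the bitmap variant (illustrative only; the names
are mine, not CSSC's): an O(N) setup pass marks every delta serial
that contributes to the requested version, after which each
control-line decision is a constant-time bit test:

    #include <stdlib.h>

    /* One bit per delta serial number (serials run 1..nserials). */
    typedef struct {
        unsigned char *bits;
        size_t nserials;
    } serial_set;

    /* O(N) setup: allocate a zeroed bitmap covering every serial. */
    static serial_set *serial_set_new(size_t nserials)
    {
        serial_set *s = malloc(sizeof *s);
        if (s != NULL)
        {
            s->nserials = nserials;
            s->bits = calloc(nserials / 8 + 1, 1);
        }
        return s;
    }

    /* Mark a serial as contributing to the requested version. */
    static void serial_set_add(serial_set *s, size_t serial)
    {
        s->bits[serial / 8] |= 1u << (serial % 8);
    }

    /* O(1): does this I/E/D control line affect the version we want? */
    static int serial_set_member(const serial_set *s, size_t serial)
    {
        return (s->bits[serial / 8] >> (serial % 8)) & 1u;
    }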

This is probably overcomplicating the issue though.   Basically with
the SCCS file format there is no way to get sublinear performance.


> The problem with the SCCS file format is that there have been false and
> unproven claims from the people behind RCS. Larry McVoy was the first who did
> tests, and he discovered that RCS is not faster than SCCS. Even today, many
> people correctly believe (speed aside) that the SCCS history file format is
> one of the best formats.

Any problems with the SCCS file format are properties of that format
itself.  Incorrect claims about RCS are irrelevant to them.

> Did you make tests and do you believe that there is a real performance 
> problem?

Essentially no, not at the individual file level, because the number
of deltas is small enough that this isn't going to be a killer.  While
linearity in the total number of deltas is a fundamental limitation of
the SCCS file format, it's unlikely to be the bottleneck in terms of
wall-clock execution time.

> Do you have an idea for a better format?

I haven't worked on it.   If I were trying to work on performance, I'd
concentrate more on reducing the total number of file-open and
file-stat operations.

James.


