[Monotone-devel] Organizing a retrocomputing project
From: hendrik
Subject: [Monotone-devel] Organizing a retrocomputing project
Date: Sat, 28 Jun 2008 11:25:36 -0400
User-agent: Mutt/1.5.9i
I've opened a new project, a68h, at mtn-host.prjek.net.
I'm looking for advice about how to do the initial checkin, preferably
before I actually do it. There are a few issues that may be of wider
interest, and may relate to things planned (or not) for future versions
of monotone.
The project is to restore an ancient Algol 68 compiler for the IBM/360,
and make it run on today's popular hardware.
I'm starting from two development snapshots, taken approximately four
years apart. In the intervening period, several development directions
were aborted because of limitations in the toolset being used. But the
final stages of each of these are still present in the second snapshot,
though they were no longer in use. I do not have complete development
snapshots of these abandoned development directions -- just
final snapshots of the files that were discarded.
Now clearly this history can be included in the monotone archive. It
affects one major component of the system, the code generator. The rest
of the compiler (the majority of the code) was just improved in an
orderly way between the two snapshots. It is unlikely that the
discarded code will ever be of any use, since it is
machine-dependent and hardware has changed radically in the meantime.
Large parts of the code generator that *was* in the final version are
also slated for replacement, but it is possible that significant parts
will remain.
So the first question is,
Is it worthwhile to represent this ancient history in the repository?
Next, the code base is stored in EBCDIC in IBM's FB records,
and some of it (mostly the test suite) is in IBM's VBS format.
For those not in the know, the FB records are fixed-length records, 80
bytes each, now concatenated into a long Linux binary file. In Linux,
you just read them 80 bytes at a time; each 80 bytes is a line of the
source code, 72 bytes of EBCDIC text, and 8 bytes of sequence number.
Line boundaries are indicated by counting bytes; there are no newline
characters of any kind.
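
As an illustration of the FB layout described above, here is a minimal
sketch in Python. It assumes the files use IBM code page 037 (Python's
"cp037" codec); the actual code page of these snapshots is not stated and
may differ. The function name split_fb_records is hypothetical.

```python
RECORD_LEN = 80   # fixed-length FB record, no newline characters
TEXT_LEN = 72     # bytes 1-72: source text; bytes 73-80: sequence number


def split_fb_records(data: bytes, codec: str = "cp037"):
    """Return a list of (text, sequence) pairs from concatenated FB records.

    The codec default (cp037) is an assumption, not something known
    about the original files.
    """
    if len(data) % RECORD_LEN != 0:
        raise ValueError("length is not a multiple of 80 bytes")
    records = []
    for offset in range(0, len(data), RECORD_LEN):
        record = data[offset:offset + RECORD_LEN]
        text = record[:TEXT_LEN].decode(codec)       # 72 characters of source
        sequence = record[TEXT_LEN:].decode(codec)   # 8-character sequence field
        records.append((text, sequence))
    return records


# Usage: build one synthetic record and split it again.
sample = ("BEGIN".ljust(72) + "00000010").encode("cp037")
first_text, first_seq = split_fb_records(sample)[0]
```

Nothing is discarded here: trailing spaces and sequence numbers are kept,
so the original bytes can be rebuilt from the pairs.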
It's not hard to convert to ASCII, but any reasonable conversion does
some damage to the data -- there are a few characters that don't have an
ideal translation, and any sane change to Unix-style lines would involve
the removal of trailing spaces and line numbers.
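
The lossy conversion described above might look like the following
sketch. It again assumes code page 037 via Python's "cp037" codec, and
fb_to_unix_text is a hypothetical name. Characters outside that code page
(for instance special print-train glyphs) are not handled.

```python
def fb_to_unix_text(data: bytes, codec: str = "cp037") -> str:
    """Convert concatenated 80-byte FB records to Unix-style text.

    Lossy by design: the 8-byte sequence field and any trailing
    spaces are dropped. The codec default is an assumption.
    """
    lines = []
    for offset in range(0, len(data), 80):
        text = data[offset:offset + 72].decode(codec)  # skip bytes 73-80
        lines.append(text.rstrip(" "))                 # drop trailing padding
    return "".join(line + "\n" for line in lines)


# Usage: two synthetic records become two newline-terminated lines.
sample = (
    "BEGIN".ljust(72) + "00000010" +
    "  END".ljust(72) + "00000020"
).encode("cp037")
unix_text = fb_to_unix_text(sample)
```

Because the sequence numbers and padding are gone, the original file
cannot be rebuilt exactly from the result.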
Does it make sense to try to store the EBCDIC files into the monotone
repository? Monotone, I understand, prefers to store everything
internally in Unicode (possibly UTF-8 to save space). Now there are
reversible translations of EBCDIC to and from Unicode, but I don't think
the standard one plays nicely with some of the weirder characters on the
TN print train (such as corners for drawing boxes). And there's still
the matter of line endings -- counting bytes won't work after the
conversion. Is there a Unicode newline (say) that is not a translation
of an EBCDIC character?
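
One way to explore the line-ending question is to test a candidate
separator against the image of a given EBCDIC code page. The sketch below
uses Python's "cp037" codec purely as an assumed stand-in; it says nothing
about TN print train characters, which that code page may not cover. The
function names are hypothetical.

```python
def ebcdic_image(codec: str = "cp037") -> set:
    """Set of Unicode characters that some byte 0x00-0xFF decodes to."""
    return set(bytes(range(256)).decode(codec))


def is_reversible(codec: str = "cp037") -> bool:
    """True if every byte value survives decode followed by encode."""
    all_bytes = bytes(range(256))
    return all_bytes.decode(codec).encode(codec) == all_bytes


def is_safe_separator(candidate: str, codec: str = "cp037") -> bool:
    """True if the candidate is not the translation of any byte."""
    return candidate not in ebcdic_image(codec)


# Usage: compare an ordinary newline with U+2028 LINE SEPARATOR.
newline_is_safe = is_safe_separator("\n")
line_separator_is_safe = is_safe_separator("\u2028")
```

Under Python's cp037 table the ordinary newline is itself the translation
of an EBCDIC byte, while U+2028 LINE SEPARATOR is not, so the latter could
mark record boundaries without colliding with converted data. Whether
monotone would treat it as a line ending is a separate question.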
Are there any plans for monotone to address character set issues beyond
CR-LF vs \n ? Should there be?
Should I just check all this in as binary files? Should I convert to
Unicode as if monotone recognised the EBCDIC character set and unicoded
it?
Or should I just abandon that bit of history and just use a plain ASCII
version of the latest snapshot and work from there?
---
There are some real questions here, that are likely to be of relevance
to others trying to work in the archaeology of computing. One that
doesn't affect me in this project is:
If you have reconstructed history from ancient snapshots and checked
it in accordingly, what do you do when you discover *another* ancient
snapshot that fits before all of them, or in between two existing ones?
-- hendrik