[Monotone-devel] I18n, filename encoding, text file encoding
From: Christof Petig
Subject: [Monotone-devel] I18n, filename encoding, text file encoding
Date: Fri, 28 Nov 2003 11:20:43 +0100
User-agent: Mozilla/5.0 (X11; U; Linux ppc; de-AT; rv:1.5) Gecko/20031110 Debian/1.5-3
I recently spent some time thinking about a decent scheme for
internationalized SCM (source code management) support.
To make local use possible now, monotone should allow binary filenames
(containing spaces as well as high-bit characters, in whatever encoding
the importer uses). The lack of this is actually what keeps me from
dumping CVS right now. A later migration to UTF-8 etc. could be done
using rename operations.
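To illustrate the "store the bytes as they are, rename later" idea, here is a minimal Python sketch. It is not monotone code: the function names and the assumption that the old names are ISO 8859-1 are mine, purely for illustration. It treats names as opaque bytes and only works out which renames a later UTF-8 migration would need.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def plan_utf8_renames(names, legacy_encoding="iso-8859-1"):
    """List (old, new) byte-name pairs for names that are not yet UTF-8.

    legacy_encoding is an assumption about what the importer used;
    the stored names themselves are never interpreted until migration.
    """
    renames = []
    for old in names:
        if is_valid_utf8(old):
            continue  # pure ASCII or already UTF-8: leave untouched
        new = old.decode(legacy_encoding).encode("utf-8")
        renames.append((old, new))
    return renames
```

Note that a legacy name which happens to be valid UTF-8 by coincidence would be skipped here, so a real migration would still need a human to look over the plan.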
IIRC a lot of people have not yet migrated to UTF-8 when it comes to file
names. (To be honest, the filenames on all computers I know of are still
8859-1.) And I cannot see _all_ people using UTF-8 for filenames in the
near future (about five years, that is). Western Europe tends to stick
with 8859-1/15, and I suspect CJK (East Asian) users will stick to one of
their encodings. But to support real international development (or even
development between developers using different file name encodings,
which is likely during the transition to UTF-8), monotone might support
filename encoding conversion at the interface between the checked-out
version and the database version.
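A rough sketch of what such a conversion at the checkout/database boundary could look like, again in Python and again with made-up names. It assumes the database form is UTF-8 and that each developer configures a local encoding; neither is something monotone defines.

```python
def to_db_name(local_name: bytes, local_encoding: str = "iso-8859-1") -> bytes:
    """Convert a working-copy file name to the (assumed) UTF-8 database form."""
    return local_name.decode(local_encoding).encode("utf-8")


def to_local_name(db_name: bytes, local_encoding: str = "iso-8859-1") -> bytes:
    """Convert a database name back to the developer's local encoding.

    Raises UnicodeEncodeError when the name cannot be represented
    locally, e.g. a CJK name checked out on an ISO 8859-1 system.
    """
    return db_name.decode("utf-8").encode(local_encoding)
```

The interesting case is the failure: a developer on 8859-1 checking out a file whose name only exists in a CJK encoding. The sketch just raises an error there; what the tool should really do in that situation is exactly the open question.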
This leads me to another problem: sometimes file name encoding and file
content encoding are not the same. I have a lot of UTF-8 encoded files
for gtk2 projects while I still use ISO 8859 for file names.
People using different code sets may also want to co-edit text files. I
suspect that this is far more likely to occur in CJK.
So several things are possible:
- stay away from this can of worms and make the people on a project agree
on name and content encoding [while accepting "illegal in UTF-8" sequences
for the poor people still using different code sets]. Assume UTF-8 names
for a Windows client (which clearly has native Unicode filenames).
- tackle file name conversion and stay away from content conversion
- go for full encoding transparency. [Someone from CJK should comment on
whether this is interesting.]
This is clearly confusing, but 8-bit transparency would already cover my actual needs.
Christof
PS: There's a misspelling in
http://www.venge.net/monotone/self-hosting.html:
$ monotone --db=monotone.db lscerts manifest dcc23
should read
$ monotone --db=monotone.db ls certs manifest dcc23