Re: [Monotone-devel] second go at i18n spec


From: graydon hoare
Subject: Re: [Monotone-devel] second go at i18n spec
Date: Tue, 09 Dec 2003 11:08:24 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031115 Thunderbird/0.3

Christof Petig wrote:

Original idea:
----->8-------
If it would be not too much hassle the possibility to specify a different encoding for filenames than for file content would save me much trouble. [e.g. my toolkit uses UTF-8 for strings and my shell still likes ISO or vice versa.*)] But then the encoding for a Makefile might differ from a program source file => encoding per file? *2)
----->8-------

oh, sorry, I guess I wasn't clear enough: I meant to support exactly what you suggest. under the proposal, filenames are subject to conversion to a normal form, but the normal form only applies to data *inside* monotone (when calculating SHA1 values, xdeltas, etc). file names in the working copy will be written to the file system using the "system encoding", and file data may be subject to *any* conversion in and out of the database, not related to the "system encoding".

the only reason for normalization on filenames is that monotone reads and interprets the manifest file, so it must know what the character set is. so I convert all filename character codes to UTF-8 before letting monotone read them.

so, to elaborate: there are 2 hooks you write (with analogous hooks for line-ending conversion).

-- this is used to map normalized (internal -- UTF-8) filenames to and
-- from your filesystem. the UTF-8 side of the conversion is fixed, by
-- monotone, but this only applies to path names.

function system_charset()
  return "ISO-8859-1"
end


-- this is a per-file transformation, probably left blank or
-- returning nil (meaning "leave the file contents alone") but
-- possibly used to massage character codes to and from your
-- preferred forms for editing and for storage

function charconv(filename)

  if (string.find(filename, "%.java$")) then
    -- store java files as UTF-8 in monotone, check out as ISO
    return {"UTF-8", "ISO-8859-1"}
  end

  if (string.find(filename, "%.cbl%.jp$")) then
    -- keep japanese cobol stuff in EBCDIC, check out as UCS-2
    return {"EBCDIC-JP-KANA", "UCS-2"}
  end

  -- otherwise leave the file alone
  return nil
end
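
to make the split between the two hooks concrete, here is a minimal sketch of how a caller might read them together. the `describe` helper is hypothetical and not part of monotone; it only assumes what is stated above: the path name always goes between UTF-8 and `system_charset()`, and `charconv` returns either nil or a pair of {stored charset, checked-out charset}.

```lua
-- the two hooks, trimmed to a single rule for illustration
function system_charset()
  return "ISO-8859-1"
end

function charconv(filename)
  if string.find(filename, "%.java$") then
    -- stored as UTF-8 in monotone, checked out as ISO
    return {"UTF-8", "ISO-8859-1"}
  end
  -- otherwise leave the file alone
  return nil
end

-- hypothetical helper (an assumption, not a monotone function):
-- summarizes which conversions would apply to one file
function describe(filename)
  local out = "name: UTF-8 <-> " .. system_charset()
  local pair = charconv(filename)
  if pair == nil then
    return out .. "; contents: unchanged"
  end
  return out .. "; contents: " .. pair[1] .. " (stored) <-> "
             .. pair[2] .. " (checked out)"
end

print(describe("src/Main.java"))
print(describe("Makefile"))
```

the point of the sketch is that the name conversion never consults `charconv`, and the contents conversion never consults `system_charset()`, so the two encodings can differ per file.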





