CVS libidn/doc/specifications
From: libidn-commit
Subject: CVS libidn/doc/specifications
Date: Thu, 22 Dec 2005 09:40:15 +0100
Update of /home/cvs/libidn/doc/specifications
In directory dopio:/tmp/cvs-serv22716
Added Files:
draft-iab-idn-nextsteps-01.txt
Log Message:
Add.
--- /home/cvs/libidn/doc/specifications/draft-iab-idn-nextsteps-01.txt 2005/12/22 08:40:15 NONE
+++ /home/cvs/libidn/doc/specifications/draft-iab-idn-nextsteps-01.txt 2005/12/22 08:40:15 1.1
Network Working Group J. Klensin
Internet-Draft
Expires: June 21, 2006 P. Faltstrom
IAB
December 18, 2005
Review and Recommendations for Internationalized Domain Names (IDN)
draft-iab-idn-nextsteps-01.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on June 21, 2006.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This note describes issues raised by the deployment and use of
Internationalized Domain Names. It describes problems that arise
both when names are registered and when those names are used in the
DNS. It recommends that the IETF update the IDN-related RFCs,
proposes a framework to be followed in doing so, and summarizes and
identifies some work that is required outside the IETF. In
particular, it proposes that some changes be investigated for the
Klensin & Faltstrom Expires June 21, 2006 [Page 1]
Internet-Draft IAB -- IDN Next Steps December 2005
IDNA standard and its supporting tables, based on experience gained
since those standards were completed.
Table of Contents
1. Introduction
1.1. Status of this Document and its Recommendations
1.2. The IDNA Standard
1.3. Unicode Documents
1.4. Definitions
1.4.1. language
1.4.2. script
1.4.3. multilingual
1.4.4. localization
1.4.5. internationalization
1.5. Statements and Guidelines
1.5.1. IESG Statement
1.5.2. ICANN statements
2. Problem
2.1. Examples of issues
2.1.1. Language specific character matching
2.1.2. Multiple scripts
2.1.3. Normalization and Character Mappings
2.1.4. URL on a bus
2.1.5. Bidirectional text
2.1.6. Confusable Character Issues
2.1.7. The IESG Statement and IDNA issues
2.1.8. Versions of Unicode
3. Framework for next steps in IDN development
3.1. Issues within the scope of the IETF
3.1.1. Review of IDNA
3.1.2. Non-DNS and Above-DNS Internationalization Approaches
3.1.3. Security issues, certificates, etc.
3.1.4. Non US-ASCII in local part of email addresses
3.1.5. Use of the Unicode Character Set in the IETF
3.2. Issues that fall within the purview of ICANN
3.2.1. Dispute resolution
3.2.2. Policy at registries
3.2.3. IDN TLDs
4. Specific Recommendations for Next Steps
4.1. Reduction of permitted character list
4.2. Elimination of all non-language characters
4.3. Elimination of word-separation punctuation
4.4. Updating to new versions of Unicode
4.5. Combining Characters and Character Components
4.6. Role and Uses of the DNS
5. Security Considerations
6. Acknowledgements
7. Change History
7.1. Changes for version -01
8. References
8.1. Normative References
8.2. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
1.1. Status of this Document and its Recommendations
This document reviews the IDN landscape from an IETF perspective and
presents the recommendations and conclusions of the IAB, based
partially on input from an ad hoc committee charged with reviewing
IDN issues and the path forward (See Section 6). Its recommendations
are recommendations to the IETF, or in a few cases to other bodies,
for topics to be examined and actions to be taken if those bodies,
after their examinations, consider those actions appropriate.
IMPORTANT: The IAB has not yet reached consensus that this document
is ready for final publication. While considerable input from the
members of the ad hoc committee went into the document, no claim is
made that it represents the consensus of that group. However, the
IAB concluded that it was appropriate to expose these versions, as
working drafts, for community comment and feedback. Such comments
should be sent to address@hidden
1.2. The IDNA Standard
During 2002 the IETF created the following RFCs that, together, define
IDNs:
RFC 3454 Preparation of Internationalized Strings ("stringprep")
[RFC3454].
Stringprep is a generic mechanism for taking a Unicode string and
converting it into a canonical format. Stringprep itself is just
a collection of rules, tables, and operations. Any protocol or
algorithm that uses it must define a "stringprep profile", which
specifies which of those rules are applied, how, and with which
characteristics.
RFC 3490 Internationalizing Domain Names in Applications (IDNA)
[RFC3490].
IDNA is the base specification in this group. It specifies that
Nameprep is used as the stringprep profile for domain names, and
that Punycode is the relevant encoding mechanism for use in
generating an ASCII-compatible encoding ("ACE") form of the name. It
also applies some additional conversions and character filtering
that are not part of Nameprep.
RFC 3491 Nameprep: A Stringprep Profile for Internationalized Domain
Names (IDN) [RFC3491].
Nameprep is one such profile. It is designed to meet the specific
needs of IDNs and, in particular, to support case-folding for
scripts that support what are traditionally known as upper and
lower case forms of the same letters. The result of the nameprep
algorithm is a string containing a subset of the Unicode Character
set, normalized and case folded so that case insensitive
comparison can be made.
RFC 3492 Punycode: A Bootstring encoding of Unicode for
Internationalized Domain Names in Applications (IDNA) [RFC3492].
Punycode is a mechanism for encoding a Unicode string in ASCII
characters. The characters used are the same subset of
characters that are allowed in the hostname definition of DNS,
i.e., the "letter, digit, and hyphen" characters, sometimes known
as "LDH".
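As a rough illustration of how these four specifications fit together,
the following Python sketch uses the standard library's
"encodings.idna" module and "punycode" codec, which implement the 2003
IDNA algorithms. The helper name to_ace_label is hypothetical, and the
sketch omits parts of the ToASCII operation in RFC 3490 (for example,
label length limits and the STD 3 ASCII rules), so it should be read
as a simplified outline and not as a complete implementation.

```python
from encodings.idna import nameprep  # stdlib Nameprep (RFC 3491) profile


def to_ace_label(label: str) -> str:
    """Simplified sketch: Nameprep, then Punycode with the ACE prefix.

    Hypothetical helper; it skips the length and STD 3 checks that the
    full ToASCII operation in RFC 3490 performs.
    """
    prepared = nameprep(label)  # mapping, case folding, NFKC, prohibited checks
    if prepared.isascii():
        # Pure-ASCII labels are not Punycode-encoded.
        return prepared
    # Punycode (RFC 3492) yields only letters, digits, and hyphens.
    return "xn--" + prepared.encode("punycode").decode("ascii")


ace = to_ace_label("B\u00fccher")          # "Bücher"
builtin = "B\u00fccher".encode("idna")     # stdlib codec, for comparison
print(ace, builtin)
```

For the label "Bücher", Nameprep folds the capital letter and Punycode
encodes the non-ASCII character, so the result uses only LDH
characters and matches what the built-in "idna" codec produces.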
1.3. Unicode Documents
Unicode is used as the base, and defining, character set for IDN.
Unicode is standardized by the Unicode Consortium, and synchronized
with ISO to create ISO/IEC 10646 [ISO10646]. At the time the RFCs
mentioned earlier were created, Unicode was at version 3.2. For
reasons explained later, the RFCs explicitly use Unicode version 3.2
[Unicode32] and no other version (see Section 2.1.8).
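The practical effect of pinning the RFCs to Unicode 3.2 can be
sketched with Python's standard library, which ships the stringprep
tables from RFC 3454 together with a frozen copy of the Unicode 3.2
database. The helper name unassigned_in_unicode_3_2 and the choice of
U+1E9E (a character added to Unicode well after version 3.2) are
illustrative assumptions, not taken from this document.

```python
import stringprep
import unicodedata


def unassigned_in_unicode_3_2(ch: str) -> bool:
    """True if ch falls in RFC 3454 Table A.1 (unassigned in Unicode 3.2).

    Hypothetical helper wrapping the stdlib table lookup.
    """
    return stringprep.in_table_a1(ch)


# The stdlib keeps a Unicode 3.2 snapshot for exactly this purpose.
print(unicodedata.ucd_3_2_0.unidata_version)

print(unassigned_in_unicode_3_2("a"))        # assigned long before 3.2
print(unassigned_in_unicode_3_2("\u1e9e"))   # added in a later Unicode version
```

A code point that is assigned in a current Unicode version can thus
still be treated as unassigned by software that follows the 2003 RFCs
strictly.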
Unicode is a very large and complex character set. (The term
"character set" or "charset" is used in a way that is peculiar to the
IETF and may not be the same as the usage in other bodies and
contexts.) The Unicode Standard and related documents are created
and maintained by the Unicode Technical Committee (UTC), one of the
committees of the Unicode Consortium.
The Consortium first published The Unicode Standard [Unicode10] in
1991, and continues to develop standards based on that original work.
Unicode is developed in conjunction with the International
Organization for Standardization, and it shares its character
repertoire with ISO/IEC 10646. Unicode and ISO/IEC 10646 function
equivalently as character encodings, but The Unicode Standard
contains much more information for implementers, covering -- in depth
-- topics such as bitwise encoding, collation, and rendering. The
Unicode Standard enumerates a multitude of character properties,
including those needed for supporting bidirectional text. The two
standards do use slightly different terminology.
1.4. Definitions
The following terms and their meanings are critical to understanding
IDNs and the rest of this document. These terms are derived from
[RFC3536], which contains additional discussion of some of them.
1.4.1. language
A language is a way that humans interact. The use of language occurs
in many forms, the most common of which are speech, writing, and
signing.
Some languages have a close relationship between the written and
spoken forms, while others have a looser relationship. RFC 3066
[RFC3066] discusses languages in more detail and provides identifiers
for languages for use in Internet protocols. Computer languages are
explicitly excluded from this definition. The most recent IETF work
in this area, and on script identification (see below), is documented
in [ltru-registry] and [ltru-initial].
1.4.2. script
A set of graphic characters used for the written form of one or more
languages. This definition is the one used in [ISO10646].
Examples of scripts are Latin, Cyrillic, Greek, Arabic, and Han (the
ideographs used in writing Chinese, Japanese, and Korean). RFC 2277
[RFC2277] discusses scripts in detail.
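To make the notion of a script concrete, the sketch below guesses the
script of each letter in a label from the first word of its Unicode
character name. This is only an approximation chosen for illustration:
Python's standard library does not expose the Unicode Script property,
and the helper name rough_scripts is hypothetical.

```python
import unicodedata


def rough_scripts(label: str) -> set:
    """Approximate the set of scripts used in a label.

    Assumption: the first word of a character's Unicode name ("LATIN",
    "CYRILLIC", "GREEK", "CJK", ...) stands in for its script. This is
    a heuristic, not the Unicode Script property.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])
    return scripts


print(rough_scripts("paypal"))         # letters from one script
print(rough_scripts("p\u0430ypal"))    # second letter is CYRILLIC SMALL LETTER A
```

The second label looks the same as the first in many fonts, yet it
draws on two scripts.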
1.4.3. multilingual
The term "multilingual" has many widely-varying definitions and thus
is not recommended for use in standards. Some of the definitions
relate to the ability to handle international characters; other
definitions relate to the ability to handle multiple charsets; and
still others relate to the ability to handle multiple languages.
1.4.4. localization
The process of adapting an internationalized application platform or
application to a specific cultural environment. In localization, the
same semantics are preserved while the syntax or presentation forms
may be changed.
Localization is the act of tailoring an application for a different
language or script or culture. Some internationalized applications
can handle a wide variety of languages. Typical users only
understand a small number of languages, so the program must be
tailored to interact with users in just the languages they know.
1.4.5. internationalization
In the IETF, "internationalization" means to add or improve the
handling of non-ASCII text in a protocol.
Many protocols that handle text only handle one script (often, a
subset of the characters used in writing English text), or leave the
question of what character set is used up to local guesswork (which
leads, of course, to interoperability problems). Adding non-ASCII
text to such a protocol allows the protocol to handle more scripts,
with the intention of being able to include all of the scripts that
are useful in the world. It should be noted that many English words
cannot be written in ASCII, various mythologies notwithstanding.
1.5. Statements and Guidelines
When the IDN RFCs were published, IESG and ICANN made statements that
were intended to guide deployment and future work. In recent months,
ICANN has updated its statement and others have also made
contributions.
1.5.1. IESG Statement
The IESG made a statement on IDNA
(http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt):
IDNA, through its requirement of Nameprep [RFC3491], uses
equivalence tables that are based only on the characters
themselves; no attention is paid to the intended language (if any)
for the domain name. However, for many domain names, the intended
language of one or more parts of the domain name actually does
matter to the users.
Similarly, many names cannot be presented and used without
ambiguity unless the scripts to which their characters belong are
known. In both cases, this additional information should be of
concern to the registry.
The statement is longer than this, but these paragraphs are the
important ones. The rest of the statement consists of explanations
and examples.
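The point that Nameprep's tables ignore the intended language can be
sketched with the standard library's Nameprep implementation. The
Turkish example is an assumption added here for illustration: in
Turkish orthography the lower-case form of "I" is the dotless "ı"
(U+0131), whereas Nameprep maps "I" to "i" for every label.

```python
from encodings.idna import nameprep  # stdlib Nameprep (RFC 3491) profile

# Nameprep case-folds by character alone, with no language input.
folded = nameprep("ISTANBUL")

# Illustrative assumption: what a Turkish-language lower-casing of the
# same string would produce (dotless i, U+0131, in first position).
turkish_expectation = "\u0131stanbul"

labels_match = folded == turkish_expectation
print(folded, turkish_expectation, labels_match)
```

The two strings differ, so a registry that cares about the language of
a label has to handle such cases through its own policy.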
1.5.2. ICANN statements
1.5.2.1. Initial ICANN Guidelines
Soon after the IDNA standard was adopted, ICANN produced an initial
version of its "IDN Guidelines" [ICANNv1]. This document was
intended to serve two purposes. The first was to provide a basis for
releasing the gTLD registries that had been established by ICANN from
a contractual restriction on the registration of labels containing
hyphens in the third and fourth positions. The second was to provide
a general framework for the development of registry policies for the
implementation of IDN.
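The contractual restriction mentioned above concerns labels with
hyphens in the third and fourth positions, which is the pattern that
the "xn--" ACE prefix occupies. A minimal check might look like the
following; the function name has_reserved_hyphens is hypothetical and
the sketch does not reproduce any registry's actual rules.

```python
def has_reserved_hyphens(label: str) -> bool:
    """True if the label has hyphens in its third and fourth positions.

    Hypothetical helper for illustration only.
    """
    # Positions three and four are indexes 2 and 3 in a zero-based string.
    return label[2:4] == "--"


for candidate in ("xn--bcher-kva", "my-site", "ab--cd"):
    print(candidate, has_reserved_hyphens(candidate))
```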
One of the key components of this framework was prescribing strict
compliance with RFCs 3490, 3491, and 3492. This established the ACE
scheme defined in nameprep as the sole such encoding to be used by
[1168 lines skipped]