CVS libidn/doc/specifications


From: libidn-commit
Subject: CVS libidn/doc/specifications
Date: Fri, 23 Dec 2005 23:41:40 +0100

Update of /home/cvs/libidn/doc/specifications
In directory dopio:/tmp/cvs-serv6511

Added Files:
        rfc4290.txt 
Log Message:
Add.


--- /home/cvs/libidn/doc/specifications/rfc4290.txt     2005/12/23 22:41:40     
NONE
+++ /home/cvs/libidn/doc/specifications/rfc4290.txt     2005/12/23 22:41:40     
1.1






Network Working Group                                         J. Klensin
Request for Comments: 4290                                 December 2005
Category: Informational


                Suggested Practices for Registration of
                  Internationalized Domain Names (IDN)

Status of This Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).

IESG Note

   This RFC is not a candidate for any level of Internet Standard.  The
   IETF disclaims any knowledge of the fitness of this RFC for any
   purpose and notes that the decision to publish is not based on IETF
   review apart from IESG review for conflict with IETF work.  The RFC
   Editor has chosen to publish this document at its discretion.  See
   RFC 3932 for more information.

Abstract

   This document explores the issues in the registration of
   internationalized domain names (IDNs).  The basic IDN definition
   allows a very large number of possible characters in domain names,
   and this richness may lead to serious user confusion about similar-
   looking names.  To avoid this confusion, the IDN registration process
   must impose rules that disallow some otherwise-valid name
   combinations.  This document suggests a set of mechanisms that
   registries might use to define and implement such rules for a broad
   range of languages, including adaptation of methods developed for
   Chinese, Japanese, and Korean domain names.












Klensin                      Informational                      [Page 1]

RFC 4290               IDN Registration Practices          December 2005


Table of Contents

   1. Introduction ....................................................3
      1.1. Background .................................................3
      1.2. The Nature and Status of these Recommendations .............4
      1.3. Terminology ................................................5
         1.3.1. Languages and Scripts .................................5
         1.3.2. Characters, Variants, Registrations, and Other
                Issues ................................................6
         1.3.3. Confusion, Fraud, and Cybersquatting ..................9
      1.4. A Review of the JET Guidelines .............................9
         1.4.1. JET Model .............................................9
         1.4.2. Reserved Names and Label Packages ....................10
      1.5. Languages, Scripts, and Variants ..........................11
         1.5.1. Languages versus Scripts .............................11
         1.5.2. Variant Selection ....................................13
      1.6. Variants are not a Universal Remedy .......................14
      1.7. Reservations and Exclusions ...............................14
         1.7.1. Sequence Exclusions for Valid Characters .............14
         1.7.2. Character Pairing Issues .............................15
      1.8. The Registration Bundle ...................................15
         1.8.1. Definitions and Structure ............................15
         1.8.2. Application of the Registration Bundle ...............16
   2. Some Implications of This Approach .............................17
   3. Possible Modifications of the JET Model ........................18
   4. Conclusions and Recommendations About the General Approach .....18
   5. A Model Table Format ...........................................19
   6. A Model Label Registration Procedure: "CreateBundle" ...........20
      6.1. Description of the CreateBundle Mechanism .................21
      6.2. The "no-variants" Case ....................................22
      6.3. CreateBundle and Nameprep Mapping .........................22
   7. IANA Considerations ............................................23
   8. Internationalization Considerations ............................24
   9. Security Considerations ........................................24
   10. Acknowledgements ..............................................25
   11. Informative References ........................................26

1.  Introduction

1.1.  Background

   The IDNA (Internationalized Domain Names in Applications)
   specification [RFC3490] defines the basic model for encoding non-
   ASCII strings in the DNS.  Additional specifications [RFC3491]
   [RFC3492] define the mechanisms and tables needed to support IDNA.
   As work on these specifications neared completion, it became apparent
   that it would be desirable for registries to impose additional
   restrictions on the names that could actually be registered (e.g.,
   see [IESG-IDN] and [ICANN-IDN]) to reduce potential confusion among
   characters that were similar in some way.  This document explores
   these IDN (internationalized domain name) registration issues and
   suggests a set of mechanisms that IDN registries might use.

   Registration restrictions are part of a long tradition.  For example,
   while the original DNS specifications [RFC1035] permitted any string
   of octets in a DNS label, they also recommended the use of a much
   more restricted subset.  This subset was derived from the much older
   "hostname" rules [RFC952] and defined by the "LDH" convention (for
   the three permitted types of characters: letters, digits, and the
   hyphen).  Enforcement of this restricted subset in registrations was
   the responsibility of the registry or domain administrator.  The
   definition of the subset was not embedded in the DNS protocol itself,
   although some applications protocols, notably those concerned with
   electronic mail, did impose and enforce similar rules.
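
   As an illustration only, the LDH convention can be sketched in
   Python as a check applied to each label.  The function names are
   hypothetical, and the 63-octet label limit and the exclusion of
   leading or trailing hyphens are assumptions drawn from the usual
   hostname rules rather than from the paragraph above.

```python
import re

# One label under the "LDH" convention: letters, digits, and the hyphen.
# Assumed here: 1 to 63 characters, no leading or trailing hyphen.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Return True if the whole label matches the LDH pattern."""
    return LDH_LABEL.fullmatch(label) is not None


def is_ldh_name(name: str) -> bool:
    """Check every dot-separated label of a domain name."""
    labels = name.rstrip(".").split(".")
    return all(is_ldh_label(label) for label in labels)
```

   A registry or domain administrator could apply a check of this kind
   at registration time; the DNS protocol itself would still accept
   labels that fail it.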

   If there are no constraints on registration in a zone, people can
   register names containing characters that increase the risk of
   misunderstandings, cybersquatting, and other forms of confusion.  A
   similar situation
   existed even before the introduction of IDNA, as exemplified by
   domain names such as example.com and examp1e.com (note that the
   latter domain contains the digit "1" instead of the letter "l").

   For non-ASCII names (so-called "internationalized domain names" or
   "IDNs"), the problem is more complicated.  In the earlier situation
   that led to the LDH (hostname) rules, all protocols, hosts, and DNS
   zones used ASCII exclusively in practice, so the LDH restriction
   could reasonably be applied uniformly across the Internet.  Support
   for IDNs introduces a very large character repertoire, different
   geographical and political locations, and languages that require
   different collections of characters.  The optimal registration
   restrictions are no longer a global matter; they may be different in
   different areas and, hence, in different DNS zones.

   For some human writing systems, there are characters and/or strings
   that have equivalent or near-equivalent usages.  If a name can be
   registered with such a character or string, the registry might want
   to automatically associate all of the names that have the same
   meaning with the registered name.  The registry might also decide
   whether the names that are associated with, or generated by, one
   registration should, as a group or individually, go into the zone or
   should be blocked from registration by different parties.
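
   A minimal sketch of that association, in Python, follows.  The
   variant table, its ASCII stand-in characters, and the function name
   are all hypothetical; a real table would be specific to a registry
   and a language.

```python
from itertools import product

# Hypothetical variant table: each base character maps to the characters
# that a registry treats as equivalent to it (ASCII stand-ins only).
VARIANTS = {
    "l": ["1"],
    "o": ["0"],
}


def associated_names(label, variants):
    """Return every label formed by substituting variants into `label`.

    The requested label itself is excluded from the result.
    """
    # For each position: the original character plus any of its variants.
    choices = [[ch] + variants.get(ch, []) for ch in label]
    names = {"".join(combo) for combo in product(*choices)}
    names.discard(label)
    return sorted(names)
```

   With this table, associated_names("lo", VARIANTS) yields the three
   labels "10", "1o", and "l0".  Whether such names are entered into
   the zone or only blocked from registration by other parties remains
   a policy decision for the registry.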

   To date, the best-developed system for handling registration
   restrictions for IDNs is the JET Guidelines for Chinese, Japanese,
   and Korean [RFC3743], the so-called "CJK" languages.  The JET
   Guidelines are limited to the CJK languages and, in particular, to
   their common script base.  Those languages are also the best-known
   and most widely-used examples of writing systems constructed on
   "ideographic" or "pictographic" principles.  This document explores
   the principles behind the JET guidelines.  It then examines some of
   the issues that might arise in adapting them to alphabetic languages,
   i.e., to languages whose characters primarily represent sounds rather
   than meanings.

   This document describes five things:

   1.  The general background and considerations for non-ASCII scripts
       in names.

   2.  Suggested practices for describing character variants.

   3.  A method for using a zone's character variants to determine which
       names should be associated with a registration.

   4.  A format for publishing a zone's table of character variants.
       Such tables are referred to below as "language tables" or
       simply "tables".

   5.  A model algorithm for name registration given the presence of
       language tables.

1.2.  The Nature and Status of these Recommendations

   The document makes recommendations for consideration by registries
   and, where relevant, by those who coordinate them, and by those who
   use their services.  None of the recommendations are intended to be
   normative.  Instead, the intent of the document is to illustrate a
   framework for developing variations to meet the needs of particular
   registries and their processing of particular languages.  Of course,
   if registries make similar decisions and utilize similar tools, costs
   and confusion may be reduced -- both between registries and for users
   and registrars who have relationships with more than one domain.

   Just as the JET Guidelines contain some suggestions that may not be
   applicable to alphabetic scripts, some of the suggestions here,
   especially the more specific ones, may be applicable to some scripts
   and not others.

1.3.  Terminology

1.3.1.  Languages and Scripts

   This document uses the term "language" in what may be, to many
   readers, an odd way.  Neither this specification, nor IDNA, nor the
   DNS are directly concerned with natural language, but only with the
   characters that make up a given label.  In some respects, the term
   "script", used in the character coding community for a collection of
   characters, might be more appropriate.  However, different subsets of
   the same script may be used with different languages, and the same
   language may be written using different characters (or even
   completely different scripts) in different locations, so "script" is
   not precisely correct either.

   Long-standing confusion has also resulted from the fact that most
   scripts are, informally at least, named after one of the languages
   written in them.  "Chinese" describes both a language and a
   collection of characters that are also used in writing Japanese,
   Korean, and, at least historically, some other languages.  "Latin"
   describes a language, the characters used to write that language,
   and, often, characters used to write a number of contemporary
   languages that are derived from or similar to those used to write the
   Latin language.  The script used to write the Arabic language is
   called "Arabic", but it is also used (typically with some additions
   or deletions) to write a number of other languages.  Situations in
   which a script has a clearly-defined name that is independent of the
   name of a language are the exception, rather than the rule; examples
   include Hangul, used to write Korean, Katakana and Hiragana, used to
   write Japanese, and a few others.  Some scholars have historically
   used "Roman" or "Roman-derived" for the script in an attempt to
   distinguish between a script and the Latin language.

   The term "language" is therefore used in this document in the
   informal sense of a written language and is defined, for this
   purpose, by the characters used to write it, i.e., as a language-
   specific subset of a script.  In this context, a "language" is
   defined by the combination of a code (see Section 1.4.1) and an
   authority that has chosen to use that code and establish a
   character-listing for it.  Authorities are normally TLD (top-level
   domain) registries; see Section 7 and [IANA-language-registry].
   However, it is expected that TLD registries will find appropriate
   experts and that advice from language and script experts selected by
   international neutral bodies will also become part of the
   registration system.  In addition, as discussed below in Section 7,
   registries may conclude that the best interests of registrants,
   stakeholders, and the Internet community would be served by
   constructing "language tables" that mix scripts and characters in
   ways that conform to no known language.  Conventions should be
   developed for such registrations that do not misleadingly reflect
   specific language codes.

1.3.2.  Characters, Variants, Registrations, and Other Issues

   1.  Characters in this document are specified by their Unicode
       codepoints in U+xxxx format, by their official names, or both.

   2.  The following terms are used in this document.

       *  String

          A "string" is a sequence of one or more characters.

       *  Base Character

          This document discusses characters that may have equivalent or
          near-equivalent characters or strings.  A "base character" is
          a character that has zero or more equivalents.  In the JET
          Guidelines, base characters are referred to as "valid
          characters".  In a table with variants, as described in
          Section 5, the base characters occupy the first column.
          Normally (and always, if the recommendation of Section 6.3 is
          adopted), the base characters will be the characters that
          appear in registration requests from registrants; any other
          character will invalidate the registration attempt.

       *  Native Script

          Native script is the form in which the relevant string would
          normally be represented.  For example, it might use Lower
          Slobbovian characters and the glyphs normally used to write
          them.  It would not be punycode as a presentation form.

       *  Variant Characters/Strings

          The "variant(s)" are character(s) and/or string(s) that are
          treated as equivalent to the base character.  Note that these
          might not be exactly equivalent characters; a particular
          original character may be a base character with a mapping to a
          particular variant character, but that variant character may
          not have a mapping to the original base character.  Indeed,
          the variant character may not appear in the base character
          list, and hence may not be valid for use in a registration.
          Usually, characters or strings to be designated as variants
          are considered either equivalent or sufficiently similar (by
          some registry-specific definition) that confusion between them
          and the base character might occur.
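
          The one-directional nature of these mappings can be sketched
          in Python as follows.  The table contents and the function
          names are hypothetical placeholders, not taken from any real
          language table.

```python
# Hypothetical table: the keys are the base characters; each value lists
# the variants of that base character.
TABLE = {
    "a": ["b"],  # "b" is a variant of "a" ...
    "c": [],     # ... but "b" is not itself a base character
}


def is_valid_request(label, table):
    """A request is valid only if every character is a base character."""
    return all(ch in table for ch in label)


def maps_back(base, variant, table):
    """Return True if `variant` is a base character that maps to `base`."""
    return base in table.get(variant, [])
```

          Here "ac" would be accepted as a registration request and
          "ab" would not, and maps_back("a", "b", TABLE) is False
          because the mapping runs in one direction only.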

       *  Base Registration

          The "base registration" is the single name that the registrant
          requested from the registry.  The JET Guidelines use the term
          "label string" for this concept.

       *  Registered, Activated

          A label (or "name") is described as "registered" if it is
          actually entered into a domain (i.e., into a zone file) by the
          registry, so that it can be accessed and resolved using
          standard DNS tools.  The JET Guidelines describe a
          "registered" label as "activated".  However, some domains use
          a slightly different registration logic in which a name can be
          registered with the registrar (if one is involved) and with
          the registry, but not actually entered into the zone file
          until an additional activation or delegation step occurs.
          This document does not make that distinction, but is
          compatible with it.

          As specified in the IDNA Standard, the name actually placed in
          the zone file is always the internal ("punycode") form.  There
          is no provision for actually entering any other form of an IDN
          into the DNS.  It remains controversial, with different
          registrars and registries having adopted different policies,
          as to whether the registration, as submitted by the
          registrant, is in the form of:

          o  The native-script name, either in UTF-8 or in some coding
             specified by the registrar, or

          o  the internal-form ("punycode") name, or

          o  both forms of the name together, so that the registrar and
             registry can verify the intended translation.
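
          As a sketch of how the two forms relate, Python's built-in
          "idna" codec, which implements the [RFC3490] conversions, can
          derive the internal form from a native-script label and
          compare it with a submitted one.  The sample label and the
          helper name forms_agree are assumptions made for illustration.

```python
# Convert an illustrative native-script label to its internal form and back.
native = "b\u00fccher"                 # "bücher"
internal = native.encode("idna")       # internal ("punycode") form, as bytes
round_trip = internal.decode("idna")   # back to the native-script form


def forms_agree(native_form: str, internal_form: str) -> bool:
    """Check that a submitted internal form matches the native-script form."""
    derived = native_form.encode("idna").decode("ascii")
    return derived.lower() == internal_form.lower()


print(internal.decode("ascii"), round_trip)
```

          A registrar that receives both forms could reject the request
          when a check such as forms_agree returns False.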






          If any of the approaches defined in this document is used, it
          is almost certain to be necessary that the native-script form

[1171 lines skipped]



