emacs-devel

Re: Converting a string to valid XHTML id?


From: rm
Subject: Re: Converting a string to valid XHTML id?
Date: Wed, 1 Dec 2010 16:58:58 +0100
User-agent: Mutt/1.5.15+20070412 (2007-04-11)

On Wed, Dec 01, 2010 at 07:34:00AM -0800, Davis Herring wrote:
> >   (let ((old (assoc id org-newhtml-escaped-ids))
> 
> Wouldn't it be easier to do something like percent encoding?  Map
> everything that isn't [-.a-zA-Z0-9] onto _HH.  Multibyte characters could
> be handled by writing their UTF-8 encoding, or else by escaping as _nHH...
> where n is the number of hex digits needed (itself always a single digit):


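That sounds tempting but doesn't work :-/ Percent-encoding doesn't produce
valid ID values: "%" is not an allowed character, and even with "_" as the
escape character the result can start with a digit, "-", "." or "_".
From the HTML 4 spec: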

 6.2 SGML basic types

  ....

 ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
 followed by any number of letters, digits ([0-9]), hyphens ("-"),
 underscores ("_"), colons (":"), and periods (".").
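 The rule quoted above can be checked mechanically. A minimal sketch,
 assuming a hypothetical helper name `my-html4-id-p' (it is not part of
 Org or Emacs) and reading "letter" as ASCII only:

```elisp
;; Sketch only: `my-html4-id-p' is a hypothetical name used for illustration.
(defun my-html4-id-p (id)
  "Return t if ID is a valid HTML 4 ID token, nil otherwise.
The token must begin with an ASCII letter and may continue with
letters, digits, hyphens, underscores, colons and periods."
  (let ((case-fold-search nil))         ; keep the ASCII ranges literal
    (and (stringp id)
         ;; \\` and \\' anchor the match to the whole string.
         (string-match-p "\\`[A-Za-z][-A-Za-z0-9_:.]*\\'" id)
         t)))
```

 With this, "sec-1.2" passes, while "foo%20bar" (contains "%"), "_20foo"
 (starts with "_") and "1st" (starts with a digit) are all rejected.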


Cheers. Ralf Mattes

> 
> ;; Uses Emacs' internal encoding instead of UTF-8 proper.
> (defun org-newhtml-escape-id (str)
>   "Return a valid xhtml id attribute string.
> See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
>   (replace-regexp-in-string
>    "[^-.a-zA-Z0-9]" (lambda (c)
>                       (mapconcat (lambda (d) (format "_%02x" d))
>                                  (string-as-unibyte c) "")) str))
> 
> Certainly someone could already have an id "foo_5fbar", but the
> table-based implementation already makes the assumption that all IDs will
> be generated by it.
> 
> Davis
> 
> -- 
> This product is sold by volume, not by mass.  If it appears too dense or
> too sparse, it is because mass-energy conversion has occurred during
> shipping.
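
One way the quoted function could be adapted so that its result always
begins with a letter is to prepend a fixed letter to every escaped ID.
This is only a sketch: the name `my-escape-id' and the "x" prefix are
hypothetical choices, and it writes the UTF-8 bytes of each escaped
character rather than Emacs' internal encoding.

```elisp
;; Sketch only: `my-escape-id' and the "x" prefix are hypothetical choices.
(defun my-escape-id (str)
  "Return STR escaped into a string that is a valid HTML 4 ID token.
Every character outside [-.a-zA-Z0-9] becomes one _HH group per UTF-8
byte, and a fixed prefix guarantees that the result starts with a letter."
  (let ((case-fold-search nil))         ; keep the ASCII ranges literal
    (concat
     "x"                                ; always added, so distinct inputs stay distinct
     (replace-regexp-in-string
      "[^-.a-zA-Z0-9]"
      (lambda (c)
        ;; `encode-coding-string' returns a unibyte string; map each byte.
        (mapconcat (lambda (byte) (format "_%02x" byte))
                   (encode-coding-string c 'utf-8)
                   ""))
      str
      t t))))                           ; FIXEDCASE and LITERAL: insert as-is
```

For example, (my-escape-id "foo bar") gives "xfoo_20bar" and
(my-escape-id "1st") gives "x1st". Since "_" is itself escaped as "_5f",
two different inputs cannot produce the same ID.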


