emacs-devel
Re: Converting a string to valid XHTML id?


From: Lawrence Mitchell
Subject: Re: Converting a string to valid XHTML id?
Date: Thu, 02 Dec 2010 15:50:11 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (usg-unix-v)

Lennart Borgman wrote:
> On Thu, Dec 2, 2010 at 5:42 AM, PJ Weisberg <address@hidden> wrote:

>>> In the context where it is used it is for export of org-mode files to
>>> xhtml. Obviously if there are links to anchors within other files my
>>> approach will fail.

>>> So, hm, maybe I should reset this variable when starting a directory
>>> tree export or a single file export rather than making it buffer
>>> local. (But then I have to look into the export of directory trees in
>>> org-mode which I have not done yet.)



>> Just to be sure we're on the same page: the string MUST be unique
>> within the output, but it may NOT be unique within the input?
>> Therefore calling the function twice with the same argument must give
>> different results?

> No, I think they are already unique enough, so to speak, in org-mode.
> Otherwise the links within org-mode could not work.

> So calling the function with the same argument must give the same
> result every time. (AND that result must be unique, i.e. no other input
> string should give the same result.)

As suggested previously, just take a cryptographic hash of the id.

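(defun org-newhtml-escape-id (id)
  "Return a valid xhtml id derived from the SHA-1 hash of ID."
  ;; The "ANON-" prefix makes the result start with a letter.
  (format "ANON-%s" (sha1 id)))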

As long as you do this for /all/ ids in the buffer, that'll work
fine.
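
A minimal sketch of applying the hash to /all/ ids, to show the two
properties asked for above (same input gives the same output, and
different inputs give different outputs).  The helper name
`my-hash-id' and the sample ids are made up for illustration, and
`secure-hash' stands in for `sha1' here; both should yield the same
hex digest.

```elisp
(defun my-hash-id (id)
  "Return ID hashed into a string usable as an xhtml id (hypothetical helper)."
  ;; The "ANON-" prefix guarantees that the result starts with a letter.
  (format "ANON-%s" (secure-hash 'sha1 id)))

(let* ((ids '("intro" "section 1" "foo_bar" "ANON-x")) ; sample ids, invented
       (hashed (mapcar #'my-hash-id ids)))
  (list
   ;; Same argument, same result every time.
   (equal (my-hash-id "intro") (my-hash-id "intro"))
   ;; No two of the sample ids end up with the same hashed id.
   (= (length (delete-dups (copy-sequence hashed)))
      (length ids))))
```

Both elements of the returned list should be t.  Uniqueness here rests
on SHA-1 not colliding for the ids in the buffer, which is an
assumption about the hash rather than a guarantee.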

If you only do it to invalid ids, then there's the possibility
that an existing, valid ID in the buffer already has the form
ANON-sha1sum and a different, invalid id is escaped to that very
same ANON-sha1sum.
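
To make that failure mode concrete, here is a sketch of the risky
variant.  The names `my-valid-id-p' and `my-escape-invalid-only' are
hypothetical, and the validity check is a simplified stand-in for the
real xhtml rules.

```elisp
(defun my-valid-id-p (id)
  "Non-nil if ID looks like a valid id (simplified, hypothetical check)."
  (let ((case-fold-search nil))
    (string-match-p "\\`[A-Za-z][-.A-Za-z0-9_:]*\\'" id)))

(defun my-escape-invalid-only (id)
  "Hash ID only when it is not already valid (the risky variant)."
  (if (my-valid-id-p id)
      id
    (format "ANON-%s" (secure-hash 'sha1 id))))

(let* ((invalid "my heading")   ; contains a space, so it gets hashed
       ;; A second id that happens to look exactly like the escaped first one.
       ;; It is valid as it stands, so it passes through unchanged.
       (lookalike (format "ANON-%s" (secure-hash 'sha1 invalid))))
  (list (equal invalid lookalike)                     ; the inputs differ
        (equal (my-escape-invalid-only invalid)
               (my-escape-invalid-only lookalike))))  ; the outputs do not
```

The result is (nil t): two different input ids end up with one output
id.  Hashing every id, valid or not, avoids this.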

Or use Davis' solution which works in a similar way, and as a
bonus you can map back to the original id easily.

Recall his solution:

(defun org-newhtml-escape-id (str)
  "Return a valid xhtml id attribute string.
See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
  (replace-regexp-in-string
   ;; Match every char that is not a letter, a digit, "-" or ".".
   "[^-.a-zA-Z0-9]"
   ;; Replace it with "_XX" for each byte of its representation.
   (lambda (c)
     (mapconcat (lambda (d) (format "_%02x" d))
                (string-as-unibyte c) ""))
   str))

Notice that the output uses "_", which is a /valid/ char in an
xhtml id.  However, the regexp does not treat it as valid in an
input string, so every "_" in the input is itself escaped (to
"_5f").

So (org-newhtml-escape-id "foo_5fbar") => "foo_5f5fbar"
But (org-newhtml-escape-id "foo_bar")   => "foo_5fbar"

So valid ids /without/ an underscore in them are left as is, but
ids with an underscore are encoded under this scheme.  Every "_"
in the output therefore starts an escape sequence, so you can't
generate a collision.
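
For the "map back to the original id" part, a sketch of what a
decoder could look like.  `my-xhtml-id-escape' and
`my-xhtml-id-unescape' are hypothetical names; the escape side is a
variant of the function above that uses `encode-coding-string' with
UTF-8 in place of `string-as-unibyte', which is an assumption about
the byte encoding wanted.

```elisp
(defun my-xhtml-id-escape (str)
  "Escape STR as \"_XX\" per UTF-8 byte (variant of the function above)."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "[^-.a-zA-Z0-9]"
     (lambda (c)
       (mapconcat (lambda (byte) (format "_%02x" byte))
                  (encode-coding-string c 'utf-8) ""))
     ;; FIXEDCASE and LITERAL: insert the replacement exactly as computed.
     str t t)))

(defun my-xhtml-id-unescape (id)
  "Invert `my-xhtml-id-escape' on ID (hypothetical helper)."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     ;; A run of one or more \"_XX\" escapes, decoded as one unit so that
     ;; multi-byte characters are reassembled correctly.
     "\\(?:_[0-9a-f]\\{2\\}\\)+"
     (lambda (run)
       (let ((bytes '())
             (pos 0))
         (while (< pos (length run))
           ;; Skip the \"_\" and read the two hex digits after it.
           (push (string-to-number (substring run (1+ pos) (+ pos 3)) 16)
                 bytes)
           (setq pos (+ pos 3)))
         (decode-coding-string (apply #'unibyte-string (nreverse bytes))
                               'utf-8)))
     id t t)))

(my-xhtml-id-unescape "foo_5fbar")    ; => "foo_bar"
(my-xhtml-id-unescape "foo_5f5fbar")  ; => "foo_5fbar"
(my-xhtml-id-unescape (my-xhtml-id-escape "my heading")) ; => "my heading"
```

Since every escaped id can be decoded back to exactly one input, two
different inputs cannot share an escaped form.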

Lawrence

-- 
Lawrence Mitchell <address@hidden>



