emacs-devel

Re: Should `auto-coding-functions' be mode-specific?


From: Kevin Rodgers
Subject: Re: Should `auto-coding-functions' be mode-specific?
Date: Tue, 02 Jan 2007 22:26:22 -0700
User-agent: Thunderbird 1.5.0.9 (Macintosh/20061207)

Romain Francoise wrote:
> I received a bug report from a Debian user (CC'd) who was surprised
> to see that Emacs 22 opens one of his utf-8-encoded files as ASCII,
> because it contains the following HTML snippet near the top:
>
> | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> | <HTML><HEAD>
> | <META http-equiv=Content-Type content="text/html; charset=us-ascii">
> | </HEAD>
> | <BODY>
> | </BODY></HTML>
>
> The file itself is not an HTML file, but Emacs still uses the
> encoding specified in the HTML code to set the encoding.  (This is
> caused by `sgml-html-meta-auto-coding-function', which is present by
> default in the list of `auto-coding-functions' -- the functions are
> tried in the first 1K or last 3K bytes of the buffer.)
>
> I replied that the encoding can be forced using a -*- coding: .. -*-
> cookie, but the submitter argues that the functions to get the
> encoding from the file's contents should only be enabled in modes
> where the content of the buffer is supposed to match -- i.e. don't
> use the META header function in buffers that aren't in html-mode (or
> equivalent).

The other default element of auto-coding-functions is
sgml-xml-auto-coding-function, which looks for the encoding specified in
the XML declaration but is careful to ensure that the declaration occurs
at the beginning of the buffer (optionally preceded by whitespace, as
allowed by the XML spec).  Shouldn't sgml-html-meta-auto-coding-function
ensure that the <meta> tag occurs within an HTML document, by also
matching an appropriate pattern at the beginning of the buffer?

I know there is more variation in what is allowed at the beginning of an
HTML document compared to an XML document, but I think it would be an
improvement to require either an HTML document type declaration or an
<html> tag (optionally preceded by whitespace):

(when (re-search-forward "\\`[[:space:]\n]*\\(<!doctype[[:space:]\n]+html\\|<html\\)"
                         size t)
  ...)
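
As a rough sketch of how that check could be packaged without touching
mule.el, the guard can be written as a separate predicate and combined
with the existing function.  The names `my-html-prologue-p' and
`my-guarded-html-meta-auto-coding-function' are hypothetical (they are
not part of Emacs), and the sketch assumes that each function in
`auto-coding-functions' is called with point at the start of the text
and SIZE as a character count:

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-html-prologue-p (size)
  "Return non-nil if the buffer begins like an HTML document.
Look at no more than SIZE characters after point.  Leading
whitespace is allowed before the doctype or the <html> tag.
Point is left where it was."
  (let ((case-fold-search t))           ; accept <!DOCTYPE HTML and <HTML>
    (save-excursion
      (re-search-forward
       "\\`[[:space:]\n]*\\(<!doctype[[:space:]\n]+html\\|<html\\)"
       ;; SIZE is a count, so turn it into a buffer position.
       (min (point-max) (+ (point) size))
       t))))

;; Hypothetical wrapper, not part of Emacs: consult the <meta> tag
;; only when the prologue check above succeeds.
(defun my-guarded-html-meta-auto-coding-function (size)
  "Like `sgml-html-meta-auto-coding-function', but only for HTML.
Return nil unless an HTML doctype or <html> tag starts the buffer."
  (and (my-html-prologue-p size)
       (sgml-html-meta-auto-coding-function size)))
```

With a buffer like the one in the Debian report, where ordinary text
precedes the markup, the predicate returns nil and the <meta> tag is
never consulted.  To try it, the wrapper would take the place of
`sgml-html-meta-auto-coding-function' in `auto-coding-functions'.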

Finally, note the following ChangeLog entry, which describes the patch
proposed by Juri in
<URL:http://lists.gnu.org/archive/html/emacs-devel/2005-10/msg00916.html>
to handle invalid HTML (such as Mozilla Firefox bookmark files):

2006-06-02  Juri Linkov  <address@hidden>

        * international/mule.el (sgml-html-meta-auto-coding-function):
        Remove the condition `(search-forward "<html" size t)'.
        Replace `\"' with `[\"']?' in `re-search-forward'.

I agree that Emacs should not be too pedantic about HTML, but I don't
think it's too much to require an <html> tag before the <meta> tag.  The
bug reported by the Debian user concerns a file that is clearly not an
HTML file, because of the text that precedes the markup, even though it
contains a valid HTML document.

What do people think?

(See http://bugs.debian.org/404236 for the discussion.)

--
Kevin Rodgers
Denver, Colorado, USA




