Re: BibTeX-mode: Key generation when latin-1 characters appear in author

From: Christian Schlauer
Subject: Re: BibTeX-mode: Key generation when latin-1 characters appear in author field
Date: Wed, 01 Aug 2007 22:54:22 +0200

[Crossposting to gmane.emacs.auctex.devel, where RefTeX is maintained.]

The following is from a discussion in 2005, starting with
<http://article.gmane.org/gmane.emacs.pretest.bugs/6843>, where I wrote:

> I create the following entry in a BibTeX file -- the critical thing is
> that the name of the author contains an umlaut. (This is okay, as the
> BibTeX versions that nowadays come with TeX distributions are 8-bit
> capable, which means that I can enter umlauts in the .bib file
> directly instead of using \"o or something similar -- I just have to
> specify \usepackage[latin1]{inputenc} in the document preamble as
> well):
> @Article{,
>   author =     {B. Blöd},
>   title =      {Test},
>   journal =    {A},
>   year =       {2005},
>   OPTkey =     {},
>   OPTvolume =  {},
>   OPTnumber =  {},
>   OPTpages =   {},
>   OPTmonth =   {},
>   OPTnote =    {},
>   OPTannote =  {}
> }
> Now, press `C-c C-c' inside the entry -- Emacs suggests `blöd05:_test'
> as the key to use.

... and Emacs 22.1 uses that key. I also wrote:

> It seems to me that the conversion of funny characters to ASCII
> characters is probably safer.

Roland Winkler replied:

> I have been thinking about this for a little while. I think it
> goes beyond BibTeX.
> Some time ago I wrote a little package, umlaute.el; see
> http://www.tfkp.physik.uni-erlangen.de/~winkler/src/umlaute.el
> The idea is to have `translation tables' so that one can go back and
> forth between different `representations' of a character, depending
> on the Emacs mode, for example (I live in southern Germany):
>   text-mode:            Grüß Gott
>   tex-mode:             Gr\"u{\ss} Gott
>   german latex-mode:    Gr"u"s Gott
>   html-mode:            Gr&uuml;&szlig; Gott
>   other modes (7 bit):  Gruess Gott
> BibTeX keys probably should use the 7-bit `representation'. 
> For German umlauts the 7-bit `representation' is fairly well defined,
> even though I expect that some people would prefer a translation
> table that simply drops the double dots but does not add the extra
> `e'. The best choice might depend on the context.
> However, I do not know how this could be done for other languages.
> Some Spanish people use `Señor' -> `Se~nor'. Certainly, this is not
> a good choice in the context of BibTeX keys.
> I do not know whether such a feature would be useful for other Emacs
> packages, too, nor do I know how it could best be implemented.
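
A minimal Emacs Lisp sketch of the table-driven idea, using the German 7-bit convention from the table above (ü -> ue, ß -> ss). The names `my-german-ascii-table' and `my-german-to-ascii' are hypothetical; this is not the code of umlaute.el, BibTeX-mode or RefTeX. Characters without a table entry are passed through unchanged.

```elisp
;; Hypothetical translation table for the German 7-bit representation.
;; Illustrative assumption only; not taken from umlaute.el.
(defvar my-german-ascii-table
  '((?ä . "ae") (?ö . "oe") (?ü . "ue")
    (?Ä . "Ae") (?Ö . "Oe") (?Ü . "Ue")
    (?ß . "ss"))
  "Alist mapping German special characters to 7-bit replacements.")

(defun my-german-to-ascii (string)
  "Return STRING with characters from `my-german-ascii-table' replaced.
Characters that have no entry in the table are copied unchanged."
  (mapconcat (lambda (c)
               ;; Look up the character; fall back to the character itself.
               (or (cdr (assq c my-german-ascii-table))
                   (char-to-string c)))
             string ""))
```

With this table, (my-german-to-ascii "Grüß Gott") returns "Gruess Gott". Changing "ue" to "u" and so on in the alist gives the variant that simply drops the double dots; a per-language or per-mode choice of table is the kind of context dependence Roland describes.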

FWIW, RefTeX also needs to "convert" the text of \section{} etc. in
order to create a label. So yes, there are other packages (that are
included in Emacs!) that need this, too. I played a little with the
function `reftex-latin1-to-ascii', see below:

*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP> (reftex-latin1-to-ascii "räksmörgås")
ELISP> (reftex-latin1-to-ascii "blåbærsyltetøy")
ELISP> (reftex-latin1-to-ascii "Viele Grüße")
"Viele Gru3e"
ELISP> (reftex-latin1-to-ascii "Other letters: ł œ š ñ")
"Other letters: ł œ š n"

Maybe BibTeX-mode and RefTeX could share some label/key generation
code, and use `reftex-latin1-to-ascii' as a starting point?
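
The IELM output suggests that `reftex-latin1-to-ascii' maps each Latin-1 character to a single ASCII character (hence ß -> 3) and leaves characters outside Latin-1, such as ł, œ and š, untouched. As one possible complement, here is a sketch of a different technique, not RefTeX's: Unicode canonical decomposition (NFD) followed by deletion of the combining marks. It assumes an Emacs that provides `ucs-normalize-NFD-string' (ucs-normalize.el is not part of Emacs 22.1); `my-strip-diacritics' is a hypothetical name.

```elisp
;; Sketch: strip diacritics via Unicode NFD decomposition.
;; Assumes ucs-normalize.el is available (newer Emacs versions only).
(require 'ucs-normalize)

(defun my-strip-diacritics (string)
  "Return STRING with combining diacritical marks removed.
STRING is first decomposed (Unicode NFD), so that e.g. š becomes
s followed by U+030C COMBINING CARON; then every character in the
Combining Diacritical Marks block U+0300..U+036F is deleted.
Letters without a canonical decomposition (ł, œ, ß) are kept."
  (replace-regexp-in-string "[\u0300-\u036f]+" ""
                            (ucs-normalize-NFD-string string)))
```

Applied to "räksmörgås" this yields "raksmorgas", and š and ñ become s and n. Letters without a decomposition (ł, œ, ß) survive, and German ü becomes u rather than ue, so a language-specific table would still be needed for cases like "Grüße".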


Christian Schlauer

