% Created 2007-11-03 Sat 11:58
\documentclass[11pt,a4paper]{article}
\usepackage[mathletters]{ucs}
%\usepackage{ucs} % Unicode support
\usepackage[utf8x]{inputenc} % UCS' UTF-8 driver is better than the LaTeX kernel's
\usepackage[T1]{fontenc} % The default font encoding only contains Latin characters
\usepackage{ae,aecompl} % Almost European fonts/hyphenation do a better job than Computer Modern
\usepackage{graphicx}
\usepackage{hyperref}
\hypersetup{
  colorlinks=true,
  urlcolor=blue,
  linkcolor=blue,
}

\title{Unicode and org mode}
\author{William Henney}
\date{3 November 2007}

\begin{document}

\maketitle

\section*{Notes on using unicode characters in org mode}

\subsection*{How to enter the unicode characters}

Use either the SGML or TeX input method.

\subsubsection*{Using the TeX input method}

\begin{itemize}
\item Type \texttt{C-u C-\textbackslash{} tex} to activate
\item Type things like \texttt{\textbackslash{}alpha} or \texttt{x\textasciicircum{}2} and they will be translated into the unicode glyph. Use tab for completion help.
\item Pro: ``Intuitive'' to use.
\item Con: Gets in the way of typing a ``real'' backslash
\end{itemize}

\subsubsection*{Using the SGML input method}

\begin{itemize}
\item Type \texttt{C-u C-\textbackslash{} sgml} to activate
\item Type things like \texttt{\&alpha;} or \texttt{\&deg;} to get α and °.
\item Pro: Seems to give access to more glyphs than the TeX method
\item Con: No access to sub/superscripts
\end{itemize}

\subsection*{Punctuation}

We can use the em and en dashes—this clause is bounded by em dashes—directly in the org file. However, they are not easy to tell apart in some fonts, especially fixed-width ones at small sizes. Here is a range of numbers separated by an en dash: 223–999. In this sentence – following British typographic convention – the en dash is used the way the em dash is used in American typography. Here are some minus signs:— binary (223 − 999) and unary (−0.2). Finally, here is a hyphen for comparison: a-b.
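
For comparison, the same four characters have conventional ASCII spellings in ordinary \LaTeX{} source. The fragment below is only a sketch of those standard spellings, reusing the examples from this section; it is not something the org file itself needs.

\begin{verbatim}
a-b                      % hyphen
223--999                 % en dash (number range)
clause---clause          % em dash
$223 - 999$ and $-0.2$   % binary and unary minus (math mode)
\end{verbatim}
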
The Unicode dashes and minus sign look good in proportional fonts, such as Times, Futura and Optima. Baskerville is the font where they look most like their Computer Modern versions. In fact, Baskerville looks quite a lot like CMR in other ways too… Oh, and that was an ellipsis.

\begin{verbatim}
Test in fixed-width font:— range 666–999
\end{verbatim}

\begin{tabular}{ll}
Symbol & Examples \\
\hline
hyphen & 1-2 a-b \\
en dash & 1–2 a–b \\
em dash & 1—2 a—b \\
minus & 1−2 a−b \\
\hline
\end{tabular}

It seems that the glyphs for the non-ASCII characters are always taken from the font family of the \texttt{default} face, even where the font-lock face is specifically set to another font family.

\subsection*{Dealing with pre-formatted text}

\begin{verbatim}
This uses the org-code face, so we can easily make it fixed-width
\end{verbatim}

Even if we are using a proportional font family for the \texttt{default} face, by customizing the \texttt{org-code} face, we can use a fixed-width font (such as Monaco) for pre-formatted material (lines starting with ``:'' and words delimited with ``=''). We can do the same with the \texttt{org-table} face, so that the alignment of table lines still works. When pairing Monaco with Times, it is also necessary to set the height of the fixed-width faces to 0.85, so that the character sizes match up.

\subsubsection*{Bugs}

\begin{enumerate}
\item Table alignment still won't be quite right if there are unicode characters in the table cells, since the glyphs for these have variable widths, even in a \emph{supposedly} fixed-width font like Monaco.
\item It doesn't work for sections with the QUOTE keyword, since these do not use any special face.
\end{enumerate}

\subsection*{Other typographical symbols (e.g., §)}

% FIXME Cannot be printed:
% It would be nice if we could use ∗, • and ⋆ as list markers. Maybe even
% ♥ and ♠, although they look a bit heavy.
Diamond character: ♢
% FIXME Cannot be printed:
%✧ ♥ ⊼ ⋓ ∡ □ ϑ

\subsection*{Greek letters and math symbols: \emph{α = x² − y²}}

% FIXME Cannot be printed:
%Examples: ½∫ Ξ₀ dz = ℏc/λ ⇒ ϑ ⊂ \{⊼, ⋓, ∡\} □
Examples: ½∫ Ξ₀ dz = ℏc/λ ⇒

\subsubsection*{Variations between fonts (Mac OS X 10.4/Aquamacs 1.2)}

As far as I can see, only a few fonts have their own set of glyphs for the Greek letters. Times has a nice set of glyphs, although it does have the problem that italic nu and italic v look \emph{very} similar. Spot the difference: \emph{νv}!

Most font families use a common set of glyphs that have a Sans Serif feel to them, as though they were designed to go with Helvetica (although Helvetica actually uses a slightly different set). These glyphs have the problem that the ``gamma'' looks too much like a ``y'' and the ``tau'' looks like a ``t''. When used with Monaco, they look too small.

\subsubsection*{Super- and sub-scripts}

Unicode super- and subscript characters don't exist for all letters.

\subsubsection*{Example alphabets}

αβγδεζηθικλμνξοπρστυφχψω\\
\emph{αβγδεζηθικλμνξοπρστυφχψω}\\
abcdefghijklmnopqrstuvwxyz\\
\emph{abcdefghijklmnopqrstuvwxyz}\\
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ\\
\emph{ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ}\\
ABCDEFGHIJKLMNOPQRSTUVWXYZ\\
\emph{ABCDEFGHIJKLMNOPQRSTUVWXYZ}

\begin{verbatim}
αβγδεζηθικλμνξοπρστυφχψω
/αβγδεζηθικλμνξοπρστυφχψω/
abcdefghijklmnopqrstuvwxyz
/abcdefghijklmnopqrstuvwxyz/
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
/ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ/
ABCDEFGHIJKLMNOPQRSTUVWXYZ
/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
\end{verbatim}

\subsection*{Export to HTML}

This should work since the charset is declared as utf-8. However, support in browsers is variable.

\begin{itemize}
\item Safari and Opera work the best—everything looks pretty nice in both.
\item Firefox does OK, but the minus signs come out as hyphens. The bold math looks funny too, with Greek letters being \textbf{very} bold.
\end{itemize}

\subsection*{Export to \LaTeX{}}

Presumably, this won't work out of the box. I haven't tried it yet.
However, see this \href{http://iamleeg.blogspot.com/2007/10/nice-looking-latex-unicode.html}{blog post by Graham Lee} for a possible solution:

\begin{verbatim}
\usepackage{ucs} % Unicode support
\usepackage[utf8x]{inputenc} % UCS' UTF-8 driver is better than the LaTeX kernel's
\usepackage[T1]{fontenc} % The default font encoding only contains Latin characters
\usepackage{ae,aecompl} % Almost European fonts/hyphenation do a better job than Computer Modern
\end{verbatim}

\subsubsection*{Update [2007-11-02 Fri]}

It is best to use the option \texttt{[mathletters]}, since otherwise \texttt{ucs} tries to use commands like \texttt{\textbackslash{}textalpha} and I have no idea where these are defined (and Google wasn't much help). With \texttt{mathletters} it uses the standard math symbol Greek alphabet, whether you are in math mode or not. I guess a better solution would be to use \texttt{\textbackslash{}ifmmode} to test if we are in math mode and use \texttt{\textbackslash{}upalpha} if we are not.

\begin{description}
\item[Problems encountered with \texttt{org-export-latex}]
\begin{itemize}
\item Backslashes in quoted text are not properly escaped.
\end{itemize}
\end{description}

\subsection*{Integration with calc}

Calc does not understand unicode as far as I can see (e.g., it doesn't recognise 2.3 ± 0.4 as an error form). Presumably, this could be fixed rather easily since calc already has the concept of display styles.

\end{document}