Re: Fallback fonts in LaTeX export for non latin scripts


From: Juan Manuel Macías
Subject: Re: Fallback fonts in LaTeX export for non latin scripts
Date: Mon, 04 Sep 2023 22:22:25 +0000

Ihor Radchenko writes:

> Juan Manuel Macías <maciaschain@posteo.net> writes:
>
>>> #+language: ancientgreek russian arabic
>>
>> Of course, this syntax would be the most appropriate and consistent
>> within Org. The problem is LaTeX, specifically babel, and that certain
>> inconsistencies would be created with the rest of the backends. At first
>> some pitfalls come to mind:
>>
>> - The keyword #+language accepts for now only language codes (es, en,
>>   el, ar, ru, etc.). Consistency with other backends should
>>   be maintained in this regard: ancientgreek is not a valid language
>>   code, but a name that only babel understands. If we put something
>>   like (a valid language code):
>>
>>   #+language: el-polyton
>>
>>   this could be translated in babel as polutonikogreek (in the classic
>>   syntax, that is, the languages that are loaded in the options of
>>   \usepackage[options]{babel}), or, in the new syntax, ancientgreek and
>>   polytonicgreek, which are actually two different languages: the first
>>   is ancient polytonic Greek and the second modern polytonic Greek. To
>>   add more confusion to the matter, in classical babel syntax
>>   greek.ancient and greek.polytonic are also supported. But neither of
>>   these things can be deduced by simply putting el-polyton, unless
>>   breaking the consistency with the other backends.
>
> I am now working on unifying Org translation system as discussed in
> https://orgmode.org/list/87o7iw8yem.fsf@bzg.fr
> As a part of the effort, I plan to introduce a new constant that will
> unify language abbreviations across Org and also associate them with
> more human-readable names.
>
> (defconst org-language-abbrevs
>   '(("am".  "Amharic")
>     ("ar" . "Arabic")
>     ("ast" . "Asturian")
>     ("bg" . "Bulgarian")
>     ("bn" . "Bengali")
>     ...))
>
> The idea is to allow
>
> #+language: Austrian German, Greek
> as a valid specifier, in addition to
>
> #+language: de-at, el
>
> Then, across Org, we will make use of the standardized language
> abbreviations.

Great! I think that's great news. Yes, I agree with what you say below. I
think Org should move towards multilingual support that is 100% native
to Org. That is, Org would have its own "selectlanguage" mechanism, to be
able to delimit text segments in other languages and have control over
them, both within Org and when exporting to the different backends. That
scenario seems very desirable to me, and I would like to contribute my
help to the best of my ability (and time).

In LaTeX, as I mentioned, things are complicated. There are Babel and
Polyglossia, and there are LuaTeX and XeTeX. In addition, there is
pdfTeX, which is still the default engine and (to be honest) the
engine used by a high percentage of LaTeX users, although perhaps things
will soon change in favor of LuaTeX. Both babel and polyglossia
could be supported, but that means more work, more code, and more
complications. And it is not clear whether polyglossia is still
maintained. After all, babel is the official LaTeX package for language
support, and polyglossia appeared at a time when babel had no support
for the new Unicode engines. Now babel supports all of that and is much
more powerful, but its interface has also grown in complexity. There is
the problem of the double syntax for loading languages: the old one,
which loads traditional ldf files, and the modern one (\babelprovide),
which loads languages using ini files. The latter is more powerful, with
more options, but has added verbosity to babel. I have taken advantage
of \babelprovide, specifically its onchar=ids fonts property, to
automatically apply fonts to non-Latin scripts.
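
A minimal sketch of that mechanism, assuming LuaLaTeX and an installed
font that covers Greek (FreeSerif here is only an illustrative choice,
not something Org would pick):

```latex
% Compile with lualatex. FreeSerif is an assumed, illustrative font.
\documentclass{article}
\usepackage[english]{babel}
% Load Greek from its ini file. With onchar, babel switches the locale
% id and the font whenever it meets a character of the Greek script.
\babelprovide[import, onchar=ids fonts]{greek}
\babelfont[greek]{rm}{FreeSerif}
\begin{document}
An English sentence with a Greek word, λόγος, and no explicit switch.
\end{document}
```

The Greek word is not wrapped in any language command: the script of
the characters alone triggers the font change.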

>> I like this idea, but with the exception that in the two examples you
>> give the user is declaring two fonts for both languages. In my example
>> there was also Arabic, where the default font for the Arabic script is
>> used.
>
> My idea was that
>
> #+language: ancientgreek russian arabic
>
> implies "use default font for arabic", unless #+latex_font is specified.

This seems the most consistent to me for Org, but, as I mentioned in the
other email, I have some concerns. Currently, what we are talking about
is simply font support for non-Latin languages. If, in the current state
of things, #+language is allowed to accept a list of language names, we
may give the user a false impression: that of multilingual support,
which does not exist as such. What exists is closer to font support for
non-Latin languages, and only in LaTeX, specifically in LuaLaTeX.
Furthermore, the user could mix languages that Babel loads through ldf
files with others that it loads through ini files. For example,
something like this:

#+language: spanish, english, french, russian

In Babel, the first three languages would be loaded with:

\usepackage[english,french,spanish]{babel}

and Russian needs \babelprovide, both for the font and to load the
language via its ini file:

\babelprovide[onchar=ids fonts, import]{russian}
\babelfont[russian]{rm}[options]{somefont}

Org would have to discern which name refers to a non-Latin language
(which wouldn't be complicated with the functionality you're working on)
and then apply the default font by adding a line with \babelprovide.
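
Put together, the mixed case might come out as the following complete
document. This is only a sketch of what the exporter could generate: it
assumes LuaLaTeX, treats the first language of the keyword as the main
one (so it goes last in the babel options), and uses FreeSerif as a
stand-in for whatever default Cyrillic font would be chosen.

```latex
% Compile with lualatex. FreeSerif is an assumed placeholder font.
\documentclass{article}
% Latin-script languages through the classic ldf mechanism;
% the last option (spanish) is the main language.
\usepackage[english,french,spanish]{babel}
% Russian through its ini file, with automatic font switching by script.
\babelprovide[onchar=ids fonts, import]{russian}
\babelfont[russian]{rm}{FreeSerif}
\begin{document}
Texto en español con una palabra en ruso: привет.

\foreignlanguage{french}{Une phrase en français.}
\end{document}
```

Note the asymmetry: French still needs an explicit \foreignlanguage,
because it shares the Latin script with the main language, while the
Russian word is detected by its characters.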

Of course, English, French and Spanish can also be loaded via ini files:

\babelprovide[main,import]{spanish}
\babelprovide[import]{french}
\babelprovide[import]{english}

Babel even supports:

\usepackage[english,french,spanish,provide*=*]{babel}

but in that line we cannot put Russian with onchar, etc. And then there
is pdfTeX, where only the classic babel syntax is allowed, without any
"*provide".
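
For comparison, a sketch of the same four languages under pdfTeX with
the classic syntax only. It assumes that Cyrillic Type 1 fonts for the
T2A encoding are installed, and every Russian passage has to be marked
by hand:

```latex
% Compile with pdflatex. Assumes Cyrillic fonts for T2A are available.
\documentclass{article}
% T2A for Cyrillic; T1, listed last, is the default encoding.
\usepackage[T2A,T1]{fontenc}
% All languages through ldf files; spanish, listed last, is the main one.
\usepackage[russian,english,french,spanish]{babel}
\begin{document}
Texto en español con una palabra en ruso:
\foreignlanguage{russian}{привет}.
\end{document}
```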

In short, I find everything very confusing. I am not opposed to doing it
as you propose (in fact, it is the option I like the most, especially
once Org becomes polyglot in the future), but I also want to warn of
possible complications.

Therefore, since we are, for now, dealing only with fonts for non-Latin
languages, I think it should be made clear that the keyword is about
fonts (and about LuaLaTeX). Maybe through two keywords:

#+lualatex_fonts_for: language(s)
#+lualatex_fonts[language(s)]: "font" options

?

I think it's ugly, but I can't think of anything else.

By the way, and as a side note, is it currently possible in Org to
define a keyword within :options-alist of the style #+foo[anything] or
would something like org-collect-keywords have to be modified?

-- 
Juan Manuel Macías

https://juanmanuelmacias.com

https://lunotipia.juanmanuelmacias.com

https://gnutas.juanmanuelmacias.com


