Re: Thoughts on the standardization of Org
From: Tom Gillespie
Subject: Re: Thoughts on the standardization of Org
Date: Sun, 1 Nov 2020 01:20:19 -0400
Hi Asa,
My general take is that any active work toward standardization
would be premature. At the very least a full implementation outside
of Emacs would need to exist. In the absence of that there is little
point to standardization. There is ample existing documentation to
build a compliant parser (pandoc exists as well ...) and any effort
toward standardization right now would be better spent improving
the existing implementation or fixing broken ones (e.g. org-ruby).
From your comments, I would suggest reading through
https://orgmode.org/worg/dev/org-syntax.html if you have not
done so already. Much of what you mention is already there.
If something like standardization is still desired, I would suggest that
the proper framing for any such activities would be as improvement and
clarification in the documentation, and potentially as formalization of some
of the existing behaviors of the system. Org is a fairly stable system,
and as others have said, explicitly leaving things open and unspecified
would be vital.
There are also parts of org (e.g. babel) where the behavior needs to be
regularized and made consistent. At the moment those areas need
contributors, not standardization.
A few more thoughts in line. Best!
Tom
On Sat, Oct 31, 2020 at 8:22 PM Asa Zeren <asaizeren@gmail.com> wrote:
> this is impossible. If org catches on before it is standardized, we
> end up in the situation of Markdown, with many competing standards and
> non-standards. Hence, standardization is essential.
The situation for Org is not comparable to markdown. There is a single
reference implementation for org at the moment. The codebase is massive.
There are many existing parsers for org files. Many are obviously broken
since they do not match the reference implementation's behavior. The
obviousness is a sign that there is not a need for standardization at this
time. Further, there is little risk that another impl will be created
without interoperating with the elisp implementation. For example,
consider Mauro's use case: being able to get colleagues who do not use
Emacs to use Org. I suspect most of the people who would be working on
other implementations would be starting from Emacs and would be unlikely
to leave. Also unlike markdown, html export is just one tiny part of Org,
whereas markdown was implemented repeatedly to allow text input on web
pages where people needed to implement parts of html that had not already
been specified in markdown.
> Standardizing org is much harder than standardizing something like
> Markdown, but I think by breaking it down as follows will maximize the
> portability of org while not compromising on development of org.
See some of my other recent emails. In the short term this is impossible
due to the deep dependence on Emacs Lisp. Any outside implementation
that is created today would have to implement elisp. Few have been able
to do this in over 30 years. Moving beyond elisp requires additional machinery
to be added to org to be able to specify other top-level languages. This is
not something that is remotely ready for standardization because no one
even has a single working implementation yet!
> I see three areas of standardization, which I think should be
> standardized separately:
> - Org DOM
No. This is an implementation detail (see below for more).
> - Org Syntax
This is pretty much done, there are some outstanding points for discussion,
but they are about implementation details, not about the contents of the
syntax. Also extension of the syntax needs to be open and defined entirely
by the elisp implementation, as mentioned by others.
> - Org Standard Environments
Read https://orgmode.org/worg/dev/org-syntax.html. It will get you up to speed
with the existing terminology that is used in the community.
>
> Org DOM:
> The first thing to specify is the org DOM. (Maybe a different name
> should be used to avoid confusion with the HTML DOM) This is the
> structure of an org-mode document, without the textual
> representation. Many org-related tools operate on org documents
> without needing to use the textual representation. Specifying the DOM
> separately would (a) create a separation of concerns and (b) allow for
> better libraries built around org mode.
Depending on exactly what you mean by DOM this does not need to be
standardized. There are a couple of points that need to be clarified
regarding how to treeify the flat list of elements that come out of a parse
in order to tie things like associated keywords to the correct elements, but
these are quite minimal. The potential rat's nest that is trying to
standardize a DOM when it is an implementation detail means that I would
strongly discourage even thinking about Org in that way. I would even
discourage putting too much emphasis on the org-element API which, while
extremely useful inside Emacs, is not something that should be standardized
because it is a detail peculiar to the elisp implementation.
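
To make the "treeify" point concrete, here is a rough Python sketch of tying
keywords to the element that follows them. It is not how org-element does it.
The Element class, the kind names, and the AFFILIATED set are all made up for
illustration, and the rule used (keywords attach to the next element unless a
blank line intervenes) is a simplification of what the syntax document
describes.

```python
from dataclasses import dataclass, field

# Assumed subset of affiliated keyword names, for illustration only.
AFFILIATED = {"name", "caption", "header", "results", "plot"}


@dataclass
class Element:
    kind: str                 # hypothetical kinds: "keyword", "paragraph", "blank", ...
    value: str = ""
    key: str = ""             # lower-cased keyword name when kind == "keyword"
    affiliated: dict = field(default_factory=dict)


def is_affiliated(el):
    """True for keywords that are meant to decorate the next element."""
    return el.kind == "keyword" and (
        el.key in AFFILIATED or el.key.startswith("attr_")
    )


def attach_affiliated(flat):
    """Fold affiliated keywords into the element that directly follows them."""
    out = []
    pending = []
    for el in flat:
        if is_affiliated(el):
            pending.append(el)
        elif el.kind == "blank":
            # A blank line breaks the association: the pending keywords
            # stay in the output as ordinary keyword elements.
            out.extend(pending)
            pending = []
            out.append(el)
        else:
            for kw in pending:
                el.affiliated.setdefault(kw.key, []).append(kw.value)
            pending = []
            out.append(el)
    out.extend(pending)  # keywords at end of input have nothing to attach to
    return out
```

Note that the result is still just a list. Nothing here needs a DOM, only a
rule stated in terms of the order of elements in the text.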
There are cases where certain behaviors, such as how to parse and format
footnotes, could be specified, but such behaviors don't require a DOM in order
to be specified, and adding a DOM to the picture does nothing but complicate
the format. Org is a text format. The semantics for interaction with the text
format are defined entirely by the text representation (In Emacs
there.is.only.buffer).
Other semantics, such as export to html and latex, are not something that you
would want to try to standardize. You would likely lose friends, enemies, and
whatever sanity you had left at the end (see the discussion on Mauro's thread
about the fact that it is probably just easier to use Emacs directly if you
need to export to a certain format in a specific way. It is free software
after all.)
To the extent that an element tree could be useful, I think it would be as a
concept in an implementation guide, not as something formally specified.
> Org Syntax:
> This would be specifying the mapping between the DOM and the textual
> representation, specified in terms of an environment.
There is no DOM. Modification to an org document must be made on the
text representation, otherwise it is meaningless. This isn't html where there
is no canonical representation outside the DOM. The text representation of
an org document IS the canonical representation (modulo a normalization
pass).
> Org Standard Environments:
> This is how I would specify elements such as #+begin_src..#+end_src
> would be specified, as standardized elements of the environment. This
> would be structured as a number of individual standard environments,
> such as "Source Blocks" or "Standard Header Properties" (specifying
> #+title, #+author, etc.)
These are well specified already in the worg syntax draft. There are a couple
of special cases such as src and example blocks that could be included
explicitly in the syntax to facilitate interoperability with parsers for
org babel languages. Beyond that, the community already has vocabulary that
covers what you describe here, as mentioned above.
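
As a rough illustration of the src block case, the Python sketch below pulls
the language, the rest of the header line, and the body out of
#+begin_src..#+end_src blocks using only the text. The regular expression and
the find_src_blocks name are my own assumptions, not taken from the syntax
draft, and it ignores details such as comma-escaped lines inside the body.

```python
import re

# Simplified pattern: case-insensitive markers, optional indentation,
# optional language name, then everything up to the next #+end_src line.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src(?:[ \t]+(?P<lang>\S+))?(?P<args>[^\n]*)\n"
    r"(?P<body>.*?)"
    r"^[ \t]*#\+end_src[ \t]*$",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def find_src_blocks(text):
    """Return (language, header arguments, body) for each src block in text."""
    return [
        (m.group("lang"), m.group("args").strip(), m.group("body"))
        for m in SRC_BLOCK.finditer(text)
    ]


doc = "Intro\n#+begin_src python :results output\nprint(1)\n#+end_src\nOutro\n"
blocks = find_src_blocks(doc)
```

A parser for a babel language would then only have to deal with the body
string, which is the kind of interoperability meant above.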