emacs-devel

Re: enriched-mode and switching major modes.


From: Oliver Scholz
Subject: Re: enriched-mode and switching major modes.
Date: Wed, 22 Sep 2004 12:01:27 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3.50 (windows-nt)

Richard Stallman <address@hidden> writes:

>     When rendered by a graphical, CSS2-enabled browser, you'll see two
>     paragraphs on a gray background surrounded by a dashed border.  Those
>     two paragraphs are again contained in a larger paragraph on a purple
>     background surrounded by a solid border.
>
> It will be very hard to implement this in a way that fits in with
> Emacs.

Okay, maybe this is the time to lay out the design on which I have
been spending thought and code since I ditched the approach that I
mentioned earlier (as in `wp-example.el').  I fear, though, that you
won't like it.

The idea crossed my mind when I thought about how to implement a
data structure fit for XML + CSS in Emacs Lisp.  In other words: how
to make Emacs a /rendering/ XML editor.

XML is by nature a tree-like format.  The W3C has specified the
structure of information contained in XML documents in a way that
abstracts from the pointy brackets syntax; this abstract data set is
called the "XML Information Set":

http://www.w3.org/TR/xml-infoset/

For simplicity, I focus on elements and character data here and talk
about them as "nodes" in a tree; thus we have "element nodes" and
"text nodes".  An XHTML fragment like

<h1>Some <em>meaningless</em> text</h1>

would be regarded as an `h1' element node which has three children: a
text node "Some ", an element node `em' (which itself has a text node
as its single child), and another text node " text".
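
To make the tree concrete, here is a minimal sketch of that fragment as
a data structure.  It is written in Python rather than Emacs Lisp only
so that it can be run outside Emacs; the names `TextNode',
`ElementNode', `build_example' and `text_content' are made up for this
illustration and are not part of any existing code.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextNode:
    """Character data; a leaf of the tree."""
    text: str


@dataclass
class ElementNode:
    """An element with an ordered list of child nodes."""
    name: str
    children: List[Union["ElementNode", TextNode]] = field(default_factory=list)


def build_example() -> ElementNode:
    """Tree for <h1>Some <em>meaningless</em> text</h1>."""
    em = ElementNode("em", [TextNode("meaningless")])
    return ElementNode("h1", [TextNode("Some "), em, TextNode(" text")])


def text_content(node) -> str:
    """Concatenate the text nodes below NODE in document order."""
    if isinstance(node, TextNode):
        return node.text
    return "".join(text_content(child) for child in node.children)
```

Calling `text_content(build_example())' walks the three children of
`h1' in order and gives back the plain text of the heading.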

I found out that I can translate any RTF document into an instance of
the XML info set.  So I can reduce the problem of designing a data
structure for word processing in Emacs to the question of how to
implement the XML info set in a way that text nodes are stored in a
buffer rather than in a string and that they are /editable/.  And that
question I can reduce to: how can I implement a tree-like data
structure with text properties?

So far I have considered two ways to do this. Both have specific
disadvantages.  But I'll come to that in a minute.

[If desired, I have prototype code for each of those two approaches to
experiment with. :-/ ]

One way is to have a single, unique Lisp object for each text node, a
vector or a list for example, stored in a text property, say
`text-node'.  That object would store a reference to its immediate
parent (which is always an element node).  That parent would have a
reference both to its children and to its own parent, and so on.  In
addition, a buffer-local variable would store the root element.

This has the advantage that I have two views of the document: one as a
Lisp object, a tree of vectors or of lists, stored in a variable; the
other one as the content of a buffer with specific text properties.
The former makes it possible to implement an API for accessing and
modifying the contents of the document; I am thinking of XPath, DOM
and other W3C standards here that many people are familiar with.  If I
have a text node (the vector or list), then I can find its text in the
buffer with

`(text-property-any (point-min) (point-max) 'text-node TEXT-NODE)'

This should be fast.
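
A rough model of this first approach, again in Python so that it runs
outside Emacs.  The buffer is approximated by a list of (text,
properties) runs, and `find_text_node' only imitates what
`text-property-any' does (an identity comparison, like `eq'); it
returns a 0-based offset, whereas buffer positions in Emacs start at 1.
The class and function names are assumptions for illustration only.

```python
class Element:
    """Element node: knows its parent and its children."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class Text:
    """Text node: the unique object stored in the `text-node' property."""
    def __init__(self, parent):
        self.parent = parent
        parent.children.append(self)


def find_text_node(runs, node):
    """Offset of the first run whose `text-node' property is NODE, else None."""
    offset = 0
    for text, props in runs:
        if props.get("text-node") is node:  # identity test, like `eq'
            return offset
        offset += len(text)
    return None


# <h1>Some <em>meaningless</em> text</h1>
h1 = Element("h1")
some = Text(h1)
em = Element("em", h1)
meaningless = Text(em)
tail = Text(h1)

root = h1  # stand-in for the buffer-local variable holding the root
runs = [
    ("Some ", {"text-node": some}),
    ("meaningless", {"text-node": meaningless}),
    (" text", {"text-node": tail}),
]
```

Here `find_text_node(runs, meaningless)' locates the text of the `em'
element in the buffer, while `meaningless.parent.parent' climbs the
tree from the same object up to the root.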

But this solution has an undesirable fragility: care must be taken,
when killing and yanking, that both the text properties and the tree
are updated accordingly (for example if the killing results in the
deletion of an entire text node).  And if the tree is modified
directly (via the API), then the buffer contents need to be updated,
too (for example when this leads to transferring text nodes to another
place in the tree).  Basically this is again the problem of keeping
two structures in sync.
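
The following sketch illustrates the kind of bookkeeping this implies.
It is a Python model, not Emacs code: the buffer is a list of (text,
text-node) runs, and `delete_region' and `prune' are hypothetical
helpers.  Killing a region only changes the runs; the tree stays stale
until a second step removes the vanished text nodes from it.

```python
class Node:
    """Tree node; text nodes are simply nodes without children."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent is not None:
            parent.children.append(self)


def delete_region(runs, start, end):
    """Delete offsets [start, end) from RUNS, a list of (text, text_node).

    Return the new runs and the text nodes whose text vanished entirely.
    """
    new_runs, vanished, offset = [], [], 0
    for text, node in runs:
        lo, hi = offset, offset + len(text)
        offset = hi
        cut_lo = min(max(start, lo), hi)  # clamp the region to this run
        cut_hi = min(max(end, lo), hi)
        kept = text[:cut_lo - lo] + text[cut_hi - lo:]
        if kept:
            new_runs.append((kept, node))
        else:
            vanished.append(node)
    return new_runs, vanished


def prune(vanished):
    """Second half of the update: drop vanished text nodes from the tree."""
    for node in vanished:
        node.parent.children.remove(node)


h1 = Node("h1")
some = Node("#text", h1)
em = Node("em", h1)
word = Node("#text", em)
tail = Node("#text", h1)
runs = [("Some ", some), ("meaningless", word), (" text", tail)]

runs, vanished = delete_region(runs, 5, 16)  # kill "meaningless"
stale = word in em.children                  # the tree has not noticed yet
prune(vanished)
```

The sketch stops at removing the text node.  Whether the now empty
`em' element should go as well is exactly the sort of decision that
has to be made in both structures at once.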

The other way is to have a text property `parents' on each text
node in the buffer.  This would hold a list of all ancestor nodes
in the tree, starting with the immediate parent.  The
disadvantage here is that finding nodes takes much more time.
Especially finding all the children or descendants of a node
takes time.  Whereas in #1 I have a reference to the children in
the node, here I have to scan several ranges of text properties
to determine the children, e.g.

    1. Find the first position in the buffer where NODE is a
       member of the value of the text property `parents'.

    2. Push the value of `parents' to a list.

    3. Find the next single property change of `parents'.
    
    4. Determine if NODE is a member of the value of `parents'.
       If yes, go to 2.  If no, go to 5.

    5. Determine children or descendants from the collected
       values.
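
The five steps above can be sketched as follows, again as a Python
model rather than Emacs Lisp.  The buffer is a list of (text, parents)
runs, where `parents' lists the ancestors starting with the immediate
parent.  One liberty is taken and should be read as an assumption: the
sketch collects the text of each run together with its `parents'
value, so that direct text children can be reported too.  `Element'
and `children_of' are illustrative names.

```python
class Element:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<{self.name}>"


def children_of(runs, node):
    """Direct children of NODE: strings for text nodes, Elements otherwise."""
    def has_node(parents):
        return any(p is node for p in parents)

    i = 0
    # Step 1: find the first run whose `parents' value contains NODE.
    while i < len(runs) and not has_node(runs[i][1]):
        i += 1
    # Steps 2-4: collect values run by run until NODE is no longer a member.
    collected = []
    while i < len(runs) and has_node(runs[i][1]):
        collected.append(runs[i])
        i += 1
    # Step 5: the entry just before NODE in each list is a child element;
    # if NODE comes first, the run itself is a direct text child.
    children = []
    for text, parents in collected:
        depth = next(k for k, p in enumerate(parents) if p is node)
        if depth == 0:
            children.append(text)
        elif not children or children[-1] is not parents[depth - 1]:
            children.append(parents[depth - 1])
    return children


# <h1>Some <em>meaningless</em> text</h1>
h1, em = Element("h1"), Element("em")
runs = [
    ("Some ", [h1]),
    ("meaningless", [em, h1]),
    (" text", [h1]),
]
```

With these runs, `children_of(runs, h1)' yields the text "Some ", the
`em' element and the text " text", in that order.  Every call scans
the runs from the start of the buffer, which is the inefficiency in
question.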

Some care is necessary with copying and inserting text.  But we avoid
having to keep two separate structures in sync, at the cost that
access to nodes (and thus the API) is inefficient.


So how to handle formatting in the buffer?  The element nodes would
store formatting information---either after applying a CSS stylesheet
to the tree, or, in the case of RTF, right away when parsing the file.
Functions that apply the formatting in the buffer (i.e. filling and
jit-lock) scan the tree upwards until they find the information they
need.
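
A minimal sketch of that upward scan, in Python for the sake of being
runnable outside Emacs.  The `style' dictionary and the property names
in it are assumptions made for this example.  The sketch treats every
property as inherited; real CSS distinguishes inherited from
non-inherited properties, which a proper implementation would have to
respect.

```python
class Element:
    def __init__(self, name, parent=None, style=None):
        self.name = name
        self.parent = parent
        self.style = dict(style or {})  # formatting stored on the element


def lookup_format(node, key, default=None):
    """Walk from NODE towards the root until some element defines KEY."""
    while node is not None:
        if key in node.style:
            return node.style[key]
        node = node.parent
    return default


body = Element("body", style={"font-family": "serif"})
h1 = Element("h1", body, style={"font-weight": "bold"})
em = Element("em", h1, style={"font-style": "italic"})
```

A fontification function working on the text inside `em' would, for
instance, call `lookup_format(em, "font-weight")' and find the value
on `h1', one level up.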

I have not yet determined whether #2 requires too much time for this.
The idea of #2 is rather new and not fully thought out and tested.  I
am not certain if I am aware of all possible pitfalls.  Moreover, I
have not yet figured out every detail of handling formatting
information in general.  I am still in the process of reading
specifications in order to get an overview.  I have to admit that I
have also been wondering whether something could be done on the C
level to provide for such tree-like documents in an Emacs buffer.  I
don't have a clue here, though.

[Yesterday or so, a third way to handle nested blocks crossed my
mind.  Maybe each paragraph could have a `nesting-level' text property
whose value is an integer.  For each nesting level N, with N > 0, the
first preceding block with a nesting level of N - 1 is the immediate
parent.  I have no idea yet how, if at all, that would translate to
the XML info set, though.]
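
The parent rule of this third idea is simple enough to sketch, here in
Python with the paragraphs reduced to a list of their nesting levels.
`parent_index' is a hypothetical helper.  It follows the rule
literally and does not check whether the sequence of levels is
well-formed.

```python
def parent_index(levels, i):
    """Index of the immediate parent of block I, or None for top-level blocks.

    LEVELS holds the `nesting-level' value of each block in buffer order.
    """
    n = levels[i]
    if n == 0:
        return None
    # Scan backwards for the first preceding block with level N - 1.
    for j in range(i - 1, -1, -1):
        if levels[j] == n - 1:
            return j
    return None


levels = [0, 1, 1, 2, 1, 0, 1]
parents = [parent_index(levels, i) for i in range(len(levels))]
```

For these levels the blocks at indices 1, 2 and 4 all have block 0 as
their parent, block 3 is nested inside block 2, and block 6 belongs to
block 5.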


    Oliver
--
Oliver Scholz               1 Vendémiaire an 213 de la Révolution
Ostendstr. 61               Liberté, Egalité, Fraternité!
60314 Frankfurt a. M.



