Re: [Texmacs-dev] Questions regarding conversion between strings and tre
From: Henri Lesourd
Subject: Re: [Texmacs-dev] Questions regarding conversion between strings and trees
Date: Sat, 04 Mar 2006 15:43:34 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
David MENTRE wrote:
Hello,
I'm pursuing the idea of a literate programming mode for texmacs. My
current running code is able to parse (and produce in the reverse
direction) a source file into a Scheme structure like the following one
(" and \ are backslash-escaped in the Scheme way). TM and CODE are two Scheme
symbols representing respectively a texmacs document and literate code.
'((tm . "<TeXmacs|1.0.6>")
(tm . "")
(tm . "<style|generic>")
(tm . "")
(tm . "<\\body>")
(tm . " Sample texmacs document.")
(tm . "")
(code . "(define (hello-world) (display \"Hello world!\"))")
(code . "")
(tm . " \;")
(tm . "</body>")
(tm . "")
(tm . "<\\initial>")
(tm . " <\\collection>")
(tm . " <associate|language|french>")
(tm . " </collection>")
(tm . "</initial>"))
I need to transform this data structure into a texmacs document (and in
the reverse way for saving). I would like to keep the CODE blocks into a
specific node type of texmacs document tree.
So my questions:
1. Is it possible to add a new node type (something like
<\lp-code></lp-code>) to texmacs tree? How? If it is possible, I
suppose I need to define a style file for rendering?
Just define a new macro, for example:
[[
<assign|lp-code|<macro|x|<with|font-shape|italic|<arg|x>>>>
]]
and then:
[[
<lp-code|hello>
]]
is displayed as 'hello' in italics. Of course, it is
up to you to decide what kind of display you really
want for <lp-code|...>.
To summarize, as far as I know, the only way to define
new markup / new node types in TeXmacs is to define
a new macro using <assign|name|<macro|...>>.
Does this answer your question, or are you in
fact asking a broader question?
2. How can I convert serialized part of texmacs document into tree data
structure suitable for inclusion in the current document buffer?
Apparently, string->tree could be used for this but, from my
attempts, tags are not interpreted correctly. E.g.:
(display* (string->tree "<TeXmacs|1.0.6>"))
gives
<tree \<TeXmacs\|1.0.6\>>
and not an expected
<tree <TeXmacs|1.0.6>>
Two points:
1. The header and suffix parts of a TeXmacs document
are currently *not* part of what you can access
with the TeXmacs tree API (i.e. (path->tree),
(path-assign), etc.): the part you can access
with the API is only the part located inside
the <body|...> part of a TeXmacs document;
2. You can *not* build composite TeXmacs trees
with (string->tree). What (string->tree) does
is only build *atomic* TeXmacs trees (leaves).
This is why, in any case, a command like:
[[
(string->tree "everything <you> want")
]]
will always build an atomic tree like:
[[
<tree "everything \<you\> want">
]]
If you want to build a composite TeXmacs tree,
you must use the function (stree->tree). For
example, if you do:
[[
(stree->tree '(with "font-shape" "italic" (underline "Hello")))
]]
*then* you get the following composite TeXmacs tree:
[[
<with|font-shape|italic|<underline|Hello>>
]]
Or maybe I don't interpret display* output correctly? In a previous
email Henri said that (string->tree "<gtr>") produces the expected
">" character in TeXmacs but a display* still prints "<tree
\<gtr\>>" on console.
This is because "<gtr>" is a valid string representation of ">"
when you want the symbol ">" itself to appear in a TeXmacs document.
You can observe that, for example, (display (string->tree ">"))
also works: you get <tree "\>">.
Thus there is an ambiguity here: TeXmacs should treat
either ">" or "<gtr>" as the appropriate representation
of the symbol ">" in <tree "..."> leaves, but not both...
Anyway, what is important is to be able to generate
a symbol ">" when you need to: if you write
to a file, you must use "\<gtr\>",
while inside TeXmacs (string->tree "<gtr>")
in fact amounts to ">".
3. Moreover, I'm wondering how to handle the issue that opening and
closing tags (for example <\body> and </body>) are not in the same
string. One solution would be to:
a. first convert (code . "toto") lines into
(tm . "<\lp-code>toto</lp-code>");
b. and then concat all the strings to do a big string->tree on the
final string.
Is there a better way to do this?
The solution to your problem is either:
a. You generate the content of a TeXmacs file. In this case,
there is no problem with a <\body> that starts on one line, etc., because
you generate everything. Moreover, given that you know exactly
what you want, what you must generate is clearly defined by
the syntax of the TeXmacs file markup;
b. You want to change a TeXmacs document 'on the fly'. Then
the right thing to do is (taking your own example):
[[
(stree->tree
  '(document
     " Sample texmacs document."
     ""
     (lp-code "(define (hello-world) (display \"Hello world!\"))")
     (lp-code "")
     ""))
]]
Some explanation is needed here about the use of '(document ...)
and why "\;" is not needed: inside TeXmacs, the
markup:
[[
(document "A" "B" "C")
]]
is the one that is in fact used to implement the splitting
of the text into several paragraphs; namely, it
is displayed as:
[[
A
B
C
]]
inside TeXmacs.
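Going from the list of (tm . "...") / (code . "...") pairs to such a
(document ...) expression can be sketched as follows. This is only a
minimal sketch: chunks->stree is a hypothetical helper, not a TeXmacs
function, and it assumes that every tm entry is a plain paragraph of
text taken from inside the body (header lines such as "<\\body>" or
"<style|generic>" must have been filtered out beforehand).

```scheme
;; Hypothetical helper (not part of TeXmacs): turn a list of
;; (tm . "text") / (code . "text") pairs into a (document ...) stree.
;; Assumption: tm entries are plain text paragraphs, not serialized markup.
(define (chunks->stree chunks)
  (cons 'document
        (map (lambda (chunk)
               (if (eq? (car chunk) 'code)
                   (list 'lp-code (cdr chunk))   ; code line -> (lp-code "...")
                   (cdr chunk)))                 ; text line -> plain string
             chunks)))

(define sample-chunks
  '((tm . " Sample texmacs document.")
    (tm . "")
    (code . "(define (hello-world) (display \"Hello world!\"))")
    (code . "")
    (tm . "")))

;; Builds the same (document ...) list as the hand-written example.
(write (chunks->stree sample-chunks))
(newline)
```

Inside TeXmacs, the resulting list would then be handed to (stree->tree).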
The other important point about (document ...) is that
you usually never see it in the Edit source tree mode,
nor in the markup. The reason is that
it is always *combined* with other markup: for example
with <body|...>. If you input a <body|...> markup in
TeXmacs and switch to Source mode just afterwards, you will
see:
[[
<body|
>
]]
instead of the expected:
[[
<body|>
]]
This is because, **IN FACT**, the **REAL** markup that
has been inserted is:
[[
<body|<document|>>
]]
This is extremely important, because when doing your
path calculations, you must **OF COURSE** take into account
the intermediate <document|> tag! For example, the (local)
path to the "A" inside the markup below:
[[
<body|
A>
]]
is '(0 0) and not simply '(0), because the markup you
are dealing with is in fact:
[[
<body|<document|A>>
]]
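The effect of the intermediate <document> on paths can be checked on
the Scheme list form of the markup. The sketch below uses a
hypothetical helper, stree-ref, which is not part of the TeXmacs API
and works on plain Scheme lists rather than on real TeXmacs trees; it
assumes that index 0 designates the first child after the tag.

```scheme
;; Hypothetical helper (not part of TeXmacs): follow a path of child
;; indices in an stree, where index 0 is the first child after the tag.
(define (stree-ref t path)
  (if (null? path)
      t
      (stree-ref (list-ref (cdr t) (car path)) (cdr path))))

(define body-tree '(body (document "A")))

(stree-ref body-tree '(0))    ; the intermediate (document "A") node
(stree-ref body-tree '(0 0))  ; the leaf "A"
```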
The presence (or absence) of an intermediate <document>
markup is also the reason why the "\" and "/" symbols appear
in .tm files. The markup:
[[
<body|
A>
]]
is written as:
[[
<\body>
A
</body>
]]
in a .tm file, while simpler markup, for
example <underline|A>, is written the
same way when serialized in a .tm file.
I've read with great interest recent discussion between Lionel and Henri
but I must admit I'm a bit lost in the string successive escapes. ;)
The conclusion is simple: if the symbols "<" and ">" appear in
the text of your document (as in "a<b", for example),
then you must translate them either to "<less>" / "<gtr>" if
you use the TeXmacs tree API, or directly to "\<less\>" / "\<gtr\>"
if you are generating a TeXmacs file.
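As an illustration of this rule, here is a minimal sketch in plain
Scheme. The helper escape-angle-brackets and its for-file? flag are
hypothetical names, not TeXmacs functions, and the sketch only covers
the two characters discussed here; any other character that TeXmacs
may translate is left untouched.

```scheme
;; Hypothetical helper (not part of TeXmacs): replace "<" and ">" in a
;; text string by their symbol names.
;;   for-file? = #f : form for the tree API, "<less>" / "<gtr>"
;;   for-file? = #t : form for a .tm file,   "\<less\>" / "\<gtr\>"
(define (escape-angle-brackets s for-file?)
  (let ((less (if for-file? "\\<less\\>" "<less>"))
        (gtr  (if for-file? "\\<gtr\\>"  "<gtr>")))
    (apply string-append
           (map (lambda (c)
                  (cond ((char=? c #\<) less)
                        ((char=? c #\>) gtr)
                        (else (string c))))
                (string->list s)))))

(escape-angle-brackets "a<b" #f)  ; tree API form
(escape-angle-brackets "a<b" #t)  ; .tm file form
```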
The problem is that, in practice, this is tricky, because it is
not very clear where TeXmacs itself does additional translations.
Currently, this is perhaps not a big problem for you: you
probably don't need to consider these particular
cases immediately if you just want to implement a first version of your literate
programming tool.
Best, Henri