[Fenfire-dev] PEG: tstring schema
From: Tuomas Lukka
Subject: [Fenfire-dev] PEG: tstring schema
Date: Thu, 2 Oct 2003 11:36:10 +0300
User-agent: Mutt/1.5.4i
(Only change: fixed schema to work)
Any objections to this?
Tuomas
=============================================================
PEG refstring_dtd--tjl: A DTD for refstrings
=============================================================
:Author: Tuomas J. Lukka
:Last-Modified: $Date: 2003/09/28 12:35:29 $
:Revision: $Revision: 1.1 $
:Status: Current
:Affects-PEGs: alph_lite--tjl
With Alph lite, we need to stabilize at least that data format.
There are several problems with the current XML format:
- for RICC (URN5) text spans and fake text spans, the actual
text is not written into the XML inside them, although having
it there would be useful
- the element names are less than clear
Issues
======
- What is the name for this DTD? RefString is what it started
as, but later it was realized that these are *not* referential
strings but *idded* strings.
RESOLVED: Transcludable String, or TString for short.
Spans are Transcludable Spans or TSpans
- Should we have an element that surrounds a whole TString?
What about elements for fake spans?
RESOLVED: We should have an **optional** surrounding
element, to allow easy integration in different ways.
Using elements for fake spans is pointless
as they are best modeled by plain strings: consider::
<faketextspan>ab</faketextspan><faketextspan>cd</faketextspan>
<faketextspan>abcd</faketextspan>
In *all* semantics, these two lines should be equivalent.
- How should we define the TString DTD/Schema? DTD or Schema or other?
RESOLVED: XML Schemas seem the best option, due to proper namespace
support &c.
- What should be the URI for use with XML namespaces?
RESOLVED: The URI should be, analogous to the RDF vocab
conventions,
``http://fenfire.org/xmlns/2003/09/tstring#``
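To make the resolutions above concrete, here is a minimal sketch of reading a TString instance that uses this namespace URI. It assumes the optional surrounding ``tstring`` element is present and that the ``uri`` and ``offs`` attributes are namespace-qualified, as the schema's ``attributeFormDefault="qualified"`` setting suggests. The sample document, the ``urn:example:block-1`` URI and the ``read_tstring`` helper are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI as resolved in the Issues list.
TSTRING_NS = "http://fenfire.org/xmlns/2003/09/tstring#"

# Hypothetical instance document; the URI and offset are made up.
SAMPLE = (
    f'<alph:tstring xmlns:alph="{TSTRING_NS}">'
    'plain '
    '<alph:tspan alph:uri="urn:example:block-1" alph:offs="10">abc</alph:tspan>'
    ' more'
    '</alph:tstring>'
)


def read_tstring(xml_text):
    """Return the parts of a TString in document order.

    Plain (non-transcludable) text becomes a str, and each tspan
    becomes a (uri, offs, text) tuple.
    """
    root = ET.fromstring(xml_text)
    parts = []
    if root.text:
        parts.append(root.text)
    for child in root:
        if child.tag == f"{{{TSTRING_NS}}}tspan":
            # Assumption: attributes are namespace-qualified.
            uri = child.get(f"{{{TSTRING_NS}}}uri")
            offs = int(child.get(f"{{{TSTRING_NS}}}offs"))
            parts.append((uri, offs, child.text or ""))
        if child.tail:
            # Text between spans is ordinary text, not a "fake span".
            parts.append(child.tail)
    return parts


parts = read_tstring(SAMPLE)
print(parts)
```

Plain text comes back as ordinary strings, which matches the decision not to introduce a separate element for fake spans.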
The Transcludable String XML Schema
===================================
Define a Transcludable String XML schema as follows::
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:alph="http://fenfire.org/xmlns/2003/09/tstring#"
targetNamespace="http://fenfire.org/xmlns/2003/09/tstring#"
elementFormDefault="qualified"
attributeFormDefault="qualified"
>
<annotation>
<documentation xml:lang="en">
Transcludable String schema v1.0.
* Copyright (c) 2003, Tuomas J. Lukka
* This file is part of Alph.
*
* Alph is free software; you can redistribute it and/or modify it under
* the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* Alph is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
* or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General
* Public License for more details.
*
* You should have received a copy of the GNU Lesser General
* Public License along with Alph; if not, write to the Free
* Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
* MA 02111-1307 USA
* Written by Tuomas J. Lukka
* Designed by Tuomas J. Lukka and Benja Fallenstein
</documentation>
</annotation>
<element name="tstring" type="alph:TStringType"/>
<element name="tspan" type="alph:TSpanType"/>
<complexType mixed="true" name="TStringType">
<annotation>
<documentation xml:lang="en">
A transcludable string, consisting of transcludable spans
and also text content (which will not be
transclusion-sensitive).
This is just a container element - the magic is in the spans.
</documentation>
</annotation>
<sequence>
<element ref="alph:tspan" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
<complexType name="TSpanType">
<annotation>
<documentation xml:lang="en">
A transcludable span.
Transcludable spans are spans of text that identify themselves
through a URI and an offset.
Basic model
-----------
The basic model for TSpans is that there exists a single, unique
block of letters denoted by the URI, and a TSpan contains a
contiguous span of letters from that block.
However, to allow practical, non-centralized implementations,
the restrictions are relaxed: the ids only need
to be unique *with a high probability*.
Creating TSpans
---------------
There are two possible situations for creating TSpans:
creating TSpans from text being typed in by a user,
or creating TSpans from text that already exists somewhere.
Creating TSpans while the user types
""""""""""""""""""""""""""""""""""""
For the URIs, we recommend "urn-5" random IDs, or UUIDs.
The TSpans can be generated by creating a single random id
for the entire session and simply increasing the current offset
by one whenever the user types a new character.
In the resulting text, adjacent length-1 spans that have
contiguous ids should be combined.
Creating TSpans from text that already exists somewhere
"""""""""""""""""""""""""""""""""""""""""""""""""""""""
This is a more difficult situation, as this is a case of adding
extra information where there used to be none. If two people
separately do this to the same text, they may assign different
ids, and transclusions between their copies will not be found.
If the text is stable and unique, we recommend using some
hash-based URI scheme, such as urn:sha-1 or urn:x-storm, or a
permanent stable identifier for exactly those characters, if
that exists.
If the text is changing, **in no case** should something like
the URL of a webpage be used for the URI: the same URI and
offset would then come to denote different characters over time,
which contradicts the basic model above.
Editing operations
------------------
TSpans should never be edited except by splitting or by
removing: changes to the text inside the span are not permitted.
For inserting text, split the span first, then insert the text
between the resulting spans. For removing text, split the span
appropriately and remove one of the resulting spans.
The span-splitting operation works as follows: a TSpan with URI X,
offset Y, and N characters of content,
(tspan uri="X" offs="Y")N chars(/tspan)
becomes
(tspan uri="X" offs="Y")S chars(/tspan)(tspan uri="X" offs="Y+S")N-S chars(/tspan)
for some S between 0 and N, exclusive.
Identifying transclusions
-------------------------
Spans, or regions of spans, are considered to be transclusions of
each other if their URI attributes match exactly and the text at
the same offsets matches.
The simplest way to explain the idea of "same offset" is to split
both spans into one-character spans: the offsets in the resulting
spans will be consecutive, and if **all** the one-character spans
with the same offsets match, the two spans *overlap*. If even one
one-character span does not match, the spans are not considered
overlapping.
Interoperability
----------------
The tspan element is defined through TSpanType in order to allow
other elements to take on this type: for instance, SVG ignores
text inside "foreign elements" unlike HTML, where the default is
to show it.
In HTML, using tspan thus works out all right, but in SVG the
text would not be shown. The solution is to use the alph:uri and
alph:offs attributes on the SVG span element.
Rationale
---------
The idea of TSpans is to provide a simple way to get some of
the benefits of Referential Fluid Media (see Nelson, "Xanalogical
structure, needed now more than ever: parallel documents, deep
links to content, deep versioning, and deep re-use", ACM Computing
Surveys, 31(4es), 1999) by providing an *identity* for text.
TSpans carry their own content and thus need no central servers
to "resolve" the text from, and can be added to normal
applications with minimal effort.
</documentation>
</annotation>
<simpleContent>
<extension base="string">
<attribute name="uri" type="anyURI" use="required"/>
<attribute name="offs" type="nonNegativeInteger"
use="required"/>
</extension>
</simpleContent>
</complexType>
</schema>
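The editing and transclusion rules described in the TSpanType documentation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the schema: the ``TSpan`` tuple and the ``split``, ``merge_adjacent`` and ``overlap`` names are hypothetical, offsets are counted in Python string characters, and two spans that share no offsets at all are treated as not overlapping (the text above does not spell out that case).

```python
from typing import NamedTuple


class TSpan(NamedTuple):
    """Hypothetical in-memory form of a tspan element."""
    uri: str
    offs: int
    text: str


def split(span, s):
    """Split a span of N characters after its first s characters, 0 < s < N."""
    if not 0 < s < len(span.text):
        raise ValueError("split point must be strictly between 0 and N")
    return (TSpan(span.uri, span.offs, span.text[:s]),
            TSpan(span.uri, span.offs + s, span.text[s:]))


def merge_adjacent(spans):
    """Combine neighbouring spans with the same URI and contiguous offsets."""
    merged = []
    for span in spans:
        if merged:
            prev = merged[-1]
            if prev.uri == span.uri and prev.offs + len(prev.text) == span.offs:
                merged[-1] = TSpan(prev.uri, prev.offs, prev.text + span.text)
                continue
        merged.append(span)
    return merged


def overlap(a, b):
    """Return the shared (start, end) offset range, or None.

    The spans overlap only if the URIs match exactly and every
    character at a shared offset is identical in both spans.
    """
    if a.uri != b.uri:
        return None
    start = max(a.offs, b.offs)
    end = min(a.offs + len(a.text), b.offs + len(b.text))
    if start >= end:
        return None  # assumption: no shared offsets means no overlap
    if a.text[start - a.offs:end - a.offs] != b.text[start - b.offs:end - b.offs]:
        return None  # a single mismatching character rules out overlap
    return (start, end)


# Usage with a made-up URI: type "hello" one character at a time,
# merge the length-1 spans, then split the result again.
URI = "urn:example:session-1"
typed = [TSpan(URI, 5 + i, ch) for i, ch in enumerate("hello")]
whole = merge_adjacent(typed)[0]
left, right = split(whole, 2)
```

Comparing slices of the shared range is equivalent to splitting both spans into one-character spans and comparing them offset by offset, but avoids building the intermediate spans.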