gzz-dev

Re: [Gzz] Literal names in the structure


From: Rauli Ruohonen
Subject: Re: [Gzz] Literal names in the structure
Date: Sat, 1 Mar 2003 11:44:03 +0200 (EET)

On Fri, 28 Feb 2003, Benja Fallenstein wrote:

> Literals in RDF are text string *representations* of actual values. For
> example, string "714" with datatype xsd:integer (or :int, dunno) is the
> number sevenhundredandfourteen.

There's no problem with this, as long as nobody ever sees the representation
itself :-) I'm only concerned about the things you can see when using the
system; anyone tinkering with the Java code can be expected to understand
English.
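To make the literal/value distinction concrete, here is a minimal Python sketch. It assumes a literal is modelled as a (lexical form, datatype URI) pair; the `Literal` class and `to_value` function are hypothetical and not part of Gzz or any RDF library. Only the XML Schema datatype URIs are real.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Literal:
    # Hypothetical model of an RDF typed literal: the lexical form is only
    # a string representation; the datatype says how to interpret it.
    lexical: str
    datatype: str

def to_value(lit: Literal):
    """Map a literal's lexical form to the value it denotes (sketch)."""
    if lit.datatype in (XSD + "integer", XSD + "int"):
        return int(lit.lexical)
    if lit.datatype == XSD + "string":
        return lit.lexical
    raise ValueError(f"unsupported datatype: {lit.datatype}")

seven14 = Literal("714", XSD + "integer")
print(to_value(seven14))  # 714
# Different lexical forms can denote the same value:
print(to_value(Literal("0714", XSD + "integer")) == to_value(seven14))  # True
```

A user interface would then display the value (formatted however the user prefers), never the lexical string itself.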

> One could use binary, but strings make the serialization human-readable.

Human-readable for those who can read the original representation. I rather
doubt there are any potential computer users who can't read Arabic
numerals, but for other types of data it may be an issue.

> Yes, but Unicode goes a much longer way than ASCII. I think it's more
> culturally sensitive to have serializations in incomplete Unicode than
> plain ASCII that can encode zero Asian ideographs.

True, but plain Unicode is still not the ideal solution, and it would be
nice to do things right for once, while the system is at an early stage of
implementation. (I'm not implying that things aren't being done right; I
just want to make sure you've thought of it.)

> One should also be able to emphasize words. Both of these are tasks
> handled by markup languages on top of plain Unicode literals.

I think this is a reasonable space optimization, but care should be taken
that such markup is possible everywhere a Unicode string can be used.
Also, encodings other than Unicode should be allowed, for space reasons.
(If space weren't a concern, you'd get a cleaner system by not using
Unicode at all, but a URI for each character.)
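A minimal sketch of the idea above, assuming text is stored as a sequence of character identifiers, each either a Unicode code point or a URI naming a character Unicode lacks, with emphasis markup kept as index ranges so it works the same for both kinds. The `Text` class and the `urn:example:...` URI are hypothetical, for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# A character is either a Unicode code point (int) or a URI (str) naming
# a character that has no code point (hypothetical scheme).
Char = Union[int, str]

@dataclass
class Text:
    chars: List[Char]
    # Markup as (start, end, tag) ranges over character positions, so it
    # applies equally to code-point and URI characters.
    markup: List[Tuple[int, int, str]] = field(default_factory=list)

    @classmethod
    def from_str(cls, s: str) -> "Text":
        return cls([ord(c) for c in s])

    def emphasize(self, start: int, end: int) -> None:
        self.markup.append((start, end, "em"))

    def is_plain_unicode(self) -> bool:
        """True if every character can be stored as a compact code point."""
        return all(isinstance(c, int) for c in self.chars)

    def render(self, fallback: str = "\ufffd") -> str:
        # Characters without a code point are shown with a fallback glyph.
        return "".join(chr(c) if isinstance(c, int) else fallback
                       for c in self.chars)

t = Text.from_str("ab")
t.chars.append("urn:example:char:42")  # hypothetical URI-named character
t.emphasize(0, 2)
print(t.is_plain_unicode())  # False
print(t.render())            # "ab" followed by U+FFFD
```

Storing plain code points wherever possible keeps the space cost close to that of ordinary Unicode strings, while still letting any position hold a URI-named character.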

> > The important advantage of ISO 10646 and Unicode is that they are
> > extensible character sets.  New mathematical characters are created, and
> > when they get significant use, they are allocated code points.  Witness
> > the new characters in 3.2, for encoding Z.

You still have to wait for them to be added to the character set. As for
Chinese characters, there are multiple variants of each one, and I very
much doubt anyone could even catalogue all those in use; however, they can
be encoded as superpositions of elementary parts.

It would be nice if, say, someone implementing an encoding and display
system for Chinese characters in the way described above could build it
on top of the complete [*] system without major hassle, and use strings
encoded that way anywhere a Unicode string can be used.

[*] This is a cell representing the project, displayed using its current name.
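One existing notation for describing a character as a combination of parts is Unicode's Ideographic Description Sequences (IDS), which use operators such as U+2FF0 (left to right) and U+2FF1 (above to below). The sketch below builds such compositions as trees and serializes them as IDS strings; the `Compose` type and `to_ids` function are hypothetical, and the code only describes structure, it does not render glyphs.

```python
from dataclasses import dataclass
from typing import Union

# Ideographic Description Characters (real Unicode code points).
LEFT_RIGHT = "\u2ff0"   # ⿰
ABOVE_BELOW = "\u2ff1"  # ⿱

@dataclass(frozen=True)
class Compose:
    op: str
    first: "Part"
    second: "Part"

# A part is either an existing character or a composition of two parts.
Part = Union[str, Compose]

def to_ids(p: Part) -> str:
    """Serialize a composition tree as an Ideographic Description Sequence."""
    if isinstance(p, str):
        return p
    return p.op + to_ids(p.first) + to_ids(p.second)

hao = Compose(LEFT_RIGHT, "女", "子")  # 好: woman beside child
print(to_ids(hao))                     # ⿰女子

# Compositions nest: 想 is 相 above 心, and 相 is 木 beside 目.
xiang = Compose(ABOVE_BELOW, Compose(LEFT_RIGHT, "木", "目"), "心")
print(to_ids(xiang))                   # ⿱⿰木目心
```

A variant that has no code point of its own could be stored as such a tree and placed wherever a Unicode string is allowed; drawing it would be up to the display system.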

> Um, both? How would you express xanalogical media in RDF except as literals?
>
> Of course, we won't usually show the actual string content of the
> literal (<xu:span uri="..." pos="17" len="6">foobar</xu:span>...).

If everything were either a span or a URI, there shouldn't be a
problem, unless span references are misused to represent something
other than the content of the span (e.g. as keywords; those should
always be URIs).
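The rule above could be checked mechanically. A minimal sketch, assuming hypothetical `Span` and `Uri` types modelled loosely on the `xu:span` element quoted above (a source URI plus `pos` and `len`). The property names and `urn:example:...` URIs are hypothetical, and the "keywords must be URIs" rule is the one proposed in this message, not an existing Gzz check.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Span:
    # Reference into permanent content: source URI, start offset and
    # length (the pos and len attributes of xu:span).
    uri: str
    pos: int
    length: int

@dataclass(frozen=True)
class Uri:
    value: str

Node = Union[Span, Uri]

# Hypothetical properties whose values identify things rather than
# quote content; these must never be spans.
IDENTIFYING_PROPERTIES = {"keyword", "type"}

def check(prop: str, value: Node) -> None:
    """Raise ValueError if an identifying property is given a span."""
    if prop in IDENTIFYING_PROPERTIES and isinstance(value, Span):
        raise ValueError(f"{prop!r} must be a URI, not a span")

check("text", Span("urn:example:doc", 17, 6))   # fine: quoted content
check("keyword", Uri("urn:example:topic:rdf"))  # fine: identifier
try:
    check("keyword", Span("urn:example:doc", 17, 6))
except ValueError as e:
    print(e)  # 'keyword' must be a URI, not a span
```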
