[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Help-smalltalk] Unicode problem on parsing XML
From: |
Bèrto ëd Sèra |
Subject: |
[Help-smalltalk] Unicode problem on parsing XML |
Date: |
Wed, 5 May 2010 06:36:47 +0300 |
Hi all,
lately I started to see an out-of-index error when parsing Unicode
text from XML files. I am *not* 100% sure this isn't due to some
changes database-side that are now exposing a wider amount of text to
the parser, so I cannot safely claim it's new. Yet, now even common
accented Latin chars get warped into something unusable when read from
the parser, and this *surely* was not happening the last time I worked
on the interface, say 3 months ago, with Iliad 7.0.
Now I'm using gst 3.2 and iliad 0.8. What I get from the following code:
content := 'taxonomy.xml' asFile.
parser := XML.XMLParser new.
parser validate: false.
parser parse: content readStream.
is an error you can easily reply by putting
http://eng.i-iter.org/graph/taxonomy.xml file into your local dir.
Before you get crazy (as I did) digging around the text looking for
the guilty chars I can tell
you the breakers are, for example:
1)...the æ and œ ligatures, ...
2) Devanāgarī script for Hindi
3) Japanese Rōmaji script
I was wondering what changed... or, most probably, what kind of silly
mistake I'm making...
Bèrto
--
==============================
Constitution du 24 juin 1793 - Article 35. - Quand le gouvernement
viole les droits du peuple, l'insurrection est, pour le peuple et pour
chaque portion du peuple, le plus sacré des droits et le plus
indispensable des devoirs.
- [Help-smalltalk] Unicode problem on parsing XML,
Bèrto ëd Sèra <=