[Help-smalltalk] Unicode problem on parsing XML

From:	Bèrto ëd Sèra
Subject:	[Help-smalltalk] Unicode problem on parsing XML
Date:	Wed, 5 May 2010 06:36:47 +0300

Hi all,

lately I have started to see an index-out-of-range error when parsing
Unicode text from XML files. I am *not* 100% sure this isn't due to some
database-side changes that now expose a wider range of text to
the parser, so I cannot safely claim it's new. Yet even common
accented Latin characters now get warped into something unusable when
read by the parser, and this *surely* was not happening the last time I
worked on the interface, about 3 months ago, with Iliad 7.0.

Now I'm using gst 3.2 and iliad 0.8. What I get from the following code:
"taxonomy.xml sits in the current directory"
content := 'taxonomy.xml' asFile.
parser := XML.XMLParser new.
"skip DTD validation"
parser validate: false.
parser parse: content readStream.

is an error you can easily reproduce by putting the
http://eng.i-iter.org/graph/taxonomy.xml file into your local directory.
Before you go crazy (as I did) digging through the text looking for
the guilty characters, I can tell you the breakers are, for example:
1)...the æ and œ ligatures, ...
2) Devanāgarī script for Hindi
3) Japanese Rōmaji script

I was wondering what changed... or, most probably, what kind of silly
mistake I'm making...

Bèrto

-- 
==============================
Constitution du 24 juin 1793 - Article 35. - Quand le gouvernement
viole les droits du peuple, l'insurrection est, pour le peuple et pour
chaque portion du peuple, le plus sacré des droits et le plus
indispensable des devoirs.

