From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding
Date: Thu, 10 May 2018 15:08:59 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0

Follow-up Comment #7, bug #53842 (project octave):

Mike: You are right. UTF-16 files aren't read correctly.
The problem seems to be in octave_fgets, which reads the file as an array of
bytes (using char) and uses strlen to determine the string length. In UTF-16,
every ASCII character contains a zero byte, so strlen stops there. That means
that for most UTF-16 files it reads only the first (half) character.

Even if we solved the strlen issue, it doesn't seem easy to read UTF-16 or
UTF-32 files with std::fgets. For example, it stops reading at a single \n
byte (0x0A), which could be part of a valid 2-byte or 4-byte character.
Does anyone have an idea what we could do? Other than maybe documenting that
we are not able to read .m files with these encodings and emitting an error
when someone tries to set one of them as the mfile_encoding?

Once we are able to read the file, the conversion to UTF-8 should work
out of the box.

The attached patch fixes the second issue you mentioned.


(file #44137)
    _______________________________________________________

Additional Item Attachment:

File name: bug53842_previous_setting.patch  Size: 0 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?53842>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



