[Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding

From:	Markus Mützel
Subject:	[Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding
Date:	Thu, 10 May 2018 15:08:59 -0400 (EDT)
User-agent:	Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0

Follow-up Comment #7, bug #53842 (project octave):

Mike: You are right. UTF-16 files aren't read correctly.
The problem seems to be in octave_fgets, which reads the file as an array of
bytes (using char) and uses strlen to determine the string length. Since
UTF-16 encodes every ASCII character with a zero byte, strlen stops early, so
for most files using UTF-16 only the first (half) character is read.
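
To illustrate the strlen problem, here is a minimal standalone sketch. It is
not the octave_fgets code; the buffer contents and the function name
visible_length are made up for this example. It only shows what strlen
reports for a line of UTF-16LE text without a BOM.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sample line "x = 1\n" encoded as UTF-16LE (no BOM).
// Every ASCII character is followed by a zero byte.
static const char utf16le_line[] = {
  'x', 0, ' ', 0, '=', 0, ' ', 0, '1', 0, '\n', 0
};

// Length of the line as strlen sees it: it stops at the first zero
// byte, which is the second half of the first UTF-16 code unit.
std::size_t visible_length ()
{
  return std::strlen (utf16le_line);
}
```

The buffer holds 12 bytes, but visible_length () returns 1, i.e. only the
first byte of the first 2-byte character.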

Even if we solved the strlen issue, reading UTF-16 or UTF-32 files with
std::fgets doesn't seem to be easy (e.g. it stops reading at a single \n
byte, which could be part of a valid 2-byte or 4-byte character).
Does anyone have an idea what we could do, other than documenting that we
cannot read .m files with these encodings and emitting an error when someone
tries to set one of them as the mfile_encoding?
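
The std::fgets problem can be reproduced with a small standalone sketch. This
is again only an illustration under my own assumptions, not Octave code: the
function name and the sample bytes are hypothetical. The sample is UTF-16LE
for 'a', U+010A (whose low byte is 0x0A) and a real line feed.

```cpp
#include <cstdio>

// Returns how many bytes std::fgets consumes for its first "line",
// or -1 if the temporary file could not be created or read.
long bytes_consumed_by_first_fgets ()
{
  // UTF-16LE: 'a' (61 00), U+010A (0A 01), '\n' (0A 00)
  static const unsigned char data[] = { 0x61, 0x00, 0x0A, 0x01, 0x0A, 0x00 };

  std::FILE *f = std::tmpfile ();  // opened in binary update mode
  if (! f)
    return -1;

  std::fwrite (data, 1, sizeof (data), f);
  std::rewind (f);

  char buf[64];
  long consumed = -1;
  // fgets stops after the first 0x0A byte it sees, whatever that byte
  // belongs to.  ftell is used because the buffer contains zero bytes,
  // so strlen could not tell us how much was read.
  if (std::fgets (buf, static_cast<int> (sizeof (buf)), f))
    consumed = std::ftell (f);

  std::fclose (f);
  return consumed;
}
```

Here the first call consumes 3 of the 6 bytes: it stops in the middle of
U+010A instead of at the real line break. Even at the real line break, it
would stop after the 0x0A byte and leave the trailing zero byte for the next
read.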

Once we are able to read the file, the conversion to UTF-8 should work out
of the box.

The attached patch fixes the second issue you mentioned.


(file #44137)
    _______________________________________________________

Additional Item Attachment:

File name: bug53842_previous_setting.patch Size:0 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?53842>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/

Current Thread

[Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Markus Mützel, 2018/05/05
- [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Markus Mützel, 2018/05/05
  - [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Markus Mützel, 2018/05/06
    - [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Markus Mützel, 2018/05/08
    - [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Markus Mützel, 2018/05/09
    - [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Markus Mützel, 2018/05/09
    - [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Mike Miller, 2018/05/09
    - [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Markus Mützel <=
    - [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Mike Miller, 2018/05/11
    - [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding, Markus Mützel, 2018/05/12
