Re: [Groff] mom : unicode in .INCLUDE'd files
From: Mike Bianchi
Subject: Re: [Groff] mom : unicode in .INCLUDE'd files
Date: Sun, 23 Jul 2017 08:23:51 -0400
User-agent: Mutt/1.5.23 (2014-03-12)
This library purports to be a way to approach the problem ...
https://www.autoitconsulting.com/site/development/utf-8-utf-16-text-encoding-detection-library/
UTF-8 and UTF-16 Text Encoding Detection Library
by Jonathan Bennett | Aug 23, 2014 | Development |
This post shows how to detect UTF-8 and UTF-16 text and presents a fully
functional C++ and C# library that can be used to help with the detection.
I recently had to upgrade the text file handling feature of AutoIt to better
handle text files where no byte order mark (BOM) was present. The older
version of code I was using worked fine for UTF-8 files (with or without BOM)
but it wasn't able to detect UTF-16 files without a BOM. I tried to use the
IsTextUnicode Win32 API function but this seemed extremely unreliable and
wouldn't detect UTF-16 Big-Endian text in my tests.
Note, especially for UTF-16 detection, there is always an element of ambiguity.
This post by Raymond shows that however you try and detect encoding there will
always be some sequence of bytes that will make your guesses look stupid.
Here are the detection methods I'm currently using for the various types of
text file. The order of the checks I perform is:
BOM
UTF-8
UTF-16 (newline)
UTF-16 (null distribution)
:
:
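
The order of checks listed above could be sketched roughly as follows. This is
only a minimal Python illustration, not the linked library's code (which is
C++ and C#). The function name guess_encoding, the returned labels, the rule
that any NUL byte rules out UTF-8, and the 70%/10% thresholds are all
assumptions made for the example.

```python
import codecs
from typing import Optional

# Hypothetical thresholds for the null-distribution check (not from the post).
HIGH_NULL_FRACTION = 0.7
LOW_NULL_FRACTION = 0.1


def guess_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding of data; return None when nothing matches."""
    # 1. BOM: an explicit byte order mark settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le-bom"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be-bom"

    # 2. UTF-8: must decode strictly. NUL bytes are treated as a sign of
    #    UTF-16 (an assumption of this sketch), so they disqualify UTF-8.
    if data and b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # 3. UTF-16 (newline): look for U+000A in aligned 2-byte units.
    units = [data[i:i + 2] for i in range(0, len(data) - 1, 2)]
    le_newlines = units.count(b"\n\x00")
    be_newlines = units.count(b"\x00\n")
    if le_newlines and not be_newlines:
        return "utf-16-le"
    if be_newlines and not le_newlines:
        return "utf-16-be"

    # 4. UTF-16 (null distribution): mostly-ASCII text encoded as UTF-16
    #    has NUL high bytes at odd offsets (LE) or at even offsets (BE).
    even, odd = data[0::2], data[1::2]
    if even and odd:
        even_nulls = even.count(b"\x00") / len(even)
        odd_nulls = odd.count(b"\x00") / len(odd)
        if odd_nulls > HIGH_NULL_FRACTION and even_nulls < LOW_NULL_FRACTION:
            return "utf-16-le"
        if even_nulls > HIGH_NULL_FRACTION and odd_nulls < LOW_NULL_FRACTION:
            return "utf-16-be"

    return None
```

As the quoted post warns, checks 3 and 4 are heuristics: a UTF-16 file whose
text is mostly non-Latin has few NUL bytes and would fall through to None
here, and some byte sequences will always be guessed wrongly.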
--
Mike Bianchi