From: John Darrington
Subject: Re: [bug33254 1/6] encoding-guesser: New function encoding_guess_whole_file().
Date: Thu, 12 May 2011 17:19:47 +0000
User-agent: Mutt/1.5.18 (2008-05-17)
These patches look good to me. However, it's a shame that Latin-1 and similar
single-byte encodings cannot be detected. Other minor issues:

The text "Character Encoding:" sits hard up against the combo box, which looks
a bit awkward. Either the hbox needs padding, or a space should be appended to
the string.

The "(Auto)" in "Automatically Detect (Auto)" seems somewhat redundant to me.
J'
On Wed, May 11, 2011 at 10:42:02PM -0700, Ben Pfaff wrote:
This will be used for the first time in an upcoming commit.
---
src/libpspp/encoding-guesser.c | 27 +++++++++++++++++++++++++++
src/libpspp/encoding-guesser.h | 4 ++++
2 files changed, 31 insertions(+), 0 deletions(-)
diff --git a/src/libpspp/encoding-guesser.c b/src/libpspp/encoding-guesser.c
index 298861e..7d10015 100644
--- a/src/libpspp/encoding-guesser.c
+++ b/src/libpspp/encoding-guesser.c
@@ -283,3 +283,30 @@ encoding_guess_tail_is_utf8 (const void *data, size_t n)
: is_all_utf8_text (data, n));
}
+/* Attempts to guess the encoding of a text file based on ENCODING, an encoding
+   name in one of the forms described at the top of encoding-guesser.h, and the
+   SIZE bytes in TEXT, which contains the entire contents of the file.  Returns
+   the guessed encoding, which might be ENCODING itself or a suffix of it or a
+   statically allocated string.
+
+   Encoding autodetection only takes place if ENCODING actually specifies
+   autodetection.  See encoding-guesser.h for details. */
+const char *
+encoding_guess_whole_file (const char *encoding, const void *text, size_t size)
+{
+  const char *guess;
+
+  guess = encoding_guess_head_encoding (encoding, text, size);
+  if (!strcmp (guess, "ASCII") && encoding_guess_encoding_is_auto (encoding))
+    {
+      size_t ofs = encoding_guess_count_ascii (text, size);
+      if (ofs < size)
+        return encoding_guess_tail_encoding (encoding,
+                                             (const char *) text + ofs,
+                                             size - ofs);
+      else
+        return encoding_guess_parse_encoding (encoding);
+    }
+  else
+    return guess;
+}
diff --git a/src/libpspp/encoding-guesser.h b/src/libpspp/encoding-guesser.h
index 2ec2fee..0a7d1f9 100644
--- a/src/libpspp/encoding-guesser.h
+++ b/src/libpspp/encoding-guesser.h
@@ -115,6 +115,10 @@ bool encoding_guess_tail_is_utf8 (const void *, size_t);
const char *encoding_guess_tail_encoding (const char *encoding,
const void *, size_t);
+/* Guessing from entire file contents. */
+const char *encoding_guess_whole_file (const char *encoding,
+ const void *, size_t);
+
/* Returns true if C is a byte that might appear in an ASCII text file,
false otherwise. */
static inline bool
--
1.7.2.5
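
For readers without the PSPP tree to hand, the head/tail split in the patch
above can be illustrated with a much-reduced, self-contained sketch. Everything
named sketch_* below is hypothetical and is not the PSPP API. The sketch skips
the head-based detection entirely, takes a plain fallback name instead of the
"Auto,..." forms described in encoding-guesser.h, and uses a structural UTF-8
check that does not reject every overlong or surrogate sequence. The set of
bytes treated as ASCII text is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for counting the ASCII prefix: returns the number
   of leading bytes in DATA that look like ASCII text (assumed here to be
   printable characters plus tab, newline and similar controls). */
static size_t
sketch_count_ascii (const void *data, size_t n)
{
  const unsigned char *p = data;
  size_t ofs = 0;

  while (ofs < n
         && ((p[ofs] >= 0x20 && p[ofs] < 0x7f)
             || (p[ofs] >= 0x09 && p[ofs] <= 0x0d)))
    ofs++;
  return ofs;
}

/* Structural UTF-8 check: lead bytes must be followed by the right number
   of continuation bytes.  Simplified; not a full validator. */
static bool
sketch_is_utf8 (const unsigned char *p, size_t n)
{
  size_t i = 0;

  while (i < n)
    {
      unsigned char c = p[i];
      size_t extra;

      if (c < 0x80)
        extra = 0;
      else if (c >= 0xc2 && c <= 0xdf)
        extra = 1;
      else if (c >= 0xe0 && c <= 0xef)
        extra = 2;
      else if (c >= 0xf0 && c <= 0xf4)
        extra = 3;
      else
        return false;

      if (extra > n - i - 1)
        return false;           /* Sequence truncated at end of input. */
      for (size_t k = 1; k <= extra; k++)
        if ((p[i + k] & 0xc0) != 0x80)
          return false;
      i += extra + 1;
    }
  return true;
}

/* Skips the ASCII prefix, then decides between "UTF-8" and FALLBACK based
   on the first non-ASCII byte onward.  Pure-ASCII input yields FALLBACK. */
static const char *
sketch_guess_whole_file (const char *fallback, const void *text, size_t size)
{
  size_t ofs = sketch_count_ascii (text, size);

  if (ofs >= size)
    return fallback;
  return (sketch_is_utf8 ((const unsigned char *) text + ofs, size - ofs)
          ? "UTF-8" : fallback);
}
```

In this sketch a Latin-1 file is never identified as such: a tail that is not
UTF-8 simply yields whatever fallback the caller supplied, which is the
limitation noted at the top of this reply.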
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.