[Pspp-cvs] pspp tests/command/get-data-txt-examples.sh tes...


From: Ben Pfaff
Subject: [Pspp-cvs] pspp tests/command/get-data-txt-examples.sh tes...
Date: Sun, 10 Feb 2008 08:17:52 +0000

CVSROOT:        /cvsroot/pspp
Module name:    pspp
Changes by:     Ben Pfaff <blp> 08/02/10 08:17:52

Modified files:
        tests/command  : get-data-txt-examples.sh 
        tests          : ChangeLog 
        src/ui/gui     : psppire-case-file.c helper.c find-dialog.c 
                         ChangeLog 
        src/language/xforms: recode.c 
        src/language/lexer: range-parser.c 
        src/language/expressions: operations.def 
        src/language/data-io: get-data.c data-parser.h data-parser.c 
        src/data       : data-in.h data-in.c ChangeLog 
        doc            : files.texi 

Log message:
        Add a couple of extensions to GET DATA TYPE=TXT.  Patch #6412.  Thanks
        to John Darrington for review.

CVSWeb URLs:
http://cvs.savannah.gnu.org/viewcvs/pspp/tests/command/get-data-txt-examples.sh?cvsroot=pspp&r1=1.1&r2=1.2
http://cvs.savannah.gnu.org/viewcvs/pspp/tests/ChangeLog?cvsroot=pspp&r1=1.122&r2=1.123
http://cvs.savannah.gnu.org/viewcvs/pspp/src/ui/gui/psppire-case-file.c?cvsroot=pspp&r1=1.33&r2=1.34
http://cvs.savannah.gnu.org/viewcvs/pspp/src/ui/gui/helper.c?cvsroot=pspp&r1=1.32&r2=1.33
http://cvs.savannah.gnu.org/viewcvs/pspp/src/ui/gui/find-dialog.c?cvsroot=pspp&r1=1.6&r2=1.7
http://cvs.savannah.gnu.org/viewcvs/pspp/src/ui/gui/ChangeLog?cvsroot=pspp&r1=1.108&r2=1.109
http://cvs.savannah.gnu.org/viewcvs/pspp/src/language/xforms/recode.c?cvsroot=pspp&r1=1.30&r2=1.31
http://cvs.savannah.gnu.org/viewcvs/pspp/src/language/lexer/range-parser.c?cvsroot=pspp&r1=1.11&r2=1.12
http://cvs.savannah.gnu.org/viewcvs/pspp/src/language/expressions/operations.def?cvsroot=pspp&r1=1.18&r2=1.19
http://cvs.savannah.gnu.org/viewcvs/pspp/src/language/data-io/get-data.c?cvsroot=pspp&r1=1.5&r2=1.6
http://cvs.savannah.gnu.org/viewcvs/pspp/src/language/data-io/data-parser.h?cvsroot=pspp&r1=1.1&r2=1.2
http://cvs.savannah.gnu.org/viewcvs/pspp/src/language/data-io/data-parser.c?cvsroot=pspp&r1=1.3&r2=1.4
http://cvs.savannah.gnu.org/viewcvs/pspp/src/data/data-in.h?cvsroot=pspp&r1=1.8&r2=1.9
http://cvs.savannah.gnu.org/viewcvs/pspp/src/data/data-in.c?cvsroot=pspp&r1=1.26&r2=1.27
http://cvs.savannah.gnu.org/viewcvs/pspp/src/data/ChangeLog?cvsroot=pspp&r1=1.184&r2=1.185
http://cvs.savannah.gnu.org/viewcvs/pspp/doc/files.texi?cvsroot=pspp&r1=1.14&r2=1.15

Patches:
Index: tests/command/get-data-txt-examples.sh
===================================================================
RCS file: /cvsroot/pspp/pspp/tests/command/get-data-txt-examples.sh,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -b -r1.1 -r1.2
--- tests/command/get-data-txt-examples.sh      5 Dec 2007 06:40:13 -0000       1.1
+++ tests/command/get-data-txt-examples.sh      10 Feb 2008 08:17:49 -0000      1.2
@@ -75,12 +75,12 @@
 
 activity="create pets.data"
 cat > pets.data <<'EOF'
-"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
+'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
 , (Years), , , (Dollars), ,
-"Rover", 4.5, Brown, "12 Feb 2004", 80, True, "Dog"
-"Charlie", , Gold, "5 Apr 2007", 12.3, False, "Fish"
-"Molly", 2, Black, "12 Dec 2006", 25, False, "Cat"
-"Gilly", , White, "10 Apr 2007", 10, False, "Guinea Pig"
+"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
+"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
+"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
+"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
 EOF
 if [ $? -ne 0 ] ; then no_result ; fi
 
@@ -114,14 +114,14 @@
                    age 40-47 F.
 LIST.
 
-GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='"'
+GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
         /FIRSTCASE=3
         /VARIABLES=name A10
                    age F3.1
                    color A5
                    received EDATE10
                    price F5.2
-                   needs_walking a5
+                   height a5
                    type a10.
 LIST.
 EOF
@@ -152,12 +152,12 @@
 Civic        2003    13415    15900 EX              1
 Civic        1992   107000     3800 n/a            12
 Accord       2002    26613    17900 EX              1
-      name  age color   received  price needs_walking       type
----------- ---- ----- ---------- ------ ------------- ----------
-Rover       4.5 Brown 12.02.2004  80.00         True  Dog
-Charlie      .  Gold  05.04.2007  12.30         False Fish
-Molly       2.0 Black 12.12.2006  25.00         False Cat
-Gilly        .  White 10.04.2007  10.00         False Guinea Pig
+      name  age color   received  price height       type
+---------- ---- ----- ---------- ------ ------ ----------
+Rover       4.5 Brown 12.02.2004  80.00  1'4"  Dog
+Charlie      .  Gold  05.04.2007  12.30  3"    Fish
+Molly       2.0 Black 12.12.2006  25.00  5"    Cat
+Gilly        .  White 10.04.2007  10.00  3"    Guinea Pig
 EOF
 if [ $? -ne 0 ] ; then fail ; fi
 

Index: tests/ChangeLog
===================================================================
RCS file: /cvsroot/pspp/pspp/tests/ChangeLog,v
retrieving revision 1.122
retrieving revision 1.123
diff -u -b -r1.122 -r1.123
--- tests/ChangeLog     3 Feb 2008 06:52:13 -0000       1.122
+++ tests/ChangeLog     10 Feb 2008 08:17:50 -0000      1.123
@@ -1,3 +1,9 @@
+2008-02-10  Ben Pfaff  <address@hidden>
+
+       * command/get-data-txt-examples.sh: Update to match changes to
+       documentation (which were in turn updated to show how the escaped
+       quote feature works).
+
 2008-02-02  Ben Pfaff  <address@hidden>
 
        * automake.mk: Add target for dissect-sysfile.

Index: src/ui/gui/psppire-case-file.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/ui/gui/psppire-case-file.c,v
retrieving revision 1.33
retrieving revision 1.34
diff -u -b -r1.33 -r1.34
--- src/ui/gui/psppire-case-file.c      3 Feb 2008 12:09:25 -0000       1.33
+++ src/ui/gui/psppire-case-file.c      10 Feb 2008 08:17:50 -0000      1.34
@@ -366,7 +366,7 @@
   width = fmt_var_width (fmt);
   value = xmalloca (value_cnt_from_width (width) * sizeof *value);
   ok = (datasheet_get_value (cf->datasheet, casenum, idx, value, width)
-        && data_in (input, LEGACY_NATIVE, fmt->type, 0, 0, value, width)
+        && data_in (input, LEGACY_NATIVE, fmt->type, 0, 0, 0, value, width)
         && datasheet_put_value (cf->datasheet, casenum, idx, value, width));
 
   if (ok)

Index: src/ui/gui/helper.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/ui/gui/helper.c,v
retrieving revision 1.32
retrieving revision 1.33
diff -u -b -r1.32 -r1.33
--- src/ui/gui/helper.c 10 Feb 2008 07:08:45 -0000      1.32
+++ src/ui/gui/helper.c 10 Feb 2008 08:17:50 -0000      1.33
@@ -91,7 +91,7 @@
     }
 
   msg_disable ();
-  ok = data_in (ss_cstr (text), LEGACY_NATIVE, format.type, 0, 0,
+  ok = data_in (ss_cstr (text), LEGACY_NATIVE, format.type, 0, 0, 0,
                 v, fmt_var_width (&format));
   msg_enable ();
 

Index: src/ui/gui/find-dialog.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/ui/gui/find-dialog.c,v
retrieving revision 1.6
retrieving revision 1.7
diff -u -b -r1.6 -r1.7
--- src/ui/gui/find-dialog.c    29 Jan 2008 11:13:00 -0000      1.6
+++ src/ui/gui/find-dialog.c    10 Feb 2008 08:17:50 -0000      1.7
@@ -600,7 +600,7 @@
   if ( ! data_in (ss_cstr (target),
                   LEGACY_NATIVE,
                  fmt->type,
-                 0, 0,
+                 0, 0, 0,
                  vc->pattern, width) )
     {
       free (vc);

Index: src/ui/gui/ChangeLog
===================================================================
RCS file: /cvsroot/pspp/pspp/src/ui/gui/ChangeLog,v
retrieving revision 1.108
retrieving revision 1.109
diff -u -b -r1.108 -r1.109
--- src/ui/gui/ChangeLog        8 Feb 2008 23:30:13 -0000       1.108
+++ src/ui/gui/ChangeLog        10 Feb 2008 08:17:50 -0000      1.109
@@ -1,3 +1,18 @@
+2008-02-09  Ben Pfaff  <address@hidden>
+
+       Consolidate multiple messages into single message dialog.  Patch
+       #6405.  Thanks to John Darrington for review.
+
+       * automake.mk (dist_src_ui_gui_psppire_DATA): Add
+       message-dialog.glade.
+
+       * helper.c (give_help): Use GtkMessageDialog directly instead of
+       trying to reuse message-dialog code.
+
+       * message-dialog.c: Rewritten.
+
+       * message-dialog.glade: New file.
+
 2008-02-08  Jason Stover  <address@hidden>
 
        * crosstabs-dialog.c: New file.

Index: src/language/xforms/recode.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/language/xforms/recode.c,v
retrieving revision 1.30
retrieving revision 1.31
diff -u -b -r1.30 -r1.31
--- src/language/xforms/recode.c        11 Nov 2007 05:51:43 -0000      1.30
+++ src/language/xforms/recode.c        10 Feb 2008 08:17:50 -0000      1.31
@@ -608,7 +608,7 @@
 
             msg_disable ();
             match = data_in (ss_buffer (value, width), LEGACY_NATIVE,
-                             FMT_F, 0, 0, &uv, 0);
+                             FMT_F, 0, 0, 0, &uv, 0);
             msg_enable ();
             out->value.f = uv.f;
             break;

Index: src/language/lexer/range-parser.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/language/lexer/range-parser.c,v
retrieving revision 1.11
retrieving revision 1.12
diff -u -b -r1.11 -r1.12
--- src/language/lexer/range-parser.c   9 Nov 2007 03:06:29 -0000       1.11
+++ src/language/lexer/range-parser.c   10 Feb 2008 08:17:50 -0000      1.12
@@ -99,7 +99,7 @@
     {
       union value v;
       data_in (ds_ss (lex_tokstr (lexer)), LEGACY_NATIVE,
-               *format, 0, 0, &v, 0);
+               *format, 0, 0, 0, &v, 0);
       lex_get (lexer);
       *x = v.f;
       if (*x == SYSMIS)

Index: src/language/expressions/operations.def
===================================================================
RCS file: /cvsroot/pspp/pspp/src/language/expressions/operations.def,v
retrieving revision 1.18
retrieving revision 1.19
diff -u -b -r1.18 -r1.19
--- src/language/expressions/operations.def     9 Nov 2007 03:06:29 -0000       1.18
+++ src/language/expressions/operations.def     10 Feb 2008 08:17:51 -0000      1.19
@@ -573,7 +573,7 @@
 function NUMBER (string s, ni_format f)
 {
   union value out;
-  data_in (ss_head (s, f->w), LEGACY_NATIVE, f->type, f->d, 0, &out, 0);
+  data_in (ss_head (s, f->w), LEGACY_NATIVE, f->type, f->d, 0, 0, &out, 0);
   return out.f;
 }
 

Index: src/language/data-io/get-data.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/language/data-io/get-data.c,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -b -r1.5 -r1.6
--- src/language/data-io/get-data.c     6 Feb 2008 02:08:19 -0000       1.5
+++ src/language/data-io/get-data.c     10 Feb 2008 08:17:51 -0000      1.6
@@ -24,6 +24,7 @@
 #include <data/dictionary.h>
 #include <data/format.h>
 #include <data/procedure.h>
+#include <data/settings.h>
 #include <language/command.h>
 #include <language/data-io/data-parser.h>
 #include <language/data-io/data-reader.h>
@@ -429,18 +430,29 @@
 
           lex_get (lexer);
         }
-      else if (lex_match_id (lexer, "QUALIFIER"))
+      else if (lex_match_id (lexer, "QUALIFIERS"))
         {
-          if (!set_type (parser, "QUALIFIER", DP_DELIMITED, &has_type))
+          if (!set_type (parser, "QUALIFIERS", DP_DELIMITED, &has_type))
             goto error;
           lex_match (lexer, '=');
 
           if (!lex_force_string (lexer))
             goto error;
 
+          if (settings_get_syntax () == COMPATIBLE
+              && ds_length (lex_tokstr (lexer)) != 1)
+            {
+              msg (SE, _("In compatible syntax mode, the QUALIFIER string "
+                         "must contain exactly one character."));
+              goto error;
+            }
+
           data_parser_set_quotes (parser, ds_ss (lex_tokstr (lexer)));
           lex_get (lexer);
         }
+      else if (settings_get_syntax () == ENHANCED
+               && lex_match_id (lexer, "ESCAPE"))
+        data_parser_set_quote_escape (parser, true);
       else if (lex_match_id (lexer, "VARIABLES"))
         break;
       else

Index: src/language/data-io/data-parser.h
===================================================================
RCS file: /cvsroot/pspp/pspp/src/language/data-io/data-parser.h,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -b -r1.1 -r1.2
--- src/language/data-io/data-parser.h  5 Dec 2007 06:40:13 -0000       1.1
+++ src/language/data-io/data-parser.h  10 Feb 2008 08:17:51 -0000      1.2
@@ -54,6 +54,7 @@
 void data_parser_set_empty_line_has_field (struct data_parser *,
                                            bool empty_line_has_field);
 void data_parser_set_quotes (struct data_parser *, struct substring);
+void data_parser_set_quote_escape (struct data_parser *, bool escape);
 void data_parser_set_soft_delimiters (struct data_parser *, struct substring);
 void data_parser_set_hard_delimiters (struct data_parser *, struct substring);
 

Index: src/language/data-io/data-parser.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/language/data-io/data-parser.c,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -b -r1.3 -r1.4
--- src/language/data-io/data-parser.c  19 Jan 2008 06:58:05 -0000      1.3
+++ src/language/data-io/data-parser.c  10 Feb 2008 08:17:51 -0000      1.4
@@ -54,6 +54,7 @@
     bool span;                  /* May cases span multiple records? */
     bool empty_line_has_field;  /* Does an empty line have an (empty) field? */
     struct substring quotes;    /* Characters that can quote separators. */
+    bool quote_escape;          /* Doubled quote acts as escape? */
     struct substring soft_seps; /* Two soft separators act like just one. */
     struct substring hard_seps; /* Two hard separators yield empty fields. */
     struct string any_sep;      /* Concatenation of soft_seps and hard_seps. */
@@ -94,6 +95,7 @@
   parser->span = true;
   parser->empty_line_has_field = false;
   ss_alloc_substring (&parser->quotes, ss_cstr ("\"'"));
+  parser->quote_escape = false;
   ss_alloc_substring (&parser->soft_seps, ss_cstr (CC_SPACES));
   ss_alloc_substring (&parser->hard_seps, ss_cstr (","));
   ds_init_empty (&parser->any_sep);
@@ -218,6 +220,20 @@
   ss_alloc_substring (&parser->quotes, quotes);
 }
 
+/* If ESCAPE is false (the default setting), a character used for
+   quoting cannot itself be embedded within a quoted field.  If
+   ESCAPE is true, then a quote character can be embedded within
+   a quoted field by doubling it.
+
+   This setting affects parsing of DP_DELIMITED files only, and
+   only when at least one quote character has been set (with
+   data_parser_set_quotes). */
+void
+data_parser_set_quote_escape (struct data_parser *parser, bool escape)
+{
+  parser->quote_escape = escape;
+}
+
 /* Sets PARSER's soft delimiters to DELIMITERS.  Soft delimiters
    separate fields, but consecutive soft delimiters do not yield
    empty fields.  (Ordinarily, only white space characters are
@@ -401,6 +417,7 @@
    beginning of the field on success. */
 static bool
 cut_field (const struct data_parser *parser, struct dfm_reader *reader,
+           int *first_column, int *last_column, struct string *tmp,
            struct substring *field)
 {
   struct substring line, p;
@@ -422,16 +439,34 @@
       else
         {
           *field = p;
+          *first_column = dfm_column_start (reader);
+          *last_column = *first_column + 1;
           dfm_forward_columns (reader, 1);
           return true;
         }
     }
 
+  *first_column = dfm_column_start (reader);
   if (ss_find_char (parser->quotes, ss_first (p)) != SIZE_MAX)
     {
       /* Quoted field. */
-      if (!ss_get_until (&p, ss_get_char (&p), field))
+      int quote = ss_get_char (&p);
+      if (!ss_get_until (&p, quote, field))
+        msg (SW, _("Quoted string extends beyond end of line."));
+      if (parser->quote_escape && ss_first (p) == quote)
+        {
+          ds_assign_substring (tmp, *field);
+          while (ss_match_char (&p, quote))
+            {
+              struct substring ss;
+              ds_put_char (tmp, quote);
+              if (!ss_get_until (&p, quote, &ss))
         msg (SW, _("Quoted string extends beyond end of line."));
+              ds_put_substring (tmp, ss);
+            }
+          *field = ds_ss (tmp);
+        }
+      *last_column = dfm_column_start (reader);
 
       /* Skip trailing soft separator and a single hard separator
          if present. */
@@ -444,6 +479,7 @@
     {
       /* Regular field. */
       ss_get_chars (&p, ss_cspan (p, ds_ss (&parser->any_sep)), field);
+      *last_column = dfm_column_start (reader);
       if (!ss_ltrim (&p, parser->soft_seps) || ss_is_empty (p))
         {
           /* Advance past a trailing hard separator,
@@ -491,7 +527,8 @@
         data_in (ss_substr (line, f->first_column - 1,
                             f->format.w),
                  encoding, f->format.type, f->format.d,
-                 f->first_column, case_data_rw_idx (c, f->case_idx),
+                 f->first_column, f->first_column + f->format.w,
+                 case_data_rw_idx (c, f->case_idx),
                  fmt_var_width (&f->format));
 
       dfm_forward_record (reader);
@@ -508,14 +545,17 @@
                       struct dfm_reader *reader, struct ccase *c)
 {
   enum legacy_encoding encoding = dfm_reader_get_legacy_encoding (reader);
+  struct string tmp = DS_EMPTY_INITIALIZER;
   struct field *f;
 
   for (f = parser->fields; f < &parser->fields[parser->field_cnt]; f++)
     {
       struct substring s;
+      int first_column, last_column;
 
       /* Cut out a field and read in a new record if necessary. */
-      while (!cut_field (parser, reader, &s))
+      while (!cut_field (parser, reader,
+                         &first_column, &last_column, &tmp, &s))
        {
          if (!dfm_eof (reader))
             dfm_forward_record (reader);
@@ -524,15 +564,17 @@
              if (f > parser->fields)
                msg (SW, _("Partial case discarded.  The first variable "
                            "missing was %s."), f->name);
+              ds_destroy (&tmp);
              return false;
            }
        }
 
       data_in (s, encoding, f->format.type, 0,
-               dfm_get_column (reader, ss_data (s)),
+               first_column, last_column,
                case_data_rw_idx (c, f->case_idx),
                fmt_var_width (&f->format));
     }
+  ds_destroy (&tmp);
   return true;
 }
 
@@ -544,6 +586,7 @@
                          struct dfm_reader *reader, struct ccase *c)
 {
   enum legacy_encoding encoding = dfm_reader_get_legacy_encoding (reader);
+  struct string tmp = DS_EMPTY_INITIALIZER;
   struct substring s;
   struct field *f;
 
@@ -552,7 +595,8 @@
 
   for (f = parser->fields; f < &parser->fields[parser->field_cnt]; f++)
     {
-      if (!cut_field (parser, reader, &s))
+      int first_column, last_column;
+      if (!cut_field (parser, reader, &first_column, &last_column, &tmp, &s))
        {
          if (settings_get_undefined ())
            msg (SW, _("Missing value(s) for all variables from %s onward.  "
@@ -560,18 +604,13 @@
                        "or blanks, as appropriate."),
                 f->name);
           for (; f < &parser->fields[parser->field_cnt]; f++)
-            {
-              int width = fmt_var_width (&f->format);
-              if (width == 0)
-                case_data_rw_idx (c, f->case_idx)->f = SYSMIS;
-              else
-                memset (case_data_rw_idx (c, f->case_idx)->s, ' ', width);
-            }
+            value_set_missing (case_data_rw_idx (c, f->case_idx),
+                               fmt_var_width (&f->format));
           goto exit;
        }
 
       data_in (s, encoding, f->format.type, 0,
-               dfm_get_column (reader, ss_data (s)),
+               first_column, last_column,
                case_data_rw_idx (c, f->case_idx),
                fmt_var_width (&f->format));
     }
@@ -583,6 +622,7 @@
 
 exit:
   dfm_forward_record (reader);
+  ds_destroy (&tmp);
   return true;
 }
 

Index: src/data/data-in.h
===================================================================
RCS file: /cvsroot/pspp/pspp/src/data/data-in.h,v
retrieving revision 1.8
retrieving revision 1.9
diff -u -b -r1.8 -r1.9
--- src/data/data-in.h  19 Jan 2008 06:58:04 -0000      1.8
+++ src/data/data-in.h  10 Feb 2008 08:17:51 -0000      1.9
@@ -27,7 +27,8 @@
 
 union value;
 bool data_in (struct substring input, enum legacy_encoding,
-              enum fmt_type, int implied_decimals, int first_column,
+              enum fmt_type, int implied_decimals,
+              int first_column, int last_column,
               union value *output, int width);
 
 #endif /* data/data-in.h */

Index: src/data/data-in.c
===================================================================
RCS file: /cvsroot/pspp/pspp/src/data/data-in.c,v
retrieving revision 1.26
retrieving revision 1.27
diff -u -b -r1.26 -r1.27
--- src/data/data-in.c  19 Jan 2008 06:58:04 -0000      1.26
+++ src/data/data-in.c  10 Feb 2008 08:17:51 -0000      1.27
@@ -90,13 +90,16 @@
    IMPLIED_DECIMALS decimal places are implied.  Specify 0 if no
    decimal places should be implied.
 
-   If FIRST_COLUMN is nonzero, then it should be the 1-based
-   column number of the first character in INPUT, used in error
-   messages. */
+   If FIRST_COLUMN and LAST_COLUMN are nonzero, then they should
+   be the 1-based column number of the first and
+   one-past-the-last-character in INPUT, for use in error
+   messages.  (LAST_COLUMN cannot always be calculated from
+   FIRST_COLUMN plus the length of the input because of the
+   possibility of escaped quotes in strings, etc.) */
 bool
 data_in (struct substring input, enum legacy_encoding encoding,
          enum fmt_type format, int implied_decimals,
-         int first_column, union value *output, int width)
+         int first_column, int last_column, union value *output, int width)
 {
   static data_in_parser_func *const handlers[FMT_NUMBER_OF_FORMATS] =
     {
@@ -131,7 +134,7 @@
   i.width = width;
 
   i.first_column = first_column;
-  i.last_column = first_column + ss_length (input) - 1;
+  i.last_column = last_column;
 
   if (!ss_is_empty (i.input))
     {
@@ -1167,11 +1170,11 @@
   ds_put_char (&text, '(');
   if (i->first_column != 0)
     {
-      if (i->first_column == i->last_column)
+      if (i->first_column == i->last_column - 1)
         ds_put_format (&text, _("column %d"), i->first_column);
       else
         ds_put_format (&text, _("columns %d-%d"),
-                       i->first_column, i->last_column);
+                       i->first_column, i->last_column - 1);
       ds_put_cstr (&text, ", ");
     }
   ds_put_format (&text, _("%s field) "), fmt_name (i->format));

Index: src/data/ChangeLog
===================================================================
RCS file: /cvsroot/pspp/pspp/src/data/ChangeLog,v
retrieving revision 1.184
retrieving revision 1.185
diff -u -b -r1.184 -r1.185
--- src/data/ChangeLog  6 Feb 2008 02:08:18 -0000       1.184
+++ src/data/ChangeLog  10 Feb 2008 08:17:51 -0000      1.185
@@ -1,3 +1,23 @@
+2008-02-09  Ben Pfaff  <address@hidden>
+
+       Add a couple of extensions to GET DATA TYPE=TXT.  Patch #6412.
+       Thanks to John Darrington for review.
+
+       * data-in.c (data_in): Add new argument to designate the last
+       column of the data field being parsed, for use in error messages.
+       Update all callers.
+
+       * data-parser (struct data_parser): New member `quote_escape'.
+       (data_parser_create): Initialize quote_escape.
+       (data_parser_set_quotes): New function.
+       (cut_field): Support escaped quotes.
+       (parse_delimited_span): Ditto.
+       (parse_delimited_no_span): Ditto.
+
+       * get-data.c (parse_get_txt): Support ESCAPE extension subcommand
+       in enhanced mode.  Only support multiple quote characters in
+       enhanced mode.
+
 2008-02-06  John Darrington <address@hidden>
 
        psql-reader.c psql-reader.h: Read more than one tuple at

Index: doc/files.texi
===================================================================
RCS file: /cvsroot/pspp/pspp/doc/files.texi,v
retrieving revision 1.14
retrieving revision 1.15
diff -u -b -r1.14 -r1.15
--- doc/files.texi      6 Feb 2008 02:08:18 -0000       1.14
+++ doc/files.texi      10 Feb 2008 08:17:52 -0000      1.15
@@ -385,7 +385,7 @@
         [/address@hidden,FIRST max_cases,PERCENT address@hidden
 
         /DELIMITERS="delimiters"
-        [/QUALIFIER="quote"]
+        [/QUALIFIER="quotes" [/ESCAPE]]
         [/address@hidden,VARIABLES address@hidden
         /VARIABLES=del_var address@hidden
 where each del_var takes the form:
@@ -417,11 +417,22 @@
 which each field appears on a separate line, specify the empty string
 for DELIMITERS.
 
-The optional QUALIFIER subcommand names a character that can be used
-to quote values within fields in the input.  A field that begins with
-the specified quote character ends at the next match quote.
-Intervening delimiters become part of the field, instead of
-terminating it.
+The optional QUALIFIER subcommand names one or more characters that
+can be used to quote values within fields in the input.  A field that
+begins with one of the specified quote characters ends at the next
+matching quote.  Intervening delimiters become part of the field,
+instead of terminating it.  The ability to specify more than one quote
+character is a PSPP extension.
+
+By default, a character specified on QUALIFIER cannot itself be
+embedded within a field that it quotes, because the quote character
+always terminates the quoted field.  With ESCAPE, however, a doubled
+quote character within a quoted field inserts a single instance of the
+quote into the field.  For example, if @samp{'} is specified on
+QUALIFIER, then without ESCAPE @code{'a''b'} specifies a pair of
+fields that contain @samp{a} and @samp{b}, but with ESCAPE it
+specifies a single field that contains @samp{a'b}.  ESCAPE is a PSPP
+extension.
 
 The DELCASE subcommand controls how data may be broken across lines in
 the data file.  With LINE, the default setting, each line must contain
@@ -495,12 +506,12 @@
 Consider the following information on animals in a pet store:
 
 @example
-"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
+'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
 , (Years), , , (Dollars), ,
-"Rover", 4.5, Brown, "12 Feb 2004", 80, True, "Dog"
-"Charlie", , Gold, "5 Apr 2007", 12.3, False, "Fish"
-"Molly", 2, Black, "12 Dec 2006", 25, False, "Cat"
-"Gilly", , White, "10 Apr 2007", 10, False, "Guinea Pig"
+"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
+"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
+"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
+"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
 @end example
 
 @noindent
@@ -509,15 +520,15 @@
 @c If you change this example, change the regression test in
 @c tests/command/get-data-txt-examples.sh to match.
 @example
-GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='"'
+GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
         /FIRSTCASE=3
         /VARIABLES=name A10
                    age F3.1
                    color A5
                    received EDATE10
                    price F5.2
-                   needs_walking A5
-                   type A10.
+                   height a5
+                   type a10.
 @end example
 
 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED

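For readers who want to try the ESCAPE rule from the files.texi hunk
outside of PSPP, here is a minimal standalone sketch in plain C.  It is
not the cut_field code from the patch: read_quoted_field and its
parameters are hypothetical names made up for this illustration, and it
works on an ordinary NUL-terminated line rather than PSPP's substring
and dfm_reader types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PSPP.  *PP must point at the
   opening quote of a field, and that quote must be one of the
   characters in QUOTES.  Copies the field's contents into OUT
   (capacity OUT_SIZE) and advances *PP past the closing quote.

   If ESCAPE is false, the first matching quote ends the field.
   If ESCAPE is true, a doubled quote inside the field inserts a
   single quote character and the field continues.

   Returns false if *PP does not start with a quote character, if
   the closing quote is missing, or if OUT is too small. */
static bool
read_quoted_field (const char **pp, const char *quotes, bool escape,
                   char *out, size_t out_size)
{
  const char *p = *pp;
  size_t n = 0;
  char quote;

  assert (out != NULL && out_size > 0);
  if (*p == '\0' || strchr (quotes, *p) == NULL)
    return false;               /* Not a quoted field. */
  quote = *p++;

  for (;;)
    {
      if (*p == '\0')
        return false;           /* Quoted string runs past end of line. */
      if (*p == quote)
        {
          if (escape && p[1] == quote)
            p++;                /* Doubled quote: skip one, copy the other. */
          else
            {
              p++;              /* Closing quote. */
              break;
            }
        }
      if (n + 1 >= out_size)
        return false;           /* No room for this byte plus the NUL. */
      out[n++] = *p++;
    }

  out[n] = '\0';
  *pp = p;
  return true;
}
```

With quotes set to the two characters ' and " and escape enabled, the
pets.data field '1''4"' comes out as 1'4" and "3""" comes out as 3".
With escape disabled, 'a''b' is read as two separate fields, a and b,
which matches the behavior described in the documentation change.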

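The data-in.c hunk changes LAST_COLUMN to mean one past the last column
of the raw input, because the parsed value can be shorter than the text
it came from once quotes are removed.  The sketch below illustrates
only that convention and the "column N" versus "columns M-N" wording.
format_columns and one_past_last_column are hypothetical helpers
written for this note, not functions from the patch, and the column
numbers in the usage note are invented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper.  Writes a column description into BUF.
   FIRST_COLUMN is 1-based; LAST_COLUMN is one past the last column
   of the raw field.  A FIRST_COLUMN of 0 means "no column
   information" and yields an empty string. */
static void
format_columns (char *buf, size_t size, int first_column, int last_column)
{
  assert (buf != NULL && size > 0);
  if (first_column == 0)
    buf[0] = '\0';
  else if (first_column == last_column - 1)
    snprintf (buf, size, "column %d", first_column);
  else
    snprintf (buf, size, "columns %d-%d", first_column, last_column - 1);
}

/* Hypothetical helper.  Returns the one-past-the-last column for raw
   field text RAW_TEXT (quotes and doubled quotes included) that
   starts at 1-based column FIRST_COLUMN. */
static int
one_past_last_column (int first_column, const char *raw_text)
{
  return first_column + (int) strlen (raw_text);
}
```

For example, the raw text '1''4"' is 7 characters wide, so if it
started at column 10 the range would be reported as columns 10-16,
even though the parsed value 1'4" is only 4 characters long.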