Re: Suggestions
From: Ben Pfaff
Subject: Re: Suggestions
Date: Thu, 29 Jan 2009 22:09:42 -0800
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
Rémi Dewitte <address@hidden> writes:
> On Thu, Jan 29, 2009 at 06:40, Ben Pfaff <address@hidden> wrote:
>
> Rémi Dewitte <address@hidden> writes:
>
> > Working with pspp-0.6.1 I am glad it works fine. Nevertheless I encountered
> > two minor issues for which I have patched a bit pspp.
> >
> > First one is the ability to import CSV file with DOS endlines. I don't know
> > whether it is the right place to trim the '\r'.
>
> This is not the right place to do this.
>
> You might give me some clues...
Sure, I was just in a hurry when I wrote that.
Here is my suggested substitute fix. I have not had a chance to
test that it works. Will you test it and report your results?
Reviews welcome from everyone else too, of course.
Thanks!
commit f4cc711051121873dd2e11436b10dd829094bdb9
Author: Ben Pfaff <address@hidden>
Date: Thu Jan 29 22:01:27 2009 -0800
Accept LF, CR LF, and CR as new-line sequences in data files.
Until now, PSPP has used the host operating system's idea of the
new-line sequence when reading data files and other text files.
This means that, when a file with CR LF line ends is read on an OS
that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
appears to have a CR at the end. This commit fixes the
problem, by normalizing the new-line sequence at time of reading.
This commit eliminates a performance optimization from
ds_read_line(), because the getdelim() function that it used cannot
be made to stop reading at one of two different delimiters. If
this causes a real performance regression, then the getndelim2
function from gnulib could be used to restore the optimization.
Thanks to Rémi Dewitte <address@hidden> for pointing out the problem
and providing an initial patch.
commit 70a46fb66ae0de5e312c4fc007bddf65e8ea5ac9
Author: Ben Pfaff <address@hidden>
Date: Thu Jan 29 22:08:43 2009 -0800
Accept LF, CR LF, and CR as new-line sequences in data files.
Until now, PSPP has used the host operating system's idea of the
new-line sequence when reading data files and other text files.
This means that, when a file with CR LF line ends is read on an OS
that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
appears to have a CR at the end. This commit fixes the
problem, by normalizing the new-line sequence at time of reading.
This commit eliminates a performance optimization from
ds_read_line(), because the getdelim() function that it used cannot
be made to stop reading at one of two different delimiters. If
this causes a real performance regression, then the getndelim2
function from gnulib could be used to restore the optimization.
Thanks to Rémi Dewitte <address@hidden> for pointing out the problem
and providing an initial patch.
diff --git a/src/libpspp/str.c b/src/libpspp/str.c
index d082672..f054c9e 100644
*** a/src/libpspp/str.c
--- b/src/libpspp/str.c
***************
*** 1,5 ****
/* PSPP - a program for statistical analysis.
! Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
--- 1,5 ----
/* PSPP - a program for statistical analysis.
! Copyright (C) 1997-9, 2000, 2006, 2009 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
***************
*** 1190,1237 ****
return st->ss.string;
}
! /* Appends to ST a newline-terminated line read from STREAM, but
! no more than MAX_LENGTH characters.
! Newline is the last character of ST on return, if encountering
! a newline was the reason for terminating.
! Returns true if at least one character was read from STREAM
! and appended to ST, false if no characters at all were read
! before an I/O error or end of file was encountered (or
! MAX_LENGTH was 0). */
bool
ds_read_line (struct string *st, FILE *stream, size_t max_length)
{
! if (!st->ss.length && max_length == SIZE_MAX)
! {
! size_t capacity = st->capacity ? st->capacity + 1 : 0;
! ssize_t n = getline (&st->ss.string, &capacity, stream);
! if (capacity)
! st->capacity = capacity - 1;
! if (n > 0)
! {
! st->ss.length = n;
! return true;
! }
! else
! return false;
! }
! else
{
! size_t length;
! for (length = 0; length < max_length; length++)
{
! int c = getc (stream);
! if (c == EOF)
! break;
!
! ds_put_char (st, c);
! if (c == '\n')
! return true;
}
!
! return length > 0;
}
}
/* Removes a comment introduced by `#' from ST,
--- 1190,1231 ----
return st->ss.string;
}
! /* Reads characters from STREAM and appends them to ST, stopping
! after MAX_LENGTH characters, after appending a newline, or
! after an I/O error or end of file was encountered, whichever
! comes first. Returns true if at least one character was added
! to ST, false if no characters were read before an I/O error or
! end of file (or if MAX_LENGTH was 0).
!
! This function accepts LF, CR LF, and CR sequences as new-line,
! and translates each of them to a single '\n' new-line
! character in ST. */
bool
ds_read_line (struct string *st, FILE *stream, size_t max_length)
{
! size_t length;
!
! for (length = 0; length < max_length; length++)
{
! int c = getc (stream);
! if (c == EOF)
! break;
! if (c == '\r')
{
! c = getc (stream);
! if (c != '\n')
! {
! ungetc (c, stream);
! c = '\n';
! }
}
! ds_put_char (st, c);
! if (c == '\n')
! return true;
}
+
+ return length > 0;
}
/* Removes a comment introduced by `#' from ST,
--
Ben Pfaff
http://benpfaff.org