pspp-users
Re: Combining comma separate files


From: John Darrington
Subject: Re: Combining comma separate files
Date: Tue, 12 Nov 2013 08:28:52 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Nov 11, 2013 at 10:35:40PM -0500, Ken Singh wrote:
     Hello,
     
     It is unclear to me how to quickly and efficiently combine a set of comma
     separated files into one data file. The easy solution would be to use the
     unix 'cat' command to concatenate the files then import using the graphical
     interface.  However, I'm interested in a purely PSPP based solution.
     
     I have tried GET DATA in combination with SAVE but it appears that the
     dataset must be made active.  I am not certain about this step.
     
     GET DATA
         /TYPE=TXT /ARRANGEMENT=DELIMITED
         /FILE='e:\Dropbox\data\raw1.txt'
         /DELIMITERS=','
     
      SAVE
         /OUTFILE = 'e:\dropbox\data\tmp.sav'

Here is one way you could solve that problem, assuming that both your
CSV files have the same arrangement:

dataset declare d_one.
dataset activate d_one.
GET DATA /TYPE=TXT /FILE='one.csv' /VARIABLES=x F8.2 y F8.2 z F8.2.

dataset declare d_two.
dataset activate d_two.
GET DATA /TYPE=TXT /FILE='two.csv' /VARIABLES=x F8.2 y F8.2 z F8.2.

dataset declare d_concat.
dataset activate d_concat.

ADD FILES /FILE=d_one /FILE=d_two.

LIST.
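
If your files have a header row or quoted strings, each GET DATA above
can be spelled out more fully.  This is only a sketch: it assumes a comma
delimiter, double quotes around strings, and exactly one header line, so
adjust it to your data:

GET DATA /TYPE=TXT /FILE='one.csv'
    /ARRANGEMENT=DELIMITED
    /DELIMITERS=','
    /QUALIFIER='"'
    /FIRSTCASE=2
    /VARIABLES=x F8.2 y F8.2 z F8.2.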

     
     It also appears that the VARIABLES subcommand is required.  Is there a
     solution for when one has dozens of variables?

The problem with CSV is that there is no metadata.  How should pspp (or anyone
else!) know if a column is to be interpreted as a string, a date, or whatever?
If you happen to know that all your files have the same arrangement, then one
solution you could try is to use psppire's import function to "guess" the
arrangement of each file (hopefully it guesses each one identically) and save
each to a .sav file.  Then you can use ADD FILES to concatenate all the files
at once.
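
For example, once each file has been imported and saved, something like
the following should combine them.  The file names here are made up, so
substitute your own, in the order you want them read:

* Hypothetical file names, listed in the order they should be read.
ADD FILES /FILE='raw1.sav' /FILE='raw3.sav' /FILE='raw7.sav'.
SAVE /OUTFILE='combined.sav'.
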
     
     The bigger problem is that I have many raw#.txt files, not necessarily
     contiguously numbered.  Any suggestions would be most appreciated.
     
Again, if the order in which the .txt files should be read cannot be
determined from their names, then you must tell pspp explicitly.  At the end
of the day, pspp is a statistical analysis tool, not an artificial
intelligence engine (although the format guesser does attempt to go a small
way in that direction).

Hope this is helpful.



-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.



