pspp-users


Re: Compare two data files


From: Michelle Parker
Subject: Re: Compare two data files
Date: Wed, 4 Aug 2010 14:40:22 +1000

Hi John

Thanks for your reply!

FLIP seems to truncate the variable names to 8 characters and remove the string values, so that's not really an option.

We did try exporting to text, but we need to get this working for our testers, and exporting and running the diff is quite complicated for them. The text export also seems to join variable names together and break lines after 76 characters, which makes it very hard to diff.

Exporting to CSV and comparing manually seems to be the best option at the moment, as the expected/allowed differences in values are grouped at the end of the variables, which makes other differences easier to spot.

I tried adding DROP to the MATCH FILES command to omit the 55 variables that are allowed to be different, but it crashes, so I just cleared these variables from the input files, which makes MATCH FILES work well. The problem is that the actual differences are not highlighted and are often very difficult to spot manually, and it is a pain to delete 110 variables by hand each time.
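One way to avoid both the manual deletion and the manual comparison is to run a small script over the two CSV exports. The following is only a minimal sketch in Python, not a PSPP feature. It assumes that each CSV file starts with a header row of variable names and that the cases appear in the same order in both files. The sample data and the names `diff_cells`, `read_cases` and `run_date` are made up for illustration.

```python
import csv
import io


def read_cases(handle):
    """Read a CSV export (header row = variable names) into a list of dicts."""
    return list(csv.DictReader(handle))


def diff_cells(cases_a, cases_b, ignore=()):
    """Return (case_number, variable, value_a, value_b) for every differing cell.

    Variables named in `ignore` are skipped, so allowed differences
    (dates, times, durations) never show up in the result.
    """
    skip = set(ignore)
    diffs = []
    for case_no, (a, b) in enumerate(zip(cases_a, cases_b), start=1):
        for name in a:
            if name in skip:
                continue
            if a[name] != b.get(name):
                diffs.append((case_no, name, a[name], b.get(name)))
    return diffs


# Hypothetical in-memory stand-ins for the two exported files.
FILE_A = "id,score,run_date\n1,10,2010-08-01\n2,20,2010-08-01\n"
FILE_B = "id,score,run_date\n1,10,2010-08-03\n2,25,2010-08-03\n"

cases_a = read_cases(io.StringIO(FILE_A))
cases_b = read_cases(io.StringIO(FILE_B))

for case_no, name, old, new in diff_cells(cases_a, cases_b, ignore={"run_date"}):
    print(f"case {case_no}: {name}: {old!r} -> {new!r}")
```

For real files, the `io.StringIO(...)` handles would be replaced with `open(path, newline="")`, and the ignore set would hold the names of the variables that are allowed to differ.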

Also, it is a pain that Copy and Paste don't seem to work for us in the Syntax editor of psppire 0.7.4 on Windows.

I was thinking about trying to use the SPSS Identify Duplicate Cases option, but I can't find any reference to this being available in PSPP.

Anyway thanks very much for your suggestions!

ciao
mich


On 03/08/2010, at 10:30 PM, John Darrington wrote:

Well, I would seriously consider Ben's suggestion of exporting to text and using the
POSIX diff utility.

Another possibility, which may be of use since you have a lot of variables but only
a few cases, is to use the FLIP command.  Then you will have a lot of cases but fewer variables,
which will make it feasible to calculate the difference between them with a command like
COMPUTE diff_X = x_1 - x_2.
Then you know that any non-zero values highlight a difference in the input.

J'

On Mon, Aug 02, 2010 at 04:43:57PM +1000, Michelle Parker wrote:
    Hi John

    Thanks, this works and is great!

    But, I'm finding each file has some allowed differences, e.g. dates, times, durations, so every file will be found in this list.

    Since probably 20% of the values differ between files, it would be much easier just to list the values that are different.

    Can I highlight every individual difference?

    Non-different values could be empty or even spaces to make the important differences easier to spot in the output.
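The layout described here (blank where the values agree, both values shown where they differ) can be sketched outside PSPP. The following Python is an illustration only and is not something PSPP provides. It assumes the two files have already been read into lists of dicts with identical variable names and the same case order. The function name `blank_matches` and the sample values are hypothetical.

```python
def blank_matches(cases_a, cases_b):
    """Build a grid where equal values are blanked and differing ones shown as 'a | b'."""
    grid = []
    for a, b in zip(cases_a, cases_b):
        row = {}
        for name in a:
            # An empty string marks "no difference"; otherwise keep both values.
            row[name] = "" if a[name] == b[name] else f"{a[name]} | {b[name]}"
        grid.append(row)
    return grid


# Hypothetical sample cases standing in for the two data files.
cases_a = [{"id": "1", "score": "10"}, {"id": "2", "score": "20"}]
cases_b = [{"id": "1", "score": "10"}, {"id": "2", "score": "25"}]

grid = blank_matches(cases_a, cases_b)
for row in grid:
    print("  ".join(f"{name}={value:<8}" for name, value in row.items()))
```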

    What do you think?

    Much appreciated!

    thanks
    mich

    On 20/07/2010, at 1:31 AM, John Darrington wrote:

One way to do this is as follows:

MATCH FILES
      /FILE='f1.sav' /IN=file1 /SORT
      /FILE='f2.sav' /IN=file2 /SORT
      /BY ALL
      .

SELECT IF file2=0 OR file1=0.


LIST.

This will show a list of all the cases which don't match.  And you get two extra
variables file1 and file2 showing where those cases came from.
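The idea behind this, keeping only the cases that occur in one file but not in the other, can be illustrated outside PSPP as well. The Python below is a rough analogue under stated assumptions, not a translation of how MATCH FILES is implemented. Each case is represented as a tuple of values, and the name `unmatched_cases` and the sample data are invented for the example.

```python
from collections import Counter


def unmatched_cases(cases_1, cases_2):
    """Return the cases found only in the first input and only in the second.

    Counter subtraction keeps duplicates honest: a case that occurs twice
    in one file and once in the other is reported once as unmatched.
    """
    count_1 = Counter(tuple(case) for case in cases_1)
    count_2 = Counter(tuple(case) for case in cases_2)
    only_1 = sorted((count_1 - count_2).elements())
    only_2 = sorted((count_2 - count_1).elements())
    return only_1, only_2


# Hypothetical cases: the second case differs in its last value.
file1 = [("1", "10"), ("2", "20"), ("3", "30")]
file2 = [("1", "10"), ("2", "25"), ("3", "30")]

only_1, only_2 = unmatched_cases(file1, file2)
print("only in file1:", only_1)
print("only in file2:", only_2)
```

As with the LIST output, a case that differs in a single value shows up twice, once for each file, so the reader still has to locate the differing variable.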

J'


On Mon, Jul 19, 2010 at 02:09:42PM +1000, Michelle Parker wrote:
   Hi Michel

   Thanks for getting back to me.

   The files have 730 variables, types and lengths are identical.
   There are 13 cases in each file.

   Some of the cases may have different values (e.g. dates/times), but in general they should be the same between files. Specifically, I need to know if there are any differences.

   thanks!
   mich





   On 19/07/2010, at 12:48 PM, Michel Boaventura wrote:

Hello Michelle,

Would you like to compare the variables or the cases in the files? If the variables,
does it matter if they have the same name but differ in type, length, etc.?

Regards,

Michel

_______________________________________________
Pspp-users mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/pspp-users

   ---------------------------------------
   Michelle Parker
   Web Objectives Pty Ltd
   33 Ridge St
   Gordon, NSW, 2072
   Australia
   Phone: (02) 9499 3166
   Fax: (02) 9499 3166
   Mobile : 0412 064 123
   address@hidden
   ---------------------------------------






--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.















