pspp-users



From: John Darrington
Subject: Re: "tags" in PSPP - can a variable have multiple values for the same case?
Date: Sun, 18 Jan 2015 08:36:22 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Jan 16, 2015 at 09:54:08PM +0100, Benjamin Oppermann wrote:
     Hello,
     This is basically a question about cleaning up my data:
     I made a survey where for some of the questions, the answer could be any
     number of keywords. The informant could be more or less specific to
     their liking. This was intended to work like tags (they are
     comma-separated), but we didn't consider that they would still end up in
     the same column of the output.
     This means, for example the values for "place of origin" could be as
     precise as "Hannover, Niedersachsen, Germany, Europe" or "Hannover,
     Germany", or just "Hannover". All in the same variable! 
     I currently have this variable specified as a string in PSPP, which
     means each represents a different value for PSPP.
     I want to achieve an outcome where all these values would read the same,
     i.e. the region, "Nidersachsen" in this example. Since my sample
     includes ~360 cases, I'd like to find a way to do this automatically.
     I might be able to do it in another program like a spreadsheet
     application or GoogleRefine/OpenRefine, but maybe do you know a way to
     recode this variable in PSPP? Any suggestions?
     Regards,
     Ben
     

There are a number of string functions which you could use to parse the string
variable into several other variables.  See section 7.7.7 of the manual.
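
For illustration only, here is a minimal sketch of that parsing step done
outside PSPP, in Python.  The helper names (split_tags, to_columns) and the
fixed column count are assumptions made for this example; they are not PSPP
functions and nothing here comes from the manual.

```python
def split_tags(raw):
    """Split a comma-separated answer into a list of trimmed, non-empty tags."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def to_columns(tags, width):
    """Pad or truncate a tag list to a fixed number of columns.

    This mimics spreading one string variable over several new variables;
    unused columns are left as empty strings.
    """
    return (tags + [""] * width)[:width]


# Sample answers taken from the question above.
answers = [
    "Hannover, Niedersachsen, Germany, Europe",
    "Hannover, Germany",
    "Hannover",
]

# One row of four columns per case.
rows = [to_columns(split_tags(a), 4) for a in answers]
```

Note that the tags end up in positional columns, so "Germany" lands in a
different column depending on how specific the informant was.  Deciding which
tag is the region is the part that still needs a lookup or a human.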

But I agree with the other people that this is best done manually.  Doing
it automatically is an exercise in artificial intelligence.  One could use a
self-learning neural net to deal with typos, e.g. "Niedersachsen" vs.
"Nidersachsen".  A clever algorithm could also deal with umlauts.  But it would
be very hard to get an automated program which, without being specifically
programmed, could know that, for example, "Munich" and "Muenchen" are the
same place.
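
The distinction can be sketched in Python, again only as an illustration.
Plain string similarity from the standard difflib module is swapped in here
for the neural net, and the CANONICAL list and ALIASES table are hypothetical
and would have to be compiled by hand from the actual survey values.

```python
import difflib

# Hypothetical list of accepted spellings (an assumption for this sketch).
CANONICAL = ["Niedersachsen", "Hannover", "Muenchen"]

# Hypothetical hand-written aliases for names that are not merely misspelt.
ALIASES = {"munich": "Muenchen"}


def fuzzy_match(tag, cutoff=0.8):
    """Return the closest accepted spelling, or None if nothing is close enough."""
    hits = difflib.get_close_matches(tag, CANONICAL, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def canonical(tag):
    """Check the explicit alias table first, then fall back to fuzzy matching."""
    cleaned = tag.strip()
    alias = ALIASES.get(cleaned.lower())
    if alias is not None:
        return alias
    return fuzzy_match(cleaned)
```

With these assumptions, fuzzy_match("Nidersachsen") finds "Niedersachsen",
while fuzzy_match("Munich") finds nothing at this cutoff.  Only the explicit
alias entry lets canonical("Munich") resolve to "Muenchen", which is the
"specifically programmed" part.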

J'

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
