From: John Darrington
Subject: Re: "tags" in PSPP - can a variable have multiple values for the same case?
Date: Sun, 18 Jan 2015 08:36:22 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Jan 16, 2015 at 09:54:08PM +0100, Benjamin Oppermann wrote:
> Hello,
>
> This is basically a question about cleaning up my data:
>
> I made a survey where, for some of the questions, the answer could be any
> number of keywords. Informants could be as specific as they liked. This
> was intended to work like tags (they are comma-separated), but we didn't
> consider that they would still end up in the same column of the output.
>
> This means that, for example, the values for "place of origin" could be
> as precise as "Hannover, Niedersachsen, Germany, Europe" or "Hannover,
> Germany", or just "Hannover". All in the same variable!
>
> I currently have this variable specified as a string in PSPP, which
> means each of these variants is a different value for PSPP.
>
> I want to achieve an outcome where all these values read the same,
> i.e. the region, "Nidersachsen" in this example. Since my sample
> includes ~360 cases, I'd like to find a way to do this automatically.
>
> I might be able to do it in another program, like a spreadsheet
> application or GoogleRefine/OpenRefine, but do you perhaps know a way to
> recode this variable in PSPP? Any suggestions?
>
> Regards,
> Ben

There are a number of string functions which you could use to parse the string
variable into several other variables. See section 7.7.7 of the manual.
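The splitting step can be sketched outside PSPP as well. This is a rough Python analogue, not PSPP syntax; the helper name `split_tags` and the choice of four output fields are illustrative assumptions.

```python
def split_tags(value, n_fields=4):
    """Split one comma-separated answer into a fixed number of fields.

    This mimics turning one string variable into several new variables:
    each tag is trimmed of surrounding whitespace, empty tags are dropped,
    tags beyond n_fields are discarded, and missing fields are padded
    with empty strings.
    """
    tags = [t.strip() for t in value.split(",") if t.strip()]
    tags = tags[:n_fields]
    return tags + [""] * (n_fields - len(tags))


print(split_tags("Hannover, Niedersachsen, Germany, Europe"))
# ['Hannover', 'Niedersachsen', 'Germany', 'Europe']
print(split_tags("Hannover, Germany"))
# ['Hannover', 'Germany', '', '']
print(split_tags("Hannover"))
# ['Hannover', '', '', '']
```

Splitting alone does not line the tags up by level: "Germany" lands in the second field for "Hannover, Germany" but in the third for the longer answer, so the new variables still have to be interpreted before they say anything about the region.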
But I agree with the others that this is best done manually. Doing it
automatically is an exercise in artificial intelligence. One could use a
self-learning neural net to deal with typos, e.g. "Niedersachsen" vs.
"Nidersachsen", and a clever algorithm could also deal with umlauts. But it
would be very hard to build an automated program that, without being
specifically programmed to do so, could know that "Munich" and "Muenchen" are
the same place.
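A semi-automatic middle ground might look like the following Python sketch: fold umlauts, tolerate small typos in region names, and map cities to regions through a hand-written table. It swaps in plain string similarity (`difflib.get_close_matches` from Python's standard library) for the self-learning neural net mentioned above. The tables `KNOWN_REGIONS` and `CITY_TO_REGION` are hypothetical examples that would have to be filled in by hand, and that table is exactly where "Munich" and "Muenchen" have to be declared the same place.

```python
import difflib

# Hypothetical, hand-maintained tables: nothing here is learned automatically.
KNOWN_REGIONS = ["Niedersachsen", "Bayern", "Hessen"]
CITY_TO_REGION = {
    "hannover": "Niedersachsen",
    "muenchen": "Bayern",
    "munich": "Bayern",  # the English alias must be listed explicitly
}

# Map German umlauts and sharp s to their usual ASCII transliterations.
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def fold(tag):
    """Trim, lower-case and transliterate a tag, e.g. 'München' -> 'muenchen'."""
    return tag.strip().lower().translate(UMLAUTS)


def region_of(answer):
    """Return the region for a comma-separated answer, or None if unknown."""
    folded_regions = {fold(r): r for r in KNOWN_REGIONS}
    for tag in answer.split(","):
        key = fold(tag)
        if not key:
            continue
        # A city listed in the table wins.
        if key in CITY_TO_REGION:
            return CITY_TO_REGION[key]
        # Otherwise accept a region name, tolerating small spelling errors.
        match = difflib.get_close_matches(key, list(folded_regions), n=1, cutoff=0.85)
        if match:
            return folded_regions[match[0]]
    return None  # nothing recognised: leave this case for manual review


print(region_of("Nidersachsen, Germany"))  # typo in the region name
print(region_of("München"))
print(region_of("Paris, France"))
```

Answers for which `region_of` returns `None` still need to be checked by hand, so the manual work is reduced rather than eliminated.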
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.