From: Mike Miller
Subject: uint2?
Date: Thu, 2 Dec 2010 18:49:51 -0600 (CST)
User-agent: Alpine 2.00 (DEB 1167 2008-08-23)
Side remark: We typically count the number of minor alleles for a biallelic marker. Suppose a single-nucleotide marker has alleles A and T and suppose that A is the rarer of the two (the "minor allele"). Then we would count the number of A alleles per genotype: TT = 0, AT = 1, AA = 2.
Thus, each genotype could be stored in two bits, using 00, 01, 10 and 11 (missing), and we could store four genotypes per byte instead of only one.
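As a rough illustration of the four-per-byte packing, here is a minimal Python sketch. The function names and the bit order within each byte (first genotype in the lowest two bits) are assumptions made for the example; this is not claimed to match PLINK's actual file layout.

```python
MISSING = 0b11  # 2-bit code reserved for a missing genotype


def pack_genotypes(codes):
    """Pack 2-bit codes (0, 1, 2, or 3 for missing) four to a byte."""
    packed = bytearray((len(codes) + 3) // 4)  # round up to whole bytes
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"code {code} does not fit in 2 bits")
        # Assumed layout: genotype i occupies bits 2*(i % 4) and 2*(i % 4) + 1.
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, count):
    """Recover the first `count` 2-bit codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]
```

With this layout, five genotypes such as TT, AT, AA, missing, AA (codes 0, 1, 2, 3, 2) fit in two bytes rather than five, and unpacking with the original count returns the same codes.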
This scheme is used by the GPL-licensed program PLINK, both for its data files and for working with the data in memory. Even with the PLINK system it's pretty easy to have data that use a full gigabyte, so the packing provides a very significant saving in RAM.
I'm asking because I'm wondering if it is conceivable that a uint2 type could be developed for Octave. Or the type could be a special snp type where binary 11 always referred to a missing value (NA when displayed or stored in text output).
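To make the idea of a special snp type a little more concrete, here is a small Python sketch of how such a container might behave. The class name SnpVector and its methods are hypothetical, invented for this example. It says nothing about how Octave would actually implement a new type; it only shows binary 11 being treated as missing and displayed as NA.

```python
class SnpVector:
    """Hypothetical 2-bit genotype container; binary 11 always means missing."""

    _MISSING = 0b11

    def __init__(self, counts):
        # counts: minor-allele counts 0, 1, 2, or None for a missing genotype
        self._n = len(counts)
        self._data = bytearray((self._n + 3) // 4)
        for i, c in enumerate(counts):
            if c is None:
                code = self._MISSING
            elif c in (0, 1, 2):
                code = c
            else:
                raise ValueError(f"invalid minor-allele count: {c!r}")
            self._data[i // 4] |= code << (2 * (i % 4))

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)
        code = (self._data[i // 4] >> (2 * (i % 4))) & 0b11
        return None if code == self._MISSING else code

    def nbytes(self):
        """Number of bytes used to hold the genotypes."""
        return len(self._data)

    def to_text(self):
        """Space-separated text form, with missing values shown as NA."""
        return " ".join("NA" if g is None else str(g) for g in self)
```

For example, SnpVector([0, 1, 2, None, 1]).to_text() gives "0 1 2 NA 1" while the five genotypes occupy two bytes.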
I have no idea how much work that would be. I'm willing to work on it, but I'm also not much of a programmer, so I doubt I could add much. The availability of the PLINK GPL'd code could help a lot, I suppose. I don't know if R developers have been working on this problem, but I'll find out (a lot of genetics researchers use R).
Mike