The analysis steps will be:
1. See if we can find an association between all cases (PFS + PAS + PSSD + etc.) and controls.
2. Check if each group shows an association by itself. At this point, if neither step 1 nor step 2 shows anything significant, there is no point in looking further. Reasons for this outcome could be a) an insufficient number of genomes combined with b) too many SNPs involved, or c) the 23andMe array doesn't cover SNPs relevant to our disease, or d) P-etc. has no genetic component (highly unlikely).
3. Assuming step 2 turns out positive, and if we have sufficient genomes, the next step will be to see if we can find a difference between “mild” cases and ones with widespread and severe symptoms. PFS etc. is what is called a continuous trait in genetics, meaning that the phenotype (the presentation of the disease, in our case) varies along a spectrum rather than being simply present or absent. The results of the survey will help us define some categories along the severity scale, and it would be extremely interesting to see if we can find alleles which predict those categories. Currently, however, it is extremely unlikely that we can get to this stage with the number of genomes we have. What you are suggesting would require even more.
At this stage, I would already be thrilled out of my mind if we could find an overall association, and perhaps demonstrate that the PFS, PAS, PSSD and P-etc. groups share the same associated genotype. This would be huge beyond imagination, and help catapult us into NIH funding territory.
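For anyone wondering what the first two steps could look like in practice, here is a minimal sketch in Python. It is not the actual pipeline: the SNP ids, genotypes and function names are made up for illustration, and it assumes a simple allelic test (Fisher's exact test on allele counts per SNP) with a Bonferroni correction for the number of SNPs tested. A real analysis would more likely use a dedicated tool and would also need quality control, which is left out here.

```python
from math import comb

VALID_BASES = set("ACGT")  # anything else (e.g. "--" no-calls) is skipped


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is at most as probable as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def allele_counts(genotypes, allele):
    """Return (copies of `allele`, copies of any other allele) among called genotypes."""
    called = [g for g in genotypes if g and set(g) <= VALID_BASES]
    hits = sum(g.count(allele) for g in called)
    return hits, sum(len(g) for g in called) - hits


def association_scan(cases, controls, tested_allele, alpha=0.05):
    """Test each SNP for an allele-frequency difference between cases and controls."""
    threshold = alpha / len(tested_allele)  # Bonferroni correction (an assumption)
    results = {}
    for snp, allele in tested_allele.items():
        a, b = allele_counts(cases[snp], allele)
        c, d = allele_counts(controls[snp], allele)
        p = fisher_exact_two_sided(a, b, c, d)
        results[snp] = (p, p < threshold)
    return results


# Toy data with hypothetical SNP ids; real input would come from the raw genome files.
cases = {
    "rs_demo_1": ["AA", "AA", "AA", "AA", "AA"],
    "rs_demo_2": ["CT", "CC", "TT", "CT", "--"],
}
controls = {
    "rs_demo_1": ["GG", "GG", "GG", "GG", "GG"],
    "rs_demo_2": ["CT", "CC", "TT", "CT", "CT"],
}
tested_allele = {"rs_demo_1": "A", "rs_demo_2": "C"}

for snp, (p, significant) in association_scan(cases, controls, tested_allele).items():
    print(snp, f"p={p:.3g}", "significant" if significant else "not significant")
```

Step 1 would run this with all case groups pooled, and step 2 would repeat it with each group on its own against the same controls. The corrected threshold shrinks with every SNP tested, which is one reason a small number of genomes may not be enough to show anything.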