23andme Patient Data Analysis

A “theory neutral” analysis of the 23andMe data for chromosomes 1-22 has been completed.

That is, no particular marker was searched for in advance. Instead, a very “by the book” association analysis was performed on a subset of quality-controlled genotyped and imputed SNPs from chromosomes 1-22, to see whether any passed the established ‘standard’ GWAS significance threshold of p < 5x10^-8.

  • Unfortunately, and quite expectedly, no single SNP passed this significance threshold.

  • Neither the case group nor the control group appeared to be more strongly correlated with over-represented SNPs for any trait/disease featured on Impute.Me.

  • Nothing in the top results of the gene or pathway analyses this round stands out as indicative of epigenetic mechanisms, known diseases, common symptoms in PFS/PAS/PSSD, steroid metabolism, AR signalling, immunity, autoimmunity, gluten intolerance, etc.

Basically, a proverbial “needle in the haystack” wasn’t discovered in these results.

However, multiple SNPs in a small region of chromosome 4 in the imputed data (only 1 of which was present in the genotyped 23andMe data) were near the significance threshold, at Nx10^-7. I am going to approach an acquaintance with GWAS experience to ask whether this may be suggestive of anything. It appears to be an intergenic region of little known consequence.
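For anyone who wants to see how the threshold is applied, here is a minimal sketch. The 5x10^-8 cutoff is the one quoted above, and it matches a Bonferroni correction of 0.05 for roughly one million independent tests. The "near-threshold" cutoff of 1x10^-6 is only my reading of "Nx10^-7", and the SNP names and p-values are made up for illustration.

```python
GENOME_WIDE = 5e-8      # standard GWAS threshold quoted in the post
NEAR_THRESHOLD = 1e-6   # assumption: "Nx10^-7" is read as p < 1e-6


def bonferroni(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def classify_p(p):
    """Label a p-value relative to the two thresholds above."""
    if p < GENOME_WIDE:
        return "significant"
    if p < NEAR_THRESHOLD:
        return "near-threshold"
    return "not significant"


# Hypothetical SNP names and p-values, not taken from the real results.
example = {"snp_a": 3e-7, "snp_b": 0.02, "snp_c": 4e-9}
labels = {snp: classify_p(p) for snp, p in example.items()}

for snp, label in labels.items():
    print(snp, example[snp], label)
```

In the results described here, nothing would fall in the first category and only the chromosome 4 SNPs would fall in the second.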

Links to the summary results and gene and pathway analyses will be posted soon. Be patient.

If it is advisable, some more imputed genomes could be easily added to the analysis since the process is practically automated after all this hard work. A strictly post-finasteride patient analysis could also be easily performed soon. Chromosome X analysis may or may not be worth the time and effort but I will briefly look into it.

9 Likes

Some preliminary results:

Quality-controlled independent SNPs taken from genotypes imputed against a 1kGv3 reference panel by the Impute.Me service were used for the association analysis. All pre- and post-imputation genomes were mapped to GRCh37.

Selection of controls from 23andMe data obtained from OpenSNP was performed using the R package PCAmatchR, prior to imputation.

QC and estimation of superpopulation were conducted separately for each 23andMe chip version; the SNPs shared across all 3 chip versions used were then merged for the last 2 analyses.
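As a rough illustration of the "intersecting SNPs" step, the idea is just a set intersection over the SNP IDs that survive QC on each chip version. This is only a sketch: the function name and the rsIDs are placeholders, and the real merge was done on genotype files rather than Python sets.

```python
def shared_snps(*chip_snp_sets):
    """Return the SNP IDs present in every chip version's post-QC set."""
    if not chip_snp_sets:
        return set()
    return set.intersection(*(set(s) for s in chip_snp_sets))


# Placeholder IDs standing in for the post-QC SNP lists of each chip version.
v3 = {"rs_a", "rs_b", "rs_c"}
v4 = {"rs_b", "rs_c", "rs_d"}
v5 = {"rs_c", "rs_b", "rs_e"}

common = shared_snps(v3, v4, v5)
print(sorted(common))
```

Only SNPs in `common` would be carried into the merged v3/v4/v5 analyses, which is why those runs can contain fewer markers than the v5-only run.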

SNPs on CHR 1-22 of all European samples genotyped on the v5 23andMe chip
Summary statistics: https://www.dropbox.com/s/1qj7o3o0qtsresg/sorted-v5Eur-assoc2.assoc?dl=0
FUMA results: https://fuma.ctglab.nl/browse/129

SNPs on CHR 1-22 of All European samples genotyped on v3, v4, and v5 23andMe chips
Summary statistics: https://www.dropbox.com/s/oz2ey7pt37tq2sf/sorted-vAllEur-assoc2.assoc?dl=0
FUMA results: https://fuma.ctglab.nl/browse/131

SNPs on CHR 1-22 of All samples genotyped on v3, v4, and v5 23andMe chips
Summary statistics: https://www.dropbox.com/s/9ledhm9bdh7bary/sorted-All-assoc2.assoc?dl=0
FUMA results: https://fuma.ctglab.nl/browse/130

The only “hotspot” (near the significance threshold) was the chr4 ~10179769-10194628 region.
The only nearby genes are a couple of pseudogenes, WDR1, and a non-coding microRNA, MIR3138, which appears to be located in an intron of WDR1.
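If you download one of the summary statistics files and want to pull out just this region, something like the sketch below should work. It assumes the rows have already been parsed into dictionaries with `CHR`, `BP` and `P` keys (the usual PLINK .assoc column names, which I am assuming these files use), and the example rows are invented.

```python
# Region reported above, in GRCh37 coordinates.
HOTSPOT = ("4", 10179769, 10194628)


def in_region(chrom, bp, region=HOTSPOT):
    """True if a SNP at chrom:bp lies inside the region (inclusive bounds)."""
    r_chrom, start, end = region
    return str(chrom) == r_chrom and start <= int(bp) <= end


def hotspot_rows(rows, region=HOTSPOT):
    """Keep only the rows whose position falls inside the region."""
    return [r for r in rows if in_region(r["CHR"], r["BP"], region)]


# Invented rows for illustration; real rows would come from the .assoc file.
rows = [
    {"CHR": "4", "SNP": "snp_x", "BP": "10180000", "P": "3e-7"},
    {"CHR": "4", "SNP": "snp_y", "BP": "20000000", "P": "0.4"},
    {"CHR": "5", "SNP": "snp_z", "BP": "10180000", "P": "0.01"},
]

for row in hotspot_rows(rows):
    print(row["SNP"], row["BP"], row["P"])
```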

7 Likes

Another analysis of the same data may be posted in the near future.

Even though the official submission period is over and we are no longer actively requesting 23andMe data, if anyone has recent 23andMe data obtained for personal use that they would like to contribute, please PM me. This analysis could be repeated periodically, every ~10 new genomes, to add to the statistical power.

Thanks for the update, Dubya.

When you say preliminary results, what could happen at a later stage beyond what has already been analysed (aside from more test submissions)?

1 Like

It was mentioned earlier in this topic that someone with prior GWAS experience has helped a great deal as a guide with the analysis. Only about 5% (1.8 million / 36 million) of the SNPs from the imputed genomes passed the quality control checks and contributed to what was presented as preliminary results. I am hopeful that this person can help increase that number in a future analysis, without appreciably sacrificing the quality of the markers or the statistics derived from them. They have also made some good suggestions to improve interpretation of the results.
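For reference, the pass rate is just the ratio of the two rounded counts given above. A quick sketch (the variable names are mine):

```python
snps_passing_qc = 1_800_000     # rounded count quoted in the post
snps_after_imputation = 36_000_000  # rounded count quoted in the post

pass_rate = snps_passing_qc / snps_after_imputation
print(f"{pass_rate:.0%} of imputed SNPs passed QC")
```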

2 Likes

Is there any PFS-related DNA research going on right now that can receive European batches? I would like to participate.