Fig. 2 | BMC Genomic Data

Fig. 2

From: Overestimated prediction using polygenic prediction derived from summary statistics

PRS performance comparisons for Alzheimer’s disease (AD). ΔAUC and ΔR² denote the additive gain from adding the PRS term to Model II (refer to Materials and Methods for details). For convenience, we abbreviate the discovery and test sets as D and T, respectively. (A) AD prediction performance with and without subject overlap (D: ADSP, T: AMP-AD). With overlapping subjects, all metrics are overestimated, and the overestimation grows as the number of SNPs increases. (B) sPRS (D: IGAP, T: ADSP) is compared to rPRS (D: ADSP, T: ADSP). (C) AMP-AD data serves as another T for rPRS (D: ADSP) and sPRS (D: IGAP). When both D and T come from ADSP data, they are derived by tenfold cross-validation. In both (B) and (C), sPRS performance is significantly higher than that of rPRS, and we suspect that some IGAP participants are identical to a subset of ADSP or AMP-AD participants. (D) A simulation study is conducted with rPRS (D: ADSP, T: AMP-AD), in which a subset of D replaces an increasing number of subjects in T (see Results for details). The number of SNPs on the x-axis is the number of LD-pruned SNPs, selected in order starting from the lowest P-value threshold. That is, the smaller SNP counts on the left correspond to stricter P-value thresholds, and the right-most point corresponds to the most lenient threshold (P < 0.5).
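The x-axis convention in the caption can be illustrated with a minimal sketch. It assumes a standard pruning-and-thresholding score (a weighted sum of allele dosages over SNPs that pass a P-value cutoff); the study's actual pipeline is not shown on this page and may differ. The function names, thresholds other than 0.5, and all numeric values are hypothetical and chosen only for illustration.

```python
def select_snps(p_values, threshold):
    """Return indices of SNPs whose discovery P-value is below the threshold."""
    return [i for i, p in enumerate(p_values) if p < threshold]


def polygenic_score(dosages, effects, selected):
    """Weighted sum of allele dosages over the selected SNPs (assumed PRS form)."""
    return sum(dosages[i] * effects[i] for i in selected)


# Toy, made-up values for five LD-pruned SNPs (not data from the paper).
p_values = [1e-8, 3e-4, 0.02, 0.2, 0.45]    # discovery-set P-values
effects = [0.30, -0.12, 0.08, 0.05, -0.02]  # discovery-set effect sizes
dosages = [2, 1, 0, 1, 2]                   # one test subject's allele counts

# Hypothetical thresholds, ordered from strictest to most lenient (P < 0.5).
thresholds = [1e-6, 1e-3, 0.05, 0.5]

# A stricter threshold keeps fewer SNPs, so SNP count grows left to right.
snp_counts = [len(select_snps(p_values, t)) for t in thresholds]

# One score per threshold for the same subject.
scores = [
    polygenic_score(dosages, effects, select_snps(p_values, t))
    for t in thresholds
]

for t, n, s in zip(thresholds, snp_counts, scores):
    print(f"P < {t:g}: {n} SNPs, score = {s:.2f}")
```

Because each threshold admits a superset of the SNPs admitted by the stricter one before it, the SNP count is non-decreasing along the x-axis, which is why the left side of each panel corresponds to the strictest threshold.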
