Polygenic scores are a promising tool for informing personalised medicine. Many methods have been developed to calculate polygenic scores, but it is unclear which perform best. We benchmarked leading methods across a range of scenarios to guide future research and clinical implementation.
We applied polygenic scoring methods to genome-wide association study (GWAS) summary statistics for a range of outcomes with varying genetic architectures. We evaluated the predictive utility of the resulting polygenic scores in two target samples: UK Biobank and the Twins Early Development Study (TEDS).
In the original study, we included the following polygenic scoring methods: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR. We have subsequently added the method MegaPRS. We explored three strategies for selecting hyperparameters within the polygenic scoring methods: cross-validation to select a single hyperparameter, pseudo-validation using summary statistics only, and modelling polygenic scores from a range of hyperparameters using an elastic net.
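As a rough illustration of the first strategy (cross-validation to select a single hyperparameter), the sketch below picks, within each training fold, the hyperparameter whose polygenic score correlates most strongly with the phenotype, then evaluates that choice in the held-out fold. It is a minimal sketch rather than the pipeline used in the study: the function names `pearson` and `select_by_cross_validation`, the fold assignment, the use of Pearson correlation as the selection criterion, and the toy data are all assumptions made for illustration. The elastic net strategy would instead fit a penalised regression on all score columns jointly, which is not shown here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_by_cross_validation(scores, phenotype, k=3):
    """For each of k folds, choose the best hyperparameter in the training
    individuals and report its correlation in the held-out individuals.

    scores: dict mapping a hyperparameter label to a list of polygenic scores,
            one per individual (hypothetical input layout).
    Returns a list of (chosen_label, held_out_correlation) tuples, one per fold.
    """
    n = len(phenotype)
    results = []
    for fold in range(k):
        # Simple deterministic fold assignment: individual i belongs to fold i % k.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]

        def corr(label, idx):
            return pearson([scores[label][i] for i in idx],
                           [phenotype[i] for i in idx])

        best = max(scores, key=lambda label: corr(label, train_idx))
        results.append((best, corr(best, test_idx)))
    return results


# Toy data (hypothetical): 12 individuals, scores at three p-value thresholds.
phenotype = [float(i) for i in range(12)]
scores = {
    "5e-8": [1, 0, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5],
    "0.05": [0.5 * v + 1 for v in phenotype],  # perfectly linear in the phenotype
    "1": [(i * 7) % 5 for i in range(12)],
}
results = select_by_cross_validation(scores, phenotype)
for fold, (label, r) in enumerate(results):
    print(f"fold {fold}: selected threshold {label}, held-out r = {r:.3f}")
```

Selecting the hyperparameter only in the training individuals keeps the held-out estimate free of the optimism that arises when the same individuals are used both to choose and to evaluate a threshold.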
We used a reference-standardised approach throughout, meaning the SNP-weights used to generate the polygenic scores are independent of the target sample. This approach improves the generalisability of polygenic score associations across studies, and enables calculation of polygenic scores for a single individual.
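The idea can be sketched as follows: the SNP weights and the scaling constants (mean and standard deviation of the score) are derived from a reference sample only, so a score for one new individual can be computed and placed on a common scale without any other target data. This is a minimal sketch under stated assumptions, not the study's software: the function names, the SNP identifiers, the weights and the genotype dosages are all hypothetical, and real pipelines also handle allele matching and missing genotypes, which are omitted here.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages over the SNPs listed in `weights`."""
    return sum(weight * dosages[snp] for snp, weight in weights.items())


def reference_scale(reference_genotypes, weights):
    """Mean and SD of the polygenic score in the reference sample."""
    ref_scores = [polygenic_score(g, weights) for g in reference_genotypes]
    return mean(ref_scores), pstdev(ref_scores)


def standardised_score(dosages, weights, ref_mean, ref_sd):
    """Score for one individual, expressed in reference-sample SD units."""
    return (polygenic_score(dosages, weights) - ref_mean) / ref_sd


# Hypothetical SNP weights, fixed in advance and independent of the target sample.
weights = {"rs1": 0.5, "rs2": -0.2, "rs3": 0.1}

# Hypothetical reference sample: effect-allele dosages (0, 1 or 2) per SNP.
reference = [
    {"rs1": 0, "rs2": 1, "rs3": 2},
    {"rs1": 1, "rs2": 0, "rs3": 1},
    {"rs1": 2, "rs2": 2, "rs3": 0},
    {"rs1": 1, "rs2": 1, "rs3": 1},
]
ref_mean, ref_sd = reference_scale(reference, weights)

# A single target individual can be scored with no other target data available.
target_individual = {"rs1": 2, "rs2": 0, "rs3": 1}
z = standardised_score(target_individual, weights, ref_mean, ref_sd)
print(f"reference-standardised score: {z:.3f}")
```

Because nothing in the calculation depends on the target sample, the same weights and scaling constants give the same score for a given individual regardless of which study that individual belongs to.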
Note: These benchmarking results are from our publication in 2021. LDpred2 is now substantially faster. For a more up-to-date benchmark, see here.
Pain, Oliver, et al. “Evaluation of polygenic prediction methodology within a reference-standardized framework.” PLoS Genetics 17.5 (2021): e1009021. https://doi.org/10.1371/journal.pgen.1009021