Summary

Polygenic scores are a promising tool for informing personalised medicine. Many methods have been developed to calculate polygenic scores, but it is unclear which methods perform best. We benchmarked a range of leading methods, evaluating their performance in a range of scenarios to guide future research and clinical implementation.

We applied polygenic scoring methods to genome-wide association study (GWAS) summary statistics for a range of outcomes, with varying genetic architecture. We evaluated the predictive utility of polygenic scores in two target samples, including UK Biobank and the Twins Early Development Study (TEDS).

In the original study, we including the following polygenic scoring methods: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR. We have subsequently added the method MegaPRS. We explored the three strategies for selecting hyperparameters within the polygenic scoring methods, including cross-validation to select a single hyperparameter, pseudo-validation using summary statistics only, or modelling polygenic scores from a range of hyperparameters using an elastic net.

We used a reference-standardised approach throughout, meaning the SNP-weights used to generate the polygenic scores are independent of the target sample. This approach improves the generalisability of polygenic score associations across studies, and enables calculation of polygenic scores for a single individual.

Polygenic scoring methods comparison for UKB target sample with 1KG reference: A) Average test-set correlation between predicted and observed values across phenotypes. B) Average difference between observed-prediction correlations for the best pT+clump polygenic score and all other methods.

Show runtime of methods

Time taken to run each polygenic scoring method on chromosome 22. A = All methods. B = Excludes PRScs and LDpred2 (without compact SFMB format) to highlight finer differences. LDpred2_compact = LDpred2 with compact SFMB format

Note: These benchmarking results are from our publication in 2021. LDpred2 is now substantially faster. For a more up-to-date benchmark, see here.

Citation

Pain, Oliver, et al. “Evaluation of polygenic prediction methodology within a reference-standardized framework.” PLoS genetics 17.5 (2021): e1009021. https://doi.org/10.1371/journal.pgen.1009021

Additional resources

Verbose summary of code and results: Original and Update including MegaPRS
Code used to prepared reference genetic data: 1KG and UK Biobank
Code used to prepared target genetic data: here
Code used to define outcomes in UKB and TEDS: here

Comparison of existing polygenic scoring methodology

Summary

Citation

Additional resources