Within King’s College London (KCL) and other institutes, a single UKB genotype dataset is shared among multiple applications so that institutes do not have to house multiple copies of this very large dataset. Each application receives its own .fam and .sample files linking its application-specific IDs to the genotype data. Therefore, as long as the order of individuals is maintained, genotype-only data derivatives can also be shared across applications. For example, UKB genotype data that has undergone further quality control (QC) can be shared across applications, rather than each application generating its own version of the dataset. This saves time, helps researchers without specialist expertise to use UKB genetic data, and ensures consistency across applications. However, it is essential that no application-specific data is used when deriving data shared across applications, as this breaks the data agreement with UKB.
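As a minimal sketch of why the order of individuals matters, the Python below pairs one application's IDs (read from its .fam file) with a shared per-individual derivative purely by row position. The function names, the example IDs and the QC flag values are hypothetical; the only file assumption is the standard six-column PLINK .fam layout, with the individual ID in column 2.

```python
from typing import Dict, List, Sequence


def read_fam_ids(fam_lines: Sequence[str]) -> List[str]:
    """Return the individual ID (column 2) from each PLINK .fam line."""
    ids = []
    for line in fam_lines:
        fields = line.split()
        if fields:  # skip blank lines
            ids.append(fields[1])
    return ids


def link_by_order(app_ids: Sequence[str], shared_values: Sequence[str]) -> Dict[str, str]:
    """Pair application-specific IDs with shared values by position only."""
    if len(app_ids) != len(shared_values):
        raise ValueError("row counts differ; individual order cannot be trusted")
    return dict(zip(app_ids, shared_values))


# Toy example: hypothetical .fam lines and a hypothetical shared QC flag.
fam = ["1001 1001 0 0 1 -9", "1002 1002 0 0 2 -9"]
qc_pass = ["1", "0"]
print(link_by_order(read_fam_ids(fam), qc_pass))  # {'1001': '1', '1002': '0'}
```

The length check only catches gross mismatches; it cannot detect individuals that have been reordered, which is why shared derivatives must keep the original order.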

When received, the UKB genotype data is split into two folders:

Genotypes - Contains observed (no imputation) genotype data
Imputed - Contains imputed genotype data

1 Observed genotype data

Here, the files of main interest are the binary PLINK format files, merged across all chromosomes:

ukb_binary_v2.bed
ukb_binary_v2.bim
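The .bim file is plain text with one variant per line in six whitespace-separated columns (chromosome, variant ID, genetic position, base-pair position, allele 1, allele 2); the matching .fam file is supplied per application, as described above. A minimal Python sketch for reading a .bim file and counting variants per chromosome follows; the parser name and the toy input lines are illustrative, not taken from the UKB files.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class BimRecord(NamedTuple):
    chrom: str
    variant_id: str
    cm: float     # genetic position
    bp: int       # base-pair position
    allele1: str
    allele2: str


def parse_bim(lines: Iterable[str]) -> Iterator[BimRecord]:
    """Yield one record per non-blank line of a PLINK .bim file."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        chrom, vid, cm, bp, a1, a2 = fields
        yield BimRecord(chrom, vid, float(cm), int(bp), a1, a2)


# Toy input; for the real file use: with open("ukb_binary_v2.bim") as fh: ...
example = io.StringIO(
    "1\tsnp_a\t0\t100\tA\tG\n"
    "1\tsnp_b\t0\t200\tC\tT\n"
    "2\tsnp_c\t0\t300\tG\tA\n"
)
print(Counter(rec.chrom for rec in parse_bim(example)))  # Counter({'1': 2, '2': 1})
```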

2 Imputed genotype data

Here, the files of main interest are the following:

ukb_imp_chr*_v3.bgen - dosage values for imputed variants, split by chromosome
ukb_mfi_chr*_v3.txt - per-variant information for imputed variants, split by chromosome
ukb_sqc_v2.txt - per-sample quality control information
ukb_imp_chr*_v3_MAF0_INFO7.bgen - dosage values for imputed variants, split by chromosome and restricted to variants with MAF > 0.01 and INFO > 0.7 (made by Joni)
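The MAF and INFO restriction described for the MAF0_INFO7 files can be illustrated with a minimal Python sketch that selects qualifying variants from a ukb_mfi_chr*_v3.txt file. It assumes the mfi files are tab-separated with no header and eight columns (variant ID, rsID, position, allele 1, allele 2, MAF, minor allele, INFO); check this against your own copy. This is not the script used to make the shared files, the function name is hypothetical, and the toy rows are invented. Strict inequalities are used because the thresholds above are stated as MAF > 0.01 and INFO > 0.7.

```python
from typing import Iterable, Iterator

# Assumed column layout of ukb_mfi_chr*_v3.txt (tab-separated, no header):
# 0 variant ID, 1 rsID, 2 position, 3 allele1, 4 allele2,
# 5 MAF, 6 minor allele, 7 INFO score
MAF_COL = 5
INFO_COL = 7


def passing_variants(mfi_lines: Iterable[str],
                     min_maf: float = 0.01,
                     min_info: float = 0.7) -> Iterator[str]:
    """Yield the rsID of each variant with MAF > min_maf and INFO > min_info."""
    for line in mfi_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= INFO_COL:
            continue  # skip blank or malformed lines
        if float(fields[MAF_COL]) > min_maf and float(fields[INFO_COL]) > min_info:
            yield fields[1]


rows = [
    "1:100_A_G\tsnp_a\t100\tA\tG\t0.2\tG\t0.95",    # passes both thresholds
    "1:200_C_T\tsnp_b\t200\tC\tT\t0.005\tT\t0.99",  # MAF too low
    "1:300_G_A\tsnp_c\t300\tG\tA\t0.3\tA\t0.5",     # INFO too low
]
print(list(passing_variants(rows)))  # ['snp_a']
```

A list of IDs selected this way could then be used to subset the matching .bgen file with a standard genotype tool.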

Shared UKB Genotype Data

1 Observed genotype data

2 Imputed genotype data