Many analyses assume that observations are independent, so related individuals must be removed in advance. UKB is large, and estimating relatedness within a diverse population such as UKB is challenging. UKB have already estimated the relatedness between participants, and we will therefore use this pre-prepared file. Unfortunately, this file is only provided with project-specific IDs, so relatives must be removed separately within each application. In practice, this should be done for each study after the required phenotype data has been identified, in order to maximise the sample size of the analysis.
The application-specific relatedness file provided by UKB can be downloaded as instructed here: https://biobank.ndph.ox.ac.uk/ukb/label.cgi?id=263. Ken has already downloaded this file for the ukb18177 application.
I have created a script called ukb_relative_remover.R, which formats the UKB relatedness file and then runs GreedyRelated, with or without a phenotype file, at a specified relatedness threshold. The script takes less than 1 second to run.
mkdir /scratch/groups/ukbiobank/usr/ollie_pain/ReQC/relative_remover
sbatch -p brc,shared --mem=1G /users/k1806347/brc_scratch/Software/Rscript.sh /users/k1806347/brc_scratch/Software/MyGit/UKB-GenoPrep/Scripts/ukb_relative_remover/ukb_relative_remover.R \
--rel_file /scratch/datasets/ukbiobank/ukb18177/raw/ukb18177_rel_s488264.dat \
--rel_thresh 0.044 \
--GreedyRelated /scratch/groups/ukbiobank/Edinburgh_Data/Software/tools/GreedyRelated/GreedyRelated \
--output /scratch/groups/ukbiobank/usr/ollie_pain/ReQC/relative_remover/ukb18177
# You can also specify a keep file (--keep) to remove related individuals only within a subset of UKB.
# The output contains two columns, each holding the application-specific IDs of participants.
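To make the role of --rel_thresh more concrete, the sketch below lists the pairs in a toy relatedness table whose kinship is at or above 0.044, a value close to the usual KING lower bound for third-degree relatives (0.0442). This is only an illustration of thresholding on kinship, not what ukb_relative_remover.R or GreedyRelated actually does internally. The column layout (ID1 ID2 HetHet IBS0 Kinship), the toy IDs and values, and the "at or above" comparison are all assumptions; check them against the real file before reusing this.

```shell
# Toy relatedness table. The column layout (ID1 ID2 HetHet IBS0 Kinship)
# and every value below are assumptions made for illustration only.
rel_file=$(mktemp)
cat > "$rel_file" <<'EOF'
ID1 ID2 HetHet IBS0 Kinship
1001 1002 0.050 0.0000 0.2500
1003 1004 0.020 0.0100 0.0300
1005 1006 0.030 0.0050 0.0900
EOF

# Same value as passed to --rel_thresh in the command above.
rel_thresh=0.044

# Skip the header row and keep pairs whose kinship (column 5) is at or
# above the threshold; print the two IDs of each retained pair.
related_pairs=$(awk -v t="$rel_thresh" 'NR > 1 && $5 >= t {print $1, $2}' "$rel_file")
echo "$related_pairs"

rm -f "$rel_file"
```

Pairs listed this way are the ones from which one member would need to be dropped. Choosing which member to drop so that as many participants as possible are retained is the part handled by GreedyRelated.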