ldsc 0.3.1

LD Score Regression — fast Rust reimplementation of Bulik-Sullivan et al. LDSC
# Overview

## Pan-ancestry GWAS of UK Biobank[#]https://pan.ukbb.broadinstitute.org/docs/technical-overview#pan-ancestry-gwas-of-uk-biobank "Direct link to heading"

Here, we present a multi-ancestry analysis of 7,228 phenotypes using a generalized mixed model association testing framework, spanning 16,131 genome-wide association studies. We provide standard meta-analysis across all populations and with a leave-one-population-out approach for each trait. We develop a stringent quality control pipeline, identifying variants that are discrepant with gnomAD frequencies, and make recommendations for filtering these and other GWAS results.

### Multi-ancestry analysis[#]https://pan.ukbb.broadinstitute.org/docs/technical-overview#multi-ancestry-analysis "Direct link to heading"

Participants have been divided into ancestry groups to account for population stratification in GWAS analyses. Throughout these docs, these ancestry groupings are referred to by 3-letter ancestry codes derived from or closely related to those used in the 1000 Genomes Project and Human Genome Diversity Panel, as follows:

-   `EUR` = European ancestry
-   `CSA` = Central/South Asian ancestry
-   `AFR` = African ancestry
-   `EAS` = East Asian ancestry
-   `MID` = Middle Eastern ancestry
-   `AMR` = Admixed American ancestry

These codes refer only to ancestry groupings used in GWAS, not necessarily other demographic or self-reported data.

### Release data[#]https://pan.ukbb.broadinstitute.org/docs/technical-overview#release-data "Direct link to heading"

We release the summary statistics in two formats:

-   For one or a few phenotypes, we recommend using the phenotype-specific flat files: see further description [here]https://pan.ukbb.broadinstitute.org/docs/per-phenotype-files.
-   For analysis the full dataset (all phenotypes, all populations), the summary statistics are available in Hail formats: see further description [here]https://pan.ukbb.broadinstitute.org/docs/hail-format.

### Approach[#]https://pan.ukbb.broadinstitute.org/docs/technical-overview#approach "Direct link to heading"

Analysis was done using [SAIGE](https://github.com/weizhouUMICH/SAIGE/wiki/Genetic-association-tests-using-SAIGE) implemented in [Hail Batch](https://hail.is/docs/batch/index.html) to parallelize across populations, phenotypes, and regions of the genome. More details can be found below:

-   Details about the QC process can be found [here]https://pan.ukbb.broadinstitute.org/docs/qc including [determination of ancestry groups]https://pan.ukbb.broadinstitute.org/docs/qc#ancestry-definitions.
-   Description of GWAS pipeline and implementation can be found on our [Github]https://github.com/atgu/ukbb_pan_ancestry/wiki/Batch-pipeline.

The sample size for each population and the number of phenotypes run is as follows:

| Population | Num. Individuals | Num. Phenotypes |
| --- | --- | --- |
| AFR | 6636 | 2493 |
| AMR | 980 | 1105 |
| CSA | 8876 | 2771 |
| EAS | 2709 | 1612 |
| EUR | 420531 | 7200 |
| MID | 1599 | 1372 |

Each phenotype may have fewer samples run, depending on data missingness, which can be found in the [phenotype manifest](https://docs.google.com/spreadsheets/d/1AeeADtT0U1AukliiNyiVzVRdLYPkTbruQSk38DeutU8/edit#gid=903887429), or `n_cases` and `n_controls` in the Hail MatrixTable.