Title: PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects.
Publication Type: Journal Article
Year of Publication: 2019
Authors: Gurinovich, A, Bae, H, Farrell, JJ, Andersen, SL, Monti, S, Puca, A, Atzmon, G, Barzilai, N, Perls, TT, Sebastiani, P
Journal: Bioinformatics
Volume: 35
Issue: 17
Pagination: 3046-3054
Date Published: 09/2019
ISSN: 1367-4811
Abstract

MOTIVATION: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery.

RESULTS: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.
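
As a rough illustration of the kind of analysis described above, the following Python sketch runs a much-simplified version on simulated data: principal component analysis of genotypes, hierarchical clustering of the leading components, and a comparison of the variant's effect between the resulting clusters. It is not the PopCluster implementation. It assumes NumPy and SciPy in place of PLINK and Eigensoft, substitutes an allelic log odds ratio with a Woolf-type z statistic for the logistic regression models, and cuts the tree once at two clusters instead of applying the recursive bottom-up tree parsing procedure. All names, sample sizes and effect sizes are hypothetical choices for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# --- Simulated data (hypothetical; not taken from the paper) ---
n_per_pop, n_snps = 500, 200
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
pop = np.repeat([0, 1], n_per_pop)           # true population, hidden from the analysis
freqs = np.where(pop[:, None] == 0, freq_a, freq_b)
genotypes = rng.binomial(2, freqs)           # background SNPs coded 0/1/2

# Target variant: same allele frequency in both populations, but a
# per-allele effect on case status only in population 0.
target = rng.binomial(2, 0.4, size=pop.size)
beta = np.where(pop == 0, 1.5, 0.0)
p_case = 1.0 / (1.0 + np.exp(-(-0.8 + beta * target)))
case = rng.binomial(1, p_case)


def top_pcs(g, k=2):
    """Scores of the first k principal components of a genotype matrix."""
    centered = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def allelic_log_or(g, y):
    """Allelic log odds ratio (cases vs controls) and its standard error.

    Adds 0.5 to every cell of the 2x2 allele-count table to avoid zeros.
    """
    alt_case = g[y == 1].sum() + 0.5
    ref_case = (2 - g[y == 1]).sum() + 0.5
    alt_ctrl = g[y == 0].sum() + 0.5
    ref_ctrl = (2 - g[y == 0]).sum() + 0.5
    log_or = np.log(alt_case * ref_ctrl / (ref_case * alt_ctrl))
    se = np.sqrt(1 / alt_case + 1 / ref_case + 1 / alt_ctrl + 1 / ref_ctrl)
    return log_or, se


# Cluster individuals on genetic ancestry alone (no ethnicity labels used).
pcs = top_pcs(genotypes)
labels = fcluster(linkage(pcs, method="ward"), t=2, criterion="maxclust")

# Estimate the variant's effect within each cluster and compare the two.
estimates = [allelic_log_or(target[labels == c], case[labels == c]) for c in (1, 2)]
(log_or_1, se_1), (log_or_2, se_2) = estimates
z = (log_or_1 - log_or_2) / np.hypot(se_1, se_2)

# How well the clusters recover the simulated populations (up to label swap).
agreement = max(np.mean(labels - 1 == pop), np.mean(labels - 1 != pop))

print(f"cluster 1 log OR = {log_or_1:.2f}, cluster 2 log OR = {log_or_2:.2f}")
print(f"heterogeneity z = {z:.2f}, cluster/population agreement = {agreement:.2f}")
```

A large absolute z value indicates that the variant's estimated effect differs between the two clusters. A single cut of the tree only compares the two top-level groups, whereas parsing the tree recursively, as the abstract describes, would allow finer subsets to be examined.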

AVAILABILITY AND IMPLEMENTATION: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

DOI: 10.1093/bioinformatics/btz017
Alternate Journal: Bioinformatics
PubMed ID: 30624692
Grant List:
R21 AG056630 / AG / NIA NIH HHS / United States
U01 AG023755 / AG / NIA NIH HHS / United States
U19 AG023122 / AG / NIA NIH HHS / United States