A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium; it represents a group of SNPs called a haplotype. Effective selection of tagging SNP sets is crucial to SNP-set analysis in genome-wide association studies (GWAS). Consequently, a new generation of genotyping arrays is being developed, designed with tag SNPs to improve rare-variant imputation. We incorporate functional predictions of protein structure, gene regulation, splicing and miRNA binding.
Selection of genetic markers: this chapter focuses on single nucleotide polymorphisms (SNPs), the most common form of variation in the human genome. Selecting tagging SNPs for association studies using power. Tag SNP selection for candidate gene association studies. One application is to select, from the whole SNP set, a subset of SNP biomarkers that is informative and small enough for subsequent association studies. Tag SNP selection and association studies: over the past few years, numerous disease association studies, both genome-wide and. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. In this paper, we present an OR application for representative SNP selection that implements our novel simulated annealing (SA) based feature selection. Selection of representative SNP sets for genome-wide association. How to select tag SNPs in genetic association studies. The value of gene-based selection of tag SNPs in genome-wide.
The LD measure r^2 has been used for tag SNP selection [1, 12] because the statistical power of association studies is proportional to the value of r^2. The power of association studies based on tag SNPs using genotype data is similar to that using haplotype data. Several methods have been proposed for selecting sets of genetic markers that characterize the polymorphisms in a region of interest [9]. The power of intelligent SNP selection: the Infinium assay provides the freedom to design the most powerful genotyping panels. SNP p-value data and finds all SNPs in high LD with GWAS SNPs, so that selection is from a much larger set of SNPs than the GWAS itself. SNP selection for pharmacogenetic association studies is discussed. This power provides the fastest path to discovery and publication. Transferability of tag SNPs in genetic association studies in.
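As a minimal sketch of the r^2 measure mentioned above, the following computes pairwise r^2 between two biallelic SNPs from phased haplotypes. The function name `r_squared` and the 0/1 allele coding are assumptions made for illustration; they do not come from any of the cited tools.

```python
def r_squared(hap_a, hap_b):
    """Pairwise LD r^2 between two biallelic SNPs.

    hap_a and hap_b are equal-length lists of alleles coded 0/1,
    one entry per phased haplotype (hypothetical input format).
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two equal-length, non-empty allele lists")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # monomorphic SNP: r^2 undefined, treated as 0 here
    return d * d / denom
```

Two SNPs that always carry the same (or always opposite) alleles give r^2 = 1, while statistically independent SNPs give r^2 = 0.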
Selection of SNP subsets for association studies in. SNP tagging and then evaluates freely available software for the selection of tag SNPs for genetic association studies. Most of the existing tagging SNP-set selection methods cannot make full use of the information hidden in common or rare variants associated with diseases. Single nucleotide polymorphisms, frequently called SNPs (pronounced "snips"), are the most common type of genetic variation among people. For comparing alternative tag SNP selection algorithms, we use coalescent simulation to.
Tag SNP selection for candidate gene association studies using HapMap and gene resequencing data. Article in European Journal of Human Genetics 15(10), Jun 2007. HapMap provides linkage disequilibrium (LD) information on a sample of 3. Efficiency and power in genetic association studies. Model-based clustering for identifying disease-associated. In anticipation of cost-effective SNP genotyping technologies and the availability of databases of a large number of candidate SNPs, many investigators are seriously considering genome-wide SNP scans with the hope of performing hypothesis-free disease association studies, as opposed to hypothesis-driven candidate gene or region studies. Increasing the power of association studies by imputation. Here, for any given subset of SNPs within a block, all pairwise r^2 values between the SNPs in this subset and the SNPs absent from this subset are calculated. Currently, typical genome-wide association studies measure hundreds of thousands, or millions, of genetic variants.
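The pairwise check between a candidate subset and the SNPs outside it can be sketched as follows. This assumes a precomputed symmetric r^2 matrix and an r^2 cutoff of 0.8; both the function name `untagged_snps` and the cutoff are illustrative assumptions, not values taken from the text.

```python
def untagged_snps(r2, tags, threshold=0.8):
    """Return indices of non-tag SNPs not captured by any tag.

    r2 is a symmetric matrix (list of lists) of pairwise r^2 values.
    A SNP outside the subset counts as captured when its best r^2
    with any SNP in the subset reaches the (assumed) threshold.
    """
    tags = set(tags)
    missed = []
    for j in range(len(r2)):
        if j in tags:
            continue
        best = max((r2[i][j] for i in tags), default=0.0)
        if best < threshold:
            missed.append(j)
    return missed


# Toy 4-SNP block: SNPs 0 and 1 are in high LD, as are SNPs 2 and 3.
r2 = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.85],
    [0.2, 0.1, 0.85, 1.0],
]
print(untagged_snps(r2, [0]))     # SNPs 2 and 3 are not captured
print(untagged_snps(r2, [0, 2]))  # every remaining SNP is captured
```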
Summary: Illumina's tag SNP approach in tandem with the powerful Infinium as. Many methods for tag SNP selection have been developed, and new ones continue to appear. This reduces the expense and time of mapping genome areas. Each SNP represents a difference in a single DNA building block, called a nucleotide. Selecting the smallest subset of tag SNPs that can predict the other SNPs would considerably reduce the complexity of genome-wide or block-based SNP-disease association studies. Haplotype block partitioning and tag SNP selection.
Tag-SNP selection based on pairwise LD criteria and power. Research article (open access): An efficient weighted tag SNP-set analytical method in genome-wide association studies. Bin Yan, Shudong Wang, Huaqian Jia, Xing Liu and Xinzeng Wang. Abstract. Background. Increasing power of genome-wide association studies by. Tag-SNP selection is an important step in designing case-control association studies. Although there is a broad literature on Bayesian variable selection under high- or ultrahigh-dimensional. Selection and evaluation of tag SNPs: Tagger is a tool for the selection and evaluation of tag SNPs from genotype data such as that from the International HapMap Project. Power-optimized tag SNP selection: our power-optimized tag SNP selection method is a stepwise greedy procedure to maximize power. In low- and medium-budget association studies, a limited number of tag SNPs are selected out of a large set of available SNPs previously typed in an initial cohort. A tool for selecting SNPs for association studies based on. Imputation-aware tag SNP selection to improve power for. Transferability of tag SNPs in genetic association studies. Tag SNP selection for association studies. Stram, Daniel O. Dec 01, 2004. Prioritize and select SNPs for association studies with multistage.
Sample sizes required at different powers of detecting. Tag SNP selection using particle swarm optimization (Chuang). Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies. The use of high-density tag SNP arrays (mainly Illumina Hap300 and Hap550) for genome-wide association studies has virtually revolutionized the field and led to the identification of strong susceptibility loci for several types of malignancies, including breast cancer (Hunter et al. R, should rise up by increasing the number of predicted SNPs or by decreasing the. Millions of polymorphisms are reported and freely available in public databases. Tag SNP selection and its applications in association studies. For a tag SNP selection problem using pairwise r^2, one can construct a graph G = (V, E) with each vertex v_i representing a SNP s_i. In the tag SNP selection problem, our goal is to achieve a feasible solution with the smallest number of htSNPs. The program can also identify and choose tag SNPs for SNPs not in high LD with any GWAS SNP. Analysis of two different sets of SNP genotype data from the HapMap is used to judge the practical aspects of using.
Selection of these tag SNPs poses several challenges as rare variants tend to be. Structured genome-wide association studies with Bayesian. A tag SNP is a representative single nucleotide polymorphism in a region of the genome with high linkage disequilibrium (the non-random association of alleles at two or more loci). Multimarker-LD based genetic algorithm for tag SNP selection. SNPs hold much promise as a basis for genome-wide disease-gene association. For a candidate gene study, researchers can choose their tag SNPs. Imputation-aware tag SNP selection to improve power for multi.
Tag SNP selection for association studies (Stram, 2004). At the time of this study, genotypes based on resequencing data were available from the EGP website for 52,387 SNPs in 391 genes from EGP. Tag SNP selection using particle swarm optimization. These SNPs are usually chosen from haplotype data and are thus called haplotype tag SNPs (htSNPs). For example, a SNP may replace the nucleotide cytosine (C) with the nucleotide thymine (T) in a certain stretch of DNA. Abstract: This report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense. A tag-SNP set that covers all SNPs is therefore a dominating set of the graph G, and the smallest such set is a minimum dominating set. The recent advances in genotyping and molecular techniques have greatly increased the knowledge of the human genome structure.
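The dominating-set view of tag SNP selection can be illustrated with a simple greedy heuristic: treat two SNPs as connected when their r^2 reaches a cutoff, then repeatedly pick the SNP that covers the most still-uncovered SNPs. This is a sketch under stated assumptions (the function name `greedy_tag_snps`, the 0.8 cutoff and the tie-breaking rule are all illustrative), and a greedy pass is not guaranteed to return the smallest possible set, since finding a minimum dominating set is NP-hard in general.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy dominating-set heuristic on the r^2 graph.

    r2 is a symmetric matrix (list of lists) of pairwise r^2 values.
    Vertices are SNP indices; an edge joins i and j when
    r2[i][j] >= threshold (assumed cutoff).
    """
    n = len(r2)
    # covers[i] = SNPs that would be tagged by choosing SNP i (itself included)
    covers = [
        {j for j in range(n) if j == i or r2[i][j] >= threshold}
        for i in range(n)
    ]
    uncovered = set(range(n))
    tags = []
    while uncovered:
        # Pick the SNP covering the most uncovered SNPs; ties go to the lowest index.
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Toy example: two LD clusters, {0, 1} and {2, 3}.
r2 = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.85],
    [0.2, 0.1, 0.85, 1.0],
]
print(greedy_tag_snps(r2))  # one tag per cluster
```

Each SNP always covers itself, so the loop makes progress on every iteration and terminates even when some SNPs are in LD with nothing else.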
The aim of this chapter is not to enumerate and detail all available methods for haplotype block partitioning and tag SNP selection, but rather to focus on how to use the available methods, tools, and resources to facilitate tag SNP selection in association studies. It combines the simplicity of pairwise tagging methods with the efficiency benefits of multimarker haplotype approaches. Haplotype block partitioning and tag SNP selection using. Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs).
Tag SNP selection and its applications in association studies. Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies. Genevieve L. It is possible to identify genetic variation and association to phenotypes without genotyping every SNP in a chromosomal region. The value of gene-based selection of tag SNPs in genome.
Recently, several methods have been published to select subsets of. The differential pattern of mf and tl variation of SNPs was critical to effective tag SNP selection, since the top SNPs were clearly distinct in the histograms of those windows (Fig. Laboratory of Molecular Carcinogenesis, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA. Sep 15, 2004: (2) the dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection. The power of genome-wide association studies can be computed using a set of tag. Genome-wide association studies are a promising new tool for deciphering the genetics of complex diseases. The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. Bayesian variable selection regression for genome-wide.
The advantage is particularly striking when the set of tag SNPs is sparse. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case-control individuals. Significant genetic association may be interpreted as either (1) direct association, in which the genotyped SNP is the true causal variant conferring disease susceptibility. These tag SNPs are then typed in a larger set of control and affected individuals. Tag SNPs are useful in whole-genome SNP association studies, in. Therefore, fewer htSNPs and more predictable SNPs yield a better fitness. Power calculations are important at the study design stage to ensure successful results.
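A rough design-stage power calculation can be sketched as below. It assumes a 1-degree-of-freedom chi-square association test and the commonly used approximation that the noncentrality parameter observed at a tag SNP is r^2 times the noncentrality at the causal SNP. The function names and the default significance level of 5e-8 are assumptions for illustration; a real study would also need to account for genetic model, allele frequencies and sample size when deriving the noncentrality parameter.

```python
from statistics import NormalDist


def power_1df(ncp, alpha=5e-8):
    """Power of a 1-df chi-square test with noncentrality parameter ncp.

    Uses the fact that a noncentral chi-square with 1 df is the square
    of a normal variable with mean sqrt(ncp) and unit variance.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = ncp ** 0.5
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


def power_via_tag(ncp_causal, r2, alpha=5e-8):
    """Approximate power when testing a tag SNP instead of the causal SNP.

    Assumes the noncentrality at the tag is r2 * ncp_causal.
    """
    return power_1df(r2 * ncp_causal, alpha)


# Power drops as the tag becomes a weaker proxy for the causal SNP.
for r2 in (1.0, 0.8, 0.5):
    print(r2, round(power_via_tag(40.0, r2), 3))
```

Under this approximation, keeping the same power with a tag at r^2 < 1 requires roughly 1/r^2 times as many samples as genotyping the causal SNP directly.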
Once the tag SNP statistics are computed, the genomic regions that are in linkage disequilibrium (LD) with the most. Oct 23, 2005: We investigated selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. Power analysis for genome-wide association studies, BMC. Tag SNP selection for association studies (Stram, 2004), Genetic. These SNPs are usually chosen from haplotypes and called haplotype tag SNPs (htSNPs). Linkage disequilibrium (LD), which refers to the non-random association of alleles at different loci (Lewontin 1964) in haplotypes, plays a central role in genome-wide association studies for. Criteria for the selection of single nucleotide polymorphisms in pathway pharmacogenetics. Therefore, it is essential to select only informative SNPs representing the original SNP distributions in the genome (tag SNP selection) for genome.
Software for tag single nucleotide polymorphism selection. The goal is to minimize the number of markers selected for genotyping in a particular. Using tag SNPs for a genome-wide association study allows the investigator to maximize information content and minimize sample size without losing power. Despite the advances in genotyping technologies, which have led to a large reduction in genotyping cost, tag SNP selection remains an important problem for computational biologists and geneticists. Tagging SNP-set selection with maximum information based on. Tag SNP selection for association studies: this report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network. A novel prediction method for tag SNP selection using. Efficiency and power in genetic association studies, Nature. A key strategy to improve the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis (Johnson et al. Tag SNP selection for association studies, Genetic. A general question for linkage disequilibrium-based association studies is how power to detect an association is compromised when tag SNPs are chosen from data in one population sample and then deployed in another sample. Single nucleotide polymorphism (SNP)-set analysis in genome-wide association studies (GWAS) has emerged as a research hotspot for identifying genetic variants associated with disease susceptibility. In this case, SNP bta60194nors (rs41587782) was in high LD with the representative tag SNP and thus was excluded in the final step of the selection strategy.
Two vertices v_i and v_j are connected if and only if the two corresponding SNPs s_i and s_j are correlated. Selection of SNP subsets for association studies in candidate. Informative SNP selection problem (ISSP): given a sample S of a population P of individuals (either haplotypes or genotypes) on m SNPs, select positions of k SNPs such that, for any individual, one can predict non. Taylor; Epidemiology Branch and Laboratory of Molecular Carcinogenesis. Furthermore, we describe an innovative approach to combine both tag SNP. Two-stage sampling designs for gene association studies.
Accordingly, the scale and cost of genotyping are expected to be largely reduced. Optimized tag SNP content and dense marker spacing (mean spacing 1. Linkage disequilibrium (LD) plays a central role in association studies for identifying the genetic variation responsible for complex human. Tag SNP selection via a genetic algorithm, ScienceDirect. Imputation-aware tag SNP selection to improve power for large. However, most existing methods of SNP-set analysis are affected by the quality of the SNP set, and a poor-quality SNP set can lead to low power in GWAS. Targeting the most informative SNP loci supports the most efficient study designs. An efficient weighted tag SNP-set analytical method in. Analysis of epidemiologic studies of genetic effects and gene. Efficient association study design via power-optimized tag. Genome-wide association studies (GWAS) are meant to find the genetic. SNP and haplotype associations using a two-stage design. Jun 01, 2011: Genome-wide association studies (GWASs) have been effectively identifying the genomic regions associated with a disease trait. Because power gauges the chance of success of an association study, selection of tag SNPs that yield higher power will increase the effectiveness of future association studies, at no.
Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies. Kui Zhang, Zhaohui S. For both applications (either tag or index SNP selection), the corresponding problem can be formulated as follows. Tag SNP selection for prediction of tick resistance in. As a result, there is now a need to identify, among all these data, the relevant markers for genetic association studies. To choose the proper sample size and genotyping platform for such studies, power calculations that take into account genetic model, tag SNP selection, and the population of interest are required. Methods for tag SNP selection based on established multivariate statistical techniques may. Association studies can determine whether a genetic variant is associated with a disease or trait. A comparative study of tag SNP selection using clustering. Functionally informative tag SNPs for disease association studies. This work demonstrates that, while there may be limits given current reference panels, improving GWAS scaffold design is an underused means to increase power in association studies.