Data Annotations in FAVOR
FAVOR provides a comprehensive set of functional annotations and annotation principal components (aPCs) for genomic variants, including clinical significance, gene information, and various functional categories.
Functional annotations and annotation PCs (aPCs)
The functional annotations provided in the FAVOR web portal are as follows:
Detailed descriptions of selected functional annotations and annotation PCs in the FAVOR database. For numeric annotations marked (+), a higher value indicates greater functionality according to that annotation; for numeric annotations marked (-), a lower value indicates greater functionality.
Block Name | Annotation Name | Explanation | Type | Source |
---|---|---|---|---|
Basic | Variant | The unique identifier of the given variant, reported in chr-pos-ref-alt format. | String | |
Basic | rsID | The rsID of the given variant (if one exists). | String | |
Basic | TOPMed Depth | TOPMed depth of the given variant. | String | |
Basic | TOPMed QC Status | TOPMed QC status of the given variant. | String | |
ClinVar | Clinical Significance | Clinical significance for this single variant. [@landrum2013clinvar; @landrum2017clinvar] | String | [Source][Ref1,2] |
ClinVar | Clinical significance (genotype includes) | Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance. [@landrum2013clinvar; @landrum2017clinvar] | String | [Source][Ref1,2] |
ClinVar | Disease Name | ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB. [@landrum2013clinvar; @landrum2017clinvar] | String | [Source][Ref1,2] |
ClinVar | Disease Name (included variant) | For included variant: ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB. [@landrum2013clinvar; @landrum2017clinvar] | String | [Source][Ref1,2] |
ClinVar | Review Status | ClinVar review status for the Variation ID. [@landrum2013clinvar; @landrum2017clinvar] | String | [Source][Ref1,2] |
ClinVar | Allele Origin | Allele origin: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive. [@landrum2013clinvar; @landrum2017clinvar] | String | [Source][Ref1,2] |
ClinVar | Disease Database ID | Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN. [@landrum2013clinvar; @landrum2017clinvar] | String | [Source][Ref1,2] |
ClinVar | Disease Database ID (included variant) | For included variant: tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN. [@landrum2013clinvar; @landrum2017clinvar] | String | [Source][Ref1,2] |
ClinVar | Gene Reported | Gene(s) for the variant, reported as gene symbol:gene ID. The gene symbol and ID are delimited by a colon (:), and each pair is delimited by a vertical bar (\|). [@landrum2013clinvar; @landrum2017clinvar] | String | [Source][Ref1,2] |
Variant Category | Gencode Comprehensive Info | Identifies whether the variant causes protein-coding changes using the GENCODE gene definitions and labels the name of the affected gene; for intergenic variants, the nearby gene name is labeled instead. [@harrow2012gencode; @frankish2018gencode] | String | [Source1,2][Ref1,2] |
Variant Category | Gencode Comprehensive Category | Identifies whether the variant causes protein-coding changes using the GENCODE gene definitions and labels the name of the affected gene; for intergenic variants, the nearby gene name is labeled instead. [@harrow2012gencode; @frankish2018gencode] | String | [Source1,2][Ref1,2] |
Variant Category | Disruptive Missense | Identifies whether the variant is a disruptive missense variant, i.e., one classified as "disruptive" by the ensemble MetaSVM annotation. [@dong2014comparison] | Factor | [Source1,2][Ref] |
Variant Category | CAGE Promoter | CAGE-defined promoter sites from FANTOM5. [@forrest2014promoter] | String | [Source][Ref] |
Variant Category | CAGE Enhancer | CAGE-defined permissive enhancer sites from FANTOM5. [@andersson2014atlas] | String | [Source][Ref] |
Variant Category | GeneHancer | Predicted human enhancer sites from the GeneHancer database. [@fishilevich2017genehancer] | String | [Ref] |
Variant Category | SuperEnhancer | Predicted super-enhancer sites and targets in a range of human cell types. [@hnisz2013super] | String | [Source][Ref] |
Variant Category | Gencode Comprehensive Exonic Category | Identifies the variant's impact using the GENCODE exonic definitions, labeling only exonic categories such as synonymous, nonsynonymous, and frameshift indels. [@harrow2012gencode; @frankish2018gencode] | String | [Source1,2][Ref1,2] |
Variant Category | Gencode Comprehensive Exonic Info | Identifies whether the variant causes protein-coding changes using the GENCODE gene definitions, and gives detailed annotation of which exons the variant affects and how it changes the amino acid sequence. [@harrow2012gencode; @frankish2018gencode] | String | [Source1,2][Ref1,2] |
Variant Category | UCSC Info | Identifies whether the variant causes protein-coding changes using the UCSC gene definitions and labels the name of the affected gene; for intergenic variants, the nearby gene name is labeled instead. | String | [Source] |
Variant Category | UCSC Exonic Info | Identifies whether the variant causes protein-coding changes using the UCSC gene definitions, and gives detailed annotation of which exons the variant affects and how it changes the amino acid sequence. | String | [Source] |
Variant Category | RefSeq Info | Identifies whether the variant causes protein-coding changes using the RefSeq gene definitions and labels the name of the affected gene; for intergenic variants, the nearby gene name is labeled instead. | String | [Source] |
Variant Category | RefSeq Exonic Info | Identifies whether the variant causes protein-coding changes using the RefSeq gene definitions, and gives detailed annotation of which exons the variant affects and how it changes the amino acid sequence. | String | [Source] |
Allele Frequencies | TOPMed Bravo AF | TOPMed Bravo Genome Allele Frequency. [@taliun2019sequencing; @nhlbi2018bravo] | num | [Source][Ref] |
Allele Frequencies | GNOMAD Total AF | GNOMAD v3 Genome Allele Frequency using all the samples. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | AFR GNOMAD AF | GNOMAD v3 Genome African population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | AMR GNOMAD AF | GNOMAD v3 Genome Ad Mixed American population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | EAS GNOMAD AF | GNOMAD v3 Genome East Asian population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | NFE GNOMAD AF | GNOMAD v3 Genome Non-Finnish European population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | FIN GNOMAD AF | GNOMAD v3 Genome Finnish European population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | SAS GNOMAD AF | GNOMAD v3 Genome South Asian population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | AMI GNOMAD AF | GNOMAD v3 Genome Amish population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | ASJ GNOMAD AF | GNOMAD v3 Genome Ashkenazi Jewish population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | OTH GNOMAD AF | GNOMAD v3 Genome Other (population not assigned) frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | Male GNOMAD AF | GNOMAD v3 Genome Male Allele Frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | AFR Male GNOMAD AF | GNOMAD v3 Genome African Male population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | AMI Male GNOMAD AF | GNOMAD v3 Genome Amish Male population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | AMR Male GNOMAD AF | GNOMAD v3 Genome Ad Mixed American Male population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | ASJ Male GNOMAD AF | GNOMAD v3 Genome Ashkenazi Jewish Male population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | EAS Male GNOMAD AF | GNOMAD v3 Genome East Asian Male population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | FIN Male GNOMAD AF | GNOMAD v3 Genome Finnish European Male population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | NFE Male GNOMAD AF | GNOMAD v3 Genome Non-Finnish European Male population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | OTH Male GNOMAD AF | GNOMAD v3 Genome Other (population not assigned) Male frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | SAS Male GNOMAD AF | GNOMAD v3 Genome South Asian Male population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | Female GNOMAD AF | GNOMAD v3 Genome Female Allele Frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | AFR Female GNOMAD AF | GNOMAD v3 Genome African Female population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | AMI Female GNOMAD AF | GNOMAD v3 Genome Amish Female population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | AMR Female GNOMAD AF | GNOMAD v3 Genome Ad Mixed American Female population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | ASJ Female GNOMAD AF | GNOMAD v3 Genome Ashkenazi Jewish Female population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | EAS Female GNOMAD AF | GNOMAD v3 Genome East Asian Female population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | FIN Female GNOMAD AF | GNOMAD v3 Genome Finnish European Female population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | NFE Female GNOMAD AF | GNOMAD v3 Genome Non-Finnish European Female population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | OTH Female GNOMAD AF | GNOMAD v3 Genome Other (population not assigned) Female frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | SAS Female GNOMAD AF | GNOMAD v3 Genome South Asian Female population frequency. [@karczewski2020mutational; @gnomad2019browser] | num | [Source][Ref] |
Allele Frequencies | ALL 1000G AF | 1000 Genomes allele frequency (whole-genome allele frequencies from the 1000 Genomes Project phase 3 data). | num | [Source] |
Allele Frequencies | AFR 1000G AF | 1000 Genomes African population frequency. | num | [Source] |
Allele Frequencies | AMR 1000G AF | 1000 Genomes Ad Mixed American population frequency. | num | [Source] |
Allele Frequencies | EAS 1000G AF | 1000 Genomes East Asian population frequency. | num | [Source] |
Allele Frequencies | EUR 1000G AF | 1000 Genomes European population frequency. | num | [Source] |
Allele Frequencies | SAS 1000G AF | 1000 Genomes South Asian population frequency. | num | [Source] |
Integrative Score | aPC-Protein-Function | Protein function annotation PC: the first PC of the standardized scores of "SIFTval, PolyPhenVal, Grantham, Polyphen2_HDIV_score, Polyphen2_HVAR_score, MutationTaster_score, MutationAssessor_score" in PHRED scale. Range: [2.970, 97.690]. [@li2020dynamic] | num (+) | Individual annotation channels in the FAVOR database. |
Integrative Score | aPC-Conservation | Conservation annotation PC: the first PC of the standardized scores of "GerpN, GerpS, priPhCons, mamPhCons, verPhCons, priPhyloP, mamPhyloP, verPhyloP" in PHRED scale. Range: [1.478E-09, 99.451]. [@li2020dynamic] | num (+) | Individual annotation channels in the FAVOR database. |
Integrative Score | aPC-Epigenetics-Active | Active epigenetic annotation PC: the first PC of the standardized scores of "EncodeH3K4me1.max, EncodeH3K4me2.max, EncodeH3K4me3.max, EncodeH3K9ac.max, EncodeH3K27ac.max, EncodeH4K20me1.max, EncodeH2AFZ.max" in PHRED scale. Range: [0, 99.451]. [@li2020dynamic] | num (+) | Individual annotation channels in the FAVOR database. |
Integrative Score | aPC-Epigenetics-Repressed | Repressed epigenetic annotation PC: the first PC of the standardized scores of "EncodeH3K9me3.max, EncodeH3K27me3.max" in PHRED scale. Range: [0, 99.451]. [@li2020dynamic] | num (+) | Individual annotation channels in the FAVOR database. |
Integrative Score | aPC-Epigenetics-Transcription | Transcription epigenetic annotation PC: the first PC of the standardized scores of "EncodeH3K36me3.max, EncodeH3K79me2.max" in PHRED scale. Range: [0, 99.451]. [@li2020dynamic] | num (+) | Individual annotation channels in the FAVOR database. |
Integrative Score | aPC-Local-Nucleotide-Diversity | Local nucleotide diversity annotation PC: the first PC of the standardized scores of "bStatistic, RecombinationRate, NuclearDiversity" in PHRED scale. Range: [0, 99.451]. [@li2020dynamic] | num | Individual annotation channels in the FAVOR database. |
Integrative Score | aPC-Mutation-Density | Mutation density annotation PC: the first PC of the standardized scores of "Common100bp, Rare100bp, Sngl100bp, Common1000bp, Rare1000bp, Sngl1000bp, Common10000bp, Rare10000bp, Sngl10000bp" in PHRED scale. Range: [0, 99.451]. [@li2020dynamic] | num | Individual annotation channels in the FAVOR database. |
Integrative Score | aPC-Transcription-Factor | Transcription factor annotation PC: the first PC of the standardized scores of "RemapOverlapTF, RemapOverlapCL" in PHRED scale. Range: [1.185, 99.451]. [@li2020dynamic] | num (+) | Individual annotation channels in the FAVOR database. |
Integrative Score | aPC-Mappability | Mappability annotation PC: the first PC of the standardized scores of "umap_k100, bismap_k100, umap_k50, bismap_k50, umap_k36, bismap_k36, umap_k24, bismap_k24" in PHRED scale. Range: [0.185, 99.451]. [@li2020dynamic] | num (+) | Individual annotation channels in the FAVOR database. |
Integrative Score | aPC-Proximity-To-TSS-TES | Proximity to TSS (transcription start site) and TES (transcription end site) annotation PC: the first PC of "minDistTSS, minDistTSE" in PHRED scale. Range: [0, 99.451]. [@li2020dynamic] | num (+) | Individual annotation channels in the FAVOR database. |
Integrative Score | CADD RawScore | The CADD raw score (integrative score). A higher CADD score indicates a more deleterious variant. Range: [-237.102, 22.763]. [@kircher2014general; @rentzsch2018cadd] | num (+) | [Source][Ref1,2] |
Integrative Score | CADD PHRED | The CADD score in PHRED scale (integrative score). A higher CADD score indicates a more deleterious variant. Range: [0, 99]. [@kircher2014general; @rentzsch2018cadd] | num (+) | [Source][Ref1,2] |
Integrative Score | LINSIGHT | The LINSIGHT score (integrative score). A higher LINSIGHT score indicates greater functionality. Range: [0.215, 0.995]. [@huang2017fast] | num (+) | [Source][Ref] |
Integrative Score | FATHMM-XF | The FATHMM-XF score (integrative score). A higher FATHMM-XF score indicates greater functionality. Range: [0.405, 99.451]. [@rogers2017fathmm] | num (+) | [Source][Ref] |
Integrative Score | Funseq Value (impact score) | Impact score from FunSeq2, a flexible framework to prioritize regulatory mutations from cancer genome sequencing (integrative score). [@fu2014funseq2] | num (+) | [Source][Ref] |
Integrative Score | Funseq Description (annotation) | FunSeq2 annotation indicating whether the given mutation falls in a coding or noncoding region (integrative score). [@fu2014funseq2] | String | [Source][Ref] |
Integrative Score | Aloft Value (impact score) | ALoFT provides extensive annotations for putative loss-of-function (LoF) variants in protein-coding genes, including functional, evolutionary, and network features (integrative score). [@balasubramanian2017using] | num (+) | [Source][Ref] |
Integrative Score | Aloft Description (annotation) | ALoFT annotation predicts the impact of premature stop variants and classifies them as dominant disease-causing, recessive disease-causing, or benign variants (integrative score). [@balasubramanian2017using] | String | [Source][Ref] |
Protein Function | PolyPhenCat | PolyPhen category of change. [@adzhubei2010method] | Factor | [Source][Ref] |
Protein Function | PolyPhenVal | PolyPhen score, which predicts the functional significance of an allele replacement from its individual features. Range: [0, 1] (default: 0). [@adzhubei2010method] | num (+) | [Source][Ref] |
Protein Function | Polyphen2_HDIV | Predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. HumDiv is Mendelian disease variants vs. divergence from close mammalian homologs of human proteins (>=95% sequence identity). Range: [0, 1] (default: 0). [@adzhubei2010method] | num (+) | [Source1,2,3][Ref] |
Protein Function | Polyphen2_HVAR | Predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. HumVar is all human variants associated with some disease (except cancer mutations) or loss of activity/function vs. common (minor allele frequency >1%) human polymorphisms with no reported association with a disease or other effect. Range: [0, 1] (default: 0). [@adzhubei2010method] | num (+) | [Source1,2,3][Ref] |
Protein Function | Grantham | Grantham score between the reference (oAA) and alternative (nAA) amino acids. It estimates the distance between two amino acids in an evolutionary sense: a lower Grantham score reflects less evolutionary distance, while a higher score reflects greater evolutionary distance and is considered more deleterious. Range: [0, 215] (default: 0). [@grantham1974amino] | num (+) | [Source1,2][Ref] |
Protein Function | MutationTaster | MutationTaster is a free web-based application to evaluate DNA sequence variants for their disease-causing potential. The software performs a battery of in silico tests to estimate the impact of the variant on the gene product/protein. Range: [0, 1] (default: 0). [@schwarz2014mutationtaster2] | num (+) | [Source1,2,3][Ref] |
Protein Function | MutationAssessor | Predicts the functional impact of amino-acid substitutions in proteins, such as mutations discovered in cancer or missense polymorphisms. Range: [-5.135, 6.490] (default: -5.545). [@reva2011predicting] | num (+) | [Source1,2,3][Ref] |
Protein Function | SIFTcat | SIFT category of change. [@ng2003sift] | Factor | [Source][Ref] |
Protein Function | SIFTval | SIFT score, ranging from 0.0 (deleterious) to 1.0 (tolerated). Range: [0, 1] (default: 1). [@ng2003sift] | num (-) | [Source][Ref] |
Conservation | priPhCons | Primate phastCons conservation score (excl. human). A higher score means the region is more conserved. PhastCons considers n species rather than two and accounts for the phylogeny by which these species are related. Instead of measuring similarity/divergence simply in terms of percent identity, it uses statistical models of nucleotide substitution that allow for multiple substitutions per site and for unequal rates of substitution between different pairs of bases. Range: [0, 0.999] (default: 0.0). [@siepel2005evolutionarily] | num (+) | [Source][Ref] |
Conservation | mamPhCons | Mammalian phastCons conservation score (excl. human). A higher score means the region is more conserved. PhastCons considers n species rather than two and accounts for the phylogeny by which these species are related. Instead of measuring similarity/divergence simply in terms of percent identity, it uses statistical models of nucleotide substitution that allow for multiple substitutions per site and for unequal rates of substitution between different pairs of bases. Range: [0, 1] (default: 0.0). [@siepel2005evolutionarily] | num (+) | [Source][Ref] |
Conservation | verPhCons | Vertebrate phastCons conservation score (excl. human). A higher score means the region is more conserved. PhastCons considers n species rather than two and accounts for the phylogeny by which these species are related. Instead of measuring similarity/divergence simply in terms of percent identity, it uses statistical models of nucleotide substitution that allow for multiple substitutions per site and for unequal rates of substitution between different pairs of bases. Range: [0, 1] (default: 0.0). [@siepel2005evolutionarily] | num (+) | [Source][Ref] |
Conservation | priPhyloP | Primate phyloP score (excl. human). A higher score means the region is more conserved. PhyloP scores measure evolutionary conservation at individual alignment sites by comparing the observed evolution with that expected under neutral drift. Positive scores indicate conservation, i.e., slower evolution than expected; negative scores indicate acceleration, i.e., faster evolution than expected. Range: [-10.761, 0.595] (default: -0.029). [@pollard2010detection] | num (+) | [Source][Ref] |
Conservation | mamPhyloP | Mammalian phyloP score (excl. human). A higher score means the region is more conserved. PhyloP scores measure evolutionary conservation at individual alignment sites by comparing the observed evolution with that expected under neutral drift. Positive scores indicate conservation, i.e., slower evolution than expected; negative scores indicate acceleration, i.e., faster evolution than expected. Range: [-20, 4.494] (default: -0.005). [@pollard2010detection] | num (+) | [Source][Ref] |
Conservation | verPhyloP | Vertebrate phyloP score (excl. human). A higher score means the region is more conserved. PhyloP scores measure evolutionary conservation at individual alignment sites by comparing the observed evolution with that expected under neutral drift. Positive scores indicate conservation, i.e., slower evolution than expected; negative scores indicate acceleration, i.e., faster evolution than expected. Range: [-20, 11.295] (default: 0.042). [@pollard2010detection] | num (+) | [Source][Ref] |
Conservation | GerpN | Neutral evolution score defined by GERP++. A higher score means the region is more conserved. Range: [0, 19.8] (default: 3.0). [@davydov2010identifying] | num (+) | [Source][Ref] |
Conservation | GerpS | Rejected Substitution score defined by GERP++. A higher score means the region is more conserved. GERP (Genomic Evolutionary Rate Profiling) identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint. These deficits are referred to as "Rejected Substitutions". Rejected substitutions are a natural measure of constraint that reflects the strength of past purifying selection on the element. GERP estimates constraint for each alignment column; elements are identified as excess aggregations of constrained columns. Positive scores (fewer substitutions than expected) indicate that a site is under evolutionary constraint. Negative scores may be weak evidence of accelerated rates of evolution. Range: [-39.5, 19.8] (default: -0.2). [@davydov2010identifying] | num (+) | [Source][Ref] |
Epigenetics | EncodeDNase | Maximum Encode DNase-seq level over 12 cell lines. Range: [0, 118672] (default: 0.0). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH3K27ac | Maximum Encode H3K27ac level over 14 cell lines. Range: [0.010, 1442.690] (default: 0.36). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH3K4me1 | Maximum Encode H3K4me1 level over 13 cell lines. Range: [0.010, 227.81] (default: 0.37). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH3K4me2 | Maximum Encode H3K4me2 level over 14 cell lines. Range: [0.010, 774.99] (default: 0.37). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH3K4me3 | Maximum Encode H3K4me3 level over 14 cell lines. Range: [0.010, 1093.75] (default: 0.38). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH3K9ac | Maximum Encode H3K9ac level over 13 cell lines. Range: [0.010, 1340.42] (default: 0.41). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH4K20me1 | Maximum Encode H4K20me1 level over 11 cell lines. Range: [0.010, 226.64] (default: 0.47). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH2AFZ | Maximum Encode H2AFZ level over 13 cell lines. Range: [0.020, 468.98] (default: 0.42). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH3K9me3 | Maximum Encode H3K9me3 level over 14 cell lines. Range: [0.010, 226.64] (default: 0.38). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH3K27me3 | Maximum Encode H3K27me3 level over 14 cell lines. Range: [0.010, 193.38] (default: 0.47). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH3K36me3 | Maximum Encode H3K36me3 level over 10 cell lines. Range: [0.020, 246.88] (default: 0.39). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodeH3K79me2 | Maximum Encode H3K79me2 level over 13 cell lines. Range: [0.020, 553.06] (default: 0.34). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | EncodetotalRNA | Maximum Encode totalRNA-seq level over 10 cell lines (minus and plus strand separately). Range: [0, 385096] (default: 0.0). [@encode2012integrated] | num (+) | [Source][Ref] |
Epigenetics | GC | Percent GC in a window of +/- 75 bp, expressed as a proportion. Range: [0, 1] (default: 0.42). | num (+) | [Source] |
Epigenetics | CpG | Percent CpG in a window of +/- 75 bp, expressed as a proportion. Range: [0, 0.604] (default: 0.02). | num (+) | [Source] |
Transcription Factors | RemapOverlapTF | Number of different transcription factors binding, according to ReMap. Range: [1, 350] (default: -0.5). | int (+) | [Source] |
Transcription Factors | RemapOverlapCL | Number of different transcription factor-cell line combinations binding, according to ReMap. Range: [1, 1068] (default: -0.5). | int (+) | [Source] |
Chromatin States | cHmm E1 | Number of the 48 cell types in chromHMM state E1_poised. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E2 | Number of the 48 cell types in chromHMM state E2_repressed. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E3 | Number of the 48 cell types in chromHMM state E3_dead. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E4 | Number of the 48 cell types in chromHMM state E4_dead. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E5 | Number of the 48 cell types in chromHMM state E5_repressed. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E6 | Number of the 48 cell types in chromHMM state E6_repressed. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E7 | Number of the 48 cell types in chromHMM state E7_weak. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E8 | Number of the 48 cell types in chromHMM state E8_gene. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E9 | Number of the 48 cell types in chromHMM state E9_gene. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E10 | Number of the 48 cell types in chromHMM state E10_gene. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E11 | Number of the 48 cell types in chromHMM state E11_gene. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E12 | Number of the 48 cell types in chromHMM state E12_distal. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E13 | Number of the 48 cell types in chromHMM state E13_distal. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E14 | Number of the 48 cell types in chromHMM state E14_distal. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E15 | Number of the 48 cell types in chromHMM state E15_weak. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E16 | Number of the 48 cell types in chromHMM state E16_tss. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E17 | Number of the 48 cell types in chromHMM state E17_proximal. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E18 | Number of the 48 cell types in chromHMM state E18_proximal. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E19 | Number of the 48 cell types in chromHMM state E19_tss. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E20 | Number of the 48 cell types in chromHMM state E20_poised. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E21 | Number of the 48 cell types in chromHMM state E21_dead. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E22 | Number of the 48 cell types in chromHMM state E22_repressed. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E23 | Number of the 48 cell types in chromHMM state E23_weak. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E24 | Number of the 48 cell types in chromHMM state E24_distal. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Chromatin States | cHmm E25 | Number of the 48 cell types in chromHMM state E25_distal. (default: 1.92). [@ernst2015large] | num | [Source][Ref] |
Local Nucleotide Diversity | RecombinationRate | Recombination rate measures the probability of how likely the region tends to undergo recombination. Range: [0, 54.96] (default: 0). [@gazal2017linkage] | num (+) | [Ref] |
Local Nucleotide Diversity | NuclearDiversity | Nuclear diversity measures the probability of how likely the region diversify. Range: [0.05, 60.25] (default: 0). [@gazal2017linkage] | num (+) | [Ref] |
Local Nucleotide Diversity | bStatistic | Background selection score. A background selection (B) value for each position in the genome. B indicates the expected fraction of neutral diversity that is present at a site, with values close to 0 representing near complete removal of diversity as a result of selection and values near 1000 indicating little effect of selection. Range: [0, 1000] (default: 800). [@mcvicker2009widespread] | int (+) | [Source][Ref] |
Mutation Density | Common100bp | Number of common (MAF > 0.05) BRAVO SNVs in the nearby 100 bp window. A higher value indicates more variation in the region and a higher likelihood of mutation. Possible values range from 0 to 100; observed range: [0, 14] (default: 0). | int (+) | [Source] |
Mutation Density | Rare100bp | Number of rare (MAF < 0.05) BRAVO SNVs in the nearby 100 bp window. A higher value indicates more variation in the region and a higher likelihood of mutation. Possible values range from 0 to 100; observed range: [0, 31] (default: 0). | int (+) | [Source] |
Mutation Density | Sngl100bp | Number of BRAVO SNVs observed only once (singletons) in the nearby 100 bp window. A higher value indicates more variation in the region and a higher likelihood of mutation. Possible values range from 0 to 100; observed range: [0, 99] (default: 0). | int (+) | [Source] |
Mutation Density | Common1000bp | Number of common (MAF > 0.05) BRAVO SNVs in the nearby 1000 bp window. A higher value indicates more variation in the region and a higher likelihood of mutation. Possible values range from 0 to 1000; observed range: [0, 73] (default: 0). | int (+) | [Source] |
Mutation Density | Rare1000bp | Number of rare (MAF < 0.05) BRAVO SNVs in the nearby 1000 bp window. A higher value indicates more variation in the region and a higher likelihood of mutation. Possible values range from 0 to 1000; observed range: [0, 74] (default: 0). | int (+) | [Source] |
Mutation Density | Sngl1000bp | Number of BRAVO SNVs observed only once (singletons) in the nearby 1000 bp window. A higher value indicates more variation in the region and a higher likelihood of mutation. Possible values range from 0 to 1000; observed range: [0, 658] (default: 0). | int (+) | [Source] |
Mutation Density | Common10000bp | Number of common (MAF > 0.05) BRAVO SNVs in the nearby 10000 bp window. A higher value indicates more variation in the region and a higher likelihood of mutation. Possible values range from 0 to 10000; observed range: [0, 443] (default: 0). | int (+) | [Source] |
Mutation Density | Rare10000bp | Number of rare (MAF < 0.05) BRAVO SNVs in the nearby 10000 bp window. A higher value indicates more variation in the region and a higher likelihood of mutation. Possible values range from 0 to 10000; observed range: [0, 355] (default: 0). | int (+) | [Source] |
Mutation Density | Sngl10000bp | Number of BRAVO SNVs observed only once (singletons) in the nearby 10000 bp window. A higher value indicates more variation in the region and a higher likelihood of mutation. Possible values range from 0 to 10000; observed range: [0, 4750] (default: 0). | int (+) | [Source] |
Mappability | Umap (k100, k50, k36, k24) | Mappability of the unconverted genome: the extent to which a position can be uniquely mapped by sequence reads. Lower mappability means that estimates of genomic and epigenomic characteristics from sequencing assays are less reliable, and that the region is more susceptible to spurious mapping of reads from other regions of the genome that carry sequencing errors or unexpected genetic variation. Range: [0, 1] (default: 0). [@karimzadeh2018umap] | num (+) | [Source][Ref] |
Mappability | Bismap (k100, k50, k36, k24) | Mappability of the bisulfite-converted genome. Bisulfite sequencing approaches used to identify DNA methylation produce large numbers of reads that map to multiple regions; this annotation measures mappability on the bisulfite-converted genome. Range: [0, 1] (default: 0). [@karimzadeh2018umap] | num (+) | [Source][Ref] |
Proximity Table | minDistTSS | Distance (in bp) to the closest Transcribed Sequence Start (TSS). Range: [1, 3604063] (default: 1e7). | num (-) | [Source] |
Proximity Table | minDistTSE | Distance (in bp) to the closest Transcribed Sequence End (TSE). Range: [1, 3608885] (default: 1e7). | num (-) | [Source] |
Alphamissense | protein_variant | Amino acid change induced by the alternative allele, written as the reference amino acid, then POS_aa, then the alternative amino acid (e.g., V2L). POS_aa is the 1-based position of the residue within the protein amino acid sequence. | String | [Source] |
Alphamissense | AM_pathogenicity | Calibrated AlphaMissense pathogenicity score (between 0 and 1), which can be interpreted as the predicted probability of a variant being clinically pathogenic. | String | [Source] |
Alphamissense | AM_class | Classification of the protein_variant into one of three discrete categories: 'likely_benign', 'likely_pathogenic', or 'ambiguous'. These are derived using the following thresholds: 'likely_benign' if alphamissense_pathogenicity < 0.34; 'likely_pathogenic' if alphamissense_pathogenicity > 0.564; and 'ambiguous' otherwise. | String | [Source] |
Mutation Rate | filter | Low: low-quality region as determined by gnomAD sequencing metrics (mappability < 0.5; overlap with a > 50 nt simple repeat; ReadPosRankSum > 1; 0 SNVs in the 100 bp window). SFS_bump: pentamer context with an abnormal site frequency spectrum (SFS); the fraction of high-frequency SNVs (frequency in [0.0005, 0.2]) is greater than 1.5x the mutation-rate-controlled average. These tend to be repetitive contexts. TFBS: transcription factor binding site, as determined by overlap with ChIP-seq peaks. | String | [Source] |
Mutation Rate | PN | Pentanucleotide context | num (+) | [Source] |
Mutation Rate | MR | Roulette mutation rate estimate | num (+) | [Source] |
Mutation Rate | MG | gnomAD mutation rate estimate (Karczewski et al. 2020) | num (+) | [Source] |
Mutation Rate | MC | Carlson mutation rate estimate (Carlson et al. 2018) | num (+) | [Source] |
cCREs | accession | Accession number of the cCRE | String | [Source] |
cCREs | annotation | Promoter-like (PLS) | String | [Source] |
cCREs | annotation | All Candidate Enhancers (pELS & dELS) | String | [Source] |
cCREs | annotation | Proximal enhancer-like (pELS) | String | [Source] |
cCREs | annotation | Distal enhancer-like (dELS) | String | [Source] |
cCREs | annotation | Chromatin Accessible with CTCF (CA-CTCF) | String | [Source] |
cCREs | annotation | Chromatin Accessible with H3K4me3 (CA-H3K4me3) | String | [Source] |
cCREs | annotation | Chromatin Accessible with TF (CA-TF) | String | [Source] |
cCREs | annotation | Chromatin Accessible Only (CA) | String | [Source] |
cCREs | annotation | TF Only (TF) | String | [Source] |
cCREs | annotation | CTCF-Bound cCREs | String | [Source] |
CATlas | Signal_Value | Activity signal strength measured in the tissue | num | [Source] |
CATlas | P_value | P-value of the signal significance | num | [Source] |
CATlas | Q_value | Q-value (FDR adjusted P-value) | num | [Source] |
CATlas | Peak | Peak ID or rank associated with signal | num | [Source] |
CATlas | Tissue | Tissue type in which the signal or linkage is observed | String | [Source] |
CATlas | cCREs_Region | Linked candidate cis-regulatory element (cCRE) region | String | [Source] |
CATlas | Promoter_Region | Promoter region linked to the cCRE via the Activity-by-Contact (ABC) model | String | [Source] |
CATlas | ABC_Score | ABC score estimating enhancer–promoter interaction strength | num | [Source] |
CATlas | Linked Gene | Gene name linked to the cCRE region via promoter | String | [Source] |
CATlas | Distance | Genomic distance between cCRE and linked promoter | num | [Source] |
EpiMap | BSSID | Unique biosample state identifier | String | [Source] |
EpiMap | State | Full chromatin state name (e.g., EnhA1, TssA), describing regulatory role | String | [Source] |
EpiMap | Group | Broad category grouping the sample (e.g., cancer, normal) | String | [Source] |
EpiMap | Extended_Info | Extended tissue/cell line description (e.g., CANCER PROSTATE) | String | [Source] |
EpiMap | Sample_Name | Specific sample name with treatment condition (e.g., A549, 22Rv1 treated with 10 nM 17b-hydroxy) | String | [Source] |
pgBoost | Linked Gene | Gene symbol linked to the variant | String | [Source] |
pgBoost | pg_boost | Probabilistic score of the SNP-gene link from pgBoost, a gradient boosting model trained on fine-mapping data that combines single-cell multiome linking scores (SCENT, Signac, Cicero) with distance | num (+) | [Source] |
pgBoost | pg_boost_percentile | Percentile ranking of the pgBoost score across all SNP-gene pairs | num (+) | [Source] |
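The AM_class thresholds and the protein_variant format described in the Alphamissense rows can be expressed as a short Python sketch. This is a minimal illustration, not FAVOR's or AlphaMissense's own code. The function names `am_class` and `parse_protein_variant` are hypothetical. The sketch assumes the caller converts AM_pathogenicity to a float first, because the table lists that column as String. It also assumes protein_variant always uses single-letter amino acid codes.

```python
import re
from typing import Tuple

# Thresholds taken from the AM_class description (hypothetical constant names).
LIKELY_BENIGN_MAX = 0.34       # scores strictly below this are 'likely_benign'
LIKELY_PATHOGENIC_MIN = 0.564  # scores strictly above this are 'likely_pathogenic'


def am_class(score: float) -> str:
    """Map a calibrated AlphaMissense pathogenicity score in [0, 1] to an AM_class label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < LIKELY_BENIGN_MAX:
        return "likely_benign"
    if score > LIKELY_PATHOGENIC_MIN:
        return "likely_pathogenic"
    # Scores from 0.34 to 0.564, both boundaries included, fall in between.
    return "ambiguous"


# protein_variant: reference residue, 1-based POS_aa, alternative residue (e.g. V2L).
# Assumes single-letter amino acid codes.
PROTEIN_VARIANT = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_protein_variant(text: str) -> Tuple[str, int, str]:
    """Split a protein_variant string into (reference aa, 1-based position, alternative aa)."""
    match = PROTEIN_VARIANT.match(text)
    if match is None:
        raise ValueError(f"unexpected protein_variant: {text!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


if __name__ == "__main__":
    print(am_class(float("0.91")))       # likely_pathogenic
    print(parse_protein_variant("V2L"))  # ('V', 2, 'L')
```

Both boundary values, 0.34 and 0.564, map to 'ambiguous'. This follows from the strict inequalities in the AM_class definition.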