Variants

On this page, we'll dive into the /v1/variants endpoint you can use to fetch variants programmatically. We'll look at how to retrieve a single variant, look up variants by rsID, and retrieve multiple variants in one request.

GET /v1/variants

Retrieve a variant

This endpoint allows you to retrieve a single variant. The variant_vcf must be specified in chromosome-position-ref-alt format, e.g. 1-1000-A-T. Refer to the list at the bottom of this page to see which properties are included with variant objects.

Path parameters

  • Name
    variant_vcf
    Type
    string
    Description

    The variant in chromosome-position-ref-alt format, e.g. 1-1000-A-T.

Request

GET
/v1/variants
curl -G https://api.genohub.org/v1/variants/19-44908822-C-T
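
The same request can be sketched in Python. This is a minimal illustration, not an official client: the helper names `variant_url` and `fetch_variant` are hypothetical, and the validation pattern is an assumption based only on the chromosome-position-ref-alt format described above.

```python
import json
import re
import urllib.request

# Host and path are taken from the curl example above.
BASE_URL = "https://api.genohub.org/v1"

# Assumed shape of variant_vcf: chromosome, numeric position, ref and alt alleles.
# The API may accept other forms; this pattern is only an illustration.
VARIANT_RE = re.compile(r"^[0-9A-Za-z]+-\d+-[ACGT]+-[ACGT]+$")


def variant_url(chromosome, position, ref, alt):
    """Build the request URL for a single variant (hypothetical helper)."""
    variant_vcf = f"{chromosome}-{position}-{ref}-{alt}"
    if not VARIANT_RE.match(variant_vcf):
        raise ValueError(f"not in chromosome-position-ref-alt format: {variant_vcf}")
    return f"{BASE_URL}/variants/{variant_vcf}"


def fetch_variant(chromosome, position, ref, alt):
    """Send the GET request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(variant_url(chromosome, position, ref, alt)) as resp:
        return json.load(resp)
```

Calling `fetch_variant("19", 44908822, "C", "T")` would request the same URL as the curl command above.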

Response

{
    "variant_vcf": "19-44908822-C-T",
    "chromosome": "19",
    "position": "44908822",
    "bravo_an": 264690,
    "bravo_ac": 20678,
    "bravo_af": 0.0781216,
    "filter_status": "PASS",
    "rsid": "rs7412",
    "genecode_comprehensive_category": "exonic",
    "genecode_comprehensive_info": "APOE",
    "genecode_comprehensive_exonic_category": "nonsynonymous SNV",
    "genecode_comprehensive_exonic_info":
    "APOE:ENST00000446996.5:exon4:c.C526T:p.R176C,APOE:ENST00000434152.5:exon4:c.C604T:p.R202C,APOE:ENST00000252486.9:exon4:c.C526T:p.R176C,APOE:ENST00000425718.1:exon3:c.C526T:p.R176C,",
    "ucsc_info": "ENST00000252486.8,ENST00000425718.1,ENST00000434152.5,ENST00000446996.5",
    "ucsc_exonic_info":
    "ENST00000446996.5:ENST00000446996.5:exon4:c.C526T:p.R176C,ENST00000434152.5:ENST00000434152.5:exon4:c.C604T:p.R202C,ENST00000425718.1:ENST00000425718.1:exon3:c.C526T:p.R176C,ENST00000252486.8:ENST00000252486.8:exon4:c.C526T:p.R176C,",
    "polyphen2_hdiv_score": 1,
    // ...
}
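
The exonic info fields in the response above pack several transcript annotations into one comma-separated string. The sketch below splits such a string into records. The five-part layout (gene, transcript, exon, cDNA change, protein change) and the field names are assumptions inferred from the sample response only, and `parse_exonic_info` is a hypothetical helper.

```python
from typing import List, NamedTuple


class ExonicAnnotation(NamedTuple):
    # Field names are illustrative; the layout is inferred from the sample response.
    gene: str
    transcript: str
    exon: str
    cdna_change: str
    protein_change: str


def parse_exonic_info(value: str) -> List[ExonicAnnotation]:
    """Split a comma-separated exonic info string into one record per transcript."""
    annotations = []
    for entry in value.split(","):
        if not entry:
            # The sample string ends with a trailing comma, which leaves an empty piece.
            continue
        parts = entry.split(":")
        if len(parts) != 5:
            # Assumption: skip entries that do not follow the five-part layout.
            continue
        annotations.append(ExonicAnnotation(*parts))
    return annotations
```

For the sample value of genecode_comprehensive_exonic_info, this yields four records, one per transcript.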

GET /v1/rsids

Retrieve a variant using rsID

This endpoint allows you to retrieve variants using rsID. The rsid must be specified in rs format, e.g. rs7412. Refer to the list at the bottom of this page to see which properties are included with variant objects.

Path parameters

  • Name
    rsid
    Type
    string
    Description

    The rsID, e.g. rs7412.

Request

GET
/v1/rsids
curl -G https://api.genohub.org/v1/rsids/rs7412

Response

[{
    "variant_vcf": "19-44908822-C-T",
    "chromosome": "19",
    "position": "44908822",
    "bravo_an": 264690,
    "bravo_ac": 20678,
    "bravo_af": 0.0781216,
    "filter_status": "PASS",
    "rsid": "rs7412",
    "genecode_comprehensive_category": "exonic",
    "genecode_comprehensive_info": "APOE",
    "genecode_comprehensive_exonic_category": "nonsynonymous SNV",
    // ...
}]
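
Unlike the single-variant endpoint, the sample response above is a JSON array. The following sketch checks an rsID before sending it and collects the variant identifiers from a response body. The pattern `rs` followed by digits is an assumption based on the rs7412 example, and both helper names are hypothetical.

```python
import json
import re
from typing import List

# Assumed rsID shape: the literal prefix "rs" followed by digits, as in rs7412.
RSID_RE = re.compile(r"^rs\d+$")


def is_valid_rsid(rsid: str) -> bool:
    """Return True when the string looks like an rsID."""
    return bool(RSID_RE.match(rsid))


def variant_ids_for_rsid(response_body: str) -> List[str]:
    """Extract variant_vcf from every object in the array returned by /v1/rsids."""
    records = json.loads(response_body)
    return [record["variant_vcf"] for record in records]
```

Iterating over the array rather than indexing the first element keeps the code correct if the endpoint returns more than one variant for an rsID.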

POST /v1/variants

Retrieve multiple variants

This endpoint allows you to retrieve multiple variants.

Request body

  • Name
    email
    Type
    string
    Description

    Your email address.

  • Name
    organization
    Type
    string
    Description

    Your organization.

  • Name
    file-upload
    Type
    text
    Description

    A file containing a list of variants in chromosome-position-ref-alt format, e.g. 1-1000-A-T. The file must be in .txt or .txt.gz format and each variant must be on a separate line.

    variants.txt

    1-1000-A-T
    1-1001-A-T
    1-1002-A-T
    
  • Name
    file-upload-type
    Type
    text
    Description

    The content type of the file. It can be text/plain, application/gzip, or application/x-gzip.

  • Name
    coordinate-system
    Type
    string
    Description

    The coordinate system of the variant. It can be either 1-base or 0-base.

  • Name
    left-normalization
    Type
    boolean
    Description

    Whether to left-normalize the variant. It can be either true or false.

Request

POST
/v1/variants
curl --location 'https://api.genohub.org/v1/variants' \
--form 'email="your_email"' \
--form 'organization="your_organization"' \
--form 'file-upload=@"filename.txt"' \
--form 'file-upload-type="text/plain"' \
--form 'coordinate-system="1-base"' \
--form 'left-normalization="false"'
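
Preparing the upload can be sketched in Python as follows. The helper names are hypothetical, the form field names come from the request body list above, and the defaults ("1-base", no left-normalization) simply mirror the curl example rather than any documented server default.

```python
import gzip
from pathlib import Path


def write_variant_file(variants, path):
    """Write one variant per line and return the matching content type."""
    path = Path(path)
    text = "".join(f"{variant}\n" for variant in variants)
    if path.name.endswith(".gz"):
        # .txt.gz uploads are gzip-compressed text.
        with gzip.open(path, "wt", encoding="utf-8") as handle:
            handle.write(text)
        return "application/gzip"
    path.write_text(text, encoding="utf-8")
    return "text/plain"


def form_fields(email, organization, content_type,
                coordinate_system="1-base", left_normalization=False):
    """Collect the non-file form fields (defaults mirror the curl example)."""
    if coordinate_system not in ("1-base", "0-base"):
        raise ValueError("coordinate-system must be 1-base or 0-base")
    return {
        "email": email,
        "organization": organization,
        "file-upload-type": content_type,
        "coordinate-system": coordinate_system,
        "left-normalization": "true" if left_normalization else "false",
    }
```

The returned dictionary, together with the written file as file-upload, can then be passed to any HTTP client that supports multipart form uploads.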

The variant model

The variant model contains all the information about a variant, such as functional scores and GENCODE comprehensive annotations.
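
Many properties are typed null.Int or null.Float. The sketch below assumes that such fields arrive as either a JSON number or null, and maps a small subset of the model to optional Python fields. `VariantSummary` and its methods are hypothetical names, not part of the API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantSummary:
    """A small, illustrative subset of the variant model."""
    variant_vcf: str
    chromosome: str
    position: str               # documented as a string, not an integer
    bravo_an: Optional[int]     # null.Int: assumed to be a number or null
    bravo_ac: Optional[int]
    bravo_af: Optional[float]   # null.Float: assumed to be a number or null

    @classmethod
    def from_record(cls, record: dict) -> "VariantSummary":
        """Build a summary from a decoded JSON object; missing fields become None."""
        return cls(
            variant_vcf=record["variant_vcf"],
            chromosome=record["chromosome"],
            position=record["position"],
            bravo_an=record.get("bravo_an"),
            bravo_ac=record.get("bravo_ac"),
            bravo_af=record.get("bravo_af"),
        )

    def computed_af(self) -> Optional[float]:
        """Recompute allele frequency as allele count divided by allele number."""
        if not self.bravo_an or self.bravo_ac is None:
            return None
        return self.bravo_ac / self.bravo_an
```

For the sample response on this page, the recomputed frequency agrees with the reported bravo_af to within rounding.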

Properties

  • Name
    variant_vcf
    Type
    string
    Description

    The unique identifier of the given variant. Reported in chr-pos-ref-alt format.

  • Name
    chromosome
    Type
    string
    Description

    The chromosome where the variant is located.

  • Name
    position
    Type
    string
    Description

    The position where the variant is located.

  • Name
    bravo_an
    Type
    null.Int
    Description

    TOPMed Bravo Genome Allele Number. (NHLBI TOPMed Consortium, 2018; Taliun et al., 2019)

  • Name
    bravo_ac
    Type
    null.Int
    Description

    TOPMed Bravo Genome Allele Count.

  • Name
    bravo_af
    Type
    null.Float
    Description

    TOPMed Bravo Genome Allele Frequency. (NHLBI TOPMed Consortium, 2018; Taliun et al., 2019)

  • Name
    filter_status
    Type
    string
    Description

    TOPMed QC status of the given variant.

  • Name
    rsid
    Type
    string
    Description

    The rsID of the given variant, if one exists.

  • Name
    genecode_comprehensive_category
    Type
    string
    Description

    Identifies whether the variant causes protein-coding changes under the GENCODE gene definitions. The annotation names the gene the variant affects; for a variant in an intergenic region, it names the nearby genes instead.

  • Name
    genecode_comprehensive_info
    Type
    string
    Description

    Identifies whether the variant causes protein-coding changes under the GENCODE gene definitions. The annotation names the gene the variant affects; for a variant in an intergenic region, it names the nearby genes instead.

  • Name
    genecode_comprehensive_exonic_category
    Type
    string
    Description

    Identifies the variant's impact under the GENCODE exonic definition and labels only exonic categories, such as synonymous, non-synonymous, and frameshift indels.

  • Name
    genecode_comprehensive_exonic_info
    Type
    string
    Description

    Identifies the variant's impact under the GENCODE exonic definition and labels only exonic categories, such as synonymous, non-synonymous, and frameshift indels.

  • Name
    ucsc_info
    Type
    string
    Description

    Identifies whether the variant causes protein-coding changes under the UCSC gene definitions. The annotation names the gene the variant affects; for a variant in an intergenic region, it names the nearby genes instead.

  • Name
    ucsc_exonic_info
    Type
    string
    Description

    Identifies whether the variant causes protein-coding changes under the UCSC gene definitions, with detailed annotation of which exons the variant affects and how it changes the amino acid sequence.

  • Name
    polyphen2_hdiv_score
    Type
    null.Float
    Description

    Predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. HumDiv is Mendelian disease variants vs. divergence from close mammalian homologs of human proteins (>=95% sequence identity). Range: [0, 1] (default: 0).

  • Name
    polyphen2_hvar_score
    Type
    null.Float
    Description

    Predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. HumVar is all human variants associated with some disease (except cancer mutations) or loss of activity/function vs. common (minor allele frequency >1%) human polymorphism with no reported association with a disease or other effect. Range: [0, 1] (default: 0).

  • Name
    mutation_taster_score
    Type
    null.Float
    Description

    MutationTaster is a free web-based application to evaluate DNA sequence variants for their disease-causing potential. The software performs a battery of in silico tests to estimate the impact of the variant on the gene product/protein. Range: [0, 1] (default: 0).

  • Name
    mutation_assessor_score
    Type
    null.Float
    Description

    Predicts the functional impact of amino-acid substitutions in proteins, such as mutations discovered in cancer or missense polymorphisms. Range: [-5.135, 6.490] (default: -5.545).

  • Name
    metasvm_pred
    Type
    string
    Description

    Categorical prediction from MetaSVM, an ensemble score for the deleteriousness of missense variants.

  • Name
    refseq_info
    Type
    string
    Description

    Identifies whether the variant causes protein-coding changes under the RefSeq gene definitions. The annotation names the gene the variant affects; for a variant in an intergenic region, it names the nearby genes instead.

  • Name
    refseq_exonic_info
    Type
    string
    Description

    Identifies whether the variant causes protein-coding changes under the RefSeq gene definitions, with detailed annotation of which exons the variant affects and how it changes the amino acid sequence.

  • Name
    cage_enhancer
    Type
    string
    Description

    CAGE defined permissive Enhancer sites from Fantom 5.

  • Name
    cage_promoter
    Type
    string
    Description

    CAGE defined promoter sites from Fantom 5.

  • Name
    genehancer
    Type
    string
    Description

    Predicted human enhancer sites from the GeneHancer database.

  • Name
    super_enhancer
    Type
    string
    Description

    Predicted super-enhancer sites and targets in a range of human cell types.

  • Name
    clnsig
    Type
    string
    Description

    Clinical significance for this single variant. (Landrum et al., 2017, 2013)

  • Name
    clnsigincl
    Type
    string
    Description

    Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance. (Landrum et al., 2017, 2013)

  • Name
    clndn
    Type
    string
    Description

    Clinical disease name

  • Name
    clndnincl
    Type
    string
    Description

    Clinical disease name for a haplotype or genotype that includes this variant.

  • Name
    clnrevstat
    Type
    string
    Description

    ClinVar review status for the Variation ID.

  • Name
    origin
    Type
    string
    Description

    Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive.

  • Name
    clndisdb
    Type
    string
    Description

    Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN.

  • Name
    clndisdbincl
    Type
    string
    Description

    For included variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN.

  • Name
    geneinfo
    Type
    string
    Description

    Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|).

  • Name
    linsight
    Type
    null.Float
    Description

    The LINSIGHT score (integrative score). A higher LINSIGHT score indicates more functionality. Range: [0.215, 0.995].

  • Name
    fathmm_xf
    Type
    null.Float
    Description

    The FATHMM-XF score (integrative score). A higher FATHMM-XF score indicates more functionality. Range: [0.405, 99.451].

  • Name
    gc
    Type
    null.Float
    Description

    Percent GC in a window of +/- 75bp. Range: [0, 1] (default: 0.42).

  • Name
    cpg
    Type
    null.Float
    Description

    Percent CpG in a window of +/- 75bp. Range: [0, 0.6] (default: 0.02).

  • Name
    min_dist_tss
    Type
    null.Int
    Description

    Distance to closest Transcribed Sequence Start (TSS). Range: [1, 3604058] (default: 1e7).

  • Name
    min_dist_tse
    Type
    null.Int
    Description

    Distance to closest Transcribed Sequence End (TSE). Range: [1, 3610636] (default: 1e7).

  • Name
    sift_cat
    Type
    string
    Description

    SIFT category of change.

  • Name
    sift_val
    Type
    null.Float
    Description

    SIFT score, ranges from 0.0 (deleterious) to 1.0 (tolerated). Range: [0, 1] (default: 1).

  • Name
    polyphen_cat
    Type
    string
    Description

    PolyPhen category of change.

  • Name
    polyphen_val
    Type
    null.Float
    Description

    PolyPhen score: It predicts the functional significance of an allele replacement from its individual features. Range: [0, 1] (default: 0).

  • Name
    priphcons
    Type
    null.Float
    Description

    Primate phastCons conservation score (excl. human). A higher score means the region is more conserved. PhastCons considers n species rather than two. It takes into account the phylogeny by which these species are related and, instead of measuring similarity/divergence simply in terms of percent identity, uses statistical models of nucleotide substitution that allow for multiple substitutions per site and for unequal rates of substitution between different pairs of bases. Range: [0, 0.999] (default: 0.0).

  • Name
    mamphcons
    Type
    null.Float
    Description

    Mammalian phastCons conservation score (excl. human). A higher score means the region is more conserved. PhastCons considers n species rather than two. It takes into account the phylogeny by which these species are related and, instead of measuring similarity/divergence simply in terms of percent identity, uses statistical models of nucleotide substitution that allow for multiple substitutions per site and for unequal rates of substitution between different pairs of bases. Range: [0, 1] (default: 0.0).

  • Name
    verphcons
    Type
    null.Float
    Description

    Vertebrate phastCons conservation score (excl. human). A higher score means the region is more conserved. PhastCons considers n species rather than two. It takes into account the phylogeny by which these species are related and, instead of measuring similarity/divergence simply in terms of percent identity, uses statistical models of nucleotide substitution that allow for multiple substitutions per site and for unequal rates of substitution between different pairs of bases. Range: [0, 1] (default: 0.0).

  • Name
    priphylop
    Type
    null.Float
    Description

    Primate phyloP score (excl. human). A higher score means the region is more conserved. PhyloP scores measure evolutionary conservation at individual alignment sites. The scores are calculated by comparing with the evolution expected under neutral drift. Positive scores: measure conservation, i.e., slower evolution than expected, at sites that are predicted to be conserved. Negative scores: measure acceleration, i.e., faster evolution than expected, at sites that are predicted to be fast-evolving. Range: [-10.761, 0.595] (default: -0.029)

  • Name
    mamphylop
    Type
    null.Float
    Description

    Mammalian phyloP score (excl. human). A higher score means the region is more conserved. PhyloP scores measure evolutionary conservation at individual alignment sites. The scores are calculated by comparing with the evolution expected under neutral drift. Positive scores: measure conservation, i.e., slower evolution than expected, at sites that are predicted to be conserved. Negative scores: measure acceleration, i.e., faster evolution than expected, at sites that are predicted to be fast-evolving. Range: [-20, 4.494] (default: -0.005).

  • Name
    verphylop
    Type
    null.Float
    Description

    Vertebrate phyloP score (excl. human). A higher score means the region is more conserved. PhyloP scores measure evolutionary conservation at individual alignment sites. The scores are calculated by comparing with the evolution expected under neutral drift. Positive scores: measure conservation, i.e., slower evolution than expected, at sites that are predicted to be conserved. Negative scores: measure acceleration, i.e., faster evolution than expected, at sites that are predicted to be fast-evolving. Range: [-20, 11.295] (default: 0.042).

  • Name
    bstatistic
    Type
    null.Float
    Description

    Background selection score. A background selection (B) value for each position in the genome. B indicates the expected fraction of neutral diversity that is present at a site, with values close to 0 representing near complete removal of diversity as a result of selection and values near 1000 indicating little effect of selection. Range: [0, 1000] (default: 800).

  • Name
    chmm_e1
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E1_poised. (default: 1.92).

  • Name
    chmm_e2
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E2_repressed. (default: 1.92).

  • Name
    chmm_e3
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E3_dead. (default: 1.92).

  • Name
    chmm_e4
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E4_dead. (default: 1.92).

  • Name
    chmm_e5
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E5_repressed. (default: 1.92).

  • Name
    chmm_e6
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E6_repressed. (default: 1.92).

  • Name
    chmm_e7
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E7_weak. (default: 1.92).

  • Name
    chmm_e8
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E8_gene. (default: 1.92).

  • Name
    chmm_e9
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E9_gene. (default: 1.92).

  • Name
    chmm_e10
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E10_gene. (default: 1.92).

  • Name
    chmm_e11
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E11_gene. (default: 1.92).

  • Name
    chmm_e12
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E12_distal. (default: 1.92).

  • Name
    chmm_e13
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E13_distal. (default: 1.92).

  • Name
    chmm_e14
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E14_distal. (default: 1.92).

  • Name
    chmm_e15
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E15_weak. (default: 1.92).

  • Name
    chmm_e16
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E16_tss. (default: 1.92).

  • Name
    chmm_e17
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E17_proximal. (default: 1.92).

  • Name
    chmm_e18
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E18_proximal. (default: 1.92).

  • Name
    chmm_e19
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E19_tss. (default: 1.92).

  • Name
    chmm_e20
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E20_poised. (default: 1.92).

  • Name
    chmm_e21
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E21_dead. (default: 1.92).

  • Name
    chmm_e22
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E22_repressed. (default: 1.92).

  • Name
    chmm_e23
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E23_weak. (default: 1.92).

  • Name
    chmm_e24
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E24_distal. (default: 1.92).

  • Name
    chmm_e25
    Type
    null.Float
    Description

    Number of 48 cell types in chromHMM state E25_distal. (default: 1.92).

  • Name
    gerp_n
    Type
    null.Float
    Description

    Neutral evolution score defined by GERP++. A higher score means the region is more conserved. Range: [0, 19.8] (default: 3.0).

  • Name
    gerp_s
    Type
    null.Float
    Description

    Rejected Substitution score defined by GERP++. A higher score means the region is more conserved. GERP (Genomic Evolutionary Rate Profiling) identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint. These deficits are referred to as "Rejected Substitutions". Rejected substitutions are a natural measure of constraint that reflects the strength of past purifying selection on the element. GERP estimates constraint for each alignment column; elements are identified as excess aggregations of constrained columns. Positive scores (fewer than expected) indicate that a site is under evolutionary constraint. Negative scores may be weak evidence of accelerated rates of evolution. Range: [-39.5, 19.8] (default: -0.2).

  • Name
    encodeh3k4me1_sum
    Type
    null.Float
    Description

    Maximum Encode H3K4me1 level over 13 cell lines. Range: [0.015, 91.954] (default: 0.37).

  • Name
    encodeh3k4me2_sum
    Type
    null.Float
    Description

    Maximum Encode H3K4me2 level over 14 cell lines. Range: [0.024, 148.887] (default: 0.37).

  • Name
    encodeh3k4me3_sum
    Type
    null.Float
    Description

    Maximum Encode H3K4me3 level over 14 cell lines. Range: [0.012, 239.512] (default: 0.38).

  • Name
    encodeh3k9ac_sum
    Type
    null.Float
    Description

    Maximum Encode H3K9ac level over 13 cell lines. Range: [0.019, 281.187] (default: 0.41).

  • Name
    encodeh3k9me3_sum
    Type
    null.Float
    Description

    Maximum Encode H3K9me3 level over 14 cell lines. Range: [0.011, 58.712] (default: 0.38).

  • Name
    encodeh3k27ac_sum
    Type
    null.Float
    Description

    Maximum Encode H3K27ac level over 14 cell lines. Range: [0.013, 288.608] (default: 0.36).

  • Name
    encodeh3k27me3_sum
    Type
    null.Float
    Description

    Maximum Encode H3K27me3 level over 14 cell lines. Range: [0.014, 87.122] (default: 0.47).

  • Name
    encodeh3k36me3_sum
    Type
    null.Float
    Description

    Maximum Encode H3K36me3 level over 10 cell lines. Range: [0.009, 56.176] (default: 0.39).

  • Name
    encodeh3k79me2_sum
    Type
    null.Float
    Description

    Maximum Encode H3K79me2 level over 13 cell lines. Range: [0.015, 118.706] (default: 0.34).

  • Name
    encodeh4k20me1_sum
    Type
    null.Float
    Description

    Maximum Encode H4K20me1 level over 11 cell lines. Range: [0.054, 73.230] (default: 0.47).

  • Name
    encodeh2afz_sum
    Type
    null.Float
    Description

    Maximum Encode H2AFZ level over 13 cell lines. Range: [0.031, 96.072] (default: 0.42).

  • Name
    encode_dnase_sum
    Type
    null.Float
    Description

    Maximum Encode DNase-seq level over 12 cell lines. Range: [0.001, 118672] (default: 0.0).

  • Name
    encodetotal_rna_sum
    Type
    null.Float
    Description

    Maximum Encode totalRNA-seq level over 10 cell lines (minus and plus strand separately). Range: [0, 92282.7].

  • Name
    grantham
    Type
    null.Float
    Description

    Grantham score: oAA, nAA. It attempts to predict the distance between two amino acids, in an evolutionary sense. A lower Grantham score reflects less evolutionary distance. A higher Grantham score reflects a greater evolutionary distance, and is considered more deleterious. Range: [0, 215] (default: 0).

  • Name
    freq100bp
    Type
    null.Float
    Description

    Number of common (MAF > 0.05) BRAVO SNVs in the nearby 100 bp window (default: 0). A higher value indicates more mutations happen in the region and a higher likelihood of mutations. Scores range from 0 to 100. Range: [0, 13].

  • Name
    rare100bp
    Type
    null.Float
    Description

    Number of rare (MAF < 0.05) BRAVO SNVs in the nearby 100 bp window (default: 0). A higher value indicates more mutations happen in the region and a higher likelihood of mutations. Scores range from 0 to 100. Range: [0, 31] (default: 0).

  • Name
    sngl100bp
    Type
    null.Float
    Description

    Number of single occurrence of BRAVO SNVs in the nearby 100 bp window (default: 0). A higher value indicates more mutations happen in the region and a higher likelihood of mutation. Scores range from 0 to 100. Range: [0, 99] (default: 0).

  • Name
    freq1000bp
    Type
    null.Float
    Description

    Number of common (MAF > 0.05) BRAVO SNVs in the nearby 1000 bp window (default: 0). A higher value indicates more mutations happen in the region and a higher likelihood of mutations. Scores range from 0 to 1000. Range: [0, 73] (default: 0).

  • Name
    rare1000bp
    Type
    null.Float
    Description

    Number of rare (MAF < 0.05) BRAVO SNVs in the nearby 1000 bp window (default: 0). A higher value indicates more mutations happen in the region and a higher likelihood of mutations. Scores range from 0 to 1000. Range: [0, 74] (default: 0).

  • Name
    sngl1000bp
    Type
    null.Float
    Description

    Number of single occurrence of BRAVO SNVs in the nearby 1000 bp window (default: 0). A higher value indicates more mutations happen in the region and a higher likelihood of mutation. Scores range from 0 to 1000. Range: [0, 658] (default: 0).

  • Name
    freq10000bp
    Type
    null.Float
    Description

    Number of common (MAF > 0.05) BRAVO SNVs in the nearby 10000 bp window (default: 0). A higher value indicates more mutations happen in the region and a higher likelihood of mutations. Scores range from 0 to 10000. Range: [0, 443] (default: 0).

  • Name
    rare10000bp
    Type
    null.Float
    Description

    Number of rare (MAF < 0.05) BRAVO SNVs in the nearby 10000 bp window (default: 0). A higher value indicates more mutations happen in the region and a higher likelihood of mutations. Scores range from 0 to 10000. Range: [0, 355] (default: 0).

  • Name
    sngl10000bp
    Type
    null.Float
    Description

    Number of single occurrence of BRAVO SNVs in the nearby 10000 bp window (default: 0). A higher value indicates more mutations happen in the region and a higher likelihood of mutation. Scores range from 0 to 10000. Range: [0, 4750] (default: 0).

  • Name
    remap_overlap_tf
    Type
    null.Float
    Description

    Remap number of different transcription factors binding. Range: [1, 350] (default: -0.5).

  • Name
    remap_overlap_cl
    Type
    null.Float
    Description

    Remap number of different transcription factor - cell line combinations binding. Range: [1, 1068] (default: -0.5).

  • Name
    cadd_rawscore
    Type
    null.Float
    Description

    The CADD raw score (integrative score). A higher CADD score indicates more deleterious. Range: [-237.102, 22.763].

  • Name
    cadd_phred
    Type
    null.Float
    Description

    The CADD score in PHRED scale (integrative score). A higher CADD score indicates more deleterious. Range: [0, 99].

  • Name
    apc_conservation_v2
    Type
    null.Float
    Description

    Conservation annotation PC: the first PC of the standardized scores of “GerpN, GerpS, priPhCons, mamPhCons, verPhCons, priPhyloP, mamPhyloP, verPhyloP” in PHRED scale. Range: [0, 75.824].

  • Name
    apc_epigenetics_active
    Type
    null.Float
    Description

    Active Epigenetic annotation PC: the first PC of the standardized scores of “EncodeH3K4me1.max, EncodeH3K4me2.max, EncodeH3K4me3.max, EncodeH3K9ac.max, EncodeH3K27ac.max, EncodeH4K20me1.max, EncodeH2AFZ.max” in PHRED scale. Range: [0, 86.238].

  • Name
    apc_epigenetics_repressed
    Type
    null.Float
    Description

    Repressed Epigenetic annotation PC: the first PC of the standardized scores of “EncodeH3K9me3.max, EncodeH3K27me3.max” in PHRED scale. Range: [0, 86.238].

  • Name
    apc_epigenetics_transcription
    Type
    null.Float
    Description

    Transcription Epigenetic annotation PC: the first PC of the standardized scores of “EncodeH3K36me3.max, EncodeH3K79me2.max” in PHRED scale. Range: [0, 86.238].

  • Name
    apc_local_nucleotide_diversity_v3
    Type
    null.Float
    Description

    Local nucleotide diversity annotation PC: the first PC of the standardized scores of “bStatistic, RecombinationRate, NuclearDiversity” in PHRED scale. Range: [0, 86.238].

  • Name
    apc_mappability
    Type
    null.Float
    Description

    Mappability annotation PC: the first PC of the standardized scores of “umap_k100, bismap_k100, umap_k50, bismap_k50, umap_k36, bismap_k36, umap_k24, bismap_k24” in PHRED scale. Range: [0.007, 22.966].

  • Name
    apc_mutation_density
    Type
    null.Float
    Description

    Mutation density annotation PC: the first PC of the standardized scores of “Common100bp, Rare100bp, Sngl100bp, Common1000bp, Rare1000bp, Sngl1000bp, Common10000bp, Rare10000bp, Sngl10000bp” in PHRED scale. Range: [0, 84.477].

  • Name
    apc_protein_function_v3
    Type
    null.Float
    Description

    Protein function annotation PC: the first PC of the standardized scores of “SIFTval, PolyPhenVal, Grantham, Polyphen2_HDIV_score, Polyphen2_HVAR_score, MutationTaster_score, MutationAssessor_score” in PHRED scale. Range: [2.974, 86.238].

  • Name
    apc_transcription_factor
    Type
    null.Float
    Description

    Transcription factor annotation PC: the first PC of the standardized scores of “RemapOverlapTF, RemapOverlapCL” in PHRED scale. Range: [1.185, 86.238].

  • Name
    tg_afr
    Type
    null.Float
    Description

    1000 Genomes African population frequency.

  • Name
    tg_all
    Type
    null.Float
    Description

    1000 Genomes frequency across all populations.

  • Name
    tg_amr
    Type
    null.Float
    Description

    1000 Genomes Admixed American population frequency.

  • Name
    tg_eas
    Type
    null.Float
    Description

    1000 Genomes East Asian population frequency.

  • Name
    tg_eur
    Type
    null.Float
    Description

    1000 Genomes European population frequency.

  • Name
    tg_sas
    Type
    null.Float
    Description

    1000 Genomes South Asian population frequency.

  • Name
    af_total
    Type
    null.Float
    Description

    GNOMAD v3 Genome Allele Frequency using all the samples.

  • Name
    af_asj_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome Ashkenazi Jewish Female population frequency.

  • Name
    af_eas_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome East Asian Female population frequency.

  • Name
    af_afr_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome African Male population frequency.

  • Name
    af_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome Female Allele Frequency.

  • Name
    af_fin_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome Finnish European Male population frequency.

  • Name
    af_oth_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome Other (population not assigned) Female frequency.

  • Name
    af_ami
    Type
    null.Float
    Description

    GNOMAD v3 Genome Amish population frequency.

  • Name
    af_oth
    Type
    null.Float
    Description

    GNOMAD v3 Genome Other (population not assigned) frequency.

  • Name
    af_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome Male Allele Frequency.

  • Name
    af_ami_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome Amish Female population frequency.

  • Name
    af_afr
    Type
    null.Float
    Description

    GNOMAD v3 Genome African population frequency.

  • Name
    af_eas_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome East Asian Male population frequency.

  • Name
    af_sas
    Type
    null.Float
    Description

    GNOMAD v3 Genome South Asian population frequency.

  • Name
    af_nfe_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome Non-Finnish European Female population frequency.

  • Name
    af_asj_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome Ashkenazi Jewish Male population frequency.

  • Name
    af_oth_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome Other (population not assigned) Male frequency.

  • Name
    af_nfe_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome Non-Finnish European Male population frequency.

  • Name
    af_asj
    Type
    null.Float
    Description

    GNOMAD v3 Genome Ashkenazi Jewish population frequency.

  • Name
    af_amr_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome Admixed American Male population frequency.

  • Name
    af_amr_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome Admixed American Female population frequency.

  • Name
    af_sas_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome South Asian Female population frequency.

  • Name
    af_fin
    Type
    null.Float
    Description

    GNOMAD v3 Genome Finnish European population frequency.

  • Name
    af_afr_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome African Female population frequency.

  • Name
    af_sas_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome South Asian Male population frequency.

  • Name
    af_amr
    Type
    null.Float
    Description

    GNOMAD v3 Genome Admixed American population frequency.

  • Name
    af_nfe
    Type
    null.Float
    Description

    GNOMAD v3 Genome Non-Finnish European population frequency.

  • Name
    af_eas
    Type
    null.Float
    Description

    GNOMAD v3 Genome East Asian population frequency.

  • Name
    af_ami_male
    Type
    null.Float
    Description

    GNOMAD v3 Genome Amish Male population frequency.

  • Name
    af_fin_female
    Type
    null.Float
    Description

    GNOMAD v3 Genome Finnish European Female population frequency.

  • Name
    Bismap (k100, k50, k36, k24)
    Type
    null.Float
    Description

    Mappability of the bisulfite-converted genome. Bisulfite sequencing approaches used to identify DNA methylation introduce large numbers of reads that map to multiple regions. This annotation identifies mappability of the bisulfite-converted genome. Range: [0, 1] (default: 0).

  • Name
    Umap (k100, k50, k36, k24)
    Type
    null.Float
    Description

    Mappability of unconverted genome. It measures the extent to which a position can be uniquely mapped by sequence reads. Lower mappability means the estimates of genomic and epigenomic characteristics from sequencing assays are less reliable, and the region has increased susceptibility to spurious mapping from reads from other regions of the genome with sequencing errors or unexpected genetic variation. Range: [0, 1] (default: 0).

  • Name
    recombination_rate
    Type
    null.Float
    Description

    Recombination rate measures how likely the region is to undergo recombination. Range: [0, 54.96] (default: 0).

  • Name
    nucdiv
    Type
    null.Float
    Description

    Nucleotide diversity measures how likely the region is to diversify. Range: [0.05, 60.25] (default: 0).

  • Name
    aloft_value
    Type
    string
    Description

    ALoFT provides extensive annotations to putative loss-of-function variants (LoF) in protein-coding genes including functional, evolutionary and network features (integrative score).

  • Name
    aloft_description
    Type
    string
    Description

    ALoFT annotation can predict the impact of premature stop variants and classify them as dominant disease-causing, recessive disease-causing and benign variants (integrative score).

  • Name
    funseq_value
    Type
    string
    Description

    A flexible framework to prioritize regulatory mutations from cancer genome sequencing (integrative score).

  • Name
    funseq_description
    Type
    string
    Description

    Funseq annotation points out whether a given mutation falls in a coding or non-coding region (integrative score).

  • Name
    filter_value
    Type
    string
    Description

    Filter value. Low: low-quality regions as determined by gnomAD sequencing metrics (Mappability 0.5; overlap with 50nt simple repeat; ReadPosRankSum>1; 0 SNVs in 100bp window). SFS_bump: pentamer context with abnormal SFS; the fraction of high-frequency SNVs (MAF between 0.2 and 0.0005) is greater than 1.5x the mutation-rate-controlled average. Tends to be repetitive contexts. TFBS: transcription factor binding site as determined by overlap with ChIP-seq peaks.

  • Name
    pn
    Type
    string
    Description

    Pentanucleotide context.

  • Name
    mr
    Type
    null.Float
    Description

    Roulette mutation rate estimate.

  • Name
    ar
    Type
    null.Float
    Description

    Adjusted Roulette mutation rate estimate.

  • Name
    mg
    Type
    null.Float
    Description

    gnomAD mutation rate estimate (Karczewski et al. 2020).

  • Name
    mc
    Type
    null.Float
    Description

    Carlson mutation rate estimate (Carlson et al. 2018).
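
Many of the properties above are typed null.Float, so client code should expect a value to be missing. The following is a minimal sketch, not part of the API itself: it assumes a null.Float arrives as either a JSON number or JSON null once the response is parsed into a Python dict, and it uses a hypothetical helper name (max_population_af) and made-up example values. It picks the highest non-null GNOMAD v3 population frequency from a variant object.

```python
from typing import Optional, Tuple

# Population-level GNOMAD v3 frequency fields, as named in the property list.
POPULATION_AF_FIELDS = (
    "af_afr", "af_ami", "af_amr", "af_asj", "af_eas",
    "af_fin", "af_nfe", "af_sas", "af_oth",
)


def max_population_af(variant: dict) -> Optional[Tuple[str, float]]:
    """Return (field, frequency) for the highest non-null population frequency.

    Assumption: a null.Float is either a number or None after JSON parsing.
    Returns None when every population field is absent or null.
    """
    present = {
        name: variant[name]
        for name in POPULATION_AF_FIELDS
        if variant.get(name) is not None
    }
    if not present:
        return None
    top = max(present, key=present.get)
    return top, present[top]


# Hypothetical variant object; the values are illustrative, not real API output.
example = {"af_total": 0.07, "af_afr": 0.10, "af_eas": None, "af_nfe": 0.08}
print(max_population_af(example))
```

Checking for None explicitly, rather than relying on truthiness, keeps a legitimate frequency of 0 distinct from a missing value.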