Detailed Information for Output Files from Somatic Mutation Annotators (Detailed Explanation of ANNOVAR Annotation File Entries)
This page describes in detail the files produced by the somatic mutation annotators based on ANNOVAR and SnpEff.
*_annoTable.txt from the annotator via ANNOVAR
| Column Names | Description | 
|---|---|
| Chr | Chromosome number | 
| Start | Start position | 
| End | End position | 
| Ref | Reference base(s) | 
| Alt | Alternate non-reference alleles called on at least one of the samples | 
| COSMIC ID | COSMIC ID | 
| Func.refGene | Region (e.g., exonic, intronic, non-coding RNA) that the variant hits. | 
| Gene.refGene | Gene name associated with one variant | 
| ExonicFunc.refGene | Exonic variant function, e.g., nonsynonymous, synonymous, frameshift insertion. | 
| AAChange.refGene | Amino acid change. For example, the colon-separated fields of SAMD11:NM_152486:exon10:c.T1027C:p.W343R are, in order: gene name, known RefSeq accession, region, cDNA-level change, and protein-level change. | 
| SIFT_score | SIFT score. See the dbNSFP information table for details. | 
| SIFT_pred | SIFT prediction. See the dbNSFP information table for details. | 
| Polyphen2_HDIV_score | Polyphen2 score based on HDIV. See the dbNSFP information table for details. | 
| Polyphen2_HDIV_pred | Polyphen2 prediction based on HDIV. See the dbNSFP information table for details. | 
| Polyphen2_HVAR_score | Polyphen2 score based on HVAR. See the dbNSFP information table for details. | 
| Polyphen2_HVAR_pred | Polyphen2 prediction based on HVAR. See the dbNSFP information table for details. | 
| LRT_score | LRT score. See the dbNSFP information table for details. | 
| LRT_pred | LRT prediction. See the dbNSFP information table for details. | 
| MutationTaster_score | MutationTaster score. See the dbNSFP information table for details. | 
| MutationTaster_pred | MutationTaster prediction. See the dbNSFP information table for details. | 
| MutationAssessor_score | MutationAssessor score. See the dbNSFP information table for details. | 
| MutationAssessor_pred | MutationAssessor prediction. See the dbNSFP information table for details. | 
| FATHMM_score | FATHMM score. See the dbNSFP information table for details. | 
| FATHMM_pred | FATHMM prediction. See the dbNSFP information table for details. | 
| PROVEAN_score | PROVEAN score. See the dbNSFP information table for details. | 
| PROVEAN_pred | PROVEAN prediction. See the dbNSFP information table for details. | 
| VEST3_score | VEST V3 score. See the dbNSFP information table for details. | 
| CADD_raw | CADD raw score. See the dbNSFP information table for details. | 
| CADD_phred | CADD phred-like score. See the dbNSFP information table for details. | 
| DANN_score | DANN score. See the dbNSFP information table for details. | 
| fathmm-MKL_coding_score | fathmm-MKL score for one coding variant. See the dbNSFP information table for details. | 
| fathmm-MKL_coding_pred | fathmm-MKL prediction for one coding variant. See the dbNSFP information table for details. | 
| MetaSVM_score | MetaSVM score. See the dbNSFP information table for details. | 
| MetaSVM_pred | MetaSVM prediction. See the dbNSFP information table for details. | 
| MetaLR_score | MetaLR score. See the dbNSFP information table for details. | 
| MetaLR_pred | MetaLR prediction. See the dbNSFP information table for details. | 
| integrated_fitCons_score | fitCons score. See the dbNSFP information table for details. | 
| integrated_confidence_value | confidence level. See the dbNSFP information table for details. | 
| GERP++_RS | GERP++ "rejected substitutions" (RS) score. See the dbNSFP information table for details. | 
| phyloP7way_vertebrate | Phylogenetic p-values for 7 vertebrate species. See the dbNSFP information table for details. | 
| phyloP20way_mammalian | Phylogenetic p-values for 20 mammalian species. See the dbNSFP information table for details. | 
| phastCons7way_vertebrate | PhastCons score for 7 vertebrate species. See the dbNSFP information table for details. | 
| phastCons20way_mammalian | PhastCons score for 20 mammalian species. See the dbNSFP information table for details. | 
| SiPhy_29way_logOdds | SiPhy log odds score for 29 species. See the dbNSFP information table for details. | 
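
The colon-separated layout of AAChange.refGene described above can be unpacked programmatically. The following is a minimal sketch, not part of the annotator itself: the function name `parse_aachange` and the field labels are made up for illustration, and it assumes that multiple transcript entries in one cell are separated by commas and that a missing value is written as `.`.

```python
from typing import Dict, List

# Hypothetical labels for the five colon-separated fields, in the order
# given in the AAChange.refGene description.
FIELDS = ("gene", "transcript", "region", "cdna_change", "protein_change")


def parse_aachange(value: str) -> List[Dict[str, str]]:
    """Split one AAChange.refGene cell into a dict per transcript entry.

    Assumption: several entries may share a cell, separated by commas,
    and "." marks a missing value.
    """
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item or item == ".":
            continue  # skip empty or missing entries
        parts = item.split(":")
        # zip stops at the shorter sequence, so short entries keep
        # only the fields that are actually present.
        entries.append(dict(zip(FIELDS, parts)))
    return entries


example = parse_aachange("SAMD11:NM_152486:exon10:c.T1027C:p.W343R")
print(example[0]["gene"], example[0]["protein_change"])
```

Each returned dictionary can then be used to filter variants by gene, exon, or protein change.
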
*_annoTable.txt from the annotator via SnpEff
| Column Names | Description | 
|---|---|
| CHROM | Chromosome number | 
| POS | Position | 
| ID | Semicolon-separated list of unique identifiers, where available. For a dbSNP variant, the rs number(s) should be used. | 
| REF | Reference base(s) | 
| ALT | Alternate non-reference alleles called on at least one of the samples | 
| EFFECT | Functional consequence of the variant, e.g., missense_variant, synonymous_variant. | 
| REGION | Regions (e.g., exonic, intronic) that one variant hits | 
| IMPACT | Putative impact of the variant (e.g. HIGH, MODERATE or LOW impact). | 
| GENE | Gene name (usually HUGO) | 
| GENEID | Gene ID | 
| FEATURE | Type of the feature identified in the next field (e.g., transcript, motif, miRNA). | 
| FEATUREID | Transcript ID (preferably using version number), Motif ID, miRNA, ChipSeq peak, Histone mark, depending on the annotation. | 
| BIOTYPE | Whether the transcript is "Coding" or "Noncoding". ENSEMBL biotypes are used whenever possible. | 
| HGVS_C | Variant in HGVS notation (DNA level). For example, c.352A>G stands for an A-to-G substitution at nucleotide 352. | 
| HGVS_P | Coding variant in HGVS notation (protein level). For example, p.Ile118Val stands for the substitution of isoleucine at position 118 by valine. p.Ile118Val can also be written as p.I118V using the 1-letter amino acid symbols. | 
| SIFT_score | SIFT score. See the dbNSFP information table for details. | 
| SIFT_pred | SIFT prediction. See the dbNSFP information table for details. | 
| Polyphen2_HDIV_score | Polyphen2 score based on HDIV. See the dbNSFP information table for details. | 
| Polyphen2_HDIV_pred | Polyphen2 prediction based on HDIV. See the dbNSFP information table for details. | 
| Polyphen2_HVAR_score | Polyphen2 score based on HVAR. See the dbNSFP information table for details. | 
| Polyphen2_HVAR_pred | Polyphen2 prediction based on HVAR. See the dbNSFP information table for details. | 
| LRT_score | LRT score. See the dbNSFP information table for details. | 
| LRT_pred | LRT prediction. See the dbNSFP information table for details. | 
| MutationTaster_score | MutationTaster score. See the dbNSFP information table for details. | 
| MutationTaster_pred | MutationTaster prediction. See the dbNSFP information table for details. | 
| MutationAssessor_score | MutationAssessor score. See the dbNSFP information table for details. | 
| MutationAssessor_pred | MutationAssessor prediction. See the dbNSFP information table for details. | 
| FATHMM_score | FATHMM score. See the dbNSFP information table for details. | 
| FATHMM_pred | FATHMM prediction. See the dbNSFP information table for details. | 
| PROVEAN_score | PROVEAN score. See the dbNSFP information table for details. | 
| PROVEAN_pred | PROVEAN prediction. See the dbNSFP information table for details. | 
| VEST3_score | VEST V3 score. See the dbNSFP information table for details. | 
| CADD_raw | CADD raw score. See the dbNSFP information table for details. | 
| CADD_phred | CADD phred-like score. See the dbNSFP information table for details. | 
| MetaSVM_score | MetaSVM score. See the dbNSFP information table for details. | 
| MetaSVM_pred | MetaSVM prediction. See the dbNSFP information table for details. | 
| MetaLR_score | MetaLR score. See the dbNSFP information table for details. | 
| MetaLR_pred | MetaLR prediction. See the dbNSFP information table for details. | 
| GERP++_NR | GERP++ neutral rate (NR). See the dbNSFP information table for details. | 
| GERP++_RS | GERP++ "rejected substitutions" (RS) score. See the dbNSFP information table for details. | 
| phyloP100way_vertebrate | Phylogenetic p-values for 100 vertebrate species. See the dbNSFP information table for details. | 
| phastCons100way_vertebrate | PhastCons score for 100 vertebrate species. See the dbNSFP information table for details. | 
| SiPhy_29way_logOdds | SiPhy log odds score for 29 species. See the dbNSFP information table for details. | 
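
The HGVS_P description mentions that p.Ile118Val can equivalently be written as p.I118V. As a rough illustration of that conversion, the sketch below replaces 3-letter amino acid codes with their 1-letter symbols. The function name `hgvs_p_to_one_letter` is hypothetical, and mapping `Ter` (stop) to `*` is an assumption about how stop codons should be displayed.

```python
import re

# Standard 3-letter to 1-letter amino acid codes.
# Assumption: "Ter" (stop codon) is rendered as "*".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# One alternation that matches any of the 3-letter codes (case-sensitive).
_CODE_PATTERN = re.compile("|".join(THREE_TO_ONE))


def hgvs_p_to_one_letter(hgvs_p: str) -> str:
    """Rewrite 3-letter amino acid codes in an HGVS_P string as 1-letter codes."""
    return _CODE_PATTERN.sub(lambda m: THREE_TO_ONE[m.group(0)], hgvs_p)


print(hgvs_p_to_one_letter("p.Ile118Val"))  # p.I118V
```

The `p.` prefix and the position are left untouched because only the capitalized 3-letter codes are matched.
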
*_genelist.txt from the annotators via ANNOVAR and SnpEff
| Column Names | Description | 
|---|---|
| Gene | Gene name associated with each variant; one gene name may correspond to several variants. | 
| Mutations | Amino acid change information. For example, the colon-separated fields of SAMD11:NM_152486:exon10:c.T1027C:p.W343R are, in order: gene name, known RefSeq accession, region, cDNA-level change, and protein-level change. | 
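
Because one gene name may correspond to several variants, a gene list of this shape can be thought of as mutation strings grouped by their leading gene field. The sketch below shows that grouping under stated assumptions: it is not the annotators' actual implementation, the function name `group_mutations_by_gene` is hypothetical, and every input string other than the SAMD11 example from the table is an invented placeholder.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_mutations_by_gene(mutations: Iterable[str]) -> Dict[str, List[str]]:
    """Group colon-separated mutation strings by their first field (gene name)."""
    by_gene: Dict[str, List[str]] = defaultdict(list)
    for mutation in mutations:
        gene = mutation.split(":", 1)[0]  # text before the first colon
        by_gene[gene].append(mutation)
    return dict(by_gene)


mutations = [
    "SAMD11:NM_152486:exon10:c.T1027C:p.W343R",  # example from the table
    "SAMD11:TRANSCRIPT_X:exon2:c.A1C:p.M1L",     # invented placeholder
    "GENE_Y:TRANSCRIPT_Z:exon1:c.G2A:p.M1I",     # invented placeholder
]
for gene, items in group_mutations_by_gene(mutations).items():
    print(gene, len(items))
```
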
dbNSFP Information

| Columns of Annotations from dbNSFP Database | Prediction Algorithm/Conservation Score | Description | Method | Categorical Prediction | Author(s) | 
|---|---|---|---|---|---|
| SIFT_pred, SIFT_score | SIFT | Sorting intolerant from tolerant | P(an amino acid at a position is tolerated \| the most frequent amino acid is tolerated) | D: Deleterious (sift<=0.05); T: Tolerated (sift>0.05) | Pauline Ng, Fred Hutchinson Cancer Research Center, Seattle, Washington | 
| Polyphen2_HDIV_pred, Polyphen2_HDIV_score | Polyphen v2 | Polymorphism phenotyping v2 | Probabilistic classifier; training set: HumDiv | D: Probably damaging (>=0.957); P: Possibly damaging (0.453<=pp2_hdiv<=0.956); B: Benign (pp2_hdiv<=0.452) | Harvard Medical School | 
| Polyphen2_HVAR_pred, Polyphen2_HVAR_score | Polyphen v2 | Polymorphism phenotyping v2 | Machine learning; training set: HumVar | D: Probably damaging (>=0.909); P: Possibly damaging (0.447<=pp2_hvar<=0.908); B: Benign (pp2_hvar<=0.446) | Shamil Sunyaev, Harvard Medical School | 
| LRT_pred, LRT_score | LRT | Likelihood ratio test | LRT of H0: each codon evolves neutrally vs. H1: the codon evolves under negative selection | D: Deleterious; N: Neutral; U: Unknown. Lower scores are more deleterious | Sung Chung, Justin Fay, Washington University | 
| MutationTaster_pred, MutationTaster_score | MutationTaster | | Bayes classifier | A: "disease_causing_automatic"; D: "disease_causing"; N: "polymorphism" (probably harmless); P: "polymorphism_automatic" (known to be harmless). Higher values are more deleterious | Markus Schuelke, Charité - Universitätsmedizin Berlin | 
| MutationAssessor_pred, MutationAssessor_score | MutationAssessor | | Entropy of multiple sequence alignment | H: High; M: Medium; L: Low; N: Neutral. H/M means functional and L/N means non-functional. Higher values are more deleterious | Reva Boris, Computational Biology Center, Memorial Sloan Kettering Cancer Center | 
| FATHMM_pred, FATHMM_score | FATHMM | Functional analysis through hidden Markov models | HMM | D: Deleterious; T: Tolerated. Lower values are more deleterious | Shihab Hashem, University of Bristol, UK | 
| PROVEAN_pred, PROVEAN_score | PROVEAN | Protein Variation Effect Analyzer | Clustering of homologous sequences | D: Deleterious; N: Neutral. Higher values are more deleterious | Choi Y, J. Craig Venter Institute | 
| VEST3_score | VEST V3 | Variant effect scoring tool | Random forest classifier | Higher values are more deleterious | Rachel Karchin, Johns Hopkins University | 
| CADD_raw, CADD_phred | CADD | Combined annotation dependent depletion | Linear kernel SVM | Higher values are more deleterious | Jay Shendure, Xiaohui Xie, University of California - Irvine | 
| DANN_score | DANN | Deleterious Annotation of genetic variants using Neural Networks | Neural network | Higher values are more deleterious | Jay Shendure, Xiaohui Xie, University of California - Irvine | 
| fathmm-MKL_coding_pred | FATHMM-MKL | Predicting the effects of both coding and non-coding variants using nucleotide-based HMMs | Classifier based on multiple kernel learning | D: Deleterious (score >= 0.5); T: Tolerated (score < 0.5) | Shihab Hashem, University of Bristol, UK | 
| MetaSVM_pred, MetaSVM_score | MetaSVM | | Support vector machine | D: Deleterious; T: Tolerated. Higher scores are more deleterious | Coco Dong, USC Biostatistics Department | 
| MetaLR_pred, MetaLR_score | MetaLR | | Logistic regression | D: Deleterious; T: Tolerated. Higher scores are more deleterious | Coco Dong, USC Biostatistics Department | 
| integrated_fitCons_score, integrated_confidence_value | FitCons | Fitness consequences of functional annotation | Integrates functional assays such as ChIP-Seq with a conservation measure of transcription factor binding sites | Higher scores are more deleterious | Abriza, Cold Spring Harbor Lab | 
| GERP++_RS, GERP++_NR | GERP++ | Genome Evolutionary Rate Profiling ++ | Maximum likelihood estimation procedure | Higher scores are more deleterious | Eugene Davydov, Stanford University, CS Department | 
| phyloP7way_vertebrate | PhyloP | Phylogenetic p-values | Phylogenetic p-values calculated from an LRT, score-based test, or GERP test; uses 7 species | Higher scores are more deleterious | Adam Siepel, UCSC | 
| phyloP20way_mammalian | PhyloP | Phylogenetic p-values | A phylogenetic hidden Markov model (phylo-HMM); uses 20 species | Higher scores are more deleterious | Adam Siepel, UCSC | 
| phastCons7way_vertebrate | phastCons | | A phylogenetic hidden Markov model (phylo-HMM); uses 7 species | Higher scores are more deleterious | Adam Siepel, UCSC | 
| phastCons20way_mammalian | phastCons | | A phylogenetic hidden Markov model (phylo-HMM); uses 20 species | Higher scores are more deleterious | Adam Siepel, UCSC | 
| SiPhy_29way_logOdds | SiPhy | | Probabilistic framework, HMM; uses 29 species | Higher scores are more deleterious | Manuel Garber, Broad Institute of MIT & Harvard | 
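
The categorical predictions in the table follow directly from score cut-offs. The sketch below applies the SIFT and Polyphen2 HDIV thresholds listed above to a numeric score. The function names are hypothetical, and treating `.` or an empty cell as a missing value is an assumption about how the annotation tables encode absent scores.

```python
from typing import Optional


def parse_score(cell: str) -> Optional[float]:
    """Convert a table cell to a float; assumes "." or "" means missing."""
    cell = cell.strip()
    if cell in {"", "."}:
        return None
    return float(cell)


def sift_category(score: float) -> str:
    """D (deleterious) if score <= 0.05, otherwise T (tolerated)."""
    return "D" if score <= 0.05 else "T"


def polyphen2_hdiv_category(score: float) -> str:
    """D if score >= 0.957, P if 0.453 <= score <= 0.956, B if score <= 0.452.

    The table gives thresholds to three decimals; values falling between
    two listed bounds (e.g. 0.9565) are assigned to the lower category here.
    """
    if score >= 0.957:
        return "D"
    if score >= 0.453:
        return "P"
    return "B"


score = parse_score("0.03")
if score is not None:
    print(sift_category(score))  # D
```

Deriving the category from the score in this way is mainly useful as a consistency check against the corresponding `_pred` column.
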
                    
                
                
            
        