E-value identity bitscore


The E-value provides information about the likelihood that a given sequence match is purely by chance. The lower the E-value, the less likely the database match is a result of random chance and therefore the more significant the match is.

Empirical interpretation of the E-value is as follows:

If E-value < 1e-50 (that is, 1 × 10^-50), there is extremely high confidence that the database match is a result of homologous relationships.

If E-value is between 0.01 and 1e-50, the match can be considered a result of homology.

If E-value is between 10 and 0.01, the match is considered not significant, but may hint at a tentative remote homology relationship. Additional evidence is needed to confirm the tentative relationship.

If E-value > 10, the sequences under consideration are either unrelated or related by extremely distant relationships that fall below the limit of detection with the current method.
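The empirical ranges above can be written down as a small lookup. This is only a sketch: the function name `interpret_evalue` and the returned labels are made up for illustration, and how the exact boundary values (1e-50, 0.01, 10) are assigned is an assumption, because the ranges above do not say which side they belong to.

```python
def interpret_evalue(evalue: float) -> str:
    """Map a BLAST E-value onto the empirical categories listed above.

    Boundary handling is an assumption: 1e-50 and 0.01 go to the weaker
    category, and exactly 10 still counts as "possible remote homology".
    """
    if evalue < 0:
        raise ValueError("an E-value cannot be negative")
    if evalue < 1e-50:
        return "extremely high confidence of homology"
    if evalue < 0.01:
        return "homology"
    if evalue <= 10:
        return "not significant, possible remote homology"
    return "unrelated or below the limit of detection"


if __name__ == "__main__":
    for e in (1e-80, 1e-5, 0.5, 50):
        print(e, "->", interpret_evalue(e))
```

These cutoffs are rules of thumb, so a lookup like this is useful for a first pass over many hits rather than as a final verdict on any single match.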

Because the E-value is proportional to the database size, an obvious problem is that as the database grows, the E-value for a given sequence match also increases.

The genuine evolutionary relationship between the two sequences remains constant, yet the credibility of the match decreases as the database grows. This means that one may "lose" previously detected homologs as the database enlarges. Thus, an alternative to E-value calculations is needed.
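The database-size effect described above can be illustrated with the standard Karlin-Altschul relation E = K * m * n * exp(-λ * S), where m is the query length and n is the total database length. This relation is not spelled out in the text above, so treat it as background brought in for the example. The default values of `lam` and `k` below are illustrative placeholders, and the function name `expected_hits` is hypothetical.

```python
import math


def expected_hits(raw_score: float, query_len: float, db_len: float,
                  lam: float = 0.267, k: float = 0.041) -> float:
    """E-value from the Karlin-Altschul relation E = K * m * n * exp(-lam * S).

    lam and k are illustrative placeholders; real values depend on the
    scoring matrix and gap penalties used in the search.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Same alignment (same raw score, same query), two database sizes.
    small_db = expected_hits(raw_score=100, query_len=300, db_len=1e8)
    large_db = expected_hits(raw_score=100, query_len=300, db_len=1e9)
    print("E-value, smaller database:", small_db)
    print("E-value, 10x larger database:", large_db)
    print("ratio:", large_db / small_db)
```

The alignment itself is unchanged between the two calls, yet the E-value grows in step with the database, which is exactly how a borderline homolog can slip past a fixed cutoff over time.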

The E-value is very important: the lower, the better.



The bit score is another prominent statistical indicator reported alongside the E-value in BLAST output. The bit score measures sequence similarity independent of query sequence length and database size, and is normalized based on the raw pairwise alignment score. The bit score (S') is determined by the following formula: S' = (λ * S - ln K) / ln 2, where λ is the Gumbel distribution constant, S is the raw alignment score, and K is a constant associated with the scoring matrix used. Clearly, the bit score (S') is linearly related to the raw alignment score (S). Thus, the higher the bit score, the more significant the match is. The bit score provides a constant statistical indicator for searching databases of different sizes, or for searching the same database at different times as the database enlarges.
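As a quick check of the formula above, the conversion from raw score to bit score can be sketched in a few lines. The function name `bit_score` is hypothetical, and the λ and K values in the demo are illustrative placeholders, not values taken from any particular BLAST run.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score: (lam * raw_score - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    lam, k = 0.267, 0.041  # placeholder parameters, for illustration only
    for raw in (50, 100, 150):
        print(raw, "->", round(bit_score(raw, lam, k), 1))
    # Linearity: each extra raw-score point adds lam / ln 2 bits.
    print("bits per raw point:", bit_score(101, lam, k) - bit_score(100, lam, k))
```

Neither the query length nor the database size appears anywhere in the function, which is why the same alignment keeps the same bit score no matter which database is searched.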



An identity of 35% means that 35% of the aligned amino acid (AA) positions in your sequence are identical to those in the matched database sequence. There is no such thing as an "acceptable percentage". It always depends on what you are looking for:

If you have an unknown protein sequence and you would like to find homologous sequences, information about identity (even 35%) is valuable.

If you have a known protein and you need to confirm the sequence, an identity of 35% is low and may suggest that something went wrong during your analysis.
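To make the identity figure concrete, here is a minimal sketch that computes percent identity from two already-aligned sequences. It assumes gaps are written as "-" and that identity is counted over the full alignment length, gap columns included. The function name `percent_identity` and the toy sequences are made up for illustration.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns where both sequences have the same residue.

    Assumes both strings come from the same pairwise alignment (equal
    length, gaps written as '-'). Gap columns count toward the alignment
    length but never count as matches.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    # Toy alignment: 4 identical columns out of 6.
    print(percent_identity("MKT-AY", "MRTQAY"))
```

Because the result is relative to the aligned region only, a high identity over a very short alignment says much less than the same identity over the full length of the protein.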


posted on 2019-08-14 15:34  0820LL