The frequency of synonymous, conservative and non-conservative locations in sequence space
Types of locations in sequence space
We will divide sequence space into just three types of locations, called 'synonymous', 'conservative', and 'non-conservative'.
Synonymous locations are those where none of the mutations (relative to the master sequence) changes the encoded amino acid.
For this and the other types of locations, we ignore any effects of the mutation on the RNA itself. This assumption is clearly not true in general. Codon preference is commonly observed, and this series of pages is primarily concerned with viral cis-acting sequences. However, 'wobble' positions make up at most about one-third of the genome, and cis-acting sequences make up an even smaller fraction of a viral genome. The distortion from ignoring them may therefore be tolerable if all we seek is an overall view of the local sequence space.
There are 9 possible point mutations (3 positions × 3 alternative nucleotides) for each of the 64 codons, giving a total of 576 possible point mutations. Of these, 135 are synonymous, so the frequency of synonymous point mutations, Y, is 135 / 576 = 0.234.
Conservative locations are those where at least one of the mutations does change the encoded amino acid, but the change is 'conservative'.
Criteria for what constitutes a conservative replacement differ. One estimate of the likelihood of conservative replacements can be obtained by examining the Dayhoff and Gonnet PAM250 log-likelihood matrices. If we make the generous assumption that any non-negative value in the matrices indicates a conservative replacement, then an 'average' amino acid residue can be conservatively replaced by about 6 other residues (the actual values are 6.75 and 5.95 for the Dayhoff and Gonnet matrices, respectively). With this assumption, the fraction of all replacement changes that are conservative is 6 / 20 = 0.3. (Other definitions of a conservative change can be more stringent and will yield a smaller value.) Replacement (non-synonymous) mutations make up a fraction 1 - Y = 0.766 of all point mutations, so the frequency of conservative mutations is C = 0.766 × 0.3 = 0.230.
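The counting rule behind the "about 6 other residues" figure can be sketched as follows. For each residue, count how many *other* residues have a non-negative substitution score, then average over residues. This is a minimal sketch that assumes the matrix is given as a nested dict. The helper name `mean_conservative_partners` and the three-residue matrix are hypothetical illustrations, not real PAM250 values. Applied to a full 20 × 20 matrix, this kind of count underlies averages such as the 6.75 and 5.95 quoted above.

```python
def mean_conservative_partners(matrix):
    """Average, over residues, of the number of *other* residues whose
    substitution score is non-negative (the 'generous' criterion)."""
    residues = list(matrix)
    counts = []
    for a in residues:
        # Skip the diagonal: a residue is not a replacement for itself.
        n = sum(1 for b in residues if b != a and matrix[a][b] >= 0)
        counts.append(n)
    return sum(counts) / len(counts)


# Hypothetical toy scores (NOT real PAM250 values), symmetric like PAM matrices.
toy = {
    "A": {"A": 2, "S": 1, "W": -6},
    "S": {"A": 1, "S": 2, "W": -2},
    "W": {"A": -6, "S": -2, "W": 17},
}

avg = mean_conservative_partners(toy)
print(f"{avg:.3f}")  # prints 0.667 for this toy matrix
```

The text then divides the average (rounded to 6) by 20 to obtain the conservative share of replacement changes.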
Non-conservative locations are those where at least one of the mutations does change the encoded amino acid, and the change is 'non-conservative' (using the same criterion as above to decide on what is a non-conservative change).
By elimination, the frequency of non-conservative mutations is R = 1 - Y - C = 0.536.
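The arithmetic of this subsection can be reproduced in a few lines of Python. This is a minimal sketch that takes the count of 135 synonymous point mutations and the estimate of about 6 conservative partners out of 20 directly from the text. It does not derive them from the genetic code or from the PAM250 matrices.

```python
TOTAL = 9 * 64               # 3 positions x 3 alternative bases, for each of 64 codons
SYNONYMOUS = 135             # synonymous point mutations, as given in the text
CONSERVATIVE_SHARE = 6 / 20  # ~6 conservative partners per residue, out of 20

Y = SYNONYMOUS / TOTAL            # frequency of synonymous mutations
C = (1 - Y) * CONSERVATIVE_SHARE  # frequency of conservative mutations
R = 1 - Y - C                     # frequency of non-conservative mutations

print(f"Y = {Y:.3f}, C = {C:.3f}, R = {R:.3f}")  # prints: Y = 0.234, C = 0.230, R = 0.536
```

Using the unrounded 1 - Y = 0.765625 gives C ≈ 0.2297, which agrees with the text's 0.230 to three decimal places.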
This classification encapsulates the simplifying assumption that there is a hierarchy of effects on fitness. Intuitively, we expect that, on average, synonymous changes should have little or no effect on fitness, conservative changes should have larger (possibly beneficial) effects, and non-conservative changes should have the largest (usually deleterious) effects.
Using this classification, we now derive the frequency of each type of location in the local sequence space.
Frequency of non-conservative locations
By definition (above), non-conservative locations correspond to genotypes with one or more non-conservative mutations.
Thus, for error class E, they are locations with
- 1 non-conservative mutation in combination with (E-1) synonymous and/or conservative mutations, each of which has a frequency of $R^{1}(Y+C)^{E-1}$; plus those with
- 2 non-conservative mutations in combination with (E-2) synonymous and/or conservative mutations, each of which has a frequency of $R^{2}(Y+C)^{E-2}$; plus those with
- 3 non-conservative mutations in combination with (E-3) synonymous and/or conservative mutations, each of which has a frequency of $R^{3}(Y+C)^{E-3}$;
- and so on, up to those with all E mutations being non-conservative, which have a frequency of $R^{E}(Y+C)^{0} = R^{E}$.
In general, these locations (genotypes) have i non-conservative changes and (E-i) synonymous and/or conservative mutations, where i is between 1 and E.
Their aggregate frequency in sequence space, after accounting for the number of each specific combination (just as when we derived the frequency of mutants in an error class), is

$$\sum_{i=1}^{E} \binom{E}{i} R^{i} (Y+C)^{E-i} \qquad \text{[eqn 1]}$$
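Eqn 1 is easy to evaluate numerically. The sketch below assumes the rounded values Y = 0.234 and C = 0.230 from above. It uses Python's `math.comb` (Python 3.8+) for the binomial coefficient, and the helper name `nonconservative_frequency` is hypothetical. It also prints the closed form 1 - (Y+C)^E for comparison. Because Y + C + R = 1, the binomial theorem makes the full sum from i = 0 to E equal to 1. Eqn 1 is therefore 1 minus the i = 0 term, (Y+C)^E.

```python
from math import comb


def nonconservative_frequency(E, Y, C, R):
    """Eqn 1: sum over i = 1..E of comb(E, i) * R**i * (Y + C)**(E - i)."""
    return sum(comb(E, i) * R**i * (Y + C) ** (E - i) for i in range(1, E + 1))


# Mutation-type frequencies from the preceding section (rounded).
Y, C = 0.234, 0.230
R = 1 - Y - C

for E in range(1, 6):
    f = nonconservative_frequency(E, Y, C, R)
    closed = 1 - (Y + C) ** E  # binomial theorem, since Y + C + R = 1
    print(E, round(f, 4), round(closed, 4))
```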
Frequency of conservative locations
This is derived in the same way, except that we need to exclude the non-conservative mutations.
We use eqn 1, but with C in place of R and (Y+R) in place of (Y+C):

$$\sum_{i=1}^{E} \binom{E}{i} C^{i} (Y+R)^{E-i}$$

to get the frequency of all locations with one or more conservative mutations. Some of these also include one or more non-conservative mutations, which are already accounted for above. We therefore omit them by expanding $(Y+R)^{E-i}$ and dropping every term that contains R, to obtain the frequency of conservative locations:

$$\sum_{i=1}^{E} \binom{E}{i} C^{i} Y^{E-i}$$
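The frequency of conservative locations counts locations with at least one conservative mutation and all remaining mutations synonymous: the sum over i = 1..E of C(E, i) C^i Y^(E-i). A minimal sketch follows, again assuming Y = 0.234 and C = 0.230; the helper name `conservative_frequency` is hypothetical. By the binomial theorem the sum equals (Y+C)^E - Y^E, which the loop prints alongside for comparison.

```python
from math import comb


def conservative_frequency(E, Y, C):
    """Sum over i = 1..E of comb(E, i) * C**i * Y**(E - i):
    at least one conservative mutation, all others synonymous."""
    return sum(comb(E, i) * C**i * Y ** (E - i) for i in range(1, E + 1))


Y, C = 0.234, 0.230  # mutation-type frequencies from the text (rounded)

for E in range(1, 6):
    f = conservative_frequency(E, Y, C)
    closed = (Y + C) ** E - Y**E  # closed form via the binomial theorem
    print(E, round(f, 4), round(closed, 4))
```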
Frequency of synonymous locations
We again use eqn 1, with the appropriate variables, and omit terms containing C or R. Since all that remains is the term where all E mutations are synonymous, the equation simplifies to:
$$Y^{E}$$
for the frequency of synonymous locations.
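Every location in error class E is exactly one of the three types, so the three frequencies should sum to 1. Their closed forms are Y^E, (Y+C)^E - Y^E and 1 - (Y+C)^E. The minimal sketch below checks this numerically, assuming Y = 0.234 and C = 0.230; the helper name `location_frequencies` is hypothetical. It also shows how quickly synonymous locations become rare as E grows, leaving non-conservative locations to dominate.

```python
from math import comb


def location_frequencies(E, Y, C):
    """Return (synonymous, conservative, non-conservative) location
    frequencies for error class E, given mutation frequencies Y and C."""
    R = 1 - Y - C
    syn = Y**E
    cons = sum(comb(E, i) * C**i * Y ** (E - i) for i in range(1, E + 1))
    noncons = sum(comb(E, i) * R**i * (Y + C) ** (E - i) for i in range(1, E + 1))
    return syn, cons, noncons


Y, C = 0.234, 0.230  # mutation-type frequencies from the text (rounded)

for E in (1, 2, 5, 10):
    syn, cons, noncons = location_frequencies(E, Y, C)
    total = syn + cons + noncons  # should be 1 up to rounding error
    print(f"E={E:2d}  syn={syn:.4f}  cons={cons:.4f}  non-cons={noncons:.4f}  total={total:.4f}")
```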