Calculation of the Identification
The basic principles of the numerical identification of bacteria have been described by Lapage et al. (1973). The principle is illustrated in the following example of a small matrix with 3 species and 4 biochemical reactions.
Matrix Values
Biochemical Reactions | 1 | 2 | 3 | 4 |
Species 1 | 0.91 | 0.99 | 0.77 | 0.34 |
Species 2 | 0.99 | 0.01 | 0.98 | 0.02 |
Species 3 | 0.01 | 0.01 | 0.99 | 0.11 |
The reaction pattern of the organism to be identified is compared with each line of the matrix. Each line represents a genus, a species, or a biogroup of a species.
To calculate the identification probability (likelihood) for each species, the matrix values of all reactions in that species' line are multiplied together. For a positive result the matrix value itself is used. For a negative result the negative probability is used, which is the difference obtained when the matrix value is subtracted from 1.00. In the example that follows, the test organism is positive in reactions 1 and 3 and negative in reactions 2 and 4.
Species 1 | 0.91 x (1-0.99) x 0.77 x (1-0.34) = 0.0046 |
Species 2 | 0.99 x (1-0.01) x 0.98 x (1-0.02) = 0.9413 |
Species 3 | 0.01 x (1-0.01) x 0.99 x (1-0.11) = 0.0087 |
Sum | 0.9546 |
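The multiplication above can be sketched in a few lines of Python. This is only an illustration, not the program's actual implementation: the names `MATRIX`, `PATTERN` and `likelihood` are hypothetical, and the reaction pattern + - + - is inferred from the factors used in the calculation.

```python
from math import prod

# Matrix values (probability of a positive result), taken from the example.
MATRIX = {
    "Species 1": [0.91, 0.99, 0.77, 0.34],
    "Species 2": [0.99, 0.01, 0.98, 0.02],
    "Species 3": [0.01, 0.01, 0.99, 0.11],
}

# Assumed reaction pattern of the test organism: + - + -
PATTERN = [True, False, True, False]


def likelihood(row, pattern):
    """Multiply p for each positive result and (1 - p) for each negative one."""
    return prod(p if positive else 1.0 - p for p, positive in zip(row, pattern))


likelihoods = {name: likelihood(row, PATTERN) for name, row in MATRIX.items()}
total = sum(likelihoods.values())

for name, value in likelihoods.items():
    print(f"{name}: {value:.4f}")
print(f"sum = {total:.4f}")
```

Rounded to four decimals, the printed values should match the figures in the table above.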
To relate the likelihood of one species to the likelihoods of all the other species, the likelihoods are ‘normalized’, i.e., the sum over all likelihoods is set to 100 %.
To calculate the Normalized Likelihood (NL) for a species, its likelihood is divided by the sum of the likelihoods of all species in the database and multiplied by 100. The result is expressed as a percentage.
Species 1 | 0.0046 / 0.9546 x 100 = 0.48 % |
Species 2 | 0.9413 / 0.9546 x 100 = 98.61 % |
Species 3 | 0.0087 / 0.9546 x 100 = 0.91 % |
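As a rough sketch of the normalization step, the following Python snippet starts from the rounded likelihoods of the example and picks the species with the highest NL. The function name `normalize` and the variable names are hypothetical and chosen only for this illustration.

```python
def normalize(likelihoods):
    """Express each likelihood as a percentage of the sum over all species."""
    total = sum(likelihoods.values())
    return {name: 100.0 * value / total for name, value in likelihoods.items()}


# Rounded likelihoods from the worked example.
example = {"Species 1": 0.0046, "Species 2": 0.9413, "Species 3": 0.0087}

nl = normalize(example)
best = max(nl, key=nl.get)  # species with the highest Normalized Likelihood

for name, value in nl.items():
    print(f"{name}: {value:.2f} %")
print("best match:", best)
```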
The unknown strain is assigned to the species with the highest Normalized Likelihood in the matrix. In this example the strain is identified as species 2 with a Normalized Likelihood of 98.61 %.
The value of the Normalized Likelihood indicates how similar the reaction pattern of the test organism is to the reaction patterns of the species in the matrix. However, even a very atypical reaction pattern may be much more similar to one matrix row than to all other rows and thus receive a high Normalized Likelihood, which can result in a false identification. To measure how well the test organism actually matches the reaction pattern in the matrix, the Modal Likelihood Fraction (MLF) is given as an additional criterion for the quality of the identification.
Whereas the Normalized Likelihood represents the relative similarity of the unknown organism to each species of the matrix, the Modal Likelihood Fraction expresses the absolute degree of agreement with a species in the matrix.
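The maximum possible likelihood of a species is the likelihood of its most typical reaction pattern: for each reaction, the larger of the matrix value and its complement (1 minus the matrix value) is used. It is calculated as follows: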
Species 1 | 0.91 x 0.99 x 0.77 x (1-0.34) = 0.4578 |
Species 2 | 0.99 x (1-0.01) x 0.98 x (1-0.02) = 0.9413 |
Species 3 | (1-0.01) x (1-0.01) x 0.99 x (1-0.11) = 0.8636 |
The Modal Likelihood Fraction is calculated by dividing the likelihood actually obtained by the maximum possible likelihood.
Species 1 | 0.0046 / 0.4578 ≈ 0.010 | MLF ≈ 10⁻² |
Species 2 | 0.9413 / 0.9413 = 1.0 | MLF = 1 |
Species 3 | 0.0087 / 0.8636 ≈ 0.010 | MLF ≈ 10⁻² |
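The two steps (maximum possible likelihood, then the ratio) might be sketched in Python as follows. The names `max_likelihood` and `modal_likelihood_fraction` are hypothetical, and the reaction pattern + - + - is again inferred from the example calculations rather than stated in the text.

```python
from math import prod

# Matrix values (probability of a positive result), taken from the example.
MATRIX = {
    "Species 1": [0.91, 0.99, 0.77, 0.34],
    "Species 2": [0.99, 0.01, 0.98, 0.02],
    "Species 3": [0.01, 0.01, 0.99, 0.11],
}
PATTERN = [True, False, True, False]  # assumed test pattern: + - + -


def likelihood(row, pattern):
    """Likelihood of the observed pattern for one matrix row."""
    return prod(p if positive else 1.0 - p for p, positive in zip(row, pattern))


def max_likelihood(row):
    """Likelihood of the most typical pattern: the larger of p and 1 - p per reaction."""
    return prod(max(p, 1.0 - p) for p in row)


def modal_likelihood_fraction(row, pattern):
    """Observed likelihood divided by the maximum possible likelihood."""
    return likelihood(row, pattern) / max_likelihood(row)


mlf = {name: modal_likelihood_fraction(row, PATTERN) for name, row in MATRIX.items()}
for name, value in mlf.items():
    print(f"{name}: MLF = {value:.4f}")
```

Because the MLF is a ratio within a single row, it does not depend on which other species are in the matrix, unlike the Normalized Likelihood.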
The example shows that species 2 obtains a Modal Likelihood Fraction of 1.0 because all reactions agree with the matrix row. Species 1 and 3, each with one ‘atypical’ reaction in comparison to the test organism, yield MLF values of approximately 10⁻². Deviations in reactions whose matrix values tend to be ‘variable’ lower the MLF less strongly.
The magnitude of the MLF is therefore a measure of the extent of the deviation from the typical reaction pattern. The MLF is 1 if all reactions agree, and it can become very small if many reactions differ.
The example below illustrates that an organism can be identified with a good Normalized Likelihood, although two out of four reactions differ.
Reactions (matrix values in %) | 1 | 2 | 3 | 4 |
Species A | 99 | 99 | 99 | 99 |
Species B | 1 | 99 | 1 | 1 |
Species C | 1 | 1 | 99 | 1 |
Test strain | + | - | - | + |
The Modal Likelihood Fraction allows these differences to be recognized in that it assumes a small value.
Calculations | Likelihood | NL (%) | Max. likelihood | MLF |
Species A | 0.00010 | 98.01 | 0.96060 | 10⁻⁴ |
Species B | 0.00000099 | 0.99 | 0.96060 | 10⁻⁶ |
Species C | 0.00000099 | 0.99 | 0.96060 | 10⁻⁶ |
The test strain would be identified as species A with a Normalized Likelihood of 98.01 %, which would be rated as ‘Good Identification’. The above calculation of the MLF, however, indicates a poor identification, showing that 2 out of 4 reactions are not in agreement. The MLF therefore provides an internal check on the accuracy of the identification made by the Normalized Likelihood method.
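To make the contrast between NL and MLF concrete, the second example can be recomputed with a short Python sketch. The function `identify` and the data layout are hypothetical. The matrix values are read from the table as percentages and the test pattern as + - - +. The last digit of a result may differ from the table because of rounding.

```python
from math import prod

# Matrix values in percent positive, read from the second example table.
MATRIX_PERCENT = {
    "Species A": [99, 99, 99, 99],
    "Species B": [1, 99, 1, 1],
    "Species C": [1, 1, 99, 1],
}
TEST_STRAIN = "+--+"  # results for reactions 1 to 4


def identify(matrix_percent, pattern):
    """Return likelihood, maximum likelihood, MLF and NL for every species."""
    results = {}
    for name, row in matrix_percent.items():
        probs = [value / 100.0 for value in row]
        lik = prod(p if sign == "+" else 1.0 - p for p, sign in zip(probs, pattern))
        max_lik = prod(max(p, 1.0 - p) for p in probs)
        results[name] = {
            "likelihood": lik,
            "max_likelihood": max_lik,
            "mlf": lik / max_lik,
        }
    total = sum(entry["likelihood"] for entry in results.values())
    for entry in results.values():
        entry["nl"] = 100.0 * entry["likelihood"] / total
    return results


results = identify(MATRIX_PERCENT, TEST_STRAIN)
for name, entry in results.items():
    print(f"{name}: NL = {entry['nl']:.2f} %, MLF = {entry['mlf']:.1e}")
```

Species A receives by far the highest NL, yet its MLF is on the order of 10⁻⁴, which flags the two disagreeing reactions.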