The Journal of Biological Physics and Chemistry


Volume 5, Number 4, pp. 121-128

The theoretical basis of universal identification systems for bacteria and viruses

S. Chumakov 1, C. Belapurkar 1, C. Putonti 1, T.-B. Li 1, B.M. Pettitt 1,2,3, G.E. Fox 3,4, R.C. Willson 3,4 and Yu. Fofanov 1,2

1 Department of Computer Science, University of Houston, Houston, TX 77204, USA
2 Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA
3 Department of Chemistry, University of Houston, Houston, TX 77204, USA
4 Department of Chemical Engineering, University of Houston, Houston, TX 77204, USA

It is shown that the presence/absence pattern of 1000 random oligomers of length 12–13 in a bacterial genome is sufficiently characteristic to readily and unambiguously distinguish any known bacterial genome from any other. Even genomes of extremely closely-related organisms, such as strains of the same species, can be thus distinguished. One evident way to implement this approach in a practical assay is with hybridization arrays. It is envisioned that a single universal array can be readily designed that would allow identification of any bacterium that appears in a database of known patterns. We performed in silico experiments to test this idea. Calculations utilizing 105 publicly-available completely-sequenced microbial genomes allowed us to determine appropriate values of the test oligonucleotide length, n, and the number of probe sequences. Randomly chosen n-mers with a constant G + C content were used to form an in silico array and verify (a) how many n-mers from each genome would hybridize on this chip, and (b) how different the fingerprints of different genomes would be. With the appropriate choice of random oligomer length, the same approach can also be used to identify viral or eukaryotic genomes.

Keywords: microbe identification, oligonucleotide microarray fingerprinting, species identification
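
The in silico experiment summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`random_probes`, `fingerprint`, `hamming`) are hypothetical, hybridization is approximated by exact substring matching on a single strand (reverse complements are ignored), and the toy sequences and small parameter values stand in for real genomes and the 1000 probes of length 12–13 discussed above.

```python
import random


def random_probes(count, n, gc_count, rng):
    """Draw `count` distinct random n-mers, each with exactly `gc_count` G/C bases.

    Assumption: "constant G + C content" is modelled as a fixed number of
    G/C positions per probe.
    """
    probes = set()
    while len(probes) < count:
        bases = [rng.choice("GC") for _ in range(gc_count)]
        bases += [rng.choice("AT") for _ in range(n - gc_count)]
        rng.shuffle(bases)
        probes.add("".join(bases))
    return sorted(probes)


def genome_nmers(sequence, n):
    """Collect every n-mer that occurs in the sequence (one strand only)."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def fingerprint(sequence, probes):
    """Presence/absence pattern: 1 if the probe occurs in the sequence, else 0.

    Assumption: an exact match is used as a stand-in for hybridization.
    """
    present = genome_nmers(sequence, len(probes[0]))
    return [1 if probe in present else 0 for probe in probes]


def hamming(a, b):
    """Number of probe positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy parameters: short probes and short random "genomes" for illustration.
    probes = random_probes(count=50, n=6, gc_count=3, rng=rng)
    genome_a = "".join(rng.choice("ACGT") for _ in range(3000))
    genome_b = "".join(rng.choice("ACGT") for _ in range(3000))
    fp_a = fingerprint(genome_a, probes)
    fp_b = fingerprint(genome_b, probes)
    print("probes hit in A:", sum(fp_a))
    print("probes hit in B:", sum(fp_b))
    print("fingerprint distance:", hamming(fp_a, fp_b))
```

The two printed quantities correspond to questions (a) and (b) of the abstract: how many probes are present in each genome, and how far apart the resulting fingerprints are. A real study would need to choose the probe length so that a useful fraction of probes is present in a typical genome, which the toy values here do not attempt to reproduce.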
