Cloning from a Gene Database: Bioinformatics
The most modern approach to gene cloning combines the “traditional” techniques described in the previous pages with advances in robotics and computer database analysis. Assembling a laboratory capable of this brute-force approach to gene identification requires a substantial up-front investment. However, the facility and the genetic data it generates can be put to many uses.
The robotics component of database cloning can speed up traditional hands-on procedures several hundredfold. By combining standard cloning technology with robotics, a gene cloner can build a 20,000-clone cDNA library and then sequence the cDNA insert in each clone. The library is usually specific to one tissue and developmental stage. The robotics are used to obtain a partial DNA sequence, or “tag,” from each clone, so these collections are called expressed sequence tag (EST) libraries. The robotic assembly line relies on PCR-based DNA sequencing. The procedure is outlined below:
1) mRNA specific to a given tissue and physiological stage is isolated.
2) The mRNA is used as a template to make cDNAs, which are cloned into a vector and introduced into bacteria.
3) The cDNAs are isolated from bacterial colonies, copied, and sequenced using automated PCR-based DNA sequencing techniques.
4) The DNA sequence from each clone is directly read into a computer.
5) Sequencing and data entry continue until the computer recognizes that the most recently sequenced cDNAs come from messages that have already been sampled and sequenced. (Remember that the cDNAs represent the messages present in a given cell type, and some mRNAs are present in high copy number.) Thousands of cDNAs may be sequenced.
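The stopping rule in step 5 can be sketched in Python. This is a minimal illustration under stated assumptions: each read identifies its message by exact string match (real EST pipelines cluster reads by sequence similarity instead), and the `window` and `min_new_fraction` parameters are hypothetical values chosen for illustration, not a published criterion. The simulated library in the usage section is invented data.

```python
import random
from collections import deque
from typing import Iterable, List


def sequence_until_redundant(reads: Iterable[str], window: int = 50,
                             min_new_fraction: float = 0.05) -> List[str]:
    """Consume reads until recent ones are mostly from already-seen messages.

    Assumption: a read "matches" a message only if the strings are identical.
    Returns the distinct reads in the order they were first seen.
    """
    seen = set()
    unique: List[str] = []
    recent = deque(maxlen=window)  # True where a read came from an unseen message
    for read in reads:
        is_new = read not in seen
        if is_new:
            seen.add(read)
            unique.append(read)
        recent.append(is_new)
        # Hypothetical rule: stop once a full window of reads is mostly redundant.
        if len(recent) == window and sum(recent) / window < min_new_fraction:
            break
    return unique


if __name__ == "__main__":
    # Invented library: a few abundant messages and many rare ones.
    rng = random.Random(0)
    messages = [f"msg{i}" for i in range(200)]
    weights = [1 / (i + 1) for i in range(200)]
    reads = (rng.choices(messages, weights)[0] for _ in range(20_000))
    print(len(sequence_until_redundant(reads)), "distinct messages sampled")
```

Passing an iterator means sequencing stops early: reads after the stopping point are never consumed.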
Now the library can be accessed and analyzed electronically. If a particular gene clone is wanted and a similar gene has been cloned and sequenced previously, the computer can compare all or part of the known sequence against the sequence database. Clones that provide a “matching” sequence (i.e., they share overall homology or stretches of homology with the known gene) are identified. The match may be based on the DNA sequence or on the amino acid sequence that the DNA would encode. The clones are stored in microtiter plates, so the well that holds the desired clone can be accessed to obtain copies of its DNA.
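The database comparison described above can be approximated with a local alignment search. The sketch below assumes the EST library is held in memory as a dictionary mapping a microtiter-plate well label (hypothetical labels such as "A01") to that clone's partial cDNA sequence, and it scores every clone against a query with a plain Smith-Waterman local alignment. The scoring values (match +2, mismatch -1, gap -2) and the `min_score` cutoff are illustrative assumptions. Production databases use fast heuristic tools such as BLAST, and this version compares DNA only, not the encoded amino acid sequences.

```python
from typing import Dict, List, Tuple


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0, so alignments can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def find_matching_clones(query: str, library: Dict[str, str],
                         min_score: int = 20) -> List[Tuple[str, int]]:
    """Return (well, score) for clones scoring at least min_score, best first."""
    hits = [(well, local_alignment_score(query.upper(), seq.upper()))
            for well, seq in library.items()]
    return sorted((h for h in hits if h[1] >= min_score),
                  key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-clone library keyed by plate well.
    library = {"A01": "GGGATGGCCAAGCTTGAATTT", "B02": "CCCCCCCCCCCCCCCCCCCC"}
    print(find_matching_clones("ATGGCCAAGCTTGAA", library))  # → [('A01', 30)]
```

The well label returned for each hit plays the role of the plate position a technician would use to retrieve the clone.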
Sometimes this approach can be used to assign possible functions to newly identified genes. It takes advantage of computer programs that identify sequence motifs in a cloned gene. A motif is a short DNA sequence (or the amino acid sequence it encodes) that is found in genes encoding proteins with a common function. Examples include the DNA-binding regions of regulatory proteins and the catalytic regions shared by classes of enzymes.
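Motif scanning can be sketched in Python by translating a cDNA sequence in the three forward reading frames and searching each translation with a regular expression. As an illustrative motif, the sketch uses the P-loop (Walker A) nucleotide-binding motif found in many ATPases and GTPases, commonly written [AG]xxxxGK[ST]. The function names are hypothetical, only the standard genetic code is used, and the reverse strand is not scanned. Dedicated motif resources such as PROSITE and Pfam use richer patterns and profiles.

```python
import re
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Illustrative motif: P-loop (Walker A), [AG]-x(4)-G-K-[ST].
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def translate(dna: str) -> str:
    """Translate DNA codon by codon; unrecognized codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_motifs(dna: str, motif: re.Pattern = P_LOOP) -> List[Tuple[int, int, str]]:
    """Return (frame, amino acid position, matched peptide) for forward frames 0-2."""
    hits = []
    for frame in range(3):
        protein = translate(dna[frame:])
        for m in motif.finditer(protein):
            hits.append((frame, m.start(), m.group()))
    return hits


if __name__ == "__main__":
    # Hypothetical cDNA fragment encoding M-G-A-G-G-V-G-K-S.
    est = "ATGGGTGCTGGTGGTGTTGGTAAATCT"
    print(find_motifs(est))  # → [(0, 1, 'GAGGVGKS')]
```

A hit only suggests a possible function; confirming it still requires experimental work on the encoded protein.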
A gene cloning and sequencing lab with the high-throughput and analysis capabilities described above often serves as a centralized source of cloned genes. Researchers can access the database from their own computers, run the sequence comparisons they need, and, if a library contains a clone with an interesting match, request a copy. Large plant and animal biotechnology companies have invested money and personnel in such labs to increase the rate at which genes are discovered and isolated, and public funding has established centralized genome centers in the public sector as well. The knowledge accumulated through mapping, cloning, and sequencing genes has rapidly changed the state of the art in gene discovery, and the analytical power of combining and connecting this information will continue to change it.