Feb. 13, 2012
High-throughput DNA sequencing technologies are leading to a revolution in how clinicians diagnose and treat cancer. The molecular profiles of individual tumors are beginning to be used in the design of chemotherapeutic programs optimized for the treatment of individual patients. The real revolution, however, is coming with the emerging capability to inexpensively and accurately sequence the entire genome of cancers, allowing for the identification of specific mutations responsible for the disease in individual patients.
There is only one downside. Those sequencing technologies provide massive amounts of data that are not easily processed and translated by scientists. That’s why Georgia Tech has created a new data analysis algorithm that quickly transforms complex RNA sequence data into usable content for biologists and clinicians. The RNA-Seq analysis pipeline (R-SAP) was developed by School of Biology Professor John McDonald and Ph.D. Bioinformatics candidate Vinay Mittal. Details of the pipeline are published in the journal Nucleic Acids Research.
“A major bottleneck in the realization of the dream of personalized medicine is no longer technological. It’s computational,” said McDonald, director of Georgia Tech’s newly created Integrated Cancer Research Center. “R-SAP follows a hierarchical decision-making procedure to accurately characterize various classes of gene transcripts in cancer samples.”
There are at least 23,000 pieces of RNA in the human genome that encode the sequence of proteins. Millions of other pieces help regulate the production of proteins. R-SAP is able to quickly determine every gene’s level of RNA expression and provide information about splice variants, biomarkers and chimeric RNAs. Biologists and clinicians will be able to more readily use this data to compare the RNA profiles or “transcriptomes” of normal cells with those of individual cancers and thereby be in a better position to develop optimized personal therapies.
Personalized approaches to cancer medicine are already in widespread use for a few “cancer biomarkers,” including variants of the BRCA1 gene that can be used to identify women with a high risk of developing breast and ovarian cancer.
“Our goal was to design a pipeline that is easily installable with parallel processing capabilities,” said Mittal. “R-SAP can make 100 million reads in just 90 minutes. Running the program simultaneously on multiple CPUs can further decrease that time.”
R-SAP is open source software, freely accessible at the McDonald Lab website.
“This is another example of Georgia Tech’s ability to merge computer technology with science to create an essential feature of next-generation bioinformatics tools,” said McDonald. “We hope that R-SAP will be a useful and user-friendly instrument for scientists and clinicians in the field of cancer biology.”
News Contact
Jason Maderer
Georgia Tech Media Relations
404-385-2966
maderer@gatech.edu