Synthetic biology: creating biological resources from information resources
20 August 2010
Information engineering at RIKEN extends the evolution of life into the digital realm
RIKEN Bioinformatics and Systems Engineering Division (BASE)
Databases are becoming increasingly important in the life sciences as a key tool for deriving results. The Bioinformatics and Systems Engineering Division (BASE) is drawing worldwide attention for its SciNeS information infrastructure for handling the vast amounts of data generated through routine research in the life sciences. “A database is not merely a container for data, it is also a place where even life can evolve,” says Tetsuro Toyoda, director of the BASE. “We can create useful biological resources from information resources by selecting useful genes from databases, designing new genomes, and returning them to the world of living organisms. What databases are needed to realize rational organism design? That is the question we attempt to answer.” To inspire creative use of databases for genome design, the BASE is holding its first International Rational Genome Design Contest.
Toward an age of genome design
Toyoda believes that in the future, our key source of resources will shift from oilfields to genomes. For this reason, he calls genomes the “second oilfields”. In the 1990s, Toyoda worked for a private research institute to develop anti-malarial drugs. “In drug development, the key is how to design compound shapes so that the compound can combine perfectly with disease-related proteins and control their functions. The structures of the proteins, however, are so complex that trial-and-error-based design approaches often end in failure. ‘Rational’ design is therefore needed, which involves creating logical programs and designing drugs on the basis of the computed three-dimensional structure of the relevant proteins.” However, even if a compound is designed perfectly, the compound may still not be creatable using the techniques currently available in organic chemistry. This is one of the technological barriers Toyoda has encountered.
In the 2000s, scientists embarked on the sequencing of genomes from a range of organisms, including humans. “The genome carries genetic information in the arrangement of four bases in the gene region, and proteins are produced according to this arrangement. If the arrangement can be ‘designed’, we may be able to design organisms with new functions more reliably. The information resources necessary for such an exercise are now becoming available,” says Toyoda.
Figure 1. Omics-driven evolution expands the cycle of gene evolution to the information realm.
Life is the propagation of genetic information, which evolves in the physical world by repeated replication and selection. The omics technology revolution has enabled the same replication and selection in databases. Useful biological resources can be created by selecting useful genes from the information world and returning them to the world of living organisms under proper safety control standards.
The concept of rational design has also recently begun to draw attention in the field of medicine, where medical scientists are working toward a personalized medicine approach in which drugs and treatments are designed according to the genetics of the patient. Database-supported rational approaches to design are therefore finding applications in various fields and beginning to displace the previous ‘blind’ approaches.
Believing that the age of rational design would soon come to genomics, Toyoda initiated the Genomic Knowledge Base Research Team in the RIKEN Genomic Sciences Center (GSC) in 2001. Around that time, the GSC was working on ambitious genomics projects under the leadership of Akiyoshi Wada, the first director of the GSC. The projects were generating vast amounts of data with the aim of establishing complete databases for certain organisms and thereby constructing a comprehensive picture of each organism. Toyoda saw these databases as a place in which the evolution of new life could occur.
Databases as a realm for the evolution of life
Organisms evolve through the repeated process of replication of genetic information and natural selection. Genetic information is naturally recorded in the structure of DNA and RNA, but the same information can now be recorded in databases (Fig. 1). “Recording media have expanded from the natural physical world to the information world. We are at the point where the medium of life evolution has changed significantly.” Genetic information recorded in databases has been replicated on a global scale through the internet, allowing useful gene information to be plucked from a database. On this basis, could it be possible to create new and useful biological resources by using selected gene information to rationally design a genome that could then be transferred to an organism using DNA-synthesis technology? “A database can be regarded as a place for the replication and selection of genetic information—or a place where the evolution of life occurs,” says Toyoda.
Toyoda’s ideas are not just dreams—life is actually beginning to evolve with the help of databases (Fig. 2). “We selected the genetic information of groups of enzymes that produce the sticky paste of fermented soybeans called γPGA from a database, and rationally designed the genetic information, which was then transferred to the genome of a plant, thus successfully creating a new plant that is drought tolerant.” Induced pluripotent stem (iPS) cells, which are now expected to be applied in regenerative medicine, are also created using the same concept. The original creation of iPS cells was based on the complete record for a particular cDNA, a DNA created from a template mRNA into which a gene region of DNA is transcribed, stored in a RIKEN database. Shinya Yamanaka and his laboratory staff at Kyoto University used the cDNA record to select special genes expressed in embryonic stem (ES) cells and transferred those special genes into grown human skin cells. This process resulted in iPS cells, which, like ES cells, are able to differentiate into any cell type coded for in the organism’s genome.
Figure 2. Rational design of drugs, medicines and genomes on the basis of logically created programs.
Based on the rational design of genomes, Toyoda transferred the genes for three groups of enzymes that can produce γPGA (the sticky paste of fermented soybeans) into the genome of Arabidopsis thaliana, a plant often used in experiments. γPGA-transferred strains absorb water more efficiently, resulting in a higher survival rate.
“Designing a database is equivalent to designing a place suitable for the evolution of life. To create new biological resources that can support the Earth and society from information resources, we need databases, and we will find it very interesting if a database is designed from the perspective that it is a place for the evolution of life.”
Establishing a data-sharing infrastructure—a global issue
The BASE was inaugurated in April 2008 following a reorganization of the GSC based on advice that came out of the 2006 RIKEN Advisory Council (RAC) meeting. The RAC is an external advisory body consisting of world-leading scientists and eminent individuals from outside of RIKEN. The RAC evaluates the overall activities of RIKEN and delivers its recommendations to the RIKEN president. “In 2006, the RAC pointed out that although high-quality data were provided in each of the 100 or so data-release web sites operated by RIKEN, the data were not presented in an effective way. Everybody was really surprised because they believed their data were disclosed properly.”
Most data-release web sites operated by RIKEN were designed for people who wanted to view data directly; programs could not connect to the databases to perform automated data analysis. In that regard, the databases were not being used effectively because there was no system for standardizing and sharing the various data sets. The sites were also inadequate as a means of displaying study results. These were the problems that the RAC identified and asked database experts to address, but they were also problems common to databases around the world. “The RAC asked RIKEN to solve a problem that had not been solved before, and I happened to be selected as the director responsible for solving the database-related problems.”
At that time, most data-release web sites failed to keep pace with fast-changing web standards. As the number of disorganized web sites increased, information management was quickly spiraling out of control. Database maintenance costs were also becoming a heavy burden. “We needed to integrate our databases, but that was the most difficult issue,” says Toyoda. “I have seen many failures with respect to integration approaches. There are already hundreds of databases, and it is impossible to standardize all of them. So I adopted a new concept and started to develop an integration database consisting of a versatile database container that is compatible with international standards. This container automatically enables the standardization, collection and disclosure of data, and also facilitates data sharing when data is moved into it.”
Figure 3. Total Incubation Center for Databases (SciNeS).
SciNeS provides incubation functions from database construction to the integration of databases in computing clouds or a group of large-scale servers, and discloses databases using interfaces compatible with international standards, thus contributing to the establishment of cyber-infrastructure for integrating worldwide databases.
SciNeS captures the world’s attention
Toyoda started by developing a ‘total incubation infrastructure system’ for life science-related databases called the Scientists’ Networking System, or SciNeS (Fig. 3). “The greatest advantage of SciNeS is the adoption of the semantic web, a next-generation international web standard, and cloud computing.”
The semantic web is an extension of the widely used World Wide Web (WWW). The WWW is suited to people, who read, understand and search for information by following hyperlinks between documents. Automated computer-based data analysis, however, works poorly with plain hyperlinks because a hyperlink does not state how the linked documents are related. In the semantic web, all data has meaning and every link names the relationship between the data it connects, enabling computers to search data effectively for automated data analysis.
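The difference between an untyped hyperlink and a named relationship can be illustrated with a minimal sketch. It assumes a toy in-memory store of (subject, predicate, object) triples; the identifiers (`gene:geneA`, `encodes`, `produces` and so on) are hypothetical and are not taken from SciNeS, and a real semantic-web system would use RDF and a query language such as SPARQL rather than this hand-written matcher.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical example data: every link carries a named relationship.
TRIPLES = [
    ("gene:geneA", "encodes", "protein:enzymeA"),
    ("protein:enzymeA", "produces", "compound:gammaPGA"),
    ("gene:geneB", "encodes", "protein:enzymeB"),
    ("protein:enzymeB", "produces", "compound:other"),
]


def match(store: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for triple in store:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def genes_producing(store: Iterable[Triple], compound: str) -> list:
    """Follow two typed links: gene -encodes-> protein -produces-> compound."""
    store = list(store)
    proteins = {subj for subj, _, _ in match(store, p="produces", o=compound)}
    return sorted(subj for subj, _, obj in match(store, p="encodes")
                  if obj in proteins)


if __name__ == "__main__":
    print(genes_producing(TRIPLES, "compound:gammaPGA"))
```

Because each link states what it means, a program can answer "which genes lead to this compound?" by chaining relationships, without a person reading the linked pages.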
Cloud computing is a complementary technology that provides a new way of using computer applications through web browsers. Researchers do not need to maintain their own servers; they instead prepare a virtual laboratory in SciNeS and enter their data, which are then processed automatically and disclosed as a database that meets international standards. “Papers are published through the medium of academic journals, but no dedicated medium has been established in the world of databases. SciNeS thus became the world’s first academic medium for databases,” says Toyoda.
As soon as SciNeS was made operational in March 2009, it attracted attention from around the world. “The semantic web has been known for many years, but building large-scale databases for the semantic web was said to be difficult. We succeeded in building such large-scale databases for the first time by adding a new function that enabled security management on a per-item basis.” Database sharing and the financing of maintenance costs are universal issues, but each research body conducts its own activities and deals with its own field-specific characteristic data, so there has been little organization until now. “The world’s attention is now focusing on SciNeS because it is a total incubation infrastructure system for databases that can be used by all fields based on the semantic web and cloud computing.”
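The article does not describe how SciNeS implements per-item security, so the following is only a rough sketch of the general idea: each data item carries its own set of permitted reader groups, and every query is filtered against the groups of the person asking. The class `Item`, the function `visible_items` and the group names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Item:
    """One data item with its own access list (hypothetical structure)."""
    subject: str
    predicate: str
    value: str
    readers: FrozenSet[str] = frozenset({"public"})


def visible_items(items: Iterable[Item], user_groups: Iterable[str]) -> List[Item]:
    """Return only the items that at least one of the user's groups may read."""
    groups = set(user_groups) | {"public"}  # everyone belongs to "public"
    return [item for item in items if item.readers & groups]


if __name__ == "__main__":
    items = [
        Item("plant:line1", "survival_rate", "published value"),
        Item("plant:line1", "lab_note", "unreported observation",
             readers=frozenset({"lab_a"})),
    ]
    print(len(visible_items(items, [])))         # anonymous visitor
    print(len(visible_items(items, ["lab_a"])))  # member of the owning lab
```

Attaching the access list to the item rather than to a whole database is what would let published and unreported data sit side by side in one shared store.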
International Rational Genome Design Contest
The virtual laboratories in SciNeS can be used for many purposes: as a substitute for personal databases, a repository for electronic laboratory notes of unreported data, or for joint research or ‘medical clouds’—an electronic health chart network among medical specialists and clinicians. Such uses are supported by the per-item security function. Another use for SciNeS virtual laboratories is the International Rational Genome Design Contest, or GenoCon. “RoboCon is a well-known robot contest in which individually developed robots compete on the basis of excellence in certain skills. GenoCon is the life-science version, where researchers are expected to compete on the excellence of their rational skills in designing genome base sequences.”
GenoCon has been running since the end of May 2010 and will continue through to the end of September 2010. The assignment is to design a DNA sequence that gives the model organism Arabidopsis thaliana the ability to effectively eliminate and detoxify airborne formaldehyde, a cause of sick building syndrome. Participants need to take advantage of the genome and protein databases in SciNeS to find out which genes should be optimized to enhance this ability. They also need to write programs via a web browser in order to rationally design part of the genome. The best designs will be used by RIKEN and other research institutes and actually transferred into a plant for functional verification under proper statutory safety control standards. The invitation to participate has been extended not only to researchers and university students in Japan and around the world, but also to high-school students. “I will be pleased if GenoCon can give high-school students with good programming skills the opportunity to become interested in life science and join the world of life sciences to become ‘genome designers’. Many useful genes have been patented, but all current patents will expire by 2030, and this will bring about a genome design boom. Genome designer will become a glamorous job in the near future.”
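As a rough illustration of what programming a sequence design can mean, the sketch below reverse-translates a short protein sequence into DNA, choosing one codon per amino acid. It is not GenoCon's actual programming interface, which the article does not describe. The codon table is the standard genetic code; the `PREFERRED` codon choices are placeholders for illustration, not measured Arabidopsis thaliana codon usage, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Placeholder preferences (assumption): a real design would derive these
# from codon-usage data for the target organism.
PREFERRED = {"K": "AAG", "V": "GTT", "L": "CTT"}


def synonymous_codons(amino_acid: str) -> list:
    """All codons that encode the given amino acid."""
    return [codon for codon, aa in CODON_TO_AA.items() if aa == amino_acid]


def design_cds(protein: str, preferred: dict = PREFERRED) -> str:
    """Reverse-translate a protein into DNA, one chosen codon per residue."""
    codons = []
    for aa in protein:
        options = synonymous_codons(aa)
        if not options:
            raise ValueError(f"unknown amino acid: {aa!r}")
        choice = preferred.get(aa, options[0])  # fall back to first synonym
        if choice not in options:
            raise ValueError(f"{choice} does not encode {aa}")
        codons.append(choice)
    return "".join(codons)


def translate(dna: str) -> str:
    """Translate DNA back to protein to check that the design is faithful."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    design = design_cds("MKVL")
    print(design, translate(design))
```

Translating the designed sequence back and comparing it with the input is a cheap sanity check; a realistic design would also have to weigh factors such as regulatory elements and expression levels, which this sketch ignores.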
GenoCon will provide participants with the opportunity to enjoy the most advanced science and take on an open optimization challenge. Although genome designs that give a plant the ability to eliminate and detoxify airborne formaldehyde have been published, and some even patented, there may be better embodiments of the technology. The contest aims to find embodiments that perform better and are easier to apply in practice.
Toyoda is also a member of the RIKEN Biomass Engineering Program, which was initiated in April 2010. Through the program, Toyoda aims to improve the efficiency of producing bioplastic materials based on rational genome design methods for plants. Genome design methods and programs collected through GenoCon would also be used for that purpose.
“We intend to establish infrastructure for synthetic biology,” says Toyoda. “Synthetic biology is a newly emerging field of science in which bioinformatics and biology are combined, and deals with the whole range of information and biological resources. We are now required to use the collaboration network of SciNeS to connect groups at RIKEN’s technical bases, and to establish a structure that enables the creation of useful biological resources as a social asset from information resources. To begin with, I want to create an easily grown plant that can yield environmentally friendly bioplastic materials.”