Bioinformatics and computational modeling are essential to meeting the challenges and developing the potential of synthetic biology. A new generation of computer-aided design tools for validation of synthetic networks is rapidly emerging.
Synthetic biology brings the life sciences together with engineering to design and optimize cellular behavior. By building biological systems from component parts, synthetic biology is expected to produce proteins and molecules with specifically engineered functions by designing or reprogramming genetic circuits and metabolic networks. The market potential for products resulting from synthetic biology technology is high and the impact on healthcare will be significant. Simpler, lower-cost production of biopharmaceuticals, novel drug delivery systems and effective therapies for chronic illnesses are examples. Synthetic biology also provides promising solutions to environmental problems such as sustainable, economical production of chemicals and biofuels, and bioremediation using engineered microorganisms. In order to realize this potential, partnerships and collaborations between industry and academia as well as between firms will be essential.
The engineering challenge of synthetic biology lies in the design of genetic networks that will rewire cells to produce the expected molecules in a timely way. Modeling the synthetic networks and simulating their behavior prior to their implementation is a critical step made possible by bioinformatics, which has become the indispensable computer-aided design tool of synthetic biology.
In order to perform meaningful, systems-level analyses and design viable predictive models, it is necessary to access and connect diverse data on the networks of interacting entities involved. Data integration therefore becomes essential for dealing with the complexity of these networks and extracting useful knowledge quickly. Integrated software and databases that connect and structure information on the links between genes, proteins and biochemical data – compounds, reactions and pathways – enable researchers to productively analyze their data.
Genostar’s MicroB database and Metabolic Pathway Builder (MPB) software integrate powerful, user-friendly methods for effective navigation, querying and analysis of genomes, proteomes and metabolomes. MicroB gathers genomic, protein and metabolic data extracted from multiple reference data sources including GenomeReviews (EMBL), UniProtKB and KEGG, and connects these data with the help of functional classifications such as the Gene Ontology (GO) and the enzyme classification from the ENZYME database. MicroB was developed in partnership with the Swiss Institute of Bioinformatics (SIB) and the French National Institute for Research in Computer Science and Control (INRIA).
The MPB software integrates public and proprietary methods to facilitate comparative genomics and differential metabolic analyses. Genomic sequences can be annotated or reannotated by applying CDS prediction methods that have been optimized for prokaryotic organisms. The functions of the genes can be characterized through comparison with the reference genomes and proteomes of MicroB, or by applying specific predictive methods, as is the case with enzymatic activities.
The genomic sequences of two strains of Listeria monocytogenes are compared. The conserved and specific regions have been identified. One of the strains has been reannotated and the predicted CDS are displayed on the map, together with the coding probability curve.
Dedicated visualizers allow for the interactive display of data and results and offer a real advantage to the biologist. For instance, it becomes easy to use different colors to highlight, in metabolic maps, the reactions which are known to be catalyzed in a set of organisms. Typically, this set includes the organism for which the genome has been newly sequenced or reannotated, together with other reference organisms described in MicroB. Organism-specific reactions are readily identified, either visually or through systematic queries, together with the genes which code for the enzymes. Conserved syntenies between selected organisms can also be computed and displayed.
Reactions which are catalyzed in Listeria monocytogenes, L. innocua and L. welshimeri are highlighted (respectively in red, green and yellow) in a KEGG metabolic map. A metabolic pathway appears to be complete in L. monocytogenes, and partial in L. innocua. In the genomic maps, the results of the computation of conserved syntenies are displayed as colored rectangles.
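The identification of organism-specific reactions and of complete or partial pathways can be pictured as simple set operations. The following is a minimal sketch only: the reaction identifiers, the pathway definition and the function names are hypothetical and do not reflect the actual content of MicroB or the interface of Metabolic Pathway Builder.

```python
# Hypothetical data: reaction identifiers known to be catalyzed in each organism.
# Identifiers are placeholders, not real KEGG or MicroB entries.
reactions_by_organism = {
    "L. monocytogenes": {"R1", "R2", "R3", "R4"},
    "L. innocua": {"R1", "R2"},
    "L. welshimeri": {"R1", "R3"},
}


def organism_specific(reactions, organism):
    """Return the reactions catalyzed in `organism` and in no other organism."""
    others = set().union(
        *(known for name, known in reactions.items() if name != organism)
    )
    return reactions[organism] - others


def pathway_coverage(reactions, organism, pathway):
    """Return the fraction of a pathway's reactions catalyzed in `organism`."""
    return len(reactions[organism] & pathway) / len(pathway)


# Hypothetical pathway made of four reactions.
pathway = {"R1", "R2", "R3", "R4"}

specific = organism_specific(reactions_by_organism, "L. monocytogenes")
complete = pathway_coverage(reactions_by_organism, "L. monocytogenes", pathway)
partial = pathway_coverage(reactions_by_organism, "L. innocua", pathway)
```

With these placeholder data, the pathway is fully covered in one organism and half covered in another, which is the kind of contrast the colored metabolic map makes visible at a glance.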
MicroB is a relational database and the set of queries it supports is open. However, built-in queries are offered by Metabolic Pathway Builder for the most frequent cases. It is thus straightforward to search for a set of compounds which share a chemical structure, to retrieve the metabolic reactions which involve these compounds, and then the genes which code for the associated enzymes.
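The chain of queries described above, from compounds to reactions to genes, might look as follows in a relational setting. This is an illustrative sketch using Python's standard sqlite3 module; the table names, columns and rows are assumptions made for the example and are not the MicroB schema. Sharing a chemical structure is reduced here to a shared text label, whereas a real substructure search would require dedicated cheminformatics methods.

```python
import sqlite3

# In-memory database with a hypothetical, much simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE compound (id TEXT PRIMARY KEY, name TEXT, scaffold TEXT);
    CREATE TABLE reaction_compound (reaction_id TEXT, compound_id TEXT);
    CREATE TABLE enzyme_gene (reaction_id TEXT, gene TEXT);

    INSERT INTO compound VALUES
        ('C1', 'compound A', 'purine'),
        ('C2', 'compound B', 'purine'),
        ('C3', 'compound C', 'hexose');
    INSERT INTO reaction_compound VALUES ('R1', 'C1'), ('R2', 'C2'), ('R3', 'C3');
    INSERT INTO enzyme_gene VALUES ('R1', 'geneA'), ('R2', 'geneB'), ('R3', 'geneC');
    """
)


def genes_for_scaffold(connection, scaffold):
    """Compounds sharing a scaffold -> reactions involving them -> coding genes."""
    rows = connection.execute(
        """
        SELECT DISTINCT eg.gene
        FROM compound AS c
        JOIN reaction_compound AS rc ON rc.compound_id = c.id
        JOIN enzyme_gene AS eg ON eg.reaction_id = rc.reaction_id
        WHERE c.scaffold = ?
        ORDER BY eg.gene
        """,
        (scaffold,),
    ).fetchall()
    return [gene for (gene,) in rows]


purine_genes = genes_for_scaffold(conn, "purine")
```

A built-in query plays the role of `genes_for_scaffold` here: it packages a frequent join so that the user does not have to write it by hand, while the underlying database remains open to other queries.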
When extended with Genostar's Expression Data Solution, Metabolic Pathway Builder becomes a very powerful tool for differential analysis of expression data. Statistical methods help characterize the functions of the genes and select restricted sets for further analysis. These sets of genes, together with their expression levels, can be mapped and displayed on metabolic pathways, for one or several organisms simultaneously, for a given point in time or a succession of measurements.
Expression data can be mapped on metabolic pathways and on chromosomes. The expression level is displayed as a line whose thickness is proportional to the level. It can also be displayed in the genomic map: each CDS is colored accordingly.
Exploration, querying and visualization of interacting entities are necessary steps in designing biological systems. The next step is to understand and predict their dynamics through mathematical modeling and simulation. Genostar’s Genetic Network Analyzer (GNA) software was designed to build, simulate and validate molecular regulatory network models. Many current computational modeling tools require substantial quantitative data in order to accurately simulate the dynamical system being studied. The state of the art at present, however, is such that these quantitative data are frequently unavailable. GNA addresses this problem and balances data requirements against predictive capabilities through the use of well-validated, qualitative modeling technology developed by INRIA.
Each gene is associated with a state variable which measures the expression level of the gene as the concentration of its product, be it mRNA or protein. A differential equation describes how this concentration varies as influenced by the concentrations of the products of the other genes, i.e. by the values of the other state variables. GNA uses a restricted class of differential equations, piecewise-linear differential equations, so that the behavior of these state variables can be computed qualitatively without the numerical values of the various parameters which appear in the equations. Knowing the inequalities between these parameters is sufficient for GNA to predict the behaviors of the genetic network. Of course, the trajectories are also qualitative and the behaviors are not deterministic: a trajectory is only one of all the possible paths in the graph of the successive states.
The simulation of a GNA model generates a graph of the successive states of its possible behaviors. The qualitative models are no longer deterministic: one state may have several successors. By selecting a path between an initial state and a terminal state, a particular behavior can be studied. For each gene, the evolution of the concentration of its product in time, together with the sign of its derivative, is displayed. The model-checker helps in exploring the complex state graph and checks whether a behavior can or cannot be generated by the model.
GNA also provides a powerful model-checking tool for comparing these behaviors with the experimental expression data, i.e. with the observed behaviors. Through a user-friendly interface, the model-checker tests whether a trajectory described in an adequate formalism can or cannot be produced by a GNA model (G. Batt, D. Ropers, H. de Jong, J. Geiselmann, R. Mateescu, M. Page, D. Schneider (2005), Validation of qualitative models of genetic regulatory networks by model checking: Analysis of the nutritional stress response in Escherichia coli, Bioinformatics, 21(Suppl 1):i19-i28).
GNA has been validated in numerous modeling projects. D. Ropers et al. describe and simulate the interaction network involved in the carbon starvation response of E. coli (2006, BioSystems, 84(2):124-152). GNA has been used by many research laboratories worldwide for modeling and understanding phenomena and systems, including the biochemical network underlying quorum sensing in human-pathogenic Pseudomonas aeruginosa (A. Usseglio Viretta, M. Fussenegger, 2004, Biotechnology Progress, 20(3):670-678), or the infectious transition of the bacterium E. chrysanthemi when hosted in a plant (J-A. Sepulchre, S. Reverchon, W. Nasser (2007), 'Modelling the onset of virulence in a pectinolytic bacterium,' Journal of Theoretical Biology, 244(2):239-257).
Modeling and simulation of dynamical systems are thus confirmed as powerful tools for understanding molecular networks, whether they are natural or the result of a synthetic biology design process.
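To make the idea concrete, consider a toy network of two genes, a and b, that repress each other. The sketch below is a deliberately simplified illustration and not GNA's algorithm: each concentration is abstracted to two levels (below or above its threshold), the only parameter knowledge used is the assumed inequality threshold < synthesis rate / degradation rate, and the behavior exactly at the thresholds, which a full qualitative simulator must treat, is ignored. All names are hypothetical.

```python
from itertools import product

# Toy piecewise-linear model of two mutually repressing genes a and b:
#   dx_a/dt = kappa_a * step_minus(x_b, theta_b) - gamma_a * x_a
#   dx_b/dt = kappa_b * step_minus(x_a, theta_a) - gamma_b * x_b
# where step_minus(x, theta) is 1 if x < theta and 0 otherwise.
# Assumed inequalities (no numerical values needed):
#   0 < theta_a < kappa_a / gamma_a   and   0 < theta_b < kappa_b / gamma_b
# Level 0 means "below threshold", level 1 means "above threshold".


def target(state):
    """Level each gene product tends toward in the given qualitative state."""
    a, b = state
    # A gene is expressed (tends to level 1) only while its repressor is low.
    return (1 if b == 0 else 0, 1 if a == 0 else 0)


def successors(state):
    """Qualitative successors: one variable at a time moves toward its target."""
    next_states = []
    for index, (current, goal) in enumerate(zip(state, target(state))):
        if current != goal:
            changed = list(state)
            changed[index] = goal
            next_states.append(tuple(changed))
    return next_states


# Graph of successive qualitative states for the whole model.
state_graph = {state: successors(state) for state in product((0, 1), repeat=2)}
steady_states = [state for state, nxt in state_graph.items() if not nxt]
```

In this toy model the state where both products are low has two possible successors, which shows why qualitative behaviors are not deterministic, and the two states in which one gene is on and the other off have no successors, i.e. they are steady.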
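The simplest question one can ask of a state graph is whether a given terminal state can be reached from a given initial state, and along which path. The sketch below answers it with a breadth-first search over a hypothetical graph. It is only an illustration of the principle: a model checker handles far richer properties than plain reachability, and the state names and function are assumptions made for this example.

```python
from collections import deque

# Hypothetical qualitative state graph: each state maps to its possible successors.
state_graph = {
    "S0": ["S1", "S2"],
    "S1": ["S3"],
    "S2": ["S3", "S4"],
    "S3": [],
    "S4": [],
}


def find_path(graph, start, goal):
    """Breadth-first search; return one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for successor in graph.get(path[-1], []):
            if successor not in visited:
                visited.add(successor)
                queue.append(path + [successor])
    return None


path_to_s4 = find_path(state_graph, "S0", "S4")   # a behavior the model can produce
unreachable = find_path(state_graph, "S1", "S4")  # a behavior the model cannot produce
```

If an experimentally observed sequence of states corresponds to no path in the graph, the model cannot account for the observation and has to be revised.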
Synthetic biology has tremendous potential to improve the environment and healthcare. The importance of partnerships that facilitate the sharing of knowledge necessary to realize this potential, both between firms and between industry and academia, is clear. Genostar participates in COBIOS, a European-funded project which aims to develop well-characterized, engineered, synthetic biology devices for therapeutic use, in particular for timely insulin delivery. The COBIOS project brings together biologists, mathematicians and bioinformaticians from academic institutions and private industry in four European countries and the United States to meet the goal of designing, verifying and validating synthetic genetic networks.
Designing a molecular network requires interactive tools for editing the network, generating a set of equations and simulating the resulting dynamic models. It is important to keep track of the links between the nodes and edges of the network and the biological entities and relationships they represent.
In this framework, Genostar has developed editing tools for the incremental building of molecular networks. Every node of a network is related to an object in the Metabolic Pathway Builder workspace, so that the resulting GNA model remains connected to its supporting data.
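One way to picture this link between a network and its supporting data is a structure in which every node carries a reference to a workspace object and edges can only connect nodes that have one. The sketch below is hypothetical throughout: the class names, fields and identifiers are invented for illustration and do not describe Genostar's implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkspaceObject:
    """Hypothetical stand-in for a biological entity stored in the workspace."""
    identifier: str
    kind: str  # for example "gene" or "protein"


@dataclass
class Network:
    """Network whose nodes each point back to a supporting workspace object."""
    nodes: dict = field(default_factory=dict)  # node name -> WorkspaceObject
    edges: list = field(default_factory=list)  # (source, target, interaction)

    def add_node(self, name, obj):
        self.nodes[name] = obj

    def add_edge(self, source, target, interaction):
        # Refuse edges whose endpoints are not backed by a workspace object.
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"node {name!r} has no supporting workspace object")
        self.edges.append((source, target, interaction))


# Incremental construction of a two-node network.
net = Network()
net.add_node("geneA", WorkspaceObject("obj-001", "gene"))
net.add_node("geneB", WorkspaceObject("obj-002", "gene"))
net.add_edge("geneA", "geneB", "represses")
```

Keeping the reference on the node means that any state variable in a derived model can be traced back to the sequence, annotation or expression data that justified it.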
Comparative genomic analysis, gene function characterization, expression analysis, modeling and simulation of molecular networks: the suite composed of Metabolic Pathway Builder and GNA, in connection with the MicroB reference database, prefigures the structure of efficient computer-aided design tools for synthetic biology.