In 2020, Borkowski et al.1 published a breakthrough in synthetic biology on active learning-guided optimization of cell-free systems, yet standardization and democratization remain in high demand in the field2. This motivated us to design an intelligible and universal tool for optimizing biological systems with minimal experimental data. We therefore envisioned the METIS active learning workflow as i) modular and easy to use for everyone, ii) dependent on minimal experimental data, iii) versatile enough to apply to different biological targets with numerical and/or categorical features, and iv) based on Google Colab Python notebooks that require no installation, registration, or local computation power.
Fig. 1: Overview of the METIS active learning workflow.
We started by testing different algorithms as the predictive model for active learning and found that a tree-based gradient boosting algorithm (XGBoost) outperformed the others, including the multilayer perceptrons used by Borkowski et al.1. We verified this model experimentally by optimizing the E. coli cell-free system using only one-fifth of the data size previously reported1. Having found an efficient model, we built a modular Google Colab notebook structure in which separate modules are run stepwise, with a few parameters/variables defined by users according to their application and laboratory equipment. After sharing the notebooks with several colleagues, we incorporated numerous options into the workflow to make it as versatile as we could envision for different applications, biological targets, and laboratory equipment. We then sought to demonstrate the versatility of METIS by using it to optimize different biological targets.
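The iterative loop described above (train a model on the tested conditions, score the untested ones, test the most promising batch, repeat) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the METIS implementation: to stay dependency-free, a bootstrapped k-nearest-neighbour ensemble is swapped in for XGBoost, the score "mean + exploration × standard deviation" is an assumed selection rule, and `simulated_yield` is a hypothetical stand-in for a wet-lab measurement.

```python
import itertools
import random
import statistics


def simulated_yield(cond):
    """Hypothetical stand-in for an experiment (not real data); peak at (3, 1, 4)."""
    a, b, c = cond
    return 10.0 - (a - 3) ** 2 - (b - 1) ** 2 - 0.5 * (c - 4) ** 2


def knn_predict(train, query, k=3):
    """Average the yields of the k tested conditions closest to `query`."""
    nearest = sorted(
        train,
        key=lambda item: sum((p - q) ** 2 for p, q in zip(item[0], query)),
    )[:k]
    return statistics.fmean([y for _, y in nearest])


def ensemble_predict(train, query, rng, n_models=10):
    """Mean and spread of predictions from models fit on bootstrap resamples.

    Stand-in for an ensemble of gradient-boosted trees (assumption).
    """
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]  # resample with replacement
        preds.append(knn_predict(boot, query))
    return statistics.fmean(preds), statistics.pstdev(preds)


def active_learning(levels, n_rounds, batch_size, exploration=1.0, seed=0):
    """Run the loop and return {condition: measured yield} for all tested conditions."""
    rng = random.Random(seed)
    candidates = list(itertools.product(levels, repeat=3))

    # Round 0: a random batch, because there is no data to learn from yet.
    tested = {c: simulated_yield(c) for c in rng.sample(candidates, batch_size)}

    for _ in range(n_rounds):
        train = list(tested.items())
        scored = []
        for cond in candidates:
            if cond in tested:
                continue
            mean, std = ensemble_predict(train, cond, rng)
            # High predicted yield (exploitation) plus high disagreement (exploration).
            scored.append((mean + exploration * std, cond))
        scored.sort(reverse=True)
        for _, cond in scored[:batch_size]:
            tested[cond] = simulated_yield(cond)  # "run the experiment"
    return tested


if __name__ == "__main__":
    results = active_learning(levels=range(6), n_rounds=4, batch_size=10)
    best = max(results, key=results.get)
    print(len(results), "conditions tested; best so far:", best, results[best])
```

In a real campaign, the call to `simulated_yield` would be replaced by measurements from the plate reader, and the stand-in model by the regressor of choice.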
Prior to this work, in a collaboration with Gorochowski’s team at the University of Bristol, we tested multilayer controller gene circuits in the E. coli cell-free system, which improved the circuit output rate (Greco et al.3), yet the fold change of the output remained low. As the first application of METIS, we sought to optimize the fold change of these gene circuits. During active learning, the METIS modules revealed that some components of the system, although necessary, had a negative effect on it. We hypothesized, and showed with additional experiments, that resource competition was the reason for the observed behavior. By replacing the gene causing the resource depletion with a purified protein, we achieved a substantial improvement in the fold change. This is an example of how, beyond active learning, METIS can be used for the hypothesis-driven improvement of a system.
We then applied METIS to more distinct examples: i) the sequence of a transcription and translation unit in the cell-free system, also used as prototyping for in vivo implementation, ii) combinatorial enzyme engineering using a dataset of 800 mutants from a previous study in our lab4 (simulation only), and iii) simulation of the predictive model on a dataset of the PURE (purified recombinant elements) cell-free system5. With these examples, we could demonstrate the application of METIS to categorical rather than numerical features (i and ii), simulation using an existing dataset rather than experimental work (ii and iii), and prediction of the target output instead of optimization (iii).
The ever-increasing possibilities of synthetic biology, fueled by databases and innovative data-processing strategies, have led to designer pathways superior to those found in nature. While mixing and matching enzymes from different metabolic backgrounds has opened up the pathway solution space, it comes with challenges, e.g., side reactions in vitro and in vivo. Identifying those side reactions is tedious work that requires broad knowledge of reaction mechanisms as well as analytical resources for detecting side product formation. In 2016, Schwander et al.6 published the first artificial CO2-fixation (CETCH) cycle in vitro. This pathway was manually optimized over several rounds, mainly by testing enzyme homologues, enzyme engineering, and metabolic proofreading. Although the system is closed and rather small compared to the metabolism of cells, its combinatorial space quickly grows to a size that is unmanageable for rational optimization.
To explore the combinatorial space of the CETCH cycle, we envisioned a screening platform to test hundreds of conditions. Given the complexity of our 27-component assay and the cost of the resources (purified proteins, NADPH, CoA, etc.), the first step was to find a suitable setup to downscale the assay. We used an ECHO® acoustic liquid handler capable of transferring volumes as low as 25 nL. After verifying the compatibility of the ECHO® technology with our components, we successfully reduced the volume of our assay to 10 µL in 384-well plates.
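To make the downscaling concrete, the sketch below converts target final concentrations into transfer volumes for a 10 µL well. It is an illustration only: it assumes that volumes are dispensed in whole multiples of 25 nL (the text states only that 25 nL is the smallest transfer), and the stock concentrations, the component selection, and the names `transfer_volume_nl` and `plan_well` are hypothetical.

```python
DROPLET_NL = 25            # smallest transfer volume mentioned in the text
ASSAY_VOLUME_NL = 10_000   # 10 µL assay volume mentioned in the text


def transfer_volume_nl(stock_conc, final_conc):
    """Volume of stock (nL) for `final_conc` in the well, rounded to whole droplets.

    Both concentrations must use the same unit. Dispensing in 25 nL multiples
    is an assumption of this sketch.
    """
    exact_nl = final_conc / stock_conc * ASSAY_VOLUME_NL
    droplets = round(exact_nl / DROPLET_NL)
    return droplets * DROPLET_NL


def plan_well(stocks, targets):
    """Return {component: volume in nL}, topped up with buffer to the assay volume."""
    volumes = {
        name: transfer_volume_nl(stocks[name], conc)
        for name, conc in targets.items()
    }
    used = sum(volumes.values())
    if used > ASSAY_VOLUME_NL:
        raise ValueError("requested components exceed the assay volume")
    volumes["buffer"] = ASSAY_VOLUME_NL - used
    return volumes


if __name__ == "__main__":
    # Hypothetical stock and target concentrations (mM), for illustration only.
    stocks = {"NADPH": 50.0, "CoA": 10.0}
    targets = {"NADPH": 5.0, "CoA": 0.5}
    print(plan_well(stocks, targets))
```

Rounding to droplets means the realized concentration can differ slightly from the requested one, so it is worth recording the rounded volumes rather than the nominal targets.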
With this platform in hand, we were confident that we could generate enough data for METIS to optimize the CETCH cycle. After testing 625 conditions over five iterative rounds, we improved the cycle more than ten-fold compared to the already manually optimized version published in 2016 (ref. 6). Testing another 375 conditions allowed us to optimize efficiency as well. To achieve this, we transformed our data from the first five rounds by dividing the glycolate yield (the CETCH output/product) by the amount of enzymes used, and performed three additional rounds. To identify bottlenecks, we used the same METIS modules as for the gene circuit optimization. The most unexpected outcome was the negative impact of elevated concentrations of 4-hydroxybutyryl-CoA synthetase (Hbs) on pathway performance. As increased enzyme concentrations usually lead to increased flux, the opposite behavior hints at some interference with the pathway, most likely due to side reactivity. Such information can be extremely valuable for in vivo implementation, where testing different conditions is much more time-, cost-, and labor-intensive.
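The data transformation used for the efficiency rounds can be illustrated with a short sketch. It assumes that "the amount of enzymes used" is the sum over all enzymes in a condition, which the text does not specify; the record layout, the values, and the names `efficiency` and `relabel` are hypothetical.

```python
def efficiency(glycolate_yield, enzyme_amounts):
    """Glycolate yield per total amount of enzyme (summing is an assumption)."""
    total = sum(enzyme_amounts.values())
    if total <= 0:
        raise ValueError("total enzyme amount must be positive")
    return glycolate_yield / total


def relabel(records):
    """Return copies of the records with an added 'efficiency' objective."""
    return [
        {**rec, "efficiency": efficiency(rec["glycolate"], rec["enzymes"])}
        for rec in records
    ]


if __name__ == "__main__":
    # Made-up numbers for illustration; not data from the study.
    records = [
        {"glycolate": 120.0, "enzymes": {"Hbs": 2.0, "other": 4.0}},
        {"glycolate": 90.0, "enzymes": {"Hbs": 1.0, "other": 2.0}},
    ]
    for rec in relabel(records):
        print(rec["glycolate"], rec["efficiency"])
```

In this toy example, the first condition has the higher yield but the second has the higher efficiency, which is why re-labelling existing data can steer the subsequent rounds toward different conditions.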
In conclusion, this effort led to a powerful and universal pathway optimization workflow that combines METIS with a liquid handler. This opens the door to prototyping new pathways by measuring only the final product, without the intermediate detection that was needed to reveal bottlenecks during manual optimization6. All this information will ultimately help to gain faster insight into new-to-nature pathways and pave the way for their transplantation into cells.
- Borkowski, O. et al. Large scale active-learning-guided exploration for in vitro protein production optimization. Nat. Commun. 11, 1872 (2020).
- Marchisio, M. A. & Stelling, J. Computational design tools for synthetic biology. Curr. Opin. Biotechnol. 20, 479–485 (2009).
- Greco, F. V., Pandi, A., Erb, T. J., Grierson, C. S. & Gorochowski, T. E. Harnessing the central dogma for stringent multi-level control of gene expression. Nat. Commun. 12, 1738 (2021).
- Nattermann, M. et al. Engineering a highly efficient carboligase for synthetic one-carbon metabolism. ACS Catal. 11, 5396–5404 (2021).
- Matsuura, T., Kazuta, Y., Aita, T., Adachi, J. & Yomo, T. Quantifying epistatic interactions among the components constituting the protein translation system. Mol. Syst. Biol. 5, 297 (2009).
- Schwander, T., Schada von Borzyskowski, L., Burgener, S., Cortina, N. S. & Erb, T. J. A synthetic pathway for the fixation of carbon dioxide in vitro. Science 354, 900–904 (2016).