HUST-19 for predicting COVID-19 clinical outcomes
We integrate heterogeneous computed tomography (CT) and clinical feature (CF) datasets of patients with or without COVID-19 disease, and develop an engineering framework of Hybrid-learning for UnbiaSed predicTion of COVID-19 patients (HUST-19) to predict morbidity and mortality outcomes.
On the night of February 22, 2020, my cell phone rang while I was using WeChat to chat with friends outside Wuhan. I had been repeatedly telling them that I felt very well at home and had just finished a large dish of German-style pork chops cooked by my wife. On the other end of the line, Dr. Lin Wang, the director of the Department of Clinical Laboratory at Union Hospital, told me that her lab had accumulated a number of chest computed tomography (CT) images and clinical features (CFs) from patients with or without coronavirus disease 2019 (COVID-19). She was deeply worried about the severity of the pandemic and asked whether we could collaborate to do something to combat the disease. I politely responded that I could not access the computer in my office, and that it would be impossible for me to do anything using only my laptop. I hung up the phone.
After 10 minutes, I decided to call Wanshan Ning, a gifted student in my lab. Wanshan had earned a B.S. degree in Electronic Information Engineering (EIE) from Fuzhou University, and then joined my lab to pursue a PhD in Bioinformatics. Later I asked him why he had decided to switch from EIE to Bioinformatics. He told me that his girlfriend, Jingjing Yang, was a medical student, and that he also wanted to do something involving biomedicine. After hearing me out, Wanshan told me that he wanted to work on a project to help contain COVID-19, because his girlfriend had just recovered from the disease. In addition, his girlfriend’s mentor, Dr. Yulan Zeng, the director of the Department of Respiratory and Critical Care Medicine at Liyuan Hospital, had also obtained a pile of CT and CF data from patients. Combining the data from two different hospitals would help reduce the potential bias of a single hospital.
I then called Dr. Lin Wang back and said that my student and I would be pleased to start a collaborative study on COVID-19. We set up a WeChat group, which allowed us to discuss everything together at any time. Even after the publication of the study, the members of the different groups had still not met face to face. Before starting the project, we ran into a big problem: since Wanshan and I were staying at home, who could collect and prepare the clinical data for us? Dr. Zheng Wang, Lin's husband and an experienced surgeon and clinical scientist at Union Hospital, suggested that his talented student, Shijun Lei, could participate in the study and prepare the data sets. Dr. Yulan Zeng and her student Jingjing Yang likewise agreed to prepare the data at Liyuan Hospital. After approval by the institutional ethics committees of the two hospitals, we immediately started collecting the data, with the help of five attending physicians who manually checked and confirmed the daily medical records of the cases. According to the Guidance for COVID-19 (6th edition) released by the National Health Commission of China, the five attending physicians classified all confirmed COVID-19 cases into mild, regular, severe and critically ill forms, and treated unconfirmed patients as suspected cases. Such a careful and detailed classification not only enabled us to predict whether a patient had COVID-19, but also allowed us to distinguish between the different morbidity outcomes (mild vs. severe).
Then we ran into the second problem. Wanshan lived in Xiaogan, a small county near Wuhan, and there was no internet connection at his home. So how could we transfer the huge amount of CT imaging data to him? We could not find any other method, so Wanshan had to use his mobile phone as a wireless hotspot to connect his computer to the internet. Shijun then continuously transferred the data to Wanshan through QQ, a communication tool. It took many days to transfer the data, and internet access was not installed at Wanshan’s home until mid-March. After Wanshan had received all the CT and CF data, we ran into the third problem: how to label and interpret the CT slices? At the invitation of Lin and Zheng, Dr. Heshui Shi and his student Yukun Cao joined the study, and in total four experienced radiologists participated in labeling and interpreting the CT slices. Finally, we prepared a labeled data set containing three types of CT slices: (i) non-informative CT (NiCT) slices without segmentable lung parenchyma, (ii) positive CT (pCT) slices with unambiguously COVID-19-associated imaging features, and (iii) negative CT (nCT) slices without COVID-19 features. Using this data set, we implemented a 13-layer convolutional neural network (CNN) model for the classification of individual CT slices, and then transformed the slice-based prediction into the CT-based prediction of clinical outcomes, using an additional 13-layer CNN framework.
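To make the step from slice-level labels to a scan-level result more concrete, here is a deliberately simplified sketch. It is not the second CNN used in the study: it replaces that learned model with a plain counting rule, purely to show the idea of discarding NiCT slices and summarizing the remaining pCT and nCT slices. The function name `summarize_scan` and the threshold `min_positive_fraction` are hypothetical and do not come from the HUST-19 code.

```python
from collections import Counter

# Slice labels as named in the text above.
NICT, PCT, NCT = "NiCT", "pCT", "nCT"

def summarize_scan(slice_labels, min_positive_fraction=0.1):
    """Aggregate per-slice labels into one scan-level call.

    min_positive_fraction is a hypothetical threshold chosen for
    illustration; it is not a value reported by the study.
    """
    counts = Counter(slice_labels)
    # NiCT slices carry no lung parenchyma, so they are ignored.
    informative = counts[PCT] + counts[NCT]
    if informative == 0:
        return {"informative": 0, "positive_fraction": None,
                "call": "undetermined"}
    fraction = counts[PCT] / informative
    call = "positive" if fraction >= min_positive_fraction else "negative"
    return {"informative": informative, "positive_fraction": fraction,
            "call": call}

# Eight slices: three non-informative, three negative, two positive.
example = summarize_scan([NICT, NICT, NCT, NCT, PCT, PCT, NCT, NICT])
print(example)
```

A learned aggregator, as used in the study, can weigh slice position and confidence in ways that a fixed counting rule cannot; the sketch only illustrates the data flow.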
After we implemented a 7-layer deep neural network (DNN) for the CF-based prediction of clinical outcomes, we ran into the fourth problem. At that time, several groups had deposited their manuscripts on preprint servers, reporting that CT imaging data was informative for accurately classifying patients with or without COVID-19. Unexpectedly, we found that using the CF data could also yield highly promising accuracy. Which one would be better? CT, or CF, that is the question. Our solution was to use both data types, integrated by a hybrid learning architecture that incorporated the CNN and DNN models into a single framework. Interestingly, using both CT and CF data considerably increased the accuracy of predicting morbidity or mortality outcomes compared with using either data type alone. We named this engineering framework Hybrid-learning for UnbiaSed predicTion of COVID-19 patients (HUST-19), and made its source code available under a CC BY-NC 4.0 license at http://ictcf.biocuckoo.cn/HUST-19.php.
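The idea of combining two data types in one model can be sketched in a few lines. The example below is only an illustration of feature-level fusion under simple assumptions: it concatenates a CT-derived feature vector with a CF-derived feature vector and passes the result through a single logistic output unit. The real HUST-19 architecture is considerably deeper, and the function `hybrid_predict`, the feature values, and the weights here are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hybrid_predict(ct_features, cf_features, weights, bias=0.0):
    """Fuse CT and CF feature vectors and score them with one output unit.

    ct_features: features assumed to come from an image model (e.g. a CNN).
    cf_features: features assumed to come from a clinical-feature model.
    weights:     one hypothetical weight per fused feature.
    """
    fused = list(ct_features) + list(cf_features)  # simple concatenation
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    score = sum(x * w for x, w in zip(fused, weights)) + bias
    return sigmoid(score)

# Made-up numbers: two CT-derived features and one CF-derived feature.
probability = hybrid_predict([0.2, 0.8], [1.0], weights=[0.5, 1.5, -0.3])
print(round(probability, 3))
```

Because the output unit sees both feature groups at once, its weights can let one data type compensate when the other is uninformative, which is the intuition behind using CT and CF together.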
Our fifth and final problem was how to share our data set with the academic community. This was not a big problem, because my lab had developed a number of biological databases before. So we constructed an online database named integrative CT images and CFs for COVID-19 (iCTCF) to present all raw data involved in this study at http://ictcf.biocuckoo.cn/. We hope that HUST-19 and iCTCF will help improve the diagnosis and treatment of COVID-19 patients.
Yu Xue and Wanshan Ning were also involved in another COVID-19-related study, in collaboration with Drs. Xi Zhou and Yang Qiu at Wuhan Institute of Virology, Dr. You Shang at Union Hospital of Tongji Medical College, and Dr. Ding-Yu Zhang at Wuhan Jinyintan Hospital. By performing a comprehensive proteomic profiling of plasma samples from patients with or without COVID-19, we identified protein biomarker combinations that accurately distinguish and predict different COVID-19 outcomes. The published paper was recommended as a "Featured Article" and "Most Read" by Immunity.
1. Ning W, Lei S, Yang J, Cao Y, Jiang P, Yang Q, Zhang J, Wang X, Chen F, Geng Z, Xiong L, Zhou H, Guo Y, Zeng Y, Shi H, Wang L, Xue Y, Wang Z. (2020) Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning. Nature Biomedical Engineering, https://doi.org/10.1038/s41551-020-00633-5.
2. Shu T, Ning W, Wu D, Xu J, Han Q, Huang M, Zou X, Yang Q, Yuan Y, Bie Y, Pan S, Mu J, Han Y, Yang X, Zhou H, Li R, Ren Y, Chen X, Yao S, Qiu Y, Zhang DY, Xue Y, Shang Y, Zhou X. (2020) Plasma Proteomics Identify Biomarkers and Pathogenesis of COVID-19. Immunity, 53(5):1108-1122.e5.