Parsing overlap between neurodegenerative disorders

Evaluating a data-driven diagnostic schema for neurodegenerative disease that acknowledges phenotypic heterogeneity.

Neurodegenerative diseases are a leading cause of death among older Americans and a significant burden on the health care system [1]. Brain autopsy is the current gold standard for diagnosing different neurodegenerative diseases based on the presence of specific misfolded proteins (Table 1). However, multiple disease-defining proteins are often found on autopsy (referred to as “copathology”), which complicates the aim of treating a specific disease process in individual patients.

We were already familiar with pronounced phenotypic heterogeneity in psychiatry, where 50% of patients meet criteria for multiple Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnoses and disease mechanisms are unknown [2,3]. Unsupervised approaches reveal reproducible patterns of co-occurring symptoms [4] and an overarching “general psychiatric unwellness” factor (p-factor) [2] that may explain psychopathology more parsimoniously than the DSM. Thus, we wondered whether existing neuropathologic diagnoses truly capture the distinctions in disease experienced by different patient groups, or whether alternative diagnostic boundaries would more succinctly describe the causes of neurodegeneration and identify groups amenable to intervention (Figure 1).


| Disease | Associated misfolded protein |
| --- | --- |
| Alzheimer's disease | Tau and amyloid-β [5] |
| Parkinson's disease | α-synuclein |
| Frontotemporal dementia | TDP-43 or tau [7] |
| Amyotrophic lateral sclerosis | TDP-43 |
| Progressive supranuclear palsy, corticobasal degeneration, Pick's disease | Tau |

Table 1. Summary of pathogenic proteins used to define major neurodegenerative disease diagnoses.

Figure 1. (i) Neurodegenerative diseases are ultimately caused by acquired or innate dysfunction of cellular processes leading to protein aggregation, which is difficult to measure directly at autopsy. (ii) At autopsy, we quantify the regional levels of different protein aggregates, which are phenotypes resulting from these unmeasured pathophysiologic processes. (iii) We use a clustering algorithm to group patients by similarity in phenotypes, then visualize patients in a low-dimensional t-distributed stochastic neighbor embedding (t-SNE) space to aid interpretation. (iv) We compare the clusters and existing diagnoses in their ability to separate patients in a clinically useful manner. (v) Finally, one can attempt to infer likely mechanisms for the observed latent phenotypic structure based on the literature. CBD, corticobasal degeneration. FTLD-TDP, frontotemporal lobar degeneration with TDP-43 aggregates. PART, primary age-related tauopathy. PSP, progressive supranuclear palsy.

As an initial step towards answering this question, we assessed 895 autopsy cases from patients with neurodegenerative disease, each with regional measurements of tau, α-synuclein, amyloid-β, and TDP-43 aggregation. We applied a data-driven clustering algorithm to identify patterns of copathology between these proteins, revealing six patient groups that reflected (1) primary tauopathies, (2) combined tau and amyloid-β pathology characteristic of Alzheimer’s disease, (3) TDP-43 proteinopathies, (4) synucleinopathies, (5) tau-α-synuclein copathology, and (6) minimal cerebral pathology (Figure 2). The composition of the first four clusters provides a data-driven confirmation of Table 1, but the composition of the fifth cluster suggests that tau-α-synuclein copathology may be a unique entity. Cluster 5 had strong corticolimbic α-synuclein and amyloid-β pathology, strong limbic tauopathy, and moderate cortical tauopathy. Patients in Cluster 5 had longer lifespans than patients with pure synucleinopathy (Cluster 4), suggesting that this copathology pattern might preferentially occur in more indolent synucleinopathies or protect against neuron loss.

Figure 2. Data-driven clusters accounting for copathology in 895 autopsy cases. (a) The representative vector of pathology scores for each cluster (the cluster centroid) shows a distinct profile of pathology that maps to underlying molecular drivers of disease, including tau, amyloid-β, TDP-43, and α-synuclein. (b) Composition of each cluster in terms of primary histopathologic diagnoses. Each cluster is composed of disease entities that are putatively caused by the protein most highly represented in the cluster’s centroid. Counts placed above stacked bars indicate the number of patients in each cluster. ADNC = Alzheimer’s disease neuropathologic change identified through ABC staging, CBD = corticobasal degeneration, FTLD = frontotemporal lobar degeneration with TDP-43 inclusions, LBD = Lewy body disease, PSP = progressive supranuclear palsy. See paper for definitions of Tau-Other and Other.
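To make the clustering step more concrete, here is a minimal sketch in plain Python. It is not the algorithm used in the paper, which this post does not name: it assumes ordinary k-means (Lloyd's algorithm) with a deterministic farthest-first initialisation as a stand-in. The data are made up, with one summary score per protein rather than regional measurements, and the names `PROTEINS`, `make_synthetic_cases`, and `kmeans` are hypothetical.

```python
import random

# Hypothetical feature order: one summary score per protein.
PROTEINS = ["tau", "amyloid_beta", "alpha_synuclein", "tdp43"]


def make_synthetic_cases(n_per_group=20, seed=0):
    """Made-up pathology profiles with bounded noise; not real autopsy data."""
    rng = random.Random(seed)
    profiles = {
        "tau_only": [3, 0, 0, 0],
        "tau_plus_amyloid": [3, 3, 0, 0],
        "synuclein": [0, 0, 3, 0],
    }
    rows, truth = [], []
    for name, profile in profiles.items():
        for _ in range(n_per_group):
            rows.append([p + rng.uniform(-0.3, 0.3) for p in profile])
            truth.append(name)
    return rows, truth


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(rows, k):
    """Start at rows[0], then repeatedly add the row that is farthest
    from its nearest already-chosen centroid."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(
            rows, key=lambda r: min(squared_distance(r, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = farthest_first_centroids(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(r, centroids[j]))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


rows, truth = make_synthetic_cases()
labels, centroids = kmeans(rows, k=3)
for j, centroid in enumerate(centroids):
    profile = ", ".join(f"{p}={v:.1f}" for p, v in zip(PROTEINS, centroid))
    print(f"cluster {j}: n={labels.count(j)}  {profile}")
```

Each printed centroid plays the role of the "representative vector" in Figure 2a: reading off which proteins score highest in a centroid is what lets a cluster be described as, say, a tauopathy or a synucleinopathy group.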

After characterizing the pathological findings associated with each data-driven patient group or cluster, we trained statistical models that use pathology measures to predict scores on clinical cognitive assessments [8] obtained from these patients in vivo. We found that our data-driven groupings were better predictors of cognitive scores than the stage of Alzheimer’s or Parkinson’s disease, and were almost as informative as the entire set of pathology scores used traditionally in the clinic, suggesting that copathology is critically relevant for understanding patient experiences and outcomes.
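One simple way to compare how well two categorical groupings predict a cognitive score is sketched below. This is an illustration under stated assumptions, not the statistical models from the paper: it predicts each patient's score as the mean score of the other patients with the same label (leave-one-out) and reports R². The function `loo_group_mean_r2` and all values are hypothetical, and the toy data are deliberately constructed so that the cluster labels track the scores while the stage labels do not.

```python
def loo_group_mean_r2(groups, scores):
    """Leave-one-out R^2 when each patient's score is predicted by the mean
    score of the other patients sharing the same group label."""
    n = len(scores)
    overall_mean = sum(scores) / n
    sums, counts = {}, {}
    for g, s in zip(groups, scores):
        sums[g] = sums.get(g, 0.0) + s
        counts[g] = counts.get(g, 0) + 1

    sse = 0.0
    for g, s in zip(groups, scores):
        if counts[g] > 1:
            prediction = (sums[g] - s) / (counts[g] - 1)
        else:
            # Singleton group: fall back to the mean of all other patients.
            prediction = (overall_mean * n - s) / (n - 1)
        sse += (s - prediction) ** 2

    sst = sum((s - overall_mean) ** 2 for s in scores)
    return 1.0 - sse / sst


# Made-up cognitive scores on a 0-30 scale; not study data.
moca_scores = [10, 11, 12, 20, 21, 22, 28, 29, 30]
cluster_labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
stage_labels = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # uninformative by construction

r2_cluster = loo_group_mean_r2(cluster_labels, moca_scores)
r2_stage = loo_group_mean_r2(stage_labels, moca_scores)
print(f"cluster R^2 = {r2_cluster:.2f}, stage R^2 = {r2_stage:.2f}")
```

Evaluating on held-out patients matters here: a grouping with many small groups would otherwise look better simply because it has more free parameters. A negative R² means the grouping predicts worse than the overall mean.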

Finally, we predicted each patient’s data-driven group at autopsy from in vivo cerebrospinal fluid (CSF) protein levels, cognitive scores, and genotypes at the APOE and MAPT loci, in an attempt to automate the diagnosis of neurodegenerative disease. This analysis revealed that the tauopathy cluster (Cluster 1) and the TDP-43 cluster (Cluster 3) could be more easily distinguished from other clusters than their constituent diseases, suggesting that broader disease categories can be more reliably identified than highly specific syndromes using gold-standard clinical biomarkers.
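The phrase "more easily distinguished" can be made quantitative in several ways. The sketch below assumes one common choice, a one-vs-rest area under the ROC curve (AUC); this post does not state which metric the paper used. The function `one_vs_rest_auc`, the classifier scores, and the labels are all hypothetical toy values.

```python
def one_vs_rest_auc(group_labels, target, scores):
    """Probability that a randomly chosen patient in `target` receives a
    higher score than a randomly chosen patient outside it (ties count 0.5)."""
    pos = [s for g, s in zip(group_labels, scores) if g == target]
    neg = [s for g, s in zip(group_labels, scores) if g != target]
    if not pos or not neg:
        raise ValueError("need at least one patient inside and outside the target group")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels for six patients: a specific diagnosis and a broader cluster.
diagnoses = ["PSP", "CBD", "PSP", "AD", "LBD", "CBD"]
clusters = [1, 1, 1, 2, 4, 1]

# Made-up classifier outputs for "belongs to Cluster 1" and "has PSP".
cluster1_scores = [0.90, 0.80, 0.70, 0.20, 0.10, 0.85]
psp_scores = [0.60, 0.70, 0.50, 0.20, 0.10, 0.55]

auc_cluster = one_vs_rest_auc(clusters, 1, cluster1_scores)
auc_psp = one_vs_rest_auc(diagnoses, "PSP", psp_scores)
print(f"Cluster 1 vs rest AUC = {auc_cluster:.3f}")
print(f"PSP vs rest AUC = {auc_psp:.3f}")
```

In this toy setup the PSP score also ranks the CBD patients highly, which lowers the PSP-vs-rest AUC while leaving the broader cluster easy to separate. That is the kind of pattern the paragraph above describes, shown here with invented numbers only.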

Our study provides two useful contributions to neurodegenerative disease research and clinical data science. First, our prediction of future pathology from cognitive phenotypes and CSF protein levels may facilitate more accurate diagnoses and the development of therapies that target specific proteins in patients living with neurodegenerative disease. Second, we demonstrated a general framework for using existing tools to gain new insights about heterogeneous diseases with unmeasurable or unknown pathophysiology (Figure 1). Future studies could use our approach with expanded, quantitative clinical, neuropathological, and genomic datasets, or in an entirely different field of medicine, such as cardiovascular disease, epilepsy, or cancer.


  1. Alzheimer’s Association (2019). 2019 Alzheimer’s disease facts and figures. Alzheimer’s & Dementia 15, 321–387
  2. Caspi, A. et al. (2014). The p Factor: One General Psychopathology Factor in the Structure of Psychiatric Disorders? Clinical Psychological Science 2, 119–137
  3. Sanchez-Roige, S. & Palmer, A. A. (2020). Emerging phenotyping strategies will advance our understanding of psychiatric genetics. Nature Neuroscience 23, 475–480
  4. Xia, C. H. et al. (2018). Linked dimensions of psychopathology and connectivity in functional brain networks. Nature Communications 9, 3003
  5. Montine, T. J. et al. (2012). National Institute on Aging-Alzheimer’s Association guidelines for the neuropathologic assessment of Alzheimer’s disease: a practical approach. Acta Neuropathologica 123, 1–11
  6. McKeith, I. G. et al. (2017). Diagnosis and management of dementia with Lewy bodies: Fourth consensus report of the DLB Consortium. Neurology 89, 88–100
  7. Irwin, D. J. et al. (2015). Frontotemporal lobar degeneration: defining phenotypic diversity through personalized medicine. Acta Neuropathologica 129, 469–491
  8. Nasreddine, Z. S. et al. (2005). The Montreal Cognitive Assessment, MoCA: A Brief Screening Tool For Mild Cognitive Impairment. Journal of the American Geriatrics Society 53, 695–699

Eli J. Cornblath

MD/PhD Student, University of Pennsylvania

I am an MD/PhD student at the University of Pennsylvania interested in cognitive neuroscience, network models of brain activity, and data-driven diagnostic tools. I recently completed my PhD in Danielle Bassett's lab and am currently working on finishing up medical school.