Simultaneous estimation of bi-directional causal effects and heritable confounding from GWAS summary statistics


If Mendelian Randomisation (MR), the elegant genetic equivalent to randomised control trials (RCTs), seems too good to be true at first, that is probably because it is. Both RCTs and MR aim to identify and quantify the effects of risk factors that act on common complex diseases, which may inform public health policies. RCTs, the gold standard for estimating the causal effects that risk factors (exposures) have on diseases (outcomes), eliminate the biases that come with observational study designs, such as confounding and reverse causation. They do so by randomly splitting participants into an exposure group and a control group. After a measured period of time, any difference observed in the outcome can be confidently attributed to the exposure. Despite their clean design, RCTs are expensive, time-consuming and not always ethical.

Figure 1 – Schematic representing randomised control trials

One alternative to such trials is MR, which similarly avoids confounding and reverse causation by using genetic variants as instrumental variables (IVs) to infer the causal relationship between an exposure and an outcome. Since genetic variants are randomly distributed in a population, we can use i) the variants themselves to obtain naturally divided groups, and ii) their associations with different exposures to mimic an intervention.

As a simple example, we can look at the aldehyde dehydrogenase 2 SNP rs671, where allele A results in a protein defective for alcohol metabolism, leading to nausea or headache due to the body’s inability to eliminate acetaldehyde after the consumption of alcohol. Thus, people with this variant tend to drink less than those without it, resulting in naturally randomised groups of people. At a certain time-point, we can assess cardiovascular health in a genotyped population, where A-allele carriers can be contrasted with non-carriers in order to estimate the causal effect of the exposure (alcohol consumption) on an outcome (e.g. blood pressure, cholesterol). In several studies[1,2] of this example, MR suggested that drinking alcohol was indeed causal for an increased predisposition to cardiovascular problems.
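With a single variant such as rs671, the simplest way to turn this contrast into a number is the ratio (Wald) estimator: the SNP's effect on the outcome divided by its effect on the exposure. The sketch below illustrates that idea only. The function name and the effect sizes are made up for illustration and are not taken from the cited studies, and the standard error uses a first-order approximation that ignores uncertainty in the SNP-exposure effect.

```python
def wald_ratio(beta_gx, beta_gy, se_gy):
    """Single-instrument causal estimate of exposure on outcome.

    beta_gx: SNP effect on the exposure (e.g. alcohol intake)
    beta_gy: SNP effect on the outcome (e.g. blood pressure)
    se_gy:   standard error of beta_gy
    """
    if beta_gx == 0:
        raise ValueError("the instrument has no effect on the exposure")
    estimate = beta_gy / beta_gx
    # First-order approximation: treats beta_gx as known without error.
    se = abs(se_gy / beta_gx)
    return estimate, se


# Hypothetical numbers: the allele lowers intake by 0.5 units
# and lowers the outcome by 0.1 units.
estimate, se = wald_ratio(beta_gx=-0.5, beta_gy=-0.1, se_gy=0.02)
print(f"causal estimate = {estimate:.2f} (SE {se:.2f})")
```

A positive estimate here would mean that more of the exposure leads to more of the outcome, provided the instrument assumptions hold.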

Figure 2 – Schematic of a Mendelian randomisation set-up, mirroring that of a randomised control trial.

MR causal inference is contingent on several assumptions. The most important one is a strong association between the instrumental variable and the exposure trait of interest (relevance). The others are the absence of a pleiotropic pathway from the SNP to both the exposure and the outcome, and similarly the absence of an association between the SNP and any confounder of the exposure-outcome relationship. If these conditions hold, then the only pathway of association between the SNP and the outcome is through the exposure.

Thanks to well-powered genome-wide association studies (GWAS), many genetic variants are now known to be associated with hundreds of traits of interest, facilitating the discovery of genetic instruments that fulfil the relevance assumption. Furthermore, several MR methods have been developed to calculate the causal effect estimate between pairs of traits, such as Inverse Variance Weighted (IVW), mode-based and median-based MR. While some of these methods aim to relax the basic assumptions of MR, most suffer from two major limitations: they under-exploit genome-wide markers by only choosing IVs that are strongly associated with the exposure, and they are sensitive to the presence of a heritable confounder acting on both exposure and outcome.
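To make the IVW method concrete, here is a minimal sketch of its fixed-effect form: a weighted regression through the origin of SNP-outcome effects on SNP-exposure effects, with weights equal to the inverse variance of the outcome effects. The function name and the three SNP effects are hypothetical, and real implementations add refinements (random-effects variance, instrument selection) that are left out here.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance-weighted MR estimate.

    beta_x: SNP effects on the exposure
    beta_y: SNP effects on the outcome
    se_y:   standard errors of beta_y
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)) or not beta_x:
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [1.0 / s ** 2 for s in se_y]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_x, beta_y))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_x))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical instruments whose outcome effects are exactly
# 0.3 times their exposure effects.
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.03, 0.06, 0.045]
se_y = [0.01, 0.01, 0.02]
causal, causal_se = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW estimate = {causal:.3f} (SE {causal_se:.3f})")
```

Because every instrument enters through its own effect on the outcome, any SNP that reaches the outcome by another route (for example through a heritable confounder) pulls the estimate away from the true causal effect.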

For these reasons, we developed a structural equation model (SEM) that mimics MR yet takes into account the possible violation of its assumptions by incorporating a latent heritable confounder (termed U) of the exposure-outcome relationship. Our aim was to properly disentangle the true causal effect from the confounder effect, which biases standard MR causal estimates when present.

Figure 3 – Schematic representation of the extended structural equation model. X and Y are two complex traits, and U is a latent (heritable) confounder with causal effects tx and ty on them. G represents a genetic instrument, with effects γx, γy and γu, respectively. Traits X and Y have causal effects on each other, which are denoted by a and b.
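One way to read the diagram in Figure 3 is as a set of linear equations, which can then be solved for the total effect a SNP is expected to show on each trait. The sketch below does this under assumptions that are not stated in the caption: the relationships are linear and additive, a denotes the effect of X on Y, and b the effect of Y on X. The function name is hypothetical and this is not the LHC-MR likelihood, only an illustration of why a heritable confounder misleads ratio-type estimates.

```python
def total_snp_effects(gamma_x, gamma_y, gamma_u, tx, ty, a, b):
    """Expected marginal effects of one SNP on X and Y.

    Assumed structural equations (noise terms omitted):
        U = gamma_u * G
        X = gamma_x * G + tx * U + b * Y
        Y = gamma_y * G + ty * U + a * X
    Solving the last two jointly gives the reduced form below.
    """
    if a * b == 1:
        raise ValueError("a * b must differ from 1 for a unique solution")
    direct_x = gamma_x + tx * gamma_u  # paths into X, excluding via Y
    direct_y = gamma_y + ty * gamma_u  # paths into Y, excluding via X
    beta_x = (direct_x + b * direct_y) / (1 - a * b)
    beta_y = (direct_y + a * direct_x) / (1 - a * b)
    return beta_x, beta_y


# A SNP acting only on X, with a true causal effect a = 0.3:
bx1, by1 = total_snp_effects(0.1, 0.0, 0.0, tx=0.5, ty=0.4, a=0.3, b=0.0)
print(by1 / bx1)  # ratio recovers a

# A SNP acting only on the confounder U, with no causal effect at all:
bx2, by2 = total_snp_effects(0.0, 0.0, 0.1, tx=0.5, ty=0.4, a=0.0, b=0.0)
print(by2 / bx2)  # ratio equals ty / tx, a spurious "causal" signal
```

In the second case the SNP is associated with both traits even though neither trait affects the other, which is the situation the latent confounder U is meant to capture.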

Our model, termed Latent Heritable Confounder MR (LHC-MR), simultaneously estimates the confounder effect on the exposure and the outcome (henceforth termed X and Y, respectively), the bidirectional causal effects between X and Y, and the direct heritability of each trait. LHC-MR takes as input observed genome-wide association summary statistics for X and Y and optimises the likelihood function associated with the SEM in order to estimate these parameters. This approach can be viewed as the integration of a popular heritability estimation method, Linkage Disequilibrium score regression (LDsc), and classical MR. As it uses genome-wide markers, LHC-MR accounts for LD as well as sample overlap. Finally, LHC-MR uses a block jackknife to calculate the standard errors of its estimated parameters.
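For readers unfamiliar with the block jackknife, the general recipe is to split the data into blocks, re-estimate the parameter with each block left out in turn, and derive the standard error from the spread of those leave-one-out estimates. The sketch below is a generic delete-one-block version, assuming contiguous blocks of near-equal size. The function name is hypothetical, and a plain mean stands in for the estimator, whereas in LHC-MR the re-estimated quantities would be the model parameters.

```python
import math


def block_jackknife_se(data, estimator, n_blocks):
    """Delete-one-block jackknife standard error.

    data:      list of observations (e.g. one entry per SNP)
    estimator: function mapping a list of observations to a number
    n_blocks:  number of contiguous blocks to leave out in turn
    """
    n = len(data)
    if n_blocks < 2 or n_blocks > n:
        raise ValueError("need 2 <= n_blocks <= len(data)")
    # Block boundaries for contiguous, near-equal-sized blocks.
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    leave_one_out = []
    for i in range(n_blocks):
        kept = data[:bounds[i]] + data[bounds[i + 1]:]
        leave_one_out.append(estimator(kept))
    centre = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (theta - centre) ** 2 for theta in leave_one_out
    )
    return math.sqrt(variance)


def mean(values):
    return sum(values) / len(values)


# Toy data with one observation per block.
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
se_mean = block_jackknife_se(toy, mean, n_blocks=6)
print(f"jackknife SE of the mean = {se_mean:.4f}")
```

Leaving out whole blocks rather than single observations is what makes the approach suitable for correlated data such as neighbouring SNPs in LD.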

We ran extensive simulation scenarios in which we violated standard MR assumptions as well as our own (e.g. the presence of multiple confounders while modelling only one, or the normality assumption of SNP effects). In these simulations, we compared our results against various standard and robust MR methods. In the overwhelming majority of scenarios, LHC-MR produced causal effect estimates with less bias and variance than other MR methods. Notably, unlike standard MR methods, LHC-MR was not affected by winner’s curse or weak instrument bias (both a result of IV selection). Its causal effect estimates also remained robust in the presence of a heritable confounder. Moreover, LHC-MR remained robust to the presence of a reverse causal effect of opposite sign and to the presence of more than one confounder, whether discordant or concordant.

Satisfied with our simulation results, we then applied LHC-MR to summary statistics of 13 complex traits from large cohorts such as the UK Biobank. We estimated the bidirectional causal effects between the pairs, compared the results to standard and more recent MR approaches, and investigated any potential confounders for which there was evidence. Generally, LHC-MR agreed well with standard MR methods in its causal effect estimates when both showed significant estimates. However, LHC-MR yielded more significant estimates between trait pairs, as expected given that it uses SNPs from the full genome rather than only the genome-wide significant SNPs used by standard MR methods. Furthermore, LHC-MR identified confounders between 16 trait pairs, many of which acted in the opposite direction to the estimated causal effect. This phenomenon partially masked causal effects for standard MR methods, which failed to pick up such relationships. An example of this is seen between HDL and SBP, where the causal effect of HDL on SBP estimated by LHC-MR is -0.13 (p-value = 5.38E-05) with a significant positive confounder acting on the two traits, whereas the standard MR methods show a non-significant (and attenuated) negative effect.

LHC-MR is the fruit of two and a half years of labour. Although the idea of distinguishing heritable confounder effects (that typically underlie genetic correlations) from bi-directional causal effects has been a natural next step in MR, its modelling has evolved several times over the course of this project. This is in large part thanks to reviewers’ comments and continuous discussions with the scientific community (not only at conferences, but also on Twitter, which has proven to be a great platform). Our conception of a realistic genetic architecture, the diverse nature of pleiotropy, and our ever-evolving understanding of all these components combined have shaped our modelling assumptions tremendously.

In its current form, LHC-MR helps to reduce the bias that can affect causal effect estimates between trait pairs due to the presence of heritable confounders or when the identification of robust genetic instruments is problematic. Still, like all methods, LHC-MR has its limitations when it comes to identifying several confounders, especially if they have opposite yet equal effects. We are working on extending the approach to more than two traits (and more than one confounder) at a time, in the hope of identifying and measuring causal effects within a (partially observed) network of traits with varying degrees of association with each other.

 You can find an R-package of LHC-MR here:


1 – Taylor, A., Lu, F., Carslake, D. et al. Exploring causal associations of alcohol with cardiovascular and metabolic risk factors in a Chinese population using Mendelian randomization analysis. Sci Rep 5, 14005 (2015).

2 – Cho, Y., Shin, SY., Won, S. et al. Alcohol intake and cardiovascular risk factors: A Mendelian randomisation study. Sci Rep 5, 18422 (2015).

Liza Darrous

PhD Candidate, University of Lausanne