Based on a SEMnet discussion with Holger Steinmetz.
The single-door criterion helps to identify single path coefficients in structural equation models (SEMs). It is especially useful in cases where the SEM as a whole is not "identified", i.e., where it is impossible to find a unique set of coefficients for the whole model that best explains the observed covariance matrix. In this tutorial, we illustrate the single-door criterion with four examples.
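Let G be any recursive causal graph in which α is the path coefficient associated with link X → Y, and let Gα denote the diagram that results when X → Y is deleted from G. The coefficient α is identifiable if there exists a set of variables Z such that (i) Z contains no descendant of Y, and (ii) Z d-separates X from Y in Gα. If Z satisfies these two conditions, then α equals the coefficient of X in the regression of Y on X and Z.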
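The criterion can be checked mechanically. The following is a minimal sketch, not DAGitty's own implementation, assuming the usual two conditions (Z contains no descendant of Y, and Z d-separates X from Y once X → Y is deleted). The function names and the edge-list representation are assumptions made for illustration; an error covariance such as M ↔ Y would have to be written as a hypothetical latent parent, e.g. U → M and U → Y.

```python
def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors."""
    found = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent, child in edges:
            if child == node and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def descendants(edges, node):
    """Return all proper descendants of `node`."""
    found = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def d_separated(edges, x, y, z):
    """Test d-separation of x and y given z via the moralized ancestral graph."""
    keep = ancestors(edges, {x, y} | set(z))
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    links = {frozenset(edge) for edge in sub}
    for child in keep:  # "marry" parents that share a child
        parents = sorted({p for p, c in sub if c == child})
        for i, first in enumerate(parents):
            for second in parents[i + 1:]:
                links.add(frozenset((first, second)))
    seen, stack = {x}, [x]
    while stack:  # undirected search that never enters z
        node = stack.pop()
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen and other not in z:
                    seen.add(other)
                    stack.append(other)
    return y not in seen


def single_door(edges, x, y, z):
    """Check both single-door conditions for the edge x -> y and the set z."""
    if set(z) & descendants(edges, y):
        return False  # condition (i) violated
    pruned = [edge for edge in edges if edge != (x, y)]
    return d_separated(pruned, x, y, z)  # condition (ii)


# Example graph: X -> M -> Y plus a direct effect X -> Y.
graph = [("X", "M"), ("M", "Y"), ("X", "Y")]
print(single_door(graph, "X", "M", []))     # True
print(single_door(graph, "X", "M", ["Y"]))  # False: Y is a descendant of M
print(single_door(graph, "M", "Y", ["X"]))  # True
```

The moralization approach was chosen here because it is short to write down; a path-based d-separation routine would give the same answers.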
Let's start to illuminate this definition using our examples.
We first consider a simple mediation model.
dag G { M [pos="-0.361,-0.183"] X [exposure,pos="-1.884,-0.236"] Y [outcome,pos="1.091,-0.125"] X -> M -> Y }
This model is clearly identified and has positive degrees of freedom. The single-door criterion also tells us that the coefficient on the path X → M is identified: after removing the path X → M, X becomes unconditionally separated from M (i.e., d-separated given the empty set Z={}):
dag G { M [pos="-0.361,-0.183"] X [exposure,pos="-1.884,-0.236"] Y [outcome,pos="1.091,-0.125"] M -> Y }
The parameter M → Y is also identified for the same reason.
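Because the separating set is empty in both cases, each coefficient should be recoverable from a simple regression without covariates. A small simulation can illustrate this. It is only a sketch: the coefficient values, unit error variances, and sample size are assumptions chosen for illustration.

```python
import random

random.seed(1)
a, b = 0.5, 0.7  # hypothetical true coefficients for X -> M and M -> Y
n = 50_000

xs, ms, ys = [], [], []
for _ in range(n):
    x = random.gauss(0, 1)
    m = a * x + random.gauss(0, 1)
    y = b * m + random.gauss(0, 1)
    xs.append(x)
    ms.append(m)
    ys.append(y)


def cov(p, q):
    """Sample covariance of two equally long lists."""
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    return sum((i - mean_p) * (j - mean_q) for i, j in zip(p, q)) / len(p)


def slope(predictor, outcome):
    """Slope of the simple regression of `outcome` on `predictor`."""
    return cov(predictor, outcome) / cov(predictor, predictor)


a_hat = slope(xs, ms)  # regress M on X; should be close to a
b_hat = slope(ms, ys)  # regress Y on M; should be close to b
print(round(a_hat, 2), round(b_hat, 2))
```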
We now add a direct effect to the simple mediation model.
dag G { M [pos="-0.354,-0.559"] X [pos="-1.884,-0.236"] Y [pos="1.041,-0.270"] X -> M -> Y X -> Y }
This model is just-identified (df=0). The coefficient X → M is identified because, when we remove the path X → M, we get
dag G { M [outcome,pos="-0.354,-0.559"] X [exposure,pos="-1.884,-0.236"] Y [pos="1.041,-0.270"] M -> Y X -> Y }
and again, X and M are unconditionally separated. Importantly, including Y in our set Z would violate the first condition of the single-door criterion, because conditioning on the collider Y opens a biasing path from X to M:
dag G { M [outcome,pos="-0.354,-0.559"] X [exposure,pos="-1.884,-0.236"] Y [adjusted,pos="1.041,-0.270"] M -> Y X -> Y }
Hence, X → M is no longer identified if Y is added to the regression.
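A worked numerical example shows the size of this bias. The values are assumptions chosen for illustration: coefficients a = 0.5 (X → M), b = 0.7 (M → Y), c = 0.3 (X → Y), with Var(X) = 1 and unit error variances. The symbols σ and β denote population covariances and regression coefficients.

```latex
\begin{align*}
\sigma_{XM} &= a = 0.5, \qquad \sigma_{MM} = a^2 + 1 = 1.25, \\
\sigma_{XY} &= c + ab = 0.65, \qquad
\sigma_{MY} = b\,\sigma_{MM} + c\,\sigma_{XM} = 1.025, \\
\sigma_{YY} &= b^2\sigma_{MM} + c^2 + 2bc\,\sigma_{XM} + 1 = 1.9125, \\[4pt]
\beta_{MX} &= \frac{\sigma_{XM}}{\sigma_{XX}} = 0.5 = a, \\[4pt]
\beta_{MX\cdot Y} &= \frac{\sigma_{XM}\,\sigma_{YY} - \sigma_{MY}\,\sigma_{XY}}
                         {\sigma_{XX}\,\sigma_{YY} - \sigma_{XY}^2}
 = \frac{0.95625 - 0.66625}{1.9125 - 0.4225}
 = \frac{0.29}{1.49} \approx 0.195 \neq a .
\end{align*}
```

Under these assumed values, regressing M on X alone returns a, whereas adding Y to the regression shrinks the coefficient of X to roughly 0.195.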
The coefficient M → Y is identified because, once the path M → Y is removed, holding X constant separates M and Y:
dag G { M [exposure,pos="-0.354,-0.559"] X [adjusted,pos="-1.884,-0.236"] Y [outcome,pos="1.041,-0.270"] X -> M X -> Y }
Finally, X → Y is also identified: once the path X → Y is removed, holding M constant separates X and Y.
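In regression terms, both coefficients come out of a single regression of Y on M and X: each is the partial coefficient of one predictor while the other is held constant. The sketch below simulates the model with assumed coefficient values and unit error variances, and uses the closed-form two-predictor least-squares formula. The helper names are hypothetical.

```python
import random

random.seed(2)
a, b, c = 0.5, 0.7, 0.3  # hypothetical true values: X -> M, M -> Y, X -> Y
n = 50_000

xs, ms, ys = [], [], []
for _ in range(n):
    x = random.gauss(0, 1)
    m = a * x + random.gauss(0, 1)
    y = b * m + c * x + random.gauss(0, 1)
    xs.append(x)
    ms.append(m)
    ys.append(y)


def cov(p, q):
    """Sample covariance of two equally long lists."""
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    return sum((i - mean_p) * (j - mean_q) for i, j in zip(p, q)) / len(p)


def partial_slope(predictor, control, outcome):
    """Coefficient of `predictor` when regressing `outcome` on `predictor` and `control`."""
    s_pp, s_cc, s_pc = cov(predictor, predictor), cov(control, control), cov(predictor, control)
    s_po, s_co = cov(predictor, outcome), cov(control, outcome)
    return (s_po * s_cc - s_co * s_pc) / (s_pp * s_cc - s_pc ** 2)


b_hat = partial_slope(ms, xs, ys)  # M -> Y, holding X constant
c_hat = partial_slope(xs, ms, ys)  # X -> Y, holding M constant
print(round(b_hat, 2), round(c_hat, 2))
```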
Now it becomes trickier: We add an error covariance between M and Y to model a confounder C that influences M and Y.
digraph G { M [pos="-0.521,-0.265"] X [exposure,pos="-1.749,-0.238"] Y [outcome,pos="1.029,-0.228"] M <-> Y [pos="0.645,-0.279"] X -> Y X -> M -> Y }
This model is not identified (df = -1). Does this mean that we cannot identify any coefficients? No: the single-door criterion tells us that we can still simply regress M on X to identify the coefficient X → M. Just to check, this is what the graph looks like after removing the X → M arrow:
dag G { M [outcome,pos="-0.521,-0.265"] X [exposure,pos="-1.749,-0.238"] Y [pos="1.029,-0.228"] M <-> Y [pos="0.645,-0.279"] X -> Y M -> Y }
Indeed, X and M are independent now, so the criterion is fulfilled. But what about the other coefficients? Let us first check X → Y. After removing this path, we get
and we see that X and Y are not separated. Our only chance to separate them would be by conditioning on M, which does not work either -- we would close one path via M, but open another one:
For the same reason, we cannot identify the error covariance M ↔ Y using the single-door criterion.
Conclusion: The single-door criterion can help us identify some, but not all, coefficients in un-identified structural equation models.
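The same conclusion can be reached algebraically. In the sketch below, the symbols a (X → M), b (M → Y), c (X → Y), and ψ (the error covariance M ↔ Y) are notation introduced here for illustration, and X is assumed to be uncorrelated with both error terms, as the graph implies.

```latex
\begin{align*}
M &= aX + \varepsilon_M, \qquad
Y = bM + cX + \varepsilon_Y, \qquad
\operatorname{Cov}(\varepsilon_M, \varepsilon_Y) = \psi, \\[4pt]
\varepsilon_Y &= \delta\,\varepsilon_M + r
  = \delta\,(M - aX) + r,
  \qquad \delta = \frac{\psi}{\operatorname{Var}(\varepsilon_M)},
  \qquad \operatorname{Cov}(r, M) = \operatorname{Cov}(r, X) = 0, \\[4pt]
Y &= (b + \delta)\,M + (c - a\delta)\,X + r .
\end{align*}
```

The regression of Y on M and X therefore returns b + δ and c − aδ rather than b and c, so both are biased whenever ψ ≠ 0. The regression of M on X still returns a, because X is uncorrelated with ε_M.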
Finally, let's add an instrument W for M to identify all coefficients:
digraph G { M [pos="-0.521,-0.265"] W [pos="-1.75,-0.3"] X [pos="-1.75,-0.23"] Y [pos="1.029,-0.228"] M <-> Y [pos="0.645,-0.279"] W -> M X -> M -> Y X -> Y }
This model is again just-identified; when simulating it, all parameter estimates are correct. But what will the single-door criterion tell us about the identifiability of the parameters?
The coefficient W → M (the effect of the instrument) is identified because deleting the corresponding path makes W unconditionally independent of M:
digraph G { M [outcome,pos="-0.521,-0.265"] W [exposure,pos="-1.75,-0.3"] X [pos="-1.75,-0.23"] Y [pos="1.029,-0.228"] M <-> Y [pos="0.645,-0.279"] X -> M -> Y X -> Y }
In the same way, we can identify the coefficient X → M.
But what about X → Y? Here nothing has changed compared to model 3: The only candidate for Z would be M - but again, holding M constant opens a path between X and Y via the collider M:
digraph G { M [adjusted,pos="-0.521,-0.265"] W [pos="-1.75,-0.3"] X [exposure,pos="-1.75,-0.23"] Y [outcome,pos="1.029,-0.228"] W -> M M <-> Y [pos="0.645,-0.279"] X -> M -> Y X -> Y }
For similar reasons, we cannot apply the criterion to the coefficients M → Y and M ↔ Y.
When coefficients are not identifiable using the single-door criterion, they can still be identifiable by other means, such as instrumental variables or even by estimating the whole model.
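For the model with the instrument W, the instrumental-variable route can be sketched as follows. Since W affects Y only through M and is uncorrelated with X and with the error terms, the ratio cov(W, Y) / cov(W, M) recovers M → Y; subtracting the estimated M effect from Y and regressing the remainder on X then recovers X → Y. The coefficient values, the latent variable U standing in for M ↔ Y, and all variable names are assumptions for illustration.

```python
import random

random.seed(4)
w_m, a, b, c = 0.8, 0.5, 0.7, 0.3  # hypothetical true coefficients
n = 50_000

ws, xs, ms, ys = [], [], [], []
for _ in range(n):
    w = random.gauss(0, 1)
    x = random.gauss(0, 1)
    u = random.gauss(0, 1)  # latent common cause behind M <-> Y
    m = w_m * w + a * x + u + random.gauss(0, 1)
    y = b * m + c * x + u + random.gauss(0, 1)
    ws.append(w)
    xs.append(x)
    ms.append(m)
    ys.append(y)


def cov(p, q):
    """Sample covariance of two equally long lists."""
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    return sum((i - mean_p) * (j - mean_q) for i, j in zip(p, q)) / len(p)


# Instrumental-variable (ratio) estimate of M -> Y.
b_iv = cov(ws, ys) / cov(ws, ms)

# Remove the estimated M effect from Y, then regress the remainder on X.
remainder = [y - b_iv * m for y, m in zip(ys, ms)]
c_hat = cov(xs, remainder) / cov(xs, xs)

print(round(b_iv, 2), round(c_hat, 2))
```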