d-Separation Without Tears

Adapted from the original in Judea Pearl's book "Causality" with his permission.

Introduction

d-separation is a criterion for deciding, from a given causal graph, whether a set X of variables is independent of another set Y, given a third set Z. The idea is to associate "dependence" with "connectedness" (i.e., the existence of a connecting path) and "independence" with "unconnectedness" or "separation". The only twist on this simple idea is to define what we mean by "connecting path", given that we are dealing with a system of directed arrows in which some vertices (those residing in Z) correspond to measured variables, whose values are known precisely. To account for the orientations of the arrows we use the terms "d-separated" and "d-connected" (d connotes "directional").

We start by considering separation between two singleton variables, x and y; the extension to sets of variables is straightforward (i.e., two sets are separated if and only if each element in one set is separated from every element in the other).

1. Unconditional separation

Rule 1: x and y are d-connected if there is an unblocked path between them.

By a "path" we mean any sequence of consecutive edges, disregarding their directionalities. By "unblocked path" we mean a path that can be traced without traversing a pair of arrows that collide "head-to-head". In other words, arrows that meet head-to-head do not constitute a connection for the purpose of passing information. Such a meeting will be called a "collider".

Examples

All graphs on this page are clickable; clicking on a variable shows the effect of conditioning on it.

x and y are d-connected (shown as thick edges) because r is not a collider. This path represents an indirect causal effect.

x E @0,0
y O @6,6
r 1 @3,3

x r
r y

x and y are not d-connected (shown as thin edges) because r is a collider. This path does not transmit any correlation or other statistical relationship.

x E @0,0
y O @6,6
r 1 @3,3

x r
y r

x and y are d-connected because there is a path between them that does not contain a collider. This path transmits a correlation that does not reflect a causal effect (biasing path).

x E @0,0
r 1 @1,1
s 1 @2,2
t 1 @3,3
u 1 @4,4
v 1 @5,5
y O @6,6

s r
t s
u v t
v y
r x

x and y are not d-connected because t is a collider. Again, no statistical correlation can result from this path.

x E @0,0
r 1 @1,1
s 1 @2,2
t 1 @3,3
u 1 @4,4
v 1 @5,5
y O @6,6

r s
s t
u v t
v y
x r

This graph contains one collider, at t. The path x-r-s-t is unblocked, hence x and t are d-connected. The path t-u-v-y is also unblocked, hence t and y are d-connected, as are the pairs u and y, t and v, t and u, x and s, etc. However, x and y are not d-connected; there is no way of tracing a path from x to y without traversing the collider at t. Therefore, we conclude that x and y are d-separated, as are x and v, s and u, r and u, etc. (The ramification is that the covariance terms corresponding to these pairs of variables will be zero, for every choice of model parameters.)
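Rule 1 can be checked mechanically. The following is a minimal sketch, not part of the original text: the function names (`simple_paths`, `colliders`, `d_connected`) are hypothetical, and the edge list is a hand transcription of the graph above as (parent, child) pairs. It enumerates every path while ignoring edge direction, then looks for a path with no collider on it.

```python
def simple_paths(edges, start, goal):
    """Yield every simple path from start to goal, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            yield path
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])


def colliders(edges, path):
    """Return the interior nodes of path where both path edges point inward."""
    arrows = set(edges)
    return [m for a, m, b in zip(path, path[1:], path[2:])
            if (a, m) in arrows and (b, m) in arrows]


def d_connected(edges, x, y):
    """Rule 1: x and y are d-connected if some path has no collider."""
    return any(not colliders(edges, p) for p in simple_paths(edges, x, y))


# Assumed transcription of the graph above: x -> r -> s -> t <- u -> v -> y
edges = [("x", "r"), ("r", "s"), ("s", "t"), ("u", "t"), ("u", "v"), ("v", "y")]

print(d_connected(edges, "x", "t"))  # x-r-s-t has no collider
print(d_connected(edges, "x", "y"))  # every x-y path runs through the collider t
```

Enumerating all paths is exponential in the worst case, so this approach is only meant for small teaching graphs like the ones on this page.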

2. Blocking by conditioning

Motivation: When we measure a set Z of variables, and take their values as given, the conditional distribution of the remaining variables changes character; some dependent variables become independent, and some independent variables become dependent. To represent these dynamics in the graph, we need the notion of "conditional d-connectedness" or, more concretely, "d-connectedness, conditioned on a set Z of measurements".

Rule 2: x and y are d-connected, conditioned on a set Z of nodes, if there is a collider-free path between x and y that traverses no member of Z. If no such path exists, we say that x and y are d-separated by Z. We then also say that every path between x and y is "blocked" by Z.

Example

x E @0,0
r A @1,1
s 1 @2,2
t 1 @3,3
u 1 @4,4
v A @5,5
y O @6,6

r s
s t
u v t
v y
x r

Let Z be the set {r, v} (marked in gray in the figure). Rule 2 tells us that x and y are d-separated by Z, as are x and s, u and y, s and u, etc. The path x-r-s is blocked by Z, and so are the paths u-v-y and s-t-u. The only pairs of unmeasured nodes that remain d-connected in this example, conditioned on Z, are the pair s and t and the pair u and t. Note that, although t is not in Z, the path s-t-u is nevertheless blocked by Z, since t is a collider and the path is therefore already blocked by Rule 1.
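To see why each of the paths named above is blocked, it can help to list the blocking node on a path together with the reason. The sketch below is illustrative only: `blocking_nodes` is a hypothetical helper, the edge list is a hand transcription of the example graph, and the function covers only the two ways of blocking introduced so far (a collider on the path, or a non-collider that belongs to Z). It does not handle a collider that is itself measured.

```python
def blocking_nodes(edges, path, z):
    """Map each interior node that blocks `path` to the reason it blocks.

    edges: iterable of (parent, child) pairs
    path:  list of node names, assumed to follow existing edges
    z:     set of conditioned (measured) nodes
    """
    arrows = set(edges)
    blockers = {}
    for a, m, b in zip(path, path[1:], path[2:]):
        if (a, m) in arrows and (b, m) in arrows:
            blockers[m] = "collider"   # blocked by Rule 1
        elif m in z:
            blockers[m] = "in Z"       # blocked by Rule 2
    return blockers


# Assumed transcription of the example: x -> r -> s -> t <- u -> v -> y
edges = [("x", "r"), ("r", "s"), ("s", "t"), ("u", "t"), ("u", "v"), ("v", "y")]
z = {"r", "v"}

for path in (["x", "r", "s"], ["u", "v", "y"], ["s", "t", "u"], ["s", "t"]):
    print("-".join(path), blocking_nodes(edges, path, z))
```

An empty result means the path is unblocked, which is the case for the single-edge path s-t.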

Exercise: Is it necessary to condition on r and v in the above example to d-separate x and y? If not, what is the smallest set Z that d-separates x and y in the graph above?

3. Conditioning on colliders

Motivation: When we measure a common effect of two independent causes, the causes become dependent, because finding the truth of one makes the other less likely (it is "explained away"), and refuting one implies the truth of the other. This phenomenon (known as Berkson's paradox, or "explaining away") requires a slightly special treatment when we condition on colliders (representing common effects) or their descendants (representing effects of common effects).

Rule 3: If a collider is a member of the conditioning set Z, or has a descendant in Z, then it no longer blocks any path that passes through this collider.

Example

x E @0,0
r A @1,0
w 1 @1,1
s 1 @2,0
t 1 @3,0
p A @3,1
u 1 @4,0
v 1 @5,0
q 1 @5,1
y O @6,0

r s w
s t
u v t
v y
x r
t p
v q

Let Z be the set {r, p} (again, marked in gray). Rule 3 tells us that s and y are d-connected by Z, because the collider at t has a descendant (p) in Z, which unblocks the path s-t-u-v-y. However, x and u are still d-separated by Z, because although the linkage at t is unblocked, the one at r is blocked by Rule 2 (since r is in Z).
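The three rules combine into a single test. What follows is a rough sketch under stated assumptions rather than a reference implementation: all function names are hypothetical, the edge list is a hand transcription of the graph above, x and y are assumed not to be in Z, and paths are enumerated by brute force, which is only practical for small graphs.

```python
def simple_paths(edges, start, goal):
    """Yield every simple path from start to goal, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            yield path
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])


def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def path_is_open(edges, path, z):
    """Apply Rules 1-3 to every interior node of one path."""
    arrows = set(edges)
    for a, m, b in zip(path, path[1:], path[2:]):
        if (a, m) in arrows and (b, m) in arrows:
            # Collider: blocks (Rule 1) unless it or a descendant is in Z (Rule 3).
            if not (({m} | descendants(edges, m)) & z):
                return False
        elif m in z:
            # Non-collider in Z blocks the path (Rule 2).
            return False
    return True


def d_separated(edges, x, y, z):
    """True if no path between x and y is open given the set z."""
    z = set(z)
    return not any(path_is_open(edges, p, z) for p in simple_paths(edges, x, y))


# Assumed transcription of the graph above.
edges = [("x", "r"), ("r", "s"), ("r", "w"), ("s", "t"), ("u", "t"),
         ("u", "v"), ("v", "y"), ("t", "p"), ("v", "q")]

print(d_separated(edges, "s", "y", {"r", "p"}))  # p unblocks the collider t
print(d_separated(edges, "x", "u", {"r", "p"}))  # r still blocks x-r-s
```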

Exercise: Click on r in the above graph to remove the conditioning. This will render x and y d-connected. Then find another node that you can condition on to d-separate x and y again.

This completes the definition of d-separation, and the reader is invited to try it on some more intricate graphs.

Typical application: Suppose we consider the regression of y on p, r and x,

y = c₁p + c₂r + c₃x

and suppose we wish to predict which coefficients in this regression are zero. From the discussion above we can conclude immediately that c₃ is zero, because y and x are d-separated given p and r, hence the partial correlation between y and x, conditioned on p and r, must vanish. c₁ and c₂, on the other hand, will in general not be zero, as can be seen from the graph: Z={r, x} does not d-separate y from p, and Z={p, x} does not d-separate y from r.
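The prediction can be checked numerically on simulated data. The sketch below rests on assumptions that are not in the original text: a linear model over the graph in which every structural coefficient is 1 and every error term is independent standard normal, with the leaf nodes w and q left out because they do not affect the regression. The helper names (`simulate`, `det3`, `ols3`) are hypothetical. Under these assumed parameters the population coefficients work out to c₁ = 1/4, c₂ = -1/4 and c₃ = 0, so the estimates should land close to those values.

```python
import random


def simulate(n, seed=0):
    """Draw n samples of (p, r, x, y) from the assumed linear model."""
    rng = random.Random(seed)

    def noise():
        return rng.gauss(0.0, 1.0)

    rows = []
    for _ in range(n):
        x = noise()
        r = x + noise()
        s = r + noise()
        u = noise()
        t = s + u + noise()   # t is the collider: s -> t <- u
        p = t + noise()
        v = u + noise()
        y = v + noise()
        rows.append((p, r, x, y))
    return rows


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def ols3(rows):
    """Least-squares fit of y = c1*p + c2*r + c3*x (no intercept).

    Solves the 3x3 normal equations with Cramer's rule.
    """
    gram = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
            for i in range(3)]
    rhs = [sum(row[i] * row[3] for row in rows) for i in range(3)]
    d = det3(gram)
    coeffs = []
    for k in range(3):
        replaced = [[rhs[i] if j == k else gram[i][j] for j in range(3)]
                    for i in range(3)]
        coeffs.append(det3(replaced) / d)
    return coeffs


c1, c2, c3 = ols3(simulate(20000))
print(f"c1={c1:.3f}  c2={c2:.3f}  c3={c3:.3f}")
```

Only the vanishing of c₃ follows from the graph; the particular nonzero values of c₁ and c₂ depend entirely on the assumed parameters.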

Remark on correlated errors: Correlated exogenous variables (or error terms) need no special treatment. They are represented by bi-directed (double-headed) arcs, and their arrowheads are treated like any other arrowhead for the purpose of path tracing. For example, if we add to the graph above a bi-directed arc between x and t, then y and x will no longer be d-separated (by Z={r, p}), because the path x-t-u-v-y is d-connecting: the collider at t is unblocked by virtue of having a descendant, p, in Z.
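One way to make this remark concrete is to record arrowheads rather than edges. In the sketch below, which is an illustration under assumptions and not the original author's method, a directed edge a -> b contributes one arrowhead (at b), and a bi-directed arc contributes an arrowhead at each end. Descendants are followed along directed edges only. The names `descendants` and `path_is_open` are hypothetical, the edge lists are hand transcriptions, and the path passed in is assumed to follow existing edges.

```python
def descendants(directed, node):
    """Nodes reachable from `node` along directed edges only."""
    children = {}
    for a, b in directed:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def path_is_open(directed, bidirected, path, z):
    """Check one path against the three rules, counting arrowheads."""
    # (a, b) in heads means: the edge between a and b has an arrowhead at b.
    heads = set(directed)
    for a, b in bidirected:
        heads |= {(a, b), (b, a)}
    for a, m, b in zip(path, path[1:], path[2:]):
        if (a, m) in heads and (b, m) in heads:
            # Collider: open only if it or one of its descendants is in Z.
            if not (({m} | descendants(directed, m)) & z):
                return False
        elif m in z:
            return False
    return True


# Assumed transcription of the graph above, plus the added arc x <-> t.
directed = [("x", "r"), ("r", "s"), ("r", "w"), ("s", "t"), ("u", "t"),
            ("u", "v"), ("v", "y"), ("t", "p"), ("v", "q")]
bidirected = [("x", "t")]
path = ["x", "t", "u", "v", "y"]

print(path_is_open(directed, bidirected, path, {"r", "p"}))  # p opens the collider t
print(path_is_open(directed, bidirected, path, {"r"}))       # without p, t blocks
```

Because the collider test looks only at arrowheads, the same code handles directed and bi-directed edges without a special case.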