One of the topics I have been reading about recently is the concept of causality, motivated by the hope that I can form a view about it on my own. The great thing about reading on causality is that, the more you try to construct your own view of it, the more you have to digest information coming from different fields, such as philosophy, statistics, cognitive science, and physics. Additionally, the concept of causality comes entangled with many other concepts, like time and universal determinism, each of which deserves deep investigation in its own right. What I am interested in is a mathematical point of view on whether there might be a necessary connection between events, in terms of the information coming from nature. This question also involves the mathematical definition of the concept of time, whether it exists, and, if it does, the direction of its flow. Although there has been, and still is, a huge discussion on causality among philosophers, I find it sufficient to mention the ideas of David Hume and Immanuel Kant before moving on to the mathematical perspective of causality. For those interested in the history of the philosophical discussion of this topic, I would suggest taking a look at [1] for a very brief introduction.
Let us begin with what Hume suggested on causal relations. Assume that you are standing on a cliff, throwing rocks over the side, and a few seconds later, you hear them crash. After repeating this experiment ninety-nine times, you would expect to hear the sound of the crash on the hundredth throw as well. This is because you would think that throwing a stone causes the sound of the crash, due to the past ninety-nine realizations. You have experienced it long enough to assume that there is a causal connection between the event of throwing a rock over the cliff and the event of hearing the sound of the crash. Hume, on the other hand, claims that the causal connection you have just derived between these two events is nothing more than a constant conjunction. The events may seem conjoined, but this does not and cannot imply that they are connected. This may sound familiar to people involved with statistics, where an equivalent phrase is used: "Correlation does not imply causation."
Hume claims that if one event always follows another, we have an inclination, a habit, to believe that the first causes the second; but it is impossible to prove, empirically or logically, that the first event is the cause of the second. It is the force of habit that makes us derive necessary connections from constant conjunctions. The more we perceive these constant conjunctions, the stronger an expectation we develop about how the chain of events will continue.
"It appears, then, that this idea of a necessary connexion among events arises from a number of similar instances which occur of the constant conjunction of these events; nor can that idea ever be suggested by any one of these instances, surveyed in all possible lights and positions. But there is nothing in a number of instances, different from every single instance, which is supposed to be exactly similar; except only, that after a repetition of similar instances, the mind is carried by habit, upon the appearance of one event, to expect its usual attendant, and to believe that it will exist. This connexion, therefore, which we feel in the mind, this customary transition of the imagination from one object to its usual attendant, is the sentiment or impression from which we form the idea of power or necessary connexion. Nothing farther is in the case [2].” - David Hume.
There is also a long discussion of Immanuel Kant’s view on causality, and his answer to Hume, which can be found in [3]. Kant agrees with Hume that a causal connection cannot be proved, but he claims that the human mind applies the concept of causal connection as a precondition for understanding and making sense of empirical data. Causal connections are not habits derived empirically from constantly conjoined events; we possess the concept of causal connection a priori, before our experiences. Kant also went one step further and identified causality with the rule of natural law, the rules that scientists discover. He claimed that causal sequences of events are lawful sequences of events, and that through these laws, causally connected experiences become possible.
Interpreting Hume and Kant, we can conclude that there isn’t any concrete methodology for claiming that one event causes another, even when the two are in a temporal order and conjoined. But the more interesting part (for me, at least) is to prove (or disprove) these ideas based on the physical phenomena that we observe in nature. Up to this point, we have studied neither the mathematical nor the physical interpretation of causality. So I would like to re-examine the concept of causality in terms of thermodynamics, relativity, and quantum mechanics, where the concept of the arrow of time also comes into the picture.
When we assume that an event causes another, we automatically think that causes precede their effects, i.e., they have to be in a temporal order, where the cause comes first and the effect comes later. The argument that "causes precede their effects" could have been a good starting point if we were to derive a methodology for finding causal relationships between events. If you still believe in Newtonian physics, you are allowed to jump to the second step, since according to Newton there exists an absolute time, independent of any observer, that progresses at a consistent pace throughout the universe. But today, thanks to Einstein, we have the special theory of relativity, which challenges many of our intuitive beliefs about time. Unlike Newton’s absolute time, for Einstein the temporal order in which two events occur depends on the observer’s frame of reference. As a result, there is no universal order in which events can be temporally aligned. And if we cannot align events temporally in a universal manner, we cannot assume that causes precede their effects as a precondition for a causal relation. This brings up the concept of retrocausality, which I will discuss in detail later.
As a result, we have two important properties belonging to a pair of events, which are not sufficient to claim a causal relationship, neither by themselves nor together. The first is correlation, which is present in any causal relation but is not sufficient on its own: two events being correlated does not make them causally related. The second is the temporal order of events, which depends on our frame of reference, due to special relativity. If I observe event A after I observe event B, another observer may well observe event B after event A. This gives me the freedom to align two events in both temporal directions, which in turn raises problems in terms of thermodynamics and time-asymmetric events.
"The second law of thermodynamics states that in a natural thermodynamic process, there is an increase in the sum of the entropies of the participating systems."
In other words, if you drop black ink into a glass of water, the drop will diffuse and the water will turn grey. Setting special relativity aside, you would never see these events in the reverse order, due to the second law of thermodynamics. How the second law of thermodynamics is possible in a relativistic universe, i.e., the relativistic understanding of thermodynamic time asymmetry, is still an open question, both philosophically and physically [4].
If Newton and his absolute view of time were right, the second law of thermodynamics could provide us with at least a thermodynamic arrow of time, which we could use as the basis of a methodology for finding causal relationships. You may be willing to sacrifice special relativity, assume all events are temporally aligned in a universal manner, and use the principles of time-asymmetry and irreversibility (which claims that the chain of events would not be realizable if we changed their order) to deduce a causal relationship between two events mathematically. Unfortunately, your victory would not last long. Even with special relativity out of the picture, your results would only seem valid for a small set of coarse-grained [5] systems, such as the systems we observe in our daily lives, containing large-scale bodies. When you try to apply your model (which acknowledges a universal arrow of time) to small-scale systems, the phenomena observed in quantum mechanics cause you trouble.
At this point, to avoid any confusion, it is important to note that if special relativity is out of the picture, the concept of causality still implies that causes precede their effects in a universal temporal order. On the contrary, phenomena observed in quantum mechanics (QM) disprove the necessity for causes to precede their effects and leave room for the concept of retrocausality [6], i.e., they allow an effect to occur before its cause, without invoking the theory of special relativity. A demonstrative example of a retrocausal toy model is the polarizer experiment, which we will discuss in great detail.
Let us begin with the description of Bell’s Theorem, which states that “The predictions of QM cannot be reproduced by any locally causal mathematical description”. To prove this mathematically, assume that we have a system with a photon source, and two measurement apparatuses, as sketched in the Figure below.

Each measurement apparatus contains a polarizing beam splitter, with a preferred orientation angle denoted by \(a\) for the one on the left, and by \(b\) for the one on the right. The measurement result on the left is denoted by \(A=+1\) if the photon is detected to be polarized along \(a\), and \(A=-1\) if its polarization is found to be perpendicular to \(a\). The result of the measurement on the right is similarly denoted by \(B\).
For a given choice of the orientation angles \(a\) and \(b\), QM provides the probabilities \(p(A,B|a,b,\psi)\) for the four possible outcomes of \(\{A, B\}\), which are \(\{+1,+1\}, \{+1,-1\}, \{-1,+1\},\) and \(\{-1,-1\}\), where \(\psi\) denotes the wave function [7]. As a result, we observe the expectations of the individual outcomes as
\begin{align}
E[A] = E[B] = 0.
\end{align}
Now, let us define a correlator in terms of the QM description, equal to the expectation of the product of the variables \(A\) and \(B\). Observations from QM experiments show that the variables are correlated such that
\begin{align}P_{QM}(a,b) = E[AB] = \cos(2a-2b).\end{align}
To propose a local (meaning that \(A\) is independent of \(b\) and \(B\), and \(B\) is independent of \(a\) and \(A\)) and causal mathematical model, we must define the cause and effect in this experiment. Since we assume that the observation of the variable \(A\) (or \(B\)) occurs after the photons are emitted from the source (recall that we assume the temporal order of events is independent of the observer in this experiment), it is convenient to define the causes as the inputs \(a\) and \(b\), and the effects as the outputs \(A\) and \(B\), or rather their probability distributions, \(p(A)\) and \(p(B)\), which are associated with the time of measurement. What Bell did was to introduce the variable \(\lambda\), which represents the set of all properties of each pair of photons just before the measurement is made on them. As a result, the assumption of causality would be violated if \(\lambda\) (or its probability distribution \(p(\lambda)\)) were to depend on the free variables associated with the time of measurement. Thus, a general locally causal description specifies the distributions \(p(\lambda)\), \(p(A|a,\lambda)\), and \(p(B|b,\lambda)\), which must be nonnegative and normalized. Now, let us check whether these distributions are consistent with the measurements obtained from the QM experiment, i.e., consistent with (1) and (2).
Writing (2) in terms of the set of distributions we have just defined yields
\begin{align}E[AB] = \int d \lambda p(\lambda) \sum_{AB}ABP(A|a,\lambda)P(B|b,\lambda).\end{align}
To go further, Bell assumes the validity of the EPR paradox [8] and deduces that \(A\) (and similarly \(B\)) cannot be stochastic, and must instead be completely determined by \(a\) and \(\lambda\). In other words, if \(a=b\), then \(P(A=B)=1\) must hold. Consequently, we can use a deterministic function \(F\) such that
\begin{align*}A=F(a, \lambda),\end{align*}
\begin{align*}B=F(b,\lambda).\end{align*}
Notice that \(F\) is equal to either \(-1\) or \(+1\), which guarantees that \(F^2=1\). Now we can rewrite the correlator expression as
\begin{align}P_{Bell}(a,b) = \int d \lambda p(\lambda) F(a,\lambda)F(b,\lambda).\end{align}
Let us introduce a third orientation, \(c\), and note the form
\begin{align}P_{Bell}(a,b) - P_{Bell}(a,c) = \int d \lambda p(\lambda) F(a,\lambda)[F(b,\lambda)-F(c,\lambda)].\end{align}
Recall that \(F^2=1\), which lets us write
\begin{align}P_{Bell}(a,b) - P_{Bell}(a,c) = \int d \lambda p(\lambda) F(a,\lambda)[F(b,\lambda)-F^2(b,\lambda)F(c,\lambda)],\end{align}
\begin{align}P_{Bell}(a,b) - P_{Bell}(a,c) = \int d \lambda p(\lambda) F(a,\lambda)F(b,\lambda)[1-F(b,\lambda)F(c,\lambda)].\end{align}
Since \(F\) is equal to either \(-1\) or \(+1\),
\begin{align}|F(\cdot, \lambda)| =1,\end{align}
and
\begin{align}|F(a, \lambda)F(b, \lambda) | =1.\end{align}
As the final step, taking the absolute value of the integrand and substituting (9) into the result yields
\begin{align}|P_{Bell}(a,b) - P_{Bell}(a,c)| \leq \int d \lambda p(\lambda)[1-F(b,\lambda)F(c,\lambda)], \end{align}
\begin{align}|P_{Bell}(a,b) - P_{Bell}(a,c)| \leq 1 - P_{Bell}(b,c).\end{align}
Equation (11) is also known as Bell’s inequality. If the predictions of this experiment could be reproduced by a local and causal mathematical description, we would expect (11) to be consistent with the QM correlator given in (2). But if you insert (2) into (11), you will see that the inequality is violated for suitable choices of the orientations. This tells us that there is no local and causal mathematical description that satisfies the observational outcomes of this experiment. We must relax our model in order to reproduce the experimental predictions of QM.
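To see the violation concretely, here is a minimal numerical check in Python (the orientation angles below are an illustrative choice, not taken from any particular experiment): it evaluates the QM correlator (2) and compares \(|P(a,b)-P(a,c)|\) against the locally causal bound \(1-P(b,c)\) that follows from (10).

```python
import numpy as np

def p_qm(x, y):
    """QM correlator from eq. (2): E[AB] = cos(2x - 2y)."""
    return np.cos(2 * x - 2 * y)

# Illustrative orientations chosen to make the violation large.
a, b, c = 0.0, np.pi / 8, np.pi / 4

lhs = abs(p_qm(a, b) - p_qm(a, c))  # left-hand side of Bell's inequality
rhs = 1 - p_qm(b, c)                # the locally causal bound, 1 - P(b,c)

print(f"|P(a,b) - P(a,c)| = {lhs:.4f}")   # 0.7071
print(f"1 - P(b,c)        = {rhs:.4f}")   # 0.2929
print("inequality violated:", lhs > rhs)  # True
```

Any local and causal model would have to keep the left-hand side below the bound; the QM correlator clearly does not.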
Let us introduce the retrocausal toy-model, where we choose \(\lambda\) to be the angle of emission of the photons belonging to each pair, which accepts one of the values \(a\), \(a+\pi/2\), \(b\), and \(b+\pi/2\), with equal probabilities, such that
\begin{align}P(\lambda| a,b) = \frac{1}{4}[ \delta( \lambda - a)+\delta( \lambda - a -\frac{\pi}{2})+\delta( \lambda - b)+\delta( \lambda - b -\frac{\pi}{2})].\end{align}
By limiting the values of \(\lambda\) so that the predictions of the model are consistent with the experiment, the model assumes that the photons are emitted by the source with polarizations that anticipate the orientations of the apparatuses to be encountered in the future, which is an explicit violation of causality.
The photons’ interaction with each apparatus follows the standard probability rules governed by Malus’s law [9], which leads to
\begin{align*}P(A=+1|a,\lambda) = \cos^2(a-\lambda)\end{align*}
\begin{align*}P(A=-1|a,\lambda) = \sin^2(a-\lambda)\end{align*}
and similarly for the variable \(B\).
What we expect is for this distribution to be consistent with the outcomes of QM, namely (1) and (2). Substituting each possible value of \(\lambda\) separately when calculating \(E[A]\), \(E[B]\), and \(E[AB]\) verifies that the retrocausal model described above is consistent with the QM predictions.
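This substitution can be checked mechanically. The sketch below (Python; it assumes, as in the model above, that \(A\) and \(B\) are independent given \(\lambda\)) averages the conditional expectations \(\cos(2(a-\lambda))\) and \(\cos(2(b-\lambda))\) over the four equiprobable values of \(\lambda\) and compares the results with (1) and (2).

```python
import numpy as np

def retrocausal_stats(a, b):
    """E[A], E[B], and E[AB] for the toy model: lambda takes one of the
    values {a, a+pi/2, b, b+pi/2} with probability 1/4 each, and, given
    lambda, the outcomes follow Malus's law independently, so that
    E[A|lambda] = cos(2(a - lambda)) and E[B|lambda] = cos(2(b - lambda))."""
    lams = np.array([a, a + np.pi / 2, b, b + np.pi / 2])
    EA_lam = np.cos(2 * (a - lams))  # conditional expectation of A
    EB_lam = np.cos(2 * (b - lams))  # conditional expectation of B
    return EA_lam.mean(), EB_lam.mean(), (EA_lam * EB_lam).mean()

# Compare with the QM predictions (1) and (2) at random orientations.
rng = np.random.default_rng(0)
for _ in range(5):
    a, b = rng.uniform(0, np.pi, size=2)
    EA, EB, EAB = retrocausal_stats(a, b)
    assert abs(EA) < 1e-12 and abs(EB) < 1e-12       # eq. (1)
    assert abs(EAB - np.cos(2 * a - 2 * b)) < 1e-12  # eq. (2)
print("retrocausal toy model reproduces the QM predictions")
```

The four \(\lambda\) terms cancel pairwise in \(E[A]\) and \(E[B]\) and sum to \(\cos(2a-2b)\) in \(E[AB]\), exactly as the hand calculation shows.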
In conclusion, mathematical models that reproduce the quantum correlations of pairs of photons must be either directly non-local or retrocausal. Since non-locality clashes with relativity, we go with the retrocausal representation. Let us conclude this section on the polarizer experiment with Bell’s own words,
“ The more closely one looks at the fundamental laws of physics, the less one sees of the laws of thermodynamics. The increase of entropy emerges only for large complicated systems, in an approximation depending on ‘largeness’ and ‘complexity.’ Could it be that causal structure emerges only in something like a ‘thermodynamic’ approximation, where the notions ‘measurement’ and ‘external field’ become legitimate approximations? Maybe that is part of the story, but I do not think it can be all. Local commutativity does not for me have a thermodynamic air about it. …” - Bell.
Let’s put the pieces together. Special relativity tells us that the temporal order of events is subjective; therefore, if we are trying to establish a causal relation between two events, using information about their order in time would be meaningless. And even if we ignore special relativity and adopt the absolute view of time, phenomena observed in quantum mechanics leave room for cases where effects occur before their causes.
Looked at the other way around, it would actually be a great achievement if we could truly prove that there are cases in which effects occur before their causes. Because this would mean we have proven that an event is the cause of another, even though the two are aligned against the thermodynamic arrow of time. (To be honest, if we could only prove that an event is the cause of another, I wouldn’t mind about their temporal orientation.) At this point, we must be careful and interpret the meanings of cause and effect in the polarizer experiment in more detail. We took the inputs (the initial conditions, \(\lambda\)) as causes, and the outputs (the variables \(A\) and \(B\)) as effects. What makes us think that the choice of initial conditions constitutes a cause for the outcomes of \(A\) and \(B\)? If the mathematically causal model were consistent with the QM observations, would this prove that causal relations could be deduced from this experiment? Recall that the causal description specifies the distributions \(p(\lambda)\), \(p(A|a,\lambda)\), and \(p(B|b,\lambda)\). If \(p(A|a,\lambda) \neq p(A)\) (or \(p(B|b,\lambda) \neq p(B)\)), these distributions can only imply that the variable \(A\) (or \(B\)) is dependent on the variables \(\lambda\) and \(a\) (or \(b\)). However, statistical dependence is not sufficient to demonstrate the presence of such a causal relationship. Correlation does not imply causation.
Consequently, we have to find a ground property which, on its own, can give us a sense of a necessarily causal relation between two events, regardless of their temporal order. Notice that this inference re-defines the conventional concept of causality by excluding the condition that causes must precede their effects.
At this point, in order to go further, it is useful to make a distinction between necessary and sufficient causes.
In logic, if event A is a necessary cause of event B, then the presence of B necessarily implies the presence of A; the presence of A, however, does not imply that B will occur. On the other hand, if event A is a sufficient cause of event B, then the presence of A necessarily implies the presence of B. However, another event C may alternatively cause B, so the presence of B does not imply the presence of A [10].
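The two implication directions can be made concrete with a toy truth-table check (a sketch in Python; the set of "worlds" below is hypothetical, constructed only to illustrate the definitions):

```python
def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q

# Hypothetical set of worlds recording which events occur (1) or not (0).
# Built so that A is necessary for B (B never occurs without A), C is
# sufficient for B (C never occurs without B), and A is not sufficient.
worlds = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 0},  # A occurs without B
    {"A": 0, "B": 0, "C": 0},
]

necessary_A_for_B  = all(implies(w["B"], w["A"]) for w in worlds)  # B -> A
sufficient_A_for_B = all(implies(w["A"], w["B"]) for w in worlds)  # A -> B
sufficient_C_for_B = all(implies(w["C"], w["B"]) for w in worlds)  # C -> B

print(necessary_A_for_B, sufficient_A_for_B, sufficient_C_for_B)
# True False True
```

Necessity is checked as "B implies A" across all worlds, sufficiency as "A implies B"; the third world is exactly what separates the two notions.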
Let event A be a necessary cause of event B, and assume that, as an observer, I have no information about this causal relationship. If I observe event B, I have to observe event A at some point along my own temporal arrow to at least suspect some association between these two events. So you see the problem here: if there is no restriction on the temporal order (or if we are considering the possibility of retrocausality), I might never observe event A, and never be able to make a causal connection between the two. Furthermore, I might observe another event, say event C, which is also associated with event B, and mistakenly conclude that event C causes event B, even though there is no causal connection between them.
Up to this point, as we have discussed the mathematical aspects of causality in terms of thermodynamics, relativity, and quantum mechanics, the temporal order of events has caused us considerable trouble. One may reasonably wonder whether we could make causal connections between events in a way that lets us pragmatically infer outcomes, and apply these inferences to our daily lives, where we assume that causes precede their effects as a God-given property of causality. The question then boils down to the following: if correlation alone doesn’t imply causation, can correlation plus the temporal order of events give us an idea about causation?
Even though a great deal of research has been carried out by statisticians in this field, sadly, the answer is still no.
To understand why, let us consider the following famous scenario discussed by R. A. Fisher [11]. Assume that you have the event of smoking and the event of lung cancer, which are strongly correlated. You also have the temporal order of these events, coming from your observed data (lung cancer patients): the act of smoking precedes the disease. Since we assumed that causes precede their effects, we may reasonably conclude that smoking causes lung cancer, a conclusion with which Fisher disagreed.
What if I told you that there is a hidden factor in between, i.e., a lurking variable, a so-called smoking-cancer gene, which causes both lung cancer and the intention to smoke (i.e., nicotine craving)? If that were the case, although we would still observe a strong correlation between smoking and lung cancer, the decision to smoke or not would have no impact on whether you got the disease. This is related to Simpson’s paradox [12], the situation where including a lurking variable causes you to re-think the direction of an association.
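Here is a minimal numerical sketch of that confounding story (Python; all counts and probabilities are invented for illustration): within each gene stratum, smoking and cancer are independent, yet the pooled data show a strong smoking-cancer association.

```python
# Hypothetical counts: within each gene stratum, smoking and cancer are
# independent; the gene raises both the smoking rate and the cancer rate.
strata = {
    "gene":    {"n": 1000, "p_smoke": 0.9, "p_cancer": 0.8},
    "no_gene": {"n": 1000, "p_smoke": 0.1, "p_cancer": 0.1},
}

def cancer_rate(smoker):
    """P(cancer | smoking status) in the pooled population."""
    cancer = people = 0.0
    for s in strata.values():
        n = s["n"] * (s["p_smoke"] if smoker else 1 - s["p_smoke"])
        people += n
        cancer += n * s["p_cancer"]  # independence within each stratum
    return cancer / people

print(f"P(cancer | smoker)     = {cancer_rate(True):.2f}")   # 0.73
print(f"P(cancer | non-smoker) = {cancer_rate(False):.2f}")  # 0.17
```

Smokers are mostly gene carriers, so they inherit the carriers' high cancer rate; the association is real, but it is carried entirely by the lurking variable.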
To help address problems like the one just described, Judea Pearl [13] introduced a causal calculus [14] (also called the do-calculus). He argues that the problem with the lurking variable is that we try to predict the outcomes of an interventional problem using observational data. What this means is the following: if you collect 100 people from the street, half of them smokers and half of them not, you cannot jump to the conclusion that smoking causes cancer, because of possible lurking variables like our smoking-cancer gene. To break the causal connections between any lurking variables and smoking, you have to intervene, that is, collect 100 non-smokers from the street and force 50 of them to smoke. This is called a randomized controlled experiment [15], which is, as you might guess, hard to realize in practice. So even in our absolute-time, coarse-grained, simple world, we are back to the problem of ambiguity in interpreting observational data in terms of causal relations.
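The difference between observing and intervening can be simulated directly. In the hypothetical model below (Python; the probabilities are made up for illustration), the gene drives both smoking and cancer while smoking itself does nothing; conditioning on observed smoking still shows a high cancer rate, whereas forcing smoking, in the spirit of Pearl's do-operator, reveals no causal effect.

```python
import random

random.seed(42)

def person(do_smoke=None):
    """One draw from a hypothetical world where a 'smoking-cancer gene'
    drives both nicotine craving and cancer, and smoking itself does
    nothing. do_smoke=None observes; True/False intervenes."""
    gene = random.random() < 0.5
    smoke = gene if do_smoke is None else do_smoke  # intervention cuts gene -> smoke
    cancer = random.random() < (0.8 if gene else 0.1)
    return smoke, cancer

N = 100_000

# Observational data: condition on the people who happen to smoke.
obs = [person() for _ in range(N)]
p_obs = sum(c for s, c in obs if s) / sum(s for s, _ in obs)

# Interventional data: force everyone to smoke, as in a randomized trial.
forced = [person(do_smoke=True) for _ in range(N)]
p_do = sum(c for _, c in forced) / N

print(f"P(cancer | smoke)     ~ {p_obs:.2f}")  # high: confounded by the gene
print(f"P(cancer | do(smoke)) ~ {p_do:.2f}")   # base rate: no causal effect
```

The intervention severs the gene-to-smoking arrow, so the forced group's cancer rate collapses to the population base rate, which is exactly why the randomized controlled experiment is the gold standard.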
Obviously, with the mathematical tools available today, it seems hard to deduce a methodology for determining a causal relation. At this point, I think we should stop and ask: why do we need to detect causal relations anyway? As Hume suggested, learning from habit is actually necessary for us to maintain our lives. If I witness people getting lung cancer after they begin smoking, even the idea that these two events are merely correlated may convince me to stop smoking and probably save my life. Of course, this may not be the case, and a smoking-cancer gene may be a valid explanation for the cause of lung cancer, but still, I wouldn’t lose anything. So I think we should give our cognitive inclination to deduce causal relations the credit it deserves. Even though these relations cannot be proven mathematically, physically, or logically, the ability to interpret correlated events is one of the fundamental capacities that helped us survive natural selection. So instead of trying to derive sharp and exact mathematical descriptions of causal relations, maybe we should relax the conditions a little and try to find out with how much reliability we can deduce that one event causes another. Before jumping to definite conclusions, we still have a lot to do with what association and correlation provide us.
References
[2] David Hume, An Enquiry Concerning Human Understanding, "Of the Idea of Necessary Connexion," Part II.