
11/04/2015

How long should you wait?

"The really unusual day would be one where nothing unusual happens."
-Persi Diaconis
The quote above is from the very beginning of the book The Improbability Principle, written by Prof. David J. Hand. It is inarguably a must-read if you are interested in how humans perceive probability theory. Basically, the book explains why incredibly unlikely events happen, and keep on happening: people winning lotteries multiple times, lightning repeatedly striking the same unfortunate man, and so on.

8/22/2015

Stosszahlansatz

"Facts are stubborn things, but statistics are pliable"
-Mark Twain
All theories of microscopic physics are governed by laws that are time reversible. The evolution of a system can be traced back into the past by the same equations that govern its prediction into the future. Then the macroscopic laws arise, and we see that there is indeed something like an arrow of time: macroscopic phenomena, unlike microscopic ones, are not invariant under reversal of time, due to the second law of thermodynamics. In other words, both Figure 1.a and 1.b seem reasonable, whereas Figure 2.b seems quite impossible. So the natural question arises: how come the laws that govern two atoms are invariant under time reversal, i.e., Figure 1 makes sense in both directions of time, but Figure 2 doesn't?

7/14/2015

Turing Patterns

There are times when you discover that someone has actually named the phenomenon you have been thinking about for a significant amount of time. This name opens the door to an infinite set of new books, names and ideas associated with the original one, and maybe, most importantly, you feel that you are not the only one thinking about it.

I guess, in this context, the first word I got stuck with was "asymmetry", since I was interested in particle physics in high school more than I ever was. Of course I am not talking about the literal meaning of these words, and I don't even remember how old I was when I first heard the word "asymmetry". What I mean is more of a discovery of what these words actually mean in nature. As expected, then came the word "randomness", which also bothered me for a long time. These words are quite powerful; they shape the path of your curiosity, and make you discover more and more words as you dig into them. In my case, I ended up with the word "evolution" when I began college. And when I started my MSc, then came the word "diffusion". Since then, these four words have been frequently present in all the books I read, as well as all the things I write. And I am incredibly happy that they made me discover my latest phrase, which is hopefully my future research area, "mathematical biology".

So here comes the inspirational talk of this post, which contains all the words I have mentioned above. The talk is given by Philip Maini, a mathematical biologist whose talks I greatly enjoy listening to. I had the chance to hear this talk at a summer school I recently attended, and listening to him for the second time gave me the motivation to investigate Alan Turing's paper on morphogenesis and pattern formation more carefully, and to simulate it in order to understand it.

It was quite impressive to find out that Alan Turing, who is known as the father of the computer by most people, had also spent quite an amount of time thinking about symmetry breaking and pattern formation in nature. If you look around carefully, it is incredibly easy to see millions of examples of asymmetry. Take the branches on a tree, for instance. Their point of branching out of the trunk breaks the circular symmetry of the trunk itself. The black and white patches of a cow are also a good example of spatial asymmetry. The same phenomenon is present in the early stages of embryonic development. It all begins with a single cell, but you end up having a nervous system, muscles, bones, etc. At some point, symmetry is broken, and your initial cells begin to differentiate from each other. You can see an incredibly cool video of the embryonic development of the fruit fly Drosophila, where at the beginning, all the cells at the surface are the same, and suddenly, they begin to migrate to the places they need to be, and they begin to form a nervous system (you can see the vertical stripes developing a pattern on the surface). This whole process is called morphogenesis, and what Turing was trying to do was to model, mathematically, the patterns that occur through this process.



What may be the cause of such differentiation? Recall that all the cells are daughters of the initial cell, which means that their genetic material is identical. In order to differentiate, these cells need an external stimulus, some kind of chemical for instance (growth hormones are a good example), which will trigger their process of differentiation. Turing named these chemicals morphogens, and assumed that cells begin to behave in a different way when the concentration of this morphogen is high, and vice versa. The only way for such a chemical to reach these cells is, of course, via diffusion.

Indubitably, this system is full of different morphogens interacting with each other, which must also be considered in the system model. Introducing reactions between morphogens plus diffusion gives us a reaction-diffusion system. The simplest model, containing two morphogens, can be described with the following two equations \begin{align} \frac{\partial u}{\partial t} &= D_{u}\nabla^2 u+f(u,v),\\ \frac{\partial v}{\partial t} &= D_{v}\nabla^2 v+g(u,v), \end{align} where \(u\) and \(v\) denote the concentrations of these two morphogens, \(D_{u}\) and \(D_{v}\) denote the diffusion coefficients of these morphogens, and \(f(u,v)\) and \(g(u,v)\) denote the reaction kinetics between these morphogens, respectively. As you can see, without the reaction kinetics, this would reduce to the good old diffusion equation.

So how does this system, governed by these two equations, become unstable and produce these patterns? In order to answer that, we need to think about what stability means for a chemical system. We need to investigate the cases where we have only reaction, only diffusion, and both, in terms of stability.

To begin with, let us assume we have a reaction-only system, which means that \(D_{u} = D_{v} = 0\), and the morphogens interact in such a way that if you mix these chemicals to be homogeneous enough, there would be a spatially uniform steady state for both \(u\) and \(v\), which is stable to perturbations [1]. This means that if I add random noise to this well-mixed system, the system will come back to its steady state. As a result, a reaction-only system cannot produce any patterns, since it is stable, and comes back to a homogeneous state in case of a perturbation. In other words, if the cow were grey and I drew little black and white dots on it, it would become grey again, not due to the diffusion of these black and white dots (remember that for this case we have removed diffusion completely), but due to the interaction between them (you can imagine a scenario in which they simply produce a grey dot, or destroy each other).
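As a quick sanity check of this kind of stability (a minimal sketch of my own, not part of Turing's paper), one can verify that the reaction kinetics used later in the simulation section, \(f(u,v)=0.6u-u^{3}-v\) and \(g(u,v)=1.5u-2v\), have a stable homogeneous steady state at \((u,v)=(0,0)\): the Jacobian there has negative trace and positive determinant, so small perturbations decay.

#include <cstdio>

// Linear stability of the reaction-only system at the steady state (0,0),
// using the kinetics from the simulation section below (illustrative check).
int main() {
    // Partial derivatives of f(u,v)=0.6u-u^3-v and g(u,v)=1.5u-2v at (0,0)
    double fu = 0.6, fv = -1.0;   // df/du, df/dv
    double gu = 1.5, gv = -2.0;   // dg/du, dg/dv

    double trace = fu + gv;            // -1.4
    double det   = fu * gv - fv * gu;  //  0.3

    // trace < 0 and det > 0  =>  both eigenvalues have negative real part
    printf("trace = %.2f, det = %.2f, stable = %s\n",
           trace, det, (trace < 0 && det > 0) ? "yes" : "no");
    return 0;
}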

Now let us assume a diffusion-only system, where \(f(u,v)=g(u,v)=0\). This is a bit more intuitive, since everyday experience tells us that if we drop black ink into water, after a while, the water turns grey. You can refer to this post for more information on diffusion, and how it stabilizes the system and tends to make everything homogeneous as time elapses. So in this case, if the cow were grey and I drew little black and white dots on it, it would become grey again, not due to the interaction between these dots, but due to their diffusion on the surface of the cow.

So neither of these cases by themselves, reaction- or diffusion-only, gives us the big dotted cow we want. What is more counterintuitive, though, is that the combination of these systems, a reaction-diffusion system, is actually capable of doing it. If you have a stable reaction system, and you add diffusion onto it, which is also stabilizing, you can end up with an unstable system, i.e., a real cow. In other words, stability plus stability leads to instability.

In Turing's own words,

"It is suggested that a system of chemical substances, called morphogens, reacting together and diffusing through a tissue, is adequate to account for the main phenomena of morphogenesis. Such a system, although it may originally be quite homogeneous, may later develop a pattern or structure due to an instability of the homogeneous equilibrium, which is triggered off by random disturbances."[2]

So how should these chemicals interact in order to create patterns? For a pattern to be visible, the presence and absence of these two chemicals must be competitive. Turing hypothesized the existence of two molecules, an activator and an inhibitor, and in order to create patterns, these molecules must interact in a way such that,

  • The activator enhances its own production,
  • The activator promotes the production of the inhibitor,
  • The inhibitor impedes the production of the activator,
  • The activator propagates more slowly than the inhibitor, which corresponds to the difference between the diffusion coefficients \(D_{u}\) and \(D_{v}\) in the system model.

If all the conditions above are met, a small perturbation in the activator concentration (the random noise introduced to the initially homogeneous concentration) will amplify, because the activator enhances its own production. But at the same time, since the activator also promotes the production of the inhibitor, and since the inhibitor propagates faster than the activator, the inhibitor will run ahead and claim regions where the activator can no longer be present. As a result, a pattern will form with spatially separated activator and inhibitor peaks, which is also called "short range activation, long range inhibition".
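These verbal conditions can also be written as inequalities; the following is the standard result of linearizing Equations (1) and (2) around the homogeneous steady state (a textbook summary, not derived in this post). Writing \(f_{u}, f_{v}, g_{u}, g_{v}\) for the partial derivatives of the kinetics evaluated at the steady state, a diffusion-driven (Turing) instability requires \begin{align} f_{u}+g_{v} &< 0,\\ f_{u}g_{v}-f_{v}g_{u} &> 0,\\ D_{v}f_{u}+D_{u}g_{v} &> 0,\\ (D_{v}f_{u}+D_{u}g_{v})^{2} &> 4D_{u}D_{v}(f_{u}g_{v}-f_{v}g_{u}). \end{align} The first two inequalities say the steady state is stable without diffusion; the last two make it unstable to spatial perturbations, which is only possible when \(D_{u} \neq D_{v}\), in particular when the inhibitor diffuses faster than the activator.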

Numerical Simulation

In order to simulate this kind of behaviour in 2D, we will discretize the system, and make use of finite difference methods to find approximate solutions to the partial differential equations described in Equations (1) and (2). Since the system is discretized, we will approximate the Laplacian ( the \(\nabla^2\) operator ) using the five-point stencil. We will also impose zero-flux boundary conditions, where none of the chemicals can diffuse out of the domain. The initial distribution of both \(u\) and \(v\) will be homogeneous, with random noise added in order to provide the perturbation needed for patterns to occur. The explicit equations employed in the simulations are as follows \begin{align} \frac{\partial u}{\partial t} &= 2.8 \times 10^{-4}\nabla^2 u+ 0.6 u- u^{3}-v,\\ \frac{\partial v}{\partial t} &= 5 \times 10^{-2}\nabla^2 v+1.5u-2v. \end{align} You can see the simulation results of such a system (with slightly different parameters) in Figure 1, where initially the system is homogeneous, and then random noise comes into play. You can also simulate reaction- and diffusion-only systems, and you will end up with a red monochrome heatmap at the end.
Figure 1: Pattern formation in a reaction-diffusion system.
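For reference, the update rule implemented in the code below combines forward Euler time stepping with the five-point stencil; on the interior grid points it reads \begin{align} u_{i,j}^{n+1} &= u_{i,j}^{n} + \Delta t \Bigg[ D_{u}\frac{u_{i+1,j}^{n}+u_{i-1,j}^{n}+u_{i,j+1}^{n}+u_{i,j-1}^{n}-4u_{i,j}^{n}}{(\Delta x)^2} + 0.6\,u_{i,j}^{n}-(u_{i,j}^{n})^{3}-v_{i,j}^{n} \Bigg],\\ v_{i,j}^{n+1} &= v_{i,j}^{n} + \Delta t \Bigg[ D_{v}\frac{v_{i+1,j}^{n}+v_{i-1,j}^{n}+v_{i,j+1}^{n}+v_{i,j-1}^{n}-4v_{i,j}^{n}}{(\Delta x)^2} + 1.5\,u_{i,j}^{n}-2\,v_{i,j}^{n} \Bigg], \end{align} with the zero-flux boundary condition handled by copying the neighbouring interior value onto the boundary points at each step.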

So do the morphogen pairs really exist as Turing envisioned? They exist in chemistry, that's for sure. These simulated patterns can also be created in a petri dish. But do they exist in biology? It is still an open question.

Code

Data generation is done in C++ using Xcode, and the visualizations are done in MATLAB. I am giving both scripts below: you can generate the data with the C++ code, and use the MATLAB script to read it from the directory you have specified in order to create the heatmaps.

C++ Script (Xcode)

//
//  main.cpp
//  turing patterns
//
//  Created by Burcu Tepekule on 02/12/14.
//  Copyright (c) 2015 Burcu Tepekule. All rights reserved.
//

#include <iostream>
#include <fstream>
#include <sstream>   // stringstream (needed below, was missing)
#include <string>
#include <cstdio>    // FILE, fopen, fprintf
#include <cmath>
#include <cstdlib>
#include <ctime>
using namespace std;

int main(int argc, const char * argv[]) {

    // SAY WHERE TO SAVE (Specify your own directory)
    string directory;
    directory = "/Users/burcu/Dropbox/TuringSims/";
    stringstream myRoot;
    myRoot << directory;
    stringstream stream;
    string fileName;
    FILE *result;
    
    
    // INITIALIZE
    int i,k,j,t,r,m;
    srand((unsigned)time(NULL));

    
    int size      = 50; // Grid will be 50 x 50. Don't make it more than 100 for a reasonable runtime.
    double dx     = 2/(double)size;
    int totalTime = 10;
    double dt     = .9 *(dx*dx)/2;
    int timeSteps = (totalTime/dt);
    
    double **u_old;
    double **v_old;
    double **u_new;
    double **v_new;
    
    // Allocate memory
    u_old = new double*[size];
    v_old = new double*[size];
    u_new = new double*[size];
    v_new = new double*[size];
    
    for (i=0;i<size;i++){
        u_old[i] = new double[size];
        v_old[i] = new double[size];
        u_new[i] = new double[size];
        v_new[i] = new double[size];
    }
    
    // Assign values -  initial matrix is random
    for (i=0;i<size;i++){
        for (j=0;j<size;j++){
            u_old[i][j] = 3+((double)rand()/(RAND_MAX));
            v_old[i][j] = 3+((double)rand()/(RAND_MAX));
        }
    }
    
    // Save the initial matrices
    stream.str(string());
    stream << myRoot.str() << "u_initial.dat";
    fileName = stream.str();
    result = fopen(fileName.c_str(),"w+");
    for (i=0;i<size;i++){
        for (j=0;j<size;j++){
            fprintf(result, "%f\n", u_old[i][j]);
        }}
    fclose(result);
    
    stream.str(string());
    stream << myRoot.str() << "v_initial.dat";
    fileName = stream.str();
    result = fopen(fileName.c_str(),"w+");
    for (i=0;i<size;i++){
        for (j=0;j<size;j++){
            fprintf(result, "%f\n", v_old[i][j]);
        }}
    fclose(result);

    double Du = 2.8e-4;
    double Dv = 5e-2;

    for(t=0;t<timeSteps-1;t++){

        // calculate 2D laplacian
        double **deltaU = new double*[size-1];
        double **deltaV = new double*[size-1];
        
        for(m=0;m<size-1;m++){
            deltaU[m] = new double[size-1];
            deltaV[m] = new double[size-1];
        }
        
        for(k=1;k<size-1;k++){
            for(r=1;r<size-1;r++){
                deltaU[k-1][r-1] = (u_old[k-1][r]+u_old[k+1][r]+u_old[k][r-1]+u_old[k][r+1]-4*u_old[k][r])/(dx*dx);
                deltaV[k-1][r-1] = (v_old[k-1][r]+v_old[k+1][r]+v_old[k][r-1]+v_old[k][r+1]-4*v_old[k][r])/(dx*dx);
            }
        }
        
        for(k=1;k<size-1;k++){
            for(r=1;r<size-1;r++){

                // ONLY DIFFUSION SYSTEM
//                u_new[k][r]  = u_old[k][r] + dt*(Du*deltaU[k-1][r-1]);
//                v_new[k][r]  = v_old[k][r] + dt*(Dv*deltaV[k-1][r-1]);
                
                // ONLY REACTION SYSTEM
//                u_new[k][r]  = u_old[k][r] + dt*(0.6*u_old[k][r]- pow(u_old[k][r],3)-v_old[k][r]);
//                v_new[k][r]  = v_old[k][r] + dt*(1.5*u_old[k][r]-2*v_old[k][r]);
                
                // REACTION-DIFFUSION SYSTEM
                u_new[k][r]  = u_old[k][r] + dt*(Du*deltaU[k-1][r-1] + 0.6*u_old[k][r]- pow(u_old[k][r],3)-v_old[k][r]);
                v_new[k][r]  = v_old[k][r] + dt*(Dv*deltaV[k-1][r-1] + 1.5*u_old[k][r]-2*v_old[k][r]);


            }
        }
        
        
        for(k=0;k<size;k++){
            // ZERO FLUX BOUNDARY CONDITION
            u_new[0][k]       = u_old[1][k];
            u_new[k][0]       = u_old[k][1];
            v_new[0][k]       = v_old[1][k];
            v_new[k][0]       = v_old[k][1];
            u_new[k][size-1]  = u_old[k][size-2];
            u_new[size-1][k]  = u_old[size-2][k];
            v_new[k][size-1]  = v_old[k][size-2];
            v_new[size-1][k]  = v_old[size-2][k];
            
        }
        
        // Free the Laplacian buffers allocated for this time step
        for(m=0;m<size-1;m++){ delete[] deltaU[m]; delete[] deltaV[m]; }
        delete[] deltaU; delete[] deltaV;

        for(k=0;k<size;k++){
            for(j=0;j<size;j++){
                u_old[k][j]=u_new[k][j];
                v_old[k][j]=v_new[k][j];
            }
            
        }
      
    };
    
    // Save the last matrices
    stream.str(string());
    stream << myRoot.str() << "u.dat";
    fileName = stream.str();
    result = fopen(fileName.c_str(),"w+");
    for (i=0;i<size;i++){
        for (j=0;j<size;j++){
            fprintf(result, "%f\n", u_new[i][j]);
        }}
    fclose(result);
    
    stream.str(string());
    stream << myRoot.str() << "v.dat";
    fileName = stream.str();
    result = fopen(fileName.c_str(),"w+");
    for (i=0;i<size;i++){
        for (j=0;j<size;j++){
            fprintf(result, "%f\n", v_new[i][j]);
        }}
    fclose(result);
    return 0;
}

MATLAB Script

clear all;clc;close all;
%Specify the same directory with the C++ code.
directory = '/Users/burcu/Dropbox/TuringSims/';
u  = readFile('u',directory);
v  = readFile('v',directory);
u_initial  = readFile('u_initial',directory);
v_initial  = readFile('v_initial',directory);
n = sqrt(length(v)); % grid dimension (avoid shadowing the built-in 'size')
u=reshape(u,n,n);
v=reshape(v,n,n);
u_initial=reshape(u_initial,n,n);
v_initial=reshape(v_initial,n,n);
HeatMap(v,'Colormap','hsv');
HeatMap(u,'Colormap','hsv');


function [ variable_out ] = readFile(varname,directory)
longname  = [directory varname '.dat']; 
fid  = fopen(longname,'rt');
variable  = textscan(fid, '%f', 'HeaderLines',0);
variable_out  = variable{1};
fclose(fid);
end


References

[1] P. K. Maini, T. E. Woolley, R. E. Baker, E. A. Gaffney and S. Seirin Lee (2012), "Turing’s model for biological pattern formation and the robustness problem." Roy. Soc. Interface Focus "Computability and the Turing Centenary" 2(4):487-496.

[2] A. M. Turing (1952), "The chemical basis of morphogenesis." Philosophical Transactions of the Royal Society of London B 237(641):37-72.


6/01/2015

Definition of well defined

This is not a post on mathematics, statistical physics, or evolutionary biology. So there won't be any equations or theorems involved, only the reasons why we may enjoy them, and, similarly, how we may become alienated from our own curiosity, growing distant from something we were dying to be a part of right after we finally force ourselves to become a part of it.

I've been into science since I was a kid. I remember the feeling of reading something that I absolutely don't understand, yet it feels beautiful to try to jump into it. Diving into books on particle physics, string theory, evolutionary biology, or challenging mathematical problems was fun, even though there were times when none of them really made sense. But the books are there to show you what you can't understand, not the things that you've already understood. By any means, reading on science was fun. I was doing it just for the sake of doing it. Books, theorems, and equations were good places to hide when your high school teacher did his job with a negative degree of excitement. It was pure enthusiasm for me. It still is.

Now I understand why our excitements, regardless of what they are directed toward, should not be transformed into intangible accomplishments. I guess this is true for any kind of enthusiasm in life.

Each process has a unique tempo in nature. Forcing them to operate faster simply distorts them. Similarly, slowing things down may be as dangerous as forcing them to speed up. And all processes are dynamic with respect to their agents as well. They evolve with the things they interact with, affect, and are affected by. Thus, they become impossible to predict with infinite precision. As Susskind says, we begin to talk in probabilities when we don't understand with absolute certainty how a dynamical system interacts with its surroundings.

On the contrary, what we do is assign a fixed time to a process, say academic research, depending on how much we value its possible outcomes. If a small problem takes longer than it's supposed to, we leave it. If there are no papers to publish on the horizon, again, we leave it. The value of dealing with a problem depends on how long it survives as a problem, not on what it actually means to solve it. We have well defined time intervals, well defined academic accomplishments, well defined funding options, and as a result, all these nicely fixed numbers make us think we are dealing with well defined problems.

A well defined problem is a problem whose solution solves the core problem itself, not other problems such as GPAs, publications, or citations. And unfortunately, when your priorities transform into these other problems, it becomes unbelievably easy, and seems extremely reasonable, to generate scientific problems which are fundamentally neither scientific nor problems.

Enthusiasm, towards anything, is a gift. It gives you the nerve to read something again and again and practically torture yourself until you understand it. It is capable of forcing you to get out of bed in the middle of the night and run to pen and paper for a slight chance of being right this time. It is capable of making you feel unbounded.

So if you have it, do not exchange it for anything quantifiable.

5/18/2015

Microscopic theory of Diffusion

I have recently encountered the great lecture series given by Prof. V. Balakrishnan, who is a theoretical physicist and a distinguished lecturer. You can access the first part of the lecture series on the diffusion equation here. What I would like to do in this post is to write down his derivations in a more systematic manner, like lecture notes taken while listening to him. I strongly recommend watching the lecture though.

Let us begin with what diffusion means. Diffusion is the movement of water molecules you observe on the paper when you accidentally spill your water on it. It is the process that each dye molecule is exposed to when you put a drop of dye in a glass of water. Thanks to diffusion, you can smell the food in the kitchen from your living room. In more general terms, diffusion is the movement of a diffusing substance put inside a solvent, which can be solid, liquid, or gas, due to molecular collisions.

Diffusion tends to make everything homogeneous. It is not that it wants to, but it obeys the second law of thermodynamics, like any other process in the physical world we know. From a microscopic point of view, each molecule moves randomly, pushing and pulling the others around, exhibiting Brownian motion. But as a whole, the diffusing substance in question moves down the concentration gradient, i.e., from high concentration to low, until it becomes homogeneous in the solvent1. Mathematically speaking, we can express this behaviour with the following equation \begin{align} \vec{J}(\vec{r},t) = - D \nabla \rho(\vec{r},t), \label{eq:law2} \end{align} where \(\vec{J}(\vec{r},t)\), \(\rho(\vec{r},t)\), and \(D\) denote the flux (the amount of molecules crossing through the unit area per unit time), volume density (concentration), and diffusion constant, respectively. This equation tells you that the molecules flow from higher concentration to lower concentration, which is nothing but Fick's first law of diffusion. Note that the diffusion coefficient is there to even out the dimensions of the right and the left hand side of the equation.

Another, and the simplest, rule of this system is conservation of mass: material never disappears. This law leads to the following equation \begin{align} \frac{\partial \rho(\vec{r},t)}{\partial t} + \nabla \cdot \vec{J}(\vec{r},t) = 0, \label{eq:law1} \end{align} which is also known as the equation of continuity. It describes the transport of a conserved quantity, which is, in this case, the total number of diffusing molecules. You can also interpret (\ref{eq:law1}) as follows: for the concentration inside a given surface to increase, either the inward flow of molecules through that surface must increase, or, magically, more molecules must be created within it. Since the latter is not possible, the change in the concentration at each point with respect to time, plus the divergence of the flux (the net amount of molecules flowing out through a unit area per unit time), is equal to zero.

If you put (\ref{eq:law2}) back into (\ref{eq:law1}), you will have \begin{align} \frac{\partial \rho(\vec{r},t)}{\partial t} = D \nabla^2 \rho(\vec{r},t), \label{eq:diffEq} \end{align} which is indeed the diffusion equation, also known as Fick's second law.

What we have discussed so far is actually the macroscopic picture. We have talked in terms of concentrations, which contain information about the average behaviour of all the molecules, rather than the temporal or spatial information of each molecule. So what would be the microscopic interpretation? In other words, what would you ask for if you would like to gain information about only one molecule? You know that you cannot look for its exact position at a given time, since you know that the movement of each molecule is random. But although their movement is random, luckily, we know the rules of this random process. So what we can look for is the probability of observing that molecule at a position \(\vec{r}\) at a given time \(t\). Let us denote this probability distribution with \(p(\vec{r},t)\). This probability distribution obeys exactly the same diffusion equation. All you need to do is to replace \(\rho(\vec{r},t)\) with \(p(\vec{r},t)\).

In order to see how, let us begin by investigating the random walk of a single molecule on a discrete line. Our molecule will toss a coin each time it moves, and will choose to move left or right by one unit, depending on the outcome of the coin toss. This assures that moving left and moving right are equally probable.

Figure 1: Discrete line on which our molecule performs a random walk.


To begin with, let us denote the probability of being on the point \(j\) at a specific time \(t\) with \(P(j,t)\). In order to find the left hand side of (\ref{eq:diffEq}), we should ask for the change of \(P(j,t)\) with respect to \(t\). So, what are the possible scenarios that can affect the change of \(P(j,t)\)? First, our molecule may be at position \(j\), and move left or right with equal probabilities. This would definitely result in the decrease of \(P(j,t)\). Second, it may be moving to point \(j\), either from point \(j-1\) or \(j+1\) with equal probabilities, and this would obviously increase \(P(j,t)\). As a result, change in \(P(j,t)\) with respect to time can be written as \begin{align} \frac{\partial P(j,t)}{\partial t} &= \lambda \Bigg[\frac{1}{2}P(j-1,t)+\frac{1}{2}P(j+1,t)-\frac{1}{2}P(j,t)-\frac{1}{2}P(j,t)\Bigg], \\[1em] \frac{\partial P(j,t)}{\partial t} &= \frac{\lambda}{2}\Bigg[P(j-1,t)+P(j+1,t)-2P(j,t)\Bigg], \label{eq:lat} \\[1em] \end{align} where \(\lambda\) denotes the rate of jumps, i.e., number of jumps per unit time. Right hand side of this equation is also demonstrated in Figure 1.

Now, assume that our molecule doesn't have to move by one unit each time, but instead by a constant step denoted by \(a\). This is also called the lattice constant, since it denotes the distance between equally spaced consecutive points on the line. Considering the lattice constant, we can re-write (\ref{eq:lat}), such that \begin{align} \frac{\partial P(aj,t)}{\partial t} &= \frac{\lambda}{2} \Bigg[\big\{P(ja+a,t)-P(ja,t)\big\}-\big\{P(ja,t)-P(ja-a,t)\big\}\Bigg]. \end{align} Furthermore, we can play with this equation by multiplying and dividing the right hand side by \(a\), which results in \begin{align} \frac{\partial P(aj,t)}{\partial t} &= \frac{a\lambda}{2} \Bigg[\frac{P(ja+a,t)-P(ja,t)}{a}-\frac{P(ja,t)-P(ja-a,t)}{a}\Bigg]. \label{eq:lat2}\\ \end{align} Now, what are the steps to make this discrete-space random walk look like a one dimensional Brownian motion? We should manipulate it such that it becomes continuous in both space and time. The first thing to do is to make the line continuous, by taking the lattice constant \(a\) to \(0\). The second thing to do is to make the movement continuous in time as well. Since \(\lambda\) denotes the rate of jumps, \(1/\lambda\) gives the mean time between jumps. So we can achieve continuous-time movement by taking \(1/\lambda\) to \(0\), i.e., taking \(\lambda\) to infinity.

Let us see what will happen when we take the limits. If you look at the right hand side of (\ref{eq:lat2}) more carefully, you can see that taking the limit as \(a\) goes to \(0\) gives the right and left derivatives of \(P(ja,t)\), such that \begin{align} \lim_{a \to 0} \frac{P(ja+a,t)-P(ja,t)}{a} & \text{ is the right derivative},\\[.1em] \lim_{a \to 0} \frac{P(ja,t)-P(ja-a,t)}{a} & \text{ is the left derivative}. \end{align} Since we have the difference of these two derivatives, we have one missing component to make this difference the second derivative of \(P(ja,t)\), which is another \(a\) in the denominator. So if we multiply and divide the right hand side by \(a\) once more, we will have \begin{align} \frac{\partial P(aj,t)}{\partial t} &= \frac{a^2\lambda}{2} \Bigg[\frac{\frac{P(ja+a,t)-P(ja,t)}{a}-\frac{P(ja,t)-P(ja-a,t)}{a}}{a}\Bigg],\\ \end{align} and when we take the limits, we will end up with the following equation \begin{align} \lim_{\substack{a\to 0 \\ \lambda \to \infty}} \frac{\partial P(aj,t)}{\partial t} &= \lim_{\substack{a\to 0 \\ \lambda \to \infty}} \frac{a^2 \lambda}{2}\frac{\frac{P(ja+a,t)-P(ja,t)}{a}-\frac{P(ja,t)-P(ja-a,t)}{a}}{a}, \end{align} and since \(ja\) becomes a continuous variable now, probabilities will become a probability density, such that \begin{align} \frac{\partial p(x,t)}{\partial t} &= D \frac{\partial^2 p(x,t)}{\partial x^2}, \label{eq:difc} \end{align} where \(x=ja\), and \begin{align} D \equiv \lim_{\substack{a\to 0 \\ \lambda \to \infty}} \frac{a^2 \lambda}{2}, \end{align} which is there to even out the dimensions of the right and left hand sides of the equation. Note that we ended up with exactly the same diffusion equation, (\ref{eq:diffEq}), where concentration is used rather than the probability density.
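To see this correspondence numerically, here is a minimal sketch (my own illustration, not from the lecture) that simulates many independent walkers on a lattice with spacing \(a\) and jump rate \(\lambda\), and compares the mean squared displacement after a time \(t\) with \(2Dt\), where \(D = a^2\lambda/2\).

#include <cstdio>
#include <cstdlib>
#include <ctime>

// Symmetric lattice random walk: check that <x^2> ~ 2Dt with D = a^2*lambda/2.
int main() {
    srand((unsigned)time(NULL));
    const double a        = 0.01;  // lattice constant
    const double lambda   = 1e4;   // jumps per unit time
    const double t        = 1.0;   // total time
    const int    nWalkers = 10000;
    const long   nJumps   = (long)(lambda * t);

    double sumSq = 0.0;
    for (int w = 0; w < nWalkers; w++) {
        long pos = 0;                              // position in units of a
        for (long j = 0; j < nJumps; j++)
            pos += (rand() % 2 == 0) ? 1 : -1;     // left or right, equal probability
        double x = pos * a;
        sumSq += x * x;
    }
    double msd = sumSq / nWalkers;                 // mean squared displacement
    double D   = a * a * lambda / 2.0;
    printf("<x^2> = %f, 2Dt = %f\n", msd, 2.0 * D * t);
    return 0;
}

With these (arbitrary) parameters both numbers come out close to 1, and the agreement improves as more walkers are used.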

Now we should solve this equation.

In order to obtain a general solution, let us increase the spatial dimension, and assume a \(d\) dimensional space where the position vector is denoted by \(\mathbf{\vec{r}}\), \begin{equation} \mathbf{\vec{r}}=\begin{bmatrix} r_{1}, r_{2}, \dots, r_{d} \end{bmatrix} ^{T}, \end{equation} and (\ref{eq:difc}) becomes \begin{align} \frac{\partial p(\, \mathbf{\vec{r}}, t \,)}{\partial t} = D \nabla ^2 p(\, \mathbf{\vec{r}}, t \,). \end{align} Since this is a partial differential equation, first order in time and second order in space, we will need an initial condition and boundary conditions. For the boundary conditions, we will assume the space is infinite, such that the probability of being at \(\pm \infty\) is \(0\), and write \begin{align} \lim_{\, \lVert {\mathbf{\vec{r}}} \rVert \to \infty} \, p(\, \mathbf{\vec{r}}, t \,) = 0. \end{align} Note that we could also define different boundary conditions, such as reflecting or absorbing surfaces.

Regarding the initial condition, we will assume that we have a single molecule at the origin at \(t=0\), such that \begin{align} p(\, \mathbf{\vec{r}}, 0 \,) = \delta^{(d)}(\,{\mathbf{\vec{r}}}\,). \end{align} Equation (\ref{eq:difc}) contains partial derivatives with respect to both time and space. Since the space we have defined is infinite, the position of our molecule in each dimension varies from \(-\infty\) to \(+\infty\). On the other hand, time flows only in the positive direction, i.e., \(t : 0 \to \infty\). So in order to solve this equation, we take both the Fourier transform and the Laplace transform, the former with respect to space and the latter with respect to time.

Let us define the Laplace transform of \(p(\, \mathbf{\vec{r}}, t \,)\) first, such that \begin{align} \mathcal{L}\big[p(\, \mathbf{\vec{r}}, t \,)\big] = \tilde{p}\,(\,\mathbf{\vec{r}},s\,). \end{align} So when we take the Laplace transform of the whole diffusion equation given as (\ref{eq:difc}), taking the Laplace transform properties into consideration, we will have \begin{align} s\,\tilde{p}\,(\,\mathbf{\vec{r}},s\,) - p(\, \mathbf{\vec{r}},0 \,) &= D \nabla ^2 \tilde{p}\,(\,\mathbf{\vec{r}},s\,),\\[.1em] (s-D \nabla ^2\,)\tilde{p}\,(\,\mathbf{\vec{r}},s\,) &= p(\, \mathbf{\vec{r}},0 \,) ,\\[.1em] (s-D \nabla ^2\,)\tilde{p}\,(\,\mathbf{\vec{r}},s\,) &= \delta^{(d)}(\,{\mathbf{\vec{r}}}\,). \label{eq:takef} \end{align} Now, as the next step, we take the Fourier transform of (\ref{eq:takef}) with respect to the spatial variable, which results in \begin{align} (s+D\mathbf{{k}^2}\,)\,\tilde{f}\,(\,\mathbf{\vec{k}},s\,) &= 1, \label{eq:ft1} \end{align} where \(\tilde{f}\,(\,\mathbf{\vec{k}},s\,)\) denotes the Fourier transform of \(\tilde{p}\,(\,\mathbf{\vec{r}},s\,)\), defined as \begin{align} \mathscr{F}[{\tilde{p}\,(\,\mathbf{\vec{r}},s\,)}] = \tilde{f}\,(\,\mathbf{\vec{k}},s\,) = \int_{-\infty}^{\infty} \tilde{p}\,(\,\mathbf{\vec{r}},s\,) \,e^{-i\mathbf{\vec{k}}\cdot\mathbf{\vec{r}}}\mathop{}\!\mathrm{d}^{d}\mathbf{\vec{r}}. \end{align} You can refer to the appendix section at the end of this post for the proof of \(\mathscr{F}\big[\nabla ^2\,\tilde{p}\,(\,\mathbf{\vec{r}},s\,)\big] = -\mathbf{{k}^2}\,\tilde{f}\,(\,\mathbf{\vec{k}},s\,)\). We can re-write (\ref{eq:ft1}), which leads to the equation \begin{align} \tilde{f}\,(\,\mathbf{\vec{k}},s\,) &= \frac{1}{(s+D\mathbf{{k}^2}\,)} \label{eq:fin01}. \end{align} So in order to find the solution for \(p(\mathbf{\vec{r}},t)\), we need to invert all the transforms we have taken so far.

Which one to invert first? If you invert the Fourier transform first, then you have to deal with a \(d\) dimensional integral, since the space is \(d\) dimensional. On the other hand, if you invert the Laplace transform first, the only dimension you have to deal with will be time. So we will go with inverting the Laplace transform first, such that \begin{align} f\,(\,\mathbf{\vec{k}},t\,) = \mathcal{L}^{-1}\Big[\,{\tilde{f}\,(\,\mathbf{\vec{k}},s\,)}\Big]=\mathcal{L}^{-1}\Bigg[{ \frac{1}{(s+D\mathbf{{k}^2}\,)}}\Bigg] &= e^{-D\mathbf{k^2}t}, \end{align} where \(\mathbf{k}^2 = k_{1}^2 + k_{2}^2 + \dots + k_{d}^2\).

Since we can not avoid the Fourier transform forever, we need to invert it too. Taking the inverse Fourier transform leads to \begin{align} p(\, \mathbf{\vec{r}}, t \,) = \mathscr{F}^{-1}\big[e^{-D\mathbf{k^2}t}\big] = \frac{1}{(2\pi)^{d}} \int_{-\infty}^{\infty}e^{-D\mathbf{k^2}t}\, e^{i\mathbf{\vec{k}}\cdot\mathbf{\vec{r}}}\, \mathop{}\!\mathrm{d}^{d}\mathbf{\vec{k}}, \end{align} which is nothing but the multiplication of \(d\) integrals when you choose Cartesian coordinates to integrate. This multiplication can be written as \begin{align} p(\, \mathbf{\vec{r}}, t \,) &= \frac{1}{(2\pi)^{d}} \Bigg(\int_{-\infty}^{\infty} e^{ik_{1}r_{1}-Dk_{1}^{2}t} \mathop{}\!\mathrm{d} k_{1}\Bigg) \dots \Bigg(\int_{-\infty}^{\infty} e^{ik_{d}r_{d}-Dk_{d}^{2}t} \mathop{}\!\mathrm{d} k_{d}\Bigg), \\[.1em] &= \prod_{n=1}^{d}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ik_{n}r_{n}-Dk_{n}^{2}t} \mathop{}\!\mathrm{d} k_{n}. \label{eq:pro} \end{align} Notice that each multiplier in (\ref{eq:pro}) is indeed a Gaussian integral, if you complete the square of the power of each exponential function. After completing the square for each exponential, we will have \begin{align} p(\, \mathbf{\vec{r}}, t \,) &= \prod_{n=1}^{d}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-Dt\big[{k_{n}^{2} -\frac{ik_{n}r_{n}}{Dt} + \frac{i^2r_{n}^2}{4D^2t^2} -\frac{i^2r_{n}^2}{4D^2t^2}}\big]} \mathop{}\!\mathrm{d} k_{n}, \\[.1em] &= \prod_{n=1}^{d}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-Dt\big[{k_{n}-\frac{ir_{n}}{2Dt}}\big]^2 + \frac{i^2r_{n}^2}{4Dt}} \mathop{}\!\mathrm{d} k_{n}, \\[.1em] &= \prod_{n=1}^{d}\,\frac{1}{2\pi}e^{-\frac{r_{n}^2}{4Dt}}\int_{-\infty}^{\infty} e^{-Dt\big[{k_{n}-\frac{ir_{n}}{2Dt}}\big]^2} \mathop{}\!\mathrm{d} k_{n}, \\[.1em] &= \prod_{n=1}^{d}\,\frac{1}{2\pi}e^{-\frac{r_{n}^2}{4Dt}} \sqrt{\frac{\pi}{Dt}}, \\[.1em] &= \frac{1}{(4\pi Dt)^{d/2}}e^{\frac{-\mathbf{r}^2}{4Dt}} \label{eq:finend}. \end{align} which is the solution for the diffusion equation.
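As a quick consistency check (my own addition, not in the lecture), the mean squared displacement implied by this solution is \begin{align} \langle \mathbf{r}^2 \rangle = \int_{-\infty}^{\infty} \mathbf{r}^2 \, \frac{e^{-\mathbf{r}^2/4Dt}}{(4\pi Dt)^{d/2}} \mathop{}\!\mathrm{d}^{d}\mathbf{\vec{r}} = 2dDt, \end{align} i.e., each of the \(d\) dimensions contributes a variance of \(2Dt\), exactly the spreading rate of the single-particle picture we started from.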

In order to gain some intuition about this solution, take a look at the animation below. Looks like we are watching a drop of coffee diffusing through a piece of paper, right? This is indeed the animation of (\ref{eq:finend}) in a \(2\) dimensional space, captured for \(4\) seconds.



APPENDIX

Let us define the Fourier transform of \(\nabla ^2 f(x,t)\) as \begin{align} \mathscr{F}\big[\nabla ^2 f(x,t)\big] &= \int_{-\infty}^{\infty}e^{-i\omega x}\frac{\partial^{2}f(x,t)}{\partial x^2} \mathop{}\!\mathrm{d} x, \end{align} which can be solved by integration by parts, using the formula \begin{align} \int u \, \mathop{}\!\mathrm{d} v = u\,v - \int v\, \mathop{}\!\mathrm{d} u. \label{eq:ibp} \end{align} We will choose \(u\) and \(v\) such that \begin{align} \mathop{}\!\mathrm{d} v &= \frac{\partial^{2}f(x,t)}{\partial x^2} \mathop{}\!\mathrm{d} x, \\[.1em] \frac{\mathop{}\!\mathrm{d} v}{\mathop{}\!\mathrm{d} x} &= \frac{\partial^{2}f(x,t)}{\partial x^2}, \\[.1em] v &= \frac{\partial f(x,t)}{\partial x}, \end{align} and \begin{align} u &= e^{-i\omega x} \\[.1em] \mathop{}\!\mathrm{d} u &= -i\omega e^{-i\omega x} \mathop{}\!\mathrm{d} x. \end{align} Plugging them into (\ref{eq:ibp}) leads to \begin{align} \int_{-\infty}^{\infty} u \, \mathop{}\!\mathrm{d} v &= u\, v \Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} v \,\mathop{}\!\mathrm{d} u \\[.1em] \int_{-\infty}^{\infty} e^{-i\omega x} \frac{\partial^{2}f(x,t)}{\partial x^2} \mathop{}\!\mathrm{d} x &= e^{-i\omega x} \, \frac{\partial f(x,t)}{\partial x} \Big|_{-\infty}^{\infty} +i\omega\int_{-\infty}^{\infty} \frac{\partial f(x,t)}{\partial x} \, e^{-i\omega x} \mathop{}\!\mathrm{d} x\\[.1em] &=i\omega\int_{-\infty}^{\infty} \frac{\partial f(x,t)}{\partial x} \, e^{-i\omega x} \mathop{}\!\mathrm{d} x, \end{align} where the boundary term vanishes because \(f(x,t)\) and its derivative go to \(0\) at \(\pm\infty\). We need to repeat integration by parts one more time for the right hand side of the equation. We will choose the new \(u^*\) and \(v^*\), such that \begin{align} \mathop{}\!\mathrm{d} v^* &= \frac{\partial f(x,t)}{\partial x} \mathop{}\!\mathrm{d} x, \\[.1em] \frac{\mathop{}\!\mathrm{d} v^*}{\mathop{}\!\mathrm{d} x} &= \frac{\partial f(x,t)}{\partial x}, \\[.1em] v^* &= f(x,t), \end{align} and \begin{align} u^* &= e^{-i\omega x}, \\[.1em] \mathop{}\!\mathrm{d} u^* &= -i\omega e^{-i\omega x} \mathop{}\!\mathrm{d} x. \end{align} Again, plugging them into (\ref{eq:ibp}) results in \begin{align} i\omega\int_{-\infty}^{\infty} \frac{\partial f(x,t)}{\partial x} \, e^{-i\omega x} \mathop{}\!\mathrm{d} x & = i\omega \bigg\{{e^{-i\omega x} \, f(x,t) \Big|_{-\infty}^{\infty} +i\omega\int_{-\infty}^{\infty} f(x,t)\, e^{-i\omega x} \mathop{}\!\mathrm{d} x\ }\bigg\}, \\[.1em] &= i^2\omega^2 \mathscr{F}\big[f(x,t)\big], \\[.1em] &= -\omega^2\mathscr{F}\big[f(x,t)\big]. \end{align} As a result, we will have \begin{align} \mathscr{F}\big[\nabla ^2 f(x,t)\big] = -\omega^2\mathscr{F}\big[f(x,t)\big]. \end{align}

Footnotes

1. You can also refer to this post for more information about this random movement, i.e., Brownian motion.

3/08/2015

Errata for the second law : Part II

Let me quickly recall what I was trying to point out in the previous post on the second law.
"Second law doesn't say that it is impossible to see a case such that the entropy decreases. Instead, it says that most of the time (usually very close to always), systems tend toward their most probable state [1], which is not the state where the entropy decreases."
By now, we know that the second law is a statistical law. An ink drop can form in a homogeneous solution, or your cold coffee can spontaneously boil. If only we are willing to wait long enough, strange things can happen by chance.

Recall that we also derived how long we should wait. The probability of simultaneously seeing all \(N\) molecules in a sphere of radius \(r\) inside a volume of \(\pi R^2 h\) was equal to \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\} = \Big(\frac{2r^3}{R^2h}\Big)^N, \label{eq:final} \end{align} which emphasizes that how long we should wait strongly depends on the system size, i.e., on the variables \(R\), \(r\), and \(h\). Think about it for a second, it makes perfect sense. Imagine that your system is a really, really small cube which can surround only one drop. The probability of seeing that drop in your tiny cube will be much higher than the probability of seeing it in your coffee mug. In the coffee mug case, you may have to wait practically forever.
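To get a feeling for the numbers (an illustrative back-of-the-envelope calculation using the values from Part I: \(r=1.5\) mm, \(R=4\) cm, \(h=6\) cm, \(N\approx 10^{20}\)), the per-molecule factor is \begin{align} \frac{2r^3}{R^2h} = \frac{2\times(0.15)^3}{(4)^2\times 6} \approx 7\times10^{-5}, \end{align} so the probability of finding all \(N\) molecules inside the sphere at the same instant is of the order of \((7\times10^{-5})^{10^{20}} \approx 10^{-4\times10^{20}}\), which is why, in practice, you never see the drop re-form.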

Another misconception about the second law arises from the fallacy of equating disorder with entropy. If entropy (practically) always increases, how is it possible for us to see so many patterns forming around us, random water molecules turning into snowflakes, or, basically, how can life, with such ordered organisms, even exist? I know I sound like a creationist. And yes, this is exactly what they say. To summarize the idea, creationists claim,

"The second law of thermodynamics requires that all systems and individual parts of systems have a tendency to go from order to disorder. The second law will not permit order to spontaneously arise from disorder. To do so would violate the universal tendency of matter to decay or disintegrate [2]."
This is the kind of mistake we always make when we try to abstract a mathematical concept. When an example illustrates a mathematical concept, we tend to remember the example rather than the mathematics itself. Furthermore, we extract some features from that example which seem important to us, yet may be irrelevant to the underlying mathematical model. Afterwards, when we see these features in another phenomenon, we quickly associate it with the example in mind, and at the same time, with the mathematical concept. Long story short, since it requires less cognitive load, we tend to replace mathematics with metaphors. The price we pay is thinking that creationists may have a point after all.

So next time you see a creationist talking about extracting order from disorder, tell him to try to cool down his kitchen by leaving the refrigerator door open. If he succeeds, then it means he's right.

When you cool things down, i.e., decrease the thermal energy of a system, you also narrow down the energy distribution of the molecules. You reduce entropy. Your refrigerator does the same thing to keep your ice-cream cold. It reduces entropy, locally. Keep in mind that the second law is valid for a closed system. Your refrigerator (or you) consumes electricity (just like you consume food) to cool things down (to keep your body at 37\(\,^{\circ}\)C), but it turns most of this energy into heat, along with some chemical waste (you simply sweat and poop). Overall, your refrigerator increases entropy by a large amount, even though it locally decreases it [1].

Does life reduce entropy? Yes, it reduces entropy locally, while it increases it globally. The trick here is to keep in mind that when you analyze the entropy of an organism, you must also consider its surroundings. Otherwise, God forbid, you may end up being a creationist.



References
[1] Peter M. Hoffman, Life's Ratchet : How molecular machines extract order from chaos.
[2] http://www.talkorigins.org/faqs/thermo/creationism.html

3/07/2015

Errata for entropy

Before diving into the second part of the previous post, I thought it might be useful to fix the misconception about entropy first.

If you have ever taken a course in information theory, you are familiar with the concept of entropy. When I was taking the course, the concept was introduced to me as the amount of unknown information. Of course, in order to make sense of this sentence, one must define what information is. I remember the very first lecture, when our instructor was trying to tell us what information is and what it is not. Let me quote him first.
"Imagine that you lose sleep, and got up at 3:00 am in the morning. You are surfing in the web, and you came across with the news about an earthquake in California, happened just an hour ago. Then you went back to bed, and woke up at 8:00 am. When you were drinking your coffee and reading the newspaper, you saw the news about the earthquake in California again. Now, which news - the 3:00 am or the 8:00 am - contains information?"
The answer, as you have already guessed, is the 3:00 am news. Why? Because when you read the same news in the newspaper, it doesn't tell you anything you don't already know. It doesn't contain any information for you. If the guy sitting at the next table (who is also reading the same newspaper) had a sound sleep, then the 8:00 am news definitely contains information for him.

At this point, I guess the remaining part of the lecture depends on which department you are taking the course from. As electronics engineering students, we quickly jumped to the calculation of entropy, since our inspiration was the 1948 paper of Claude Shannon (which is an astonishing paper, by the way). Our aim was to determine the magnitude of uncertainty in a given binary string. The lower the uncertainty, the fewer bits required to encode it. In the end, the amount of information was equal to the amount of uncertainty.

For instance, consider the three codewords below \begin{align} C_{1} = [1\,\,1\,\,1\,\,1\,\,1\,\,1\,\,1\,\,1\,\,1\,\,1], \nonumber \\ C_{2} = [1\,\,0\,\,1\,\,0\,\,1\,\,0\,\,1\,\,0\,\,1\,\,0], \nonumber \\ C_{3} = [0\,\,1\,\,0\,\,1\,\,1\,\,1\,\,0\,\,1\,\,0\,\,0]. \nonumber \end{align} Which one requires the fewest bits to encode? Try to describe each of them to the person sitting in front of you. Ultimately, it is a communication problem. Let's begin with the easiest one, \(C_{1}\). You could immediately say "Ten consecutive 1's". That's done. Now try \(C_{2}\). "1 and 0, repeated five times". This is also out of the way, but the number of words you used to describe it was higher than for the previous one. Now let's try \(C_{3}\). "0 and 1, repeated two times, then two 1's and a 0, then a 1 and two more 0's". Not a short description, right?

So what is the key difference between these codewords that caused you to describe them in different numbers of words? You used the leverage of repetition. You counted the patterns, and you just transmitted the information of a single pattern plus the information of how many times it is repeated in the codeword. And in the case of \(C_{3}\), when you couldn't describe the codeword in terms of a repetitive pattern, you just described the whole codeword itself, which cost you an additional number of words.

This is the point where most people get confused and associate entropy with disorder, which is a fallacy. When we don't see patterns, i.e., order, we feel like there is more uncertainty in what we're looking at. But here comes the trick. Think about how you described \(C_{3}\). You knew it in the first place, right? Just because you can't describe it in terms of patterns doesn't mean that you haven't described it at all. Yes, it will probably take more bits to encode it after a proper compression, but still, it's the 8:00 am news for you.

So we need to go way back to what uncertainty means. What if I told you that I have a codeword containing 10 bits, half of which are 1 and the other half 0? Now we have an unknown about the codeword. We don't know the exact locations of the 0's and 1's. We don't know whether the codeword is \(C_{2}\) or \(C_{3}\). They both contain five 1's and five 0's, but their locations are uncertain under the limited information provided to us. All we know is the probability of seeing a 1 in a given bit, i.e., \(p_{1}\), which in this case is equal to 0.5.

Now apply this line of thought to the codeword \(C_{1}\). Would it be hard for you to guess if I told you that I have a codeword containing 10 bits, all of which are 1? It's not hard to find, since there is only one possible combination when \(p_{1}=1\). Now we are getting somewhere. In order to define a measure of uncertainty, you need to define a system (a codeword of length 10, with five 1's and five 0's), and have an unknown about it (the locations of the 1's and 0's). The amount of uncertainty arises from how many different realizations you can generate without violating the properties of your system (how many codewords of length 10 containing five 1's and five 0's can you generate?), not from the individual disorder of each realization. It's kind of a measure of freedom of action under certain pre-defined conditions.

Then what determines the number of possible realizations? Obviously, the number of values that your unknown parameter can take. Mathematically speaking, as the probability distribution of your unknown parameter gets wider, the amount of uncertainty gets larger.
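To make this concrete (an illustrative calculation, not part of the original lecture), the number of length-10 codewords containing exactly five 1's is \begin{align} W = \binom{10}{5} = 252, \qquad \log_{2} W \approx 7.98 \text{ bits}, \nonumber \end{align} whereas for \(p_{1}=1\) there is only one admissible codeword, so \(\log_{2} 1 = 0\) bits of uncertainty. In Shannon's terms, the per-bit uncertainty \(H(p_{1}) = -p_{1}\log_{2} p_{1} - (1-p_{1})\log_{2}(1-p_{1})\) is maximal at \(p_{1}=0.5\) and vanishes at \(p_{1}=0\) or \(1\), which is exactly the "wider distribution, larger uncertainty" statement above.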

Now, let's link this idea to thermodynamics.

Consider Figure 1 below, where both systems consist of identical boxes with six identical marbles inside. The marbles on the left are placed randomly and are completely wedged, meaning that they have no room to move. On the contrary, the marbles on the right are intentionally and nicely ordered side by side, but since there is free room left in the box, they have a little space to move. So which box seems to be more ordered? Which one seems to have more entropy?
Figure 1: Randomly stacked versus nicely stacked marbles.


You smelled the rat, right?

Stacking marbles randomly (as in the box on the left) reduces their freedom of motion, leading to a narrower energy distribution. On the contrary, thanks to the little room the nicely stacked marbles have, they retain some freedom to move, and hence a wider energy distribution. A simple example where higher entropy means more order.

This example is also analogous to gas molecules diffusing in a closed volume. As the molecules diffuse, they hit the borders of the container or they hit each other, and they transfer energy. Even if we knew the exact location and temperature of each molecule at the beginning, it becomes impossible to track this information after all those collisions going on in the container. As the molecules chaotically clash, their energy distribution becomes wider and wider. It's not only their random positions that increase the entropy, it's their widening energy distribution1.

Long story short, entropy cannot simply be identified with disorder. Instead, entropy measures how much the energy is spread out. Sometimes an orderly-appearing system may have energy that is more dispersed than that of a disordered system. Such a system, although seemingly more ordered, would have higher entropy[1].

References
[1] Peter M. Hoffman, Life's Ratchet : How molecular machines extract order from chaos.

Footnotes
1) When I say energy, I mean the combination of gravitational (potential), kinetic, and thermal energy.

3/06/2015

Errata for the second law : Part I

I am sure that many of you are familiar with the definition of the second law of thermodynamics. After all, it is recklessly taught in Physics 201, without paying attention to the physical interpretations of the words constituting this well-memorized sentence:

In a closed system, entropy always increases.

That's why it might be the most interdisciplinary, but yet the most misunderstood law of nature. So we need to fix it here.

Let's begin with the word always.

You all know the famous story of the ink drop. When dropped into a glass of water, the drop diffuses through the water, exhibiting Brownian motion. We know - because this is what we have observed up until now - that the diffused ink molecules1 will never gather back together and form a drop. Or will they, if we wait long enough?

Brownian motion simply means random motion. Molecules move by hitting and pushing each other in a chaotic environment. Remember the time when the concert is over and you are trying to get out with your friends, along with a million other people, simultaneously. In this analogy, you and your friends form a "friend drop". Even though you don't take a step intentionally, as the crowd begins to go out, you begin to be pushed by them, and your motion gets out of your control. You begin to move randomly.

This random motion of molecules was precisely explained by Albert Einstein in his 1905 paper. What you need to know about it for now is the following: the steps you take when you are trying to get out of the concert hall are independent of each other. Remember that your motion is not under your control, you are pushed around by other people (in the ink case, other molecules), and you also push other people while you are being pushed around. So you don't know where your next step will land. But you cannot take a step 1 km long, right? So intuitively, your hunch tells you that there must be an underlying probability distribution of your step size. In the Brownian motion case, these step sizes are (as you may guess) normally distributed.

So each step of yours becomes a normal random variable, and the cumulative sum of these variables constitutes your trajectory. But what determines the parameters of this normal distribution? Keep in mind that escaping from the concert hall is just an analogy. Molecules have no intention of reaching the exit door, nor are they trying to take a cab home. So when they are not being pushed around, they have no reason to move. This gives us the first parameter of our normal distribution, the mean, which is equal to zero.

Back to the concert analogy. What is one of the things that determines your step size? The crowd! Your step size depends on how crowded the hall is, and, from a molecule's perspective, how dense the medium is. Of course, density is not the only parameter that makes a molecule go faster or slower. Temperature, the viscosity of the fluid, and the size of the molecule itself (how fat and tall you are) are also important. All these properties combine into a single parameter, which is called the diffusion coefficient. Physically, the diffusion coefficient is a measure of how much of a substance diffuses through a unit surface in unit time under a unit concentration gradient[1]. Intuitively, it provides a measure of how fast the molecules diffuse. Since your speed depends on the length of your steps over a fixed distance, the diffusion coefficient is significant in determining the variance of the step size, which is the second parameter of our normal distribution.
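One standard way these quantities combine (quoted here as the textbook Stokes-Einstein relation, not derived in this post) is, for a spherical particle of radius \(a\) in a fluid of viscosity \(\eta\) at temperature \(T\), \begin{align} D = \frac{k_{B}T}{6\pi\eta a}, \nonumber \end{align} so hotter and less viscous media, and smaller particles, mean faster diffusion.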

Up until now, we have only talked about a single step. But to proceed further, we need to determine the trajectory of our molecule. Recall that the steps are independent of each other, and the cumulative sum of the steps constitutes the trajectory. This means that, in the discrete case, if I am forced to take 10 steps in the crowd (I am not taking them intentionally), my trajectory will be the sum of 10 normally distributed, zero-mean random variables, which is again a normal random variable with zero mean. Since these variables are independent of each other, the variance of my trajectory will be equal to the variance of each step multiplied by the number of steps, which in this case is equal to 10. For the case of continuous motion, these discrete sums become integrals over a time interval.

Enough with the analogy, let's calculate the probability of seeing our beloved drop again.

Assume that you have a cylindrical glass with a radius of \(R=4\) cm, filled with water up to a height of \(h=6\) cm. You inject your ink drop into the water with a syringe, precisely in the middle. You can see an illustration of this setup in Figure 1.
Figure 1: Illustration of the diffusion system.


Let \({\mathbf{X^{i}}} = [X_{1}^{i},X_{2}^{i},X_{3}^{i}]\) denote the position vector of the \(i^{th}\) ink molecule, and let \(X_{j}^{i}\) denote the position variable in the \(j^{th}\) dimension at time \(t\) (For the sake of simplicity, variable \(t\) is not included in the notation). Since the glass is 3-D, we will have \(j=\{1,2,3\}\), and \(i=\{1,2,...,N\}\), where \(N\) is the total number of ink molecules in our initial drop. Statistically, the average radius of a water drop is equal to 1.5 mm, and it contains approximately \(10^{20}\) \(H_{2}O\) molecules, so let us use these values for our ink drop too. Let's denote the radius of the drop with \(r=1.5\)mm, and \(N=10^{20}\). Since the drop is so small, also let us assume that the initial positions of all molecules are zero, i.e., \({\mathbf{x^{i}}}=[0,0,0]\) for \(i=\{1,2,...,N\}\) at \(t=0\).

Since all the molecules exhibit Brownian motion, their positions can be modelled as a continuous-time stochastic process in which the increments in a given dimension are normally distributed and independent of each other. As time elapses, these independent increments add up to form each molecule's path in the medium, and as they add up, the variance of the position adds up too (recall the concert hall analogy). So the position variable at time \(t\) becomes normally distributed such that \begin{align} X_{1,2,3}^{i} \sim \mathcal{N}(0,\sigma^2) \phantom{00} \text{i=\{1,2,...,N\}}, \label{eq:g} \end{align} where \(\sigma^2 = 2Dt\), and \(D\) denotes the diffusion coefficient.
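To make the \(\sigma^2 = 2Dt\) relation concrete, here is a minimal sketch that builds a Brownian path out of many small independent increments and checks that the empirical variance of the position at time \(t\) comes out close to \(2Dt\). The value \(D = 10^{-9}\,\mathrm{m^2/s}\) (a typical order of magnitude for small molecules in water) and the other numbers are my own illustrative choices, not values used elsewhere in the post.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 1e-9          # diffusion coefficient [m^2/s], typical for small molecules in water
t_final = 2.0     # total time [s]
n_increments = 2000
dt = t_final / n_increments
n_molecules = 50_000

# Each increment along one axis is N(0, 2*D*dt); summing them gives the position at t_final.
increments = rng.normal(0.0, np.sqrt(2 * D * dt), size=(n_molecules, n_increments))
positions = increments.sum(axis=1)

print("empirical variance:", positions.var())  # close to 2*D*t_final
print("2*D*t             :", 2 * D * t_final)  # 4e-9
```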

Let us simplify the problem a little. We were trying to find the probability that all \(10^{20}\) molecules form the original drop again with a proper alignment, i.e., the molecules must not overlap, and they must be arranged so that they produce a sphere-like shape. Instead of this, let me re-define the problem: what is the probability that all ink molecules are inside a sphere of radius \(r\), centered at \({\mathbf{C}}=[C_{1},C_{2},C_{3}]\), at the same instant \(t\)? So for a single ink molecule, the probability of being in the drop at a given time \(t\) and for a given center \(\mathbf{C}\) is equal to \begin{align} P\big\{\mathbf{X^{i}} \in \mathcal{D}\mid t, \mathbf{C}\big\} = P\big\{\lVert \mathbf{X^{i}}-\mathbf{C} \rVert \leq +r\mid t, \mathbf{C}\big\}, \label{eq:ineq} \end{align} where \(\mathcal{D}\) denotes the connected set of points that constitutes the drop. But we need the probability of all ink molecules being in the drop at the same time. Since the ink molecules' trajectories are independent of each other, what we need to calculate becomes \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \mid t, \mathbf{C} \big\} = \prod_{i=1}^{N}P\big\{\mathbf{X^{i}} \in \mathcal{D} \mid t, \mathbf{C} \big\} = \prod_{i=1}^{N}P\big\{\lVert \mathbf{X^{i}}-\mathbf{C} \rVert \leq +r\mid t, \mathbf{C}\big\}. \label{eq:prod} \end{align} Furthermore, because of the nature of Brownian motion, the steps taken along each dimension are also independent of each other. So instead of calculating the Euclidean distance between the center of the drop and each ink molecule, we can check whether each coordinate of an ink molecule falls inside the range spanned by the drop in the corresponding dimension. (Strictly speaking, this replaces the spherical drop with the small cube of side \(2r\) that contains it, which can only increase the probability; that is fine, since we are after an upper bound anyway.) To do that, we need to be more precise about the center of the drop. Since the drop cannot get closer to the boundaries than its own radius, the center of the drop must be bounded such that \begin{align} -(R-r) &\leq C_{1,2} \leq +(R-r),\nonumber \\ -(h/2-r) &\leq C_{3} \leq +(h/2-r).\nonumber \end{align} Since there is no restriction on where the drop may form, it is safe to assume that the center is uniformly distributed over these ranges. As a result, the center variables for each dimension become \begin{align} C_{1,2} \sim \mathcal{U}(-(R-r),&+(R-r)), \nonumber \\ C_{3} \sim \mathcal{U}(-(h/2-r),&+(h/2-r)). \nonumber \end{align} Checking whether every molecule's coordinates fall in the drop's corresponding ranges gives us \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \mid t, \mathbf{C} \big\} &= \prod_{i=1}^{N}P\big\{-r \leq X_{1}^{i}-C_{1}\leq +r\mid t, C_{1}\big\} \times \nonumber \\ & P\big\{-r \leq X_{2}^{i}-C_{2}\leq +r\mid t, C_{2}\big\}P\big\{-r \leq X_{3}^{i}-C_{3}\leq +r\mid t, C_{3}\big\}. \label{eq:prod12} \end{align} We need to relax the problem a little more here. The calculation in \eqref{eq:prod12} requires evaluating probabilities conditioned on \(\mathbf{C}\). This makes sense, since at time \(t=0\) we will see the drop at \(\mathbf{C}=[0,0,0]\) with probability \(1\), because we put it there in the first place. In order to make these probabilities independent of \(\mathbf{C}\), we need all the molecules to diffuse into the medium and the solution to become homogeneous. This means that an ink molecule can be anywhere in the glass with equal probability, which is the long-time behaviour of \eqref{eq:g}, i.e., as \(t \to \infty\), once the walls of the glass are taken into account.
As a result, for a homogeneous solution, we can write \begin{align} X_{1,2}^{i} \sim \mathcal{U}(-R,&+R), \nonumber \\ X_{3}^{i} \sim \mathcal{U}(-h/2,&+h/2), \nonumber \label{eq:g2} \end{align} for \({i=\{1,2,...,N\}}\). Only under this assumption does the location of the center stop mattering, since the ink molecules are now uniformly distributed throughout the water. Furthermore, by assuming the solution is homogeneous, all the probability distributions become independent of time as well.

Let's assume that enough time has passed, the solution has become homogeneous, and the probabilities in \eqref{eq:prod12} have become independent of \(\mathbf{C}\) and \(t\). Now we can re-write \eqref{eq:prod12} as \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\} &= \prod_{i=1}^{N}P\big\{-r \leq X_{1}^{i}-C_{1}\leq +r\big\}\nonumber \\ &P\big\{-r \leq X_{2}^{i}-C_{2}\leq +r\big\}P\big\{-r \leq X_{3}^{i}-C_{3}\leq +r\big\}. \label{eq:prod2} \end{align} To calculate \eqref{eq:prod2}, we need to know the probability distribution of \((X^{i}_{j}-C_{j})\) for \(j=\{1,2,3\}\). Since \(C_{j}\) is symmetric around zero for all \(j\), the probability distribution of \((X^{i}_{j}+C_{j})\) is identical to that of \((X^{i}_{j}-C_{j})\). Let us define a new random variable, \(Z_{j}^{i}\), such that \begin{align} Z_{j}^{i} = X^{i}_{j}+C_{j}, \end{align} which permits us to write the probability distribution function of \(Z_{j}^{i}\) as \begin{align} f_{Z^{i}_{j}}(z^{i}_{j}) = \big(f_{C_{j}} \ast f_{X_{j}^{i}}\big)(z^{i}_{j}). \end{align} This is the convolution of two uniform probability distributions. Writing the convolution explicitly yields the piecewise (trapezoidal) probability distribution functions $$ f_{Z^{i}_{1,2}}(z^{i}_{1,2}) = \left\{ \begin{array}{ll} \frac{z^{i}_{1,2}+2R-r}{4(R-r)R} & : z^{i}_{1,2} \in [-(2R-r), -r],\\ \frac{1}{2R} & : z^{i}_{1,2} \in [-r, r],\\ \frac{(2R-r)-z^{i}_{1,2}}{4(R-r)R} & : z^{i}_{1,2} \in [r, 2R-r], \end{array} \right.$$ $$ f_{Z^{i}_{3}}(z^{i}_{3}) = \left\{ \begin{array}{ll} \frac{z^{i}_{3}+h-r}{2(h/2-r)h} & : z^{i}_{3} \in [-(h-r), -r],\\ \frac{1}{h} & : z^{i}_{3} \in [-r, r],\\ \frac{(h-r)-z^{i}_{3}}{2(h/2-r)h} & : z^{i}_{3} \in [r, h-r], \end{array} \right.$$ (and zero elsewhere), which are plotted in Figure 2.



Figure 2: \(f_{Z^{i}_{1,2}}(z^{i}_{1,2})\) (on the left) and \(f_{Z^{i}_{3}}(z^{i}_{3})\) (on the right).
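Since the ramps of these trapezoidal densities are easy to get wrong, here is a small Monte Carlo sketch that samples \(Z = X + C\) along the first dimension and compares a histogram against the piecewise formula above; the sample size, seed and bin count are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(2)

R, r = 40.0, 1.5   # glass radius and drop radius [mm]
n = 1_000_000

# Z = X + C with X ~ U(-R, R) and C ~ U(-(R-r), R-r).
x = rng.uniform(-R, R, n)
c = rng.uniform(-(R - r), R - r, n)
z = x + c

def f_z(z):
    """Piecewise (trapezoidal) density of Z along dimensions 1 and 2."""
    out = np.zeros_like(z)
    left = (z >= -(2 * R - r)) & (z < -r)
    mid = np.abs(z) <= r
    right = (z > r) & (z <= 2 * R - r)
    out[left] = (z[left] + 2 * R - r) / (4 * (R - r) * R)
    out[mid] = 1 / (2 * R)
    out[right] = ((2 * R - r) - z[right]) / (4 * (R - r) * R)
    return out

hist, edges = np.histogram(z, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max |histogram - formula| :", np.abs(hist - f_z(centers)).max())  # small
```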

Now we can calculate \eqref{eq:prod2} by integrating \(f_{Z^{i}_{j}}(z^{i}_{j})\) over \([-r, +r]\), which gives \begin{align} P\Big\{{-r \leq Z_{1,2}^{i} \leq +r}\Big\} &=\int_{-r}^{+r}f_{Z^{i}_{1,2}}(z^{i}_{1,2})\,dz^{i}_{1,2}, \nonumber \\ &=\int_{-r}^{+r}\frac{1}{2R}\,dz^{i}_{1,2},\nonumber \\ &=\frac{r}{R}, \end{align} \begin{align} P\Big\{{-r \leq Z_{3}^{i} \leq +r}\Big\} &=\int_{-r}^{+r}f_{Z^{i}_{3}}(z^{i}_{3})\,dz^{i}_{3},\nonumber \\ &=\int_{-r}^{+r}\frac{1}{h}\,dz^{i}_{3},\nonumber \\ &=\frac{2r}{h}. \end{align} Plugging these results into \eqref{eq:prod2} gives us \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\} = \Big(\frac{2r^3}{R^2h}\Big)^N. \label{eq:final} \end{align} Equation \eqref{eq:final} tells us that the volume of the solution, which (up to a factor of \(\pi\)) is the denominator \(R^2h\), suppresses \(P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\}\) to the power of \(N\). In other words, the larger the glass, the lower the probability of seeing our drop again. Additionally, recall that \(r = 1.5\) mm, which is really small compared to the dimensions of the glass.
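As a cross-check of the per-molecule probability \((r/R)^2(2r/h) = 2r^3/(R^2h)\), here is a minimal Monte Carlo sketch (sample size and seed are my own choices) that draws a uniformly distributed molecule position and a uniformly distributed drop center, and counts how often all three coordinates land within \(\pm r\) of the center.

```python
import numpy as np

rng = np.random.default_rng(3)

R, h, r = 40.0, 60.0, 1.5   # glass radius, water height, drop radius [mm]
n = 5_000_000

# Uniform molecule position and uniform drop center, per the homogeneous-solution assumption.
x1 = rng.uniform(-R, R, n)
x2 = rng.uniform(-R, R, n)
x3 = rng.uniform(-h / 2, h / 2, n)
c1 = rng.uniform(-(R - r), R - r, n)
c2 = rng.uniform(-(R - r), R - r, n)
c3 = rng.uniform(-(h / 2 - r), h / 2 - r, n)

inside = (np.abs(x1 - c1) <= r) & (np.abs(x2 - c2) <= r) & (np.abs(x3 - c3) <= r)

print("Monte Carlo estimate:", inside.mean())          # roughly 7e-5
print("2*r**3 / (R**2 * h) :", 2 * r**3 / (R**2 * h))  # 7.03125e-05
```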

Let's talk in numbers. Recall that we were trying to calculate the probability of all ink molecules being in the drop simultaneously. Before that, let's focus on the probability of only one ink molecule being in the drop, i.e., \(P\big\{\mathbf{X^{i}} \in \mathcal{D}\big\}\). The reason is the following: if \(P\big\{\mathbf{X^{i}} \in \mathcal{D} \big\}\) is very small, then multiplying it by itself \(N\) times will be practically zero due to machine precision. So although we have chosen a realistic number of molecules for an ink drop \((N=10^{20})\), we must be careful with \(N\) and increase it step by step.

Let's start calculating. You can see the probabilities for different \(N\) values in Table 1. Notice that somewhere between \(N=60\) and \(N=80\), which is still tiny compared to the realistic value of \(N=10^{20}\), machine precision becomes inadequate and the probabilities underflow to zero. Of course, these numbers are a kind of upper bound, since we relaxed the problem in several places.

\(\newcommand\T{\Rule{0pt}{1em}{.3em}}\) \begin{array}{c|c} \hline N \,\,\, & P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\} \T \\\hline 1 & 7.0313\times 10^{-5} \\\hline 2 \T & 4.9438\times 10^{-9} \\\hline 5 \T & 1.7186\times 10^{-21} \\\hline 10 \T & 2.9534\times 10^{-42} \\\hline 15 \T & 5.0756\times 10^{-63} \\\hline 20 \T & 8.7227\times 10^{-84} \\\hline 30 \T & 2.5762\times 10^{-125} \\\hline 50 \T & 2.2471\times 10^{-208} \\\hline 60 \T & 6.6367\times 10^{-250} \\\hline 80 \T & 0 \\\hline 100 \T & 0 \\\hline \end{array}
Table 1: Probability of all \(N\) ink molecules being inside the drop simultaneously.
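The table values (and the underflow) can be reproduced with a few lines of Python; this sketch simply raises the per-molecule probability from \eqref{eq:final} to the power \(N\) in double precision, with the list of \(N\) values chosen to match the table.

```python
# Reproduce Table 1: p**N in IEEE double precision underflows to 0 for large N.
R, h, r = 40.0, 60.0, 1.5          # glass radius, water height, drop radius [mm]
p = 2 * r**3 / (R**2 * h)          # per-molecule probability from eq. (final)

for N in [1, 2, 5, 10, 15, 20, 30, 50, 60, 80, 100]:
    print(f"N = {N:3d}   P = {p**N:.4e}")
```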


So, what do all those numbers mean?

They mean that the second law of thermodynamics is a statistical law. They mean that even though the probability is extremely low, there is a nonzero chance of seeing the ink molecules form a drop again. Yes, practically it is equal to zero; but it is important to be aware that this is only a practical result. This perspective also helps us understand the transition from the microscale to the macroscale. At the microscale, if you observe two molecules hitting each other and bouncing back to their original positions, you don't think they violate the second law. But if a homogeneous water-ink solution forms a drop out of nowhere, things become strange, because you are no longer talking about the behaviour of only two molecules; you are talking about trillions of trillions.

So if everybody is convinced, let's fix the first part of the sentence.

The second law doesn't say that it is impossible for the entropy to decrease. Instead, it says that most of the time (usually very close to always), systems tend toward their most probable state [2], which, as calculated above, is obviously not a state in which the entropy decreases.

References
[1] http://www.thermopedia.com/content/696/
[2] Peter M. Hoffmann, Life's Ratchet: How Molecular Machines Extract Order from Chaos.

Footnotes
1) Ink is composed of many different components, but let's assume it is made of identical "ink molecules".