3/08/2015

Errata for the second law : Part II

Let me quickly remind you of what I was trying to point out in the previous post on the second law.
"Second law doesn't say that it is impossible to see a case such that the entropy decreases. Instead, it says that most of the time (usually very close to always), systems tend toward their most probable state [1], which is not the state where the entropy decreases."
By now, we know that the second law is a statistical law. An ink drop can form in a homogeneous solution, or your cold coffee can spontaneously boil. If we are willing to wait long enough, strange things can happen by chance.

Recall that we also derived how long we should wait. The probability of simultaneously seeing \(N\) molecules in a sphere of radius \(r\) inside a volume of \(\pi R^2 h\) was equal to \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\} = \Big(\frac{2r^3}{R^2h}\Big)^N, \label{eq:final} \end{align} which emphasizes that how long we should wait strongly depends on the system size, i.e., the variables \(R\), \(r\), and \(h\). Think about it for a second; it makes perfect sense. Imagine that your system is a really, really small cube which can contain only one drop. The probability of seeing that drop re-form in your tiny cube will be much higher than the probability of seeing it in your coffee mug. For the coffee mug, you may have to wait practically forever.
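If you want to play with this scaling yourself, here is a tiny Python sketch of the expression above. The "tiny box" dimensions are made up purely for illustration; only \(r\) matches the numbers from the previous post.

```python
# Sketch: how the probability of seeing the drop re-form scales with system size.
# Uses P = (2 r^3 / (R^2 h))^N from the previous post; the "tiny box" dimensions
# below are invented purely for illustration.

def drop_probability(r_cm, R_cm, h_cm, N):
    """Probability that all N molecules sit within +/- r of a common center."""
    p_single = 2 * r_cm**3 / (R_cm**2 * h_cm)
    return p_single**N

r = 0.15  # drop radius: 1.5 mm, as in the previous post

# A coffee-mug-sized glass (R = 4 cm, h = 6 cm) versus a container barely
# larger than the drop itself (hypothetical numbers).
for label, R, h in [("coffee mug", 4.0, 6.0), ("tiny box", 0.2, 0.4)]:
    print(label, drop_probability(r, R, h, N=5))
```

Even with only five molecules, the tiny box gives a probability you could plausibly wait for, while the mug does not.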

Another misconception about the second law arises from the fallacy of equating entropy with disorder. If entropy (practically) always increases, how is it possible for us to see so many patterns forming around us, random water molecules turning into snowflakes, or, for that matter, how can life, such ordered organisms, even exist? I know I sound like a creationist. And yes, this is exactly what they say. To summarize the idea, creationists claim,

"The second law of thermodynamics requires that all systems and individual parts of systems have a tendency to go from order to disorder. The second law will not permit order to spontaneously arise from disorder. To do so would violate the universal tendency of matter to decay or disintegrate [2]."
This is the kind of mistake we always make when we try to abstract a mathematical concept. When an example illustrates a mathematical concept, we tend to remember the example rather than the mathematics itself. Furthermore, we extract some features from that example which seem important to us, yet may be irrelevant to the underlying mathematical model. Afterwards, when we see these features in another phenomenon, we quickly associate it with the example in mind, and at the same time, with the mathematical concept. Long story short, since it requires less cognitive load, we tend to replace mathematics with metaphors. The price we pay is thinking that creationists may have a point at all.

So next time you see a creationist talking about extracting order from disorder, tell him to try to cool down his kitchen by leaving the refrigerator door open. If he succeeds, then it means he's right.

When you cool things down, i.e., decrease the thermal energy of a system, you also narrow down the energy distribution of its molecules. You reduce entropy. Your refrigerator does the same thing to keep your ice cream cold. It reduces entropy, locally. Keep in mind that the second law holds for a closed system. Your refrigerator (or you) consumes electricity (just like you consume food) to cool things down (or to keep your body at 37 \(^{\circ}\)C), but it turns most of this energy into heat, plus some chemical waste (you simply sweat and poop). Overall, your refrigerator increases entropy by a large amount, even though it locally decreases it [1].

Does life reduce entropy? Yes, it reduces entropy locally, while it increases it globally. The trick here is to keep in mind that when you analyze the entropy of an organism, you must also consider its surroundings. Otherwise, God forbid, you may end up being a creationist.



References
[1] Peter M. Hoffmann, Life's Ratchet: How Molecular Machines Extract Order from Chaos.
[2] http://www.talkorigins.org/faqs/thermo/creationism.html

3/07/2015

Errata for entropy

Before diving into the second part of the previous post, I thought it might be useful to clear up a common misconception about entropy first.

If you ever took a course in information theory, you are familiar with the concept of entropy. When I was taking the course, the concept was introduced to me as the amount of unknown information. Of course, in order to make sense of this sentence, one must define what information is. I remember the very first lecture, when our instructor was trying to tell us what information is and what it is not. Let me quote him first.
"Imagine that you lose sleep, and got up at 3:00 am in the morning. You are surfing in the web, and you came across with the news about an earthquake in California, happened just an hour ago. Then you went back to bed, and woke up at 8:00 am. When you were drinking your coffee and reading the newspaper, you saw the news about the earthquake in California again. Now, which news - the 3:00 am or the 8:00 am - contains information?"
The answer, as you have already guessed, is the 3:00 am news. Why? Because when you read the same news in the newspaper, it doesn't tell you anything you don't already know. It doesn't contain any information for you. If the guy sitting at the next table (who is also reading the same newspaper) had a sound sleep, then the 8:00 am news definitely contains information for him.

At this point, I guess the remaining part of the lecture depends on which department you are taking the course from. As electronics engineering students, we quickly jumped to the calculation of entropy, since our inspiration was the 1948 paper of Claude Shannon (which is an astonishing paper, by the way). Our aim was to determine the amount of uncertainty in a given binary string. The lower the uncertainty, the lower the minimum number of bits required to encode it. In the end, the amount of information was equal to the amount of uncertainty.

For instance, consider the three codewords below \begin{align} C_{1} = [1\,\,1\,\,1\,\,1\,\,1\,\,1\,\,1\,\,1\,\,1\,\,1], \nonumber \\ C_{2} = [1\,\,0\,\,1\,\,0\,\,1\,\,0\,\,1\,\,0\,\,1\,\,0], \nonumber \\ C_{3} = [0\,\,1\,\,0\,\,1\,\,1\,\,1\,\,0\,\,1\,\,0\,\,0]. \nonumber \end{align} Which one requires the fewest bits to encode? Try to describe each of them to the person sitting in front of you. After all, it is a communication problem. Let's begin with the easiest one, \(C_{1}\). You could immediately say "Ten consecutive 1's". That's done. Now try \(C_{2}\). "1 and 0, repeated five times". This one is also out of the way, but the number of words you used to describe it was higher than for the previous one. Now let's try \(C_{3}\). "0 and 1, repeated two times, then two 1's and a 0, then a 1 and two more 0's". Not a short description, right?

So what is the key difference between these codewords that made you describe them with different numbers of words? You used the leverage of repetition. You counted the patterns, and you transmitted only the information of a single pattern plus the information of how many times it is repeated in the codeword. And in the case of \(C_{3}\), when you couldn't describe the codeword in terms of a repetitive pattern, you just described the whole codeword itself, which cost you additional words.
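Here is a toy Python sketch of this "describe by repetition" idea. It is not a real compressor, just the verbal descriptions above turned into code: find the shortest block that, when repeated, reproduces the codeword.

```python
# Toy sketch: describe each codeword by its shortest repeating block,
# mirroring the verbal descriptions above (not a real compression scheme).

def shortest_pattern(bits):
    """Return the shortest block that, repeated, reproduces the codeword."""
    n = len(bits)
    for p in range(1, n + 1):
        if n % p == 0 and bits == bits[:p] * (n // p):
            return bits[:p], n // p
    return bits, 1  # never reached; p = n always works

for name, word in [("C1", "1111111111"),
                   ("C2", "1010101010"),
                   ("C3", "0101110100")]:
    pattern, repeats = shortest_pattern(word)
    print(f"{name}: repeat '{pattern}' {repeats} time(s)")
```

For \(C_{1}\) and \(C_{2}\) the description collapses to a short block plus a repeat count; for \(C_{3}\) the "pattern" is the whole codeword.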

This is the point where most people get confused and associate entropy with disorder, which is a complete fallacy. When we don't see patterns, i.e., order, we feel like there is more uncertainty in what we're looking at. But here comes the trick. Think about how you described \(C_{3}\). You knew it in the first place, right? Just because you can't describe it in terms of patterns doesn't mean that you haven't described it at all. Yes, it will probably take more bits to encode after a proper compression, but still, it's the 8:00 am news for you.

So we need to go all the way back to what uncertainty means. What if I told you that I have a codeword containing 10 bits, half of which are 1's and the other half 0's? Now we have an unknown about the codeword. We don't know the exact locations of the 0's and 1's. We don't know whether the codeword is \(C_{2}\) or \(C_{3}\). They both contain five 1's and five 0's, but their locations are uncertain under the limited information provided to us. All we know is the probability of seeing a 1 in a given bit, i.e., \(p_{1}\), which in this case is equal to 0.5.

Now apply this line of thought to the codeword \(C_{1}\). Would it be hard for you to guess the codeword if I told you that it contains 10 bits, all of which are 1? It's not hard at all, since there is only one possible combination when \(p_{1}=1\). Now we are getting somewhere. In order to define a measure of uncertainty, you need to define a system (a codeword of length 10, with five 1's and five 0's) and have an unknown about it (the locations of the 1's and 0's). The amount of uncertainty arises from how many different realizations you can generate without violating the properties of your system (how many codewords of length 10, containing five 1's and five 0's, can you generate?), not from the individual disorder of each realization. It's a kind of measure of freedom of action under certain pre-defined conditions.
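A quick numerical sketch of this counting view (nothing beyond the combinatorics of the example above):

```python
# Sketch: uncertainty as the number of realizations consistent with what we know.
from math import comb, log2

n = 10
# "Five 1's and five 0's": any of C(10, 5) arrangements is possible.
realizations_half = comb(n, 5)   # 252
# "All ten bits are 1": only one arrangement is possible.
realizations_ones = comb(n, n)   # 1

print(realizations_half, log2(realizations_half))  # 252 arrangements ~ 7.97 bits
print(realizations_ones, log2(realizations_ones))  # 1 arrangement   -> 0 bits
```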

Then what determines the number of possible realizations? Obviously, the number of values that your unknown parameter can take. Mathematically speaking, as the probability distribution of your unknown parameter gets wider, the amount of uncertainty gets larger.
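One standard way to put a number on "wider distribution, larger uncertainty" is Shannon's per-bit entropy; here is a minimal sketch (the formula is the usual one from the 1948 paper, and the sample values of \(p_{1}\) are arbitrary):

```python
# Sketch: the more even the distribution of the unknown bit, the larger the
# entropy per bit. H(p) = -p*log2(p) - (1-p)*log2(1-p) is maximal at p = 0.5.
from math import log2

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0  # no uncertainty at all
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p1 in (1.0, 0.9, 0.7, 0.5):
    print(f"p1 = {p1}: H = {binary_entropy(p1):.3f} bits per bit")
```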

Now, let's link this idea to thermodynamics.

Consider Figure 1 below, where both systems contain identical boxes with six identical marbles inside. The marbles on the left are located randomly and are completely wedged, meaning that they have no room to move. In contrast, the marbles on the right are intentionally and nicely ordered side by side, but since there is free room left in the box, they have a little space to move. So which box seems more ordered? Which one seems to have more entropy?
Figure 1: Randomly stacked versus nicely stacked marbles.


You smelled the rat, right?

Stacking the marbles randomly (as in the box on the left) reduces their freedom of motion, leading to a narrower energy distribution. In contrast, thanks to the little room the nicely stacked marbles have, they have freedom to move, and hence a wider energy distribution. So here we have a simple example where higher entropy comes with more order.

This example is also analogous to gas molecules diffusing in a closed volume. As the molecules diffuse, they hit the walls of the container or each other, and they transfer energy. Even if we knew the exact location and temperature of each molecule at the beginning, it becomes impossible to track this information after all those collisions going on in the container. As the molecules chaotically clash, their energy distribution becomes wider and wider. It's not only their random positions that increase the entropy, it's their widening energy distribution1.

Long story short, entropy cannot simply be identified with disorder. Instead, entropy measures how much the energy is spread out. Sometimes an orderly-appearing system may have energy that is more dispersed than that of a disordered system. Such a system, although seemingly more ordered, would have the higher entropy [1].

References
[1] Peter M. Hoffmann, Life's Ratchet: How Molecular Machines Extract Order from Chaos.

Footnotes
1) When I say energy, I mean the combination of gravitational (potential), kinetic, and thermal energy.

3/06/2015

Errata for the second law : Part I

I am sure that many of you are familiar with the definition of the second law of thermodynamics. After all, it is recklessly taught in Physics 201, without paying attention to the physical interpretation of the words constituting this well-memorized sentence:

In a closed system, entropy always increases.

That's why it might be the most interdisciplinary, yet the most misunderstood, law of nature. So we need to fix it here.

Let's begin with the word always.

You all know the famous story of the ink drop. When dropped into a glass of water, it diffuses into the water, exhibiting Brownian motion. We know - because this is what we have observed up until now - that the diffused ink molecules1 will never gather back together and form a drop. Or will they, if we wait long enough?

Brownian motion simply means random motion. Molecules move by hitting and pushing each other in a chaotic environment. Remember the moment a concert ends and you are trying to get out with your friends, along with a million other people, all at the same time. In this analogy, you and your friends form a "friend drop". Even though you don't take a single step intentionally, as the crowd begins to pour out, you are pushed along by it, and your motion gets out of your control. You begin to move randomly.

This random motion of molecules was precisely explained by Albert Einstein in his 1905 paper. What you need to know about it for now is the following: the steps you take while trying to get out of the concert hall are independent of each other. Remember that your motion is not under your control; you are pushed around by other people (in the ink case, other molecules), and you also push other people while being pushed around. So you don't know where your next step will land. But you cannot take a step 1 km long, right? So intuitively, your hunch tells you that there must be an underlying probability distribution of your step size. In the Brownian motion case, these step sizes are (as you may guess) normally distributed.

So each of your steps becomes a normal random variable, and the cumulative sum of these variables constitutes your trajectory. But what determines the parameters of this normal distribution? Keep in mind that escaping from the concert hall is just an analogy. Molecules don't have an intention to reach the exit door, nor are they trying to take a cab home. So if they are not being pushed around, they have no reason to move. This gives us the first parameter of our normal distribution, the mean, which is equal to zero.

Back to the concert analogy. What is one of the things that determines your step size? The crowd! Your step size depends on how crowded the hall is, and from a molecule's perspective, on how dense the medium is. Of course, density is not the only thing that makes a molecule move faster or slower. Temperature, the viscosity of the fluid, and the size of the molecule itself (how fat and tall you are) are also important. All these properties combine into a single parameter, called the diffusion coefficient. Physically, the diffusion coefficient is the measure of how much of a substance diffuses through a unit surface in unit time at a concentration gradient of unity [1]. Intuitively, it provides us with a measure of how fast the molecules diffuse. Since your speed depends on the length of your step over a fixed time, the diffusion coefficient determines the variance of the step size, which is the second parameter of our normal distribution.

Up until now, we have only talked about a single step. But to proceed further, we need to determine the trajectory of our molecule. Recall that the steps are independent of each other, and the cumulative sum of the steps constitutes the trajectory. This means that, in the discrete case, if I am forced to take 10 steps in the crowd (I am not taking them intentionally), my trajectory will be the sum of 10 normally distributed, zero-mean random variables, which is again a normal random variable with zero mean. Since these variables are independent of each other, the variance of my trajectory will be equal to the variance of each step multiplied by the number of steps, which in this case is equal to 10. For the case of continuous motion, these discrete sums become integrals over a time interval.
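Here is a minimal simulation sketch of that claim; the per-step variance of 1 and the number of walkers are arbitrary choices for illustration.

```python
# Sketch: the sum of n independent zero-mean normal steps is again normal with
# zero mean, and its variance is n times the per-step variance.
import random

random.seed(0)
n_steps = 10
step_var = 1.0        # assumed per-step variance (arbitrary units)
n_walkers = 100_000

final_positions = []
for _ in range(n_walkers):
    x = 0.0
    for _ in range(n_steps):
        x += random.gauss(0.0, step_var**0.5)  # one pushed, unintentional step
    final_positions.append(x)

mean = sum(final_positions) / n_walkers
var = sum((x - mean)**2 for x in final_positions) / n_walkers
print(mean, var)  # mean ~ 0, variance ~ n_steps * step_var = 10
```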

Enough with the analogy, let's calculate the probability to see our beloved drop back again.

Assume that you have a cylindrical glass with a radius of \(R=4\) cm, and it is filled with water up to a height of \(h=6\) cm. You inject your ink drop into the water with a syringe, precisely in the middle. You can see an illustration of this setup in Figure 1.
Figure 1: Illustration of the diffusion system.


Let \({\mathbf{X^{i}}} = [X_{1}^{i},X_{2}^{i},X_{3}^{i}]\) denote the position vector of the \(i^{th}\) ink molecule, and let \(X_{j}^{i}\) denote the position variable in the \(j^{th}\) dimension at time \(t\) (for the sake of simplicity, the variable \(t\) is not included in the notation). Since the glass is 3-D, we have \(j=\{1,2,3\}\) and \(i=\{1,2,...,N\}\), where \(N\) is the total number of ink molecules in our initial drop. The average radius of a water drop is about 1.5 mm, and such a drop contains approximately \(10^{20}\) \(H_{2}O\) molecules, so let us use these values for our ink drop too. Let's denote the radius of the drop by \(r=1.5\) mm, and take \(N=10^{20}\). Since the drop is so small, let us also assume that the initial positions of all molecules are zero, i.e., \({\mathbf{x^{i}}}=[0,0,0]\) for \(i=\{1,2,...,N\}\) at \(t=0\).

Since all the molecules exhibit Brownian motion, their positions can be modelled as a continuous-time stochastic process in which the increments in a given dimension are normally distributed and independent of each other. As time elapses, these independent increments add up to form the molecules' paths in the medium, and as they add up, the variance of the position adds up too (recall the concert hall analogy). So the position variable at time \(t\) becomes normally distributed such that \begin{align} X_{1,2,3}^{i} \sim \mathcal{N}(0,\sigma^2) \phantom{00} \text{i=\{1,2,...,N\}}, \label{eq:g} \end{align} where \(\sigma^2 = 2Dt\), and \(D\) denotes the diffusion coefficient.
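As a small illustration of what \(\sigma^2 = 2Dt\) means in practice, here is a sketch that samples one molecule's position after an hour. The diffusion coefficient is an assumed, order-of-magnitude value for a small molecule in water (the real value depends on the ink), and the walls of the glass are ignored.

```python
# Sketch: sampling a molecule's position at time t under the model above,
# ignoring the walls of the glass. D is an assumed, illustrative value.
import random

random.seed(2)
D = 2e-5                     # cm^2/s, rough order of magnitude (assumption)
t = 3600.0                   # one hour, in seconds
sigma = (2 * D * t) ** 0.5   # standard deviation of each coordinate, in cm

# One sample of the (x1, x2, x3) position of a single molecule at time t:
position = [random.gauss(0.0, sigma) for _ in range(3)]
print(sigma, position)       # sigma ~ 0.38 cm after an hour
```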

Let us simplify the problem a little. We were trying to find the probability that all \(10^{20}\) molecules form the original drop back again with a proper alignment, i.e., the positions of the molecules must not overlap, and they must be arranged so that they produce a sphere-like shape. Instead of this, let me re-define the problem. What is the probability that all ink molecules are in a sphere of radius \(r\), centered at \({\mathbf{C}}=[C_{1},C_{2},C_{3}]\), at the same instant \(t\)? For a single ink molecule, the probability of being in the drop at a given time \(t\) and for a given center \(\mathbf{C}\) is equal to \begin{align} P\big\{\mathbf{X^{i}} \in \mathcal{D}\mid t, \mathbf{C}\big\} = P\big\{\lVert \mathbf{X^{i}}-\mathbf{C} \rVert \leq +r\mid t, \mathbf{C}\big\}, \label{eq:ineq} \end{align} where \(\mathcal{D}\) denotes the connected set of points that constitutes the drop. But we need the probability of all ink molecules being in the drop at the same time. Since the ink molecules' trajectories are independent of each other, what we need to calculate becomes \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \mid t, \mathbf{C} \big\} = \prod_{i=1}^{N}P\big\{\mathbf{X^{i}} \in \mathcal{D} \mid t, \mathbf{C} \big\} = \prod_{i=1}^{N}P\big\{\lVert \mathbf{X^{i}}-\mathbf{C} \rVert \leq +r\mid t, \mathbf{C}\big\}. \label{eq:prod} \end{align} Furthermore, due to the Brownian motion, the steps taken in each dimension are also independent of each other. So instead of calculating the Euclidean distance between the center of the drop and each ink molecule, we can check whether each coordinate of an ink molecule falls inside the drop's range in the corresponding dimension. To do that, we need to be more precise about the center of the drop. Since the drop cannot get closer to the boundaries than its radius, the center of the drop must be bounded such that \begin{align} -(R-r) &\leq c_{1,2} \leq +(R-r),\nonumber \\ -(h/2-r) &\leq c_{3} \leq +(h/2-r).\nonumber \end{align} Since there is no restriction on where the drop will form, it is safe to assume that the center is uniformly distributed within these ranges. As a result, the center variables for each dimension become \begin{align} C_{1,2} \sim \mathcal{U}(-(R-r),&+(R-r)), \nonumber \\ C_{3} \sim \mathcal{U}(-(h/2-r),&+(h/2-r)). \nonumber \end{align} Requiring all the molecules' coordinates to fall within the drop's corresponding ranges gives us \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \mid t, \mathbf{C} \big\} &= \prod_{i=1}^{N}P\big\{-r \leq X_{1}^{i}-C_{1}\leq +r\mid t, C_{1}\big\} \times \nonumber \\ & P\big\{-r \leq X_{2}^{i}-C_{2}\leq +r\mid t, C_{2}\big\}P\big\{-r \leq X_{3}^{i}-C_{3}\leq +r\mid t, C_{3}\big\}. \label{eq:prod12} \end{align} We need to relax the problem a little more here. The calculation given in \eqref{eq:prod12} requires the probabilities conditioned on \(\mathbf{C}\). This makes sense, since at time \(t=0\) we will see the drop at \(\mathbf{C}=[0,0,0]\) with probability \(1\), because we put it there in the first place. In order to make these probabilities independent of \(\mathbf{C}\), we need all the molecules to diffuse into the medium and the solution to become homogeneous. This means that the ink molecules can be anywhere with equal probability, which is the asymptotic behaviour of \eqref{eq:g}, i.e., as \(t \to \infty \).
As a result, for a homogeneous solution, we can write \begin{align} X_{1,2}^{i} \sim \mathcal{U}(-R,&+R), \nonumber \\ X_{3}^{i} \sim \mathcal{U}(-h/2,&+h/2), \nonumber \label{eq:g2} \end{align} for \({i=\{1,2,...,N\}}\). Only under this assumption does it not matter where the center is, since the ink molecules are randomly distributed throughout the water. Furthermore, by assuming the solution is homogeneous, all the probability distributions become independent of time too.

Let's assume that enough time has passed, the solution has become homogeneous, and the probabilities in \eqref{eq:prod12} have become independent of \(\mathbf{C}\) and \(t\). Now we can re-write \eqref{eq:prod12} such that \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\} &= \prod_{i=1}^{N}P\big\{-r \leq X_{1}^{i}-C_{1}\leq +r\big\}\nonumber \\ &P\big\{-r \leq X_{2}^{i}-C_{2}\leq +r\big\}P\big\{-r \leq X_{3}^{i}-C_{3}\leq +r\big\}. \label{eq:prod2} \end{align} To calculate \eqref{eq:prod2}, we need to know the probability distribution of \((X^{i}_{j}-C_{j})\) for \(j=\{1,2,3\}\). Since \(C_{j}\) is symmetric around zero for all \(j\), the probability distribution of \((X^{i}_{j}+C_{j})\) is identical to that of \((X^{i}_{j}-C_{j})\). Let us define a new random variable, \(Z_{j}^{i}\), such that \begin{align} Z_{j}^{i} = X^{i}_{j}+C_{j}, \end{align} which permits us to write the probability density function of \(Z_{j}^{i}\) as \begin{align} f_{Z^{i}_{j}}(z^{i}_{j}) = f_{C_{j}}(c_{j}) \ast f_{X_{j}^{i}}(x_{j}^{i}). \end{align} This is the convolution of two uniform distributions. Writing the convolution out explicitly yields the piecewise density functions $$ f_{Z^{i}_{1,2}}(z^{i}_{1,2}) = \left\{ \begin{array}{ll} \frac{z^{i}_{1,2}+2R-r}{4(R-r)R} & : z^{i}_{1,2} \in [-(2R-r), -r],\\ \frac{1}{2R} & : z^{i}_{1,2} \in [-r, r],\\ \frac{(2R-r)-z^{i}_{1,2}}{4(R-r)R} & : z^{i}_{1,2} \in [r, 2R-r], \end{array} \right.$$ $$ f_{Z^{i}_{3}}(z^{i}_{3}) = \left\{ \begin{array}{ll} \frac{z^{i}_{3}+h-r}{2(h/2-r)h} & : z^{i}_{3} \in [-(h-r), -r],\\ \frac{1}{h} & : z^{i}_{3} \in [-r, r],\\ \frac{(h-r)-z^{i}_{3}}{2(h/2-r)h} & : z^{i}_{3} \in [r, h-r], \end{array} \right.$$ which are plotted in Figure 2.



Figure 2:\( \,\,\,f_{Z^{i}_{1,2}}(z^{i}_{1,2})\) (on the left) and \( \,\,\,f_{Z^{i}_{3}}(z^{i}_{3})\) (on the right).

Now we can calculate \eqref{eq:prod2} by integrating \(f_{Z^{i}_{j}}(z^{i}_{j})\) over \([-r, +r]\), which gives \begin{align} P\Big\{{-r \leq Z_{1,2}^{i} \leq +r}\Big\} &=\int_{-r}^{+r}f_{Z^{i}_{1,2}}(z^{i}_{1,2})\,dz^{i}_{1,2}, \nonumber \\ &=\int_{-r}^{+r}\frac{1}{2R}\,dz^{i}_{1,2},\nonumber \\ &=\frac{r}{R}. \end{align} \begin{align} P\Big\{{-r \leq Z_{3}^{i} \leq +r}\Big\} &=\int_{-r}^{+r}f_{Z^{i}_{3}}(z^{i}_{3})\,dz^{i}_{3},\nonumber \\ &=\int_{-r}^{+r}\frac{1}{h}\,dz^{i}_{3},\nonumber \\ &=\frac{2r}{h}. \end{align} Plugging these results into \eqref{eq:prod2} gives us \begin{align} P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\} = \Big(\frac{2r^3}{R^2h}\Big)^N. \label{eq:final} \end{align} Equation \eqref{eq:final} tells us that the volume of the solution, which appears in the denominator, affects \(P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\}\) through the power of \(N\). In other words, the larger the glass, the lower the probability of seeing our drop back. Additionally, recall that \(r = 1.5\) mm, which is a really small value compared to the dimensions of the solution.
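If the convolution-and-integration step feels too slick, here is a quick Monte Carlo sanity check of the two single-coordinate probabilities above, using the same \(R\), \(h\), and \(r\) as in the setup (the sample size is arbitrary):

```python
# Monte Carlo check: sample X uniformly over the container range and C uniformly
# over the allowed center range, then estimate P{-r <= X + C <= r}.
import random

random.seed(1)
R, h, r = 4.0, 6.0, 0.15
n = 1_000_000

hits_12 = sum(abs(random.uniform(-R, R)
                  + random.uniform(-(R - r), R - r)) <= r for _ in range(n))
hits_3 = sum(abs(random.uniform(-h / 2, h / 2)
                 + random.uniform(-(h / 2 - r), h / 2 - r)) <= r for _ in range(n))

print(hits_12 / n, r / R)     # both ~ 0.0375
print(hits_3 / n, 2 * r / h)  # both ~ 0.05
```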

Let's talk numbers. Recall that we were trying to calculate the probability of all ink molecules being in the drop simultaneously. Before that, let's focus on calculating the probability of only one ink molecule being in the drop, i.e., \(P\big\{\mathbf{X^{i}} \in \mathcal{D}\big\}\). The reason is the following: if \(P\big\{\mathbf{X^{i}} \in \mathcal{D} \big\}\) is very small, then multiplying it by itself \(N\) times will be practically zero due to machine precision. Although we have chosen a realistic number of molecules for an ink drop \((N=10^{20})\), we must be careful about \(N\) and increase it step by step.

Let's start calculating. You can see the probabilities for different \(N\) values in Table 1. Notice that after approximately \(N=60\), which is really small compared to the realistic value of \(N=10^{20}\), machine precision becomes inadequate and the probabilities appear to be zero. Of course, these numbers are a kind of upper bound, since we relaxed the problem in several respects.

\(\newcommand\T{\Rule{0pt}{1em}{.3em}}\)
\begin{array}{c|c}
\hline
N \,\,\, & P\big\{\mathbf{X^{1:N}} \in \mathcal{D} \big\} \T \\\hline
1 & 4.6875\times 10^{-5} \\\hline
2 \T & 2.1973\times 10^{-9} \\\hline
5 \T & 2.2631\times 10^{-22} \\\hline
10 \T & 5.1217\times 10^{-44} \\\hline
15 \T & 1.1591\times 10^{-65} \\\hline
20 \T & 2.6232\times 10^{-87} \\\hline
30 \T & 1.3435\times 10^{-130} \\\hline
50 \T & 3.5242\times 10^{-217} \\\hline
60 \T & 1.8050\times 10^{-260} \\\hline
80 \T & 0 \\\hline
100 \T & 0 \\\hline
\end{array}
Table 1: Probabilities of forming a drop containing \(N\) molecules.
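To see where the zeros in the table come from, here is a small sketch of the underflow. It uses the single-molecule probability from the expression derived above, so the exact digits need not match the table, but the behaviour is the same: double precision gives up around \(10^{-308}\), while working with \(\log_{10}\) probabilities keeps going for any \(N\).

```python
# Sketch: why the table bottoms out at zero. Double-precision floats cannot
# represent numbers much below ~1e-308, so p^N underflows to exactly 0 for
# large N. Working with log10(p^N) = N * log10(p) avoids the problem.
from math import log10

r, R, h = 0.15, 4.0, 6.0           # cm, as in the setup above
p_single = 2 * r**3 / (R**2 * h)   # single-molecule probability from the derivation

for N in (1, 10, 60, 80, 100):
    print(f"N = {N}: p^N = {p_single**N}, log10(p^N) = {N * log10(p_single):.1f}")

# Even for the realistic N = 10^20, the log form still works:
print(10**20 * log10(p_single))    # about -4e20, i.e. p^N ~ 10^(-4e20)
```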


So, what does all those numbers mean?

They mean that the second law of thermodynamics is a statistical law. They mean that even though the probabilities are extremely low, there is still a probability of seeing the ink molecules form a drop again. Yes, practically it is equal to zero, but it's important to be aware that this is just a practical result. This perspective also helps us understand the transition from the microscale to the macroscale. At the microscale, if you observe only two molecules hitting each other and bouncing back to their original positions, you don't think they violate the second law. But if a homogeneous water-ink solution forms a drop out of nowhere, things become strange. This is because you are no longer talking about the behaviour of only two molecules, you are talking about trillions of trillions.

So if everybody is convinced, let's fix the first part of the sentence.

Second law doesn't say that it is impossible to see a case such that the entropy decreases. Instead, it says that most of the time (usually very close to always), systems tend toward their most probable state [2], which is (as calculated above) obviously not the state in which the entropy decreases.

References
[1] http://www.thermopedia.com/content/696/
[2] Peter M. Hoffmann, Life's Ratchet: How Molecular Machines Extract Order from Chaos.

Footnotes
1) Ink is composed of many different components, but let's assume it consists of identical "ink molecules".