> Everything is connected but some things are more connected than others. The world is a large matrix of interactions in which most of the entries are close to zero, and in which, by ordering those entries according to their orders of magnitude, a distinct hierarchic structure can be discerned.
>
> - Herbert Simon

Lack of mechanistic interpretability in deep learning methods has been bothering engineers ever since the extensive deployment of neural networks in the 2010s. I actually find it hilarious how agitated we are by not understanding how they work and how we can't let it go… Anyway, although this is purely about us not being able to interpret what they do (which is quite a human-centric problem), there is an unnatural feeling to how they work – and I don't use the word unnatural lightly here. For a long time I thought this was purely because of a key biological difference between our neurons and the so-called artificial neurons, following the debate about whether cognition is fundamentally computational1. Turns out the unnaturalness might lie somewhere else entirely: not in what learns, but in the dynamism of what is being learnt.
For a brief moment, think about what you need a machine learning model to look like for it to be mechanistically interpretable. What does it even mean? If we define lack of mechanism as prediction by pure statistical association, a mechanistic model needs to offer some sort of explanation as to why it produced that particular prediction. Consider the simple task of handwritten digit recognition - a mechanistically interpretable model would need to show you how it recognizes specific curves and lines, how it combines these features to identify digits, and why certain patterns lead to certain predictions. Instead of just saying "this is a 7 with 99% confidence," it would need to reveal its reasoning process: "I identified these specific diagonal and horizontal strokes that are characteristic of a 7 and distinguished them from similar patterns in 1s or 9s." In the end, this is how we think we think. In this video, Grant Sanderson beautifully explains what the hidden layers in a digit-recognition feed-forward neural network mechanistically mean: nothing. No interpretability whatsoever. I find it fascinating that the algorithm can converge to a so-called 'happy little local minimum' that gets very good at performing its task without the need for any mechanistic understanding whatsoever2. Is this bad? Not really, given this is your only task. Would it be useful for any other task that is slightly different? Not really either. Would it recognize that your input is not a number if you drew a kangaroo? Nope, it will see it as a 6.
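To make the "hidden layers mean nothing" point concrete, here is a minimal numpy sketch of such a network's forward pass (the layer sizes follow the video; the weights are random placeholders rather than a trained model, but a trained network would just be the same matrices with different numbers in them):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 784)), np.zeros(16)   # 784 pixels -> 16 "hidden" units
W2, b2 = rng.normal(size=(16, 16)),  np.zeros(16)   # 16 -> 16
W3, b3 = rng.normal(size=(10, 16)),  np.zeros(10)   # 16 -> 10 digit scores

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def predict(pixels):                     # pixels: flattened 28x28 image, values in [0, 1]
    h1 = sigmoid(W1 @ pixels + b1)       # "hidden layer 1": a weighted sum plus squashing
    h2 = sigmoid(W2 @ h1 + b2)           # "hidden layer 2": the same thing again
    return W3 @ h2 + b3                  # ten scores; the argmax is the predicted digit

doodle = rng.random(784)                 # feed it anything -- even a kangaroo doodle
print(predict(doodle).argmax())          # ...and it will still confidently pick some digit
```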
Going back to the unnatural feeling about how deep learning algorithms work: if you think about the learning process – the optimization of the weights of the network – it is based on minimizing the cost function for only one task, no matter how big or small that task is. Evolution also pushes organisms to optimize their fitness (or minimize the cost of survival), but with a crucial plot twist: in the real world, the environment keeps changing. Natural selection pushes organisms to optimize their fitness under continuously varying environments. This applies no matter the generation time. A mammal goes through winter and summer, a cell must respond to different pH levels and sugar resources, a virus must navigate different hosts with varying immune responses. So the unnatural difference may lie not just between biological and synthetic computation, but in the dynamic character of the cost function.
How does evolving under a changing environment make your solution interpretable, though? If you think about the examples above (like the seasons or the sugar resources), you will see that, first, they are only one part of a very complex environment – meaning that when they change, the rest doesn't have to change. Second, they are mostly cyclic. Winters follow summers, cells switch between glucose and lactose, and viruses jump back and forth between humans and animals. This cyclical, partial change of environments means organisms must develop modular solutions. Why? Because you don't wanna redesign everything when only one thing changes in your environment – rather, you wanna be able to change one trait without ruining everything else. When a bacterium encounters lactose after growing on glucose, it doesn't completely reorganize its entire metabolism – it activates specific genetic modules (like the lac operon) designed to handle this particular change. In fact, if it were to reorganize its entire metabolism each and every time, evolution would be dramatically slower. An extremely high-dimensional cost function integrating every environmental aspect would need to be re-optimized, again and again, every time one environmental factor changed. So in a sense, modularity speeds evolution up, and it may be the necessary pathway to scaling up to higher complexity. This emergent modularity makes biological systems inherently more interpretable than deep neural networks that are stuck in their happy little uninterpretable local minima.
Ok, let's code this :) In this paper, Nadav Kashtan and Uri Alon present a great toy model to simulate this exact phenomenon: the evolution of modularity under modularly varying goals (Alon's youtube channel is a hidden gem, with a lecture exactly on modularity here). I will leave the nitty-gritty details for the curious, but the basic idea is to evolve an electronic circuit using genetic algorithms to perform logical functions that change over time in a modular way. Say you have the following two goals, G1 = (X XOR Y) AND (Z XOR W) and G2 = (X XOR Y) OR (Z XOR W) (analogous to glucose vs lactose metabolism), that you periodically switch between, where X, Y, Z, and W denote the inputs. Your genome is only able to encode NAND gates, which can be combined to form other logic gates like AND, OR, XOR, etc3. Since this scenario mimics natural selection, we are not only evolving under the selective pressure of finding the optimal solution, but also under energy constraints, meaning the fewer NAND gates, the better. When you see the formulation of G1 and G2, it is obvious (I hope) that the most reasonable thing to do is to evolve two XOR gates that can stay constant across environments, and then switch the last bit of the operation (AND or OR) given the environment. However, as an evolving circuit, what you see is not the formulation – in fact, you don't see anything; you only experience the consequences of your computation depending on how far it is from the truth table that G1 and G2 produce below.
| X | Y | Z | W | G1 | G2 |
|---|---|---|---|----|----|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 0 | 1 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 1 | 1 | 1 |
| 0 | 1 | 1 | 0 | 1 | 1 |
| 0 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 | 1 | 1 |
| 1 | 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 | 0 |
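As a quick sanity check in Python (my own illustration of footnote 3 and of the modular solution described above, not part of the evolving circuit): every gate we need can indeed be composed from NANDs alone, and wiring two XOR modules into either an AND or an OR reproduces this truth table:

```python
NAND = lambda a, b: 1 - (a & b)

def AND(a, b):                     # 2 NANDs
    n = NAND(a, b)
    return NAND(n, n)

def OR(a, b):                      # 3 NANDs
    return NAND(NAND(a, a), NAND(b, b))

def XOR(a, b):                     # 4 NANDs
    n = NAND(a, b)
    return NAND(NAND(a, n), NAND(n, b))

# The modular solution: two shared XOR modules plus one swappable gate per environment.
G1 = lambda x, y, z, w: AND(XOR(x, y), XOR(z, w))
G2 = lambda x, y, z, w: OR(XOR(x, y), XOR(z, w))

for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            for w in (0, 1):
                print(x, y, z, w, G1(x, y, z, w), G2(x, y, z, w))   # reproduces the truth table above
```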
So your fitness will depend on how many of the outputs you correctly compute in the truth table above given the different inputs based on the current goal (or the environment in this analogy), minus a penalty for the size of your network. As an example, if your circuit always outputs 0, you will be correct 12 out of 16 times for the environment G1. If you can do this with 10 NAND gates instead of 12, your fitness will increase. Every iteration, we will select the best (fittest) L circuits to pass to the next generation. We will duplicate them (analogous to reproduction4), mutate them with a certain probability pm (either add gates, remove gates, or randomly change connections), and calculate the fitness again. Every E generations, the goal will switch between G1 and G2, mimicking the concept of modularly varying environments. Note that the modular bit is quite important here. If G1 and G2 did not have a common operation (in this case XOR) that can be turned into a module, we wouldn't see modularity evolving (an example of nonmodularly varying goals would be two random number generators with fixed seeds). This would correspond to entirely different environments in biological terms (say your new goal is to live on Mars rather than on Earth, though even there you would have a module to deal with some level of gravity. Anyway, you got my point).
The pseudocode of our genetic algorithm becomes:
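(Below is a minimal, self-contained Python sketch of that loop. The circuit encoding, the gate-cost value, and the population defaults are my own simplifications for illustration; the actual implementation in the repo differs in the details.)

```python
import random

# Goals as described above: G1 = (X xor Y) and (Z xor W), G2 = (X xor Y) or (Z xor W)
G1 = lambda x, y, z, w: (x ^ y) & (z ^ w)
G2 = lambda x, y, z, w: (x ^ y) | (z ^ w)
INPUTS = [(x, y, z, w) for x in (0, 1) for y in (0, 1) for z in (0, 1) for w in (0, 1)]
N_IN = 4  # number of primary inputs (X, Y, Z, W)

# A circuit is a list of NAND gates. Gate i is a pair of indices into the signal list
# [X, Y, Z, W, out(gate 0), ..., out(gate i-1)]; the output of the last gate is the answer.

def evaluate(circuit, inputs):
    signals = list(inputs)
    for a, b in circuit:
        signals.append(1 - (signals[a] & signals[b]))   # NAND
    return signals[-1]

def fitness(circuit, goal, gate_cost=0.01):
    """Fraction of truth-table rows computed correctly, minus a small cost per NAND gate."""
    correct = sum(evaluate(circuit, inp) == goal(*inp) for inp in INPUTS)
    return correct / len(INPUTS) - gate_cost * len(circuit)

def random_circuit(n_gates=12):
    return [(random.randrange(N_IN + i), random.randrange(N_IN + i)) for i in range(n_gates)]

def mutate(circuit):
    """Add a gate, remove a gate, or randomly rewire one connection."""
    r, n = random.random(), len(circuit)
    if r < 1 / 3:
        circuit.append((random.randrange(N_IN + n), random.randrange(N_IN + n)))
    elif r < 2 / 3 and n > 1:
        circuit.pop(random.randrange(n))
        for i, (a, b) in enumerate(circuit):            # clamp connections left dangling by the removal
            circuit[i] = (min(a, N_IN + i - 1), min(b, N_IN + i - 1))
    else:
        i = random.randrange(n)
        circuit[i] = (random.randrange(N_IN + i), random.randrange(N_IN + i))

def evolve(pop_size=100, generations=2000, top=0.25, pm=0.7, E=20):
    population = [random_circuit() for _ in range(pop_size)]
    goal = G1
    for gen in range(generations):
        if gen > 0 and gen % E == 0:                    # modularly varying goal:
            goal = G2 if goal is G1 else G1             # switch environment every E generations
        ranked = sorted(population, key=lambda c: fitness(c, goal), reverse=True)
        fittest = ranked[:int(top * pop_size)]          # the L fittest circuits survive
        population = []
        while len(population) < pop_size:               # duplicate ("reproduce") the survivors ...
            child = list(random.choice(fittest))
            if random.random() < pm:                    # ... and mutate each copy with probability pm
                mutate(child)
            population.append(child)
    return population
```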
You can see the evolution of our circuit for pm = 0.7, E = 20, and L set to the top 25% below (max and mean fitness values are the maximum and the average fitness within the fittest population L, respectively). Just before 2000 iterations, our circuit converges to a modular architecture and builds two XOR gates (the identical structures consisting of 4 NAND gates on the top left and bottom left), switching the connections depending on the environment (G1 or G2) to form either an AND or an OR gate! Super cool, no?! The GitHub repo is here in case you would like to play around with it yourself. Note that the modular solution is not the best solution for either G1 or G2. If you want to optimize only for one of them, you can get a circuit with one fewer NAND gate and thus increase your fitness, since you operate under energy constraints. BUT, it takes more time to find the optimal solution! You can also see that in Fig 1a vs. Fig 1b in Kashtan et al. They claim that "One reason that fixed-goal evolution is often slow is that the population becomes stuck in local fitness maxima. Because the fitness landscape changes each time that the goal changes, modularly varying goals can help move the population from these local traps."
These results immediately open up more questions: Does the system manage to evolve a modular architecture even if the changes are too rapid? Does it need time in one environment to learn properly? Can we reverse-engineer the environmental history from the modules of biological organisms? If a module is used more frequently, does it evolve faster than others?
Regarding the epoch duration (or the number of generations in one environment), I played around with E = 2 rather than E = 20. Seems like the algorithm has a hard time converging to a modular structure, and gets stuck around a maximum fitness of 0.85. Running it for more generations likely won't solve this issue, since the fundamental problem is the rapid environmental switching rather than the total evolutionary time.
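In terms of the sketch above, that experiment is just a different argument to the (illustrative) evolve function, not the repo's actual entry point:

```python
# Same mutation rate, but switch the environment every 2 generations instead of every 20.
population = evolve(pm=0.7, E=2)
```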
Regarding the question about whether a module would pop up earlier in the total evolutionary time if it's used more frequently: to test this, I started a simulation with 6 inputs rather than 4 and generated two new environments. My hope here is to see XOR gates evolving before any of the other gates, since they are used more frequently. Problem is, I only managed to have 385 generations in 17 hours, so we might have to wait a little bit. Nevertheless, I will update this post when I have the results, because I think there are quite a few analogies in nature that make me expect XOR gates to evolve first – like innate immunity evolving way before adaptive immunity, and adaptive immunity building upon the modules of innate immunity, etc.
Going back to deep learning to complete the cycle: digging deeper into the literature, I found that training under modularly varying goals has already been used in deep learning as a modularization technique to combat catastrophic forgetting (and it's actually not news) [1,2]. Modularity evolves as an emergent property (rather than being imposed on the network architecture) in networks that evolve under the evolutionary pressure of a connection cost (making connections expensive in terms of fitness) and the functional demands of learning different skills (modularly varying goals). Note that the connection cost is different from penalizing the network size – it directly selects for a modular architecture (and is less biologically relevant in my opinion, but anyway). By doing so, the network can learn new skills without forgetting the old ones, and hopefully becomes a bit more interpretable than a network that trains to get the exact solution to the one exact question it is trying to solve, using every dirty trick it possibly can.
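To make that distinction concrete, here is a rough sketch of the two penalties written as fitness terms for an evolutionary search (the weight-matrix representation, function names, and coefficients are mine, not taken from [1] or [2]):

```python
import numpy as np

def fitness_size_penalty(performance, weights, alpha=0.01):
    """Penalize the overall network size (number of units) -- like the NAND-count cost above."""
    n_units = weights.shape[0]
    return performance - alpha * n_units

def fitness_connection_cost(performance, weights, alpha=0.01):
    """Penalize every nonzero connection directly -- the pressure that favors modular, sparse wiring."""
    n_connections = np.count_nonzero(weights)
    return performance - alpha * n_connections

w = np.array([[0.0, 1.2, 0.0],    # a toy 3-unit weight matrix: 3 units, 2 connections
              [0.0, 0.0, -0.7],
              [0.0, 0.0, 0.0]])
print(fitness_size_penalty(0.9, w), fitness_connection_cost(0.9, w))
```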
Why isn't training with modularly varying goals more commonly used? I guess integration and optimization challenges might be significant bottlenecks, and of course there will be a significant performance gap between modular systems and end-to-end trained models. So if you prioritize raw performance metrics (which is how money is made), there is no point pushing for modularity (which is not inherently bad). But at the end of the day, natural selection is always cooler.
References
[1] Murata, Masayuki, and Lu Chen. "Mitigate Catastrophic Forgetting by Varying Goals." Proceedings of the 12th International Conference on Agents and Artificial Intelligence, SCITEPRESS - Science and Technology Publications, 2020.
[2] Ellefsen, Kai Olav, Jean-Baptiste Mouret, and Jeff Clune. "Neural Modularity Helps Organisms Evolve to Learn New Skills without Forgetting Old Skills." PLoS Computational Biology, 11(4), e1004128, 2015.
1 This debate actually has a long history going back to Descartes and Hobbes' mechanical philosophy, attracting significant attention with the famous Turing Test in the 1950s (previous post about it here). ↩
2 Of course feed forward NNs are quite old technology and there are more sophisticated architectures like transformers, CNNs, and various attention mechanisms that have shown better generalization capabilities, but in the end the fundamental challenge of mechanistic interpretability remains - we still can't fully explain how these networks arrive at their decisions, even when they perform exceptionally well. ↩
3 NAND gates are universal, because any Boolean expression can be expressed using only NAND gates – meaning that you can build any other logic gate by combining them. You can find how to use them to make other gates here. ↩
4 I use the word "analogous" here for a good reason. Although we cover asexual reproduction to some extent by implementing random mutations, sexual reproduction generates significantly more genetic variation than simply introducing mutations to identical copies of organisms. That said, this entire simulation is admittedly a caricature of evolution, so cut me some slack here. ↩