This study by Rosenzweig, Barnes, and McNaughton highlights the importance of forgetting in making the best use of the brain cells we have:
http://frank.itlab.us/forgetting/making_room.html
If we fail to forget, our neural networks will saturate and become useless. Saturation in a neural network does not merely mean that the network cannot learn more; it can also mean that the network fails to respond to input in an appropriate manner.
Consider a very simple network consisting of two input neurons, I1 and I2, and two output neurons, O1 and O2. A neural network learns by increasing the strength of connections between associated inputs and outputs. For instance, should an input signal be present at I1 while an output signal is also present at O2, the connection I1↔O2 would be strengthened. Consider Ivan Pavlov, his dog, a dog treat, and Pavlov’s bell.
A trained neural network acts by proactively triggering appropriate output neurons when specific input signals are present. Should a signal trigger the first input neuron in our example, I1, the second output neuron, O2, would be triggered proactively.
- Given the “learned” synaptic connection: I1↔O2
- Assuming: I1
- Triggered: O2 (I1→O2)
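The learn-then-recall behavior above can be sketched in a few lines of Python. This is a minimal illustration, not a real synapse model: the names `train`, `recall`, and `connections` are hypothetical, and connections are bare binary links with no weights.

```python
# Hypothetical sketch of the 2x2 network: connections are stored as
# (input, output) pairs in a set -- present means "strengthened".
connections = set()

def train(inp, out):
    """Hebbian rule: link an input that is active while an output fires."""
    connections.add((inp, out))

def recall(inp):
    """Proactive triggering: every output associated with the given input."""
    return {out for i, out in connections if i == inp}

# First training session: I1 active while O2 fires.
train("I1", "O2")
print(recall("I1"))  # only O2 is triggered
```

With a single learned connection, recall is unambiguous: I1 triggers O2 and nothing else.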
Consider Pavlov’s dogs salivating when the bells rang, regardless of whether treats were provided.
If a second training exercise triggered input I1 but the first output neuron fired instead of the second, the connection I1↔O1 would also be strengthened. We now have,
- I1↔O1
- I1↔O2
After the second training session, should I1 be triggered once again, which output neuron would trigger? Without any further weighting function applied to our connections, an I1 signal would trigger both outputs,
I1→O1∪O2
Consider a situation where Pavlov’s dogs were sometimes offered treats when the bells rang, and at other times were given electric shocks. What would the dogs have expected the next time the bells rang? Would they have expected treats, electric shocks, or both?
Perhaps this state of affairs is desirable; perhaps it is not. Now that this cross-association is saturated, however, there exists no way to trigger only O2 given I1. Even if every future training session reinforces the I1↔O2 connection, the system will remain ambiguous forever.
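The permanence of this ambiguity can be made concrete under the same sketch assumptions as before (binary, unweighted connections; the names are hypothetical): reinforcement can only ever add or re-add a link, so no amount of retraining I1↔O2 removes the competing I1↔O1 link.

```python
# State after the two conflicting training sessions: both links exist.
connections = {("I1", "O1"), ("I1", "O2")}

def train(inp, out):
    connections.add((inp, out))  # reinforcement only adds; nothing is removed

def recall(inp):
    return {out for i, out in connections if i == inp}

for _ in range(100):
    train("I1", "O2")      # retrain the "correct" association 100 times...

print(sorted(recall("I1")))  # ...yet I1 still triggers both O1 and O2
```

Without weights or a forgetting mechanism, the saturated state is a fixed point: training can never disambiguate it.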
It is likely that nature’s first, simple neural networks exhibited this kind of easy saturation. Perhaps early critters could only adapt to very limited environmental conditions during their very short lives. Perhaps these critters simply died from indecision if they encountered natural oddities they weren’t prepared for. In the competitive evolutionary race, however, those critters that occasionally reset their saturated networks would have had an advantage over those that did not. To reset an easily saturated neural network would have been to allow the forgetting of anomalies. These critters would have had a better chance of survival in the real, random natural world. They would relearn their most common and important lessons and forget the oddities which simply did not pertain to most circumstances of their lives.
In the context of the article, 4-(3-phosphonopropyl) piperazine-2-carboxylic acid (CPP) provides an occasional “reset” function to spatial memory that allows de-saturation and re-learning. CPP is one of nature’s “dirty tricks” that helps to alleviate the downsides of easily saturated neural networks. Nature has converged upon many such dirty tricks over the eons, including:
- Chemical washes (CPP)
- Inhibition, “pulsing” and other mild periodic reset mechanisms
- Network segmentation (slows saturation)
- Physical growth and degeneration
- Specialty circuits (e.g., “instinct”)
- Preferential learning, such as giving greater weight to electric shocks than to pleasurable food treats
- Consciousness (self-awareness)
- Concept formation and other information compression mechanisms
- Emotion, heuristics, magical thinking, social deference, and economic behavior in humans
The basic lesson is that, absent such ameliorating mechanisms, all neural networks saturate easily. For any cognitive function, researchers should ask two questions:
- How does the associated network saturate? What are the effects?
- What solutions has evolution converged upon to de-saturate the network?