Remembering and Forgetting: Saturation in Neural Networks

This study by Rosenzweig, Barnes, and McNaughton highlights the importance of forgetting in order to make the best use of the brain cells we have.

If we fail to forget, our neural networks will saturate and become useless.  Saturation in a neural network does not merely mean that the network cannot learn anything more; it can mean that the network fails to respond to input in an appropriate manner.

Consider a very simple network consisting of two input neurons, I1 and I2, and two output neurons, O1 and O2.  A neural network learns by increasing the strength of connections between associated inputs and outputs.  For instance, should an input signal be present at I1 while an output signal is also present at O2, the connection I1↔O2 would be strengthened.  Consider Ivan Pavlov, his dog, a dog treat, and Pavlov’s bell.

A trained neural network acts by proactively triggering the appropriate output neurons when specific input signals are present.  Should a signal trigger the first input neuron in our example, I1, the second output neuron, O2, would be triggered proactively.

  • Given the “learned” synaptic connection: I1↔O2
  • Assuming: I1
  • Triggered: O2 (via I1↔O2)

Consider Pavlov’s dogs salivating when the bells rang, regardless of whether treats were provided.
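This learn-then-trigger behavior can be sketched in a few lines of Python.  The class below is a hypothetical illustration, not a model from the article: connection strengths are simple counters that increase whenever an input and an output are active together.

```python
class TinyNetwork:
    """A toy two-input, two-output associative network."""

    def __init__(self):
        # Connection strengths, keyed by (input, output) pairs.
        self.connections = {}

    def train(self, active_input, active_output):
        """Strengthen the connection between a co-active input and output."""
        key = (active_input, active_output)
        self.connections[key] = self.connections.get(key, 0) + 1

    def respond(self, active_input):
        """Return every output with a learned connection to this input."""
        return sorted(out for (inp, out) in self.connections
                      if inp == active_input)

net = TinyNetwork()
net.train("I1", "O2")        # bell rings while the treat appears
print(net.respond("I1"))     # the bell alone now triggers O2: ['O2']
```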

If a second training exercise triggered input I1 while the first output neuron fired instead of the second, the connection I1↔O1 would also be strengthened.  We now have:

  • I1↔O1
  • I1↔O2

After the second training session, should I1 be triggered once again, which output neuron would trigger?  Without any further weighting functions to apply to our connections, an I1 signal would trigger both outputs.


Consider a situation where Pavlov’s dogs were sometimes offered treats when the bells rang and sometimes given electric shocks.  What would the dogs have expected the next time bells rang?  Would they have expected treats, electric shocks, or both?

Perhaps this state of affairs is desirable; perhaps it is not.  Now that this cross-association is saturated, however, there exists no way to trigger only O2 given I1.  Even if all future training sessions reinforce the I1↔O2 connection, the system will remain ambiguous forever.
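The permanence of that ambiguity can be demonstrated with a small sketch.  This is a hypothetical illustration with count-based connection strengths and, deliberately, no weighting or threshold function:

```python
class TinyNetwork:
    """A toy associative network with unweighted, count-based connections."""

    def __init__(self):
        self.connections = {}  # (input, output) -> strength

    def train(self, active_input, active_output):
        key = (active_input, active_output)
        self.connections[key] = self.connections.get(key, 0) + 1

    def respond(self, active_input):
        # Without a weighting function, ANY learned connection fires.
        return sorted(out for (inp, out) in self.connections
                      if inp == active_input)

net = TinyNetwork()
net.train("I1", "O2")        # first lesson: bell -> treat
net.train("I1", "O1")        # conflicting lesson: bell -> shock
print(net.respond("I1"))     # both outputs fire: ['O1', 'O2']

for _ in range(100):         # heavy reinforcement of I1<->O2 ...
    net.train("I1", "O2")
print(net.respond("I1"))     # ... yet the response stays ambiguous: ['O1', 'O2']
```

However strongly I1↔O2 is reinforced, the stray I1↔O1 connection still fires, because nothing in this simple network compares strengths.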

It is likely that nature’s first, simple neural networks exhibited this kind of easy saturation.  Perhaps early critters could only adapt to very limited environmental conditions during their very short lives.  Perhaps these critters simply died from indecision when they encountered natural oddities they weren’t prepared for.  In the competitive evolutionary race, however, those critters that occasionally reset their saturated networks would have had an advantage over those that did not.  To reset an easily saturated neural network would have been to allow the forgetting of anomalies.  These critters would have had a better chance of survival in the real, random natural world.  They would relearn their most common and important lessons and forget the oddities that simply did not pertain to most circumstances of their lives.
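One way such a reset might work is periodic decay: every connection weakens over time, so only frequently reinforced associations survive.  The sketch below is a hypothetical illustration of that idea, not a mechanism described in the article:

```python
class ForgetfulNetwork:
    """A toy associative network with a periodic 'forgetting' reset."""

    def __init__(self):
        self.connections = {}  # (input, output) -> strength

    def train(self, active_input, active_output):
        key = (active_input, active_output)
        self.connections[key] = self.connections.get(key, 0) + 1

    def forget(self, decay=1):
        """Weaken every connection; prune any that drop to zero."""
        self.connections = {k: s - decay for k, s in self.connections.items()
                            if s - decay > 0}

    def respond(self, active_input):
        return sorted(out for (inp, out) in self.connections
                      if inp == active_input)

net = ForgetfulNetwork()
for _ in range(3):
    net.train("I1", "O2")    # the common lesson: bell -> treat
net.train("I1", "O1")        # a one-off oddity: bell -> shock
net.forget()                 # the periodic reset
print(net.respond("I1"))     # the oddity is forgotten, the lesson kept: ['O2']
```

After the reset, the once-reinforced anomaly is gone while the thrice-reinforced lesson survives, weakened but intact.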

In the context of the article, 4-(3-phosphonopropyl)piperazine-2-carboxylic acid (CPP) provides an occasional “reset” function for spatial memory that allows de-saturation and re-learning.  CPP stands in for one of nature’s “dirty tricks” that help alleviate the downsides of easily saturated neural networks.  Nature has converged upon many such dirty tricks over the eons, including:

  • Chemical washes (CPP)
  • Inhibition, “pulsing” and other mild periodic reset mechanisms
  • Network segmentation (slows saturation)
  • Physical growth and degeneration
  • Specialty circuits (e.g., “instinct”)
  • Preferential learning, e.g., giving greater weight to electric shocks than to pleasurable food treats
  • Consciousness (self-awareness)
  • Concept formation and other information compression mechanisms
  • Emotion, heuristic, magical thinking, social deference and economic behavior in humans

The basic lesson is that, absent ameliorating mechanisms, all neural networks saturate easily.  For any cognitive function, researchers should ask two questions:

  • How does the associated network saturate?  What are the effects?
  • What solutions has evolution converged upon to de-saturate the network?

Software Process, Like “Method” in Science, is Bunk

My response to James Turner’s article, “Process Kills Developer Passion”:

…you’re spending a lot of your time on process, and less and less actually coding the applications… The underlying feedback loop making this progressively worse is that passionate programmers write great code, but process kills passion. Disaffected programmers write poor code, and poor code makes management add more process in an attempt to “make” their programmers write good code. That just makes morale worse, and so on.

Software process, like “method” in science, is bunk!  I finally understood what the philosopher Paul Feyerabend was trying to say.  What is important is the data, the stuff of reality, otherwise known as results.  Experiment (software tests), deduction, and induction are all very important, but no two people arrive at their conclusions in the same way.  That is, no two people process data and leverage their capacity for deduction and induction in the same way.  One person’s process (method) is another person’s confusion.

If you want to de-motivate a creative scientist or software developer, force them to think like someone they are not.

Process is no substitute for knowledge.

Boiling the Ocean

My book will be a tour of both the most basic topics in I/T architecture and, simultaneously, the most advanced.  I say “basic” because the philosophical underpinnings of any enterprise are the foundation upon which that enterprise rests.  These are topics every child should come to understand.  I say “advanced,” however, because these topics are tragically left untaught, deferred to “advanced” studies.


As I move forward writing my book, I must work to avoid “boiling the ocean.”  I must winnow my list of topics down to what is relevant to architecture and justify why.  Of course, everything is relevant, eh?  (This will be one of my points.)