“The singular exception is provided by the human ability to remember past happenings. When one thinks today about what one did yesterday, time’s arrow is bent into a loop. The rememberer has mentally traveled back into her past and thus violated the law of the irreversibility of the flow of time. She has not accomplished the feat in physical reality, of course, but rather in the reality of the mind, which, as everyone knows, is at least as important for human beings as is the physical reality.” – Endel Tulving
Forgetting is probably the scariest thing I can think of short of death itself (indeed, perhaps even scarier). The two phenomena seem to have a lot in common if you think about it! Both are inexorable, incompletely understood, terrifying natural destructions of the self. Both, also, seem to be necessary side-effects of the very processes that enable life and our ability to remember it. Life is the constant (and ultimately losing?) struggle of processes creating order against the prevailing thermodynamic tendency toward disorder (entropy), made possible by a continual flux of energy from the environment. Memory, too, is a way of imposing temporary order on fleeting perceptions, allowing our sensory experience to temporarily persist in the form of impressions which, if attended to and encoded, become memories… at least, until we forget them.
Just imagine “you” without your memories, without the impressions of your past experiences… there’s really no such thing! Without memories of the past serving to verify our continued existence, we lose the concept of time altogether and are consigned to exist forever in the present moment. Less existentially dramatic are everyday cases of forgetting, like losing access to knowledge you worked hard to acquire, or skills you’ve trained painstakingly to master. Needless to say, I have a morbid fascination with forgetting (and remembering, which we will see is part and parcel). Like those latter-day immortality-seekers, in my own way I too care deeply about such preservation… not of life per se (though I do think I support these efforts), but of my memories of my own life as-lived; unadulterated access to my past experiences as-encoded. I’m not asking for superhuman mnemonic powers or an eidetic memory; no, I humbly desire continued access to all the ideas _I’ve had_ and the things _I’ve tried consciously to remember_. It’s scary when you realize you don’t know what you no longer know… because you’ve necessarily forgotten about all the things you’ve forgotten, all that you once knew. At the same time, even in the face of its many foibles, human memory is an astounding innovation! An apparatus shaped by natural forces to serve only a few basic needs (food/water, shelter/safety, mates…) is currently being used to perform all sorts of amazing feats in a world it simply was not evolved for. But more about memory later; this post is about forgetting.
Forgetting occurs in at least two ways. One is active around the time of initial memory formation (encoding), and the other is active as you recall information from memory (retrieval). I’ll give you a foretaste of both phenomena, to be savored at length in due time. The first type of forgetting has to do with the fact that recently formed memories take time to fully consolidate, and during this time they are vulnerable to interference from other mental activity, which can weaken and even eliminate incipient memories! We will talk about how sleep and alcohol actually improve memory for previously learned material by reducing the amount of mental activity in the intervening period, activity which would otherwise interfere with the consolidation process (e.g., Gais, Lucas, & Born, 2006; Bruce & Pihl, 1997; Lamberty et al., 1990). We will also talk about how neuroscience bears this out; recently induced long-term potentiation (LTP) - the neural mechanism thought to underlie memory formation - is inhibited by subsequent LTP, even if the tasks are unrelated (Xu et al., 1998; Abraham, 2003). This damaging effect that new learning has on prior learning becomes less and less pronounced as the delay between the two is increased. This is super fascinating, so stick around; we’ll save this best bit for last.
The second way forgetting can occur is through memory retrieval itself; that is, remembering one thing may necessarily make it more difficult to remember other things. When you go to retrieve a memory, something has cued you to search for that memory. If many different memories are associated with the same cue, then they “compete” for access to conscious awareness during the memory search; the more of these cue-to-memory associations there are, the more difficult it will be to recall the desired information. Furthermore, increasing the strength of one cue-to-memory association necessarily weakens the associations of all other memories to that cue: this is why it is hard to remember all of the things in a given category, like the wives of Henry VIII. As you recall each additional wife, the relative strengths of the others in memory are thereby diminished, and the remaining wives become harder to recall. “Ah, I can never remember that last one!” Selectively retrieving an item from memory, though bolstering future recall for that item, harms our later recall of similar items. As we will see, this negative effect of retrieval may even extend beyond related items; much evidence suggests that retrieval of information is impaired by the previous retrieval of other, unrelated items. This can be viewed as a basic consequence of increasing non-specific mental activity during the retention interval, when presumably the original memories for the to-be-remembered material are still in a state of consolidation. Incredibly, recent research suggests that retrieving a well-remembered item from memory may put that item into a state of “reconsolidation,” thus re-opening even durable, well-consolidated memories to the interfering effects of mental activity (Dudai, 2004). That is, recently activated memories may be just as vulnerable to vitiation by subsequent mental activity as recently formed memories are, though this is still the subject of much debate.
These processes of encoding and retrieval are going on literally all of the time in each and every sober, waking one of us, and so too is forgetting. I have now given you a relatively complete overview; next, we look into both retrieval-based and encoding-based forgetting phenomena in greater detail. In this post, I rely heavily on two review papers (Anderson & Neely, 1996; Wixted, 2004) both for ideas and for sources; I highly recommend reading them both in full, especially if this post leaves you with lingering questions.
“Interference” during Memory Retrieval
An old, solid theory of forgetting was based on various, often related, memories interfering with each other. The topic of interference generated loads of research from 1900 to 1970 and led to many important discoveries in the study of human memory. Though it is no longer, on its own, a completely satisfactory unified theory of forgetting, it is extremely robust in many respects, and it continues to shed light on our understanding of cognition.
In this context, “interference” refers to the impaired ability to remember information that is similar to other information in memory. It is also the name of a theory of forgetting, which itself has several proposed mechanisms. Interference occurs when newly acquired information impairs our ability to retrieve previously stored memories (retroactive interference), or conversely, when what you already know makes it harder to learn something new (proactive interference). It happens when new learning crowds out old learning, or vice versa. Imagine you get a new telephone number; at first, when people ask you for your number, you may give them your old one by accident (proactive interference). However, after a year’s experience using your new phone number, you may find your old number difficult to recall (retroactive interference).
But this is not the usual way! Typically, the more you know about something, the more readily you will be able to associate incoming information with stuff you’ve already stored in memory; indeed, high levels of domain-specific knowledge can enable otherwise low-aptitude people to perform at the same level as their high-aptitude counterparts (Walker, 1987). By virtue of these additional associations and interconnections, the new memory will be more durable and accessible via multiple pathways. And I’ve talked before about the testing effect: the fact that retrieving a given memory makes that memory easier to retrieve in the future. So what gives?
The trouble starts when the same retrieval cues are related to multiple items in memory. Retrieval cues can be attributes of the target memory or just incidental contextual factors that were present at the time of encoding. For example, when you park your car at the store, aspects of your parking experience are encoded into a mental representation of the event (store, time of day, the fact that you drove a car, the type of car you drove, your internal state…). To the extent that your other parking experiences are similar, they will also contain these characteristics. If these serve as the primary cues which you use to recall your car’s location, other memories sharing these features will also be evoked.
Interference increases with the number of competing memories associated with a given cue or set of cues; thus, going from retrieval cue to target memory depends not only on the strength of cue-target association, but also on whether the cue is related to other items in memory.
Early research into these phenomena was done using something called the A-B, A-D paradigm. Participants would be instructed to learn random pairs of words (A-B), such as dog-boat, desk-rock, etc. Once they had learned these associations, they would be given another list (A-D) where the first word in each pair was the same, but the word it was paired with was different (dog-sky, desk-egg). Various methods of testing show that memory performance on the first list (A-B) suffers when the second list of responses (A-D) must be learned, presumably because people have acquired mutually incompatible responses to the common cue “A” (McGeoch, 1942).
Observations such as this led to the ratio rule: the probability that you recall B given cue A is equal to the strength of the A-B association relative to the strengths of all other associations involving cue A. That is,
$$ P(B \mid A) = \frac{\text{strength}_{AB}}{\text{strength}_{AB} + \text{strength}_{AD} + \cdots} $$
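To make the ratio rule concrete, here is a minimal sketch in Python. The association strengths, item names, and the `recall_probability` helper are all illustrative assumptions of mine, not values or code from the cited work; the point is just that strengthening a competitor lowers the target’s recall probability even though the target’s own strength is unchanged.

```python
# Minimal sketch of the ratio rule (illustrative values only).

def recall_probability(target: str, strengths: dict) -> float:
    """P(target | cue): the cue-target strength relative to all of the cue's associations."""
    return strengths[target] / sum(strengths.values())

# Cue A ("dog") starts out associated with two responses of equal strength.
dog_associations = {"boat": 1.0, "sky": 1.0}
print(recall_probability("boat", dog_associations))  # 0.5

# Strengthening the competing dog-sky association weakens recall of dog-boat,
# even though the dog-boat strength itself has not changed.
dog_associations["sky"] = 3.0
print(recall_probability("boat", dog_associations))  # 0.25
```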
Further findings emerged from a paradigm known as Part-Set Cuing. Slamecka (1968) had people study six words from each of five different categories (e.g., types of trees, birds, etc.) and tested them later by giving them each category name and having them recall the six items that went with it. Crucially, the experimental group was given a couple of items from each category as cues (they were cued with part of each set) to help them recall the remaining items, while the control group was given no cues. It turned out that, counting only the non-cued target items recalled in both groups, the people who received the cues performed significantly worse than those who got no cues. Roediger (1973) showed that this deficit in recall for the remaining items increases as a function of the number of part-set cues given. The crucial factor is that the cues and the targets share a common retrieval cue: the category. However, while part-set cuing hinders recall of the remaining items in a set, it has no effect on people’s ability to recognize those remaining items in a list of distractors.
Dewey Rundus (1973) claimed that part-set cuing induces “retrieval competition” between cued items and non-cued target items. In terms of the ratio rule described above, presentation of the cue items strengthens the association of those items to their category cue, which reduces the relative strength of the target items. Like, if the category was fruits and you had been given six fruits to remember, then providing you with the cue ‘orange’ at the time of recall strengthens the fruit-orange association, thereby reducing the relative strength of the other five and making them more difficult to recall.
So increasing the strength of one cue-item association necessarily weakens the association of all other items with that cue. As if that wasn’t bad enough, it turns out that a shared cue may not even be required for such “retrieval induced forgetting”. Many studies have demonstrated Output Interference, the finding that recall of a target item is impaired by previous retrieval of other items regardless of whether or not the target item shares cues with those retrieved items! This observation is extremely important and has helped to shape recent theory about forgetting.
In one early study, AD Smith (1971) had people study 7 items from each of 7 unrelated categories; the final test simply cued people with the category names in different orders. He found that the average number of items recalled dropped significantly with each additional category tested, from 70% for the first category to 45% for the seventh. The same thing was found with paired associates (Roediger & Schmidt, 1980): participants studied 20 A-B pairs and were then given A-____ as the test cue and asked to recall the target (B). Performance decreased as a function of previous retrievals; across 5 sequential blocks (of 4 questions each), there was a systematic decline in the average proportion correctly recalled (.85, .83, .80, .76, and .73). The decline in successful retrieval caused by prior retrieval, then, does not depend on a shared category. The findings hold for recognition tests, too: Smith repeated the previous experiment but gave a test that listed the 7 studied items alongside 7 unrelated distractors for each category, requiring subjects to pick the 7 that had appeared on the initial study list, and very similar decreases were observed. The finding appears to be quite robust (Ratcliff, Clark, & Shiffrin, 1990).
Other studies have shown that both general output interference and cue/category-specific retrieval competition operate separately and simultaneously (Neely, Schmidt, & Roediger, 1983). Even controlling for the passage of time (and thus any residual effects of working memory), output interference persists. As we will see, this distinction between cue-dependent forgetting and forgetting that results from nonspecific mental activity is important because it implies (at least) two different processes.
The Paradox of Interference
Any cognitive act that involves representations stored in memory requires the process of retrieval. If retrieval itself is a source of interference, then accessing what we already know contributes to forgetting, independent of any new learning (Roediger, 1974). If semantic memory (memory for facts) were as susceptible to interference as episodic memory, this would lead to the paradox of interference (Smith, Adams, & Schorr, 1978). That is, as an expert learned more and more facts about a specific topic/subject/category, he or she should find it harder and harder to remember any one of them (i.e., by the ratio rule). But thankfully, this does not seem to happen. Why?
John Anderson (1974) performed an interesting study where people were asked to study various “facts” of the basic form “a person is in the place”, for instance, “a hippie is in the park” or “a lawyer is in the church”. Then, subjects were given test items where they had to verify whether or not certain statements were true (eg, “a hippie is in the school”). The crucial finding was that the greater the number of facts learned about a person or location, the longer it took for subjects to verify a statement about that person or location.
Assuming, as we have been, that facts are stored in memory as a network of associations, we can imagine that a cue like “a hippie is in the park” causes the memory nodes “hippie” and “park” to become activated, and that this activation spreads to other nodes they are linked to in memory. If activation spreading from one node intersects with activation from another node (i.e., if they are associatively linked), a person would say “true”. Under these assumptions, the speed at which activation spreads down any one associative link is thought to be slower the greater the number of other associative links fanning out from the same node. Thus, the more different facts a person learns about someone (hippie, lawyer, etc.), the longer the verification times become.
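To see how a fan might translate into slower verification, here is a toy spreading-activation sketch. The assumption that each cue’s activation is split evenly across its links, and the particular latency formula and constants, are my own illustrative choices (loosely in the spirit of later ACT-R-style models), not Anderson’s actual model.

```python
# Toy spreading-activation model of the fan effect (illustrative assumptions only):
# each cue spreads a fixed amount of activation, split evenly across its links,
# and verification time is assumed to shrink as the summed activation grows.

facts = [
    ("hippie", "park"),
    ("hippie", "church"),
    ("hippie", "bank"),    # "hippie" now has a fan of 3
    ("lawyer", "church"),  # "lawyer" has a fan of 1
]

def fan(concept: str) -> int:
    """Number of facts (links) fanning out from a concept node."""
    return sum(concept in fact for fact in facts)

def verification_time(person: str, place: str,
                      base: float = 800.0, scale: float = 300.0) -> float:
    """Hypothetical latency (ms): slower when either cue's activation is divided
    among many links. base and scale are made-up constants."""
    activation = 1.0 / fan(person) + 1.0 / fan(place)
    return base + scale / activation

print(verification_time("lawyer", "church"))  # low fan -> faster verification
print(verification_time("hippie", "park"))    # high fan on "hippie" -> slower
```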
This “fan effect” is not just an artifact of using these questionable pseudo-facts. In a follow-up study addressing these concerns, students studied true facts about famous people (eg, “Teddy Kennedy is a liberal senator”) alongside up to 4 fantasy facts about them. Even when people knew they were being tested only on the true facts, a fan effect occurred as the number of fantasy facts learned about the famous person increased (Lewis & Anderson, 1976).
Two studies helped to resolve this paradox (McCloskey & Bigler, 1980; Reder & Anderson, 1980), and both depend on category hierarchies: if memory search can be restricted to subcategories, they found, then only the facts within that subcategory affect search time. For example, an expert on Richard Nixon has many different categories of information stored about him (his foreign policy, his family life, Watergate…). When asked a question about Nixon’s wife, the expert can limit the search to the family subcategory; crucially, only facts within this subcategory will produce interference effects. When retrieving information from memory, people first select the relevant subcategory, and the time it takes to do this is affected by the number of irrelevant subcategories BUT NOT by the number of facts within those irrelevant subcategories. Once the relevant subcategory is selected, only the facts within it show a fan effect.
Thus, interference effects occur in both semantic and episodic memory. But with semantic memory, memories may be more easily compartmentalized into subcategories that allow for a focused search that restricts the source of interference to items within that category.
The new perspective on interference (first hinted at by Roediger, 1974) holds that our tendency to forget is intimately bound to the very mechanisms that allow memory retrieval to occur. It has come to be known as Retrieval-induced Forgetting (RIF), and it has as its counterpart the Testing Effect (aka Retrieval Practice), which is the happier finding that retrieving information from memory enhances future retrieval of the same information, above and beyond simply restudying the information.
A definitive experimental demonstration of both effects can be seen in Goodmon & Anderson (2011), summarized in the graph above. Essentially, participants study pairs of words that consist of a category name followed by an item from the category, such as METAL-iron, TREE-birch, FRUIT-orange, METAL-silver, TREE-elm… There are usually 5-10 categories and 5-10 items per category. After studying the initial list, participants are then given a fill-in-the-blank test over some of the items they saw. They might get METAL-i_____ and TREE-e_____, for example, with the initial letter provided as a cue. After this practice phase, the items fall into three groups: practiced items (**Rp+**, such as _METAL-iron, TREE-elm_), items that were not practiced but are related to practiced items by category (**Rp-**, such as _METAL-silver, TREE-birch_), and unpracticed, unrelated items (**NRp**, such as _FRUIT-orange_).
After this test phase, which provides retrieval practice for certain items, participants are given a final test in which they are asked to remember all studied items! The goal is to see how retrieval practice affects participants' recall of unpracticed-related words (Rp-) compared to unpracticed-unrelated words (NRp). The results of these studies follow the same general trend, with one example given in the graph above. Notice three things: (1) the items people had practiced retrieving (Rp+, like METAL-iron) are remembered best, (2) unpracticed items from the same categories (Rp-, like METAL-silver) are remembered worst, and (3) unpracticed items from unpracticed categories (NRp, like FRUIT-orange) fall in the middle. Thus, in a single experiment, you can demonstrate the beneficial effects of retrieval practice on the practiced items, and the detrimental effects of retrieval practice on items related to the practiced items (that is, retrieval-induced forgetting).
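To keep the three item types straight, here is a small sketch that assigns a toy study list to the Rp+, Rp-, and NRp conditions; the categories, items, and the particular practice split are stand-ins I made up, not the actual Goodmon & Anderson stimuli or procedure.

```python
# Toy construction of a retrieval-practice (RIF) design.
# Categories, items, and the practice split are illustrative, not the real stimuli.

study_list = {
    "METAL": ["iron", "silver", "copper", "lead"],
    "TREE":  ["elm", "birch", "oak", "pine"],
    "FRUIT": ["orange", "apple", "grape", "pear"],
}

practiced_categories = ["METAL", "TREE"]   # FRUIT receives no retrieval practice

conditions = {}
for category, items in study_list.items():
    for i, item in enumerate(items):
        if category in practiced_categories and i < len(items) // 2:
            conditions[(category, item)] = "Rp+"   # practiced item
        elif category in practiced_categories:
            conditions[(category, item)] = "Rp-"   # unpracticed item, practiced category
        else:
            conditions[(category, item)] = "NRp"   # unpracticed item, unpracticed category

# Typical final-test ordering: Rp+ recalled best, NRp in the middle, Rp- worst.
for pair, condition in conditions.items():
    print(pair, condition)
```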
Selectively retrieving items from memory harms our later recall of similar items through some kind of suppression of those competitors. This suppression helps us overcome competition from related associations when we want to retrieve one item in particular, but it has the significant downside of impairing retrieval of those other items should they become relevant again.
“Interference” during Memory Encoding
In 1924, Jenkins and Dallenbach demonstrated that everyday experiences can interfere with memory. Their study showed that recall of information was better after a period of sleep than after an equal period spent awake (controlling for time of day, etc.). The findings are robust and have been replicated many times. For example, high school students' ability to remember new vocabulary terms is enhanced when sleep follows within a few hours of learning (Gais, Lucas, & Born, 2006).
In the late 19th century, observations of patients whose brain damage led to retrograde amnesia (memory loss for prior events) revealed that the degree of forgetting was greater for more recently acquired memories than for older memories; this came to be known as Ribot’s Law in honor of one of its earliest discoverers. In retrograde amnesia, more recent memories are more likely to be lost than more remote ones. Because the effects of forgetting occur on a temporal gradient, this phenomenon is appropriately called “temporally graded retrograde amnesia”. It can be induced by electroshock therapy and is seen in many neurological disorders, including Alzheimer’s disease. All this is to suggest that older memories are somehow strengthened against degeneration while newer memories are not. This is consistent with Jost’s second law (1897) and with the fact that forgetting data are not well fit by an exponential function (that is, a function with a constant decay rate), but are fit much better by functions with ever-decreasing proportional rates of decay!
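As a quick illustration of that last point, the sketch below fits both an exponential and a power function to a forgetting curve. The retention numbers are synthetic values I made up to have the shape of a typical forgetting curve, not data from any study; the point is only that a curve whose proportional decay rate slows over time (the power function) tends to fit such data better than one with a constant decay rate.

```python
# Sketch: compare an exponential forgetting curve (constant decay rate) with a
# power function (decay rate slows over time). Retention values are synthetic.

import numpy as np
from scipy.optimize import curve_fit

delays = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])                  # hours since learning
retention = np.array([0.70, 0.58, 0.48, 0.40, 0.34, 0.29, 0.25])           # made-up proportions

def exponential(t, a, b):
    return a * np.exp(-b * t)

def power(t, a, b):
    return a * t ** (-b)

for name, f in [("exponential", exponential), ("power", power)]:
    params, _ = curve_fit(f, delays, retention, p0=(1.0, 0.1), maxfev=10000)
    sse = np.sum((retention - f(delays, *params)) ** 2)
    print(f"{name}: params={params}, SSE={sse:.4f}")
# For data shaped like these, the power function yields the smaller error,
# consistent with Jost's law and with temporally graded retrograde amnesia.
```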
This temporal gradient is very suggestive of the idea that memories consolidate over time. If memories do consolidate, then retroactive interference should be stronger the closer it occurs to the original learning; that is, interference should affect newer memories more than it affects older ones.
You could test this in the laboratory by having people learn something, and then interfering with this learning at differing intervals by having different groups of people study something else after different amounts of time had elapsed. Amazingly, a laboratory study testing these ideas, conducted in 1900 by Muller and Pilzecker in Germany, produced findings absolutely consistent with this hypothesis. Imposing mental exertion earlier in the retention interval resulted in poorer performance when subjects were called upon to remember the original information. And it need not be list-memorization, either! Early research uncovered temporal gradients when the interfering mental exertion was solving algebra problems (Skaggs, 1925), reading a newspaper (Robinson, 1920), and so on. Why would this happen? One leading hypothesis that has received neuroscientific support is that the resources required to consolidate memories are themselves limited, and so any subsequent learning takes away from resources that would otherwise have been used to consolidate the original learning.
As discussed above, the A-B, A-D paired-associates paradigm is one where participants learn a list of A-B pairs, then some time later a list of A-D pairs, and then later are given a cued-recall test for their memory of the original list (prompted with A, they have to produce B). The period from original learning of the A-B list until the final test is called the retention interval, and learning the A-D list produces retroactive interference on participants’ memories for the A-B list. Later studies using this paradigm to search for a temporal gradient of retroactive interference found an inverted U-shape. Poor recall of A-B was observed if A-D was learned soon after the original learning; much better recall of A-B was found if A-D was learned in the middle of the retention interval; and poor recall of A-B was again observed if A-D was learned right before the cued-recall test. This finding is quite common (Wixted, 2004) and is suggestive of both forms of forgetting discussed in this post: interference during consolidation, and interference due to retrieval competition.
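One way to see how the two mechanisms could jointly produce that inverted U is with a toy model like the one below. The functional forms, constants, and the 24-hour interval are purely my own illustrative assumptions, not a model fitted to any experiment: consolidation disruption is assumed to be worst when A-D comes early, and retrieval competition worst when A-D comes late.

```python
# Toy model of recall of list A-B as a function of when the interfering list A-D
# is learned within a fixed retention interval. Functional forms and constants
# are illustrative assumptions, not fitted to any experiment.

import math

RETENTION_INTERVAL = 24.0  # hours between A-B learning and the final test

def consolidation_disruption(t_ad: float) -> float:
    """Damage to the still-consolidating A-B trace: large if A-D comes early."""
    return 0.30 * math.exp(-t_ad / 6.0)

def retrieval_competition(t_ad: float) -> float:
    """Competition from A-D at test: large if A-D is fresh (learned late)."""
    return 0.30 * math.exp(-(RETENTION_INTERVAL - t_ad) / 6.0)

def recall_ab(t_ad: float, baseline: float = 0.85) -> float:
    return baseline - consolidation_disruption(t_ad) - retrieval_competition(t_ad)

for t_ad in [1, 6, 12, 18, 23]:
    print(f"A-D learned at hour {t_ad:>2}: predicted A-B recall = {recall_ab(t_ad):.2f}")
# Predicted recall rises then falls: worst when A-D is learned right after A-B
# or right before the test, best when it falls in the middle of the interval.
```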
Retrograde Facilitation?
If increasing mental exertion during the retention interval results in poorer recall, then reducing mental exertion as much as possible after learning should improve recall! Unfortunately, a waking brain is almost always aflutter with activity… but sleep, alcohol, and benzodiazepines all reduce this activity, inducing a temporary state of anterograde amnesia and closing the brain (specifically, the hippocampus) to much new input. By limiting this new input, recently formed memories should be protected from the retroactive interference they would otherwise encounter from waking mental activity.
Let’s talk about alcohol first, because it’s fun. As you may have experienced, alcohol causes a certain degree of anterograde amnesia for materials studied (and events experienced) while under the influence. It is less widely known that alcohol actually results in improved memory for materials studied just prior to consumption (Bruce & Pihl, 1997; Lamberty et al., 1990; Mann et al., 1984; Parker et al., 1980, 1981). Not that this is a good study strategy… (though who’s to say). The prevailing explanation for this finding is that alcohol facilitates recent memories because it prevents the formation of new memories that would otherwise cause retroactive interference (Mueller et al., 1983). The same thing is observed with benzodiazepines (Coenen & Van Luijtelaar, 1997). All of this is entirely consistent with the idea that ordinary forgetting is caused by the retroactive effects of the subsequent memory formation that accompanies ordinary mental activity.
No direct tests for a temporal gradient have been done with alcohol or benzos in the retention interval, but some studies a la Muller & Pilzecker (1900) have revealed such an effect for sleep. For example, Ekstrand (1972) used a 24-hour retention interval and had subjects sleep 8 hours either right after the original learning occurred or right before the recall test; people in the immediate-sleep condition recalled 81% of the items, whereas people in the delayed-sleep condition recalled only 66%.
Neural Mechanism
As I’ve talked about in other posts, long-term potentiation (LTP) in the hippocampus is the leading explanation for how memories are initially formed (Martin et al., 2000). LTP is a long-lasting enhancement of synaptic transmission (the “receiving” neuron becomes more sensitive to the “sending” neuron) brought about by high-frequency stimulation from the sender. Interestingly (!), alcohol and benzodiazepines are both known to block LTP in the hippocampus (Del Cerro et al., 1992; Evans & Viola-McCabe, 1996; Givens & McMahon, 1995; Roberto et al., 2002). Furthermore, alcohol does NOT impair the maintenance of hippocampal LTP induced PRIOR to consumption; indeed, consistent with the memory findings, it enhances prior LTP!
With respect to sleep-related brain activity, it is known that LTP can be induced during REM sleep but not during non-REM sleep. This may account for the fact that people are often able to remember mental activity from REM sleep (i.e., dreams), while they rarely remember any mental activity from non-REM sleep (and thus do not report dreams). Indeed, it has been shown that non-REM sleep protects previously-established memories from interference far better than does REM sleep (Plihal & Born, 1999); further, REM sleep was found to interfere with prior memories just as much as an equal period of intervening wakefulness! All of this is consistent with the observation that, while many prescription antidepressants greatly reduce REM sleep, they are not known to cause memory problems (Vertes & Eastman, 2000). Weirdly, REM sleep does appear to be very important for the consolidation of procedural memories, which are not hippocampus-dependent (Karni et al., 1994).
All this is to suggest that when the demands placed on the hippocampus are reduced, it is better able to coordinate memory consolidation. Cells in the hippocampus which fired together during waking experience were shown to be reactivated together during non-REM sleep in rats (Wilson & McNaughton, 1994). Also, coordinated firing between different areas of the neocortex has been shown to replay itself during quiet, unstimulated wakefulness in monkeys (Hoffman & McNaughton, 2002).
Instead of relying on sleep or alcohol to inhibit LTP, it is much more precise to administer a drug that selectively prevents the induction of hippocampal LTP. These drugs, known as NMDA antagonists, block LTP and thus prevent learning of hippocampus-dependent tasks. Experiments using these drugs show that, when administered after a learning task or after LTP is artificially induced in the lab by direct neuronal stimulation, they block all subsequent LTP that would otherwise interfere with the original learning; thus, memories for the originally-learned information (or the strength of the artificially induced LTP) are protected when NMDA antagonists are administered during the retention period. That is, these LTP-blocking drugs produce all the same effects we’ve seen above, but allow us to draw more specific conclusions about the underlying mechanism.
In one great example of this research, Izquierdo et al. (1999) had rats learn a hippocampus-dependent task and then exposed them to a novel, stimulating environment either 1 hour or 6 hours later. The researchers observed a temporal gradient: rats forgot more of the original learning when exposed to the novel environment 1 hour after learning than when exposed 6 hours after, again suggesting that recently established LTP is more vulnerable to disruption by subsequent mental activity than LTP established longer ago. To investigate whether this interference was caused by new LTP associated with exposure to the novel environment, they administered an LTP-blocking drug to a group of rats prior to exposure to the novel environment (1 hour after original learning). These drugs prevented any retroactive interference effects; in this group of rats, memory for the original learning was unimpaired by exposure to the novel environment!
The same pattern is observed if you artificially induce LTP and measure its decay over time. Abraham et al. (2002) induced LTP in the hippocampus of rats using electrical stimulation, and the animals were then returned to their usual “stimulus-poor” home-cage environments. In this low-interference environment, the LTP decayed very slowly. In the experimental condition, some of the rats were instead exposed to a complex, stimulating environment (a larger cage, new objects, and other rats). Exposure to this environment caused the originally induced LTP to decay much more rapidly, and the degree of this interference was a function of the delay between LTP induction and exposure to the new environment.
Thus, regardless of whether the original learning is “actual” or artificially induced LTP, subsequent learning (again, either actual or artificially induced) interferes with the original learning, and this interference is more pronounced the smaller the delay between original and interfering learning. The central message here, and indeed the main argument made by Wixted (2004), from which this post draws shamelessly, is this: the hippocampus is extremely important in consolidating newly formed memories, and ordinary mental activity (particularly subsequent memory formation) interferes with that process. Thus, biological memory appears to be self-limiting; new memories are created at the expense of partially damaging other memories, especially if those other memories haven’t had enough time to consolidate. Even more spooky is the somewhat new idea of reconsolidation: that it is recently activated memories, not just recently formed ones, that are vulnerable to interference (Dudai, 2004). According to this theory, even if a memory has been completely consolidated, reactivation of that memory makes it just as vulnerable to interference as if it were newly formed; thus, accessing consolidated memories might simply restart the consolidation process! AHH!