Understanding the Iterated Prisoner’s Dilemma
The entire book is based on a classic Game Theory problem called the Prisoner’s Dilemma, so I will first explain the idea behind it before we get to the book notes themselves.
The Prisoner’s Dilemma describes a scenario where two players can either cooperate or defect (cheat / betray / be uncooperative). Based on what each of them does, they get a certain reward (or punishment).
In the classic scenario, there are two prisoners, Joe and Tuna, who are being interrogated. They have no way of communicating with each other, and therefore cannot agree on a strategy beforehand. The interrogator asks each of them separately for incriminating information about the other, so that the other can be taken to court. In exchange, the interrogator promises favorable treatment for helping.
So Joe and Tuna can cooperate with each other by staying silent, or they can defect by giving information to the interrogator. There are four possible outcomes:
- Both of them cooperate, and they give no information about each other to the interrogators. They both get a 2-year sentence.
- Tuna cooperates, by giving no information to the interrogator, but Joe betrays him. Tuna gets a 10-year sentence, while Joe gets set free (as a favor for helping the interrogators).
- Vice-versa (Joe stays silent and Tuna defects). Tuna gets set free and Joe gets the 10-year sentence.
- Both defect and give information about each other. They each get a 5-year sentence.
So the outcomes can be summarized on a table like this:
| | Joe cooperates | Joe defects |
|---|---|---|
| Tuna cooperates | 2Y each | 0Y Joe, 10Y Tuna |
| Tuna defects | 10Y Joe, 0Y Tuna | 5Y each |
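As a rough sketch, the four outcomes can also be written as a small Python lookup table, which makes the "dilemma" easy to check: whatever Tuna does, Joe gets a shorter sentence by defecting, yet both are better off if both stay silent. The names `SENTENCES` and `joe_best_reply` are made up for this illustration.

```python
# Sentences in years, keyed by (joe_move, tuna_move).
# "C" = cooperate (stay silent), "D" = defect (inform on the other).
# Values are (joe_years, tuna_years), taken from the four outcomes above.
SENTENCES = {
    ("C", "C"): (2, 2),
    ("D", "C"): (0, 10),
    ("C", "D"): (10, 0),
    ("D", "D"): (5, 5),
}


def joe_best_reply(tuna_move):
    """Return the move that gives Joe the shortest sentence against tuna_move."""
    return min("CD", key=lambda joe_move: SENTENCES[(joe_move, tuna_move)][0])


# Defecting is Joe's best reply to either move by Tuna ...
defect_dominates = all(joe_best_reply(tuna_move) == "D" for tuna_move in "CD")
# ... and yet mutual cooperation beats mutual defection for both of them.
mutual_cooperation_better = SENTENCES[("C", "C")][0] < SENTENCES[("D", "D")][0]

print(defect_dominates, mutual_cooperation_better)
```

Since the table is symmetric, the same reasoning applies to Tuna.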
The problem can be generalized to any situation where:
- If you cooperate with someone you both get good results.
- If you trick them into cooperating but you defect you get great results and they get terrible results (and vice-versa).
- If neither cooperates you get mediocre to bad results.
The Iterated Prisoner’s Dilemma is exactly the same concept, but instead of playing the game once, you play it many times against the same player. This is a small but important change, since you are now choosing your action knowing that you will face the other player again, and that they will remember what you did.
This situation comes up all the time in real life between individuals, companies, even between states. The entire book is about comparing various strategies one can use in such a setting, and finding out what impact different strategies have.
A strategy is a set of rules that one uses to choose their action in an Iterated Prisoner’s Dilemma game.
Example strategies:
- ALL D: Always defect.
- ALL C: Always cooperate.
- RANDOM: Cooperate or defect randomly.
- TIT FOR TAT: Start with cooperation. Then do whatever the other did in the previous round.
- SPITEFUL: Start with cooperation. If the other ever defects, always defect from that point on.
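The strategies above can be sketched in a few lines of Python, assuming each strategy is a function that looks at both players’ past moves and returns its next move. The function names, the `play` helper, and the point values (5 for defecting against a cooperator, 3 each for mutual cooperation, 1 each for mutual defection, 0 for being betrayed) are assumptions made for this illustration; unlike prison years, higher points are better.

```python
import random

# Assumed points per round, keyed by (move_a, move_b) -> (points_a, points_b).
POINTS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def all_d(mine, theirs):
    return "D"


def all_c(mine, theirs):
    return "C"


def random_strategy(mine, theirs):
    return random.choice("CD")


def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the other player's previous move.
    return theirs[-1] if theirs else "C"


def spiteful(mine, theirs):
    # Cooperate until the other player defects once, then defect forever.
    return "D" if "D" in theirs else "C"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated game and return the total points (score_a, score_b)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = POINTS[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, all_d))
print(play(tit_for_tat, tit_for_tat))
```

With these assumed points and ten rounds, TIT FOR TAT against ALL D ends 9 to 14 (it is exploited once, then defects for the rest of the game), while two TIT FOR TAT players cooperate throughout and get 30 points each.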
Spoiler alert: You might want to keep TIT FOR TAT in mind. It will appear often.
My Notes
Computer tournaments teach us about cooperation
Axelrod presents the results of various computer tournaments he organized, where he asked people to write computer programs to play the Iterated Prisoner’s Dilemma game against each other. Participants submitted their code, and each program played according to its own fixed rules. At the end of each tournament, some computer programs (which we will refer to as strategies) would get more points than others, and Axelrod would try to understand which strategies did well and why.
This simple experiment resulted in remarkable insight into how cooperation emerges among people, societies, and even biological systems (for example birds feeding in a crocodile’s mouth, and the crocodile not eating them, because they are cleaning its teeth).
Based on the results of the tournament, there are four behaviors that can best ensure that a strategy will do well:
- Be nice. Avoid unnecessary conflict. Cooperate as long as the other person is cooperating. This creates a stable environment where you both get good rewards.
- React when the other is defecting for no reason. Otherwise you look exploitable and players will try to exploit you.
- After you respond to the defection, forgive. Don’t be spiteful, because it can create an endless loop of defecting at each other, resulting in both players getting the lowest reward possible, and therefore you both lose.
- Follow a simple strategy, such that others can recognize it, and adapt to it. If your reactions are complex, they are as good as random to the other person, and they cannot learn how to cooperate with you.
Being nice was the single most important property that differentiated the high scoring strategies from the low performing ones in the tournament. By nice, we mean that you should never defect first, and cooperate as long as the other player cooperates.
“Eye for an eye” or “Do unto others as you would have them do unto you”?
The most robust strategy was TIT FOR TAT. It did well against all other players. And it displays all of the characteristics of well performing strategies:
- It always starts with cooperation, so it’s a nice strategy.
- It always retaliates against defection immediately, so it cannot be exploited.
- It always forgives right after retaliating, so it does not end up in defection loops.
- It is easy to recognize, so others can easily adapt to it.
As a bonus, it is a very, very patient strategy. It will always forgive once it retaliates, no matter how many times it is provoked.
Interestingly, TIT FOR TAT never scored better in a single game than the other player. It cannot, since it always lets the other player defect first, and never defects more times than the other player. So it will always get the same score as the other player or less.
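This claim can be checked with a small simulation. The sketch below plays TIT FOR TAT against 100 differently seeded random opponents and records the score difference each time. The point values (5, 3, 1, 0) and all function names are assumptions made for this illustration, not something taken from the tournaments themselves.

```python
import random

# Assumed points per round, keyed by (move_a, move_b) -> (points_a, points_b).
POINTS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the other player's previous move.
    return theirs[-1] if theirs else "C"


def make_random_opponent(seed):
    """Build an opponent that picks C or D at random, reproducibly per seed."""
    rng = random.Random(seed)

    def opponent(mine, theirs):
        return rng.choice("CD")

    return opponent


def play(strategy_a, strategy_b, rounds):
    """Play an iterated game and return the total points (score_a, score_b)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = POINTS[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


# Score difference (TIT FOR TAT minus opponent) for each random opponent.
margins = []
for seed in range(100):
    tft_score, other_score = play(tit_for_tat, make_random_opponent(seed), rounds=50)
    margins.append(tft_score - other_score)

print(max(margins))  # never above 0
```

Every defection by TIT FOR TAT pays back an earlier defection by the opponent, so TIT FOR TAT can fall behind by at most one unanswered defection (5 points under these assumed values) but can never get ahead.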
However, TIT FOR TAT won in all the tournaments. This was because it was good at eliciting behavior from other players that enabled both sides to do well. It won by consistently getting other players to cooperate, resulting in mutually rewarding outcomes.
TIT FOR TAT can elicit good behavior by other strategies so well because it is simple to recognize. The other person can see what you are doing very easily. Your pattern is straightforward to predict, and the other person can easily learn that the best they can do with you is cooperate.
If your behavior is too complex, it looks no better than total chaos for the other player. They cannot tell what you do, or why, and you appear unresponsive. This gives no incentive to the other player to try to cooperate, making it hard to elicit good behavior from them.
TIT FOR TAT sounds mean to most of us because it essentially follows the “Eye for an eye” rule, which is a bit harsh. So one can ask: are there any better alternatives? For example, the most widely accepted moral rule is “Do unto others as you would have them do unto you”. In the context of the Prisoner’s Dilemma, this means you should always cooperate, since this is what you would want from the other person.
But always cooperating gives other players an incentive to exploit you. Moreover, unconditionally cooperating spoils the other player. Thus, implicitly, it hurts innocent players the exploiters will interact with later. And the rest of the community is burdened with reforming the spoiled player. Curiously, reciprocity seems to be a better moral foundation than unconditional cooperation.
As a side note, when there is trustworthy central authority, you can rely on it to enforce the community standards. But in its absence, it falls to the players to give each other the necessary incentives to be cooperative.
In the WWI trenches
During the First World War, it was often the case that a battalion would stay at a fixed location for months. Apart from the atrocities that happened during that period, a fascinating aspect of WWI trench warfare was how the Allies and the Germans developed cooperative behaviors, even though they were enemies, and without prior agreement. The English knew that if they launched an aggressive attack, the Germans would respond equally viciously. If the French let the Germans rest during lunch time, the Germans would do the same.
So they developed various mutually followed behaviors. It is hard not to laugh at this account by a British officer facing a Saxon unit of the German army:
I was having tea with A Company when we heard a lot of shouting and went out to investigate. We found our men and the Germans standing on their respective parapets. Suddenly a salvo arrived but did no damage. Naturally both sides got down and our men started swearing at the Germans, when all at once a brave German got on to his parapet and shouted out “We are very sorry about that; we hope no one was hurt. It is not our fault, it is that damned Prussian artillery.” (Rutter 1934, p. 29)
In 1914 the two sides even unofficially stopped fighting during Christmas, and exchanged food and souvenirs. There are even songs about it.
Because the two sides would face the same opponents again and again, the stationed battalions of the Allies and the Germans were essentially in an Iterated Prisoner’s Dilemma scenario.
Life is a non-zero sum game
We tend to judge how well we do by what is available in our surroundings, and that is usually other people. We compare ourselves with others’ success, which often leads to envy. Therefore, we try to catch up with the other player by defecting, but this leads to mutual punishment in the end. Envy is self-destructive.
How well you are doing compared to how the other players are doing is a bad metric, unless your goal is not to do well, but to destroy the other players. In most situations, either you cannot destroy the other player, or trying will be very costly.
A better metric is how well you are doing compared to how well someone else could be doing in your shoes. Given the others’ behaviors, are you doing as well as possible? Could someone else in your situation do better with this other player?
We often think of many situations in life as zero-sum games. That means a situation where advantage for one player means equivalent loss to the other, like in a chess game for example. But most situations are not zero-sum.
Smart people realize that, and look for ways to make everyone win, rather than ways to win at the expense of others (think alliances in Big Tech etc.). In a non-zero-sum world, you don’t have to do better than others to do well for yourself. Everybody can win. Therefore, it doesn’t make any sense to be envious if someone is doing a little better than you, as long as you take care to do well for yourself. In fact, in a game with repeated interactions, the other’s success is a prerequisite for yours.
How to promote stable cooperation
Cooperation among people can be encouraged by:
- enlarging the shadow of the future
- changing the payoffs
- teaching people to care about others
- teaching the value of reciprocity
If you think you will not see the other player ever again, then it makes sense to defect, and get the maximum reward. Therefore to make people cooperate, we should enlarge the “shadow of the future”. Make interactions more durable, and more frequent.
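To make the “shadow of the future” concrete, here is a rough worked example. It assumes that after every round there is a probability `w` of meeting the same player again, that the other player uses TIT FOR TAT, and that the points per round are 5 for defecting against a cooperator (`T`), 3 for mutual cooperation (`R`), and 1 for mutual defection (`P`). The parameter `w`, the point values, and the function names are all assumptions made for this sketch.

```python
# Assumed points per round (higher is better).
T, R, P = 5, 3, 1


def cooperate_forever(w):
    """Expected total against TIT FOR TAT: R + w*R + w^2*R + ... = R / (1 - w)."""
    return R / (1 - w)


def defect_forever(w):
    """Expected total against TIT FOR TAT: T once, then P in every later round."""
    return T + w * P / (1 - w)


# Compare the two options for a short, medium, and long shadow of the future.
for w in (0.1, 0.5, 0.9):
    print(w, round(cooperate_forever(w), 2), round(defect_forever(w), 2))
```

Under these assumptions, defecting pays when the future is unlikely (`w = 0.1`), the two options break even at `w = 0.5`, and cooperating clearly pays when the interaction is durable (`w = 0.9`). Making interactions more durable or more frequent corresponds to a larger `w`.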
Another way to promote stable cooperation is to make the gains from defection smaller than the gains from cooperation. Try to make the long-term incentive for mutual cooperation greater than the short-term incentive for defection.
For instance, if you are a business, it makes sense to ask for frequent small payments throughout a project, rather than a lump sum at the end. The other player will not try to cheat by taking the product and not paying, since they will only save a small amount of money, and they will not get the entire product.
Pitfalls to avoid
When you know that the other player is not reciprocating, it doesn’t make sense to cooperate, since whatever you do, the other person will keep doing their thing.
If you are cautious, and try to “play it safe” by starting with defection until the other person cooperates, it will backfire. Your defection will likely trigger a retaliation from the other player. Remember that the most important characteristic of winning strategies in the tournament was being nice: never defect first.
It is important to be able to recognize defection when it occurs, in order to retaliate. In practice, this is harder than it sounds, because often you are not sure of the other person’s motives and actions.
Hierarchies make equality hard
Status hierarchies can strongly affect how cooperation evolves among people at different levels. Suppose that everyone, when dealing with people below them in status, sometimes cooperates and sometimes defects. At the same time, they don’t tolerate defection from those below them. When dealing with someone above them, they always cooperate, unless the other defects twice in a row. So with people below you, you are a bully, often defecting but not tolerating defection, and with people above you, you tolerate being a sucker sometimes, but only allow a certain amount of exploitation.
This means that people near the top will always do well because they can sometimes defect against everyone else and get away with it. In contrast, people near the bottom will do poorly because they are meek to everyone. They can’t defect because they will immediately receive defection from those above them. An individual operating alone at the bottom of the social structure is trapped. They are doing poorly, but if they try to react they will do worse.
Should you hide your strategy?
A way to measure the value of any piece of information is to calculate how much better you could do with the information than without it. Thus, the better you can do without the information, the less you need the information, and the less it is worth.
Depending on your strategy, it might be costly to you if others know it. For example, if you are using a meek strategy that can be exploited, and the other player knows it, you will get exploited. On the other hand, if you are using a strategy that works best with cooperation, like TIT FOR TAT, then you are happy if the others know your strategy, because they know they should cooperate with you.
With TIT FOR TAT you are asserting a fixed behavior pattern. Since it is very easy to recognize, you let the other player adapt to you. You refuse to be bullied, but do not do any bullying. If the other player adapts, then you have mutual cooperation.
The value of reciprocity
The biggest surprise was the value of reciprocity, and especially provocability. Common sense says one should be slow to anger, but the results of the computer tournaments show that it is better to respond quickly to provocation.
The reason is that if one waits to respond to an uncalled-for defection, they risk sending the wrong signal. The longer defections go unchallenged, the more likely the other player is to conclude that defection can pay. And the more strongly this pattern is established, the harder it will be to break.