# Game Theory

Karl W. Palachuk
July 15, 2010

In college-level psychology courses, one of the fun things you get to do is train mice. In addition to being easy, training mice helps you learn a lot about behavior generally and rewards and punishments specifically.

Someone should write a book for kids on training mice. It’s simple enough that an eight-year-old could learn it.

For example, we can create a maze and put Miss Mouse at the entrance. Let’s say we want to teach her to always go right as the first move when entering a maze. We’ll reward her when she goes right. If she goes left, there is no reward; we pick her up and start over. Eventually we would expect Miss Mouse to always start out going to the right. That’s where the rewards are.

In the field of “Game Theory,” we can model learning without touching mice or spending money on cheese.  In the example above, we divide the mouse’s behavior into two categories: Go Right and Go Left.

Now let’s say that a basic store-bought, untrained mouse is equally likely to go left or right. So the probability of going left = 50% and the probability of going right = 50%. Let’s also say that each reward increases the probability of repeating the rewarded activity by 10% (that is, the probability is multiplied by 1.1).

Here’s how the mouse learns: Chance of going right = 50%.

| Event | Mouse goes | Result | Chance of going right |
|---|---|---|---|
| 1 | Left | No reward | 50% |
| 2 | Right | Eats cheese | 55% (50 × 110%) |
| 3 | Right | Eats cheese | 60.5% (55 × 110%) |
| 4 | Left | No reward | 60.5% (no change) |
| 5 | Right | Eats cheese | 66.5% (60.5 × 110%) |
| 6 | Left | No reward | 66.5% (no change) |
| 7 | Left | No reward | 66.5% (no change) |
| 8 | Right | Eats cheese | 73.2% (66.5 × 110%) |
| 9 | Right | Eats cheese | 80.5% (73.2 × 110%) |
| 10 | Right | Eats cheese | 88.6% (80.5 × 110%) |
| 11 | Left | No reward | 88.6% (no change) |
| 12 | Right | Eats cheese | 97.4% (88.6 × 110%) |

In this example we see that after 12 trips into the maze, the mouse is likely to go right 97% of the time! Notice also that the mouse went the wrong way five times and the right way seven times.
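You can check the arithmetic in the table with a few lines of Python. This is a minimal sketch of the model exactly as described above, not a real learning algorithm: a reward multiplies the chance of going right by 1.1, and a left turn changes nothing. The 100% cap is our own addition, because the article's rule would eventually push the chance past 100%. The names `train` and `REWARD_FACTOR` are hypothetical.

```python
# Sketch of the article's reward-only model (assumption: each reward
# multiplies the chance of going right by 1.1; no reward changes nothing).
REWARD_FACTOR = 1.1


def train(moves, p_right=0.5):
    """Return the chance of going right after each move ('R' or 'L')."""
    history = []
    for move in moves:
        if move == "R":  # correct turn: cheese, so the behavior is reinforced
            # Cap at 1.0 so the value stays a valid probability (our assumption).
            p_right = min(1.0, p_right * REWARD_FACTOR)
        # A left turn earns nothing and leaves p_right unchanged.
        history.append(p_right)
    return history


moves = "LRRLRLLRRRLR"  # the twelve trips from the table
for event, (move, p) in enumerate(zip(moves, train(moves)), start=1):
    print(f"Event {event:2}: {move}  chance of going right = {p:.1%}")
```

The sketch does not round at each step, so a few intermediate lines may differ from the table by a tenth of a percent, but the last line comes out at about 97.4%, matching the table.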

All you home psychologists should know that the reward must be given right away.

Notice that rewarding the behavior you want has a dramatic impact on future behavior.

Stop.

Highlight that.

Rewarding the behavior you want has a dramatic impact on future behavior.

Reward and believe:

That was fun, but we’re spending too much money on cheese. We can’t give a reward every time. The next experiment would be to give a reward with every second correct move rather than every time.

The result is that learning is a bit slower, but still quite dramatic.  After seven correct turns, the mouse is likely to go right almost 75% of the time.
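Here is the same sketch with rewards for only some correct turns. We assume the schedule rewards the first correct turn and then every other one (1st, 3rd, 5th, 7th), which is the pattern the reward-and-punishment table below also follows. With no punishment, wrong turns have no effect, so only the correct turns need to be counted. `train_partial` and its `every` parameter are hypothetical names.

```python
REWARD_FACTOR = 1.1  # a reward multiplies the chance of going right by 1.1


def train_partial(correct_turns, p_right=0.5, every=2):
    """Chance of going right after `correct_turns` right turns, when only
    every `every`-th correct turn (starting with the first) is rewarded."""
    for turn in range(correct_turns):
        if turn % every == 0:  # turns 0, 2, 4, 6 are the 1st, 3rd, 5th, 7th
            p_right = min(1.0, p_right * REWARD_FACTOR)
    return p_right


print(f"{train_partial(7):.1%}")  # four rewards across seven correct turns
```

Four rewards give 50% × 1.1⁴ ≈ 73.2%, the "almost 75%" above. Seven rewards (`every=1`) would give the 97.4% from the first experiment.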

So, we know that rewards work. What about punishment? Since we don’t want to physically harm our mouse, let’s say we stick to psychological damage. We’ll reward every second correct choice, but this time we’ll also have a mild punishment for incorrect choices. For punishment we’ll play ten seconds of Jethro Tull at very high volume. Again, the punishment must be administered right away to be effective.

Because this is a mild punishment, let’s say the effect is to decrease the chance of going left by 10% (that is, the chance of going left is multiplied by 0.9).

We start out with chance Right = 50% and chance Left = 50%

| Event | Mouse goes | Response | Chance of going left | Chance of going right |
|---|---|---|---|---|
| 1 | Left | Punish | 45% | 55% |
| 2 | Right | Reward | 39.5% | 60.5% |
| 3 | Left | Punish | 35.5% | 64.5% |
| 4 | Right | No reward | 35.5% | 64.5% |
| 5 | Right | Reward | 29.1% | 70.9% |
| 6 | Left | Punish | 26.2% | 73.8% |
| 7 | Right | No reward | 26.2% | 73.8% |
| 8 | Right | Reward | 18.8% | 81.2% |
| 9 | Left | Punish | 16.9% | 83.1% |
| 10 | Right | No reward | 16.9% | 83.1% |
| 11 | Right | Reward | 8.6% | 91.4% |
| 12 | Left | Punish | 7.7% | 92.3% |

As you can see, you don’t need to give a reward every time, but a combination of rewards and mild punishments is very effective. You can also infer from the math that greater rewards and greater punishments would produce more dramatic changes in behavior.
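Adding punishment to the sketch takes one more rule. This assumes a reward multiplies the chance of going right by 1.1, a punishment multiplies the chance of going left by 0.9, and the two chances always add up to 100%. `train_with_punishment` and the event labels are hypothetical names used only for illustration.

```python
REWARD_FACTOR = 1.1  # a reward raises the chance of going right by 10% (relative)
PUNISH_FACTOR = 0.9  # a punishment cuts the chance of going left by 10% (relative)


def train_with_punishment(events, p_right=0.5):
    """events: (move, response) pairs such as ("L", "punish").
    Returns (p_left, p_right) after each event."""
    history = []
    for move, response in events:
        if move == "R" and response == "reward":
            p_right = min(1.0, p_right * REWARD_FACTOR)
        elif move == "L" and response == "punish":
            p_left = (1.0 - p_right) * PUNISH_FACTOR
            p_right = 1.0 - p_left  # the two chances always sum to 1
        # Any other combination (e.g. an unrewarded right turn) changes nothing.
        history.append((1.0 - p_right, p_right))
    return history


# The twelve events from the table.
events = [("L", "punish"), ("R", "reward"), ("L", "punish"), ("R", "none"),
          ("R", "reward"), ("L", "punish"), ("R", "none"), ("R", "reward"),
          ("L", "punish"), ("R", "none"), ("R", "reward"), ("L", "punish")]

p_left, p_right = train_with_punishment(events)[-1]
print(f"left {p_left:.1%}, right {p_right:.1%}")
```

This prints about 7.8% left and 92.2% right. The table shows 7.7% and 92.3% only because it rounds to one decimal place at every step. The conclusion is the same.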

## Some Words of Caution
In our example we use a mild punishment. Strong punishments are generally to be avoided. Apart from not wanting to electrocute our mouse, we want to avoid instilling too much fear.

Punishment works by increasing fear. A punishment that is too strong can leave the subject (e.g. Miss Mouse) nervous about making a wrong move. This can result in slow, cautious, halting behavior. See the note on consistency below.

You must also be careful with rewards. Once a behavior is learned you can cut the rewards way back. Even sporadic rewards can maintain a well-learned behavior.

We won’t go through the math necessary to demonstrate diminishing motivation, but you should know that the chances of correct behavior will decrease as the time between rewards increases. Dropping all rewards altogether will have no immediate effect. However,  over time even well-learned behaviors will drift back to the probabilities we saw in the untrained mouse. One big reward all at once has almost no effect. If we give Miss Mouse a huge chunk of cheese the first time she goes right, but no rewards after that, she’ll think she just stumbled on some cheese. Smaller, regular rewards are much more effective.

The most important factor in using rewards and punishments is consistency. Close behind that is timeliness.

If you give a reward or punishment it must be administered immediately after the behavior. Think about training your dog: Doggie brings you the newspaper, goes outside for no apparent reason, comes back inside, gets a drink of water, then lies down to take a nap.

If you then praise the dog for bringing you the newspaper, he won’t connect the two. He will think he is being praised for lying down. It’s my personal theory that this is the reason dogs spend so much time lying down: they’re trying to make you happy.

Timeliness and consistency go hand in hand. You want to reward (or punish) behavior right away to have the greatest impact. Timeliness connects the reward (or punishment) to the behavior. Consistency provides reinforcement. If a mouse is rewarded sometimes for going left and sometimes for going right, she won’t see a connection between behavior and reward. Even worse, if she is punished sometimes for a left turn and sometimes for a right turn, she will avoid both behaviors.

Let’s go back to the lab for an illustration. The classic example of arbitrary rewards is the pigeon who gets fed a food pellet at random intervals. If the pigeon happens to be cleaning his wing when this happens, he might try cleaning his wing again to see if there’s another reward. And if there just happens to be a reward at the time he is cleaning his wing, he thinks he has learned a connection.

The same happens for scratching the floor, nodding his head, etc. With no connection between behavior and rewards, the pigeon will “learn” things that result in reward. So, after a few days we have a pigeon who spends all his time scratching and squawking and strutting around trying to “learn” a reward. Inconsistent, arbitrary rewards create and encourage a pattern of behavior, but not necessarily the behavior you want.

There is also the classic pigeon example of arbitrary punishment. When researchers randomly administer punishments, pigeons “learn” to avoid various behaviors. So, over time, we have a bird that doesn’t clean, doesn’t scratch, doesn’t walk in circles, doesn’t walk in a line. Eventually, the bird stands in one place afraid to take any action at all.

Inconsistent, arbitrary punishments lead to a fear of doing anything. You actually train the pigeon to do nothing.

In general, I believe rewards are a better teaching tool than punishments. In the worst-case scenario of inconsistent, powerful rewards, you will have a subject who is constantly trying to do what it takes to get the reward. This subject is highly motivated and easily trained in the correct behavior: as you adopt a consistent reward procedure (even with small rewards), the subject will learn the new behavior quickly. And as rewards disappear for the old, arbitrary behavior, the old habits will fade away.

The worst-case scenario for inconsistent, powerful punishments is a subject who is paralyzed by fear. Adopting a consistent policy of rewards and punishments is very difficult in this case. First, you have to teach the subject that it’s okay to do something. Then you have to coax it to overcome specific fears in order to try the behaviors that will now be rewarded.

As you can imagine, the quickest way to overcome fear and train new behavior in this case is with timely, frequent rewards; rewards powerful enough to overcome fear of punishment.

Does all of this really translate to human beings? Remember the mantra “Rewarding the behavior you want has a dramatic impact on future behavior.”

People absolutely respond to reward and punishment. If you don’t believe me, raise a child!

I am over-educated. I have used a few simple rules for raising my daughter.
1) No physical punishment.
2) She knows what the rules are.
3) She is consistently punished for incorrect behavior.
4) She is consistently rewarded for good behavior.

I’m not perfect and my daughter is not perfect,* but my daughter knows she’s loved and she’s very well behaved.  She never begs for toys or candy at the store. I never go through the routine of some parents who say “no-no-no-no-no” until they finally say “yes, but this is the last time.”

Children are extremely smart. They are all natural lawyers. They want to pick apart your answer for clarity and consistency. They compare the current answer to all past similar behavior. They are willing to negotiate and compromise until they get something out of the deal. It is very difficult to change a policy without a good reason. If you show any weakness, they’ll take advantage of it.

Children are also delightful to work with because humans are intelligent enough that we can talk about punishments and rewards and create punishments and rewards through the use of speech.

For example, you can create rewards by agreeing that a hug is a reward, or staying up an extra five minutes, or helping to cook the soup, or putting a gold star on the calendar.

The same is true of punishments. Sitting on the floor for five minutes is a punishment. In fact, this may be the most consistently successful punishment we’ve ever used. My daughter was told that this is a punishment and it became one.

Adults have one major disadvantage: they have experienced a wide variety of rewards and punishments that are outside your relationship with them. Thus, they’ve learned about a world of rewards and punishments that is completely unknown to you.

Punishing Adults: Confused about punishment? See Ken Blanchard’s *The One Minute Manager* series.

Very often we adults are a jumbled mess of mixed-up, inconsistent motivations and fears. This is great for psychologists but makes team management difficult.  Adults also have some advantages: they tend to be motivated to do well and they have excellent reasoning ability.

This reasoning ability gives us the power to lay out reward systems without a lot of “trial and error.” We can also agree beforehand on rewards and punishments. And, best of all, rewards do not have to consist of instant gratification.

So, rather than having to instantly reward people as we see the correct behavior, we can agree on incentive programs, weekly meetings, and quarterly reports.

Here are some guidelines . . . But, don’t forget what we’ve learned:

1) You should reward the behavior you want to encourage.
2) You should punish behavior you wish to discourage.
3) Agree on rewards and punishments.
4) Consistent small rewards are generally better than a single large reward.
5) Consistent small punishments are more effective than large punishments.
6) Rewards and punishments must be timely. With humans they do not have to be “immediate” but should be close to the behavior.
7) Be honest, open, and consistent.
8) Don’t promise a reward and fail to deliver.

## Why Rewards and Punishments Don’t Work

If this is all so simple, why does it seem not to work in your business? Well, as with so many simple truths, we humans don’t have enough faith and we don’t follow the formula. We sabotage our own efforts.

In the Big Picture, a motivational program should work like this:
1) Set goals – short, intermediate, long.
2) Establish rewards and punishments
3) Evaluate performance
4) Administer rewards and punishments (consistently, fairly, honestly)

Repeat:
1) Revise goals periodically
2) Revise Rewards and Punishments periodically
3) Continue to Evaluate

A simple 4-step process, repeated continuously. So why does it fail? It fails because we don’t do one or more of the steps. And 99% of the time, it’s the boss’s fault. His excuse is usually “I don’t have enough time.” So goals are not set.

As a result, there is no structure for success. The manager doesn’t have time to tell people what she wants. So they do what they think they should do, whether it’s what the boss wants or not. In fact, the boss doesn’t even set her own goals.

Stop. Be your own boss for ten minutes.

What are three things you want to accomplish today?
What are three things you want to accomplish this week?

This Month?

Why don’t you take ten minutes every day to decide what’s important today?
Be honest, you do have time.

We . . . the vast majority of bosses and workers . . . don’t set goals. We don’t have a clear idea of what we’re going to do today that will help us advance toward the bigger goals.

Goal-setting should not be a huge scary task that requires retreat time or offsite meetings or long arguments.

Make a habit every day of jotting down your goals. Look at them every day, and adjust them as needed. This ten-minute habit will change your life. It will bring focus.

The second reason motivational plans fail is lack of integrity. Bosses promise rewards and fail to deliver. Or they are inconsistent with rewards and punishments.

People learn very quickly and they remember negative experiences for a long time.

I have the great good fortune of seeing how different businesses operate. As a result I see motivational plans come and go. I also see successful reward structures that last for a long time.
Overwhelmingly, the lasting techniques are those that are:
1) Clearly understood by everyone.
2) Consistently followed, for both rewards and punishments.
3) Perceived as fair.

I berate bosses for being stingy with rewards. Some bosses are even stingy with small rewards. Bosses are rarely stingy with punishments. If you have a system of large rewards (such as \$1000 bonuses or trips to Maui), you had better be prepared to pay up.

But don’t forget that small rewards can be even more powerful. Five weeks into the quarter, some people know they’re not going to win the trip. What’s their motivation?

With small rewards there is a flurry of activity around the rewards. People get regular feedback and compete to get their name in the “star performance” chart, or try to collect the most T-shirts, squeezy toys, pencils, or whatever.

Every day and every week they can see their success. And their success is visible to themselves and others. Finally, competitions evolve as people display these little rewards as measures of their success.

It is beyond my capacity to understand why a boss would be stingy in this process.  Remember that, as humans, we create a reward by agreeing that something is a reward. When we say, for example, that a company T-shirt is a reward, then it has become more than a T-shirt.

If someone meets the criteria, give him the T-shirt! Stinginess with a ten-dollar piece of clothing can destroy your motivational program.

First, you lack integrity. If you’re not fair on this little thing, how can your employees trust you on larger things?

Second, you turn a “performer” into a disgruntled employee.

Third, this kind of stinginess will become widely known in very short order.

So you see, bosses can sabotage their own motivational programs when they are stingy.

These discussions of the behavioral sciences are not meant to replace a Bachelor’s Degree in psychology. I encourage you to learn more about rewards and punishments in the workplace.

As a worker, consider what motivates you and talk to your boss about it. But don’t start with \$1000 rewards and trips to Hawaii. Start with an examination of your daily and weekly activities. What would be an appropriate, small reward for reaching the next performance level each week?

If you’re a boss, consider the two or three basic “building blocks” of your success. What are the measures of your success? These could be increasing sales, productivity, or timeliness; or reducing mistakes, injuries, or sick days.

Find measurable indicators of your success. Begin measuring them and consider what kind of small rewards you can dole out each week for improved performance.

Then have the integrity to present the rewards as promised.

There are lots of good books on reward systems and building motivation in your workplace. You (workers and bosses) need to find a system that works for your job.

As usual, I encourage you to read lots of ideas on this topic and then come up with your own plan.

*Note: My daughter is perfect.