The first step toward understanding animal training is to learn some commonly used words and terms.
Operant conditioning is the basis of animal training. It is a type of learning in which an animal learns (or is conditioned) from its behaviors as it acts (operates) on the environment. In operant conditioning, the likelihood of a behavior is increased or decreased by the consequences that follow. That is, a behavior will happen either more often or less often, depending on its results. When an animal performs a particular behavior that produces a favorable consequence, the animal is likely to repeat that behavior.
Animals learn by the principles of operant conditioning every day. For example, woodpeckers find insects to eat by pecking holes in trees with their beaks. One day, a woodpecker finds a particular tree that offers an especially abundant supply of the bird's favorite bugs. The woodpecker is likely to return to that tree again and again.
Humans learn by the same principles. For example, consider the behavior of a child doing chores. Suppose a child voluntarily performs a chore, like cleaning the garage or washing the car. If the behavior is reinforced by positive attention such as praise, money, or some other reward, the child is likely to do additional chores on his own. If no positive attention were to follow, repeating that behavior would be less likely.
Animal trainers apply the principles of operant conditioning. If an animal performs a behavior that the trainer wants to see performed again, the trainer will administer a favorable consequence.
Let's explore those favorable consequences a little further. A favorable consequence is often a physical experience - something that can be seen, heard, felt, or tasted. This experience is a type of stimulus. When an animal performs a behavior that produces a positive stimulus, the animal is likely to repeat the behavior in the near future. The positive stimulus is termed a positive reinforcer because it reinforces, or strengthens, the behavior. A positive reinforcer increases the likelihood that a behavior will be repeated, but to be effective it must immediately follow the behavior.
Positive reinforcers can also be called rewards. Rewards take on many forms. For animals, one of the most common rewards is food. Indeed, many behaviors animals do in the wild are for getting food. If certain behaviors allow an animal to get food successfully, the animal will repeat these behaviors the next time it is hungry.
Food is an example of a primary reinforcer. Primary reinforcers are reinforcers that are automatically positive. An animal does not have to learn to "like" them. Other examples of primary reinforcers include water, shelter, and mating opportunities. Several types of primary reinforcers provide tactile stimulation, like a good back scratch. For people, a hug can be very reinforcing.
Not all types of reinforcers are automatically positive. Some can be learned. Reinforcers that are learned are called conditioned reinforcers. For example, money is not a primary reinforcer. To small children, money is just paper. But children grow to learn that money can be used to buy candy, toys, and other things they like. Money becomes very rewarding. It is one of the most common and effective reinforcers in many human societies.
Animals learn conditioned reinforcers when those reinforcers are paired with primary ones. Suppose an animal trainer exclaims "Good boy!" and then gives the animal food or a back scratch. After several repetitions, the exclamation "Good boy!" will become rewarding to the animal. Positive attention like this is a conditioned reinforcer.
A less common type of reinforcement is negative reinforcement. Unlike positive reinforcement, which involves giving a favorable stimulus, negative reinforcement involves removing an unfavorable stimulus. For example, consider a child crying or whining for something he wants. If his parent gives in and gives the child what he wants, the child stops crying. By removing the unfavorable stimulus (the crying), the child has reinforced his parent's behavior of giving in. It can be argued that this isn't necessarily the ideal outcome for the parent - the parent has just reinforced the child's crying behavior! But it is an example of negative reinforcement.
Negative reinforcement is not punishment. Punishment involves giving an unfavorable consequence. Punishment decreases the likelihood of a behavior repeating. Both positive and negative reinforcement increase the likelihood that a behavior will be repeated.
Schedules Of Reinforcement
Positive reinforcement for desired behaviors may occur on one of four possible reinforcement schedules: fixed interval, fixed ratio, variable interval, or variable ratio.
A fixed interval reinforcement schedule is based on receiving reinforcement after a fixed amount of time. The desired behavior must continue for a certain amount of time before a reinforcer is delivered. The amount of time between reinforcers is always the same. Perhaps the most common example of fixed interval reinforcement is an employee who gets a paycheck every two weeks for doing his job. The paycheck reinforces his continued work.
A fixed ratio reinforcement schedule is based on receiving reinforcement after a fixed number of behaviors. The desired behavior must occur a certain number of times before it is reinforced. The number of behaviors always remains the same. For example, a child receives an ice cream cone after she reads five books. She receives a reward after every fifth book she reads.
Variable interval reinforcement occurs after varying lengths of time. The behavior is reinforced at random intervals. Volunteer work may fall under this category. Because it is not a paid job, volunteer workers find reinforcement in ways other than money. Words or gestures of appreciation, given at random intervals, are reinforcing. The volunteer is likely to continue his work.
Variable ratio reinforcement occurs after a varying number of behaviors. Reinforcement varies unpredictably, so the person or animal performing the behavior is never certain when they will be reinforced. A common example of variable ratio reinforcement in humans is gambling. A person depositing coins in a slot machine is never sure when he will receive a pay-off. Initially, variable ratio reinforcement may take longer to condition a behavior. But once conditioned, the behavior generally occurs at a higher rate and takes longer to extinguish.
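The four schedules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a training tool: the function names, the use of whole-number "time units", and the random ranges are all assumptions made for the example. In particular, the two interval schedules are modeled here as "reinforce the first response made once enough time has passed since the last reinforcer", which is one common way to state them.

```python
import random

def fixed_ratio(n):
    """Reinforce every n-th response (e.g. every fifth book)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return respond

def variable_ratio(low, high, rng):
    """Reinforce after a number of responses redrawn from [low, high] each time."""
    count = 0
    target = rng.randint(low, high)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(low, high)  # next payoff is unpredictable
            return True
        return False
    return respond

def fixed_interval(interval):
    """Reinforce the first response at least `interval` time units after the last reinforcer."""
    last = 0
    def respond(t):
        nonlocal last
        if t - last >= interval:
            last = t
            return True
        return False
    return respond

def variable_interval(low, high, rng):
    """Like fixed_interval, but the required wait is redrawn from [low, high] each time."""
    last = 0
    wait = rng.uniform(low, high)
    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.uniform(low, high)
            return True
        return False
    return respond

# Fixed ratio: an ice cream cone after every fifth book.
ice_cream = fixed_ratio(5)
book_rewards = [ice_cream() for _ in range(10)]

# Fixed interval: a paycheck every 14 days for someone who works daily.
paycheck = fixed_interval(14)
paydays = [day for day in range(1, 29) if paycheck(day)]
```

In this sketch, `book_rewards` is `True` only for the fifth and tenth books, and `paydays` is `[14, 28]`. The variable schedules take a `random.Random` instance so that a run can be repeated with the same seed.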
Extinction Of Behavior
If a behavior is not reinforced, it decreases. Eventually, it is extinguished altogether. This is called extinction. Animal trainers use the technique of extinction to eliminate undesired behaviors: they simply do not reinforce them. (In animal training, when a trainer requests a particular behavior and the animal gives no response, this is also considered an undesired behavior.) Over time, the animal learns that a particular behavior is not producing a desired effect, and it discontinues the behavior.
When using the extinction technique, it is important to identify what stimuli are reinforcing for an animal. The trainer must be careful not to present a positive reinforcer after an undesirable behavior. The best way to avoid reinforcing an undesired behavior is to give no stimulus at all.
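The rise and fall of a behavior under reinforcement and extinction can be pictured with a toy model. The sketch below is an assumption made purely for illustration and does not come from the text: it treats the "strength" of a behavior as a number between 0 and 1 that moves a fixed fraction of the way toward 1 after each reinforced response and toward 0 after each unreinforced one. The names `update_strength`, `simulate`, and the `rate` value are hypothetical.

```python
def update_strength(strength, reinforced, rate=0.2):
    """Move strength toward 1.0 if the response was reinforced, toward 0.0 if not."""
    target = 1.0 if reinforced else 0.0
    return strength + rate * (target - strength)

def simulate(acquisition_trials, extinction_trials, start=0.1):
    """Return behavior strength after each trial: first reinforced, then unreinforced."""
    strength = start
    history = []
    for _ in range(acquisition_trials):   # every response is reinforced
        strength = update_strength(strength, reinforced=True)
        history.append(strength)
    for _ in range(extinction_trials):    # no response is reinforced
        strength = update_strength(strength, reinforced=False)
        history.append(strength)
    return history

history = simulate(20, 20)
```

In this toy model the strength climbs during the first twenty trials and then shrinks steadily once reinforcement stops, which mirrors the description of extinction above. Real animals are far less tidy, so the numbers should be read as a picture of the idea rather than a prediction.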