RESEARCH IN PROGRESS

On the Differential Impact of Positive and Negative Reinforcement

Thomas S. Critchfield and Michael A. Magoon

ILLINOIS STATE UNIVERSITY AND AUBURN UNIVERSITY

As illustrated by debates over one-factor versus two-factor theories of punishment and negative reinforcement, behavior scientists have long wondered whether appetitive and aversive events affect behavior through similar mechanisms. Largely overlooked in these debates is a conceptually simpler issue that has attracted considerable attention outside of behavior analysis: Regardless of their mechanisms of action, on a unit-by-unit basis, do consequences based on appetitive and aversive events have equal amounts of influence on behavior? Behavior analysts have said little on this issue, but cognitive research on decision making (e.g., Kahneman & Tversky, 1979) supports a differential-impact hypothesis by suggesting that losses exert greater impact on behavior than equal-sized benefits. The experiments on which this conclusion is based, however, have uncertain generality, focusing largely on verbal responses to hypothetical, anticipated consequences.

Most operant experiments with nonhumans can shed limited light on a differential-impact hypothesis because they employ qualitatively different appetitive (e.g., food) and aversive (e.g., electric shock) consequences that cannot be compared on the same measurement scale without special procedures (Farley & Fantino, 1978). Operant experiments with human subjects offer a distinct advantage in the present context, because their procedures often arrange consequences based on point gains and losses, making it possible to directly compare the relative effects of equal-sized appetitive and aversive consequences. Ongoing research in our laboratory employs concurrent schedules of reinforcement as a means of doing this. Here we present preliminary data that illustrate our approach to evaluating the differential-impact hypothesis.

For the sake of simplicity, we focus mainly on aversive control in the form of negative reinforcement, because (a) it can be procedurally quite similar to positive reinforcement, and (b) its effects on response strength can be measured directly (unlike punishment, which can be evaluated only in terms of the extent to which it reverses the effects of other operations). We employ concurrent schedules of positive reinforcement, involving money gains of amount X, and negative reinforcement involving the cancellation of money losses of amount X. In a two-ply concurrent schedule in which the responses produce qualitatively different reinforcers, preference for (i.e., differential impact of) one reinforcer is indicated by a consistent biasing of response allocation (a change in intercept; Baum, 1974). Thus, if one type of reinforcer is more potent than the other, relative response rate will consistently exceed relative reinforcement rate for the behavior maintained by that consequence.
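For reference, the bias analysis cited above is usually written in the logarithmic form of the generalized matching relation (Baum, 1974). The symbols below are the conventional ones rather than notation taken from this report: B_1 and B_2 are response rates on the two alternatives, R_1 and R_2 are the corresponding obtained reinforcement rates, a is the slope (sensitivity), and log b is the intercept (bias).

```latex
\log\frac{B_1}{B_2} = a\,\log\frac{R_1}{R_2} + \log b
```

Under this formulation, a reinforcer-type preference of the kind described above would appear as log b differing from zero in a consistent direction, whereas a = 1 and log b = 0 correspond to strict matching.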

As far as we can determine, only two published studies have examined concurrent schedules of positive vs. negative reinforcement in humans using equal-sized money outcomes (Ruddle, Bradshaw, & Szabadi, 1981; Ruddle, Bradshaw, Szabadi, & Foster, 1982). Both found that humans matched positive to negative reinforcement with no consistent bias, suggesting equal control by the two types of consequences, but the studies have limitations. For example, different types of schedules were used to arrange positive versus negative reinforcement, and there were problems regarding the independence of response options. One study employed no changeover delay, and the other employed a changeover procedure that could have created safety periods during which no money losses could occur on the negative-reinforcement schedule just after a switch (thereby reinforcing changeovers). We seek to improve upon the procedures of Ruddle and colleagues as a means of better evaluating the differential-impact hypothesis. Our ongoing investigations employ independent, identically structured, concurrent schedules of variable-cycle (VC) positive and negative reinforcement. Thus, in positive reinforcement, the first response within a cycle immediately produces a point gain (that gain is "forfeited" at the end of a cycle with no responding). In negative reinforcement, the first response in a cycle immediately cancels a point loss (which otherwise occurs at the end of a cycle with no responding). We have resolved the problem of adventitious safety periods by programming a changeover cost (a fixed-ratio response requirement on a changeover button) rather than a changeover delay.
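The symmetry of the two VC contingencies can be made concrete with a minimal sketch. It is not the laboratory's actual control software: the function name `cycle_outcome`, the `CycleOutcome` record, and the `amount` parameter are hypothetical, and cycle timing and the changeover requirement are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class CycleOutcome:
    points_delta: int  # change in the point total produced by this cycle
    when: str          # "at_response", "at_cycle_end", or "none"


def cycle_outcome(kind: str, responded: bool, amount: int = 1) -> CycleOutcome:
    """Outcome of one variable-cycle (VC) cycle, per the description above.

    kind      -- "positive" or "negative" reinforcement schedule
    responded -- whether at least one response occurred during the cycle
    amount    -- hypothetical point value, equal for gains and losses
    """
    if kind == "positive":
        if responded:
            # The first response immediately produces the gain.
            return CycleOutcome(+amount, "at_response")
        # No response: the gain is forfeited, so nothing changes.
        return CycleOutcome(0, "none")
    if kind == "negative":
        if responded:
            # The first response immediately cancels the scheduled loss.
            return CycleOutcome(0, "at_response")
        # No response: the loss is delivered when the cycle ends.
        return CycleOutcome(-amount, "at_cycle_end")
    raise ValueError(f"unknown schedule kind: {kind!r}")


def net_benefit_of_responding(kind: str, amount: int = 1) -> int:
    """Points gained by responding relative to not responding in a cycle."""
    return (cycle_outcome(kind, True, amount).points_delta
            - cycle_outcome(kind, False, amount).points_delta)
```

In this sketch, responding is worth the same number of points relative to not responding on either schedule; only the sign of the event (a gain delivered versus a loss cancelled) differs, which is the comparison the design is meant to isolate.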

The experimental task is based closely on that of Madden and Perone (1999). Consequences are point gains and losses (see below), and conditions are run to stability. In a pilot study, all subjects but one exchanged points for course credit (for exchange procedures, see Critchfield, Schlund, & Ecott, 2000). For these subjects, session earnings were supplemented during negative reinforcement conditions to prevent sub-zero session point totals (we feared that subjects might quit the experiment in such cases). Thus, a counter, not visible on the subject's screen during sessions, tallied session earnings and was displayed at session's end. At the start of a session, the counter was set equal to the programmed session rate of point loss that would accrue following no responding on the negative reinforcement schedule. One subject (S504) exchanged points for money and did not receive supplements to session totals.

Figure 1 summarizes the response-allocation results of the pilot study. Subjects completed four conditions under a 5:1 (VC 12 s VC 60 s) reinforcement ratio. In one pair of conditions (labeled "Rich" in the figure), both schedules produced positive reinforcement during baseline (black bar), and then negative reinforcement was substituted on the rich-reinforcement alternative during the subsequent condition (white bar). A consistent increase in preference for the rich alternative suggested a negative-reinforcement bias. In the other pair of conditions ("Lean"), the positive-reinforcement baseline condition was repeated, and then negative reinforcement was substituted on the lean-reinforcement alternative. Under these conditions, there was no systematic change in preference, suggesting that effects seen in the "Rich" conditions may have had some basis other than a reinforcer bias.
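As a point of reference for the 5:1 arrangement, the following sketch computes the response proportion that strict matching would predict for the rich alternative. It assumes that the programmed maximum reinforcement rates are obtained (one reinforcer per cycle on each schedule) and that there is no bias; the function name is hypothetical, and the preference measure actually plotted in Figure 1 is not specified here.

```python
def strict_matching_proportion(vc_rich_s: float, vc_lean_s: float) -> float:
    """Proportion of responses on the rich alternative under strict matching.

    Assumes one reinforcer per cycle on each schedule, so the reinforcement
    rate on a schedule is the reciprocal of its mean cycle length in seconds.
    """
    rate_rich = 1.0 / vc_rich_s
    rate_lean = 1.0 / vc_lean_s
    return rate_rich / (rate_rich + rate_lean)


# VC 12 s versus VC 60 s gives a 5:1 reinforcement ratio.
predicted = strict_matching_proportion(12, 60)
print(round(predicted, 3))  # 0.833, i.e., 5/6 of responses on the rich side
```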

Now underway are studies in which each subject provides two complete matching functions, one involving positive reinforcement only and one involving both positive and negative reinforcement, across a range of relative reinforcement rates.

Figure 2 shows response-matching data from one subject who worked for money and received no session-total supplements. Compared to an all-positive-reinforcement baseline, the introduction of negative reinforcement for one response option (filled data points and dark regression line) induced no bias, but did increase the slope of the response-matching function (equivalent to magnifying rich-side preference in Figure 1). If replicated, the latter effect would provide the first provisional support for an as-yet untested prediction by Davison and Nevin (1999) of a slope-increasing "differential outcomes effect" in matching.
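The distinction between a bias effect and a slope effect can be illustrated by fitting a line to log response ratios as a function of log reinforcement ratios, following the generalized matching analysis of Baum (1974). The sketch below uses ordinary least squares and entirely synthetic values generated from an assumed slope of 1.2 and zero bias; they are not data from Figure 2, and the function name `fit_matching` is hypothetical.

```python
import math


def fit_matching(reinf_ratios, resp_ratios):
    """Least-squares fit of log10(response ratio) on log10(reinforcer ratio).

    Returns (slope, intercept): the slope estimates sensitivity and the
    intercept estimates log bias in the generalized matching relation.
    """
    xs = [math.log10(r) for r in reinf_ratios]
    ys = [math.log10(b) for b in resp_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# SYNTHETIC illustration only: response ratios generated with slope 1.2
# and no bias, across a hypothetical range of reinforcement ratios.
reinf_ratios = [1 / 5, 1 / 2, 1, 2, 5]
resp_ratios = [r ** 1.2 for r in reinf_ratios]

slope, log_bias = fit_matching(reinf_ratios, resp_ratios)
print(f"slope = {slope:.2f}, log bias = {log_bias:.2f}")
```

In such a fit, a slope above 1 with an intercept near zero corresponds to the pattern described above: preference is exaggerated toward whichever side is richer, without a constant shift toward one reinforcer type.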

So far, contrary to assumptions in cognitive decision research (Kahneman & Tversky, 1979), our results suggest no systematic differential impact of positive and negative reinforcement (and this outcome appears not to depend on minor procedural variations like session earnings supplements and exchanging points for money vs. course credit). It is difficult to affirm a null hypothesis, but if this finding holds up under more systematic investigation, it will raise interesting questions, not about positive and negative reinforcement, but rather about the procedural differences between operant and cognitive investigations that bear on a differential-impact hypothesis.

REFERENCES

Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231-242.

Critchfield, T. S., Schlund, M., & Ecott, C. (2000). A procedure for using bonus course credit to establish points as reinforcers for human subjects. Experimental Analysis of Human Behavior Bulletin, 18, 15-18.

Davison, M., & Nevin, J. A. (1999). Stimuli, reinforcers, and behavior: An integration. Journal of the Experimental Analysis of Behavior, 71, 439-482.

Farley, J., & Fantino, E. (1978). The symmetrical law of effect and the matching relation in choice behavior. Journal of the Experimental Analysis of Behavior, 29, 37-60.

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263-291.

Madden, G. J., & Perone, M. (1999). Human sensitivity to concurrent schedules of reinforcement: Effects of observing schedule-correlated stimuli. Journal of the Experimental Analysis of Behavior, 71, 303-318.

Ruddle, H. V., Bradshaw, C. M., & Szabadi, E. (1981). Performance of humans in variable-interval avoidance schedules programmed singly, and concurrently with variable-interval schedules of positive reinforcement. Quarterly Journal of Experimental Psychology, 33B, 213-226.

Ruddle, H. V., Bradshaw, C. M., Szabadi, E., & Foster, T. M. (1982). Performance of humans in concurrent avoidance/positive-reinforcement schedules. Journal of the Experimental Analysis of Behavior, 38, 51-61.