Socially Rational Models for Autonomous Agents
Autonomous multi-agent systems that are to coordinate must be designed according to models that accommodate such complex social behavior as compromise, negotiation, and altruism. In contrast to individually rational models, where each agent seeks to maximize its own welfare without regard for others, socially rational agents have interests beyond themselves. Such models require a new type of utility function a social utility to ensure three desirable properties: (a) conditional preferences agents may adjust their preferences to account for the preferences of others; (b) endogeny group preferences are determined internally by interactions between individual agents; (c) framing invariance reformulations of the social model using exactly the same information should not alter the conclusions; and (d) social coherence no individual's welfare is categorically subjugated to the welfare of the group. Social utilities in turn require a compatible solution concept optimal failure-avoidance. Satisficing game theory embodies both social rationality and optimal failure-avoidance and provides a formal mathematical framework in which to balance group and individual interests in mixed-motive societies. The satisficing approach is applied to two scenarios: the Ultimatum game and a random graph search problem. The Ultimatum game is one for which game-theoretic analysis does not correspond well to empirical data regarding human behavior; it is thus an important test case for a new theory. The graph search scenario is an idealization of an important resource allocation problem in which the ability to compromise and negotiate can greatly facilitate the search for a solution.
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 35, NO. 4, NOVEMBER 2005 533 Social Utility Functions Part II: Applications Wynn C. Stirling and Richard L. Frost, Member, IEEE Abstract Social utilities account for agent preferences and, thus, can characterize complex interrelationships, such as cooperation, compromise, negotiation, and altruism, that can exist between agents. Satisficing game theory, which is based on social utilities, offers a framework within which to design sophisticated multiagent systems. Key features of this approach are: a) an -agent system may be represented by a -dimensional Bayesian network, called a praxeic network; b) the theory accommodates a notion of situational altruism (a willingness to defer to others in a controlled way if so doing would actually benefit others under the condition that others wish to take advantage of such largesse); and c) satisficing games admits a protocol for effective negotiation between agents who, though interested in their own welfare, are also willing to give some deference to others. Three applications are presented. The first two involve well-known two-person games: the Prisoner s Dilemma and the Battle of the Sexes, and the third is a simulated uninhabited aerial vehicle scenario. Index Terms Altruism, Bayesian networks, decision making, distributed control, intelligent systems, multiagent systems, negotiations, satisficing games. I. INTRODUCTION METHODOLOGIES for the design of effective multiagent systems should account for sophisticated agent behaviors such as cooperation, compromise, negotiation, and altruism. Conventional decision-making methodologies attempt to optimize individual performance (even if only approximately) and have a limited capability to account for complex social behavior. Essentially, this is because optimization is an individual activity that is based on the doctrine that each individual is committed to maximizing its own satisfaction without concern for the welfare of others. Group interests are generally not optimized and perhaps not even well served if each individual optimizes its own behavior (see [1] for a detailed discussion). Designs that put final confidence in the limited perspective of pure self-interest may ultimately function disjunctively, and perhaps illogically, when participating in collective activities that would more naturally be expressions of sophisticated social behavior. Even though cooperative behavior may be desired of a multiagent system, it would be rare indeed for the interests of all agents to be perfectly aligned. Some form of conflict, even if minor, is likely to be present. Decision makers must have the capability to give deference to others and compromise, but they must not be required to capitulate and unconditionally surrender their interests in all situations. They must be able to negotiate ef- Manuscript received June 4, 2004; revised August 27, 2004. This paper was recommended by Guest Editor C. Malmborg. The authors are with the Electrical and Computing Engineering Department, Brigham Young University, Provo, UT 84602 USA (e-mail: wynn@ee.byu.edu; rickf@ee.byu.edu). Digital Object Identifier 10.1109/TSMCC.2004.843200 fectively without either undue intransigence or overeagerness to defer. These requirements call for a decision-making methodology that allows each member of a decision-making society to maintain its own interests while, at the same time, yielding some deference, in a controlled way, to others as the situation warrants. Such behavior requires each individual to seek a social balance between its individual interests and the interests of others. Achieving social balance requires a concept of rational behavior that is more flexible and accommodating than optimization which, by its very structure, is rigid and intransigent. In [2], an alternative notion of rational behavior is presented, termed satisficing rationality, and a summary of this approach is given in [1]. Satisficing serves as an alternative to the more familiar notion of optimization. Just as individual rationality is formalized in multiagent settings by von Neumann Morgenstern (vN-M) game theory, satisficing rationality is formalized in multiagent settings by satisficing game theory. This theory provides a mechanism by which cooperative social systems may be synthesized according to a systematic concept of rational behavior that involves social utilities; that is, utilities that account for the interests of others as well as of the self. Three main issues must be addressed in order to establish credibility of the social utilities approach: i) we must develop a systematic way to obtain a solution once the model has been formulated; ii) we must test its performance and compare it with results obtained by conventional means; and, ultimately iii) we must implement and evaluate performance in real-world situations. In this paper, we address the first two of these criteria. Following a brief summary of satisficing game theory, we show how it can be formulated as a praxeic network, similar in structure to a Bayesian network, and discuss the relevant computational issues.We next describe two important aspects of sophisticated social behavior; namely, situational altruism and negotiation, and indicate how these concepts can be modeled with social utility theory.We then provide three applications of this new concept of multiagent decision theory. The first two comprise satisficing solutions to two well-known mixed-motive games: Prisoner s Dilemma and the Battle of the Sexes. The Prisoner s Dilemma characterizes situations where cooperation is possible, but the players must guard against exploitation. The Battle of the Sexes is a coordination game where either both players win or both lose. The third application is an analysis of a simulated multiagent uninhabited aerial vehicle (UAV) scenario, in which we analyze and compare the performance of both satisficing and optimal decision methodologies. II. SATISFICING DECISION THEORY A. Interdependence A multiagent system is a set of players, denoted , each with its corresponding finite op- 1094-6977/$20.00 2005 IEEE Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. 534 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 35, NO. 4, NOVEMBER 2005 tion set . The vectors , where , constitute a joint option. The product space is the joint option space. Satisficing game theory [2] is a new approach to multiagent decision making that accounts simultaneously for both group and individual interests. It does this by replacing optimization as the ideal of performance in favor of a notion of adequacy as the ideal. A decision is good enough, or satisficing, if the benefits from adopting it are at least as great as the costs. To form a precise definition of satisficing, we view each decision maker as consisting of two personas. The selectability persona for , denoted , views the options available to it exclusively in terms of achieving the goal, while the rejectability persona for , denoted , views the options exclusively in terms of the consumption of resources. An option is satisficing if the selectability persona s support for it exceeds the rejectability persona s objections. To evaluate these degrees of support, we must define utility functions for the two personas. For each , let be a selectability function characterizing the relative degree of support that associates with the elements of , and let be a rejectability function characterizing the degree of opposition (e.g., the consumption of resources) that associates with the elements of . These functions constitute dual utilities which make possible intrinsic, or intraoption comparisons of the attributes of each option in contrast to the usual extrinsic (interoption) comparisons that single (vN-M) utilities afford. To emphasize this distinction, we will refer to and as social utilities. As developed in [1], normalizing the social utilities to be mass functions makes it possible to express them as marginals of joint selectability and rejectability functions, which provide evaluations of the relative degrees of achieving a group goal and consuming group resources for each option vector . That is (1) (2) where and are the groups of agents considered exclusively in terms of joint selectability and joint rejectability, respectively. Since the rejectability or selectability of one or more agents may affect the preferences for rejection or selection of another agent or agents, we must view the joint selectability and joint rejectability functions as marginals of a more complex mass function that accounts for all relevant interrelationships that exist among the members of the multiagent system. That is (3) (4) where is called the interdependence function. Fig. 1. Praxeic network for a three-agent system. B. Constructing the Interdependence Function The specification of the interdependence function is the key issue when modeling a multiagent system, and constructing this function is the first order of business of satisficing decision theory. In [1], it is shown that the properties of probability mass functions, such as conditioning and independence, also apply to social utility functions. These properties, coupled with the chain rule, provide a convenient and economical way to construct the interdependence function as the product of conditional and marginal mass functions. This construction permits a multiagent system to be viewed from the perspective of graph theory, and yields a structure that is similar to a Bayesian network [3] [5]. We may employ the tools of graph theory to express a multiagent system as a directed acyclic graph (DAG). To distinguish between the usual probabilistic application and our context, we will refer to such networks as praxeic networks. A praxeic network for an -agent system consists of nodes, with each participant having two nodes associated with it one for its selectability persona and one for its rejectability persona. The variables associated with these nodes are the options available to the decision maker and the edges represent the influence that one persona has on another persona. These linkages consist of conditional selectability or conditional rejectability functions. Consider the graph displayed in Fig. 1, which corresponds to a three-agent system whose interdependence function is given by (5) Notice that because of the structure of the interdependence function, persona does not influence, and is not influenced by, any other personas. There are two major distinctions between Bayesian networks and praxeic networks. First, whereas Bayesian networks deal with the epistemological problem of what to believe, praxeic networks deal with the praxeic problem of how to act. Second, whereas Bayesian networks deal with one notion of probability a measure of the degree of belief praxeic networks deal with two notions: selectability, which characterizes the degree of effectiveness, and rejectability, which characterizes the degree of inefficiency. Whereas Bayesian networks define connections between random variables, praxeic networks define connections between agent personas. With Bayesian networks, the vertices correspond to the values that the random variable can assume, and the edges represent the flows of belief influence between vertices as characterized by conditional probability mass functions. By contrast, the vertices of a praxeic network Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. STIRLING AND FROST: SOCIAL UTILITY FUNCTIONS PART II: APPLICATIONS 535 represent the sets of actions that are possible for the agent personas, and the edges characterize the flows of preference influence between the personas as characterized by conditional selectability or rejectability mass functions. Although Bayesian networks have been used in the multiagent context [6], [7], they are used to model the flow of information in the epistemic context as characterized by conditional probability mass functions, rather than to model the flow of praxeic influence as characterized by conditional social utility functions (mathematically, or course, there is no difference between the two concepts). C. Satisficing Games A satisficing game is a triple . To solve this game, we first compute joint selectability and joint rejectability functions according to (3) and (4), and then the individual selectability and rejectability functions according to (1) and (2). These functions may be obtained by invoking, for example, Pearl s Belief Propagation Algorithm [3] or the Sum Product rule for factor graphs [8]. The jointly satisficing solution at caution level of a satisficing game is the subset of all option vectors such that the joint selectability is at least as great as the caution level times the joint rejectability, that is The individually satisficing solutions for each agent are obtained from the marginal selectability and rejectability functions, yielding the individually satisficing solutions The jointly satisficing set comprises all joint options that provide a group benefit that exceeds (as modulated by ) the cost to the group, and the individually satisficing sets comprise all individual options that provide an individual benefit that exceeds (as modulated by ) the cost to the individual. The satisficing rectangle is the product set of the individually satisficing sets, namely In general, the satisficing rectangle will not be the same as the jointly satisficing set; they may even be disjointed. However, the following theorem relates the two sets. If is individually satisficing for agent , that is, , then it must be the th element of some jointly satisficing vector . Proof: We will establish the contrapositive, namely, that if is not the th element of any , then . Without loss of generality, let . By hypothesis, for all , so , hence . Thus, if an option vector is individually satisficing, it is part of a jointly satisficing vector, although it need not be part of all jointly satisficing vectors. The content of this theorem is that no one is ever completely frozen out of a deal every decision maker has, from its own perspective, a seat at the negotiating table. This condition is perhaps the weakest condition under which negotiations are possible. Setting grants equal weight to effectiveness and efficiency and ensures that the satisficing sets are not empty. In practice, can be viewed as a negotiation parameter. Reducing , increases the size of the satisficing sets and permits the participants to lower their standards in a controlled way to reach a compromise. III. SOCIAL BALANCE A well-balanced individual must be able to reconcile its own interests with the interests of the group. While, on the one hand, it must be able to accommodate the interests of others, on the other hand, it must not be required to abandon its own interests completely. Thus, even though it may possess a propensity to cooperate, it must be able to control the degree of accommodation it is willing to offer others. Achieving this kind of balance requires a sophisticated notion of altruism and a well-defined protocol for negotiation. A. Altruism Altruism literally means unselfishness. Typically, this is taken in the positive sense, whereby an agent is willing to sacrifice in order to benefit another, but one could also consider negative altruism, whereby a malevolent agent is willing to sacrifice in order to injure another. In either case, the essence of the concept is that an altruistic agent takes into consideration the preferences of others when defining its preferences. Conventional notions of altruism [9] do not distinguish between the state of actually relinquishing one s own self-interest and the state of being willing to relinquish one s own self-interest under the appropriate circumstances. To relinquish unconditionally one s own self-interest is a condition of categorical altruism a decision maker unconditionally modifies its preferences to accommodate the preferences of others in all circumstances. A purely altruistic player would completely replace its preferences with the preferences of others. By contrast, a state of being willing to modify one s preferences to accommodate others if the need arises is a state of situational altruism. Here, a decision maker is willing to accommodate, at least to some degree, the preferences of others in lieu of its own preferences if, and only if, the other wishes to take advantage of the offered largesse. Otherwise, the agent would be governed by its own preferences and would avoid needless sacrifice. Situational altruism is a sophisticated form of unselfish behavior that is not easily modeled with vN-M utility theory. Essentially, this is because vN-M utilities are functions of the actions of the players. On the other hand, social utilities are functions of the preferences of the players and, therefore, are more suited to characterize sophisticated notions of social behavior such as situational altruism. To illustrate, consider the persona set , with s options considered in terms of selectability and s options considered in terms of rejectability. The corresponding interdependence function may be factored to become . Each option generates a different conditional rejectability function ; that is, for all and . Now, suppose that by not adopting a certain option, say , would altruistically benefit Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. 536 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 35, NO. 4, NOVEMBER 2005 if it were to favor one of its options, say , but if were to favor , then would act egoistically. accommodate this situation by the following conditional selectability structure: if and if and if (6) where is the number of options available to , and is s rejectability based solely on its own interests without taking into consideration s desires. Notice that can specify this conditional rejectability without knowing s actual selectability structure it is a purely hypothetical consideration. An altruistic would set , resulting in if if (7) In this way, exhibits situational altruism; it is willing to accommodate by rejecting if, but only if, doing so is critical to s welfare. For example, suppose and are to purchase an automobile. Selectability might comprise attributes such as performance, styling, and reliability, and rejectability might comprise attributes such as the purchase price and operating costs. Suppose would be willing to place high rejectability on paying less than (i.e., lowrejectability for paying more than ) for a vehicle if and only if were to place high selectability on style . Thus, displays a willingness to indulge without categorically relinquishing its individual preference. B. Negotiations Expanding the spheres of interest of the agents to consideration of the interests of others as well as oneself provides an environment that is conducive to negotiations. The negotiation theorem presented in Section II-C provides an environment where the interests of all participants are represented, in that no one is excluded from making deals. This structure invites the definition of a very simple negotiation protocol that would be suitable for a collection of decision makers each of who, in addition to being intent upon pursuing its own self-interest, is willing to give some deference to others. Such an agent would be inclined to compromise by giving up some of its assured benefit if so doing would offer significant benefit to others. Such a community could effectively negotiate for the benefit of the group. The following corollary to the Negotiation Theorem motivates a very simple negotiation procedure for such a community. Corollary 1: There exists an index of caution profile such that . The proof of this corollary is immediate: as , every option becomes satisficing, that is, . Let denote the index of caution for , let denote s individually satisficing set, and let denote s version of the jointly satisficing set. By decreasing incrementally, increases its set of satisficing decisions, thereby gradually lowering its standards of acceptable performance. It is thus able to control the amount of deference it is willing to give to others in an attempt to reach a compromise. Each is also able to specify a lower value, , which sets a limit on the degree of compromise it is willing to undergo. It is reasonable to assume that the standard of acceptance for the group can be no higher than the minimum standard for any individuals, and we may accordingly specify the index of caution of the group as . For each value of , forms its compromise set as Notice that it is not necessary for all agents to have the same interdependence function. is the set of all joint actions such that, from the point of view of , are jointly satisficing (that is, that are good enough for everyone). It is easy to see that so long as all indices of caution are less than or equal to unity. The global compromise set is the intersection of all compromise sets: . A compromise will be reached if . If no compromise is reached before all s have reached their limits, then an impasse exists and negotiations will be broken off. If is singleton, then each implements the th component of the unique element of . If contains more than one element, then the agents must undergo another round of negotiation to ensure that they all choose the same jointly satisficing option. If the agents all use the same interdependence function model, then they may choose a tie breaker such as the member of the global compromise set that maximizes the benefit to the group: . If they do not all use the same interdependence function, they may adopt some other protocol as a tie-breaker. For example, they could agree beforehand on a randomizing procedure to choose a designated tie-breaking agent, who would pick one of the members of for all to use. By its very nature, negotiations cannot force a compromise. If the interests of the agents are deeply contradictory, or if the agents are rigid and intransigent, then attempts to negotiate may not be fruitful, even if the consequence of failing to compromise is severe. On the other hand, if agents are flexible and accommodating, then negotiation may be conducive to success. IV. APPLICATION 1: PRISONER S DILEMMA We now apply satisficing game theory to the Prisoner s Dilemma game. This game comprises two players, and , who may either cooperate ( ) or defect ( ) (see, for example, [1] for details). (To review conventional solutions to the Prisoner s Dilemma game, refer, for example, to [10] and [11].) Our task is to define the interdependence function, from which the selectability and rejectability functions can be obtained and compared for each joint option. To generate the interdependence function, we must first define operational notions for selectability and rejectability. Suppose we view the players as individuals who are concerned primarily with their own welfare but, at the same time, have a degree of consideration for other player s difficulties and consider it a cost to them if an action they take makes it difficult for others. Although there Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. STIRLING AND FROST: SOCIAL UTILITY FUNCTIONS PART II: APPLICATIONS 537 are many ways to frame this problem, we adopt the procedure of associating selectability with reducing individual jail time and associating rejectability with increasing group jail time. Thus, short individual sentences will have high individual selectability, and long group sentences will have high joint rejectability. The notion of group interest can have significance only if there is some social relationship between the players. There are (at least) two social desiderata that may affect their decisions: a) a propensity for dissociation, that is, for each player to behave without regard for the welfare of the other, and b) a propensity for vulnerability; that is, for the players to expose themselves to individual risk in the hope of improving the joint outcome. Let be a measure of the joint value the players place on rejecting the joint option . We may identify dissociation index: if , the players are completely dissociated and cooperation is unlikely. Also, let be a measure of the joint value placed on rejecting the joint option . may be viewed as a vulnerability index: means the players are each willing to risk receiving a long jail sentence in the hopes of both obtaining a shorter one. A condition of high dissociation and high vulnerability would indicate lack of concern for the welfare of the other player while, at the same time, implying a willingness to expose oneself to dire consequences. We may prohibit this inconsistent situation by imposing the constraint that . If, for example, and , then self-interest is the only consideration. If, however, , then the players are willing to assume high risk to achieve cooperation. The case is a condition of close association coupled with a strong desire for invulnerability. This is a condition of intransigence, which may lead to dysfunctional behavior. To avoid such situations, we also impose the somewhat arbitrary constraint that . In general, each player would have its own set of these indices, which need not be the same. To simplify our presentation, however, we assume that and are common knowledge. Our task is to define the interdependence function. Let . We begin by examining the criteria for rejectability. Assuming that the consequences to the players are symmetric, that is, the outcomes are independent of the identity of the players, then the joint rejectability of should be equal to the joint rejectability of . Furthermore, the degree to which the players would jointly prefer to reject mutual cooperation would be proportional to the index of dissociation, and the degree to which they would prefer to reject mutual defection would be proportional to the index of vulnerability.With these assumptions and the constraints on and , we define the joint rejectability function (8) The constraint that ensures that preferring to reject the opposing actions ( or ) will always be at least as rejectable as the minimum of preferring to reject similar behavior ( or ); thus, dysfunctional behavior is always more rejectable than coordinated behavior. TABLE I CONDITIONAL SELECTABILITY FOR THE PRISONER S DILEMMA GAME We now consider selectability. Although selectability deals with individual objectives, it is a joint consideration since the consequences of the players decisions are not independent and, thus, preferences cannot be independent. A convenient way to express this dependency is to compute joint selectability conditioned on joint rejectability for all pairs and . The interdependence function may then be obtained by the product rule (9) The function characterizes the selectability of the joint option that the players jointly place all of their rejectability on . We may compute the conditional selectability by invoking straightforward and intuitive rules of the form: If and jointly prefer to reject , then they should jointly prefer to select . Let s say , that is, the players jointly prefer to reject cooperation. Given this situation, it is obvious that the preferred joint option is for both to defect. We may encode this rule into the conditional selectability function by placing all of the selectability on the joint option , that is, . If exactly one player prefers to reject cooperating, then it is obvious that, in this case as well, the preferred joint option is for both to defect; consequently, we set and . We complete this development by noting that if both players prefer to reject defecting, then . Table I summarizes the structure of this conditional selectability function. Substituting the conditional selectability function given by Table I and the joint rejectability given by (8) and (9) yields the interdependence function, whose values are provided in Table II. The praxeic network for this system is given in Fig. 2. Parameterized by and , the jointly satisficing set is, for for for , for , (10) These regions are depicted in Fig. 3(a). The joint satisficing set coincides with the Pareto optimal solution when the vulnerability index is at least as large as 1/2. It coincides with the Nash solution when the vulnerability index is less than the dissociation index. If the vulnerability index is greater than dissociation index but less than 1/2, then the joint satisficing set contains both and . To take action in this situation Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. 538 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 35, NO. 4, NOVEMBER 2005 TABLE II INTERDEPENDENCE FUNCTION FOR THE PRISONER S DILEMMA GAME Fig. 2. Praxeic network for the Prisoner s Dilemma. Fig. 3. Satisficing decision regions for the Prisoner s Dilemma game: (a) joint decisions and (b) individual decisions. requires the invocation of a tie-breaker. For example, is the satisficing option placing higher emphasis on individual interest (higher selectability), and is the satisficing option placing higher emphasis on group-interest (lower rejectability). The individually satisficing set for either player is for , for , for , The individual decision regions are illustrated in Fig. 3(b). Note that the set is a singleton except in the special situation of . The Nash equilibrium solution emerges as a special case (e.g., and ), but the satisficing solution gives the solution for all admissible pairs. Notice that the unilateral, or individually satisficing decisions are compatible with the bilateral, or group satisficing decisions. Either both players will defect or both will cooperate as determined by the values of the parameters. V. APPLICATION 2: BATTLE OF THE SEXES A simple example that is a prototype for complex social relationships where there is a strong incentive to cooperate, yet the players do not agree on the right course of action, is the so-called Battle of the Sexes game. The usual story-line is as follows (for example, see [12] and [13]): two players and plan to meet for a social function. prefers the ballet ( ), while prefers the dog races ( ). Each also prefers to be with the other, however, regardless of venue. This game serves as a model for economic scenarios where two firms are to choose between com- TABLE III ORDINAL VN-M PAYOFF MATRIX FOR THE BATTLE OF THE SEXES GAME peting standards. Although each has its own preference, both would sell more products if they were to adopt a common standard. This is a coordination game [14]; either both players win or both lose. It is not a game of pure coordination, however, since the interests of the players do not perfectly coincide and there is an opportunity for conflict and, hence, exploitation. The vN-M version of this game is to form a payoff matrix by juxtaposing the utility functions as given in Table III. Under this formulation, there are two Nash equilibria, namely and . Since the Nash solution is not uniquely optimal, this solution concept does not resolve the issue. A standard game-theoretic way to deal with this problem is to define a coordinated equilibrium by having each player randomize its decision according to a joint probability distribution [15], but this approach is problematic, at best, for one-off play. We may also address this game with satisficing methodology by constructing an interdependence function. Let us take selectability as the two players being with each other, regardless of where they go, and let rejectability correspond to the costs of being at a particular function. would prefer if he did not take into consideration s preferences; similarly, would prefer . Thus, we may express the myopic rejectabilities for and in terms of parameters and , respectively, as where is s rejectability of and is s rejectability of . The closer is to zero, the more is adverse to analogous interpretation for with respect to attending . We assume that and are to be consistent with the stereotypes. Since being together is a joint, rather than an individual objective, it is difficult to form unilateral assessments of selectability, but it is possible to characterize individually the conditional selectability. One way to do this is to specify conditional mass functions and , that is, s selectability conditioned on s rejectability and s selectability conditioned on s rejectability. If strongly desired to reject , may account for this, if he cares about s feelings, by placing some portion of his conditional selectability mass on . may construct her conditional selectability in a similar way, yielding The parameters and determine the amount of deference one player gives the other. If were to place all of her rejectability mass on , then may defer to s strong dislike Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. STIRLING AND FROST: SOCIAL UTILITY FUNCTIONS PART II: APPLICATIONS 539 Fig. 4. Praxeic network for the Battle of the Sexes. of by placing of his selectability mass on . Similarly, could show a conditional preference for if were to reject strongly. With these conditional and marginal functions, we may construct the interdependence function according to the chain rule as where we have assumed that s selectability conditioned on s rejectability is dependent only on s rejectability, that s selectability conditioned on s rejectability is dependent only on s rejectability, and that the myopic rejectability values of and are independent. The praxeic network for this example is given in Fig. 4. With vN-M game theory, if each player is categorically altruistic (i.e., goes to in an attempt to please while goes to in an attempt to please ), the result is that both players receive theirworst payoffs.With satisficing game theory, however, the ability to be situationally altruistic eliminates this paradox. To illustrate, we shall assume that both players are maximally conditionally deferential and set (in principle, however, they may assume any values in [0, 1]). Even in this most deferential case, these conditional preferences do not commit one to unconditional abdication of his or her own unilateral preferences. still myopically (that is, without taking into consideration) prefers , and still myopically prefers , and there is no intimation that either participant must throw the game in order to accommodate the other. Setting the index of caution equal to unity, we obtain the jointly satisficing set as for for for the individually satisficing sets are for for for and the satisficing rectangle is for for for Thus, we see that the individually satisficing decisions are compatible with the group satisficing decisions. Even though the decisions are individual, they are obtained in a way that ensures compatibility with the interests of the group. If s aversion to is less than s aversion to , then both players will go to s preference, namely, , and conversely. Under vN-M game theory, if each player unconditionally defers to the other, the result is disastrous for both. By contrast, with the satisficing approach, even though both players are maximally conditionally deferential, the satisficing solution results in a natural cooperative strategy that is socially defensible. VI. APPLICATION 3: UAVS Consider a system of three autonomous UAVs (i.e., ). The group is required to engage targets (either for reconnaissance or combat) while avoiding unnecessary exposure to hazard, collisions, and loss of communications. Each vehicle is able to detect all targets and hazards within its field of view, and each may communicate with its nearest neighbors, provided they are within range. Each UAV acts autonomously, but all are jointly responsible to achieve high performance and satisfy the constraints. A. Simulation Model To simplify this coordination problem to its bare essentials, we assume the following. C-1 The field of action consists of a grid divided into cells such that each target and each hazard is contained in one and only one cell. No cell may contain both a target and a hazard. C-2 The vehicles fly at constant forward velocity but variable lateral velocity in a three-abreast formation. The forward velocity is cell per time unit. The lateral velocity is drawn from the set cells per time unit, where negative signifies a move ahead and to the left, zero a move straight ahead, and positive a move ahead and to the right. Each cell may be occupied by, at most, one vehicle. C-3 Each vehicle is able to detect all targets and hazards within a distance of cells in the forward direction from their current cell, with unlimited lateral detection. C-4 If a vehicle enters a cell that contains a target, the group scores one point. C-5 If a vehicle enters a cell that contains a hazard, the group loses one point. C-6 Direct communication may occur between two adjacent vehicles only if they are within lateral cells of each other. C-7 Indirect or relayed communication may occur between two nonadjacent vehicles if, and only if, a third vehicle can communicate directly with each of them. C-8 Communicating vehicles may provide each other with: a) their positions, b) their current and anticipated action commands, and c) negotiation proposals with respect to current and anticipated commands. Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. 540 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 35, NO. 4, NOVEMBER 2005 C-9 If vehicles cannot communicate either directly or indirectly, they cannot know the positions or the actions (either instantiated or anticipated) of each other. C-10 Vehicles may not collide, that is, they may not approach closer to each other than adjacent cells. In evaluating our proposed methodology, we adopt, as a baseline, the jointly optimal solution. We then consider two satisficing system designs. The first provides a globally satisficing solution somewhat analogous to a globally optimal solution. The second approach illustrates how the satisficing methodology can be adapted to situations involving heterogeneous agents with partial knowledge or hierarchical structures. Together, the two satisficing solutions illustrate the flexibility of the satisficing design space and provide some idea of the tradeoffs between structure, complexity, and performance. Note that all solutions assume a limited horizon, consistent with our earlier assumption of a limited forward field of view. Let us designate and order the positions of the vehicles as for the left-most vehicle, for the center vehicle, and for the right-most vehicle. B. Optimal Solution To obtain the optimal solution to the limited-horizon decision problem, we must compute the total target-minus-hazard count for each of the possible paths available to each agent, subject to the constraints. For a horizon of depth , there is a total of available for each agent. This defines the reachable cone. Thus, if each of three agents were to work strictly in isolation, we would require a total of calculations. However, to satisfy the constraints of avoiding collisions and loss of communication, we must calculate the costs for all joint paths between agents and and between and . Note that since and cannot violate constraints without consideration of , it is not necessary to compute the costs of the joint paths between , , and simultaneously. The total number of arithmetic operations that must be performed to compute the jointly optimal solution is on the order of where is the cardinality of . For and , the number of operations is 1458. Notice that this count is exponential in the horizon depth. C. Satisficing Solutions The interdependence function for our hypothetical UAV system is a function of six variables, which we denote by , where denotes an option available to (that is, the option that might instantiate in the interest of its selectability) and denotes an option available to (that is, the option that might avoid instantiating in the interest of its rejectability). Our satisficing approach requires each agent to derive from a set of jointly satisficing decisions, that is, decisions that, from its perspective, would be satisficing for all agents. From this set of multipartite decision vectors, each player must determine which individual decision it should make. We assume that all agents are operating with the same model. The interdependence function characterizes all of the relationships that exist between the six elements of the praxeic system. Once it is specified, it is straightforward to obtain the satisficing joint and individual solutions. Its specification, however, may be extremely complex. At each moment of (discrete) time, each vehicle has three possible options at its disposal, and since each option must be considered in terms of both its selectability and its rejectability, the interdependence function has six arguments, each of which may assume any of three values, yielding a total of specifications. For many applications, however, we can reduce the number of independent parameters by exploiting influence linkages between agents; we may compose the interdependence function by identifying the appropriate influence linkages. 1) Global Satisficing Structure: For this case, we adopt the following operational definitions for the preferences for the options available to each vehicle. The criterion for rejectability is to avoid hazards. The criterion for selectability is to seek targets while avoiding collisions or communication loss. For this problem, we assume that the joint rejectability criterion is independent of the joint selectability criterion. Thus, we may factor the interdependence function into the product of the joint selectability and joint rejectability functions as (11) If constraint violations are not imminent, then each agent operates with its own marginal selectability and rejectability functions, which are determined on the basis of the number of hazards and targets that can be encountered, respectively, for each option. If a constraint between two agents is imminent, then a joint selectability function is computed such that the constraints are honored between the two affected agents, while the third agent is free to operate without constraint. If a constraint violation involving all three agents is imminent, then a joint selectability function is computed for all three. The computational burden for computing the marginal selectability and rejectability functions depends on the way the interdependence function is factored. If no factorization is employed, the total number of operations would be on the order of ; however, by factoring the interdependence function as indicated in (11), only operations are required per agent. Computing the jointly satisficing set requires an additional operations per agent, with operations per agent required to compute the individually satisficing set. Thus, the total number of operations per agent to compute the satisficing sets for , , and is , which is much less than that for the optimal solution. Notice that this complexity is quadratic, rather than exponential, in the horizon depth. 2) Markov Satisficing Structure: For this case, the definitions of the utility functions are modified as follows: The rejectability of a move is determined by whether it leads to: a) a hazard, b) a loss of communication, or c) a collision. Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. STIRLING AND FROST: SOCIAL UTILITY FUNCTIONS PART II: APPLICATIONS 541 The selectability of a move is determined solely by whether it leads to a target. More important, we wish to consider the effects of a Markovian constraint on agent interaction. In particular, we make the selectability and rejectability of a move contemplated by to be independent of the deliberations of , conditioned on knowledge of , and vice versa. For vehicle , let us view the variable as viewed in terms of selectability, and as in terms of rejectability. There are many ways to approach the design of this system. As a general rule of thumb, it is reasonable to consider first those attributes of the problem that are the most critical.We adopt here a conservative stance and assume that the rejectability attributes are more critical to the overall success of the mission than selectability. This is appropriate since we assume that the number of hazards is greater than the number of targets. First consider rejectability. Clearly, each agent must place high rejectability on moves that would take it into cells occupied by hazards and it must also avoid loss of communication and collisions. Because of the relative positioning of the vehicles, it is clear that cannot simultaneously avoid losing communication with both ; similarly, it cannot simultaneously avoid collisions with these two agents. Thus, primary responsibility for these two rejectability criteria must fall to the wing agents. In other words, in addition to avoiding hazards, they must adjust their rejectability values to accommodate the actions of . Thus, there is a natural flow of influence from to both and Let us now consider selectability. This criterion is concerned only with attacking targets. Typically, the opportunities to hit targets, especially with a sparse target scenario, will not be the same for all vehicles. As each agent looks ahead in its reachable cone over the next moves, it can calculate the number of hits available to it. Let us designate the agent with the greatest number of possible hits as (for the current time only) the primary vehicle. The vehicle with the next greatest number of possible hits, given that the primary vehicle pursues its targets in an unconstrained manner, is the secondary vehicle. For the sake of illustration, let us assume that is primary and is secondary, and is tertiary. Accordingly, we will define a (temporally local) hierarchy which gives first priority in selecting its targets, and gives second priority. Such a hierarchy is somewhat heuristic since if the sum of the hits possible for the other two vehicles, acting together unconstrained, may result in a total number of hits that exceeds what would otherwise be possible. Nevertheless, in the interest of simplicity, we adopt this as our design policy. Our next step is to account for the interdependencies between rejectability and selectability. We will assume that rejectability and selectability for a single agent are independent; hence, there are no direct links between one agent s selectability persona and its rejectability persona. There may, however, be influence flows between selectability and rejectability between different agent personas. Let us consider how may be influenced by and . If, for example, were to reject a move to the right, then a move to the right by could be detrimental to if it could result in a loss of communication. Consequently, it would be prudent for to defer conditionally to by being willing to Fig. 5. Praxeic network for the hierarchy UAV game. reduce its selectability for moving to the right, even if hits are sacrificed. This is a manifestation of situational altruism (a willingness to defer if so doing would actually benefit the other), as opposed to categorical altruism (unconditional abdication of one s individual interest in order to benefit another, regardless of whether the other would actually benefit). Thus, there is a natural flow of influence from to . By a similar argument, there is also a flow of influence from to . The overall flow of influence is thus given by Fig. 5. The flows of influence for other hierarchies are defined similarly. A total of six different flow configurations would be possible. An important feature of this structure is that the system may be reconfigured dynamically as the relative positions of the agents in their environment evolves. Fig. 5 is a praxeic network corresponding to the factorization of the interdependence function as The structure of these conditional interdependencies is a function of the relative positions of the agents. If neither communication loss nor collisions is imminent, then the conditional rejectabilities degenerate to the marginal ones, which are determined as a function of the minimum number of hazards that an agent could encounter for each option. If a particular option were to violate a constraint, then the conditional rejectability would be adjusted to reject that option under the appropriate circumstances. Concerning selectability, if the reachable cones for each of the agents do not intersect, then the joint selectability is the product of the individual selectability marginals. If the cones do intersect, then priority is given to the primary agent and then to the secondary agent, with selectability being proportional to the maximum number of targets that are reachable with each option. Once the joint and marginal selectability and rejectability function values are computed, the joint and marginal satisficing sets may be constructed and a compromise solution identified. If, for the initial value of the negotiation index the compromise set is empty, then may be iteratively lowered until the compromise set is not empty. In this way, the individual decisions are compatible with group interests, although at the possible expense of some individual satisfaction as the players compromise in order to reach a decision that satisfies both group and individual interests. Notice that this negotiation process does Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. 542 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 35, NO. 4, NOVEMBER 2005 Fig. 6. Simulated UAV trajectories: (a) Satisficing design. (b) Optimal design. not require the recalculation of the selectability and rejectability functions; hence, it is computationally very inexpensive. D. Simulation Results Fig. 6 presents typical simulation results for a three-agent UAVsystem operating over ten time intervals and with a horizon depth of . The symbols and denote hazards and targets, respectively. Hazards and targets are generated randomly, with the probability of a hazard 0.7 and the probability of target being 0.1. The three line segments represent the trajectories of the three agents. Time flows from the bottom to the top of the figure along the ordinate, and the abscissa represent the lateral positions of the vehicles. Fig. 6(a) illustrates the satisficing trajectory, and Fig. 6(b) illustrates the optimal trajectory. Note that the hazard/target environment is the same. Table IV displays the results from 100 Monte Carlo simulations for the TABLE IV MONTE CARLO RESULTS: GLOBAL STRUCTURE, MARKOV STRUCTURE, AND OPTIMAL SOLUTIONS FOR HORIZON DEPTH TABLE V MONTE CARLO RESULTS: SATISFICING VERSUS OPTIMAL, Horizon Depth 2, 3, 4, AND 5 optimal solution and both satisficing solutions. The initial conditions as well as hazard and target locations are generated randomly for each trial. The score for each trial is computed as the number of targets encountered minus the number of hazards encountered for the three agents. The sample means and standard deviations for the scores of both the satisficing and optimal solutions are indicated in the table. Table V presents results for 100 Monte Carlo simulations for global structure and optimal solutions for horizon depths of 2, 3, 4, and 5. The difference between the satisficing and optimal averages is not statistically significant. These results indicate: a) performance of the global structure satisficing solution is essentially the same as the performance of the optimal solution; b) the performance of the Markov satisficing solution is slightly worse than the performance of the global structure solution. This result is due to the tighter mode of coordination between the agents. The computational burden for both satisficing concepts is significantly less (approximately an order of magnitude for a depth of ) than the computational burden for the optimal solution. VII. CONCLUSION Coordinated behavior is a difficult attribute to incorporate into a multiagent system. It is important to appreciate that coordination usually cannot be done without conflict, but conflict need not degenerate to competition, which can be destructive. Competition, however, is often a byproduct of optimization, whereby each participant in a multiagent endeavor seeks to achieve the best outcome for itself, regardless of the consequences to other participants or to the group. Satisficing game theory is not presented as a replacement for conventional decision-making methodologies in all situations. When individual rationality is an appropriate model, standard techniques will be adequate. The potential benefit of this alternative approach is that it provides a means to account for sophisticated social behavior. This theory is a significant departure from conventional methods of multiple-agent decision making. The major differences include: Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply. STIRLING AND FROST: SOCIAL UTILITY FUNCTIONS PART II: APPLICATIONS 543 replacement of asocial utility functions that express preferences of decision makers as functions of the actions of other decision makers with social utility functions that express preferences of decision makers as functions of the preferences of other decision makers; replacement of individual rationality (which does not accommodate compatible notions of group and individual interests) with satisficing rationality (which does accommodate notions of both group and individual interests); replacement of single vN-M utility functions with dual utility functions that separate the desirable and undesirable attributes of the options; replacement of unconditional utilities with conditional utilities which propagate through the system via the chain rule to create a joint interdependence function, thus allowing individual and group preferences to emerge as natural consequences of social interaction. The notable features of this approach are that it permits explicit modeling of situationally altruistic behavior; it provides a natural framework for negotiation; it may be solved using standard Bayesian network techniques such as Pearl s Belief Propagation Algorithm or the Sum-Product Algorithm for factor graphs; it serves as a systematic system design methodology that is consistent, exhaustive, and parsimonious. REFERENCES [1] W. C. Stirling, Social utility functions Part 1: Theory, IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 35, no. 4, pp. 522 532, Nov. 2005. [2] , Satisficing Games and Decision Making:With Applications to Engineering and Computer Science. Cambridge, U.K.: Cambridge Univ. Press, 2003. [3] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann, 1988. [4] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems. New York: Springer Verlag, 1999. [5] F. V. Jensen, Bayesian Networks and Decision Graphs. New York: Springer Verlag, 2001. [6] Y. Xiang, Probabilistic Reasoning in Multiagent Systems: A Graphical Models Approach. Cambridge, U.K.: Cambridge Univ. Press, 2002. [7] Y. Xiang and V. Lesser, On the role of multiply sectioned Bayesian networks for cooperative multiagent systems, IEEE Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 4, pp. 489 501, Jul. 2003. [8] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, Factor graphs and the sum-product algorithm, IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498 519, Feb. 2001. [9] M. Taylor, The Possibility of Cooperation. Cambridge, U.K.: Cambridge Univ. Press, 1987. [10] R. Axelrod, The Evolution of Cooperation. New York: Basic Books, 1984. [11] W. Poundstone, Prisoner s Dilemma. New York: Doubleday, 1992. [12] M. Bacharach, Economics and the Theory of Games. London, U.K.: Macmillan, 1976. [13] E. Rasmusen, Games and Information. Oxford, U.K.: Blackwell, 1989. [14] D. K. Lewis, Convention. Cambridge, MA: Harvard Univ. Press, 1969. [15] M. J. Osborne and A. Rubinstein, A Course in Game Theory. Cambridge, MA: MIT Press, 1994. Wynn C. Stirling received the B.A. degree (Hons.) in mathematics and the M.S. degree in electrical engineering from the University of Utah, Salt Lake City, in 1969 and 1971, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1983. From 1972 to 1975, he was with Rockwell International Corporation, Anaheim, CA, and from 1975 to 1984, he was with ESL, Inc., Sunnyvale, CA, where hewas responsible for the development of multivehicle trajectory reconstruction capabilities. He joined the faculty of the Department of Electrical and Computer Engineering at Brigham Young University, Provo, UT, in 1984, where he is a Professor. He is the author or coauthor of more than 70 publications. He is coauthor of a graduate-level text titled Mathematical Methods and Algorithms for Signal Processing (Englewood Cliffs, NJ: Prentice-Hall, 2000) and is the author of a monograph titled Satisficing Games and Decision Making: with Applications to Engineering and Computer Science (Cambridge, U.K.: Cambridge Univ. Press, 2003). His current research interests include multiagent decision theory, estimation theory, information theory, and stochastic processes. Dr. Stirling is a Member of Phi Beta Kappa and Tau Beta Pi. He has served on the program committees for conferences on imprecise probability theory and multiagent decision theory. Richard L. Frost (M 77) received the B.S. degree (Hons.) in physics in 1975, and the M.S.E.E. and Ph.D. degrees from the University of Utah, Salt Lake City, in 1977 and 1979, respectively. He joined the Massachusetts Institute of Technology s Lincoln Laboratory, Lexington, in 1979, then returned to the faculty of the University of Utah from 1981 to 1984. After three years with Sperry Corporation, he joined the Department of Electrical and Computer Engineering, Brigham Young University, Provo, UT, in 1987, where he is currently an Associate Professor. His research interests include information theory, especially source coding and quantization, signal processing, and multiagent decision theory. Authorized licensed use limited to: Brigham Young University. Downloaded on February 5, 2009 at 15:33 from IEEE Xplore. Restrictions apply.