A Framework for the Comparison of Agent-based Models
We develop a methodology for comparing agent-based models that are developed for the same domain, but may differ in the data sets (e.g., geographical regions) to which they are applied, and in the structure of the model. Our approach is to learn a response surface in the common parameter space of the models and compare the regions corresponding to qualitatively different behaviors in the models. As an example, we develop an active learning algorithm to learn phase shift boundaries in contagion processes in order to compare two agent-based models of rooftop solar panel adoption developed for different regions. We present results for 2D and 3D subspaces of the parameter space, though the approach scales to higher dimensions as well.
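To make the boundary-learning idea concrete, here is a minimal sketch (assumptions ours: a toy one-parameter threshold stands in for the full agent-based simulation) of adaptive sampling that localizes a phase-shift boundary in a 2D slice of the parameter space:

```python
import numpy as np

# Hypothetical stand-in for an expensive ABM run: returns True if the
# contagion takes off for parameters (transmissibility p, recovery rate q).
# In the paper's setting this would be a full agent-based simulation.
def contagion_takes_off(p, q):
    return p / q > 1.0  # toy phase boundary at p = q

# Adaptive sampling along one axis: each query is placed where it is most
# informative, halving the uncertainty about the boundary location.
def locate_boundary(q, lo=0.0, hi=1.0, tol=1e-3):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if contagion_takes_off(mid, q):
            hi = mid  # boundary lies at or below mid
        else:
            lo = mid  # boundary lies above mid
    return 0.5 * (lo + hi)

# Trace the phase-shift curve over a 2D slice of the parameter space.
for q in np.linspace(0.1, 0.5, 5):
    print(f"q = {q:.2f}: boundary at p = {locate_boundary(q):.3f}")
```

Each boundary point costs only O(log(1/tol)) simulation runs, which is the kind of saving an active learner provides over a dense grid sweep.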
Designing Empathic Virtual Agents: Manipulating Animation, Voice, Rendering, and Empathy to Create Persuasive Agents
Designers of virtual agents have a combinatorially large space of choices for the look and behavior of their characters. We conducted two between-subjects studies to explore the systematic manipulation of animation quality, speech quality, rendering style, and simulated empathy, and its impact on perceptions of virtual agents in terms of naturalness, engagement, trust, credibility, and persuasion within a health counseling domain. In the first study, animation was varied between manually created, procedural, and no animations; voice quality was varied between recorded audio and synthetic speech; and rendering style was varied between realistic and toon-shaded. In the second study, simulated empathy of the agent was varied between no empathy, verbal-only empathic responses, and full empathy involving verbal, facial, and immediacy feedback. Results show that natural animations and a recorded voice improve the agent's general acceptance, trust, credibility, and perceived appropriateness for the task. However, for a brief health counseling task, animation might actually distract from the persuasive message: the highest levels of persuasion were found when the amount of agent animation was minimized. Further, consistent and high levels of empathy improve agent perception but may interfere with forming a trusting bond with the agent.
Toward Robust Policy Summarization: Extended Abstract
AI agents are being developed to help people with high-stakes decision-making processes, from driving cars to prescribing drugs. It is therefore becoming increasingly important to develop "explainable AI" methods that help people understand the behavior of such agents. Summaries of agent policies can help human users anticipate agent behavior and facilitate more effective collaboration. Prior work has framed agent summarization as a machine teaching problem in which examples of agent behavior are chosen to maximize reconstruction quality under the assumption that people perform inverse reinforcement learning to infer an agent's policy from demonstrations. We compare summaries generated under this assumption to summaries generated under the assumption that people use imitation learning. We show through simulations that in some domains there exist summaries that produce high-quality reconstructions under both models, but in other domains only matching the summary extraction model to the reconstruction model produces high-quality reconstructions. These results highlight the importance of assuming correct computational models of how humans extrapolate from a summary, suggesting human-in-the-loop approaches to summary extraction.
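As an illustration of the extraction-model idea, the following sketch (entirely ours, on a toy 1D state space) greedily selects a summary assuming the human reconstructs the policy with a nearest-neighbour imitation learner:

```python
def reconstruct(summary, states, policy):
    """Imitation-learning model of the human: copy the action taken in
    the nearest summarised state."""
    preds = {}
    for s in states:
        nearest = min(summary, key=lambda t: abs(s - t))
        preds[s] = policy[nearest]
    return preds

def greedy_summary(states, policy, budget):
    """Forward selection: add the example that most improves how well the
    assumed reconstruction model recovers the true policy."""
    summary = []
    for _ in range(budget):
        def score(c):
            preds = reconstruct(summary + [c], states, policy)
            return sum(preds[s] == policy[s] for s in states)
        summary.append(max((s for s in states if s not in summary), key=score))
    return summary

states = list(range(10))
policy = {s: int(s >= 5) for s in states}        # toy policy: threshold at 5
print(greedy_summary(states, policy, budget=2))  # [0, 8]: one exemplar per action region
```

A summary optimized for this learner can differ from one optimized for an inverse-reinforcement-learning model of the human, which is the mismatch the paper studies.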
A comparison of multiple behavior models in a simulation of the aftermath of an improvised nuclear detonation
We describe a large-scale simulation of the aftermath of a hypothetical 10 kT improvised nuclear detonation at ground level, near the White House in Washington, DC. We take a synthetic information approach, in which multiple data sets are combined to construct a synthesized representation of the population of the region with accurate demographics, as well as four infrastructures: transportation, healthcare, communication, and power. In this article, we focus on the model of agents and their behavior, which is represented using the options framework. Six different behavioral options are modeled: household reconstitution, evacuation, healthcare-seeking, worry, shelter-seeking, and aiding and assisting others. Agents' decision-making takes into account their health status, information about family members, information about the event, and their local environment. We combine these behavioral options into five behavior models of increasing complexity and run a number of simulations to compare the models.
Stable roommates with narcissistic, single-peaked, and single-crossing preferences
The classical Stable Roommates problem is to decide whether there exists a matching of an even number of agents such that no two agents who are not matched to each other would prefer to be with each other rather than with their respectively assigned partners. We investigate Stable Roommates with complete (i.e., every agent can be matched with any other agent) or incomplete preferences, and with ties (i.e., two agents are considered of equal value by some agent) or without ties. It is known that in general, allowing ties makes the problem NP-complete. We provide algorithms for Stable Roommates that are, compared to those in the literature, more efficient when the input preferences are complete and have some structural property, such as being narcissistic, single-peaked, or single-crossing. However, when the preferences are incomplete and have ties, we show that being single-peaked and single-crossing does not reduce the computational complexity: Stable Roommates remains NP-complete.
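The stability condition is easy to state in code. A minimal checker (our illustration), run here on the classic four-agent instance that admits no stable matching:

```python
from itertools import combinations

# Classic four-agent instance with no stable matching; prefs[a] ranks the
# other agents from most to least preferred.
prefs = {
    0: [1, 2, 3],
    1: [2, 0, 3],
    2: [0, 1, 3],
    3: [0, 1, 2],
}

def is_stable(matching, prefs):
    """Return False iff two agents not matched to each other both prefer
    each other to their assigned partners (a blocking pair)."""
    rank = {a: {b: i for i, b in enumerate(p)} for a, p in prefs.items()}
    for a, b in combinations(prefs, 2):
        if matching[a] == b:
            continue
        if rank[a][b] < rank[a][matching[a]] and rank[b][a] < rank[b][matching[b]]:
            return False  # (a, b) is a blocking pair
    return True

# Every perfect matching of this instance is blocked; for example:
print(is_stable({0: 1, 1: 0, 2: 3, 3: 2}, prefs))  # False
```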
Epistemic selection of costly alternatives: the case of participatory budgeting
We initiate the study of voting rules for participatory budgeting using the so-called epistemic approach, where one interprets votes as noisy reflections of some ground truth regarding the objectively best set of projects to fund. Using this approach, we first show that both the most studied rules in the literature and the most widely used rule in practice cannot be justified on epistemic grounds: they cannot be interpreted as maximum likelihood estimators, whatever assumptions we make about the accuracy of voters. Focusing then on welfare-maximising rules, we obtain both positive and negative results regarding epistemic guarantees.
Analysing factorizations of action-value networks for cooperative multi-agent reinforcement learning
Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how we should enhance their learning power to address the problems on which they fail. In this work, we empirically investigate the learning power of various network architectures on a series of one-shot games. Despite their simplicity, these games capture many of the crucial problems that arise in the multi-agent setting, such as an exponential number of joint actions or the lack of an explicit coordination mechanism. Our results extend those of Castellini et al. (in: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS '19, International Foundation for Autonomous Agents and Multiagent Systems, pp 1862-1864, 2019) and quantify how well various approaches can represent the requisite value functions, helping us identify the reasons that can impede good performance, such as sparsity of the values or overly tight coordination requirements.
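A small numerical illustration (ours, not the paper's code) of why tight coordination defeats purely additive factorizations: the best least-squares fit of the form Q(a1, a2) = f1(a1) + f2(a2) to the climbing game recommends the wrong joint action.

```python
import numpy as np

# Climbing game payoffs: the best joint action is surrounded by heavy
# penalties, the kind of tight coordination additive models struggle with.
Q = np.array([[ 11, -30,   0],
              [-30,   7,   6],
              [  0,   0,   5]], dtype=float)

# Best additive (VDN-style) fit Q(a1, a2) = f1(a1) + f2(a2), by least squares.
n = Q.shape[0]
A = np.zeros((n * n, 2 * n))
for i in range(n):
    for j in range(n):
        A[i * n + j, i] = 1.0      # indicator for f1(a1 = i)
        A[i * n + j, n + j] = 1.0  # indicator for f2(a2 = j)
theta, *_ = np.linalg.lstsq(A, Q.ravel(), rcond=None)
Q_hat = (A @ theta).reshape(n, n)

print("max reconstruction error:", np.abs(Q - Q_hat).max())
print("argmax of true Q:    ", np.unravel_index(Q.argmax(), Q.shape))     # (0, 0)
print("argmax of factored Q:", np.unravel_index(Q_hat.argmax(), Q_hat.shape))  # (2, 2)
```

The additive fit picks the safe corner with payoff 5 instead of the coordinated optimum with payoff 11, making the representational gap concrete.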
Playing Atari with few neurons: Improving the efficacy of reinforcement learning by decoupling feature extraction and decision making
We propose a new method for learning compact state representations and policies separately but simultaneously for policy approximation in vision-based applications such as Atari games. Approaches based on deep reinforcement learning typically map pixels directly to actions to enable end-to-end training. Internally, however, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it, two objectives that can be addressed independently. Separating the image processing from the action selection allows for a better understanding of either task individually, as well as potentially finding smaller policy representations, which is interesting in its own right. Our approach learns state representations using a compact encoder based on two novel algorithms: (i) Increasing Dictionary Vector Quantization builds a dictionary of state representations that grows in size over time, allowing our method to address new observations as they appear in an open-ended online-learning context; and (ii) Direct Residuals Sparse Coding encodes observations as a function of the dictionary, aiming for the highest information inclusion by disregarding reconstruction error and maximizing code sparsity. As the dictionary size increases, however, the encoder produces increasingly larger inputs for the neural network; this issue is addressed with a new variant of the Exponential Natural Evolution Strategies algorithm that adapts the dimensionality of its probability distribution during the run. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on each game's controls). These networks are still capable of achieving results that are not much worse than, and occasionally superior to, the state of the art in direct policy search, which uses two orders of magnitude more neurons.
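The following is a heavily simplified sketch of the two ideas as we read them from the description above (names, thresholds, and details are our assumptions, not the authors' implementation): grow the dictionary from unexplained residuals, and encode greedily for sparsity rather than reconstruction fidelity.

```python
import numpy as np

def encode(obs, dictionary, max_atoms=3):
    """Greedy binary sparse code: repeatedly pick the unused atom that best
    overlaps what is left of the observation (sparsity over fidelity)."""
    residual, code = obs.copy(), np.zeros(len(dictionary))
    for _ in range(max_atoms):
        overlaps = [-np.inf if code[k] else np.dot(np.clip(residual, 0, None), atom)
                    for k, atom in enumerate(dictionary)]
        best = int(np.argmax(overlaps))
        if overlaps[best] <= 0:
            break
        code[best] = 1.0
        residual -= dictionary[best]  # direct residual: subtract the atom outright
    return code, residual

def maybe_grow(dictionary, residual, threshold=1.0):
    """Dictionary growth: if enough of the observation is left unexplained,
    the clipped residual itself becomes a new atom."""
    positive = np.clip(residual, 0, None)
    if positive.sum() > threshold:
        dictionary.append(positive)
    return dictionary

rng = np.random.default_rng(0)
dictionary = []
for _ in range(5):
    obs = rng.random(8)  # stand-in for a preprocessed game frame
    code, residual = encode(obs, dictionary) if dictionary else (np.zeros(0), obs)
    dictionary = maybe_grow(dictionary, residual)
print("dictionary size after 5 observations:", len(dictionary))
```

The binary sparse code, whose length tracks the dictionary size, is what the tiny policy network consumes, which is why the evolution strategy must adapt its dimensionality online.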
Online revenue maximization for server pricing
Efficient and truthful mechanisms to price resources on servers/machines have been the subject of much work in recent years owing to the importance of the cloud market. This paper considers revenue maximization in the online stochastic setting with non-preemptive jobs and a unit-capacity server. One agent/job arrives at every time step, with parameters drawn from an underlying distribution. We design a posted-price mechanism which can be efficiently computed and is revenue-optimal in expectation and in retrospect, up to an additive error. The prices are posted prior to learning the agent's type, and the computed pricing scheme is deterministic, depending only on the length of the allotted time interval and on the earliest time the server is available. We also prove that the proposed pricing strategy is robust to imprecise knowledge of the job distribution and that a distribution learned from polynomially many samples is sufficient to obtain a near-optimal truthful pricing strategy.
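Schematically, such a mechanism can be simulated as below. The pricing rule itself is a toy assumption of ours; only its arguments, the job length and the server's earliest free time, follow the description above. Truthfulness comes from the price being fixed before the agent reveals anything.

```python
# Toy posted-price server simulation (illustrative, not the paper's mechanism).
def posted_price(job_length, wait, base_rate=1.0, wait_discount=0.2):
    # Price depends only on job length and how long the job must wait
    # for the unit-capacity server to free up.
    return base_rate * job_length - wait_discount * wait

def step(state, job_length, agent_value):
    """Price is posted before the agent reveals its value, so accepting
    iff value >= price is the agent's (truthful) best response."""
    wait = max(0, state["earliest_free"] - state["t"])
    price = posted_price(job_length, wait)
    if agent_value >= price:
        state["earliest_free"] = state["t"] + wait + job_length  # non-preemptive
        state["revenue"] += price
    state["t"] += 1
    return state

state = {"t": 0, "earliest_free": 0, "revenue": 0.0}
for job_length, value in [(3, 4.0), (2, 1.5), (1, 1.2)]:
    state = step(state, job_length, value)
print(state)  # accepted jobs 1 and 3; total revenue 3.8
```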
Modelling and verification of reconfigurable multi-agent systems
We propose a formalism to model and reason about reconfigurable multi-agent systems. In our formalism, agents interact and communicate in different modes so that they can pursue joint tasks; agents may dynamically synchronize, exchange data, adapt their behaviour, and reconfigure their communication interfaces. Inspired by existing multi-robot systems, we represent a system as a set of agents (each with a local state) that execute independently and influence each other only by means of message exchange. Agents are able to sense their own local states and, partially, their surroundings. We extend LTL to reason explicitly about the intentions of agents in the interaction and about their communication protocols. We also study the complexity of satisfiability and model checking for this extension.
What values should an agent align with?: An empirical comparison of general and context-specific values
The pursuit of values drives human behavior and promotes cooperation. Existing research focuses on general values (e.g., Schwartz values) that transcend contexts. However, context-specific values are necessary to (1) understand human decisions and (2) engineer intelligent agents that can elicit and align with human values. We propose Axies, a hybrid (human and AI) methodology to identify context-specific values. Axies simplifies the abstract task of value identification as a guided value-annotation process involving human annotators. Axies exploits the growing availability of value-laden text corpora and Natural Language Processing to assist the annotators in systematically identifying context-specific values. We evaluate Axies in a user study involving 80 human subjects. In our study, six annotators generate value lists for two timely and important contexts: Covid-19 measures and sustainable energy. We employ two policy experts and 72 crowd workers to evaluate Axies value lists and compare them to a list of general (Schwartz) values. We find that Axies yields values that are (1) more context-specific than general values, (2) more suitable for value annotation than general values, and (3) independent of the people applying the methodology.
Resolving social dilemmas with minimal reward transfer
Social dilemmas present a significant challenge in multi-agent cooperation because individuals are incentivised to behave in ways that undermine socially optimal outcomes. Consequently, self-interested agents often avoid collective behaviour. In response, we formalise social dilemmas and introduce a novel metric to quantify the disparity between individual and group rationality in such scenarios. This metric represents the maximum proportion of their individual rewards that agents can retain while ensuring that a social welfare optimum becomes a dominant strategy. Our approach diverges from traditional concepts of altruism, instead focusing on strategic reward redistribution. By transferring rewards among agents in a manner that aligns individual and group incentives, rational agents will maximise collective welfare while pursuing their own interests. We provide an algorithm to compute efficient transfer structures for an arbitrary number of agents, and introduce novel multi-player social dilemma games to illustrate the effectiveness of our method. This work provides both a descriptive tool for analysing social dilemmas and a prescriptive solution for resolving them via efficient reward transfer contracts. Applications include mechanism design, where we can assess how modifications to models of environments affect collaborative behaviour.
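A worked example (ours, not from the paper) of the underlying idea: in a prisoner's dilemma with payoffs T > R > P > S, if each agent keeps a share s of its own reward and transfers 1 - s to the other, a short calculation gives the largest s for which cooperation is a dominant strategy.

```python
# Effective payoff for the row player when each agent keeps a share s of
# its own reward and transfers 1 - s to the other:
#   play C vs C: R                    play D vs C: s*T + (1 - s)*S
#   play C vs D: s*S + (1 - s)*T      play D vs D: P
# Cooperation is dominant iff both bounds below hold, so the largest
# retainable share is their minimum.
def max_retained_share(T, R, P, S):
    assert T > R > P > S, "prisoner's dilemma ordering"
    return min((R - S) / (T - S),   # beat defecting against a cooperator
               (T - P) / (T - S))   # beat defecting against a defector

# Classic payoffs (T, R, P, S) = (5, 3, 1, 0): agents may keep up to 60%
# of their own reward and cooperation is still (weakly) dominant.
print(max_retained_share(5, 3, 1, 0))  # 0.6
```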
Exploring the influence of a user-specific explainable virtual advisor on health behaviour change intentions
Virtual advisors (VAs) are being utilised in almost every service nowadays, from entertainment to healthcare. To increase users' trust in these VAs and encourage users to follow their advice, VAs should have the capability of explaining their decisions, particularly when the decision is vital, as with health advice. However, the role of an explainable VA in health behaviour change is understudied. There is evidence that people tend to change their intentions towards a health behaviour when the persuasion message is linked to their mental state. Thus, this study explores this link by introducing an explainable VA that provides explanations according to the user's mental state (beliefs and goals) rather than the agent's mental state, as commonly utilised in explainable agents. It further explores the influence on the user's behaviour change of different explanation patterns that refer to beliefs, goals, or beliefs and goals. An explainable VA was designed to advise undergraduate students on how to manage their study-related stress by motivating them to change certain behaviours. The VA was evaluated with 91 participants, and the results revealed that user-specific explanations could significantly encourage behaviour change intentions and build a good user-agent relationship. Small differences were found between the three types of explanation patterns.
On the graph theory of majority illusions: theoretical results and computational experiments
The popularity of an opinion in one's direct circles is not necessarily a good indicator of its popularity in one's entire community. Network structures make local information about global properties of the group potentially inaccurate, and the way a social network is wired constrains what kind of information distortion can actually occur. In this paper, we discuss which classes of networks allow for a large enough proportion of the population to get a wrong enough impression about the overall distribution of opinions. We start by focusing on the 'majority illusion', the case where one sees a majority opinion in one's direct circles that differs from the global majority. We show that no network structure can guarantee that most agents see the correct majority. We then perform computational experiments to study the likelihood of majority illusions in different classes of networks. Finally, we generalize to other types of illusions.
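The illusion is straightforward to check computationally. A minimal illustration (ours) using networkx, with a star graph in which every leaf sees a local "majority" that is globally a small minority:

```python
import networkx as nx

# Agent i is under majority illusion if the majority opinion among its
# neighbours differs from the global majority opinion.
def illusioned_agents(G, opinion):
    global_majority = sum(opinion.values()) > len(opinion) / 2
    out = []
    for i in G.nodes:
        nbrs = list(G.neighbors(i))
        if not nbrs:
            continue
        local_majority = sum(opinion[j] for j in nbrs) > len(nbrs) / 2
        if local_majority != global_majority:
            out.append(i)
    return out

# Star graph: the hub's single opinion is every leaf's whole neighbourhood.
G = nx.star_graph(6)                       # hub 0 plus leaves 1..6
opinion = {i: (i == 0) for i in G.nodes}   # only the hub holds opinion True
print(illusioned_agents(G, opinion))       # all six leaves are illusioned
```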
Minimality and comparison of sets of multi-attribute vectors
In a decision-making problem, there is often some uncertainty regarding the user preferences. We assume a parameterised utility model, where in each scenario we have a utility function over alternatives, and where each scenario represents a possible user preference model consistent with the input preference information. With a set of alternatives available to the decision-maker, we can consider the associated utility function, expressing, for each scenario, the maximum utility among the alternatives. We consider two main problems: firstly, finding a minimal subset of the available set that is equivalent to it, i.e., that has the same utility function. We show that for important classes of preference models, the set of possibly strictly optimal alternatives is the unique minimal equivalent subset. Secondly, we consider how to compare one set of alternatives to another, where the two sets correspond to different initial decision choices. This is closely related to the problem of computing setwise max regret. We derive mathematical results that enable different computational techniques for these problems, using linear programming and, especially, a novel approach using the extreme points of the epigraph of the utility function.
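On a finite scenario set, the first problem has a direct brute-force reading, which the following toy sketch (instance and code ours) illustrates: keeping only the possibly strictly optimal alternatives leaves the scenario-wise maximum utility unchanged.

```python
import numpy as np

# Toy instance: U[s, a] is the utility of alternative a in scenario s.
U = np.array([[5, 3, 4, 4],
              [1, 4, 2, 3],
              [2, 2, 6, 1]])

# An alternative is possibly strictly optimal if, in some scenario, it is
# strictly better than every other available alternative.
def possibly_strictly_optimal(U):
    keep = []
    for a in range(U.shape[1]):
        others = np.delete(U, a, axis=1)
        if np.any(U[:, a] > others.max(axis=1)):
            keep.append(a)
    return keep

keep = possibly_strictly_optimal(U)
print("minimal equivalent subset:", keep)  # [0, 1, 2]; alternative 3 is dropped
# Dropping the rest leaves the scenario-wise maximum utility unchanged:
print(np.array_equal(U.max(axis=1), U[:, keep].max(axis=1)))  # True
```

The paper's contribution is to make this tractable for structured, infinite scenario sets, via linear programming and the extreme points of the utility function's epigraph.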
Assimilating human feedback from autonomous vehicle interaction in reinforcement learning models
A significant challenge for real-world automated vehicles (AVs) is their interaction with human pedestrians. This paper develops a methodology to directly elicit the AV behaviour pedestrians find suitable by collecting quantitative data that can be used to measure and improve an algorithm's performance. Starting with a Deep Q Network (DQN) trained on a simple Pygame/Python-based pedestrian crossing environment, we adapted the reward structure to allow adjustment by human feedback. Feedback was gathered by eliciting behavioural judgements from people in a controlled environment. The reward was shaped by the interaction vector, decomposed into feature aspects for relevant behaviours, thereby facilitating both implicit preference selection and explicit task discovery in tandem. Using computational RL and behavioural-science techniques, we harness a formal iterative feedback loop in which the rewards were repeatedly adapted based on human behavioural judgements. Experiments conducted with 124 participants showed a strong initial improvement in the judgement of AV behaviours under the adaptive reward structure. The results indicate that the primary avenue for enhancing vehicle behaviour lies in the predictability of its movements when introduced. More broadly, recognising AV behaviours that receive favourable human judgements can pave the way for enhanced performance.
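A schematic sketch of such a feedback loop (our reading of the setup; the feature names and update rule are illustrative assumptions): the reward is a weighted sum of interpretable behaviour features, and the weights are nudged toward behaviours that humans judge favourably.

```python
import numpy as np

# Illustrative behaviour features of the interaction vector (names assumed).
FEATURES = ["predictability", "gap_acceptance", "speed_near_crossing"]

def shaped_reward(feature_values, weights):
    # Reward used by the DQN: a weighted sum of behaviour features.
    return float(np.dot(weights, feature_values))

def update_weights(weights, feature_values, judgement, lr=0.1):
    """judgement in [-1, 1]: push weights toward features active in
    behaviours people rated well, away from those rated poorly."""
    return weights + lr * judgement * feature_values

weights = np.zeros(len(FEATURES))
# One feedback round: an episode strong on predictability, rated +0.8.
weights = update_weights(weights, np.array([0.9, 0.2, 0.4]), 0.8)
print(dict(zip(FEATURES, np.round(weights, 3))))
```

Iterating this loop over many rated episodes is what lets the reward structure, and hence the trained policy, track human judgements.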
Towards interactive explanation-based nutrition virtual coaching systems
The awareness of healthy lifestyles is increasing, opening the door to personalized intelligent health coaching applications. A demand for more than mere suggestions and mechanistic interactions has drawn attention to nutrition virtual coaching systems (NVC) as a bridge between human-machine interaction and recommender, informative, persuasive, and argumentation systems. NVC can rely on opaque, data-driven mechanisms. Therefore, it is crucial to enable NVC to explain their decisions, i.e., to engage the user in discussions (via arguments) about dietary solutions and alternatives. By doing so, transparency, user acceptance, and engagement are expected to be boosted. This study focuses on NVC agents generating personalized food recommendations based on user-specific factors such as allergies, eating habits, lifestyles, and ingredient preferences. In particular, we propose a user-agent negotiation process entailing run-time feedback mechanisms to react to both recommendations and related explanations. Lastly, the study presents the findings obtained from experiments conducted with multi-background participants to evaluate the acceptability and effectiveness of the proposed system. The results indicate that most participants value the opportunity to provide feedback and receive explanations for recommendations. Additionally, users appreciate receiving information tailored to their needs. Furthermore, our interactive recommendation system performed better than the corresponding traditional recommendation system in terms of effectiveness regarding the number of agreements and rounds.
Off-line synthesis of evolutionarily stable normative systems
Within the area of multi-agent systems, normative systems are a widely used framework for the coordination of interdependent activities. A crucial problem associated with normative systems is that of synthesising norms that will effectively accomplish a coordination task and that the agents will comply with. Many works in the literature focus on the on-line synthesis of a single norm (convention) whose compliance forms a rational choice for the agents and that effectively coordinates them in a particular coordination situation that needs to be identified and modelled as a game in advance. In this work, we introduce a framework for the automatic off-line synthesis of normative systems that coordinate the agents in multiple conflict situations that cannot be easily identified in advance nor resolved separately. Our framework is rooted in evolutionary game theory. It considers multi-agent systems in which the potential conflict situations can be automatically enumerated by employing MAS simulations along with basic domain information. Our framework simulates an evolutionary process whereby successful norms prosper and spread within the agent population, while unsuccessful norms are discarded. The outputs of such a natural selection process are sets of norms that, together, effectively coordinate the agents in multiple interdependent situations and are evolutionarily stable. We show the effectiveness of our approach through empirical evaluation in a simulated traffic domain.
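The selection loop described above can be caricatured with replicator dynamics (a sketch of ours, not the paper's algorithm) on the simplest conflict situation, a left/right driving convention:

```python
import numpy as np

# Payoff is 1 when two sampled agents follow the same norm, 0 otherwise.
payoff = np.array([[1.0, 0.0],
                   [0.0, 1.0]])  # rows/cols: norm "left", norm "right"

share = np.array([0.55, 0.45])   # initial mix of norms in the population
for generation in range(50):
    fitness = payoff @ share                      # expected payoff of each norm
    share = share * fitness / (share @ fitness)   # successful norms spread

print(np.round(share, 3))  # converges to a single evolutionarily stable norm
```

The paper's framework replaces this toy matrix with payoffs estimated from MAS simulations over many interdependent conflict situations, and selects whole sets of norms rather than a single convention.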
Mandrake: multiagent systems as a basis for programming fault-tolerant decentralized applications
We conceptualize a software application as one constituted from agents that communicate via messaging. Modern software paradigms such as microservices and settings such as the Internet of Things evidence a growing interest in decentralized applications. Constructing a decentralized application involves designing agents as independent local computations that coordinate successfully to realize the application's requirements. Moreover, a decentralized application is susceptible to faults manifested as message loss, delay, and reordering. We contribute Mandrake, a programming model for decentralized applications that tackles these challenges without relying on infrastructure guarantees. Specifically, we adopt the construct of a protocol that specifies messaging between agents purely in causal terms and can be correctly enacted by agents in a shared-nothing environment over nothing more than unreliable, unordered transport. Mandrake facilitates (1) implementing protocol-compliant agents through its programming model; (2) transforming protocols into fault-tolerant ones with simple annotations; and (3) a declarative policy language that makes it easy to implement fault tolerance in agents based on the capabilities in protocols. Mandrake's significance lies in demonstrating a straightforward approach for constructing decentralized applications without relying on coordination mechanisms in the infrastructure, thus achieving some of the goals of the founders of networked computing from the 1970s.
The distortion of threshold approval matching
We study matching settings in which a set of agents have private utilities over a set of items. Each agent reports a partition of the items into approval sets at different threshold utility levels. Given this limited information as input, the goal is to compute an assignment of the items to the agents (subject to cardinality constraints depending on the application) that (approximately) maximizes the social welfare (the total utility of the agents for their assigned items). We first consider the well-known, simple one-sided matching problem, in which each of n agents is to be assigned exactly one of n items. We show that with threshold utility levels, the distortion of deterministic matching algorithms is [Formula: see text] while that of randomized algorithms is [Formula: see text]. We then show that our distortion bounds extend to a more general setting in which there are multiple copies of the items, each agent can be assigned a number of items (even copies of the same one) up to a capacity, and the utility of an agent for an item depends on the number of its copies that the agent is given.
Ability and knowledge: from epistemic transition systems to labelled stit models
It is possible to know that one can guarantee a certain result and yet not know how to guarantee it. In such cases one has the ability to guarantee something in a causal sense, but not in an epistemic sense. In this paper we focus on two formalisms used to model both conceptions of ability: one formalism based on epistemic transition systems and the other on labelled stit models. We show a strong correspondence between the two formalisms by providing mappings from the former to the latter for both the languages and the structures. Moreover, we demonstrate that our extension of labelled stit logic is more expressive than the logic of epistemic transition systems.
