Learning cooperative persuasive dialogue policies using framing
Introduction
With the basic technology supporting dialogue systems maturing, there has been growing interest in recent years in dialogue systems that move beyond traditional task-based or chatbot frameworks. In particular, there has been increasing interest in dialogue systems that engage in persuasion or negotiation (Georgila, 2013; Georgila & Traum, 2011; Guerini, Stock, & Zancanaro, 2003; Heeman, 2009; Mazzotta & de Rosis, 2006; Mazzotta, de Rosis, & Carofiglio, 2007; Nguyen, Masthoff, & Edwards, 2007; Paruchuri, Chakraborty, Zivan, Sycara, Dudik, & Gordon, 2009). In this paper, we propose a method for learning cooperative persuasive dialogue systems, in which we focus not only on the success of persuasion (the system goal) but also on user satisfaction (the user goal). This variety of dialogue system has the potential to be useful in situations where the user and system have different, but not mutually exclusive, goals. An example is a sales situation where the user wants to find a product that matches their taste, and the system wants to successfully sell a product, ideally one with a higher profit margin.
Creating a system that both has persuasive power and is able to ensure that the user is satisfied is not an easy task. In order to tackle this problem with the help of recent advances in statistical dialogue modeling, we build our system upon the framework of reinforcement learning, specifically partially observable Markov decision processes (POMDPs) (Levin, Pieraccini, & Eckert, 2000; Williams & Young, 2007), which we describe in detail in Section 2. In the POMDP framework, it is mainly necessary to define a reward representing the degree of success of the dialogue, the set of actions that the system can take, and a belief state to keep track of the system's beliefs about its current environment. Once these are defined, reinforcement learning enables the system to learn a policy maximizing the reward.
In this paper, in order to enable the learning of policies for cooperative persuasive dialogue systems, we tailor each of these elements to the task at hand (Section 4):
Reward: We present a method for defining the reward as a combination of the user goal (user satisfaction), the system goal (persuasive success), and the naturalness of the dialogue. This is in contrast to research in reinforcement learning for slot-filling dialogue, where the system aims to achieve only the user goal (Levin, Pieraccini, & Eckert, 2000; Williams & Young, 2007), and to research on persuasion and negotiation dialogues, where the system receives a reward corresponding to only the system goal (Georgila, 2013; Georgila & Traum, 2011; Heeman, 2009; Paruchuri et al., 2009). We use a human-to-human persuasive dialogue corpus (Section 3; Hiraoka et al., 2014a) to train predictive models for the achievement of a human persuadee's and a human persuader's goals, and incorporate these models into the reward calculation to enable the system to learn a policy reflecting knowledge of human persuasion.
System Action: We introduce framing (Irwin et al., 2013), which is known to be important for persuasion, as a system action (i.e., system dialogue act). Framing uses emotionally charged words (positive or negative) to explain particular alternatives. In the context of research that applies reinforcement learning to persuasive (or negotiation) dialogue, this is the first work that considers framing in this way. In this paper the system controls the polarity (positive or negative) and the target alternative of framing (see Table 3 for an example of framing).
Belief State: As the belief state, we use the dialogue features used in calculating the reward function. For example, whether the persuadee has been informed that a particular alternative matches their preference was shown in human dialogue to be correlated with persuasive success, which is one of the reward factors. Some of the dialogue features used in reward calculation cannot be observed directly by the system, and thus we incorporate them into the belief state.
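The three elements above can be sketched concretely. The following is a minimal, hypothetical illustration: the alternative names, feature names, and reward weights are placeholders for exposition, not the values defined in Section 4. It shows the framing action space (polarity × target alternative) and a reward that combines the user goal, the system goal, and naturalness.

```python
from itertools import product

# Hypothetical alternatives under discussion (placeholders, not from the paper).
ALTERNATIVES = ["camera_a", "camera_b", "camera_c"]

# Framing actions: the system controls the polarity and the target alternative.
FRAMING_ACTIONS = [("framing", polarity, target)
                   for polarity, target in product(["positive", "negative"],
                                                   ALTERNATIVES)]

def reward(user_satisfaction, persuasive_success, naturalness,
           w_user=1.0, w_system=1.0, w_natural=0.5):
    """Combine the user goal, system goal, and naturalness into one scalar.

    Each argument is assumed to be an estimate in [0, 1] produced by
    predictive models trained on the human persuasive dialogue corpus.
    The weights here are illustrative only.
    """
    return (w_user * user_satisfaction
            + w_system * persuasive_success
            + w_natural * naturalness)
```

With three alternatives this yields six framing actions, and the reward trades off persuasive success against user satisfaction rather than optimizing either alone.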
Based on this framework, we construct the first fully automated text-based cooperative persuasive dialogue system (Section 5). To construct the system, in addition to the policy module, natural language understanding (NLU), and natural language generation (NLG) are required. We construct an NLU module using the human persuasive dialogue corpus and a statistical classifier. In addition, we construct an NLG module based on example-based dialogue, using a dialogue database created from the human persuasive dialogue corpus.
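The example-based NLG idea can be sketched as follows. This is a hypothetical, minimal version: the database schema, example entries, and overlap-based similarity are assumptions for illustration, not the paper's actual implementation. Given the dialogue act chosen by the policy, it returns the utterance of the best-matching example from a database built from the human persuasive dialogue corpus.

```python
# Toy example database; in the paper this is built from the
# human persuasive dialogue corpus. Entries here are invented.
EXAMPLE_DB = [
    {"act": ("framing", "positive", "camera_a"),
     "context": {"preference_known": True},
     "utterance": "Camera A has a wonderfully sharp lens."},
    {"act": ("inform", None, "camera_b"),
     "context": {"preference_known": False},
     "utterance": "Camera B weighs 400 grams."},
]

def generate(act, context):
    """Return the utterance of the example whose dialogue act matches
    and whose context features overlap most with the current context."""
    candidates = [ex for ex in EXAMPLE_DB if ex["act"] == act]
    if not candidates:
        return None  # caller falls back, e.g. to a generic response
    def overlap(ex):
        return sum(context.get(k) == v for k, v in ex["context"].items())
    return max(candidates, key=overlap)["utterance"]
```

The NLU module works in the opposite direction, mapping a user utterance to a dialogue act with a statistical classifier trained on the same corpus.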
Using this system, we evaluate the learned policy and the utility of framing (Section 6). To our knowledge, in the context of research on persuasive and negotiation dialogue, this is the first time that a learned policy has been evaluated with a fully automated dialogue system. The evaluation is performed both with a user simulator and with real users.
This paper comprehensively integrates our work in Hiraoka et al. (2014b) and Hiraoka et al. (2015), with a more complete explanation and additional experiments. Specifically, regarding the additional experimental results, in this paper we additionally perform 1) an experimental evaluation using a reward function that exactly matches the learning phase (Sections 6.1.1 and 6.2), and 2) an evaluation of the effect of the NLU error rate (Section 6.1.2).
Section snippets
Reinforcement learning
In reinforcement learning, policies are updated based on exploration in order to maximize a reward. In this section, we briefly describe reinforcement learning in the context of dialogue. In dialogue, the policy is a mapping function from a dialogue state to a particular system action. In reinforcement learning, the policy is learned to maximize the reward function, which in traditional task-based dialogue systems is user satisfaction or task completion (Walker et al., 1997). Reinforcement
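As a minimal illustration of the exploration-and-update loop described above, the following sketch uses tabular Q-learning with an epsilon-greedy policy. This is a toy version under simplifying assumptions: real POMDP dialogue policies operate on belief states and use more sophisticated learners, and the hyperparameter values here are arbitrary.

```python
import random
from collections import defaultdict

Q = defaultdict(float)          # (state, action) -> estimated value
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def choose_action(state, actions):
    """Epsilon-greedy policy: mostly exploit the current estimates,
    occasionally explore a random action."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """One-step temporal-difference update toward the observed reward
    plus the discounted value of the best next action."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])
```

Repeated simulated dialogues drive the Q-values, and hence the policy, toward reward-maximizing behavior.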
Cooperative persuasive dialogue corpus
In this section, we give a brief overview of cooperative persuasive dialogue, and a human dialogue corpus that we use to construct the dialogue models and dialogue system described in later sections. Based on the persuasive dialogue corpus (Section 3.1), we define and quantify the actions of the cooperative persuader (Section 3.2). In addition, we annotate persuasive dialogue acts of the persuader from the point of view of framing (Section 3.3).
Cooperative persuasive dialogue modeling
In this section, we describe a statistical dialogue model for cooperative persuasive dialogue. The proposed cooperative persuasive dialogue model consists of a user-side dialogue model (Section 4.1) and a system-side model (Section 4.2).
Text-based cooperative persuasive dialogue system
To evaluate the policy learned with the dialogue model described in Section 4, we construct a fully automated text-based cooperative persuasive dialogue system. The structure of the system is shown in Fig. 2. In particular, in this section, we describe the construction of the NLU (Section 5.1) and NLG (Section 5.2) modules, which act as an interface between the policy module and the human user and are necessary for fully automatic dialogue.
Experimental evaluation
In this section, we describe the evaluation of the proposed method for learning cooperative persuasive dialogue policies. In particular, we focus on examining how effective the learned policy with framing is for persuasive dialogue. The evaluation is performed both with a user simulator (Section 6.1) and with real users (Section 6.2).
Related work
There are a number of related works that apply reinforcement learning to persuasion and negotiation dialogue. Georgila and Traum (2011) apply reinforcement learning to negotiation dialogue using user simulators of three types, representing individualists, collectivists, and altruists. Dialogue between a florist and a grocer is used as an example of negotiation dialogue. In addition, Georgila (2013) also applies reinforcement learning to two-issue negotiation dialogue where participants
Conclusion
In this paper, we applied reinforcement learning to learn cooperative persuasive dialogue system policies using framing, and evaluated the learned policies with a fully automated dialogue system. In order to apply reinforcement learning, a user simulator and reward function were constructed based on a human persuasive dialogue corpus. Then, we implemented a fully automatic dialogue system for evaluating the learned policies. We evaluated the learned policy and the effect of framing using the
Acknowledgement
Part of this research was supported by JSPS KAKENHI Grant Number 24240032 and the Commissioned Research of National Institute of Information and Communications Technology (NICT), Japan.
References (33)
- Generating and evaluating evaluative arguments. Artif. Intell. (2006)
- Example-based dialog modeling for practical multi-domain dialog system. Speech Commun. (2009)
- Reward shaping for statistical optimisation of dialogue management. Proceedings of the International Conference on Statistical Language and Speech Processing (2013)
- Persuasion in practical argument using value-based argumentation frameworks. J. Logic Comput. (2003)
- Bagging predictors. Mach. Learn. (1996)
- Reinforcement learning of two-issue negotiation dialogue policies. Proc. SIGDIAL (2013)
- Reinforcement learning of argumentation dialogue policies in negotiation. Proc. INTERSPEECH (2011)
- Persuasion model for intelligent interfaces. Proc. CMNA (2003)
- Representing the reinforcement learning state in a negotiation dialogue. Proc. ASRU (2009)
- Construction and analysis of a persuasive dialogue corpus. Proc. IWSDS (2014)
- Reinforcement learning of cooperative persuasive dialogue policies using framing. Proc. COLING (2014)
- Evaluation of a fully automatic cooperative persuasive dialogue system. Proc. IWSDS (2015)
- All frames are not created equal: A typology and critical analysis of framing effects. Organ. Behav. Hum. Dec. Proces.
- Applying conditional random fields to Japanese morphological analysis. Proc. EMNLP
- A stochastic model of human-machine interaction for learning dialog strategies. IEEE Trans. Speech Audio Process.