Speech Communication, Volume 84, November 2016, Pages 83-96

Learning cooperative persuasive dialogue policies using framing

https://doi.org/10.1016/j.specom.2016.09.002

Highlights

  • The system’s performance with the user simulator is greatly improved by reinforcement learning.

  • Framing is somewhat effective with the user simulator.

  • The system’s average reward reaches its minimum under the policy for which the estimated general-purpose function (GPF) has the highest average entropy.

  • Learning a policy with framing is effective in a text-based cooperative persuasive dialogue system.

Abstract

In this paper, we propose a new framework of cooperative persuasive dialogue, in which a dialogue system simultaneously attempts to achieve user satisfaction while persuading the user to take some action that achieves a pre-defined system goal. Within this framework, we describe a method for reinforcement learning of cooperative persuasive dialogue policies by defining a reward function that reflects both the system and user goals, and by using framing: emotionally charged statements common in persuasive dialogue between humans. In order to construct the various components necessary for reinforcement learning, we first describe a corpus of persuasive dialogues between human interlocutors, then propose a method to construct user simulators and reward functions specifically tailored to persuasive dialogue based on this corpus. We then implement a fully automatic text-based dialogue system for evaluating the learned policies. Using this system, we evaluate the learned policy and the effect of framing through experiments both with a user simulator and with real users. The experimental evaluation indicates that the proposed method is effective for the construction of cooperative persuasive dialogue systems.

Introduction

With the basic technology supporting dialogue systems maturing, there has been growing interest in recent years in dialogue systems that move beyond the traditional task-based or chatbot frameworks. In particular, there has been increasing interest in dialogue systems that engage in persuasion or negotiation (Georgila, 2013; Georgila and Traum, 2011; Guerini et al., 2003; Heeman, 2009; Mazzotta and de Rosis, 2006; Mazzotta et al., 2007; Nguyen et al., 2007; Paruchuri et al., 2009). In this paper, we propose a method for learning cooperative persuasive dialogue systems, in which we place a focus not just on the success of persuasion (the system goal) but also on user satisfaction (the user goal). This variety of dialogue system has the potential to be useful in situations where the user and system have different, but not mutually exclusive, goals. An example is a sales situation where the user wants to find a product that matches their taste, and the system wants to successfully sell a product, ideally one with a higher profit margin.

Creating a system that both has persuasive power and is able to ensure that the user is satisfied is not an easy task. In order to tackle this problem with the help of recent advances in statistical dialogue modeling, we build our system upon the framework of reinforcement learning, and specifically partially observable Markov decision processes (POMDPs) (Levin et al., 2000; Williams and Young, 2007), which we describe in detail in Section 2. In the POMDP framework, it is necessary to define a reward representing the degree of success of the dialogue, the set of actions that the system can use, and a belief state that keeps track of the system’s beliefs about its current environment. Once these are defined, reinforcement learning enables the system to learn a policy maximizing the reward.
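To make the belief-state bookkeeping concrete, the following is a minimal sketch of exact POMDP belief tracking under hypothetical transition and observation matrices. The paper’s actual belief state is built from corpus-derived dialogue features (Section 4), so this illustrates the general mechanism rather than the system’s implementation.

```python
import numpy as np

def update_belief(belief, action, observation, T, O):
    """One step of exact POMDP belief tracking.

    belief: distribution over hidden dialogue states, shape (S,)
    T[a]:   transition matrix under system action a, T[a][s, s']
    O[a]:   observation matrix after action a, O[a][s', o]
    """
    predicted = belief @ T[action]                   # predict next-state distribution
    updated = predicted * O[action][:, observation]  # weight by observation likelihood
    return updated / updated.sum()                   # renormalize to a distribution

# Toy example with two hidden states, one action, and two observations.
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]])}
O = {0: np.array([[0.7, 0.3], [0.4, 0.6]])}
b = update_belief(np.array([0.5, 0.5]), action=0, observation=1, T=T, O=O)
```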

In this paper, in order to enable the learning of policies for cooperative persuasive dialogue systems, we tailor each of these elements to the task at hand (Section 4):

  • Reward: We present a method for defining the reward as a combination of the user goal (user satisfaction), the system goal (persuasive success), and the naturalness of the dialogue (a schematic combination of these factors is sketched after this list). This is in contrast to research in reinforcement learning for slot-filling dialogue, where the system aims to achieve only the user goal (Levin et al., 2000; Williams and Young, 2007), or for persuasion and negotiation dialogues, where the system receives a reward corresponding to only the system goal (Georgila, 2013; Georgila and Traum, 2011; Heeman, 2009; Paruchuri et al., 2009). We use a human-to-human persuasive dialogue corpus (Section 3; Hiraoka et al., 2014a) to train predictive models for the achievement of a human persuadee’s and a human persuader’s goals, and introduce these models into the reward calculation to enable the system to learn a policy reflecting knowledge of human persuasion.

  • System Action: We introduce framing (Irwin et al., 2013), which is known to be important for persuasion, as a system action (i.e., a system dialogue act). Framing uses emotionally charged words (positive or negative) to describe particular alternatives. In the context of research applying reinforcement learning to persuasive (or negotiation) dialogue, this is the first work to consider framing in this way. In this paper, the system controls the polarity (positive or negative) and the target alternative of framing (see Table 3 for an example).

  • Belief State: As the belief state, we use the dialogue features used in calculating the reward function. For example, whether the persuadee has been informed that a particular option matches their preference was shown in human dialogue to be correlated with persuasive success, which is one of the reward factors. Some of the dialogue features used in reward calculation cannot be observed directly by the system, and thus we incorporate them into the belief state.
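As referenced in the first item above, a minimal sketch of this kind of combined reward follows. The weights and the three inputs are hypothetical placeholders: in the paper, the user-goal and system-goal terms come from predictive models trained on the human-human corpus (Section 4).

```python
def combined_reward(user_satisfaction, persuasive_success, naturalness,
                    w_user=1.0, w_system=1.0, w_natural=0.5):
    """Hypothetical weighted sum of the three reward factors.

    user_satisfaction:  predicted achievement of the user goal, e.g. in [0, 1]
    persuasive_success: predicted achievement of the system goal, e.g. in [0, 1]
    naturalness:        predicted naturalness of the dialogue, e.g. in [0, 1]
    """
    return (w_user * user_satisfaction
            + w_system * persuasive_success
            + w_natural * naturalness)
```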

Based on this framework, we construct the first fully automated text-based cooperative persuasive dialogue system (Section 5). To construct the system, in addition to the policy module, natural language understanding (NLU) and natural language generation (NLG) modules are required. We construct the NLU module using the human persuasive dialogue corpus and a statistical classifier. In addition, we construct the NLG module based on example-based dialogue, using a dialogue database created from the human persuasive dialogue corpus.
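As an illustration of what such a statistical NLU module can look like, here is a minimal sketch that classifies user utterances into dialogue acts with a bag-of-words logistic regression. The utterances and label set are toy stand-ins for the annotated corpus, and the classifier family is an assumption rather than the paper’s exact choice.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy (utterance, dialogue-act) pairs; the real module is trained on the
# annotated human persuasive dialogue corpus.
utterances = ["how much does this one cost",
              "that sounds good to me",
              "i do not really like that model"]
acts = ["question", "accept", "reject"]

nlu = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
nlu.fit(utterances, acts)
print(nlu.predict(["does this one cost much"]))  # e.g. ["question"] on this toy data
```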

Using this system, we evaluate the learned policy and the utility of framing (Section 6). To our knowledge, in the context of research on persuasive and negotiation dialogue, this is the first time a learned policy has been evaluated with a fully automated dialogue system. The evaluation is performed both with a user simulator and with real users.

This paper comprehensively integrates our work in Hiraoka et al. (2014b) and Hiraoka et al. (2015), with a more complete explanation and additional experiments. Specifically, regarding the additional experimental results, in this paper we additionally perform 1) an experimental evaluation using a reward function that exactly matches the learning phase (Sections 6.1.1 and 6.2), and 2) an evaluation of the effect of the NLU error rate (Section 6.1.2).

Section snippets

Reinforcement learning

In reinforcement learning, policies are updated based on exploration in order to maximize a reward. In this section, we briefly describe reinforcement learning in the context of dialogue. In dialogue, the policy is a mapping function from a dialogue state to a particular system action. The policy is learned to maximize the reward function, which in traditional task-based dialogue systems is user satisfaction or task completion (Walker et al., 1997).
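For concreteness, the following is a generic sketch of tabular Q-learning with epsilon-greedy exploration, one common way to learn such a policy. It is illustrative only (the state and action encodings are hypothetical), and not necessarily the learning algorithm used in this paper.

```python
import random
from collections import defaultdict

Q = defaultdict(float)             # Q[(state, action)] -> estimated return
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def choose_action(state, actions):
    """Epsilon-greedy choice over the system's dialogue acts."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One-step Q-learning backup toward a reward-maximizing policy."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```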

Cooperative persuasive dialogue corpus

In this section, we give a brief overview of cooperative persuasive dialogue and of the human dialogue corpus that we use to construct the dialogue models and dialogue system described in later sections. Based on the persuasive dialogue corpus (Section 3.1), we define and quantify the actions of the cooperative persuader (Section 3.2). In addition, we annotate the persuader’s dialogue acts from the point of view of framing (Section 3.3).

Cooperative persuasive dialogue modeling

In this section, we describe a statistical dialogue model for cooperative persuasive dialogue. The proposed cooperative persuasive dialogue model consists of a user-side dialogue model (Section 4.1) and a system-side model (Section 4.2).

Text-based cooperative persuasive dialogue system

To evaluate the policy learned with the dialogue model described in Section 4, we construct a fully automated text-based cooperative persuasive dialogue system. The structure of the system is shown in Fig. 2. In particular, in this section we describe the construction of the NLU (Section 5.1) and NLG (Section 5.2) modules, which act as an interface between the policy module and the human user and are necessary for fully automatic dialogue.
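To illustrate example-based generation (in the spirit of Lee et al., 2009), here is a minimal sketch that retrieves a stored utterance whose annotated system act matches the requested one. The database entries, including the framing polarity and target alternative in each act tuple, are hypothetical.

```python
# Hypothetical example database; the real one is built from the human
# persuasive dialogue corpus. Each act is
# (dialogue act, framing polarity, target alternative).
EXAMPLES = [
    (("inform", "positive", "camera_A"), "Camera A has a wonderfully sharp lens."),
    (("inform", "negative", "camera_B"), "Camera B is rather heavy to carry around."),
]

def generate(system_act):
    """Return the first stored utterance whose annotated act matches."""
    for act, text in EXAMPLES:
        if act == system_act:
            return text
    return "I see."  # fallback when no example matches

print(generate(("inform", "positive", "camera_A")))
```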

Experimental evaluation

In this section, we describe the evaluation of the proposed method for learning cooperative persuasive dialogue policies. In particular, we focus on examining how effective the learned policy with framing is for persuasive dialogue. The evaluation is performed both with a user simulator (Section 6.1) and with real users (Section 6.2).

Related work

There are a number of related works that apply reinforcement learning to persuasion and negotiation dialogue. Georgila and Traum (2011) apply reinforcement learning to negotiation dialogue using user simulators divided into three types representing individualists, collectivists, and altruists. Dialogue between a florist and a grocer is used as an example of negotiation dialogue. In addition, Georgila (2013) also applies reinforcement learning to two-issue negotiation dialogue.

Conclusion

In this paper, we applied reinforcement learning to learn cooperative persuasive dialogue system policies using framing, and evaluated the learned policies with a fully automated dialogue system. In order to apply reinforcement learning, a user simulator and reward function were constructed based on a human persuasive dialogue corpus. Then, we implemented a fully automatic dialogue system for evaluating the learned policies. We evaluated the learned policy and the effect of framing using the implemented system, through experiments with both a user simulator and real users.

Acknowledgement

Part of this research was supported by JSPS KAKENHI Grant Number 24240032 and the Commissioned Research of National Institute of Information and Communications Technology (NICT), Japan.

References

  • G. Carenini et al.

    Generating and evaluating evaluative arguments

    Artif. Intell.

    (2006)
  • C. Lee et al.

    Example-based dialog modeling for practical multi-domain dialog system

    Speech Commun.

    (2009)
  • L.E. Asri et al.

    Reward shaping for statistical optimisation of dialogue management

    Proceedings of the International Conference on Statistical Language and Speech Processing

    (2013)
  • T.J.M. Bench-Capon

    Persuasion in practical argument using value-based argumentation frameworks

    J. Logic Comput.

    (2003)
  • L. Breiman

    Bagging predictors

    Mach. Learn.

    (1996)
  • K. Georgila

    Reinforcement learning of two-issue negotiation dialogue policies

    Proc. SIGDIAL

    (2013)
  • K. Georgila et al.

    Reinforcement learning of argumentation dialogue policies in negotiation

    Proc. INTERSPEECH

    (2011)
  • M. Guerini et al.

    Persuasion model for intelligent interfaces

    Proc. CMNA

    (2003)
  • P.A. Heeman

    Representing the reinforcement learning state in a negotiation dialogue

    Proc. ASRU

    (2009)
  • T. Hiraoka et al.

    Construction and analysis of a persuasive dialogue corpus

    Proc. IWSDS

    (2014)
  • T. Hiraoka et al.

    Reinforcement learning of cooperative persuasive dialogue policies using framing

    Proc. COLING

    (2014)
  • T. Hiraoka et al.

    Evaluation of a fully automatic cooperative persuasive dialogue system

    Proc. IWSDS

    (2015)
  • L. Irwin et al.

    All frames are not created equal: A typology and critical analysis of framing effects

    Organ. Behav. Hum. Decis. Process.

    (2013)
  • ISO 24617-2, 2010. Language resource management – Semantic annotation framework (SemAF), Part 2: Dialogue acts.
  • T. Kudo et al.

    Applying conditional random fields to Japanese morphological analysis

    Proc. EMNLP

    (2004)
  • E. Levin et al.

    A stochastic model of human-machine interaction for learning dialog strategies

    IEEE Trans. Speech Audio Process.

    (2000)