Abstract
This paper is an extended version of a selected paper from JSAI 2019. Social learning is crucial to the acquisition of intelligent behavior in humans and many other animals, as it makes behavior learning far more efficient than pure trial and error. In imitation learning, a representative form of social learning, the agent observes specific state-action pair sequences produced by another agent (the expert) and reflects them in its own actions. Inverse reinforcement learning is one implementation of imitation learning in the reinforcement learning setting. We propose another form of social learning, emulation learning, which requires much less information from the other agent (the pioneer): the agent is given only a certain level of achievement attained by that agent, i.e., a record. In this study, we implement emulation learning in the reinforcement learning setting by applying a model of satisficing action policy. We show that the emulation learning algorithm works well in both stationary and non-stationary reinforcement learning tasks, breaking the often-observed trade-off-like relationship between efficiency and flexibility.
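The core idea of emulation learning via satisficing can be sketched as follows. This is an illustrative simplification, not the paper's actual algorithm (which applies a specific satisficing value function): the toy bandit, arm means, learning rate, and the rule "exploit once an estimated value meets the aspiration, otherwise explore" are all assumptions made here. The pioneer's record serves as the aspiration level.

```python
import random

def satisficing_action(q_values, aspiration):
    """Simplified satisficing rule (an assumption for illustration):
    exploit the best-known action once its estimate meets the aspiration
    level; otherwise keep exploring uniformly at random."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    if q_values[best] >= aspiration:
        return best
    return random.randrange(len(q_values))

# Toy 3-armed bandit with deterministic rewards (the arm means).
arm_means = [0.2, 0.5, 0.9]
record = 0.8          # the pioneer's record, used as the aspiration level
q = [0.0, 0.0, 0.0]   # value estimates for each arm
alpha = 0.5           # learning rate

random.seed(0)
for _ in range(200):
    a = satisficing_action(q, record)
    reward = arm_means[a]
    q[a] += alpha * (reward - q[a])   # incremental value update

best_arm = max(range(3), key=lambda a: q[a])
```

Note that the agent needs only the scalar record, not the pioneer's state-action trajectories: once some arm's estimated value reaches the aspiration level, exploration stops, which is what makes the approach far cheaper in information than imitation learning.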
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number 17H04696.
© 2020 Springer Nature Switzerland AG
Cite this paper
Shinriki, M., Wakabayashi, H., Kono, Y., Takahashi, T. (2020). Flexibility of Emulation Learning from Pioneers in Nonstationary Environments. In: Ohsawa, Y., et al. Advances in Artificial Intelligence. JSAI 2019. Advances in Intelligent Systems and Computing, vol 1128. Springer, Cham. https://doi.org/10.1007/978-3-030-39878-1_9
Print ISBN: 978-3-030-39877-4
Online ISBN: 978-3-030-39878-1
eBook Packages: Intelligent Technologies and Robotics (R0)