Elsevier

Biosystems

Volume 213, March 2022, 104633

Softsatisficing: Risk-sensitive softmax action selection

https://doi.org/10.1016/j.biosystems.2022.104633
Open access under a Creative Commons license

Abstract

Animals, humans, and organizations are known to adjust how, and how much, they explore complex environments that exceed their information-processing capacity, rather than searching relentlessly for the optimal action. The depth of exploration is thought to depend on an aspiration level internal to the agent; this action-selection tendency is known as satisficing. The Risk-sensitive Satisficing (RS) model implements satisficing in the reinforcement learning framework by converting action values into gains (or losses) relative to the aspiration level, and this risk-sensitive evaluation of action values has been shown to be effective in reinforcement learning. In this paper, we first analyze RS in comparison with the UCB and Thompson sampling algorithms and show that RS exhibits differential risk attitudes depending on the risk involved. We then propose the Softsatisficing policy, a stochastic counterpart of RS, and further analyze the exploratory behavior of the risk-sensitive satisficing that both RS and Softsatisficing implement. We emphasize that Softsatisficing has the potential to model risk-sensitive foraging and other decision-making behaviors of humans, animals, and organizations.
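To make the abstract's description concrete, the following is a minimal sketch of the two policies it contrasts. The exact RS formulation is not given in this excerpt; the sketch assumes one commonly used form, in which each arm's gain or loss relative to the aspiration level (here `aspiration`) is weighted by the number of times the arm has been pulled. `Softsatisficing` is then rendered, per the abstract, as a softmax over these RS scores rather than a greedy argmax; the `temperature` parameter and all function names are illustrative assumptions, not the paper's notation.

```python
import math
import random

def rs_scores(counts, means, aspiration):
    # RS score per arm (assumed formulation): the estimated value's
    # gain/loss relative to the aspiration level, weighted by the
    # number of pulls:  RS_i = n_i * (E_i - aspiration)
    return [n * (m - aspiration) for n, m in zip(counts, means)]

def softsatisficing(counts, means, aspiration, temperature=0.1):
    # Stochastic counterpart of RS, as described in the abstract:
    # sample an arm from a softmax over the RS scores instead of
    # taking a greedy argmax.
    scores = rs_scores(counts, means, aspiration)
    mx = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - mx) / temperature) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

With an aspiration level below the best arm's estimated value, the well-sampled satisfying arm dominates the softmax at low temperature, so exploration concentrates there; raising the aspiration level above all estimated values flattens the scores and drives broader exploration.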

Keywords

Bounded rationality
Exploration
Optimism in the face of uncertainty
Multi-armed bandit problems

Data availability

No data was used for the research described in the article.
