tao (@tao@mathstodon.xyz)

Suppose an agent (such as an individual, organization, or an AI) needs to choose between two options A and B. One can try to influence this choice by offering incentives ("carrots") or disincentives ("sticks") to nudge the agent towards one's preferred choice. For instance, if one prefers that the agent choose A over B, one can offer praise if A is chosen, and criticism if B is chosen. Assuming the agent does not possess contrarian motivations and acts rationally, the probability of the agent selecting one's preferred outcome is then monotone in the incentives in the natural fashion: the more one praises the selection of A, or the more one condemns the selection of B, the more likely the agent is to select A over B.
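
As a toy illustration of this monotonicity (a minimal sketch only: the logit choice rule and the utility numbers below are assumptions made for the example, not part of the original argument), one can model a noisily rational agent whose effective utilities are shifted by the carrot and the stick:

```python
import math

def p_choose_a(u_a: float, u_b: float, carrot: float, stick: float) -> float:
    """Probability that a noisily rational agent picks A over B under a
    standard logit (softmax) choice rule over shifted utilities."""
    v_a = u_a + carrot   # praise for choosing A raises A's effective utility
    v_b = u_b - stick    # criticism for choosing B lowers B's effective utility
    return math.exp(v_a) / (math.exp(v_a) + math.exp(v_b))

# With only two options, P(choose A) rises monotonically in both incentives.
for stick in (0.0, 1.0, 2.0):
    print(f"stick={stick}: P(choose A) = {p_choose_a(1.0, 1.5, 0.5, stick):.2f}")
```

Running this prints probabilities of roughly 0.50, 0.73, and 0.88: turning up either incentive can only push the agent towards A.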

However, this monotonicity can break down if there are three or more options: trying to influence an agent to select a desirable (to you) option A by criticizing option B may end up causing the agent to select an even less desirable option C instead. For instance, suppose one wants to improve safety conditions for workers in some industry. One natural approach is to criticize any company in the industry that makes the news due to a workplace accident. This is disincentivizing option B ("Allow workplace accidents to make the news") in the hope of encouraging option A ("Implement policies to make the workplace safer"). However, if this criticism is driven solely by media coverage, it can perversely incentivize companies to pursue option C ("Conceal workplace accidents from making the news"), which many would consider an even worse outcome than option B. (1/3)
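
To make the three-option breakdown concrete (again a sketch with made-up utility numbers and a simple utility-maximizing choice rule, chosen only to illustrate the mechanism), suppose the criticism lands only when an accident makes the news:

```python
def best_option(criticism: float) -> str:
    """Which option maximizes the company's (hypothetical) utility when
    criticism is triggered only by media coverage of an accident?"""
    utilities = {
        "A: make the workplace safer":        -3.0,              # upfront safety costs
        "B: let accidents make the news":      0.0 - criticism,  # only B attracts the criticism
        "C: conceal accidents from the news": -1.0,              # concealment cost, no coverage
    }
    return max(utilities, key=utilities.get)

# Ramping up media-driven criticism moves the agent from B to C, never to A.
for criticism in (0.0, 2.0, 5.0):
    print(f"criticism={criticism}: agent picks {best_option(criticism)!r}")
```

With no criticism the agent picks B; as the criticism grows it switches to C, and no amount of coverage-driven criticism ever makes A the best response, since concealment dodges the penalty more cheaply than genuine safety improvements.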
