I recently finished Human Compatible: Artificial Intelligence and the Problem of Control by Stuart Russell, and I highly recommend it. A lot has been written in the past 5-10 years on superintelligent AI, AI safety, and AI control. Much of it suffers either from being too sensationalist (especially work from people outside the field) or from being too dismissive of concerns about superintelligent AI (especially from people inside the field). Russell is a major figure in AI, and he takes the concerns about advanced AI seriously while remaining grounded in reality and plotting a feasible path forward.
The core idea of Russell’s book is that we should design AI agents to be uncertain about human preferences. Let’s consider an AI named Rob that purchases items on behalf of a human named Mary, based on Rob’s knowledge of her preferences. If Rob is certain about Mary’s preferences, he will simply go about purchasing items for Mary without ever needing to consult her. However, if he is uncertain about her preferences, Rob may occasionally need to bring Mary back into the loop to make a decision. We could characterize this as Rob being willing to switch himself off in order to attain a better outcome for Mary. In the first scenario, where Rob is certain about Mary’s preferences, that would never happen.
The Standard Model of AI
The first scenario is what Russell calls the “standard model” of AI: an agent is given an objective function to optimize, and it simply goes about optimizing it, with no uncertainty about the function itself. In Rob’s case, the objective would involve factors such as item cost: he wants to acquire items that Mary likes, at the lowest possible cost, and with reasonable shipping times.
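To make the standard model concrete, here is a minimal sketch of a shopping agent with a fixed, fully trusted objective. The items, weights, and scores are made up for illustration; nothing here comes from the book.

```python
# Toy sketch of a standard-model agent: the objective is fixed and fully
# trusted, so the agent simply optimizes it. All names and numbers are
# illustrative, not from the book.

def fixed_objective(item):
    """Rob's hard-coded guess at what Mary wants."""
    return (
        1.0 * item["liking"]        # how much Rob believes Mary likes the item
        - 0.5 * item["cost"]        # cheaper is better
        - 0.1 * item["ship_days"]   # faster shipping is better
    )

candidates = [
    {"name": "blue mug", "liking": 8.0, "cost": 12.0, "ship_days": 2},
    {"name": "red lamp", "liking": 6.0, "cost": 30.0, "ship_days": 5},
    {"name": "desk fan", "liking": 7.5, "cost": 18.0, "ship_days": 1},
]

# The standard model in one line: pick the maximizer and act,
# never consulting Mary.
best = max(candidates, key=fixed_objective)
print("Rob buys:", best["name"])
```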
The problem with the standard model is that it can go catastrophically wrong. For example, a superintelligent AI trying to minimize human suffering could decide to eliminate all human life on earth, optimizing the objective along a path that no human foresaw. Adding uncertainty about preferences can help us avoid these scenarios.
To be clear, uncertainty and probability have played a major role in AI, especially since the 1980s. But this has primarily been uncertainty about the state of the world and about the consequences of actions. There has been very little work on uncertainty about preferences, i.e. about the function that is being optimized.
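For contrast with the sketch above, here is one toy way to represent uncertainty about preferences (my own illustration, not Russell’s formulation): instead of a single fixed weight for how much Mary likes an item, Rob keeps several plausible hypotheses, and he defers to Mary whenever those hypotheses disagree about whether a purchase is worthwhile.

```python
# Toy sketch of preference uncertainty: Rob holds several hypotheses about
# Mary's preferences rather than one fixed objective. When the hypotheses
# disagree about whether a purchase is a good idea, Rob defers to Mary
# (equivalently, he is willing to stop and be "switched off").
# All numbers are illustrative.

def utility(item, liking_weight):
    return (liking_weight * item["liking"]
            - 0.5 * item["cost"]
            - 0.1 * item["ship_days"])

# Rob's uncertainty about how much weight Mary puts on "liking".
preference_hypotheses = [0.4, 1.0, 1.6]

def decide(item):
    utilities = [utility(item, w) for w in preference_hypotheses]
    if all(u > 0 for u in utilities):
        return "buy"        # every hypothesis says this is a good purchase
    if all(u <= 0 for u in utilities):
        return "skip"       # every hypothesis says it is not worth it
    return "ask Mary"       # hypotheses disagree: defer to the human

item = {"name": "blue mug", "liking": 8.0, "cost": 12.0, "ship_days": 2}
print(decide(item))
```

Under the fixed objective in the first sketch, Rob would have bought the mug without asking; with uncertainty over the preference weight, the same item prompts him to check with Mary, which is exactly the deferral behavior described above.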
Transformative Experience
By a nice coincidence, I listened to this podcast discussion between Sean Carroll and philosopher LA Paul shortly after finishing the Russell book. It also deals with uncertain preferences, but in this case the uncertainty is about our own preferences in decisions that involve transformative experiences. Paul starts with a fun example involving vampires. Say a vampire approaches you with a limited-time offer to become a vampire, and he starts giving you the sales pitch: you’ll get to drink delicious blood, wander around at night, and scare people. You might reply that these things are completely unappealing, even disgusting, to you. But he replies that once you are a vampire, these things will be incredibly pleasing to you. As a vampire, you will love the taste of blood.
What should you do? You could say no to the opportunity, which would be in accord with your current preferences. But you could also say yes, which would lead to a major transformative experience for you in which your preferences change. You might really love being a vampire.
A more down-to-earth example is the decision of whether to have children. Say your current preference is not to have children. But having children is a transformative experience in which your preferences can completely change. Let’s denote you and your preferences prior to having children as X, and you and your preferences after having children as X’. Having children might be the wrong decision for person X but the right decision for person X’. Unfortunately, it is X who has to make the decision. X can’t really know what their preferences will be afterwards, and the decision will transform who they are.
How to Decide
So how should we handle decision-making about transformative experiences, where we know that we and our preferences will change? In the podcast, Paul says (as I understand her) that there is no clear answer, though she discusses some suggestions in her book on transformative experience. I would suggest that we give at least some weight to opportunities that seem to conflict with our current preferences, recognizing them as opportunities for transformation and growth.
In the earlier discussion about AI uncertainty regarding human preferences, we saw that this uncertainty allowed the AI to delegate some decision-making back to the human in order to attain a better outcome. But with uncertainty about our own future preferences, we can’t simply delegate the decision in a similar manner. I suppose some religious and spiritual people talk as if they do this: letting God (or the universe) make an important decision for them. I’m not sure how that plays out in practical terms for them, but perhaps we could interpret it as a greater openness towards new and transformative experiences.
On that note, the later part of the podcast covers potential implications of Paul’s ideas for the atheism/theism debate. It’s fascinating, but I think I’ll have to save that for another blog post.