What Should AI Be Doing?

I recently finished Brian Cantwell Smith’s excellent book The Promise of Artificial Intelligence: Reckoning and Judgment. It’s a bit different from other recent popular books on AI in that it is a philosophical work (and may be more challenging for that reason). The philosopher Tim Crane recently published a review of the book, which I also highly recommend. I’m not going to try to summarize all of Cantwell Smith’s book, especially since Crane does such a good job, but I did want to make a few comments, particularly on areas where Crane’s thoughts resonate with my own since reading the book.

First-Wave AI

Cantwell Smith critiques first-wave AI in an interesting way. First-wave AI spanned roughly from the 1960s to the 1980s, and was heavily reliant on formal methods such as logical reasoning and proofs. It’s sometimes referred to as “Good Old-Fashioned AI” or GOFAI. Many weaknesses of GOFAI have long been recognized in the AI community: it used brittle, highly task-specific methods, and, as Crane says, “very little thinking resembles calculating or proving theorems.” But Cantwell Smith’s critique is different: he argues that GOFAI fundamentally misunderstood the nature of reality, and in particular that it rested on a flawed ontology, viewing the world as made up of discrete objects about which we can reason. Here is Crane’s summary of Cantwell Smith’s point:

Smith argues that first wave theorizing assumed that the world must be structured in the way that logic structures language: objects correspond to names, properties correspond to predicates or general terms. Things fit together in the world as symbols fit together in a logical language. Smith claims that this is one main reason why the GOFAI project failed: it failed to take account of the “fabulously rich and messy world we inhabit”, which does not come in a “pre-given” form, divided up into objects.

In other words, reality doesn’t come to us pre-chopped — it comes as a whole and we chop it up. There seem to be some interesting metaphysical ideas here that Cantwell Smith only hints at, and I wish he had expanded on them (or will in the future). For example, the book is peppered with references to the One and to “the ground of being” (a phrase strongly associated with the theologian Paul Tillich), though he stresses that these terms are not meant in a “mystical” sense. But it does remind me of nondual traditions (such as Advaita Vedanta or Neoplatonism), or of the writings of the physicist David Bohm. And I like Cantwell Smith’s claim that we need the right balance between metaphysical monism and ontological pluralism. It reminds me of the Advaita Vedanta view that, while both the One and the multiplicity of the world of appearances exist, only the One is real.

Second-Wave AI

Second-wave AI spans from roughly the 1990s to the present, and has been characterized by the prominence of machine learning, and in particular, in recent years, by deep learning (essentially neural networks with many layers). These second-wave models are very good at pattern recognition on noisy, messy inputs, with few assumptions about the way the world is, in contrast with GOFAI. They are less brittle and task-specific than GOFAI models, and do not require the programmer to anticipate everything that could conceivably happen in the domain of interest.

Feature Learning

Cantwell Smith makes some good points about the importance of feature learning in deep learning systems. In a nutshell, until about a decade ago most machine learning systems used features that were derived by humans for a particular task. This could be a small set of features motivated by expert knowledge about the task domain, or a larger set motivated by the kitchen-sink strategy of deriving lots of features and just seeing what works. In either case, it required humans to spend time thinking about which features might work for the task at hand.

In contrast, many current machine learning systems take very low-level inputs (e.g. information about pixel values for the task of image classification) and then the model learns which features are useful for the task. In the machine learning community, we often talk about this advantage in engineering terms: we don’t require human labour to derive features for each task, and given a set of low-level inputs we can have models learn many different features for many different tasks.
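
To make the contrast concrete, here is a minimal sketch of the two approaches (my own illustration, not from the book, using scikit-learn’s digits dataset): one classifier is given a couple of hand-derived summary features, while the other is given raw pixel values and learns its own intermediate features in a hidden layer. The particular features and models are arbitrary; the point is only where the features come from.

```python
# A rough sketch contrasting hand-engineered features with learned features.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
import numpy as np

X_raw, y = load_digits(return_X_y=True)  # raw 8x8 pixel intensities, flattened
X_train, X_test, y_train, y_test = train_test_split(X_raw, y, random_state=0)

# First approach: humans derive a small, task-specific feature set
# (two toy summary statistics per image; purely illustrative).
def hand_features(X):
    return np.column_stack([X.mean(axis=1), (X > 8).sum(axis=1)])

clf_hand = LogisticRegression(max_iter=1000).fit(hand_features(X_train), y_train)

# Second approach: feed the low-level pixel values directly and let the
# model learn its own intermediate features in a hidden layer.
clf_learned = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                            random_state=0).fit(X_train, y_train)

print("hand-crafted features:", clf_hand.score(hand_features(X_test), y_test))
print("learned features:     ", clf_learned.score(X_test, y_test))
```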

Cantwell Smith talks about this in conceptual terms. The model is capable of learning sub-conceptual features, i.e. features that do not neatly conform to our human concepts. A particular model might learn features that we can interpret according to our pre-existing concepts, but it might also learn features that we struggle to interpret at all. This is both a strength and a weakness. It is a strength because the model is not limited to our human pre-conceptions about the way the world is, but a weakness if we care about interpretability — the ability of humans to audit an AI system to see why it made the predictions or took the actions that it did. There has been a big push for more interpretable AI models in recent years, but Cantwell Smith points out that this push could undermine the strengths of second-wave AI systems by forcing them back to the level (and limitations) of human concepts.

Third-Wave AI and AGI

Despite the successes of second-wave AI, Cantwell Smith doesn’t foresee artificial general intelligence (AGI), or human-level general AI, in the near future. His skepticism can be explained first through a specific point and then through a more general one.

First, though second-wave AI does not make the same strong assumptions about reality that first-wave AI did, it is not completely free from those assumptions either. Here is Crane again:

Machine learning may not start with general rules which make ontological assumptions, but it does start with data that is already processed by humans (eg things that we classify as faces, or as road traffic at an intersection and so on). Much machine learning, as Smith says, is “dedicated to sorting inputs into categories of manifest human origin and utility”. So even if they are more sensitive to the messy world, second wave AI machines are still tied up with the programmers’ own classifications of reality – indeed, it is hard to see how they could be otherwise designed.

This is a very important point that I am going to come back to below.

More generally, Cantwell Smith says that both first-wave and second-wave systems are involved in reckoning (calculation) but not judgment. Judgment is an admittedly vague term, though he means it in the same sense as when we say something like “That person exercises good judgment.” I would think it involves wisdom and intentionality (aboutness), at the least. When humans talk, we are talking about things in the world, and about ideas relating to the world. We are not just manipulating symbols. Current AI systems have no commitment to the world, nor do they have the notion that their models of the world are models of the world. Cantwell Smith quotes John Haugeland in saying that current AI systems “don’t give a damn.”

Cantwell Smith is vague about how researchers might go about developing AI systems that do have the properties of 1) judgment, 2) commitment to the world, and 3) giving a damn. He does give a longer list of properties that he thinks an AI system will need to have in order to have the type of intelligence that humans have, but he is not attempting to provide a roadmap to third-wave AI. However, he brings up the example of a parent raising a child. Perhaps an AI will need a great deal of human interaction, supervision, and feedback over long periods in order to develop those properties and commitments.

Sub-Conceptual Outcomes: “What Should AI Be Doing?”

After thinking about deep learning and “sub-conceptual features,” I got to thinking about the idea of sub-conceptual outcomes. In machine learning, we usually denote the input features as X, and the outcome as y, and (supervised) machine learning attempts to map X to y (i.e. predict y, given the values of X). For example, predict whether or not an email is spam, given certain features of the email. So y is an outcome of interest that we want the system to predict, and while machine learning could involve learning sub-conceptual features as part of the learning process, the outcome y itself is wholly conceptual. It is something that we humans want the AI to do.
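
As a minimal sketch of that setup (my own toy example, not anything from the book), here is a supervised spam classifier in which the outcome y is wholly human-defined:

```python
# A toy supervised setup: map email features X to a human-defined outcome y.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Win a free prize now, click here",
    "Meeting moved to 3pm, see agenda attached",
    "Cheap meds, limited time offer",
    "Can you review the draft before Friday?",
]
y = [1, 0, 1, 0]  # 1 = spam, 0 = not spam: the outcome is wholly human-defined

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, y)  # learn the mapping from X to y

# With this toy training set, the prediction should come back as spam (1).
print(model.predict(["free prize offer, click now"]))
```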

What would it look like for an AI system to take a sub-conceptual approach to y, i.e. to learn what the task is, in a way that may not align neatly with human preconceptions of which tasks are interesting in the given domain? If you have some experience with machine learning, you may be thinking that this is what unsupervised learning does. Well, sort of. We often evaluate unsupervised methods in the same way that we evaluate supervised methods, using some extrinsic score. For example, we evaluate an unsupervised topic segmentation method in the same way that we evaluate a supervised one, using a score based on a human gold-standard topic segmentation of some kind. In that case, the unsupervised method may not be guided by y, but it is still evaluated in human-defined terms. Unsupervised methods that use intrinsic evaluation are closer to what I have in mind, as are AI systems that figure out for themselves what the interesting outcomes and tasks of the domain are.
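
Here is a small sketch of the two evaluation styles (my own example, using clustering rather than topic segmentation for brevity): the same unsupervised output can be scored extrinsically, against human gold-standard labels, or intrinsically, using only properties of the output itself.

```python
# Extrinsic vs. intrinsic evaluation of an unsupervised method (clustering).
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score

X, gold_labels = load_iris(return_X_y=True)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Extrinsic: the unsupervised output is still judged against human categories.
print("adjusted Rand vs. gold labels:", adjusted_rand_score(gold_labels, clusters))

# Intrinsic: judged only on properties of the clustering itself,
# with no reference to any human-defined outcome.
print("silhouette score (no labels): ", silhouette_score(X, clusters))
```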

An anticipated objection is that “Of course we want to evaluate AI systems on human-defined outcomes and metrics. The point of building AI systems is to improve our lives, do tasks that are useful and meaningful to us, and to automate tedious things.” To be sure, I want my spam filter to filter spam, and I want the self-driving car to drive me to the airport, not to go off and learn something mysterious and exciting about what it means to live a fulfilled life (then again, how do I know I don’t want it to do that?).

But there are domains where AI objectives are less clear. For example, I work in Computational Social Science — basically at the intersection of machine learning, natural language processing, and social psychology. I research topics such as how conversational language relates to group outcomes. So I was excited when reading Crane’s review to see him bring up conversation:

Consider for example, the challenges faced in trying to create a genuine conversation with a computer. Voice assistants like Siri and Alexa do amazingly well in “recognizing” speech and synthesizing speech in response. But you very quickly get to the bottom of their resources and reach a “canned” response (“here are some webpages relating to your inquiry”). One reason for this, surely, is that conversation is not an activity that has one easily expressible goal, and so the task for the Siri/Alexa programme cannot be specified in advance.

Crane is bringing this up in the context of expressing skepticism about artificial general intelligence:

What, then, is the overall goal of conversation? There isn’t one. We talk to pass the time, to express our emotions, feelings and desires, to find out more about others, to have fun, to be polite, to educate others, to make money … and so on. But if there is no single goal of conversation, then it is even less likely that there is one goal of “general intelligence.”

But whereas Crane is using the complexity of this domain to illustrate reasons for being skeptical about artificial general intelligence, I’m using it to say that we should be cautious about assuming we even understand what the interesting tasks are in this domain. In this research area, we often study groups engaged in artificial tasks, such as ranking items or playing a simple game, and we tend to like these scenarios because they have clear outcomes (potential y variables we can use in experiments), such as a score for the ranking or the win/loss of a game. But most group interactions and conversations in our lives are not like this. For example, did your most recent work meeting have a clear, definable outcome? Could you score the quality of the meeting with a single number? Or was it a mix of experiences? What about your most recent personal conversation?

In applying machine learning to conversational data, we might also give a post-task questionnaire in which we ask the participants questions about their experience, and use some of those as outcome variables that we try to predict. But are we asking the right questions? The potential y variables are completely limited by our imagination and conceptual understanding of the domain. So this is one domain where I can conceive of wanting an AI system to figure out what the interesting outcomes are, and what tasks it should be doing.

AI and Uncertain Preferences

In the previous section, I gave an anticipated rebuttal: of course we want AI systems to do tasks that are meaningful to us, and to act in accordance with our preferences. I’ll conclude by saying that there are scenarios where we are uncertain about our own preferences, and in those cases it is unclear what we (or an AI acting on our behalf) should do. I wrote about this a while back, inspired by the philosopher L.A. Paul. She describes scenarios involving transformative experiences, where a person is considering a decision that will fundamentally transform who they are, including what their preferences will be (there is a fantastical example involving vampires, and a more realistic example about having kids). It’s very difficult to know what to do when a particular decision might change what your preferences are. That is a kind of uncertainty that AI has largely not dealt with. AI researchers spend a lot of time thinking about uncertainty relating to states of the world, or outcomes of actions, but much less time thinking about uncertainty about what our preferences are or how they might change. I highly recommend Stuart Russell’s new book, Human Compatible, if you are interested in AI and uncertainty about preferences.

My point is simply that it changes our view of AI if we accept that we don’t always know what our preferences are or will be, and that we don’t always know what the interesting tasks or outcomes are in a particular domain. Should we give AI systems some leeway in anticipating what our preferences will be, or in figuring out what the interesting outcomes are in a particular domain? Doing so would seem to create some tension with a) interpretable AI (as mentioned earlier regarding sub-conceptual features), and b) the value alignment problem, wherein we want to ensure that the values of a superintelligent AI will align with our human values. Both interpretable AI and value-aligned AI have been widely accepted in the AI community as good goals, but it may be challenging to reconcile them with the idea that AI could be superior to us not just in figuring out a path to a goal, but in figuring out what the goal should be.