How can we control intelligent systems no one fully understands?

The widespread conversations about AI took a new turn in March 2016 when Microsoft launched, then quickly unplugged, Tay, its artificial intelligence chat robot. Within 24 hours, interactions on Twitter had turned the bot, modeled after a teenage girl, into a “Hitler-loving sex robot.”

This controversy, on the heels of the Feb. 14, 2016, accident involving Google’s self-driving car, has ignited a new debate over artificial intelligence. How should we design intelligent learning machines that minimize undesirable behavior?

While both of the aforementioned incidents were relatively minor, they highlight a broader concern, namely, that it is very difficult to control adaptive learning machines in complex environments. The famous cybernetician Norbert Wiener warned us about this dilemma more than 50 years ago: “If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively … we had better be quite sure that the purpose put into the machine is the purpose we really desire and not merely a colorful imitation of it.”

The difficulty with learning machines arises because they train themselves on what they see, the “training data,” which cannot completely represent the situations the machine will encounter in the future. These are sometimes called “edge cases” or “Rumsfeldian unknowns,” in that they are unknowable in advance. Humans typically deal with such edge cases remarkably well, with no prior “training,” by applying common sense or finding analogies and learning from these samples of one.

There is no clear answer to this vexing issue at the moment. However, it is essential for designers of such systems to assess worst-case implications in terms of the risks associated with future edge cases.


For starters, the analysis of errors on the training set is essential in helping us understand what a system has learned in the first place. Understanding what the machine has learned is a nontrivial activity. For example, in a recent project involving estimating the audience on TV stations at different times of day, we found that a system’s predictions were badly off at certain times in some regions of the country.

A detailed analysis of the errors revealed that the system wasn’t aware of special sports schedules in the relevant regions at those times. By adding this data to the training set and retraining the system, its performance improved dramatically in the problematic cases, while it stayed the same for the others, which suggested that the machine had learned to use the new knowledge in dealing with previously problematic cases. While this doesn’t address the problem in general, it is the first line of attack in making systems robust.
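To make the idea concrete, here is a minimal sketch of one common way to carry out such an error analysis: slice the prediction errors by segment (here, region and hour of day) and look for segments where the error is unusually large. The column names and toy data are hypothetical, not taken from the project described above; segments that stand out point to the data worth adding before retraining.

```python
# A minimal sketch of slicing prediction errors by segment to find where a model
# is badly off. Assumes a pandas DataFrame with hypothetical columns
# "region", "hour", "actual", and "predicted".
import pandas as pd

def error_by_segment(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize absolute percentage error per (region, hour) segment."""
    df = df.copy()
    df["abs_pct_error"] = (df["predicted"] - df["actual"]).abs() / df["actual"].clip(lower=1)
    return (
        df.groupby(["region", "hour"])["abs_pct_error"]
          .agg(["mean", "count"])
          .sort_values("mean", ascending=False)
    )

# Toy example: the (west, 21) segment has a large error, suggesting a missing
# signal (for instance, an unmodeled local sports schedule).
toy = pd.DataFrame({
    "region":    ["west", "west", "east", "east"],
    "hour":      [20, 21, 20, 21],
    "actual":    [100, 400, 120, 110],
    "predicted": [110, 150, 118, 112],
})
print(error_by_segment(toy))
```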

Another strategy is to leverage the globally available human intelligence on the Internet to create edge cases where the machine is likely to fail. Crowdsourcing is a commonly used method to obtain or process such cases. Some of my colleagues have set up a system called “beat the machine,” in which humans are asked to identify cases where the machine’s predictions will be wrong. Each time they succeed, those cases are fed back to the machine, which becomes better at handling similar cases in the future.
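As a rough illustration of that feedback loop (not the actual “beat the machine” system), the sketch below accepts human-submitted challenge cases, checks whether the current model gets them wrong, and folds confirmed failures back into the training data. The model interface and labels are assumptions for the sake of the example.

```python
# A sketch of a "beat the machine"-style loop: humans submit cases they believe
# the model will get wrong; confirmed failures are added back to the training data.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class BeatTheMachine:
    predict: Callable[[str], str]                                   # current model's prediction function
    train_data: List[Tuple[str, str]] = field(default_factory=list)
    confirmed_failures: List[Tuple[str, str]] = field(default_factory=list)

    def submit(self, example: str, human_label: str) -> bool:
        """Return True (and record the case) if the human beat the machine."""
        if self.predict(example) != human_label:
            self.confirmed_failures.append((example, human_label))
            self.train_data.append((example, human_label))
            return True
        return False

# Usage: a trivial stand-in model that labels everything "benign".
btm = BeatTheMachine(predict=lambda text: "benign")
print(btm.submit("obviously hateful content", "hateful"))  # True -> case added to training data
```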

There are other ways, some more automated, to add an adversary to the learning process, where the adversary’s role is to trip up the learning system. For example, adversarial examples can be constructed synthetically by slightly modifying real training examples with the goal of inducing errors. These are especially useful for cases where a system is highly confident in its predictions: if the system cannot distinguish between a real example and a slightly modified one, it has not yet learned a model robust enough to trust for automated use.
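One well-known way to construct such perturbations is the fast gradient sign method; the sketch below applies it to a toy logistic-regression model, where the gradient of the loss with respect to the input is available in closed form. The weights, input, and step size are purely illustrative.

```python
# A minimal sketch of constructing an adversarial example by slightly modifying a
# real input (an FGSM-style perturbation on a hand-rolled logistic-regression model).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x: np.ndarray, y: int, w: np.ndarray, b: float, eps: float) -> np.ndarray:
    """Shift x by eps in the direction that most increases the model's log-loss."""
    p = sigmoid(w @ x + b)        # model's predicted probability of class 1
    grad_x = (p - y) * w          # d(log-loss)/dx for logistic regression
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.5, 0.5])          # a confidently classified "real" example
y = 1

print("original confidence:", sigmoid(w @ x + b))    # ~0.92
x_adv = fgsm_perturb(x, y, w, b, eps=0.8)
print("perturbed confidence:", sigmoid(w @ x_adv + b))  # drops to ~0.53
```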

The goal of such strategies is to make the system more robust by injecting new, and possibly bizarre, cases into training that are not available in the existing data.

It is also worthwhile to estimate the costs of errors in worst-case scenarios. There is a large body of literature on risk in the finance industry that uses the concepts of severity and frequency to quantify risk in monetary or similar units. To use such frameworks, we need to construct distributions of losses associated with various outcomes, including edge events, even though we may not be able to anticipate in advance what those events will be. If the tails of such distributions cannot be estimated reliably, chances are the system is not yet ready for autonomous operation.
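As a sketch of what such a framework might look like, the code below computes two standard tail measures from the finance literature, value-at-risk and expected shortfall, over a simulated heavy-tailed loss distribution. The simulated losses are purely illustrative stand-ins for real loss data.

```python
# A minimal sketch of quantifying worst-case error costs from an assumed loss
# distribution, using value-at-risk (VaR) and expected shortfall.
import numpy as np

def tail_risk(losses: np.ndarray, alpha: float = 0.99) -> tuple:
    """Return (VaR, expected shortfall) at the alpha tail of observed losses."""
    var = np.quantile(losses, alpha)       # loss exceeded only (1 - alpha) of the time
    es = losses[losses >= var].mean()      # average loss within the worst (1 - alpha) tail
    return var, es

rng = np.random.default_rng(0)
# Heavy-tailed simulated losses stand in for the (unknown) distribution over edge events.
losses = rng.lognormal(mean=0.0, sigma=1.5, size=100_000)
var_99, es_99 = tail_risk(losses, alpha=0.99)
print(f"99% VaR: {var_99:.2f}, 99% expected shortfall: {es_99:.2f}")
```

If the tail estimates swing wildly as more loss data accumulates, that instability itself is evidence the system should not yet be trusted to operate autonomously.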

The 22nd annual KDD-2016 Conference on Knowledge Discovery and Data Mining will take place August 13-17 in San Francisco.