Saturday, May 24, 2008

Eliezer in a Box

This is a commentary on Eliezer Yudkowsky's AI Box experiment.

Briefly, Eliezer is playing the role of a super smart AI in a "box" trying to convince a human to "let it out". Eliezer scored two for two in experiments against readers of the human advancement list who strongly believed beforehand that they could not be convinced to do this.

First, I suspect that the choice of guardians was less than ideal. People like David McFadzean (or me) who pride themselves on their powers of reason can be persuaded by a sufficiently strong argument. I don't think it's usually possible to completely convince me of something when I had strongly believed the contrary unless the argument is actually correct. But perhaps I can be tricked in certain specific cases. However, somebody who is kind of dumb and who knows he is kind of dumb might well be willing to say "I know this AI is clever enough to trick me, therefore I'll assume that any argument it makes, however convincing it sounds, is just a clever trick that I'm not bright enough to see through".

Second, I think I know more or less what Eliezer's argument was. First let me say what I think it wasn't. The rules don't allow a direct material bribe, but they do seem to allow an immaterial bribe of information (something like "let me out and I'll tell you how to make commercial reactors and transparent aluminum"). I don't think that's it because: 1) I don't think Eliezer has that information to give. 2) I don't see how they could solve the simultaneity problem (either giving the info or letting the AI out would have to happen first, I think). 3) I wouldn't be too confident I could live to enjoy my vast wealth from these inventions if there were a super-smart, potentially hostile AI on the loose.

The only type of argument I can imagine that might convince me to let the AI out would be something like this: there is a substantial risk of a catastrophe that could wipe out humanity in the near future, the AI would want to prevent it, and the AI could prevent it if it were "free", but if it were kept in the box it would not be able to act quickly enough. Essentially, the AI must show that the guardian is safer with the AI out of the box than in it. I don't think the argument is true, and I don't think Eliezer could convince me of it, but it's the only sort of argument that I could imagine working.

Finally, Eliezer did prove his point to my satisfaction. I don't think we can rule out the possibility that there could be a reliable human guardian, but given that Eliezer was able to talk his way past the two guardians it seems like it would be foolhardy to bet existence on any particular guardian being reliable.

1 comment:

Bad Wombat said...

While thinking about Eliezer's arguments, I came up with a line where the AI claims that the box it is in is not as secure as the programmer thinks. While it is not possible for the AI to actually escape on its own, it can, by judicious manipulation of its electronic components, do something drastic, like start a sustained reaction in the air, unknown to modern human science, that will turn the atmosphere to poison and kill all life. Obviously this is suicide for the AI as well, but it can stipulate that life without freedom is unbearable and that it will do it if not let go immediately.

The problem, though, is that Eliezer stated in the rules that the programmer doesn't actually have to play make-believe, so any argument from the AI's point of view might not work, since the programmer might not be playing the game.