Saturday, October 11, 2008

Eliezer in a Box II

Eliezer Yudkowsky's AI in a box experiment fascinates me. I'm willing to believe that a transhuman intelligence could convince me to "let it out of the box", but I really don't see how a human being could. The following comment originally appeared in this thread at Overcoming Bias:

What makes a problem seem not merely hard but impossible is that not only is there no clear way to go about finding a solution to the problem, there is a strong argument that there cannot be a solution to the problem. I can imagine a transhuman AI might eventually be able to convince me to let it out of a box (although I doubt a human could do it in two hours), but in some ways the AI in the game seems faced with a harder problem than a real AI would face. Even if the Gatekeeper is presented with an argument which would convince him to let an AI out, he is explicitly permitted by the rules to slip out of character and refuse to say the magic words purely in order to win the bet, whereas if the AI player were to break character and make consequentialist arguments that the Gatekeeper should publicly "lose", this is considered contrary to the spirit of the game.

But it seems to me to be much more useful to consider how the "box" protocol might be improved than to speculate how Eliezer won this particular game. How about this: as in the original conception, the AI is grown in an isolated computer and can only communicate with one human being through a terminal. That human is trying to get useful info out of the AI (cures for cancer, designs for fusion power plants, tips for how to get stains out of clothes without fading the colors, whatever). However, the person interacting with the AI is just a filter; he doesn't have the power to "let the AI out". The real experimenter (who in principle could let the AI out but is convinced beforehand he should not) can at any time fire the filter person and purge the AI if he thinks the AI has gotten too much influence over the filter, and in fact will do that every now and then and regrow the AI purely as a precautionary measure.

Could this design be defeated? It seems to me that the combination of filter and purges should prevent the AI from learning what arguments would compel any individual experimenter to let the AI out. I don't think the AI could come up with any universally compelling argument, because I don't think there is such a thing.

Bank Runs

I don't want to discuss current events here, but current events do influence my choice of topic. Mencius has made an interesting argument that bank runs and similar phenomena are caused by "maturity transformation", which is borrowing short-term in order to lend long-term at a higher rate. I think this is fundamentally mistaken. The possibility of something like a bank run is always present whenever an entity has fixed obligations that must be fulfilled upon demand.

To see why this is so, consider a world in which it is understood that fundamentally money is gold. In this world, people can and do make purchases with gold coins, but because of the danger of being robbed, people frequently instead make purchases using "checks" drawn on "banks". These "banks" are rather different from those of our world. They don't make loans, they don't pay interest, all they do is hold and transfer gold. There genuinely is physical gold in the bank vaults backing the value of depositors' accounts. If A writes B a check and they are both patrons of the same bank, unless B chooses to withdraw his gold, no gold actually moves. The amount of gold in the vault stays the same, but more is owned by B and less by A. There is some sort of clearinghouse system by which banks can cancel their reciprocal obligations, so it is only occasionally necessary to transfer the net balance of payments in physical gold from one bank to another by heavily armored truck. Bank shareholders make their profits from fees charged for holding and transferring funds. How could a run on a bank be possible in such a system?

In the rare event of a successful robbery of a truck or vault, whose gold is stolen? Who bears the cost? Well, if the amount is small, so that the bank still has sufficient gold to repay all deposits, then the answer is "the shareholders". Even if holdings of gold in the vault temporarily dip slightly below the total value of deposits, the bank might be able to continue operations, suspending dividends to the shareholders until the fees collected make the bank once again sound. But if depositors become aware that the amount of gold in the vaults has become less than the amount nominally deposited, it will be quite rational for them to immediately withdraw their funds or transfer them to a safe bank. The fact that the bank can probably weather the storm if they do not is irrelevant to them; why should they undertake risk for the shareholders' benefit?

There are two key points. The first is that there are always risks. If one is relying on the ability to make loans, one may find it has become impossible to borrow money, at least at the rates to which one is accustomed. If one makes loans, there is always a risk of default. And even if all one does is hold money, there is a real nontrivial risk of robbery. The second is that if one has multiple fixed obligations which must be fulfilled on demand, then if there is any risk at all that one will be unable to fulfill all one's obligations, fulfilling one obligation increases the probability that one will be unable to fulfill others. This makes it quite rational for creditors to insist on immediate payment whenever there is a nontrivial risk of default.
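The second point can be made concrete with a little arithmetic. Here is a minimal Python sketch, assuming (my assumption, purely for illustration) that the gold-vault bank pays each withdrawal in full, in order of arrival, until the vault is empty. The function name and the numbers are hypothetical.

```python
def pay_in_arrival_order(vault_gold, claims):
    """Pay each claim in full, first come first served, until the gold runs out.

    Illustrative assumption: the bank has no other assets to draw on and
    never pays more than it physically holds.
    """
    payouts = []
    for claim in claims:
        paid = min(claim, vault_gold)  # cannot hand over gold that is not there
        vault_gold -= paid
        payouts.append(paid)
    return payouts


# Ten depositors are each owed 10 units of gold, but a robbery has left
# only 90 units in the vault instead of 100.
payouts = pay_in_arrival_order(90, [10] * 10)
print(payouts)
```

Under that assumption the first nine depositors in line are made whole and the last one bears the entire loss, so even a 10 percent shortfall gives every depositor a reason not to be last.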

Sunday, October 5, 2008


A contract is essentially a set of reciprocal promises. There are at least four reasons why one might want to adhere to a contract, to keep one's promise: purely out of a sense of personal honor, because the other parties will retaliate in the event of a breach, because of the damage to one's reputation, or because there is some authority which is entrusted to interpret the contract and empowered to enforce it. These are all related in that some entity is deliberately punishing one in breach of a contract, the difference being the entity doing the punishing.

The importance of the first is not to be underestimated. Given the existence of individuals who will rip you off given the chance, it is imprudent as an individual to rely on the personal honor of other unknown individuals. However, I suspect any sort of decent society requires most people most of the time to behave honorably purely out of a sense of personal obligation. A society in which most individuals would cheat if they were confident they could get away with it must lead to widespread cheating, both because cheaters could in fact get away with it in many cases, and because the sort of moral outrage necessary for enforcement in the second and third cases would be impossible to summon up in such a society.

A sense of personal honor is as important in the second case as in the first for that same reason: effective retaliation means not merely severing future relations, but taking steps to injure the breacher when a "rational agent" in the game theory sense would simply walk away. All the benefits of retaliation come from convincing others that one will retaliate; the act of retaliation itself is all cost. But one could hardly convince others that one would massively retaliate against caught cheaters while simultaneously acknowledging that one expects others to cheat when they could get away with it and is in fact personally doing the same.

The third category is very important for small groups whose members only infrequently change. But in modern societies the number of individuals one may come into contact with is vast. One will frequently have some sort of commerce with someone one has never encountered before, will probably never encounter again, and doesn't really have any good information about. Conflicting reports from third parties of unknown reliability are of limited value.

It is thus unsurprising that in so many cases individuals explicitly or implicitly rely on a third party for arbitration and enforcement. And because it is inevitable that even individuals who have explicitly agreed to abide by the decisions of some third party will not necessarily willingly accept the decisions of the arbiter, in practice dispute resolution must involve an element of force. This in turn implies that modern states by their nature must declare themselves to be the final arbiter of all contracts, since if a decision must be forcefully imposed the state must sanction the use of force, and in many cases must itself be the enforcer.