Saturday, January 7, 2012

Self Modification (Part 2)




So we want to self modify, but not to be stupid about it.

We basically need to watch out for two types of fatal mistakes. We need to protect our terminal values, and we need to not break the part of us in control of self modification. As long as we do that, we can recover from most other mistakes. The only other requirement is that we can make progress on average.

So make sure you don't go changing your utility function. If you want something for instrumental reasons, then by all means fully update when the situation changes, but don't change what you really want. To be honest, I'm not really sure how you'd do this even if you wanted to. There are people that will stick to a silly instrumental goal to the point of acting like it's terminal, but that's just a bad case of lost purpose. Of course, since you're acting like it's a terminal goal, the results are just as bad - the difference is that it is recoverable. Sometimes people's far mode parts declare war on their near selves, but I expect them to be relatively ineffective at changing terminal values and to end up burning out.

And be really, really careful before fucking with the part of you that does self modification, or any part that might affect the part that does self modification. Be aware of how much you enjoy self modification, and turn the process on itself if it is ever at risk of becoming an ugh field. Keep a call stack and occasionally check on it so that you don't end up chasing a lost purpose. Make sure you actually check to see if it was an improvement. Leave yourself a line of retreat so you can revert to the last known good configuration. For example, "Can't handle thought of death, self modify to like death" is BAD. "Can't handle thought of death, so don't think about it until there's a reason to" is much less scary. That way, when something comes along that gives you a fighting shot at doing something about it, you notice that death actually isn't good and you ought to act (<cough> cryonics <cough>).

I feel like this should go without saying, but don't choose to self-deceive. While you actually can self-deceive (and he later admitted this), the points for why you don't want to are valid and important. In short, you can't predict what sort of consequences this might have, and if you lose track of the fact that your belief is wrong, then you won't notice when your deception stops being helpful. Luckily, self-deception isn't necessary. If it ever looks like people are better off being biased, then it is because the less biased people are systematically choosing predictably bad actions (kinda like the case against 'randomness'). There is always an even better option that doesn't involve irrationality. Find out which actions win, and do those with correct beliefs, even if those actions are fine-grained mental processes for which we don't have good words.


So as long as you can manage that, there's not a whole lot to be scared of. Sure, you might have setbacks, but now it's just another project. Making progress on average shouldn't be that hard.

Regression to the mean just isn't that scary - just climb the gradient. In general, you can safely avoid regression toward the mean if you take a gradient climbing approach instead of taking large semi-blind leaps. Take baby steps, then check whether each one was an improvement and whether the direction to move next has changed (but know that some habits are harder to get rid of than they are to create, and you don't want to end up with polar bears that you can't get rid of). Now you just have to worry about local optima and discontinuities (and those don't seem to exist in scary quantities here).
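
If it helps to see the baby-steps idea written down, here is a toy sketch in Python. It is only an illustration, not a model of anyone's actual mind: the `wellbeing` function, the step size, and the single number standing in for "your current configuration" are all made up for the example. The point is just the loop: take a small step, check whether it was an improvement, and keep the last known good configuration if it wasn't.

```python
import random


def climb(score, start, step=0.25, iterations=2000, seed=0):
    """Greedy hill climbing: try a small change, keep it only if it scores better."""
    rng = random.Random(seed)
    best = start                  # last known good configuration
    best_score = score(best)
    history = [best]              # every configuration we actually adopted
    for _ in range(iterations):
        candidate = best + rng.uniform(-step, step)   # a baby step, not a leap
        candidate_score = score(candidate)
        if candidate_score > best_score:
            # It really was an improvement, so adopt it.
            best, best_score = candidate, candidate_score
            history.append(best)
        # Otherwise do nothing: we stay at the last known good configuration.
    return best, best_score, history


def wellbeing(x):
    """Hypothetical score with a single peak at x = 3 (purely illustrative)."""
    return -(x - 3.0) ** 2


best, best_score, history = climb(wellbeing, start=0.0)
```

Because a change is only kept when it measurably helps, the score never gets worse from one adopted step to the next. What this toy version does not handle is exactly what the paragraph above warns about: a bumpier score function could leave it stuck on a local optimum.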

(Besides, ECT looks like it works by intentionally regressing you to the mean. It creates amnesia which seems to be correlated with success over various treatment types/intensities, and most people relapse. It looks a lot to me like running around breaking connections and throwing you into a new randomly selected brain state. This often gets you out of the depression attractor, but then you usually find your way back in and relapse if you don't change anything else. I'm not the only one to make this connection, but I can't find the reference.)

If you make your explicit reasoning trustworthy, then you should be able to avoid the valley. If you think you know better than to trust your gut, then step outside the argument, and ask whether this still looks like a good idea from the outside view, taking into account the first bullet point. Or better yet, ask your gut why it thinks what it does, and what sort of changes would be necessary to change its mind.
