Daniel Kokotajlo


Medical Diagnostic Imaging, Leukemia, and Black Holes

Just chiming in to say I agree with Villiam. I actually read your whole post just now, and I thought it was interesting and makes an important claim... but I couldn't follow it well enough to evaluate whether it is true, or a good argument, so I didn't vote on it. I like the suggestion to break your material into smaller chunks, and perhaps add more explanation to each as well.

For example, you could have made your first post something like "Remember that famous image of a black hole? Guess what: it may have been the result of hallucinating signal out of noise. Here's why." Then your next post could be: "Here's a list of examples of this sort of thing happening again and again in physics, along with my general theory of what's wrong with the epistemic culture of physics." For my part at least, I didn't have enough expertise to evaluate your claims about whether these examples really were false positives, nor enough expertise to evaluate whether they were just cherry-picked failures or indicative of a deeper problem in the community. If you had walked through the arguments in more detail, and maybe explained some key terms like "noise floor" etc., then I wouldn't have needed expertise and would have been able to engage more productively.

Developmental Stages of GPTs

I think the purpose of the OT and ICT is to establish that lots of AI safety needs to be done. I think they are successful in this. Then you come along and give your analogy to other cases (rockets, vaccines) and argue that lots of AI safety will in fact be done, enough that we don't need to worry about it. I interpret that as an attempt to meet the burden, rather than as an argument that the burden doesn't need to be met.

But maybe this is a merely verbal dispute now. I do agree that OT and ICT by themselves, without further premises like "AI safety is hard," "the people building AI don't seem to take safety seriously, as evidenced by their public statements and their research allocation," and "we won't actually get many chances to fail and learn from our mistakes," do not establish more than, say, 1% credence in "AI will kill us all," if even that. But I think it would be a misreading of the classic texts to say that they were wrong or misleading because of this; if you went back in time and asked Bostrom right before he published the book whether he agreed with you about the implications of OT and ICT on their own, he probably would have agreed completely. And the text itself seems to agree.

What a 20-year-lead in military tech might look like

Thanks for the constructive criticism. I didn't cite things because this isn't research, just a dump of my thoughts. Now that you mention it, yeah I probably should have talked about area denial tech. I had considered saying something about EMP/jamming/etc., I don't know why I didn't...

For the other stuff you mention, well, I didn't think that fit the purpose of the post. The title is "What a 20-year lead in military tech might look like," not "What military conflicts will look like in 20 years."

What a 20-year-lead in military tech might look like

I mean, I did say as much in the OP basically. Defense is always more cost-efficient than offense; the problem is that when you sacrifice your mobility you have to spread out your forces whereas the enemy can concentrate theirs.

For attacking drones, winning means the same thing it usually does. I don't see how it would be different.

Is the work on AI alignment relevant to GPT?

I think it's a reasonable and well-articulated worry you raise.

My response is that for the graphing calculator, we know enough about the structure of the program and the way in which it will be enhanced that we can be pretty sure it will be fine. In particular, we know it's not goal-directed or even building world-models in any significant way, it's just performing specific calculations directly programmed by the software engineers.

By contrast, with GPT-3 all we know is that it's a neural net that was positively reinforced to the extent that it correctly predicted words from the internet during training, and negatively reinforced to the extent that it didn't. So it's entirely possible that it does, or eventually will, have a world-model and/or goal-directed behavior. It's not guaranteed, but there are arguments that eventually it would have both, if we keep making it bigger, giving it more internet text, and training it for longer. I'm rather uncertain about the arguments that it would develop goal-directed behavior, but I'm fairly confident in the argument that eventually it would have a really good model of the world.

The next question is how that model is chosen. There are infinitely many world-models that are equally good at predicting any given dataset, but that diverge in important ways when it comes to predicting whatever is coming next; which one you end up with comes down to what "implicit prior" is used. And if the implicit prior is anything like the universal prior, then doom, since the universal prior is plausibly dominated by malign, consequentialist world-models. Now, it probably isn't the universal prior. But maybe the same worries apply.
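To make the "reinforced purely for next-word prediction" point concrete, here is a toy sketch of that training objective with a bigram model. Everything here (the vocabulary, the tiny corpus, the learning rate) is made up for illustration; it is not GPT-3's actual training code, just the shape of the objective:

```python
import math

# Toy sketch: a language model optimized purely to predict the next token.
# The loss falls exactly to the extent that the model assigns high
# probability to the token that actually came next; nothing in the
# objective mentions goals or world-models.

vocab = ["the", "cat", "sat", "mat"]
corpus = ["the", "cat", "sat", "the", "cat", "sat"]  # hypothetical tiny "internet"

# Bigram logits: logits[a][b] = score for token b following token a.
logits = {a: {b: 0.0 for b in vocab} for a in vocab}

def probs(prev):
    """Softmax over the logits for the token following `prev`."""
    z = sum(math.exp(v) for v in logits[prev].values())
    return {b: math.exp(v) / z for b, v in logits[prev].items()}

def train_step(lr=0.5):
    """One pass of gradient descent on cross-entropy next-token loss."""
    total = 0.0
    for prev, nxt in zip(corpus, corpus[1:]):
        p = probs(prev)
        total += -math.log(p[nxt])  # penalized for not predicting the actual next word
        for b in vocab:             # exact gradient of cross-entropy w.r.t. the logits
            logits[prev][b] -= lr * (p[b] - (1.0 if b == nxt else 0.0))
    return total / (len(corpus) - 1)

losses = [train_step() for _ in range(50)]
```

After training, `probs("the")["cat"]` is close to 1: the update rule only ever touched predictive accuracy, and anything else the model "knows" is whatever happens to fall out of that.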

What if memes are common in highly capable minds?

One scenario that worries me: at first the number of AIs is small, and they aren't super smart, so they mostly just host normal human memes and seem, as far as we (and even they) can tell, to be perfectly aligned. Then they get more widely deployed, so there are many more AIs, and perhaps smarter ones, and alas it turns out that AIs are a different environment for memes than humans are, in a way that was not apparent until now. So different memes flourish and spread in the new environment, and bad things happen.

What a 20-year-lead in military tech might look like

I was imagining an AI that is less smart and innovative than Elon Musk but thinks faster and can be copied arbitrarily many times. Elon is limited to one brain thinking eighteen hours a day or so; the vast majority of the thought-work is done by the other humans he commands. If we had an AI that could think as well as a pretty good engineer, but significantly faster, and we could make lots of copies fairly cheaply, then we'd be able to do things much faster than Elon can with his teams of humans.

What a 20-year-lead in military tech might look like

Interesting, could you provide a link?

Another interpretation is that the predictions were right about what was possible, but wrong about how long it would take. For example, Kurzweil's 1999 predictions of what 2009 would look like were mostly wrong, but if you instead treat them as predictions about 2019, they are almost entirely correct.
