Ben Pace

I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality.

Longer bio:


AI Alignment Writing Day 2019

Transcript of Eric Weinstein / Peter Thiel Conversation

AI Alignment Writing Day 2018

Share Models, Not Beliefs


Is OpenAI increasing the existential risks related to AI?
Answer by Ben Pace · Aug 12, 2020

I think it's fairly self-evident that you should have exceedingly high standards for projects intending to build AGI (OpenAI, DeepMind, others). It's really hard to reduce existential risk from AI, and I think much thought around this has been naive and misguided. 

(Two examples of this outside of OpenAI: senior AI researchers talking about military use of AI instead of misalignment, and senior AI researchers responding to the problems of specification gaming by saying "objectives can be changed quickly when issues surface" and "existential threats to humanity have to be explicitly designed as such".)

An obvious reason to think OpenAI's impact will be net negative is that they seem to be trying to reach AGI as fast as possible, via a route different from DeepMind's and other competitors', and so are plausibly shortening the timeline to AGI. (I'm aware that there are arguments for why a shorter timeline is better, but I'm not sold on them right now.)

There are also more detailed conversations to be had, about alignment, what the core of the problem actually is, and other strategic questions. I expect (and infer from occasional things I hear) that I have substantial disagreements with OpenAI decision-makers, which I think is alone sufficient reason for me to feel doomy about humanity's prospects.

That said, I'm quite impressed with their actions around release practices and also their work in becoming a profit-capped entity. I felt like they were a live player with these acts and were clearly acting against their short-term self-interest in favour of humanity's broader good, with some relatively sane models around these specific aspects of what's important. Those were both substantial updates for me, and make me feel pretty cooperative with them.

And of course I'm very happy indeed about a bunch of the safety work they do and support. The org gives lots of support and engineering resources to people like Paul Christiano and Chris Olah, which I think is better than what those people would probably get counterfactually, and I'm very grateful that the organisation provides this.

Overall I don't feel my opinion is very robust, and it could easily change. Here are some examples of things that could substantially change my opinion:

  • How senior decision-making happens at OpenAI
  • What technical models of AGI senior researchers at OpenAI have
  • Broader trends that would have happened to the field of AI (and the field of AI alignment) in the counterfactual world where they were not founded
Is OpenAI increasing the existential risks related to AI?
Answer by Ben Pace · Aug 12, 2020

See all the discussion under the OpenAI tag. Don't forget SSC's post on it either.

I mostly think we had a good discussion about it when it launched (primarily due to Ben Hoffman and Scott Alexander deliberately creating the discussion).

Zombies: The Movie

Bayesop's Fables is a great name, I'm stealing it.

Tags Discussion/Talk Thread

All of them get tagged AI. Not all of the technical content gets tagged AI risk – for example, when Scott Garrabrant writes curious things like a prisoner's dilemma with costs to modelling, this is related to embedded agency, but it's not clearly relevant to AI risk, only indirectly at best. The posts that are explicitly about AI risk, such as What Failure Looks Like or The Rocket Alignment Problem, do get tagged AI risk.

Tags Discussion/Talk Thread

I think AI Risk is open to improvement as a name, but it's definitely a narrower category than AI. AI includes reviews of AI textbooks, explanations of how certain ML architectures work, and anything else relating to AI. AI risk is about the downside risk and analysis of what that risk looks like.

Marcello's Shortform

Hm, I like this, I feel resolved against 'aspiring rationalist', which was always losing anyway because it's a longer and less catchy phrase.

My take on CHAI’s research agenda in under 1500 words

Curated. A short and sweet summary of a research approach being pursued in AI alignment, one that I think has not been written up like this before (especially on LessWrong).
