Alex Turner, Oregon State University PhD student working on AI alignment.

TurnTrout's Comments

Possible takeaways from the coronavirus pandemic for slow AI takeoff

I'm not Wei, but I think my estimate falls within that range as well. 

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

If it isn't constant-length, then it seems strange to assume Solomonoff induction would posit a large objective universe, given that such positing wouldn't help it predict its inputs efficiently (since such prediction requires locating agents).

But a Solomonoff inductor doesn't rank hypotheses by whether they allow efficient predictions of some feature of interest; it ranks them by posterior probability (prior probability weighted by how accurately the hypothesis has predicted the observations so far).
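That ranking rule can be sketched concretely. The hypothesis names, program lengths, and likelihoods below are all invented for illustration; the point is only the shape of the scoring rule (simplicity prior times predictive accuracy):

```python
# Toy illustration: a Solomonoff-style ranking scores hypotheses by
# prior (shorter program => higher prior) times predictive accuracy,
# not by how "efficiently" they predict some feature of interest.
# Hypothesis names and numbers here are invented for this sketch.

hypotheses = {
    # name: (program_length_bits, per-observation likelihoods)
    "small_subjective_universe": (10, [0.9, 0.9, 0.9]),
    "large_objective_universe":  (25, [0.99, 0.99, 0.99]),
}

def unnormalized_posterior(length_bits, likelihoods):
    prior = 2.0 ** -length_bits   # simplicity prior over programs
    accuracy = 1.0
    for p in likelihoods:         # probability of the data so far
        accuracy *= p
    return prior * accuracy

ranked = sorted(
    hypotheses.items(),
    key=lambda kv: unnormalized_posterior(*kv[1]),
    reverse=True,
)
for name, (bits, obs) in ranked:
    print(name, unnormalized_posterior(bits, obs))
```

Here the shorter hypothesis wins despite slightly worse per-observation accuracy, because the 15-bit length difference dominates the likelihood gap.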

Open & Welcome Thread—May 2020

Welcome! Come on in, the water's fine.

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

The way I understand Solomonoff induction, the complexity of specifying the observer doesn't scale logarithmically with the total number of observers. It's not as if there's a big phone book of observers in which locations are recorded. Rather, the relevant complexity is that of saying "and my camera is here".

AGIs as populations

In other words, I think of patching your way to good arguments

As opposed to what?

TurnTrout's shortform feed

Sentences spoken aloud are a latent space embedding of our thoughts; when trying to move a thought from our mind to another's, our thoughts are encoded with the aim of minimizing the other person's decoder error.

Conclusion to 'Reframing Impact'

If you're managing a factory, I can say "Rohin, I want you to make me a lot of paperclips this month, but if I find out you've increased production capacity or upgraded machines, I'm going to fire you". You don't even have to behave greedily – you can plan for possible problems and prevent them, without upgrading your production capacity from where it started.

I think this is a natural concept and is distinct from particular formalizations of it.

Edit: consider these three plans:

  1. Make 10 paperclips a day.
  2. Make 10 paperclips a day, but also take over the planet and control a paperclip conglomerate which could turn out millions of paperclips each day, even though it in fact never does.
  3. Take over the planet and make millions of paperclips each day.
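The distinction between these plans can be made concrete with a toy scoring rule. All numbers, names, and the penalty weight below are invented for this sketch; the point is that the penalty targets gains in capacity, not output:

```python
# Toy illustration of the factory example above: score each plan by
# paperclips produced, minus a penalty for increasing production
# capacity. All numbers and names are invented for this sketch.

PENALTY_WEIGHT = 10_000_000  # how strongly we penalize capacity gains

def score(paperclips, capacity_increase):
    """Reward output, penalize gains in production capacity."""
    return paperclips - PENALTY_WEIGHT * capacity_increase

plans = {
    "1. make 10/day": score(10, capacity_increase=0),
    "2. make 10/day, but seize capacity": score(10, capacity_increase=1),
    "3. seize capacity, make millions": score(1_000_000, capacity_increase=1),
}

# Plan 2 is penalized even though its *output* never changes: the
# penalty fires on capability, not on production.
best = max(plans, key=plans.get)
print(best)  # the modest plan 1 wins
```

With a large enough penalty weight, plans 2 and 3 are both ruled out, even though plan 2's actual paperclip output is identical to plan 1's.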

Conclusion to 'Reframing Impact'

Why do you object to the latter?

Conclusion to 'Reframing Impact'

if we believe that regularizing A's pursuit keeps A's power low

I don't really believe the premise

With respect to my specific proposal in the superintelligent post, or the conceptual version?

Conclusion to 'Reframing Impact'

I've updated the post with epistemic statuses:

  • AU theory describes how people feel impacted. I'm darn confident (95%) that this is true.
  • Agents trained by powerful RL algorithms on arbitrary reward signals generally try to take over the world. Confident (75%). The theorems on power-seeking only apply in the limit of farsightedness and optimality, which isn't realistic for real-world agents. However, I think they're still informative. There are also strong intuitive arguments for power-seeking.
  • CCC is true. Fairly confident (70%). There seems to be a dichotomy between "catastrophe directly incentivized by goal" and "catastrophe indirectly incentivized by goal through power-seeking", although Vika provides intuitions in the other direction.
  • AUP prevents catastrophe (in the outer alignment sense, and assuming the CCC). Very confident (85%).
  • Some version of AUP solves side effect problems for an extremely wide class of real-world tasks, for subhuman agents. Leaning towards yes (65%).
  • For the superhuman case, penalizing the agent for increasing its own AU is better than penalizing the agent for increasing other AUs. Leaning towards yes (65%).
  • There exists a simple closed-form solution to catastrophe avoidance (in the outer alignment sense). Pessimistic (35%).
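For readers without the earlier posts at hand, the kind of AUP penalty these statuses assess has roughly the following shape (a from-memory sketch, not a canonical statement: λ is a penalty weight, the R_i are auxiliary reward functions, and ∅ is the no-op action):

```latex
R_{\text{AUP}}(s, a) \;=\; R(s, a) \;-\; \frac{\lambda}{|\mathcal{R}|} \sum_{R_i \in \mathcal{R}} \left| Q_{R_i}(s, a) - Q_{R_i}(s, \varnothing) \right|
```

The penalty term charges the agent for actions that change its attainable utility for the auxiliary goals relative to doing nothing, which is what "penalizing the agent for increasing its own AU" refers to above.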