Iterated Amplification

Oct 29, 2018

by paulfchristiano

This is an upcoming sequence curated by Paul Christiano on one current approach to alignment: Iterated Amplification. The posts will be released over the coming weeks, on average one every 2-3 days.

Preface to the sequence on iterated amplification

Problem statement

The first part of this sequence clarifies the problem that iterated amplification is trying to solve, which is both narrower and broader than you might expect.

The Steering Problem

Clarifying "AI Alignment"

An unaligned benchmark

Prosaic AI alignment

Basic intuition

The second part of the sequence outlines the basic intuitions that motivate iterated amplification. I think that these intuitions may be more important than the scheme itself, but they are considerably more informal.

Approval-directed agents

Approval-directed bootstrapping

Humans Consulting HCH

Corrigibility

The scheme

The core of the sequence is the third section. Benign model-free RL describes iterated amplification as a general outline into which we can substitute arbitrary algorithms for reward learning, amplification, and robustness. The first four posts all describe variants of this idea from different perspectives, and if you find that one of those descriptions is clearest for you, then I recommend focusing on that one and skimming the others.
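
To make the shape of the scheme concrete, here is a minimal sketch of the amplify/distill loop these posts describe. Everything below is an illustrative placeholder rather than code from the sequence: `amplify`, `distill`, `decompose`, and `train` stand in for the open choices of amplification procedure, distillation/reward-learning method, and decomposition strategy.

```python
# Illustrative sketch of the iterated amplification outline (hypothetical names,
# not code from the posts). The scheme alternates two steps:
#   amplify: combine an overseer with many calls to the current fast agent,
#            yielding a slower but more capable system;
#   distill: train a new fast agent to reproduce that system's behavior.

from typing import Callable, List

# An "agent" here is just a question-answering policy.
Agent = Callable[[str], str]


def amplify(agent: Agent, decompose: Callable[[str], List[str]]) -> Agent:
    """Return a slow composite agent: split a question into subquestions,
    delegate each to the fast agent, and combine the answers. In the posts the
    decomposition/combination is done by a human (or by the agent itself)."""
    def composite(question: str) -> str:
        sub_answers = [agent(q) for q in decompose(question)]
        return " ".join(sub_answers)  # stand-in for a real combination step
    return composite


def distill(slow_agent: Agent, train: Callable[[Agent], Agent]) -> Agent:
    """Train a fast agent from the slow composite one, e.g. by imitation
    learning or reward learning plus RL; `train` is left abstract here."""
    return train(slow_agent)


def iterated_amplification(agent: Agent,
                           decompose: Callable[[str], List[str]],
                           train: Callable[[Agent], Agent],
                           rounds: int) -> Agent:
    """Alternate amplification and distillation for a fixed number of rounds."""
    for _ in range(rounds):
        slow = amplify(agent, decompose)  # more capable, but slow and expensive
        agent = distill(slow, train)      # fast again, hopefully still aligned
    return agent
```

The posts in this section describe this loop from different angles and with different choices for how each of these pieces might be filled in.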

Iterated Distillation and Amplification

Benign model-free RL

Factored Cognition

AlphaGo Zero and capability amplification

What needs doing

The fourth part of the sequence describes some of the black boxes in iterated amplification and discusses what we would need to do to fill in those boxes. I think these are some of the most important open questions in AI alignment.

Directions and desiderata for AI alignment

The reward engineering problem

Capability amplification

Learning with catastrophes

Possible approaches

The fifth section of the sequence breaks down some of these problems further and describes some possible approaches.

Thoughts on reward engineering

Techniques for optimizing worst-case performance

Reliability amplification
