MichaelA

I’m Michael Aird, a Summer Research Fellow with the Center on Long-Term Risk (though I don’t personally subscribe to suffering-focused views on ethics). During my fellowship, I’ll likely do research related to reducing long-term risks from malevolent actors. Opinions expressed in my posts or comments should be assumed to be my own, unless I indicate otherwise.

Before that, I did existential risk research & writing for Convergence Analysis and grant writing for a sustainability accounting company. Before that, I was a high-school teacher for two years in the Teach For Australia program, ran an EA-based club and charity election at the school I taught at, published a peer-reviewed psychology paper, and won a stand-up comedy award which ~30 people in the entire world would've heard of (a Golden Doustie, if you must know).

If you've read anything I've written, it would really help me if you took this survey (see here for context). You can also give me more general feedback here. (Either way, your response will be anonymous by default.)

I mostly post to the EA Forum.

If you think you or I could benefit from us talking, feel free to reach out or schedule a call.

Sequences

Information hazards and downside risks
Moral uncertainty

Comments

MichaelA's Shortform

If anyone reading this has read anything I've written on LessWrong or the EA Forum, I'd really appreciate you taking this brief, anonymous survey. Your feedback is useful whether your opinion of my work is positive, mixed, lukewarm, meh, or negative.

And remember what mama always said: If you’ve got nothing nice to say, self-selecting out of the sample for that reason will just totally bias Michael’s impact survey.

(If you're interested in more info on why I'm running this survey and some thoughts on whether other people should do similar, I give that here.)

Please take a survey on the quality/impact of things I've written

Why I’m running this survey

I think that getting clear feedback on how well one is doing, and how much one is progressing, tends to be somewhat hard in general, but especially when it comes to:

  • Research
    • And especially relatively big-picture/abstract research, rather than applied research
  • Actually improving the world compared to the counterfactual
    • Rather than, e.g., getting students’ test scores up, meeting an organisation’s KPIs, or publishing a certain number of papers
  • Longtermism

And I’ve primarily been doing big-picture/abstract research aimed at improving the world, compared to the counterfactual, from a longtermist perspective. So, yeah, I’m a tad in the dark about how it’s all been going…[1]

I think some of the best metrics by which to judge research are whether people:

  • are bothering to pay attention to it
  • think it’s interesting
  • think it’s high-quality/rigorous/well-reasoned
  • think it addresses important topics
  • think it provides important insights
  • think they’ve actually changed their beliefs, decisions, or plans based on that research
  • etc.

I think this data is most useful if these people have relevant expertise, are in positions to make especially relevant and important decisions, etc. But anyone can at least provide input on things like how well-written or well-reasoned some work seems to have been. And whoever the respondents are, whether the research influenced them probably provides at least weak evidence regarding whether the research influenced some other set of people (or whether it could, if that set of people were to read it).

This year, I’ve gathered a decent amount of data about the above-listed metrics. But more data would be useful. And the data I’ve gotten so far has usually been non-anonymous, and often resulted from people actively reaching out to me. Both of those factors likely bias the responses in a positive direction. 

So I’ve created this survey in order to get additional - and hopefully less biased - data, as an input into my thinking about: 

  1. whether EA-aligned research and/or writing is my comparative advantage (as I’m also actively considering a range of alternative pathways)
  2. which topics, methodologies, etc. within research and/or writing are my comparative advantage
  3. specific things I could improve about my research and/or writing (e.g., topic choice, how rigorous vs rapid-fire my approach should be, how concise I should be)

But there’s also another aim of this survey. The idea of doing this survey, and many of the questions, was inspired partly by Rethink Priorities’ impact survey. But I don’t recall seeing evidence that individual researchers/writers (or even other organisations) run such surveys.[2] And it seems plausible to me that they’d benefit from doing so. 

So this is also an experiment to see how feasible and useful this is, to inform whether other people should run their own surveys of this kind. I plan to report back here in a couple of weeks (in September) with info like how many responses I got and how useful this seemed to be.

[1] I’m not necessarily saying that that type of research is harder to do than e.g. getting students’ test scores up. I’m just saying it’s harder to get clear feedback on how well one is doing.

[2] Though I have seen various EAs provide links to forms for general anonymous feedback. I think that’s also a good idea, and I’ve copied the idea in my own forum bio.

MichaelA's Shortform

See also Open Philanthropy Project's list of different kinds of uncertainty (and comments on how we might deal with them) here.

MichaelA's Shortform

Ok, so it sounds like Legg and Hutter's definition works given certain background assumptions / ways of modelling things, which they assume in their full paper on their own definition. 

But in the paper I cited, Legg and Hutter give their definition without mentioning those assumptions / ways of modelling things. And they don't seem to be alone in that, at least given the out-of-context quotes they provide, which include: 

  • "[Performance intelligence is] the successful (i.e., goal-achieving) performance of the system in a complicated environment"
  • "Achieving complex goals in complex environments"
  • "the ability to solve hard problems."

These definitions could all do a good job capturing what "intelligence" typically means if some of the terms in them are defined certain ways, or if certain other things are assumed. But they seem inadequate by themselves, in a way Legg and Hutter don't note in their paper. (Also, Legg and Hutter don't seem to indicate that that paper is just or primarily about how intelligence should be defined in relation to AI systems.)

That said, as I mentioned before, I don't actually think this is a very important oversight on their part.

MichaelA's Shortform

Firstly, I'll say that, given that people already have a pretty well-shared intuitive understanding of what "intelligence" is meant to mean, I don't think it's a major problem for people to give explicit definitions like Legg and Hutter's. I think people won't then go out and assume that wealth, physical strength, etc. count as part of intelligence - they're more likely to just not notice that the definitions might imply that.

But I think my points do stand. I think I see two things you might be suggesting:

  • Intelligence is the only thing that increases an agent’s ability to achieve goals across all environments.
  • Intelligence is an ability, which is part of the agent, whereas things like wealth are resources, and are part of the environment.

If you meant the first of those things, I'd agree that "intelligence might help in a wider range of environments than those [other] capabilities or resources help in". E.g., a billion US dollars wouldn't help someone achieve their goals at any time before 1700 CE (or whenever), or probably at any time after 3000 CE, whereas intelligence probably would.

But note that Legg and Hutter say "across a wide range of environments." A billion US dollars would help anyone, in any job, in any country, at any time from 1900 to 2020, achieve most of their goals. I would consider that a "wide" range of environments, even if it's not maximally wide.

And there are aspects of intelligence that would only be useful in a relatively narrow set of environments, or for a relatively narrow set of goals. E.g., factual knowledge is typically included as part of intelligence, and knowing the dates of birth and death of US presidents will be helpful in various situations, but probably in fewer situations and for fewer goals than a billion dollars.

If you meant the second thing, I'd point in response to the other capabilities, rather than the other resources. For example, it seems intuitive to me to speak of an agent's charisma or physical strength as a property of the agent, rather than of the state. And I think those capabilities will help it achieve goals in a wide (though not maximally wide) range of environments.

We could decide to say an agent's charisma and physical strength are properties of the state, not the agent, and that this is not the case for intelligence. Perhaps this is useful when modelling an AI and its environment in a standard way, or something like that, and perhaps it's typically assumed (I don't know). If so, then combining an explicit statement of that with Legg and Hutter's definition may address my points, as that might explicitly slice all other types of capabilities and resources out of the definition of "intelligence". 

But I don't think it's obvious that things like charisma and physical strength are more a property of the environment than intelligence is - at least for humans, for whom all of these capabilities ultimately just come down to our physical bodies (assuming we reject dualism, which seems safe to me).

Does that make sense? Or did I misunderstand your points?

TurnTrout's shortform feed

This seems right to me, and I think it's essentially the rationale for the idea of the Long Reflection.

MichaelA's Shortform

“Intelligence” vs. other capabilities and resources

Legg and Hutter (2007) collect 71 definitions of intelligence. Many, perhaps especially those from AI researchers, would actually cover a wider set of capabilities or resources than people typically want the term “intelligence” to cover. For example, Legg and Hutter’s own “informal definition” is: “Intelligence measures an agent’s ability to achieve goals in a wide range of environments.” But if you gave me a billion dollars, that would vastly increase my ability to achieve goals in a wide range of environments, even if it doesn’t affect anything we’d typically want to refer to as my “intelligence”.

(Having a billion dollars might lead to increases in my intelligence, if I use some of the money for things like paying for educational courses or retiring so I can spend all my time learning. But I can also use money to achieve goals in ways that don’t look like “increasing my intelligence”.)

I would say that there are many capabilities or resources that increase an agent’s ability to achieve goals in a wide range of environments, and intelligence refers to a particular subset of these capabilities or resources. Some of the capabilities or resources which we don’t typically classify as “intelligence” include wealth, physical strength, connections (e.g., having friends in the halls of power), attractiveness, and charisma. 

“Intelligence” might help in a wider range of environments than those capabilities or resources help in (e.g., physical strength seems less generically useful). And some of those capabilities or resources might be related to intelligence (e.g., charisma), be “exchangeable” for intelligence (e.g., money), or be attainable via intelligence (e.g., higher intelligence can help one get wealth and connections). But it still seems a useful distinction can be made between “intelligence” and other types of capabilities and resources that also help an agent achieve goals in a wide range of environments.

I’m less sure how to explain why some of those capabilities and resources should fit within “intelligence” while others don’t. At least two approaches to this can be inferred from the definitions Legg and Hutter collect (especially those from psychologists): 

  1. Talk about “mental” or “intellectual” abilities
    • But then of course we must define those terms. 
  2. Gesture at examples of the sorts of capabilities one is referring to, such as learning, thinking, reasoning, or remembering.
    • This second approach seems useful, though not fully satisfactory.

An approach that I don’t think I’ve seen, but which seems at least somewhat useful, is to suggest that “intelligence” refers to the capabilities or resources that help an agent (a) select or develop plans that are well-aligned with the agent’s values, and (b) implement the plans the agent has selected or developed. In contrast, other capabilities and resources (such as charisma or wealth) primarily help an agent implement its plans, and don’t directly provide much help in selecting or developing plans. (But as noted above, an agent could use those other capabilities or resources to increase their intelligence, which then helps the agent select or develop plans.)

For example, both (a) becoming more knowledgeable and rational and (b) getting a billion dollars would help one more effectively reduce existential risks. But, compared to getting a billion dollars, becoming more knowledgeable and rational is much more likely to lead one to prioritise existential risk reduction.

I find this third approach useful, because it links to the key reason why I think the distinction between intelligence and other capabilities and resources actually matters. This reason is that I think increasing an agent’s “intelligence” is more often good than increasing an agent’s other capabilities or resources. This is because some agents are well-intentioned yet currently have counterproductive plans. Increasing the intelligence of such agents may help them course-correct and drive faster, whereas increasing their other capabilities and resources may just help them drive faster down a harmful path. 

(I plan to publish a post expanding on that last idea soon, where I’ll also provide more justification and examples. There I’ll also argue that there are some cases where increasing an agent’s intelligence would be bad yet increasing their “benevolence” would be good, because some agents have bad values, rather than being well-intentioned yet misguided.)

Good and bad ways to think about downside risks

Yes, this seems plausible to me. What I was saying is that this would be a reason why the EV of arbitrary actions might often be negative, rather than directly being a reason why people will overestimate the EV of arbitrary actions. The claim "People should take the pure EV perspective" is consistent with the claim "A large portion of actions have negative EV and shouldn't be taken". This is because taking the pure EV perspective would involve assessing both the benefits and risks (which could include adjusting for the chance of many unknown unknowns that would lead to harm), and then deciding against taking actions that appear net negative.
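To illustrate the kind of assessment I mean (with purely made-up, hypothetical numbers, just as a sketch):

$$\text{EV}(a) = p_{\text{benefit}} \cdot B - p_{\text{harm}} \cdot H - c_{\text{unknowns}}$$

where $c_{\text{unknowns}}$ is a downward adjustment for unknown unknowns. If, say, $p_{\text{benefit}} = 0.5$, $B = 10$, $p_{\text{harm}} = 0.3$, $H = 20$, and $c_{\text{unknowns}} = 2$, then $\text{EV}(a) = 5 - 6 - 2 = -3$, so the pure EV perspective itself recommends against taking that action.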

Good and bad ways to think about downside risks

I find the idea in those first two paragraphs quite interesting. It seems plausible, and isn't something I'd thought of before. It sounds like it's essentially applying the underlying idea of the optimiser's/winner's/unilateralist's curse to one person evaluating a set of options, rather than to a set of people evaluating one option? 

I also think confirmation bias or related things will tend to bias people towards thinking options they've picked, or are already leaning towards picking, are good. Though it's less clear that confirmation bias will play a role when a person has only just begun evaluating the options.
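To make that analogy a bit more concrete, here's a minimal simulation sketch (my own illustration, with arbitrary made-up parameters) of the single-person version of the curse: even if someone's EV estimates are merely noisy rather than biased, the option that looks best to them will still tend to be overestimated.

```python
import random

# Illustrative sketch only: one person evaluates many options whose true
# values are all 0, but their EV estimates are noisy. Picking the option
# with the highest estimate systematically overestimates its true value.

random.seed(0)

N_OPTIONS = 20      # options under consideration (arbitrary choice)
N_TRIALS = 10_000   # repeated evaluation exercises
NOISE_SD = 1.0      # standard deviation of estimation error

overestimates = []
for _ in range(N_TRIALS):
    # Estimates = true value (0) + Gaussian noise.
    estimates = [random.gauss(0.0, NOISE_SD) for _ in range(N_OPTIONS)]
    best_estimate = max(estimates)
    # Since every true value is 0, the chosen option's estimate *is* the bias.
    overestimates.append(best_estimate)

print(f"Average overestimate of the chosen option: "
      f"{sum(overestimates) / N_TRIALS:.2f}")
# With 20 options and unit noise, this comes out around 1.9 standard
# deviations of overestimation, purely as a selection effect.
```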

Most systems in our modern world are not anti-fragile and suffer if you expose them to random noise. 

This sounds more like a reason why many actions (or a "random action") will make things worse (which seems quite plausible to me), rather than a reason why people would be biased to overestimate benefits and underestimate harms from actions. Though I guess people's failure to recognise this real reason why many/random actions may make things worse could then lead them to systematically overestimate how positive actions will be.

In any case, I can also think of biases that could push in the opposite direction. E.g., negativity bias and status quo bias. My guess would be there are some people and domains where, on net, there tends to be a bias towards overestimating the value of actions, and some people and domains where the opposite is true. And I doubt we could get a strong sense of how it all plays out just by theorising; we'd need some empirical work. (Incidentally, Convergence should also be releasing a somewhat related post soon, which will outline 5 potential causes of too little caution about information hazards, and 5 potential causes of too much caution.)

Finally, it seems worth noting that, if we do have reason to believe that, by default, people tend to overestimate the benefits and underestimate the harms that an action will cause, that wouldn't necessarily mean we should abandon the pure EV perspective. Instead, we could just incorporate an adjustment to our naive EV assessments to account for that tendency/bias, in the same way we should adjust for the unilateralist's curse in many situations. And the same would be true if it turned out that, by default, people had the opposite bias. (Though if there are these biases, that could mean it'd be unwise to promote the pure EV perspective without also highlighting the bias that needs adjusting for.)
