# All Posts

Sorted by Magic (New & Upvoted)

# Wednesday, August 5th 2020

Frontpage Posts
Personal Blogposts
Shortform
**mr-hire** (2 points, 9h): CFAR's "Adjust Your Seat" principle and its associated story are probably among my most frequently referenced concepts when teaching rationality techniques. I wish there were a LW post about it.

# Tuesday, August 4th 2020

Shortform
**Buck** (14 points, 19h): I used to think that slower takeoff implied shorter timelines, because slow takeoff means that pre-AGI AI is more economically valuable, which means that the economy advances faster, which means that we get AGI sooner. But there's a countervailing consideration, which is that in slow-takeoff worlds, you can make arguments like "it's unlikely that we're close to AGI, because AI can't do X yet", where X might be "make a trillion dollars a year" or "be as competent as a bee". I now overall think that arguments for fast takeoff should update you towards shorter timelines. So slow takeoffs cause shorter timelines, but are evidence for longer timelines.

This graph [https://www.dropbox.com/s/camtto747uqyqqq/IMG_1553.jpg?dl=0] is a version of this argument: if we notice that current capabilities are at the level of the green line, then if we think we're on the fast takeoff curve we'll deduce we're much further ahead than we'd think on the slow takeoff curve. For the "slow takeoffs mean shorter timelines" argument, see here: https://sideways-view.com/2018/02/24/takeoff-speeds/

This point feels really obvious now that I've written it down, and I suspect it's obvious to many AI safety people, including the people whose writings I'm referencing here. Thanks to Caroline Ellison for pointing this out to me, and various other people for helpful comments. I think this is why belief in slow takeoffs is correlated with belief in long timelines among the people I know who think a lot about AI safety.
**Adele Lopez** (8 points, 1d): Half-baked idea for low-impact AI:

As an example, imagine a board that's anchored directly into the wall (no other support structures). If you make it twice as wide, it will be twice as stiff, but if you make it twice as thick, it will be eight times as stiff. On the other hand, if you make it twice as long, it will be eight times more compliant.

In a similar way, different action parameters will have scaling exponents (or more generally, scaling functions). So one way to decrease the risk of high-impact actions would be to make sure that the scaling exponent is bounded above by a certain amount.

To even do this, you still need to make sure the agent's model is honestly evaluating the scaling exponent, and you would still need to define this stuff a lot more rigorously. I think this idea is most useful in the case where you already have an AI with high-level corrigible intent and want to give it a general "common sense" about the kinds of experiments it might think to try. So it's probably not that useful, but I wanted to throw it out there.
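The board example generalizes: for a cantilevered board, bending stiffness scales as k ∝ w·t³/L³ (a standard beam-theory result), which reproduces the 2×/8×/8× figures in the post. A minimal sketch of estimating a scaling exponent numerically by doubling one parameter at a time (the function names are mine, for illustration):

```python
import math

def stiffness(width, thickness, length):
    """Cantilever bending stiffness up to a material constant: k ∝ w * t^3 / L^3."""
    return width * thickness**3 / length**3

def scaling_exponent(f, param_index, base=(1.0, 1.0, 1.0)):
    """Estimate d(log f)/d(log x) for one parameter by doubling it."""
    doubled = list(base)
    doubled[param_index] *= 2.0
    return math.log2(f(*doubled) / f(*base))

print(scaling_exponent(stiffness, 0))  # width:     1.0  (twice as wide -> 2x as stiff)
print(scaling_exponent(stiffness, 1))  # thickness: 3.0  (twice as thick -> 8x as stiff)
print(scaling_exponent(stiffness, 2))  # length:   -3.0  (twice as long -> 8x more compliant)
```

Bounding such an exponent, as the post suggests, would amount to rejecting action parameters whose measured exponent exceeds some threshold.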
**Charlie Steiner** (4 points, 1d): It seems like there's room for a theory of logical-inductor-like agents with limited computational resources, and I'm not sure if this has already been figured out. The entire trick seems to be that when you try to build a logical-inductor agent, it's got some estimation process for math problems like "what does my model predict will happen?", and it's got some search process to find good actions, and you don't want the search process to be more powerful than the estimator, because then it will find edge cases. In fact, you want them to be linked somehow, so that the search process is never in the position of taking advantage of the estimator's mistakes: if you, a human, are making some plan and notice a blind spot in your predictions, you don't "take advantage" of yourself; you do further estimating as part of the search process. The hard part is formalizing this handwavy argument, and figuring out what other strong conditions need to be met to get nice guarantees like bounded regret.

# Sunday, August 2nd 2020

Shortform
**AllAmericanBreakfast** (6 points, 3d): I'm experimenting with a format for applying LW tools to personal social-life problems. The goal is to boil down situations so that similar ones will be easy to diagnose and deal with in the future. To do that, I want to arrive at an acronym that's memorable, defines an action plan, and implies when you'd want to use it. Examples:

* OSSEE Activity: "One Short Simple Easy-to-Exit Activity." A way to plan dates and hangouts that aren't exhausting or recipes for confusion.
* DAHLIA: "Discuss, Assess, Help/Ask, Leave, Intervene, Accept." An action plan for dealing with annoying behavior by other people: discuss with the people you're with, assess the situation, offer to help or ask the annoying person to stop, leave if possible, intervene if not, and accept the situation if the intervention doesn't work out.

I came up with these by doing a brief post-mortem analysis on social problems in my life. I did it like this:

1. Describe the situation as fairly as possible, both what happened and how it felt to me and others.
2. Use LW concepts to generalize the situation and form an action plan. For example, OSSEE Activity arose from applying the concept of "diminishing marginal returns" to my outings.
3. Format the action plan into a mnemonic, such as an acronym.
4. Experiment with applying the action plan mnemonic in life and see if it leads you to behave differently and proves useful.
**TurnTrout** (5 points, 3d): If you measure death-badness from behind the veil of ignorance, you'd naively prioritize well-liked, famous people with large families.
**Raemon** (4 points, 3d): An interesting thing about Supernatural Fitness (a VR app kind of like Beat Saber) is that they are leaning hard into being a fitness app rather than a game. You don't currently get to pick songs; you pick workouts, which come with pep talks and stretching and warmups. This might make you go "ugh, I just wanna play a song" and go play Beat Saber instead. But Supernatural Fitness is _way_ prettier and has some conceptual advances over Beat Saber. And... I mostly endorse this and think it was the right call.

I am sympathetic to "if you give people the ability to choose whatever, they mostly choose to be couch potatoes, or goodhart on simple metrics", and if you want people to do anything complicated and interesting you need to design your app with weird constraints in mind. (Example: LessWrong deliberately doesn't show users the view-count of their posts. We already have the entire internet as the control group for what happens if you give people view-counts: they optimize for views, and you get clickbait. Is this patronizing? Yeah. Am I 100% confident it's the right call? No. But I do think that if you want to build a strong intellectual culture, it matters what kinds of Internet Points you give [or don't give] people, and this is at least a judgment call you need to be capable of making.)

But... I still do think it's worth looking at third options. Sometimes I might really want to just jam to some tunes, and I want to pick the specific song. In the case of Supernatural Fitness, I think it is quite important that your opening experience puts you in the mindset of coaches and workouts, that songs are clustered together in groups of 15+ minutes (so you actually get a cardio workout), and that they spend upfront effort teaching you the proper form for squats and encouraging you to maintain that form rather than "minimizing effort" (which I think Beat Saber ends up teaching you, and which you may have already internalized as habit if you're coming from Beat Saber).

# Saturday, August 1st 2020

Shortform
**AllAmericanBreakfast** (9 points, 4d): Are rationalist ideas always going to be offensive to just about everybody who doesn't self-select in? One loved one was quite receptive to Chesterton's Fence the other day. Like, it stopped their rant in the middle of its tracks and got them on board with a different way of looking at things immediately. On the other hand, I routinely feel this weird tension. To explain why I think as I do, I'd need to go through some basic rationalist concepts, but I expect most people I know would hate it.

I wish we could figure out ways of getting this stuff across that are fun and make it seem agreeable, sensible, and non-threatening. Less negativity: we do sooo much critique. I was originally attracted to LW partly as a place where I didn't feel obligated to participate in the culture war. Now I do, just on a set of topics that I didn't associate with the CW before LessWrong.

My guess? This is totally possible. But it needs a champion: somebody willing to dedicate themselves to it; somebody friendly, funny, empathic, a good performer, neat and practiced. And it needs a space for the educative process: a YouTube channel, a book, etc. And it needs the courage of its convictions. The sign of that? Not taking itself too seriously, being known by the fruits of its labors.

# Friday, July 31st 2020

Personal Blogposts
Shortform
**AllAmericanBreakfast** (7 points, 5d): I'm annoyed that I think so hard about small daily decisions. Is there a simple and ideally general pattern to not spend 10 minutes doing arithmetic on the cost of making burritos at home vs. buying the equivalent at a restaurant? Or am I actually being smart somehow by spending the time to cost out that sort of thing?

Perhaps: "Spend no more than 1 minute per $25 spent and 2% of the price to find a better product." This heuristic cashes out to:

* Over a year of weekly $35 restaurant meals, spend about $35 and an hour and a half finding better restaurants or meals.
* For $250 of monthly consumer spending, spend a total of $5 and 10 minutes per month finding a better product.
* For bigger buys of around $500 (about 2x/year), spend $10 and 20 minutes on each purchase.
* Buying a used car ($15,000), I'd spend $300 and 10 hours. I could use the $300 to hire somebody at $25/hour to test-drive an additional 5-10 cars, a mechanic to inspect it on the lot, or a good negotiator to help me secure a lower price.
* For work over the next year ($30,000), spend $600 and 20 hours.
* Getting a Master's degree ($100,000 including opportunity costs), spend $2,000 and 66 hours finding the best school.
* Choosing from among STEM career options ($100,000 per year), spend about $600 and 66 hours per year exploring career decisions.

Comparing that with my own patterns, this simplifies to: spend much less time thinking about daily spending. You're correctly calibrated for ~$500 buys. Spend much more time considering your biggest buys and decisions.
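The heuristic is simple enough to sketch in code; `search_budget` is a hypothetical helper name, and the arithmetic just restates the rule from the post (1 minute per $25, 2% of the price):

```python
def search_budget(total_price):
    """Heuristic from the post: for a purchase totaling `total_price` dollars,
    spend at most 1 minute per $25 and 2% of the price finding a better product.
    Returns (minutes, dollars)."""
    minutes = total_price / 25.0
    dollars = 0.02 * total_price
    return minutes, dollars

# A year of weekly $35 restaurant meals: 35 * 52 = $1820 total.
minutes, dollars = search_budget(35 * 52)
print(f"{minutes:.0f} min, ${dollars:.0f}")  # 73 min (~1.5 hours), $36

# A $15,000 used car.
minutes, dollars = search_budget(15_000)
print(f"{minutes / 60:.0f} hours, ${dollars:.0f}")  # 10 hours, $300
```

Both outputs match the bullets above, which is just a sanity check that the listed examples follow the stated rule.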
**mr-hire** (6 points, 5d): "Medium-engagement activities" are the death of culture creation. Large commitments are great for culture creation in the early stages: expecting someone to show up for a ~1-hour-or-more event every week that helps shape your culture, or requiring them to follow a dress code. Removing trivial inconveniences to following your values and rules is also great for building culture: things that require no or low engagement but help shape group cohesion (design does a lot here). So both no-commitment tools and large-commitment tools work well in the early stages.

But medium-commitment tools are awful: a series of little things that take 5-50 minutes a week to work on is death to early-stage cultures. It's death by a thousand cuts of things members can't see a clear immediate benefit from, but can see a clear immediate cost to. I don't know exactly why this is, and haven't really mapped out what's behind the intuition; it's something about the benefit of building identity vs. the time required being U-shaped, so that the tails are a much more effective tradeoff than the middle.

# Thursday, July 30th 2020

Personal Blogposts
Shortform
**Raemon** (8 points, 6d): With some frequency, LW gets a new user writing a post that's sort of... in the middle of having their mind blown by the prospect of quantum immortality and MWI. I'd like to have a single post to link them to that makes a fairly succinct case for "it adds up to normality", and I don't have a clear sense of what to do other than link to the entire Quantum Physics sequence. Any suggestions? Or, anyone feel like writing said post if it doesn't exist yet?
**Gurkenglas** (4 points, 6d): The WaveFunctionCollapse algorithm measures whichever tile currently has the lowest entropy. GPT-3 always just measures the next token. Of course, in prose those are usually the same, but I expect some qualitative improvements once we get structured data with holes such that any hole might have low entropy, a transformer trained to fill holes, and the resulting ability to pick which hole to fill next. Until then, I expect those prompts/GPT protocols to perform well which happen to present the holes in your data in the order that WFC would have picked; i.e., ask it to show its work, and don't ask it to write the bottom line of its reasoning process first. Long shortform short: include the Sequences in your prompt as instructions :)
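A toy sketch of the entropy-ordering idea: among the remaining holes, fill the one whose predicted distribution has the lowest Shannon entropy first. The hole names and distributions here are invented for illustration; a real system would get the distributions from the model's predictions.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution given as {outcome: p}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def next_hole_wfc(holes):
    """WaveFunctionCollapse-style choice: fill the hole whose candidate
    distribution currently has the lowest entropy (the most 'decided' slot)."""
    return min(holes, key=lambda h: entropy(holes[h]))

holes = {
    "conclusion": {"yes": 0.5, "no": 0.5},        # 1 bit: still genuinely uncertain
    "step_1":     {"add": 0.9, "subtract": 0.1},  # ~0.47 bits: nearly decided
}
print(next_hole_wfc(holes))  # step_1
```

This matches the post's advice: filling "step_1" (show your work) before "conclusion" (the bottom line) is exactly the order a lowest-entropy-first rule would pick.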
**romeostevensit** (4 points, 6d): One of the things the internet seems to be doing is a sort of Peter Principle sorting for attention-grabbing arguments: people are finding the level of discourse that they feel they can contribute to. This form of arguing winds up higher in the perceived/tacit cost:benefit tradeoff than most productive activity because of the perfect tuning of the difficulty curve, like video games.

# Wednesday, July 29th 2020

Shortform
**TurnTrout** (8 points, 7d): This might be the best figure I've ever seen in a textbook. Talk about making a point! (Molecular Biology of the Cell, Alberts.)
**Sherrinford** (3 points, 7d): The results of Bob Jacobs's LessWrong survey [https://www.lesswrong.com/posts/yn4Aw6jejS3SHKuzu/2020-lesswrong-demographics-survey-results] are quite interesting. It's a pity the sample is so small. The visualized results (link in his post) are univariate, but I would like to highlight some things:

* 49 out of 56 respondents identify as "White".
* 53 out of 59 respondents were born male, and 46 out of 58 identify as male cisgender.
* 47 of 59 identify as heterosexual (comparison: https://en.wikipedia.org/wiki/Demographics_of_sexual_orientation).
* 1 out of 55 works in a "blue collar" profession.
* Most people identify as "left of center" in some sense. At the same time, 30 out of 55 identify as "libertarian" (multiple answers were allowed).
* 31 of 59 respondents think they are at least "upper middle class"; 22 of 59 think the family they were raised in was "upper middle class". (Background: in social science surveys, wealthy people usually underestimate their position, and poor people overestimate theirs, but to a lesser extent.)

I would not have guessed the left-of-center identification, and I would have slightly underestimated the share of male (cisgender) respondents.
**AllAmericanBreakfast** (2 points, 7d): A checklist for the strength of ideas. Think "D-SHARP":

* Is it worth discussing?
* Is it worth studying?
* Is it worth using as a heuristic?
* Is it worth advertising?
* Is it worth regulating or policing?

Worthwhile research should help the idea move either forward or backward through this sequence.

# Tuesday, July 28th 2020

Personal Blogposts
Shortform
**avturchin** (4 points, 8d): Some random ideas for how to make GPT-based AI safer.

1. Scaffolding: use rule-based AI to check every solution provided by the GPT part. This could work for computation, self-driving, or robotics, but not against elaborate adversarial plots.
2. Many instances: run GPT several times and choose a random or best answer (we are already doing this). Run several instances of GPT with different parameters or different training data and compare answers. Run different prompts. The median output seems to be a Schelling point around truth, and outlier answers are more likely to be wrong or malicious.
3. Use intrinsic GPT properties to prevent malicious behaviour. For example, higher temperature increases the randomness of the output and messes with any internal mesa-optimisers. Shorter prompts and the lack of long-term memory also prevent complex plotting.
4. Train and test on an ethical database.
5. Use prompts which include a notion of safety, like "A benevolent AI will say...", or counterfactuals which prevent complex planning in the real world (an AI on the Moon).
6. Black-box the internal parts of the system, like the NN code.
7. Run it a million times in test environments or tasks.
8. Use another GPT AI to produce a "safety TL;DR" of any output, or a prediction of possible bad things which could happen from a given output.

Disclaimer: safer AI is not provably safe. It is just orders of magnitude safer than an unsafe one, but it will eventually fail.
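Idea 2 (many instances, take the consensus) can be sketched in a few lines. `consensus_answer` is an illustrative name, and majority voting over exact-match answers stands in for the harder problem of comparing free-form model outputs:

```python
from collections import Counter

def consensus_answer(answers):
    """Majority vote over several independently sampled answers: the modal
    answer is treated as the Schelling point around the truth, and rare
    outliers are discarded. Returns (answer, fraction of samples agreeing)."""
    counts = Counter(answers)
    answer, count = counts.most_common(1)[0]
    return answer, count / len(answers)

# e.g. five runs of the same question with different prompts/temperatures
samples = ["4", "4", "5", "4", "4"]
print(consensus_answer(samples))  # ('4', 0.8)
```

The agreement fraction doubles as a crude confidence signal: per the post's reasoning, a low fraction suggests the outputs are scattered and any single answer is more likely wrong or anomalous.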
**ejenner** (1 point, 8d): Gradient hacking is usually discussed in the context of deceptive alignment. That is probably where it has the largest relevance to AI safety, but if we want to better understand gradient hacking, it could be useful to take a broader perspective and study it on its own (even if, in the end, we only care about gradient hacking because of its inner-alignment implications). In the most general setting, gradient hacking could be seen as a way for the agent to "edit its source code", though probably only in a very limited way. I think it's an interesting question which kinds of edits are possible with gradient hacking, for example whether an agent could improve its capabilities this way.
**Bob Jacobs** (1 point, 8d): The Law of the Minimization of Mystery [https://news.ycombinator.com/item?id=941387] by David J. Chalmers [https://en.wikipedia.org/wiki/David_Chalmers] is just a less precise version of the Law of Likelihood [https://en.wikipedia.org/wiki/Likelihood_principle].

# Monday, July 27th 2020

Personal Blogposts
Shortform
**Sunny from QAD** (15 points, 9d): It's happened again: I've realized that one of my old beliefs (pre-LW) is just plain dumb. I used to look around at all the various diets (Paleo, keto, low-carb, low-fat, etc.) and feel angry at people for having such low epistemic standards. Like, there's a new theory of nutrition every two years, and people still put faith in them every time? Everybody swears by a different diet and this is common knowledge, but people still swear by diets? And the reasoning is that "fat" (the nutrient) has the same name as "fat" (the body part people are trying to get rid of)?

Then I encountered the "calories in = calories out" theory, which says that the only thing you need to do to lose weight is to make sure that you burn more calories than you eat. And I thought to myself, "yeah, obviously." Because, you see, if the orthodox asserts X and the heterodox asserts Y, and the orthodox is dumb, then Y must be true [https://www.lesswrong.com/posts/qNZM3EGoE5ZeMdCRt/reversed-stupidity-is-not-intelligence]!

Anyway, I hadn't thought about this belief in a while, but I randomly remembered it a few minutes ago, and as soon as I remembered its origins, I chucked it out the window. Oops! (PS: I wouldn't be flabbergasted if the belief turned out true anyway. But I've reverted my map from the "I know how the world is" state to the "I'm awaiting additional evidence" state.)
**G Gordon Worley III** (2 points, 9d): I feel like something is screwy with the kerning on LW over the past few weeks. I keep seeing sentences that look like they are missing a space between the period and the start of the next sentence, but when I check closely they are not. For whatever reason this doesn't show in the editor, only in the displayed text. I think I've only noticed this with comments and shortform, but maybe it's happening in other places? Anyway, I wanted to see if others are experiencing this and raise a flag for the LW team that a change they made may be behaving in unexpected ways.
**AllAmericanBreakfast** (2 points, 9d): Why isn't California investing heavily in desalination? Has anybody thought through the economics? Is this a live idea?
**crabman** (1 point, 9d): You started self-quarantining, and by that I mean sitting at home alone and barely going outside, in December or January. I wonder, how's it going for you? How do you deal with loneliness?