Gurkenglas

I operate by Crocker's rules.

Comments

Gurkenglas's Shortform

I expect that all that's required for a Singularity is to wait a few years for the sort of language model that can faithfully replicate a researcher's thoughts, then make it generate a thousand years' worth of that researcher's internal monologue, perhaps with access to the internet.

Neural networks should be good at this task - we have direct evidence that neural networks can run humans, namely the biological ones in our skulls.

Whether our world's plot has a happy ending then merely depends on the details of that prompt/protocol - such as whether the simulated researcher decides to solve alignment before running a successor. Though checking the character's alignment is probably simple - we have access to his thoughts. A harder question is whether the first LM able to run humans is still inner-aligned.

What should an Einstein-like figure in Machine Learning do?

Can you locally replicate GPT? For example, can GPT-you compress WebText better than GPT-2?
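
A minimal sketch of how one might score that comparison, assuming the Hugging Face transformers GPT-2 checkpoint as the baseline and a placeholder corpus file (swap in your own model and WebText sample): a language model compresses text at its cross-entropy, so measure bits per byte.

```python
# Sketch: score a language model as a compressor via its cross-entropy.
# "webtext_sample.txt" is a placeholder path; the gpt2 checkpoint is the baseline.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = open("webtext_sample.txt").read()
ids = tokenizer(text, return_tensors="pt").input_ids[:, :1024]  # fit the context window

with torch.no_grad():
    loss = model(ids, labels=ids).loss  # mean cross-entropy in nats per token

nats = loss.item() * (ids.shape[1] - 1)                  # total nats over predicted tokens
bytes_used = len(tokenizer.decode(ids[0]).encode("utf-8"))
print("bits per byte:", nats / math.log(2) / bytes_used)
```

A lower bits-per-byte figure than GPT-2's means your model is the better compressor of that sample.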

Power as Easily Exploitable Opportunities

SOTA: Penalize my action by how well a maximizer that takes my place after the action would maximize a wide variety of goals.

If we use me instead of the maximizer, paradoxes of self-reference arise that we can resolve by inserting a modal operator: penalize my action by how well I expect I would maximize a wide variety of goals (if given each goal). Then, when considering the action of stepping towards an omnipotence button, I would expect that, given that I decided to take one step, I would take more, and therefore penalize the first step a lot. The exception is plausible deniability: when the first step towards the button is also a first step towards my concrete goal, I might still expect to be bound by the penalty afterwards.
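
A toy rendering of that penalty term (a sketch only; `expected_attainable_value` and the goal set are hypothetical stand-ins for whatever environment model you have):

```python
# Sketch of the penalty: score an action by how well the agent itself,
# dropped into the post-action state, expects to achieve each of a wide
# variety of goals. All names here are illustrative placeholders.
from typing import Iterable

def expected_attainable_value(state, goal) -> float:
    """Placeholder: the agent's own expectation of how well it would
    maximize `goal` starting from `state` (the modal operator)."""
    raise NotImplementedError

def penalty(state_after_action, goals: Iterable) -> float:
    # Average attainable value over a wide variety of goals.
    goals = list(goals)
    return sum(expected_attainable_value(state_after_action, g) for g in goals) / len(goals)
```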

I've suggested using myself instead of the maximizer before, in the last sentence of this comment: https://www.lesswrong.com/posts/mdQEraEZQLg7jtozn/subagents-and-impact-measures-full-and-fully-illustrated?commentId=WGWtoKDrnN3o6cS6G

PSA: Tagging is Awesome

Long outputs will tend to naturally deteriorate, as the model tries to reproduce the existing deterioration and accidentally adds some more. Better: sample one tag at a time. Shuffle the inputs every time to access different subdistributions. (I wonder how much the subdistributions differ for two random shuffles...) If you output the tag that has the highest minimum probability across a hundred subdistributions, I bet that'll produce a tag that's not in the inputs.
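
That selection rule in code form (a sketch; `next_tag_distribution` is a hypothetical wrapper around whatever API returns next-token probabilities):

```python
# Sketch: pick the tag whose worst-case probability across shuffled
# prompts is highest. Shuffling the example list exposes a different
# subdistribution on each trial.
import random

def next_tag_distribution(prompt: str) -> dict[str, float]:
    """Placeholder: query your language model for {tag: probability}."""
    raise NotImplementedError

def robust_tag(examples: list[str], candidates: list[str], trials: int = 100) -> str:
    worst = {tag: float("inf") for tag in candidates}
    for _ in range(trials):
        random.shuffle(examples)  # each shuffle is one subdistribution
        prompt = "\n".join(examples) + "\nTags:"
        dist = next_tag_distribution(prompt)  # hypothetical model call
        for tag in candidates:
            worst[tag] = min(worst[tag], dist.get(tag, 0.0))
    return max(worst, key=worst.get)  # highest minimum probability
```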

PSA: Tagging is Awesome

You make it sound like it wants things. It could at most pretend to be something that wants things. If there's a UFAI in there that is carefully managing its bits of anonymity (which sounds as unlikely as your usual conspiracy theory - a myopic neural net of this level should keep a secret no better than a conspiracy of a thousand people), it's going to have better opportunities to influence the world soon enough.

PSA: Tagging is Awesome

Just ask GPT to do the tagging, people.

Gurkenglas's Shortform

The WaveFunctionCollapse algorithm measures whichever tile currently has the lowest entropy; GPT-3 always just measures the next token. Of course in prose those are usually the same, but I expect some qualitative improvements once we get structured data with holes such that any of them might have low entropy, a transformer trained to fill holes, and the resulting ability to pick which hole to fill next.

Until then, I expect the prompts/GPT protocols that perform well to be the ones that happen to present the holes in your data in the order wfc would have picked, i.e. ask it to show its work; don't ask it to write the bottom line of its reasoning process first.
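
What that hole-picking loop could look like (a sketch; `hole_distribution` stands in for a fill-in-the-blank model that doesn't yet exist in this form):

```python
# Sketch of WaveFunctionCollapse-style decoding: repeatedly fill whichever
# hole the model is currently most certain about, instead of always the
# next one. `hole_distribution` is a hypothetical fill-in model call.
import math

def hole_distribution(sequence: list, index: int) -> dict:
    """Placeholder: the model's token distribution for the hole at `index`."""
    raise NotImplementedError

def entropy(dist: dict) -> float:
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def collapse(sequence: list) -> list:
    while None in sequence:  # None marks an unfilled hole
        holes = [i for i, tok in enumerate(sequence) if tok is None]
        dists = {i: hole_distribution(sequence, i) for i in holes}
        i = min(holes, key=lambda i: entropy(dists[i]))  # lowest-entropy hole first
        sequence[i] = max(dists[i], key=dists[i].get)    # fill with the likeliest token
    return sequence
```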

Long shortform short: Include the Sequences in your prompt as instructions :)

How will internet forums like LW be able to defend against GPT-style spam?

The obvious answer to spammers being run by GPT is mods being run by GPT. Ask it of every comment whether it is high-quality or generated, then act on that as needed to keep the site functional.
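
As a sketch (the prompt wording and the `complete` helper are illustrative; any instruction-following model would do):

```python
# Sketch of a GPT moderator: ask the model to judge each comment.
# `complete` is a hypothetical wrapper around your language-model API.
def complete(prompt: str) -> str:
    """Placeholder: return the model's completion for `prompt`."""
    raise NotImplementedError

def looks_fine(comment: str) -> bool:
    prompt = (
        "Is the following comment high-quality and written by a human?\n\n"
        f"{comment}\n\nAnswer yes or no:"
    )
    return complete(prompt).strip().lower().startswith("yes")

# Usage: hide or flag any comment for which looks_fine() returns False.
```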

Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns

It was meant as a submission, except that I couldn't be bothered to actually implement my distribution on that website :) - even/especially after superintelligent AI, researchers might come to the conclusion that we weren't prepared and *shouldn't* build another - regardless of whether the existing sovereign would allow it.

Optimizing arbitrary expressions with a linear number of queries to a Logical Induction Oracle (Cartoon Guide)

Answering with a point estimate seems rather silly. Shouldn't it answer with a distribution? Then one question would be enough.
