~www_lesswrong_com | Bookmarks (682)

DeepMind: Frontier Safety Framework — LessWrong

lesswrong.com

Published on May 17, 2024 5:30 PM GMT. DeepMind's RSP is here. Excerpt from the blogpost: Today, we are introducing our Frontier Safety Framework - a set of protocols for proactively identifying future AI capabilities that could cause severe harm and putting in place mechanisms to detect and mitigate them. Our Framework focuses on severe risks resulting from powerful capabilities at the model level, such as...
1
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning — LessWrong

lesswrong.com

Published on May 17, 2024 4:25 PM GMT. A short summary of the paper is presented below. This work was produced by Apollo Research in collaboration with Jordan Taylor (MATS + University of Queensland). TL;DR: We propose end-to-end (e2e) sparse dictionary learning, a method for training SAEs that ensures the features learned are functionally important by minimizing the KL divergence between the output distributions of the...
1
AISafety.com – Resources for AI Safety — LessWrong

lesswrong.com

Published on May 17, 2024 3:57 PM GMT. There are many resources for those who wish to contribute to AI Safety, such as courses, communities, projects, jobs, events and training programs, funders and organizations. However, we often hear from people that they have trouble finding the right resources. To address this, we've built AISafety.com as a central hub (a list-of-lists) where community members maintain and curate these...
1
My Hammer Time Final Exam — LessWrong

lesswrong.com

Published on May 17, 2024 9:28 AM GMT. Epistemic Status: I thought about and wrote each paragraph in 10 minutes total, with slight editing afterwards. I hope I'm not too late to the party! I wrote this up quite a few months ago and found that I delayed indefinitely editing it before publication. I decided it's probably best to post a not-maximally-edited version of my final exam. This...
1
Is There Really a Child Penalty in the Long Run? — LessWrong

lesswrong.com

Published on May 17, 2024 11:56 AM GMT. A couple of weeks ago three European economists published this paper studying the female income penalty after childbirth. The surprising headline result: there is no penalty. Setting and Methodology: The paper uses Danish data that tracks IVF treatments as well as a bunch of demographic factors and economic outcomes over 25 years. Lundborg et al. identify the causal effect...
1
Is there a place to find the most cited LW articles of all time? — LessWrong

lesswrong.com

Published on May 17, 2024 1:20 AM GMT. I expect it would be useful when developing an understanding of the language used on LW.
1
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures — LessWrong

lesswrong.com

Published on May 17, 2024 12:25 AM GMT. This is a D&D.Sci scenario: a puzzle where players are given a dataset to analyze and an objective to pursue using information from that dataset. Duke Arado’s obsession with physics-defying architecture has caused him to run into a small problem. His problem is not – he affirms – that his interest has in any way waned: the menagerie...
1
To an LLM, everything looks like a logic puzzle — LessWrong

lesswrong.com

Published on May 16, 2024 10:21 PM GMT. I keep seeing this meme doing the rounds where people present ChatGPT with a common logic problem or riddle, only with some key component changed to make it trivial. ChatGPT has seen the original version a million times, so it gives the answer to the original, not the actually correct and obvious answer. The idea is to show...
1
AI Safety Institute's Inspect hello world example for AI evals — LessWrong

lesswrong.com

Published on May 16, 2024 8:47 PM GMT. Sharing my detailed walk-through on using the UK AI Safety Institute's new open source package Inspect for AI evals. Main points: Package released in early May 2024 is here: https://github.com/UKGovernmentBEIS/inspect_ai. Seems easy to use and removes boiler-plate code. I am new to evals so I do not know what experienced researchers would look for in such a tool. I am...
1
Feeling (instrumentally) Rational — LessWrong

lesswrong.com

Published on May 16, 2024 6:56 PM GMT. Contra this post from the Sequences. In Eliezer's sequence post, he makes the following (excellent) point: "I can’t find any theorem of probability theory which proves that I should appear ice-cold and expressionless." This debunks the then-widely-held view that rationality is counter to emotions. He then goes on to claim that emotions have the same epistemic status as the beliefs...
1
How is GPT-4o Related to GPT-4? — LessWrong

lesswrong.com

Published on May 15, 2024 6:33 PM GMT. GPT-4o both has a new tokenizer and was trained directly on audio (whereas my understanding is that GPT-4 was trained only on text and images). Is there precedent for upgrading a model to a new tokenizer? It seems like it's probably better to think of it as an entirely new model. If that's the case, what actually...
1
[Linkpost] Please don't take Lumina's anticavity probiotic — LessWrong

lesswrong.com

Published on May 15, 2024 6:03 PM GMT. I suspect some number of LWers have taken or are considering using Lumina's probiotic. If you're in either of those camps, Klee's post might be worth reading. He paints a picture of an unprofessional company skirting regulations and risking customers' health to sell a dubious health product. I can't speak to the veracity of those claims,...
1
Was Partisanship Good for the Environmental Movement? — LessWrong

lesswrong.com

Published on May 15, 2024 5:30 PM GMT. This is the third in a sequence of posts taken from my recent report: Why Did Environmentalism Become Partisan? Summary: Rising partisanship did not make environmentalism more popular or politically effective. Instead, it saw flat or falling overall public opinion, fewer major legislative achievements, and fluctuating executive actions. Public Opinion: One hypothesis is that partisanship was useful, or even necessary, for...
1
Quantized vs. continuous nature of qualia — LessWrong

lesswrong.com

Published on May 15, 2024 12:52 PM GMT. This question is not very well-posed, but I've done my best to make it as well-posed as I can. Suppose that humans with sufficiently functional brains are able to have subjective experiences that transcend the "easy problems of consciousness". I'm interested in understanding if this can be reasonably accepted without also concluding a theory of some sort of "panpsychism". For...
1
How to be a messy thinker — LessWrong

lesswrong.com

Published on May 15, 2024 11:57 AM GMT. Crossposted from my blog: https://invertedpassion.com/how-to-be-a-messy-thinker/ I love thinking about thinking. Give me a research paper on rationality, cognitive biases or mental models, and I’ll gobble it up. Given the amount of knowledge I’ve ingested on these topics, I had always assumed that I’m a clear thinker. Recently, though, it hit me like a lightning strike that this belief is...
1
Embedded Whistle Synth — LessWrong

lesswrong.com

Published on May 15, 2024 2:50 AM GMT. A few years ago I ported my whistle synth system from my laptop to a Raspberry Pi. This was a big improvement, but I still wasn't that happy: to get good quality audio in and out I was using a 2i2 audio interface, which is expensive, bulky, and has a lot of buttons and knobs that...
1
Catastrophic Goodhart in RL with KL penalty — LessWrong

lesswrong.com

Published on May 15, 2024 12:58 AM GMT. TLDR: In the last two posts, we showed that optimizing for a proxy can fail to increase true utility, but only when the error is heavy-tailed. We now show that this also happens in RLHF with a KL penalty. This post builds on our earlier result with a more realistic setting and assumptions: Rather than modeling optimization as conditioning...
1
Ilya Sutskever and Jan Leike resign from OpenAI — LessWrong

lesswrong.com

Published on May 15, 2024 12:45 AM GMT. Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist. Reasons are unclear (as usual when safety people leave OpenAI). The NYT piece and others I've seen don't really have details. Archive of NYT if you want to read it...
1
my note system — LessWrong

lesswrong.com

Published on May 15, 2024 12:20 AM GMT. I've been told that my number of blog posts is impressive, but my personal notes are much larger than my blog, over a million words and with higher information density. Since I've had a bit of practice taking notes, I thought I'd describe the system I developed. It's more complex than some integrated solutions, but it's powerful,...
1
MIRI's May 2024 Newsletter — LessWrong

lesswrong.com

Published on May 15, 2024 12:13 AM GMT. MIRI updates: MIRI is shutting down the Visible Thoughts Project. We originally announced the project in November of 2021. At the time we were hoping we could build a new type of data set for training models to exhibit more of their inner workings. MIRI leadership is pessimistic about humanity’s ability to solve the alignment problem in time, but this...
1
GPT-4o is out — LessWrong

lesswrong.com

Published on May 13, 2024 6:33 PM GMT. OpenAI just announced an improved LLM called GPT-4o. From their website: Today, GPT-4o is much better than any existing model at understanding and discussing the images you share. For example, you can now take a picture of a menu in a different language and talk to GPT-4o to translate it, learn about the food's history and significance, and get...
1
Somerville Porchfest Thoughts — LessWrong

lesswrong.com

Published on May 13, 2024 5:20 PM GMT. This Saturday was Porchfest in Somerville, an annual festival where musicians around the city play on their porches and people wander around listening. As in the past few years, Cecilia and I (Kingfisher) played for contra dancing. Harris Lapiroff called. If anyone has pictures or videos from the set, I'd love to see them as...
1
Branding AI Safety Groups: A Field Guide — LessWrong

lesswrong.com

Published on May 13, 2024 5:17 PM GMT. This article is the first in a series I plan to publish on different aspects of AI Safety group strategy. The aim is that, eventually, these articles will form the basis for a new resource center for AI Safety Groups. Note that these articles aren’t being published in any particular order. TL;DR: AI safety groups should carefully consider their branding strategy to...
1
Against Student Debt Cancellation From All Sides of the Political Compass — LessWrong

lesswrong.com

Published on May 13, 2024 2:55 PM GMT. A stance against student debt cancellation doesn’t rely on the assumptions of any single ideology. Strong cases against student debt cancellation can be made based on the fundamental values of any section of the political compass. In no particular order, here are some arguments against student debt cancellation from the perspectives of many disparate ideologies. Equity and Fairness: Student...
1
