Joe Carlsmith is a writer, researcher, and philosopher. He works as a senior research analyst at Open Philanthropy, focusing on existential risk from advanced artificial intelligence. He also writes independently about various topics in philosophy and futurism and has a doctorate in philosophy from the University of Oxford.
Much of his work is about trying to help us orient wisely towards humanity’s long-term future. He delves into questions about meta-ethics and rationality at the foundation, feeding into questions about ethics (and especially about effective altruism), which motivate concern for the long-term future.
Session Summary
Join us to explore Joseph Carlsmith’s insights into his ongoing work and thoughts on issues including AI alignment, lesser-known future risks, infinite ethics and digital minds, and the sublime utopia.
Carlsmith shares his concerns about ensuring that advanced AI systems behave beneficially for humanity. However, he emphasizes the importance of broadening the horizon to identify and address other critical factors beyond technical AI alignment. For instance, he delves into infinite ethics, which addresses the ethical questions raised by infinite impacts and infinite numbers of people, questions we may need to face if they reflect future realities.
Despite the challenges, this episode is devoted to the exploration of utopia. Beyond the lower bound of a concrete utopia, Carlsmith envisions the sublime utopia: a realm of aspirational goals and visions. Hoping for extraordinary and unprecedented outcomes leaves us vulnerable, but Carlsmith lays out why such a pursuit is essential.
Philipp Lenssen from Germany has been exploring the intersection of technology and art all his life. He developed the sandbox universe Manyland and wrote a technology blog for 7 years. He is currently working on new daily pictures at Instagram.com/PhilippLenssen.
Carlsmith’s vision isn't focused on specific technological advancements or societal changes. Instead, it's centered on a fundamental transformation in our understanding and perception. He beautifully describes a future where we see "less dimly" than we do now.
Quoting 1 Corinthians, Carlsmith says, "Now we see in a mirror dimly, then we will see face to face. Now we know in part, then we will know in full." He envisions a future where humanity has a deeper, more lucid understanding of ourselves, each other, and the universe.
The future he dreams of is one of unfiltered clarity, where we perceive and interact with the world around us in a more direct and honest way. It's a future where we meet each other and the universe, in some sense, "face to face".
Beatrice Erkers: Thank you so much for coming, Joe, to our Existential Hope group and our Existential Hope podcast. Joe is a research analyst at Open Philanthropy. I believe you are focusing mainly on existential risks from AI, and I know you also have a doctorate in philosophy from the University of Oxford. I basically came across your writing, especially on utopias, and that is why I thought it would be really fun to have you here in the group. I read your article Actually Possible: Thoughts on Utopia, and I know you have written some stuff on infinite ethics that I am really curious to hear about. I thought it was a perfect fit for coming on this podcast, so welcome! I will start off by asking: who are you and what are you working on?
Joseph Carlsmith: Thanks for having me. So yes, I work on existential risk from AI and, more broadly, on how we should orient toward the long-term future of humanity. I got into that via philosophy: I became interested in big-picture questions, and that connected me with folks who were thinking specifically about the future. I think AI is an especially pivotal technology with respect to how the future goes, so that has been my focus there. I also do some philosophical work and writing independently, and I work at Open Philanthropy, which is a foundation that cares a lot about this kind of stuff.
Beatrice Erkers: Wonderful. In terms of what you are working on, what does normal, everyday life look like for you as a researcher at Open Philanthropy?
Joseph Carlsmith: It is a lot of reading, writing, Slack, and Google Docs. It is nothing too exotic. I think a unique aspect of my work at Open Phil, in particular, is that it offers a one-of-a-kind intersection of high-level philosophical considerations feeding into fairly object-level choices, interventions, and grants. That has been especially exciting for me as a philosopher. I think many philosophers are hungry for a translation of their ideas into some sort of impact, and Open Phil is a cool chance to do that.
Beatrice Erkers: Would you want to maybe dive into anything, in particular, that you are researching right now, with a focus on AI?
Joseph Carlsmith: Yes, right now, I am thinking about AI alignment and what sort of techniques, ideas, and plans are available for making sure that our AI systems are understood and behaving in ways that are beneficial for humanity. At a somewhat higher level, over the past six months or so, I have been working on a cluster of topics that assume we are going to have really transformational AI technology within the next couple of decades, and that are focused on scanning the horizon for important features, levers, interventions, and issues other than technical AI alignment. A lot of the discourse at Open Philanthropy and nearby communities, with respect to AI, focuses on this question of alignment and whether these systems will be suitably under our control. So, I have been looking beyond that and seeing if there are other things that need to be on our radar, too. For example, things that have to do with moral patienthood, suffering, misuse risk, and galaxy-brained things such as infinities and the multiverse. Very recently, however, I have pulled back into alignment.
Beatrice Erkers: When you look at alignment, and it is not technical alignment, is it policy? What is it?
Joseph Carlsmith: Right now, I am looking on a more technical level at different proposals for aligning AI systems. I think it is probably closer to technical alignment than policy, but I think they merge. However, I am not doing ML experiments myself; it is at a higher level than that.
Beatrice Erkers: I have been really curious to dive into your infinite ethics writing. Does that go into what you are doing right now? Also, could you maybe explain what infinite ethics is for someone who does not know?
Joseph Carlsmith: Sure. So infinite ethics is the part of ethical theory that tries to grapple with infinities in various respects. In particular, that includes thinking about how you choose between actions that have some possibility of impacting an infinite number of people or otherwise having a kind of infinite impact on the world. More broadly, you can also think about how you would rank different situations that involve infinite numbers of people. This is an interesting and important topic at a theoretical level because most of our ethical theory is developed in a context that just ignores infinities. That makes things very simple. It is also instructive, however, that when you bring the infinities in, suddenly a lot of these theories break.
In fact, we can go further and show that there are a number of impossibility results you can get in an infinite context, where a number of very attractive axioms or ethical principles you might have thought were compatible turn out to be incompatible in the infinite case. So there is a theoretical challenge. I think there is also a practical challenge, which is that there is some chance that we live in an infinite universe, or some chance that we can have an infinite impact with our actions. I think we actually face these questions in a practical context, too.
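To make the flavor of the problem concrete, here is a standard textbook-style illustration (an editorial sketch, not an example from the conversation). Compare two worlds containing the same infinite population:

\[
W_A = (1, 1, 1, \dots), \qquad W_B = (2, 2, 2, \dots)
\]

Every person is better off in \(W_B\), so a Pareto principle says \(W_B\) is strictly better. But the totals are

\[
\sum_i u_i(W_A) \;=\; \sum_i u_i(W_B) \;=\; \infty,
\]

so ranking worlds by total utility delivers no verdict here, and attempts to patch the theory run into the kinds of incompatibilities between attractive principles mentioned above.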
Beatrice Erkers: Do you have an example of one of those things you think we are taking for granted that is working as an assumption today, but if we bring in the infinite aspect, it does not hold?
Joseph Carlsmith: I think a very simple one is a sort of reasoning that people sometimes use in thinking about how much to prioritize the long-term future. This was made famous by Nick Bostrom in his piece Astronomical Waste, where he does a quick back-of-the-envelope calculation of how many people might exist in the future. He gets an astronomical number in expectation, in virtue of the possibility that we go out and settle the stars, turn them into optimal computational substrates, run minds on them, and stuff like that. So you get something like 10 to the power of 50 people who could exist, in expectation, in the future. Then, if there is an event that cancels that future, an existential risk, the thought is that this is an astronomical catastrophe, and that reducing the probability of existential risk should, in utilitarian terms, be an overwhelming ethical priority. The argument relies on a combination of a certain population ethics that values these future people in a certain way, and a certain kind of expected value reasoning.
There is a standard way of doing expected value reasoning, where you weigh the actions by the different worlds they could lead to and how much value is at stake in each of those worlds. However, if you try to do that with infinities, it just breaks immediately. If you have different infinite outcomes with infinite value at stake, you have some probability of doing infinite harm, infinite good, or both, helping and hurting an infinite number of people at the same time. That is a very early way that infinities start to matter: if you are doing EV reasoning, it just gets a lot harder. And we actually do EV reasoning quite a lot, and many arguments are made in those terms, but they just sort of bracket the infinite case.
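A minimal sketch of the expected value (EV) point, using the usual definition (again an editorial illustration, not a formula from the episode):

\[
\mathbb{E}[V \mid a] \;=\; \sum_{w} P(w \mid a)\, V(w).
\]

With finite values this ranks actions cleanly, and it is roughly the shape of the astronomical-waste argument: a small reduction \(\delta\) in extinction probability is worth about \(\delta \times 10^{50}\) lives in expectation. But if some outcomes carry value \(+\infty\) and others \(-\infty\), each with positive probability, the sum contains the form \(\infty - \infty\), which is undefined, so the usual ranking simply gives no answer.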
Beatrice Erkers: If we try to get more concrete in terms of this thinking about the future, given you wrote up your thoughts on utopia and its possibility, can we dive into that? First, you suggest that there are two different types of utopias. Could you explain those two different versions?
Joseph Carlsmith: Sure. So the post distinguishes between what I call concrete utopias and sublime utopias. Concrete utopias are basically just utopias that are described in concrete ways that we can directly imagine and engage with. This includes a lot of fictional utopias, and even some of the political philosophy that tries to deal with utopia as a situation involving humans as we know them. When you imagine them, maybe they have somewhat rearranged political structures, maybe they have more abundance, or maybe there is a certain vibe. Often, though, people end up secretly describing a dystopia, pointing out the flaws of something that would have seemed good.
I am resistant to concrete utopias because I worry they undersell what is possibly at stake and just how good the future could be. There is an example in the post that I got from Nick Bostrom, which involves imagining early primates deciding whether to evolve into humans. They are given a chance to imagine the future they could have if they evolved into humans. They ask something like, "Will we get a lot of bananas if we become human?" And they will; we have quite a lot of bananas now. But it is not the right question to be asking. There are ways in which it misses what is really at stake in that horizon, which is beyond our comprehension in some sense.
I am personally more sympathetic to modes of engagement with utopia that route via what I am calling the sublime conception of utopia. It conceives of utopia in a way that foregrounds its otherness and incomprehensibility, but where that otherness and incomprehensibility is, in some sense, an extension of the direction things travel when they become better. I think that has downsides too. It can be empty of content, and it can lose resonance with our direct emotional experience. But I think it is also more accurate, and I think there are ways of getting a grip on it that bring it to life.
Beatrice Erkers: Yeah. I think your boss, Holden Karnofsky, has also written about a spectrum from conservative utopias to more radical utopias. What is interesting, in terms of this concrete utopia versus sublime utopia, is what you are saying regarding our capacity to imagine greatness in the future. Then there is also the trade-off with the more sublime utopia, where you lose a few people along the way. What he describes as the conservative utopia, I think, is the world as it is minus cancer or something like that. That is maybe a bit too boring for most of us.
The somewhere in between, I think, is more like imagining an episode of Friends or something like that, where everyone has their material needs met and we are free, but it is still fairly similar to this world. So first, what is your personal idea of where you would like to go? And secondly, what do you think works best for communicating utopia, for getting more people on board with the ideas or excited about them?
Joseph Carlsmith: Cool. Yeah. I think those are somewhat different questions, and it also depends on how people are oriented. There are two steps in my conception of utopia. One has to do with a lower bound, which is the extrapolation of the no-more-cancer thing, where you just think about the true horrors of the world, such as poverty, disease, death, and oppression. Not the sort of, "Oh, maybe it is ultimately good to have a little bit of it," but that this should just end. There is something basic and very deep in the idea that, at the very least, we can just end the horrors of this world. There are questions at the edges about exactly what that looks like, but I think there is something about just making gentle the life of this world, at least, which you do not need to get into transhumanism to resonate with. I think that is a kind of lower bound.
Then I think we do not know what the upper bound is. People have different appetites for alienness in that respect. My own favorite way of engaging with it, and of presenting it to others and to myself, is a method I call extrapolating the direction of our best experiences, the best kind of collective or individual experiences. It is a notable feature of everyday human life, for many of us, that the best moments, whether by ourselves or with others, are just remarkably better than the average or the mundane. Whether that is some experience of love, or music, or drugs, or meditation, there are these different peak experiences of different kinds, with a degree of aliveness and reality that is a qualitative shift from some sort of baseline. In seeing this is possible, we see that, at the very least, life can be as good as the best it has ever been for anyone, and that is really good.
It can be easy to forget how good things can be there. Generally, there is something hard about remembering really intense experiences, maybe especially really good experiences, when you are not having them. You remember it was really amazing, but you are caught up in some mundane fog. It can be hard to stay in connection with that, but when you are there, you are like, wow. At the very least, we can get that. I think that is another step above the lower bound. Then I think what we should do is see the direction that our minds, our lives, or our communities moved when they became that qualitative level of better, look in that direction, and then extrapolate. Suppose you could go much, much further in that direction. What would that be?
I think we basically know that that is possible too, especially with respect to individual experience. It seems very, very likely to me that our capacity for good experiences is in some sense gated or constrained by our brains and the specific contingencies of our current biology. So the space of minds that we currently occupy is really tiny. If you imagine being able to explore the whole space of ways of being and ways of arranging our minds, our communities, and our relationships, it seems very likely that there is an expansion of scale there that is really quite radical. It is not just a pretty version of our current world. It is really something profoundly different, but still recognizable as an extension of what is good now.
Beatrice Erkers: Yeah, I like the idea of having it somewhat recognizable. I have spent some time in the Foresight community now, and I think we have a lot of transhumanist and posthumanist members. So with a lot of the futures that are being discussed, I am starting to feel familiar with them, and they are less alien. However, when communicating on a broader scale, it does seem that you need to have humans in the future, I guess. I would be curious to hear whether you think it is important to think about these things. What function does it fill to think about the future or a utopia?
Joseph Carlsmith: I think it is important. At a basic level, and this is one of the things I say in that essay, I feel it is just true that the future could be this good. I think our prior should be: if there is a fact, it is good to recognize that fact, especially if it seems as if it might be relevant to our conception of our situation and our goals. I do not think there is a big burden of proof on recognizing this. Now, I do think there is a history of failure modes with respect to human engagement with the notion of utopia, which is important to have in mind. However, to say "just do not think about it" is, I think, a strange conclusion.
On top of that, there is also just a basic way in which it is a source of hope. I mean, this sense of existential hope, I think, is actually quite important to our sense of the stakes of what we do. Sometimes our culture can discourage people from inhabiting and taking seriously their deepest aspirations and dreams. There is a way in which cynicism, and various curtailings of your aspirations for life and your devotion to life, can seem attractive; people may think there is something cool, or maybe safe, about being more jaded, despairing, or pessimistic. I think we can talk about that, but I think it is just wrong.
Empirically, it is just the case that, in fact, things might well be extraordinary in the future. Now, I am not sure, and we have to do our part in that respect. I also think knowing that gives you a sense of the actual story that we are a part of here. What are the stakes? It is a different narrative arc or possible narrative arc, and your sense of the culmination of the human story and what human pain, joy, and striving will have ultimately been a part of if we get things right. So for me, it is just an important structural feature of how I orient toward my own choices and toward my sense of what is going on.
Beatrice Erkers: Yeah, I appreciate that. I think I am going to try and back up a bit. Would you describe yourself as optimistic about the future? If so, what made you? If not, why not?
Joseph Carlsmith: I think I am probably optimistic relative to a sort of baseline. There is a way of engaging with the future that has not done very much in the way of imagining grand, very long-term futures. It instead thinks about the future centrally through questions like, "What is going to happen with climate change?" or "Does our political system seem to be on a good trajectory or a bad one?" It sees only a few centuries out, and centrally with reference to current political conditions or current issues. I think that kind of pessimism comes easily to teams both on the left and on the right. I am not sure there is a hopeful party. Everyone seems to get more oomph out of pessimism.
I think I am probably optimistic relative to that baseline. Then there is a different baseline that I am more pessimistic relative to. There is a worldview made famous by Steven Pinker, a sort of pro-growth worldview. I think transhumanists have some of this vibe, or have had it in the past, where the forces at work in the world have driven massive improvements along tons of dimensions, including poverty, literacy, and democracy, over the past century and a half or so. I feel drawn to the narrative that if we keep going with that, then we are just on a great trajectory.
I also think I am more pessimistic relative to people with that specific view, which I kind of wish I could hold. I have more concern about ways of irreversibly messing up, and my work on AI is a big part of that. I also think there are, unfortunately, a variety of other ways in which we could drastically curtail the value at stake in the future. I take that pretty seriously. However, I am more optimistic than, say, Eliezer Yudkowsky, who thinks we are definitely doomed and stuff like that. Nevertheless, I am still spending most of my time paying attention to the worst-case scenarios, so in that sense, I do not look like an optimist.
Beatrice Erkers: Yeah, I wanted to ask you about that as well. Existential risks seem obvious in terms of things we should pay attention to. The term existential hope was taken from a paper by Toby Ord and Owen Cotton-Barratt, where they were trying to define both of the concepts. How do you relate to both of the concepts in your everyday life? Most of the actions you take seem to be focused on preventing existential risks, but I suppose it is somewhat driven by an existential hope angle.
Joseph Carlsmith: Yeah, I think it is driven by existential hope. I actually think this is a problem, or at least an insufficiently questioned aspect of the discourse around existential risk, because there is an assumption that the future you get if you don't fall prey to an existential risk is kind of massively good in expectation. In some sense, you get the Steven Pinker narrative unless you fall off the train. You are on this upward trajectory, but the existential risk community adds the concern that you are going to fall off. But if you don't fall off, then it is all up and up to the stars and this kind of blissful light. So, I think that narrative is actually quite a hopeful one.
You will see that there are communities of people who are much more pessimistic about what the default future looks like. They think it might be net negative or actively bad to make it to the future. Those people will be correspondingly less worried about existential risk and more interested in things like improving the future conditional on it being very long, or making sure there is not a lot of suffering in the future, which is a somewhat different orientation.
I actually think there are a lot of bad futures even without existential risk. We could have things that are not irreversible lock-in events, not a discrete thing at all. We could have just mediocrity; we could mess up along tons of dimensions. So there is a form of hope that I don't have, which is that as long as we do not mess up totally and irrevocably, then everything will be good. There might actually be more effort to be made with respect to attending to worlds where we do not have a traditional existential risk but nevertheless need to improve things. But, ultimately, yes, making it to these really, really hopeful scenarios is a lot of what drives me.
Beatrice Erkers: In general, what would you say makes you excited about the long-term future?
Joseph Carlsmith: Yeah, I mean, a lot of it is what I said about ending the horrors and going the distance in terms of how good things can be. There is maybe another dimension of that for me personally, though it is less clear how this scales. I care a lot about some comprehension or truth and knowledge-related value. Right now, in addition to being small in our capacity for welfare and joy, we are also just radically under-comprehending of our situation. We are not able to look the universe in the eye. There is a huge amount that we do not know. We are in the earliest stage of becoming aware of what is really going on, and a part of me cares about having civilization understand that situation fully. That is an animating value for me, but it is less clear how it plays into how you value scenarios overall.
Beatrice Erkers: Usually in this podcast, we ask for a specific positive vision of the future. Could you share a vision of existential hope for the future? The way that we also do it in this podcast is that we try to take that eucatastrophe from the paper by Toby and Owen. It is basically the opposite of a catastrophic event, where there is much more value after it happens. Then we try to use AI art generators to create an art piece based on this prompt. If you could share a positive vision of the future or a eucatastrophe, please do.
Joseph Carlsmith: Yeah, I do not have a concrete one, as I tend to prefer the sublime mode of conceiving of the future. One thing that comes to mind for me is a line in Corinthians, something like, "Now we see in a mirror dimly, then we will see face to face. Now we know in part, then we will know in full." That is an important image for me. I would say meeting each other and the universe, in some sense, face to face, or seeing less dimly than we do now.
Beatrice Erkers: Basically, seeing clearly. I like that. I feel as though both scientists and philosophers tend to answer with something about seeing reality or learning more about reality, which seems like an endless pursuit. There is also the term eucatastrophe, which is a terrible word. We have tried to find better versions of it. If you have any proposals for what we should call it instead, we would be very happy to take them. Do you have any suggestions? It is also okay if you don't.
Joseph Carlsmith: I wonder about something such as existential victory or something like that. I mean, it seems that the big disadvantage eucatastrophe has is that it has got the word “catastrophe” in it, so it is just very hard for that to be good. Anyway, I wonder about centrally replacing that one.
Beatrice Erkers: I like that. One of the more common suggestions is anastrophe, but it still has the "trophe." I think something like existential victory sounds good. So, your vision was more broad. In terms of what you think would count as us moving in the right direction, obviously AI risk is a big one for you. Are there any undervalued risks or challenges that you think we need to get around to ensure we are on the right trajectory?
Joseph Carlsmith: Yeah, I think we need to be doing a lot more to prepare for treating digital minds well and thinking about the new questions that are going to arise. Even if we get to a point where we can create agents and forms of artificial intelligence that we can control or have sufficient understanding of, there is still the question of the moral status of the different minds we are creating. What is the right set of rights and protections to give to those minds? There is also a cluster of related issues that have more to do with how our social, political, and legal systems adapt in the face of the new practical features of a population once we are talking about digital minds.
A very concrete example I like, from a paper by Bostrom and Shulman, is that we have this deep principle in democracies of one person, one vote. Then there is the question: once you can make more citizens very easily if you have more GPUs, do they count as distinct people? I can maybe tweak them a little bit, and then they vote, and then they maybe get deleted or collapsed back. Basically, there are really tough questions about how we start to govern and allocate influence once a lot of the people, minds, or stakeholders are digital. For instance, if you are a sadist or a person without a conscience, you will be able to create suffering if you have a computer. How are we going to govern that?
Currently, we have these questions about animals, but it is not possible to grab a human and do what you like with them. There is a way in which digital minds are at the whim of the people who control their hardware, in a way that I think we are not ready for. There are a bunch of questions there that partly do not get as much attention by virtue of this narrative that I tried to push back on a little bit, the narrative of, "Well, if we do not have the existential risk, then everything will be good, and we will figure everything out." A lot of what I am worried about is that even if we do not die or have an AI take over, we may have failed to take the rights, moral status, and stakes of digital minds seriously. Obviously, there is a long history of oppression of minds that are nonhuman or not of our group or type. That is a big one for me that I would flag, and there are various others.
Beatrice Erkers: That is a super interesting one that I have not really heard a lot about before. It is very interesting, also in relation to how we treat animals. Do you have any recommended reading? Is it that paper by Bostrom and Shulman, if one wants to look into this?
Joseph Carlsmith: I think that would be the place I would start. It is a paper called Propositions Concerning Digital Minds and Society. There is also another good paper by them, a bit more philosophical, called Sharing the World with Digital Minds. It gets into the ways in which, on pretty much any conception of moral patienthood, digital minds could easily be capable of what they call super-patienthood, which is their more neutral term for what, in another context, has been called a utility monster: something that, in virtue of its capacity for preference satisfaction or welfare, outweighs a bunch of other things. So there are some tough questions there, too. Overall, I would start with the Propositions Concerning Digital Minds paper. I think that is great work.
Beatrice Erkers: Yeah, thank you. In terms of your work on AI alignment, there is so much happening there right now. I do not know what your timeline is, but in the next five years, is there anything in particular, in terms of a breakthrough, that would make you feel as though we are on the right track?
Joseph Carlsmith: Things that would make me feel substantially more optimistic over the next five years, if I get to just ask for them, would be if we understood what is going on in neural networks much, much better than we do. In particular, if we were at a point of being able to re-create, using more traditional programming methods or human-readable code, systems that were as capable as GPT-4, or if we could reverse-engineer what is going on in GPT-4. Just radical progress in interpretability, I think, is relatively easy to imagine but quite difficult to actually do; we are very far away. A fairly deep intuition for why we are not ready for AI, and why AI alignment is a serious problem, is just that we do not know how these systems work. We are working with black boxes, and that just does not look like a recipe for handling the hottest fire we have ever tried to handle. That is something I would love to see.
Also, somewhat more realistically, I think there are good forms of coordination, regulation, evaluation, and auditing regimes that the world could converge on in the next five years. I think that is more realistic, but there is still a question of how much that gets you, and obviously there are questions about the right way to do it. Very broadly, though, we want to get to a point where we have quantitative, consensus tests, such that if your AI is capable of replicating itself when connected to the internet, we would know that before deployment and then not deploy it, or something like that. Having concrete things like that gate the deployment of systems, and having effective coordination across the relevant actors, would also help with my optimism, though there will still be a lot to do even if we get that.
Beatrice Erkers: Yeah, thank you for sharing that. In terms of your work on AI alignment, could you provide an overview of the field you are operating in for someone new to it? What are the possibilities and the challenges?
Joseph Carlsmith: Sure. The AI alignment problem I am concerned about is making sure that, by the time we are deploying very advanced AI systems, we are able to aim them and control them in ways that align with our intentions. Very broadly, you can break that down into governance-related work, which is about setting up the social, political, and coordination environment to implement whatever technical solutions are required for deployed AI systems to have the relevant safety and alignment properties, and the technical work of figuring out what those techniques are. Within the technical side, the biggest breakdown in my head starts with work focused on interpretability, which is about understanding the internals of these models and building up a better picture of how they are doing the tasks they are doing. I think that work is really great if you can get it, but my own read is that it is very, very far behind and unlikely to get there in shorter-timeline scenarios.
I am personally more interested in work on what is called scalable oversight. Right now, the way you train the model and make sure it is behaving in accordance with your intentions is a process called reinforcement learning from human feedback, or RLHF. Basically, you have the model, and you have humans, roughly speaking, observing what it is doing, rating whether it is behaving well, and then you update the model so that it is more likely to behave in ways that are given high ratings. The issue is that this works when humans can evaluate how good the behavior is. However, if we start to move into a regime where models are doing incredibly complicated things, then humans cannot evaluate them, because the machines are smarter than humans. For instance, the model gives you this giant code base, so here is all this code, but what is it going to do when you run it? How do you scale our ability to provide oversight for these models? There is a bunch of work on that. That is the area I am most excited about and would maybe direct a generic, technically talented young person towards first.
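As a toy sketch of the RLHF-style loop described above, under heavy simplification (the "model" here is just a softmax over three canned behaviours and the "human rater" is a hard-coded stand-in function; this is an editorial illustration, not anyone's actual training code):

```python
import math
import random

# Toy "model": a preference weight (logit) for each candidate behaviour.
behaviours = ["helpful answer", "evasive answer", "harmful answer"]
logits = {b: 0.0 for b in behaviours}

def sample(logits):
    """Sample a behaviour with probability proportional to exp(logit), like a softmax policy."""
    weights = [math.exp(v) for v in logits.values()]
    return random.choices(list(logits), weights=weights, k=1)[0]

def human_rating(behaviour):
    """Stand-in for a human rater: +1 for behaviour judged good, -1 otherwise."""
    return 1.0 if behaviour == "helpful answer" else -1.0

LEARNING_RATE = 0.1
for _ in range(1000):
    b = sample(logits)               # the model produces a behaviour
    r = human_rating(b)              # a human observes and rates it
    logits[b] += LEARNING_RATE * r   # nudge the model toward highly rated behaviour

print(logits)  # "helpful answer" ends up with by far the highest weight
```

The scalable-oversight worry lives in the `human_rating` step: once the behaviour is too complex for a human to evaluate reliably, this loop has nothing trustworthy to update on.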
Beatrice Erkers: That seems like very important work to be doing at this time. Do you have any reading or listening recommendations to learn more about this?
Joseph Carlsmith: To learn about AI alignment?
Beatrice Erkers: Yes, good starting material.
Joseph Carlsmith: Let's see. I mean, I have a long report that I wrote about the nature of the problem, and there are a number of other people, like Richard Ngo, who have pieces that try to lay out the basics of why this is an issue. If you are interested in interpretability, I would recommend looking at Christopher Olah's work. If you are interested in scalable oversight, I would recommend starting with Paul Christiano's work and related work on a variety of different ideas for how to do scalable oversight. Paul is maybe a good place to start there.
Beatrice Erkers: Thank you. We have about 10 minutes left. I want to dive in and go back to the theme of existential hope and a positive-future angle. The first question relates to it being seemingly hard for people to envision positive long-term futures. Envisioning dystopias is easier, and that is what we are seeing in our fiction; it seems we are mostly being fed dystopias in our culture currently. How do you think we can best change that?
Joseph Carlsmith: I don't have a great story about why. One thing I will say is that, fiction-wise, there are some pressures toward having a plot, and plots require tension. In some sense, stories about rich people who have all their needs met are kind of utopias, so maybe that is one pressure. Whatever it is that makes people so negative, even in the midst of the world improving in a lot of ways, could also play a role. I am not sure. In terms of changing it, getting compelling positive visions that really speak to people's deepest aspirations seems like a good place to start.
There also seems to be something vulnerable about having really direct hope for something really amazing. The more you hope, the more you are setting yourself up for disappointment or something. Or there is some way in which really, really caring about something can be scary. People do have these worries about utopian thinking, extremism, and ways in which that has gone wrong. Similarly, and rightly to some extent, people draw deep connections between this utopian thinking and animating impulses at work in religion. I actually think that is okay, and that is something I talked about in the essay. Nevertheless, I think that is maybe another source of resistance here, and it warrants engagement.
Beatrice Erkers: Do you have a personal recommendation of something that showcases a more positive future, such as a book or movie?
Joseph Carlsmith: There is this book, The Precipice by Toby Ord, which I helped with, so I am a bit biased here. However, the last chapter, chapter eight, has a nice statement of the possible stakes of a good future. There is also an essay by Nick Bostrom, Letter from Utopia, which has some kooky parts but has been an important one for me.
Beatrice Erkers: Yeah, I think those are both great recommendations. On the existential hope website, we have a lot of resources listed for what people have recommended, and those are definitely on there.
Joseph Carlsmith: I will throw in one more that I think is underappreciated: Will MacAskill's book What We Owe The Future. There is a QR code at the end that links to a story he wrote, which is his vision of a really good future. That is another one you can check out.
Beatrice Erkers: I came across that one, “Afterwards” I think it is called. It was one of my favorites so far. It mentioned people have the ability to talk to animals, and those are my favorite futures. Is there anything else in terms of what you have been working and writing on in relation to the future that you think we have yet to get into? Is there anything else we should try to touch on as well?
Joseph Carlsmith: There is a frame that has been more salient to me recently, though I am still trying to understand it well. This is also something in Nick Bostrom's work. In addition to thinking entirely in terms of how we get a good future for ourselves, there is a frame in which humanity's, or Earth's, technological trajectory is also an entry into what might be a sort of community of advanced civilizations throughout the universe. It is unclear what different, exotic forms of interaction these kinds of advanced civilizations could have. There is a question of whether our causally accessible part of the cosmos has any other intelligent life in it, and it looks like maybe not. But there are more exotic modes of interaction between civilizations that I have spent a little bit of time thinking about.
I think that if you get into that mindset where you are less alone, and you think there are already quite a lot of intelligent civilizations somewhere, then there is a somewhat different frame, which asks how you can be a good citizen of that cosmos as a whole. What are productive, healthy, cooperative, and harmonious forms of interaction between everyone, not just people on Earth or AIs? Anyway, it is an exotic domain, but it is one that I think is both interesting and possibly relevant. It is a sort of intuition pump about what citizenship is and what norms and virtues we bring to our interactions within human society, and what that would look like if we extrapolate to a broader context as well.
Beatrice Erkers: I think it is super interesting how you are using your philosophy background and applying it to these possibly real-world scenarios. We may never know if there are others in the universe. Given the things you mentioned about what it means to be a citizen and be a useful citizen, do you think it is useful to think about these things so far out in a way because it helps us see more clearly where we are today?
Joseph Carlsmith: I am not sure exactly how useful that particular frame is. I think a part of it is that there can be a vibe in some of these contexts, especially in the context of AI alignment, where the central narrative people use to frame what they are doing is something akin to a competition for power between different value systems. The fear is that the AIs are going to have different values from us, and the resources are scarce, and the question becomes which value systems end up with the resources and stuff like this. I think the vibe of this can end up voracious, competitive, and uncooperative.
I am interested in the ways in which that framing can neglect the sort of accumulated wisdom we have around topics such as liberalism and property rights. How do we give agents boundaries and a domain in which they can act autonomously and where we do not interfere? There is just a bunch of stuff that a very naive ontology of rational agents competing for resources does not necessarily capture. There are questions about at what level of abstraction those considerations are relevant. But I have some sense that thinking about citizenship, boundaries, and a bunch of other things might be useful as we start to enter a world with agents that are potentially much more different from us than the agents we have been used to interacting with.
Beatrice Erkers: That is true; if we stay too much in a scarcity mindset, it does not really help us in this situation or help us get to the truth. Is there anything you think we can do to prepare better for this future we are heading towards?
Joseph Carlsmith: To some extent, all of this work on existential risk and other things is about preparing for a potentially crazy future. I think one intervention that I will flag here is one that aims to improve our epistemology and our wisdom more broadly, which seems more robustly useful. There are questions about how we can do better forecasting and maybe AI-assisted forecasting. How can we use AI to help us discern the truth about things, reason well, understand our values well, deliberate well, and cooperate well? There are a host of virtues in the vicinity of reason, wisdom, clarity, and truth that seem pretty good and important as we start to transition into an especially unfamiliar environment. New technologies might be able to help us with that.
Beatrice Erkers: Thank you. Yeah, like I said, there are a lot of transhumanists in the Foresight community. A lot of those technologies are here already. The future is here. It is just unevenly distributed. I think we only have one minute left now, but I will ask you the final question. What is the best advice you have gotten?
Joseph Carlsmith: I am not sure if this counts as advice, but I remember in 2009, I think, I visited the San Francisco Zen Center for the first time. I remember this quote from Dogen written on the bell that they hit, which stated, “Great is the matter of life and death; Life slips quickly; Wake up; Wake up; Do not waste a moment.” Anyway, I think those are words to live by.
Beatrice Erkers: Yeah, that is beautiful. Do not waste this life. Also very powerful if you add a gong to it. Thank you so much, Joe, for coming and sharing your thoughts on this. We definitely have your article on utopia linked on the existential hope website. Thank you so much for coming.
Joseph Carlsmith: Thank you for having me.