Last week in AI | #153 - Taylor Swift Deepfakes, ChatGPT features, Meta-Prompting, two new US bills
Challenges of AI Misuse Featuring Public Figures
There has been a notable issue with AI-generated images using public figures such as Taylor Swift, which were created non-consensually and distributed widely.
In response to this and other concerns, the US Senate introduced the DEFIANCE Act to combat the unauthorized dissemination of non-consensual deepfakes and to address AI safety measures.
AI in Creative Industries and Legislative Hurdles
The integration of AI into various industries continues to raise questions around regulations and ethical considerations.
Legislators struggle to address cloud service loopholes and ethical AI use in warfare while balancing innovation and security.
Technical Enhancements and Policy Developments
Researchers are improving AI model efficiency through novel methods like speculative sampling and enhancing language models for code generation.
Under newly proposed legislation, companies like OpenAI and Google would be required to disclose AI training details, highlighting ongoing discussions about AI's role in national security.
AI and Warfare Ethics
Conclusion
Read the full discussion in the transcript below 👇
Transcript: Last week in AI #153
Hello and welcome to SkyNet Today's Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I'm one of your hosts, Andrey Kurenkov. I finished my PhD focused on AI last year, and I now work at a generative AI startup. And I'm your other host, Jeremie Harris. I'm the co-founder of Gladstone AI, which is an AI safety company. We do stuff with AI national security, export controls, alignment, weaponization, all that fun, fun stuff that has us think about AI as a potentially WMD-like or WMD-enabling piece of tech. What a week. What a week. I feel like we're saying this more and more. When we came back from the holidays, there was like this lull, and there weren't that many papers. Everybody's kind of taking a little bit of time off. Most of the big research was actually China-based, which I found really interesting, kind of what you might expect, too. It was kind of a disproportionate amount of China-based research. Now, the Western world's kind of woken up, and we have ICLR. We have a whole bunch of different conferences coming around the corner. Here we go. I feel like we're just hitting the ground running in 2024. Yeah, we're getting going. I feel like also the news in general is getting going in a sense. We had sort of a slight acceleration coming with the election and more and more incidents related to AI happening with that. Now, as we'll discuss, this Taylor Swift story happened and might be the biggest AI story of the year so far, surprisingly. That's not something I would have predicted, but there you go. That's how it happens. Before we get into the news, let's just quickly take care of our sponsor. Once again, we are sponsored by the Super Data Science Podcast, which is a great resource to learn about machine learning, AI, data careers, all that stuff. It's interviews with all sorts of people working with data science or AI, hosted by Jon Krohn, chief data scientist and co-founder of the machine learning company Nebula, and the author of the bestselling book, Deep Learning Illustrated. They post twice a week and have 700 to 800 episodes now, a crazy amount with all sorts of people. If you would like to hear from people in the AI space, in addition to hearing about the news as you do on this podcast, we do think the Super Data Science Podcast is a great option for you to check out. Yeah, and everybody already knows that I'm a giant Jon Krohn fanboy. Jon Krohn, very, very good interviewer, very gifted, awesome, awesome guests. I can't recommend the podcast highly enough. Please check it out. As always, you can always start with Jeremie's episodes and learn more about his views on policy and stuff. I keep forgetting that we plug that every time. I feel like it's, yeah, it's like an ad for me somehow, I feel uncomfortable. Well, with that, let's go ahead and get started. And I want to do something a little bit different this time, just because it does feel like the biggest news and it's going to come back a little bit. So instead of starting with tools and apps, as we do usually, I figure let's start with synthetic media and art.
And as I mentioned, maybe the biggest story of the last couple of weeks, or at least seemingly the most widespread story, and a story that seems to have some impact, we are starting with the deep fakes of Taylor Swift on Twitter or X. So in case you haven't heard, there were AI generated explicit images of Taylor Swift that were widely circulated on X. For instance, a prominent post with these images had 45 million views and hundreds of thousands of interactions before the user's account was suspended for violating the platform's policy. And yeah, there was just a lot of them to the point that at some point, Twitter restricted search for Taylor Swift, as some reported. And some reporting turned up that with many of these, it appeared to be that they were actually created with Microsoft designer, Microsoft's tool, that you're not supposed to be able to do this sort of thing with, but there was a conversation among some people that was found via Telegram. And it seems that they basically found a way to hack it, to make the model do things it wasn't supposed to do via prompt hacking. And they generate these images, post them to X. And there was this crazy storm where suddenly Taylor Swift and AI were in the same sentence a lot. And as we will cover in the rest of the episode, there were actually some pretty serious consequences to this happening. Yeah. And I feel like this is another instance where we're realizing that as a society, as a civilization, we just haven't come to terms with this problem of language models, image generation models, and all that being vulnerable to exploits, being vulnerable to jailbreaks. These companies are putting in tons of effort, millions and millions of dollars, tens of millions of dollars to secure these models and have them say no when you ask them for help, to bury a dead body or to make an explicit image that has Taylor Swift in it or what have you. And these things always, always, always have workarounds. And I think as a society, we are still in the phase where we're somehow pretending that companies actually have the technical chops to prevent misuse of these tools. And we're leaning on that an awful lot. The reality is once somebody discovers a jailbreak, that might take some time to uncover, but then it spreads at the speed of software and you can have these very rapid takeoffs of very risky and unfortunate behavior. In this instance, the number one post hit 45 million views, had 24,000 shares. So tons of interaction, all this stuff, and it was only removed 17 hours after it was initially posted. And this is kind of another dimension of it, which is, you know, like Mark Twain said, a lie gets halfway around the world as the truth is still putting on its shoes or something like that. You know, in AI, that is extra true, right? In the age of social media, you post something, it goes viral. The correction is never as viral as the original post. The correction always comes after people have already gotten riled up and where the damage has been done. And you really can't, you can take down the tweet, but you can't undo history and that data is still online forever. So kind of an interesting situation for Twitter to be in, especially given they are currently being investigated by the EU over claims that X is being used to disseminate illegal content and disinformation. And so this is sort of a bit of hot water for them to be in, in particular, in light of those investigations. 
So, you know, difficult times ahead for, I think everyone in tech right now is trying to figure out who owns responsibility for certain misuses. You know, is it Microsoft Designer? Is it the software that was used to produce these things? Is it the platform that was used to share it? Is it the individuals who shared it? You know, like we kind of got to sort this out and we've got an election around the corner. And this is a real reminder that this exact thing, you know, non-consensual deepfakes often being pornographic or explicit, you know, has been one of the concerns with AI now for a while, really, since the beginning of deepfakes going back to like 2019, 2020, there were already tools being created to make it easy and accessible to do this sort of thing. And now it's getting easier and easier with the advance of technology. So still a very real kind of problem category for AI and something that now with this Taylor Swift story, I guess, really hit the mainstream with a lot of discussion and conversation and actions being taken, I think, to really address the situation more rapidly than I guess we had in the past years, even though like the notion of deepfakes and the notion of deepfake pornography has been around for a while. Next story, speaking of celebrities, this is a follow-up on a previous story we had. And the story is that YouTube deletes a thousand videos of celebrity AI scam ads. So this is following up with this investigation by 404 Media that pointed out all these scam ads that were being posted with celebrities like Taylor Swift and Steve Harvey and Joe Rogan and various other famous people. After that investigation came out, YouTube did address, I guess, its findings. So according to the story, there were about a thousand videos deleted and these videos had almost 200 million views in total. So I guess, yeah, another example of a platform dealing with deepfakes and having to address them rapidly after they come to light. Yeah. And the interesting angle that pops up here too is you start to think about the liability that YouTube might have over hosting videos that are, you know, I mean, these are reputationally damaging, right? If you're Taylor Swift, if you're Steve Harvey and you see a bunch of videos coming out where you're promoting some Medicare scam, now that does brand damage in a very serious way. And if these videos stay up long enough, as you said, to collect 200 million views, like you can assign a dollar value to each of those views in terms of how much damage is potentially being done to their brand. So yeah, I mean, I have no idea what the liability regime is going to end up looking like around this stuff. I think it's, to my understanding, this is something that's still to some degree up in the air and unresolved. So I think probably, I don't know, 2024 seems like a year where we'll also see, in addition to a whole bunch of precedent being set on election interference, we'll probably also see a lot of interesting precedent set in the courts as people try to figure out where responsibility ultimately lies on some of these issues. And just a couple more stories in this section, then moving on to the lightning round for some quick ones. We have yet another deepfake story, actually. This next story is that Iceland has had its own AI George Carlin moment and considers law against deepfaking the dead. So we've covered, there was this AI synthesized comedy special starring the famous comedian George Carlin that kind of hit the news a couple of weeks ago.
And now a similar thing has happened in Iceland. There was video that featured comedian Hemi Gunn, I think that's how it's pronounced, who passed away in 2013. And there was, yeah, this video that aired during actually a television event created with assistance from Icelandic startup Overtune, which creates AI voiceovers or has that capability. So interestingly, yeah, this led to a lot of conversations and now there's a whole consideration of what do we do about deepfakes in response to this in a way similar to a Taylor Swift situation. Yeah. I just want to pause and highlight the fact that this is the most 2024 thing I've ever heard. A headline that says Iceland considers law against deepfaking the dead. That's kind of like, you know, we're in the black mirror phase of human history right now. It's pretty insane. Yeah. And like deepfaking, reanimating deceased people or deepfaking them in whatever form, deepfakes are also not the only way that can be done, right? Like you can have chat bots, you can have audio file or audio generation alone, all those things. And it all kind of seems like something, again, as a civilization, we have to make those decisions now. It's like, you know, philosophy is checking our homework after 2000 years, or 10,000 years. We don't really have much to show for it at this point. And let's round out to a section and this initial slate of stories or something a little bit more fun. I figured let's just throw it in. So there was a story about how Guns N' Roses, the famous rock band, has shared an AI generated video for a new song titled The General. So you can go ahead and check that video out. If you take a look, it's kind of interesting. They really went sort of low tech, so to speak, for AI. It's very obvious that it's AI. It's a lot of filters. It's kind of very wavy. They're not trying to make a very beautiful image or anything. It's really more like a filter. You could go like a year ago and see this sort of stuff. I do think this kind of highlights for a very mainstream band like Guns N' Roses, really old school band, you could say. Them using AI in this video is yet another pointer to how it is becoming more mainstream. And I suppose how AI tooling presumably is making its way into more and more creative professionals' toolboxes. And then moving on to tools and apps, and it's a bit of a continuation, I guess, in a sense of some of the stuff we've been talking about. First one is Microsoft makes Swift changes to AI tool. And so this is in response to the Taylor Swift stuff that we talked about. Obviously, the case of designer being used potentially to create some of this explicit imagery of Taylor Swift, these nude images, which, by the way, did come out of 4chan and a Telegram channel. So that was kind of the place, as so often is, it is where these things were shared initially. This is basically Microsoft's response to this. They're saying, look, we've introduced a whole bunch more protections into designers. Hopefully those stick. They are explicitly kind of on the back of this Taylor Swift situation and sort of issuing these standard corporate reassurances. We're investigating these reports and are taking appropriate action to address them, says Microsoft. And then they reinforce the fact that their code of conduct already prohibits their use of tools for this sort of thing. And then they're highlighting that they actually have large teams working on the development of guardrails and other safety systems. 
But one of the challenges really is the technical one. You can have as large a team as you want working on this, but we're facing down a situation where there are fundamental technical constraints that companies are apparently facing right now. Nobody knows how to solve, for example, the problem of AI alignment, the problem of getting AI systems to reliably do what we want them to do. And as long as that remains the case, that not only creates potentially catastrophic risk in the longterm, but the way it gets expressed today is that you can't prevent jailbreaks. You can't predict how these systems are going to behave in a wide range of circumstances under a wide range of prompts. And so really the idea of developing guardrails, that's good. It's good. It's helpful. But there is kind of this fundamental limit to how far that can go. At least there seems to be at this point until we make some really basic breakthroughs in the science of understanding AI. That's right. And this story has some good examples going into a bit more detail of how this happened. So designer was not supposed to allow you to do this, as you said, it was meant to prevent you from generating images of Taylor Swift. But in this article, they show how if you type Taylor Swift, it would prevent you. But if you type Taylor Singer Swift, it would go ahead and generate an image of Taylor Swift and how it would prevent you from explicitly describing sexual acts or sexual scenarios. But if you just use suggestive wording and kind of indirect descriptions, it would still go ahead and do that for you. So that's kind of what we meant by prompt hacking is basically tweaking the prompt a little bit to get around the guardrails. And as the article said, now I guess it's updated to prevent that. And yeah, another reminder of that in general for AI products, you are going to have to be really careful about this sort of thing. Next story is that OpenAI has dropped prices and supposedly fixed the lazy GPT-4. The price drop here is about GPT-3.5 Turbo. And for that one, one of the most popular APIs, the prices have actually been reduced by 50% for your input to model and 25% for outputs, pretty substantial drop. There was also an update of the model, which is, they say, improved in various ways. And there's now a preview model called GPT-4 Turbo, along with some of these fixes that fix an issue we covered a couple of weeks ago with GPT-4 supposedly being lazy and refusing to do the work essentially to reply. So if you tell it, write me a little short story, sometimes GPT-4 would do something like say, okay, here's your first paragraph and then fill in the remainder or something along those lines. So along with this layer of updates, they say that they've addressed those kind of concerns. Yeah. And a lot of their updates have to do with two new text embedding models that they're releasing. So text embedding is this thing where you take in a piece of text, you feed it to your AI system. And rather than predicting an output in the usual way, when you use chat GPT, just kind of get some sort of text output, this instead will give you a list of numbers that represents essentially the meaning that's encoded in that piece of text. So you kind of turn it into a numerical representation that allows you to kind of do math, if you will, on the meaning behind those words. And that's known as an embedding. The embedding is sort of interesting. It's very useful for a lot of backend applications. 
If you want to compare the meaning of different things, if you want to compare, for example, for the purpose of making a ranking of product reviews, which product reviews are the most positive, which are the most negative, that sort of thing. So what's happening right now is OpenAI is releasing a new small text embedding model that's designed to be very efficient, not very costly. And they have a commonly used benchmark for this sort of thing for multi-language retrieval. It's called MIRACL. The score apparently on that has gone up by over 10% from 31% to 44% for their new text embedding model relative to the old one. So it's going to be pretty clearly a big kind of upgrade. And the pricing for that model has gone down by a factor of five. So we're seeing not only better quality, but also better pricing, something sort of mirrored by the larger new text embedding model they're also releasing. And they've got, anyway, all kinds of really exciting developer tools in there. I won't get into the details too much, but this is actually a pretty big set of updates. It's a bit of a smorgasbord of different things. As you said, there have been a lot of complaints about this idea of GPT being very lazy. Apparently that's been fixed. The way the laziness would manifest, if you recall from previous episodes, is people would ask GPT to do some task and it would kind of go, well, you could probably do it by doing these steps. And what you're asking it is to do those steps, but it'll just kind of tell you what steps it should do rather than executing them. It's kind of a common way that would manifest. Apparently that's fixed. No information about how exactly it was fixed, but good to know that that's no longer an issue. That's right. Yeah. In the release notes here for GPT-4 Turbo, they said this model completes tasks like code generation more thoroughly than the previous preview model, and is intended to reduce cases of quote, laziness, where the model doesn't complete a task. So I assume they're working on it throughout, I guess, their model slate, including GPT-4 Turbo and with this GPT-3.5 that has various improvements. On to the lightning round, and one more story about OpenAI, and it is that ChatGPT now has @GPT mentions, a new feature that allows you to basically mention specific instances of GPTs. So it's a beta feature and it allows you to converse with multiple versions of ChatGPT from their GPT store in the same chat window, by basically addressing them with an @, you know, @ music GPT, @ teacher GPT, et cetera, et cetera, et cetera. So yeah, it's in beta and I guess they are working toward integrating more and more of the store in various ways. Yeah. And the example they give here is they have a Biden GPT and a Trump GPT talking to each other in the same thread. So you can kind of summon the Biden GPT by basically doing like at Biden GPT and then like getting it to generate an output and doing that back and forth. This is really interesting because it's a fundamental shift in the way that we interact with these systems, right? Normally with ChatGPT, you have to give it a prompt. That prompt, you can think of it as a thing that activates a particular version of ChatGPT, right? You're telling it, for example, like, hey, I want you to act like Elon Musk helping me to solve some problem in rocketry or something. And then it'll do an impression, effectively an impression of Elon Musk in that context. So every time we give a prompt to ChatGPT in this case, we summon like a different version of ChatGPT.
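Circling back to the embeddings discussion for a moment: below is a minimal sketch of the review-ranking idea mentioned above, using cosine similarity over embedding vectors. It assumes the OpenAI Python SDK, and the model name text-embedding-3-small is an assumption standing in for the new small embedding model (the episode doesn't name the models), so treat the specifics as illustrative rather than exact.

```python
# Minimal sketch: rank product reviews by semantic similarity to a reference
# sentence using text embeddings. Assumes the OpenAI Python SDK is installed
# and OPENAI_API_KEY is set; the model name below is an assumption.
import math
from openai import OpenAI

client = OpenAI()

reviews = [
    "Absolutely love it, works perfectly and arrived early.",
    "It broke after two days and support never answered.",
    "Decent value for the price, though the manual is confusing.",
]
reference = "This is a very positive product review."

def embed(texts):
    """Return one embedding vector (list of floats) per input string."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

ref_vec = embed([reference])[0]
review_vecs = embed(reviews)

# Higher similarity to the reference = "more positive" under this crude proxy.
ranked = sorted(zip(reviews, review_vecs),
                key=lambda rv: cosine(ref_vec, rv[1]), reverse=True)
for review, vec in ranked:
    print(f"{cosine(ref_vec, vec):.3f}  {review}")
```

This is the "doing math on meaning" idea in its simplest form: the model turns text into vectors, and plain arithmetic on those vectors stands in for comparing meanings.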
Now here we have, and that's one way to do it. You can also do fine tuning to also get different behavior. What this allows you to do is in a single dialogue, invoke all of these different versions of GPT that have been pre-built, these GPTs from the GPT store so that they can interact with each other so that it's more convenient and so on and so forth. This is deeply related, by the way, to a paper we'll be talking about in the research and advancement section on meta-prompting. And I think that's not a coincidence. I think the whole field increasingly is moving in this direction of, you know, how do we benefit from both the generality of these models and the kind of expertise of more specialized versions, right? So if you're a specialized Trump bot, a specialized Biden bot, for example, you want to be able to benefit from that, but not also lose the value of the generality of the base model. And this way of interacting in the same chat window with a bunch of these different ones, whether the person who's leading the interaction is a human in this case, or as we'll see later, an AI for meta-prompting, it's just this way of kind of navigating the trade-offs between generality and specificity. This is a big strategic play for OpenAI, you know, Sam Altman was talking to Bill Gates on a podcast and he was saying, you know, customizability, personalization are really key things on OpenAI's development roadmap. That really maps onto this, right? You have a whole bunch of customized and tailored bots and you can get them to sort of orchestrate some interaction through these new windows. So kind of an interesting plot twist on the way that we are interacting with these sorts of systems. And next we have a story that's a little bit more insider baseball, maybe something you wouldn't see in the New York Times, but I think is fun if you're a regular listener. We have a story that ChatGPT finally has competition, Google Bard with Gemini just matched it on the Large Model Systems Organization's Chatbot Arena. So this was discussed on Reddit and Twitter, like in AI circles, after Google Bard got an update that possibly added RAG to it. So possibly there's sort of some cheating here with retrieval of extra information. But anyway, the story is that on this leaderboard, Google Bard now matches ChatGPT and that is the case for the first time. So yeah, there was some excitement of seeing finally a competing chatbot seemingly perform on par at least. Yeah. And just for context, I mean, so the LMSYS Chatbot Arena leaderboard, which is kind of like the thing that's being used to assess that, yes, in fact, Bard does seem to perform better or this version of it anyway, is kind of an interesting tool. We talked a lot about the Hugging Face leaderboard, usually for open source models and tracking specific benchmarks. That's a really good one. The way this one works, they end up, yeah, you've got these two models pitted head to head at any given time. So you'll write a prompt, the prompt gets sent to two models, but you don't know which ones. And after the response is shown to you, you pick which one is best. And so over many, many rounds, you kind of end up aggregating these scores. So not actually so dissimilar from other approaches to this, but notable that through that process, which is human driven, it's not like AI evaluation or anything like that, you do see Bard performing really well. And this is, by the way, the first time that Bard has ever actually beaten the base version of GPT-4.
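For listeners curious how those blind pairwise votes get aggregated into a leaderboard, here is a toy sketch of an Elo-style rating update from head-to-head outcomes. The Arena has reported Elo-style ratings, but its actual methodology has more to it and has evolved over time, so this is an illustration of the general idea rather than their exact implementation.

```python
# Toy Elo-style rating update from pairwise "which answer was better" votes.
# Illustrative only; the Chatbot Arena's real aggregation is more involved.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings toward the observed outcome of one head-to-head vote."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - exp_win)
    ratings[loser] -= k * (1.0 - exp_win)

ratings = {"model_a": 1000.0, "model_b": 1000.0}
# Simulated votes: model_a wins 3 of 4 blind comparisons.
for winner, loser in [("model_a", "model_b")] * 3 + [("model_b", "model_a")]:
    update(ratings, winner, loser)
print(ratings)  # model_a ends up rated above model_b
```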
So it is an interesting kind of development as Google and Microsoft or really Google and OpenAI go to war on who has the best chatbot. It's still in second spot. The first spot is actually GPT-4 Turbo, but they're pretty close and it does beat the base GPT-4, like you said. And yeah, this was big enough news, I guess the reason people kind of discussed it partially was Jeff Dean, one of the leads of AI at Google, even posted about it on Twitter. It's a Gemini Pro-scale model, very ambiguous, but whatever they did, it made it jump quite a bit. And yeah, maybe Bard is kind of good now. Next story, going back to something we had last week, we have some more browser updates with AI integrations. And this story is covering specifically how both browsers, Brave and Arc are integrating AI stuff. Arc is this minimalistic browser, kind of a smaller one, and they are adding the ability for users to switch their default search engine to the Perplexity AI-driven search that we've covered quite a bit. So that is the extent of it. You can just always do Perplexity by default. Whereas Brave is possibly a bit more mature, privacy-focused browser. And this week they announced that Leo, their AI browser chatbot assistant, is getting upgraded to Mixtral 8x7B, this cutting edge or top-of-the-line open source model that is pretty much the best you can do as far as open models. So yeah, browsers continue to move along and add more AI. Yeah, and I think this is really interesting from the standpoint of Perplexity, right? Because strategically, they seem to really be leaning into this idea of using partnerships as a way to drive adoption, recognizing they're just not going to be competing head-to-head with Google.com for the search market. So we've seen them now with this Arc integration, but also earlier we talked about their partnership with Rabbit and various other companies that are bringing in users, bringing in eyeballs. And those partnerships seem to be the way they're trying to leverage their way up to a position of maybe competing with Google at some point. So kind of interesting, and it does seem to be paying off right now. I mean, it's something to note that we have seen their name come up an awful, awful lot on Last Week in AI in the last, well, couple of weeks. One last story for this section. It's that Baidu's Ernie AI chatbot will power Samsung's new Galaxy S24 smartphones. So I think just last week we were talking about how the smartphones will have live translation capabilities powered by Gemini Nano. And this story is highlighting how in China, it actually will not be Gemini that will be in the phone. Instead, it will be Ernie AI from Baidu, Baidu being a massive company in China, somewhat like the Google of China, you could say. So yeah, this is really highlighting Baidu still being a very serious player in the AI space, especially with this Ernie AI chatbot being one of the top ones in usage. And now I guess it will be deployed even more. Yeah, I think we covered a few weeks ago how Ernie Bot had actually reached the 100 million user threshold, though there were a bunch of caveats around whether that was monthly active users or what specifically that meant. The reason they were reporting that was that ChatGPT was famous for being the fastest piece of software ever developed to reach 100 million users. In that case, they were pretty transparent about what that meant. In this case, maybe less so, but it is interesting.
It's also a sign that China's strategy of creating their Great Firewall is helping their domestic companies bootstrap themselves up to be credible players in the space. Obviously, if OpenAI had free rein to compete in that market, they would very likely completely trounce all these Chinese companies because they are consistently about, depending on who you ask and how you calculate, 12 to 18 months ahead, certainly on the language modeling side. But what we're seeing here is the domestic companies like Baidu taking advantage of this wide open market and following up with some pretty solid tech. This is market ready, it seems. And onto applications and business. Speaking of markets, the first story is AI companies lose $190 billion in market cap after Alphabet and Microsoft reports. So it is the season for financial reports. And the latest ones have been a little disappointing, at least looking at the story. There were a couple companies that lost out after seeing their numbers. So for instance, Alphabet dropped 5.6% after it missed expectations on ad revenue. There was even a little bit of a drop in Nvidia, 2%, a drop in AMD, 6%, even a tiny drop in Microsoft, which has really been doing crazy good on top of its AI push. So Microsoft dropped 0.7%. So yeah, it's worth, I guess, remembering that AI is still not at a point where it's printing money for most companies, except maybe Nvidia. I mean, yeah, and Nvidia was down as well, it seems a little bit, but I think these things are all relative. When we think about, oh, well, it was a disappointing performance on the whole. We see these relatively, yes, significant near-term drops. In the long term, if you think about where these companies were at before the ChatGPT era, on the promise of AI, their stock value has just rocketed up. And this might just be a slight correction. I think it's interesting that we're seeing a mix of not just like cloud providers, but also AI hardware designers, AMD and Nvidia in particular, plus model developers. Like really, it's the whole gamut of AI related companies. Usually when you look at a space like AI, it doesn't rise and fall universally, right? You see there's a lot more consumer adoption than expected. So companies that actually serve models might do better, or there's some issue with the AI hardware supply chain, and as a result, the companies that specialize in AI hardware get dinged, but the others don't. This is interesting because we're seeing kind of this like more universal blanket drop on all these things, which may tie more into broader market sentiment than indicating there's a fundamental issue, but it's hard to know. Again, one big thing that people are pricing in when they look at these kinds of companies is absolutely the prospect that AI may be somewhere fairly radical three years from now, for example. So you start to look at valuations in the trillions of dollars as being maybe small through that lens, right? So it's an expected value calculation as all investment decisions are, but I think that's kind of the big question. These small micro moves I don't think are the big thing to be tracking. In the long run, the question is who's going to be able to automate a big chunk of human labor, and that's a big part of what's propping up these prices for everything from the model developers, more kind of like Google, though they do hardware too, all the way down to AMD and NVIDIA. Yeah. That's a good point.
In the big picture, 190 billion in market cap sounds dramatic, but these are relatively small movements and they reflect, as was noted, things like the ad revenue of Alphabet or the revenue of AMD that they reported, even though they did project strong sales for their AI processors. So yeah, there's still a lot of regular business that needs to fund all of these AI efforts. And it seems investors are still a little sensitive to how significant the money being brought in actually is. And on the second story, we were touching on something we haven't talked about in a little while, and that is Cruise. So just a bit of background, Cruise has been in the news for some months now, I think since about October, November, there was a major incident where Cruise was involved, Cruise for context being one of the leading self-driving car companies along with Waymo, they offered a commercial offering in San Francisco where you could hail a robo-taxi and it would take you anywhere. So last year, they were involved in an accident where a human did a hit and run on someone crossing the street, unfortunately, by some bad luck, the human was then launched onto a Cruise car and the Cruise car did a pullover maneuver that dragged this person. And then afterwards, there was a meeting and the Cruise people at this meeting apparently failed to fully inform the regulators. And that led to a whole bunch of trouble, like a lot of fallout for Cruise ever since that happened, a lot of bad stuff. So this news is about how there was a report by a law firm, Quinn Emanuel Urquhart & Sullivan, an outside firm hired by Cruise to essentially investigate what happened here and why was there this breakdown in communication and so on. And it's somewhat interesting. So it appears that Cruise, whoever was in this meeting, did try to inform the regulators of what happened and they were unable to because of bad internet. And so what happened was, yeah, they wanted to just play the video and here's the video that shows everything. And bad internet prevented the video from fully playing. And apparently, Cruise had this plan to let the video speak for itself to whoever they're meeting with. And that, according to the report, is essentially what happened. They did not intend to hide this aspect of the incident, the dragging of a person, which they originally failed to mention. And so because they didn't verbally go over it and because they were unable to play the video due to bad internet, they didn't actually get that across and a whole bunch of bad stuff happened. And just one more thing is, ultimately, it's a long report and the firm does say that it was fundamentally flawed and probably not a good idea to assume that the video can speak for itself and not explicitly disclose the details to regulators. Yeah. And the report that's filed here is kind of chalking this up to Cruise having too much of an us versus them attitude with respect to regulators, which may perhaps serve to explain why they were not so forthright necessarily with flagging the dragging incident. One of the things that the article notes is that there were over a hundred Cruise employees who were aware of the pedestrian dragging incident prior to the meeting that was held in the mayor's office in this context anyway. The bottom line is, there was an awareness at Cruise at the time that these things maybe ought to have been disclosed, but Cruise, yeah, chose not to verbally say anything about it. And they were just like, we'll let the video speak for itself.
But then if the video itself does not contain that crucial, decisive step in the story here, not mentioning it verbally, it's just kind of hard to see. I'm not a lawyer, but it's just hard to see rationally how you can justify that. Certainly the report seems to take a pretty dim view of that particular approach. So who knows? I mean, I guess, saved by the bell, saved by the bad internet, that may not be enough of a legal defense for Cruise here. I don't know. And the full report, it's almost 200 pages and you can actually read the full PDF. It's a very exhaustive breakdown of what happened. So I don't know, kind of interesting to see in a way it's a dramatic story, right? Because in the aftermath of this, GM just recently announced they're slashing spending on Cruise by half, by like a billion dollars. Cruise stopped all testing in the US. They stopped their commercial offering. It was catastrophic what happened. You look at the details here and it turns out like there wasn't some cover-up planned. It was just, Cruise was very focused on pointing out that the initial... incident was not because of them. It was a human driver who hit someone. And then as a result, kind of downplayed too much the actual part the Cruise vehicle did play, the dragging of a person and creating additional harm. So yeah, in a way, as a narrative, it's quite interesting to think of how over the course of a few days, because of some bad decision-making, a lot of bad stuff happened. Yeah. It kind of feels like relitigating, reliving the same discussion that we had earlier in the context of the Taylor Swift piece, right? Where does the responsibility live? Does it live with the human who made a bad judgment call early on in the process, or does it live, or how much of it lives with Cruise itself and with the downstream consequences of what they did or what their car did? I don't think we just have these answers ready to go. So again, 2024, I think, is going to be a big year for liability in AI. There's obviously legislation before Congress that people are considering now too. So I think we'll have a much clearer set of answers to these things. They may not be satisfactory, but we're definitely going to see a lot of this stuff getting chewed on in public in the future. Onto the lightning round. The first story is that Hugging Face teams up with Google to accelerate open AI development. So this is an announcement of a strategic collaboration that would give developers on Google Cloud a streamlined way to use models on Hugging Face. Teams that use these models could train and serve them with Google Cloud more easily. This is just announced. It's not yet available. These are probably going to come out in the first half of 2024, but yeah, interesting, I guess, collaboration for Google and Hugging Face. Yeah. It's always been the case that you could make a lot of money by building wrappers around cloud infrastructure because cloud infrastructure is so painful to use. It takes so much expertise to know how to use, especially like AWS, which is notoriously rough for this, but Google Cloud and other services like this. Famously, back in the day, there was a company called Heroku who basically were just like a wrapper around Amazon Web Services that made it easier to use because AWS is just such a crap fest if you're a developer, such a pain to learn.
And so this is maybe a version of that kind of a play where Hugging Face is making itself the user-friendly face of Google Cloud essentially through this deal, or at least that's in part what they're doing, except instead of doing what Heroku did and facing software engineers, what they're doing is they're facing machine learning engineers, AI developers, that sort of thing. But there's a dimension of this that has a lot of overlap with that Hugging Face play. Hugging Face, by the way, it became, I think, a $7 billion company. So there's a lot of value you can unlock just by making things easier to use. And that's basically it. And a good opportunity as well for Google to draw in more usage of their products, just without having to do all the work involved in making it user-friendly and all that. So kind of an interesting collaboration and certainly a good move for Hugging Face because that deeper partnership with Google is going to make a lot of AI hardware essentially available to their users. By the way, I should mention for context, if people don't know, Hugging Face is a big player in the AI space as kind of a host of AI models. So they are a repository, and they currently have over 500,000 AI models and 250,000 data sets. So if you are developing an AI model, you train a neural net, many post their AI models on there. And they have been offering the ability to serve models to say, okay, here's an open model that's on here. With just a few clicks, I can go ahead and deploy it on one of several cloud providers. So this is, I guess, extending their offerings to now also Google Cloud and to Google's TPUs, Tensor Processing Units, and various cool hardware. So really an extension of what Hugging Face is already doing. Up next, we have Elon Musk's xAI seeking $6 billion in funding to challenge ChatGPT maker OpenAI. Okay. So xAI, of course, Elon Musk's AGI play. Roughly speaking, you can think of this as the thing that Elon Musk did because he was worried that after OpenAI parted ways with him, he no longer had an AGI play. And he, of course, is and thinks of himself as being a big player in the AGI space. So he spins up xAI, hires Igor Babuschkin, a whole bunch of other former DeepMind, OpenAI people, and they start up xAI. There was a time when xAI had submitted some documents with the US SEC, where they'd set a target for a billion dollars in fundraising that they were going to try to reach. This suggests that they've now exceeded that, or at least that they're planning to. So rather than a billion dollars in funding, it looks like they might be looking to raise $6 billion. It's a little unclear right now. There's nothing concrete. And then the valuation of $20 billion. Now the valuation of $20 billion is interesting, right? Because when we think about the other companies in this space, OpenAI, obviously, way ahead, they've got a valuation, seems like it might be in the $80 billion range plus. But when you think about some of the other players, Anthropic has raised at around an $18 billion valuation, if my memory serves. And so this would actually place xAI slightly ahead of Anthropic's latest valuation. That raises an interesting question, because so far, we don't really have that many proof points from xAI other than Grok, to the extent that a lot of their work has led to that. So we don't have the kinds of proof points that I would argue Anthropic has so far with Claude and Claude 2 at scale. So it's an interesting question about how they defend that valuation.
Part of the answer may be who they're engaging to raise funds from. So it seems like xAI has been focusing on family offices in Hong Kong and sovereign wealth funds in the Middle East, that sort of thing. This raises interesting national security questions, right? Because we've had situations in the past that we've reported on the last couple of weeks where the US is taking a dim view of, for example, Sam Altman trying to raise funds for his chip initiatives from folks in the Middle East and so on, and from the UAE in particular, because of their affiliations with Chinese-based organizations, Chinese-funded organizations. Well, here is xAI apparently engaging with family offices in Hong Kong, which is now very much in the Chinese, not just sphere of influence, but absorbed into China at this stage. So kind of interesting from a national security perspective, what are the implications of raising funds from these sorts of sources? Last thing to note is apparently Morgan Stanley is coordinating this whole fundraise. They've got obviously a whole bunch of experience doing tons of international kind of big money stuff. And they were also involved in the acquisition of X or Twitter, as it was then when Elon kind of acquired it. So anyway, really interesting situation, big, big fundraise, lots of interesting counterparties, like folks who are actually throwing money at this initiative from countries that raise interesting national security questions. And I'm really curious how this will all play out and what the implications for xAI will be going forward. This, by the way, is according to reporting by the Financial Times. They cited various sources saying that this is what's happening. Elon Musk, in response, posted on Twitter or X that they were not raising capital at all, basically denying all of this. So hard to say, maybe the reporters got it wrong, but either way, this is what was covered. That seems weirdly specific. Yeah. We don't know, but that's what the reporting said. And onto the last story, AI chip startup Rebellions snags funding to challenge Nvidia. This is a South Korean AI chip startup, Rebellions Incorporated, and they have secured $124 million in Series B funding to develop their next generation AI chip, REBEL, which is specifically designed for running large language models. So pretty significant raise. The company was only founded in 2020. They are partnering with Samsung to fabricate their chips using four nanometer technology. So a very high tech bet for this one. Four nanometer, the significance of that too is, we talked about this before on the podcast, so sorry if it sounds familiar, but the Nvidia A100 GPU, that's at seven nanometers, the seven nanometer process. The H100 is a five nanometer process. So when you're looking at four nanometer technology, that truly is kind of next generation stuff. This sort of thing usually takes a really long time to mature. So the fact that Rebellions was founded in 2020, right now the timeline for sub five nanometer tech is, I'm trying to remember now, I think Nvidia is looking at, was it 2025, 26, maybe for that next node size. These are called nodes, by the way, the X nanometer or whatever. So the four nanometer, sorry, the three nanometer node size is 2025-ish. Sorry, let me take that back. The availability of three nanometer node sizes for AI purposes will probably be unlocked kind of on that time horizon. The three nanometer node already exists because it's being used for the iPhone. But anyway, it's a whole thing.
So this would be very advanced tech. It would not be, on my reading, it would not be truly cutting edge when it comes out. But it does suggest that you have this new entrant who is trying to play an important role here and partner with Samsung, which is sort of right now in second place relative to TSMC, Taiwan Semiconductor Manufacturing Company, in terms of its ability to make cutting edge chips. So yeah, it's still early days. Believe it or not, a $124 million raise in this space is not actually that big. And the valuation is $650 million. So not an enormous, I know it sounds weird, but not an enormous kind of amount of money inbound because this activity is so capital intensive. You just need that much money to get a chip project off the ground these days. So worth tracking. There are a whole bunch of other early stage, similar companies, Tenstorrent, for example, we've talked about on the podcast before. So we'll just see where this goes and what their proof points end up being. And onto the projects and open source section where we have a fun trio of stories all dealing with code generation. The first one is about AlphaCodium that was inspired by DeepMind's AlphaCode, but is open source and now seemingly surpasses it. So they announced improvements. And one of the cool things that got some attention was that this one has a neat thing called flow engineering. So instead of just kind of generating text, as you do with large language models, they have a whole kind of little architecture of how to generate code with iteration on generated things, this discriminator-type, adversarial model that checks code integrity through testing, reflection, and spec matching, etc. So yeah, this is a cool model that is now quite good. And it is developed by CodiumAI, a Tel Aviv based startup. Yeah, it is a specialist. So there's this thing that happens in a lot of these announcements where people will pitch their model as being really exciting in that it beats GPT-4. And then you'll be like, well, wait a minute, like, are we talking about you're beating GPT-4 with a general purpose model or a model that specializes in, for example, coding? And that's actually what's going on here. So we have the specialist model, AlphaCodium, that is able to outperform GPT-4 at coding by an interesting margin. I mean, it goes from 19% to 44% on the CodeContests benchmark. So that's not nothing and that's a legit benchmark, but it is a specialist model versus a generalist. So that's, I think, an important thing to flag. Another interesting thing is, as you said, Andrey, the strategy here for AlphaCodium is basically built around a fancy prompt engineering technique, right? Where they have the system generate code, they have other instances of the system kind of review the code and it goes back and forth in this very kind of GAN-inspired way, like inspired by generative adversarial networks, where you have one network that generates a thing and then another that kind of critiques it, if you will, in a sense, and they go back and forth. But yeah, they're calling this flow engineering. You almost see them, at least my read on the article and a lot of the quotes was they're really trying to make flow engineering a thing. And it makes sense that this is something that's, I don't know how truly unique it is. I mean, I feel like I've seen a lot of schemes like this in the backend, but hey, it works. There's no arguing the 44% performance on the CodeContests benchmark. So really interesting. Yeah, it's a new thing.
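As a rough illustration of the generate-and-critique loop being described, here is a heavily simplified sketch of that kind of flow. The `llm` callable is a hypothetical stand-in for whatever chat-completion client you use, and AlphaCodium's real pipeline has more stages (test generation, reflection, spec matching), so treat this as a toy version of the idea, not their implementation.

```python
# Simplified sketch of a "flow engineering"-style loop: draft code, run it
# against tests, feed failures back to the model for critique and revision,
# and repeat. `llm(prompt) -> str` is a hypothetical stand-in for an LLM call.
import subprocess
import tempfile

def run_tests(code: str, tests: str) -> tuple[bool, str]:
    """Run candidate code plus assert-based tests in a subprocess."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    proc = subprocess.run(["python", path], capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stderr

def solve(problem: str, tests: str, llm, max_rounds: int = 5) -> str:
    code = llm(f"Write a Python solution for:\n{problem}")
    for _ in range(max_rounds):
        ok, failures = run_tests(code, tests)
        if ok:
            return code
        # Adversarial critique step: a second call reviews the failing code.
        review = llm(
            "The following solution fails its tests.\n"
            f"Test output:\n{failures}\n\n"
            f"Code:\n{code}\n\n"
            "Explain the bug and propose a fix."
        )
        code = llm(f"Revise the code based on this review:\n{review}\n\nCode:\n{code}")
    return code  # best effort after max_rounds
```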
I wonder if flow engineering will stick and we'll be using that term a lot more in the future. And the second in this trio of code models is coming from Meta. It is Code Llama and they announced an update to the largest version of it, Code Llama 70B, that is quite a bit better. So the numbers that we do have is that it scored 53% accuracy on HumanEval, outperforming GPT-3.5 but not GPT-4, and no further details on specific coding benchmarks, at least in this news story, but yeah, another big code specific specialized model. Yeah. And it's another one based on Llama 2 as well. So continuing to see that one be used as one of the default ones, at least when you reach into your bag of open source, frontier open source models. And it is apparently, so Code Llama 70B, the one that they're making here is still available for commercial and research uses. So they haven't made the license any more restrictive, which is good to know. And the full 70 billion parameter model was trained apparently on a terabyte of code and code-related data. So it's actually quite, it's a decent amount. So anyway, kind of an interesting development and another feather in Meta's cap here as they look to promote the Llama 2 series with Llama 3, I guess, on the horizon too. It's about that time. And to round it out, the last of these models is DeepSeek Coder. So they have yet another large language model that specializes for programming. And this one ranges from 1.3 billion to 33 billion parameters and trained, this time, on 2 trillion tokens. So a little bit more, they show on various benchmarks that it is state of the art among open source models and is close or actually better than Codex and GPT-3.5, although not quite to the level of GPT-4 Turbo. So yeah, there you go. Three separate announcements of code related models. And yeah, this is a major space. I think if you look at sort of where a lot of the generation of text by AI models is already happening, a large chunk of it is probably in programming because I think adoption has been pretty rapid. And when you code, it's pretty much just constantly generating for you the suggestions of what do you code next. So I guess it does make sense in that vein why there's a lot of focus on it. It's just already being used a lot. And moving on to our research and advancement section, this is an idea that I'm really excited about. I think it's very interesting. It's a paper called Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding. Okay. Maybe it makes sense to... Oh, by the way, so one of these authors is from OpenAI, the other is from Stanford. So they're very kind of closely tracking obviously the space. But one of the things that you find when you try to get more juice out of language models, you can think about prompting models directly to answer a question, like write me an essay about Shakespeare or something like that. You can think about prompts that try to turn more explicitly your model into a specialist. So you might imagine saying something like, you are a PhD in the history of Shakespearean English. You answer questions in such and such a way, in such and such a style, and blah, blah, blah. And then you ask a question, right? So now you're really adding a lot of context to turn that model into a specialist. You can also imagine fine-tuning the model and so on and so forth.
But there's this kind of challenge that comes up where if you're going to turn that model into a specialist with a very specific prompt, for example, you're going to lose some of its generalization ability. And so one of the fundamental questions that people are facing right now at the frontier as we speak, at the frontier of language models, is how do we navigate this trade-off between generality and specificity? How do we get experts, expert large language models that specialize in certain domains? How do we benefit from that expertise without sacrificing the general knowledge of the base model? And this is an attempt to answer that question. So one of the ways that they approach this is to say, OK, you can actually think about the process of getting an answer from a language model as having two parts. The first is figuring out what the right prompt is for that model. And the second is actually running that prompt and getting the output from the model. Now, historically, humans are the ones who think of the prompt that we're going to give to the model, right? We're the ones who are responsible for figuring out, what the hell am I going to, how am I going to get this model just so that it answers my question the way I want? And this is an attempt to change that. It's an attempt to say, well, actually, even the process of finding the prompt is something that in principle we could automate. It's just a subtask that we could ask the LLM to do. And so what we're going to find is we're going to break down with metaprompting, we're going to break down this process into a series of steps. First, we'll take a high-level instruction where the human isn't bothered to be very specific about how exactly the task needs to be solved. We'll trust a high-level conductor language model, like an orchestra conductor, the main high-level meta model, to break down that task into smaller manageable pieces. We're then going to have it assign each of these pieces to a specialized expert model. And those expert models, they're going to have prompts that turn them into experts. And those prompts are going actually to be created by the meta model at the top, like the orchestra conductor that you originally asked your query to. So you're now getting this parent model to come up with the specific prompts that will turn other versions of itself actually into the experts that it can then query. So you're benefiting from the general knowledge of that parent model, but also from the specificity, specialized nature of the experts that it will itself create and instantiate with a kind of custom prompt. And then that parent model will also oversee communication between these models. It'll also kind of apply critical thinking. So for example, it might ask a math model to figure out how do we translate Celsius to Fahrenheit or something like that. And then another model, like, okay, well, what does that mean about whether it's likely to rain tomorrow? And kind of combine those two results together, and it would apply its own critical thinking to kind of glue those pieces together. So essentially you have this kind of overarching model that is the orchestra conductor that also creates on its own, kind of spawns new models that have prompts that turn them into specialists. And you, the user, never have to tell it which specialist models to prompt or how to prompt them. You just get to sit back, ask your simple question at the top and see this thing kind of go through. 
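Here is a heavily simplified sketch of that conductor-and-experts loop, using a generic `chat(messages)` stub as a stand-in for whatever chat-completion API you have. The paper's actual prompts, parsing, and stopping logic are more involved than this, so read it as an illustration of the structure rather than the authors' implementation.

```python
# Toy sketch of meta-prompting: a "conductor" model breaks a task into
# subtasks, writes the expert prompts itself, queries fresh expert instances
# of the same base model, and synthesizes a final answer.
import json

def chat(messages):
    # Hypothetical stand-in: plug in your chat-completion client here.
    raise NotImplementedError

def meta_prompt(task: str) -> str:
    # 1. Conductor plans: which experts to summon and what to ask each one.
    plan_text = chat([
        {"role": "system", "content": "You are a conductor model. Break the task "
         "into subtasks and, for each, write a system prompt for an expert plus "
         "the question to ask it. Reply as JSON: "
         '[{"expert_prompt": ..., "question": ...}, ...]'},
        {"role": "user", "content": task},
    ])
    plan = json.loads(plan_text)

    # 2. Each expert is a fresh instance of the same base model, specialized
    #    only by the prompt the conductor wrote for it.
    expert_outputs = []
    for step in plan:
        answer = chat([
            {"role": "system", "content": step["expert_prompt"]},
            {"role": "user", "content": step["question"]},
        ])
        expert_outputs.append(answer)

    # 3. Conductor synthesizes and applies its own critical thinking.
    return chat([
        {"role": "system", "content": "Combine the expert answers into one final "
         "answer, checking them against each other for consistency."},
        {"role": "user", "content": f"Task: {task}\nExpert answers:\n" +
         "\n---\n".join(expert_outputs)},
    ])
```

The key design point is that the user only supplies the high-level task; the conductor writes the expert prompts and does all the orchestration at inference time.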
And there's a whole bunch of interaction between these systems happening on the fly. The last thing I'll say is that I just think this is such a fascinating thing, not just because it works really well, but also because of what it tells us about the future of AI. I think we talked about this as early as last year, but what we're starting to see is that more and more of the compute budget for getting good results from language models, or other AI systems, is going not into the training phase but into the inference phase. In other words, we train the parent model once, and then we let it generate not just one output but a whole bunch of outputs. For example, we let it figure out: what are all the experts I need to summon? What are the prompts I need to write for those experts? How should I make those experts interact with each other to get the final result? So there's a lot more computation going on at the answer-generation phase, the so-called inference phase, relative to the training phase. And this is what you might expect: as these systems get more context awareness during training, you can trust them to figure more things out during inference. They have enough context to come up with new solutions. So I thought that was really interesting as an instance of that trend. There are a whole lot of different independent inferences required to do this, and it was not possible back in the day when we had way less efficient models like GPT-3, models that also had less world knowledge, so you couldn't lean on them as much at inference time. But because compute has gotten cheaper, and again, this is Last Week in AI talking about hardware all the time, this is why we do it, things like this become possible. And at the same time, these models have more world knowledge, so you can actually rely on them to reason sensibly at inference time. So I thought this was a fascinating little data point on this journey we seem to be on towards more and more general-purpose models. Right. Yeah. I think it's very much a trend, as you said, to focus more on the inference stage and to do various things like information retrieval. That's a big one, right? You have things like Bard or Perplexity that behind the scenes do some sort of search through a database, give that information to the language model, and the language model then synthesizes the response. This is similar in a sense. It's the same basic idea: there are tools the model can draw on behind the scenes that inform its answer in ways you're not aware of. So it's no longer just a large language model, not just a neural net that takes some text and spits out new text; there's a lot of stuff happening in the background. And just to give some concrete examples, they play around with various experts for various tasks. For instance, they have the Game of 24, a tricky arithmetic puzzle, and they use an expert mathematician, an expert in Python, and an expert in problem solving for that one. They have another task that is sonnet writing, with an expert poet and an expert essayist. And they have quite a few of these.
So another example is word sorting, where they have an expert linguist, an expert proofreader, and an expert essayist. And for these various tasks, they have about a dozen of them, this meta-prompting approach, this combination of different models with one model taking in all the various outputs and synthesizing the final output, works better than any other prompt-engineering trick you could try with GPT-4. So yeah, to me this is also interesting in the sense that there's always been a question of whether it all just comes down to scaling: are we going to get one giant neural net that does it all if you make it big enough and train it on enough data? This, in some sense, could be said to be neuro-symbolic, or at least it's not just one big model that's scaled up. It's a whole system where you have these individual neural parts and you set up the orchestration between them and how they interact. So there's a cognitive architecture, you could almost argue. And that's something that, for instance, Yann LeCun seems to be in favor of: that we'll need some architecture of different components and not just one big, giant, crazy neural net. No, for sure. The idea of this being on a continuum toward neuro-symbolic reasoning is a really good point, and it's not at all obvious, because, not to throw around too much terminology, connectionism is the other end of that spectrum, where everything is one big neural network. One of the interesting things about this approach, too, is that it's much more interpretable. You have intermediate readouts that are human-readable, and for safety reasons, some people think that's a really good thing. There's disagreement there, but it's a really interesting philosophical point: where is the line between neuro-symbolic reasoning and whatever the hell this is? And on to the next story. It's about EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty. That's the name of the paper, and it is about inference. When you want to actually generate an output from your large language model, there are various ways to speed it up, and one of them is this idea of speculative sampling. Basically, a bit like speculative execution in CPUs, you speculatively guess what the model might want to output and speed things up by not querying the big model quite as much. Without getting super technical, they show that with some rethinking of how you do this speculative sampling, you can get a pretty major speedup. Without any loss in performance, no fine-tuning, nothing, you can generate roughly three times faster than vanilla decoding and about two times faster than other proposed methods like this. So it's a pretty significant speedup in the ability to get outputs, just by changing how you generate them compared to the naive way. Yeah. And just by way of background, since we've talked about speculative sampling before, one way to think of it is in contrast to what's known as greedy sampling. Imagine your model takes in a prompt and then has to figure out the next word it's going to generate based on that prompt, or based on the previous words it's generated. It's going to have a whole bunch of predictions about, like, is the next word "the," is the next word "happy," or whatever.
And for each of those possibilities, it's going to assign a probability, right? Language models are really just that: engines that produce a distribution of probabilities over words. Then one option is to say, okay, why don't we just pick the most likely word? If the most likely word is "the," then let's just go with that. That's called greedy sampling. The alternative is to say, wait a minute: if we just anchor on the very next word, and at every step we take the single highest-probability prediction, we can get locked into a particular train of thought. Sometimes it's hard to tell which outputs are ideal before you've actually generated a number of words. You can kind of go, oh, I see where this is going; maybe the first word didn't look great, but when I let it play out, damn, that's a good sentence. So this is what speculative sampling is meant to do: it allows you to explore different possible continuations at each step of the text-generation process and then sample from those. It's the difference between doing a one-step lookahead and actually seeing how things would play out if you followed a couple of different trains of thought. Anyway, that's what's at the heart of the speculative sampling idea being discussed. Yeah. To get into a little more detail, the way this works is that you have a second model producing these probabilities, what is the probability of any given next word, and you can use a smaller model as a cheap proxy for the big model, which is very expensive to get probability estimates out of. You basically go ahead with what the small model said, then verify with the big model that it was actually a good call, while also continuing forward in the hope that it was. In this paper, they essentially show how to get this small proxy model to work better and get some very nice speedups.
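To make that draft-and-verify loop concrete, here is a minimal sketch of plain speculative decoding (not EAGLE itself, which adds its own feature-level tricks on top). The specific model pairing and the greedy accept rule are illustrative assumptions:

```python
# Minimal draft-and-verify sketch of speculative decoding (not the EAGLE method
# itself): a small draft model proposes k tokens cheaply, the big target model
# checks them in a single forward pass, and we keep the longest agreeing prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
draft = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()   # small and fast
target = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()  # big and slow

@torch.no_grad()
def speculative_step(ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    # 1) Draft model proposes k tokens greedily (cheap calls).
    proposal = draft.generate(ids, max_new_tokens=k, do_sample=False,
                              pad_token_id=tok.eos_token_id)

    # 2) Target model scores the whole proposed sequence in one forward pass.
    logits = target(proposal).logits
    # The prediction for position i+1 lives at logit position i.
    preds = logits[:, ids.shape[1] - 1 : -1, :].argmax(-1)
    drafted = proposal[:, ids.shape[1]:]

    # 3) Accept drafted tokens up to the first disagreement, then substitute the
    #    target model's own token there. Worst case we still make one verified
    #    token of progress per big-model call; best case we make k.
    agree = (preds == drafted)[0].long()
    n_accept = int(agree.cumprod(0).sum())
    accepted = drafted[:, :n_accept]
    correction = preds[:, n_accept : n_accept + 1]  # empty if all k were accepted
    return torch.cat([ids, accepted, correction], dim=-1)

ids = tok("The capital of France is", return_tensors="pt")["input_ids"]
for _ in range(8):
    ids = speculative_step(ids)
print(tok.decode(ids[0]))
```

Because every accepted token matches what the big model would have picked greedily anyway, the output is the same as running the big model alone; you just pay for fewer big-model forward passes, which is where the "no loss in performance" claim comes from.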
And moving on to our lightning round. Sorry, I'm trying to do some sound effects. It's a low-budget show, folks. We can't afford a sound guy; we don't have a sound guy. Anyway, the paper we're talking about here is called Are Vision Transformers More Data Hungry Than Newborn Visual Systems? And if you look at that title and go, "Newborn visual systems? Are we talking about chickens?", the answer is yes. What they're doing is basically trying to settle an argument people have been having over whether transformer models, the same sort of models as the T in ChatGPT that we all know and love, when applied to vision, are similarly efficient to biological systems. One reason to ask this is just fundamental curiosity about AI: how much further can we go in terms of squeezing more efficiency out of these systems? If biological systems are proof that we can actually go further, that's an interesting sign that maybe we can squeeze more out of less. And in other work, more on the neuroscience side, transformers are related to some current hippocampus models in computational neuroscience and can reproduce the precisely tuned spatial representations of the hippocampal formation, which is a lot of syllables that basically means their behavior mimics some of the behavior we see in biological systems. People have run tests like this before to show that convolutional neural networks, which were the spiritual predecessors of vision transformers, or not even spiritual, they were just the thing you used if you wanted to solve computer vision problems, end up performing roughly comparably to newborn chicks. But a lot of that was assumed to come from the fact that convolutional neural networks have a structure that is explicitly designed to lean into some of the symmetries that images contain. We won't get into the details, but convolutional networks have what's known as a strong inductive bias; they're very much designed for vision, whereas vision transformers don't really have that and are more general-purpose. So what they do in this study is raise a bunch of newborn chicks in a tightly controlled, extremely boring environment that contains one object. Then they simulate the chick's experience: they create a first-person chick simulator that renders what the world would look like if you were that chick moving around in that very simple, very boring chamber with that one object. They train a vision transformer on that data and then test it the same way they test the chicks. So essentially the chicks are exposed to the same visual data the vision transformer is exposed to. And the long and the short of it is that they find surprisingly similar performance for these two systems, the chicken and the vision transformer, which is pretty remarkable. They look at how many training samples you need to get the vision transformer to match chick performance, and the estimate ends up being around 80,000 images or so, which, by the way, maps pretty reasonably, based on their calculations, onto an estimate of the amount of visual experience a chick might have over the course of its life. They test these two entities, these two intelligent systems, if you will, by having them look at a new object they hadn't seen before, or the same object they had seen but from a different viewpoint, to see whether they can recognize the same object and distinguish between the two. So a really interesting piece, especially if you're interested in this question of AI versus natural intelligence: what's the delta there? This was a big update for me; I honestly didn't expect vision transformers to be this efficient. So yeah, interesting thing to know. Yeah, that's right.
I think it's probably unexpected that the answer to the question in the paper's title, Are Vision Transformers More Data Hungry Than Newborn Visual Systems?, is, at least in this experiment, no: they are not more data hungry, or they're roughly comparable. Now, of course, this isn't learning to see in the same way humans do. This is on the specific task of recognizing something you've seen before but from a new perspective, so it's under very specific experimental conditions. But I do agree with you that it's interesting to see that they are comparable in this setting. They also, by the way, just kind of slip this in, but I believe they're casually introducing a new vision transformer in the context of this paper, which is worth mentioning super quickly because it has a pretty simple training method. The way they set it up is: imagine a video, which is really a time-ordered collection of images. They train the system on the idea that, from one frame to the next, things shouldn't really have changed that much; the substance, the meaning behind the scene, should not have changed much. So let's see if we can train the system to recognize frames that are close in time to a given frame and distinguish those from frames that are further away, making sure close-together frames are represented in the neural network in similar ways and far-apart frames are represented differently. That approach is called contrastive learning. We've seen variants of it going all the way back to CLIP, which I think was the first time I remember seeing it, back in 2021 from OpenAI, but here they're applying it to video. The reason they use it is that this is believed to be how animals also learn to recognize scenes and do a lot of visual learning: our brains kind of assume that things haven't changed much over short periods of time, and distinguish that from later periods where a new scene might be arising, and that helps inform how we represent the world internally. So a really interesting neuroscience-and-AI paper all bundled together into one.
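As a rough illustration of that idea, sometimes called time-contrastive learning, here is a minimal sketch: embed frames so that temporally adjacent frames land close together in representation space, with other frames in the batch acting as negatives. The tiny encoder and the InfoNCE-style loss are illustrative stand-ins, not the architecture from the paper:

```python
# Minimal time-contrastive sketch: frames that are neighbors in time should get
# similar embeddings, frames far apart in time should not. The tiny conv encoder
# and the InfoNCE-style loss are illustrative stand-ins, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)   # unit-length embeddings

def time_contrastive_loss(z_anchor, z_positive, temperature: float = 0.1):
    # Anchors and positives are embeddings of frames adjacent in time; every
    # other item in the batch serves as a negative (a temporally distant frame).
    logits = z_anchor @ z_positive.T / temperature   # [B, B] similarity matrix
    labels = torch.arange(z_anchor.shape[0])         # the diagonal is the true match
    return F.cross_entropy(logits, labels)

# Fake "video" data: B clips, each contributing a frame at time t and at time t+1.
B = 16
frames_t  = torch.randn(B, 3, 64, 64)
frames_t1 = torch.randn(B, 3, 64, 64)

enc = TinyEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)

opt.zero_grad()
loss = time_contrastive_loss(enc(frames_t), enc(frames_t1))
loss.backward()
opt.step()
print(f"contrastive loss: {loss.item():.3f}")
```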
Next paper: Circuit Component Reuse Across Tasks in Transformer Language Models. This is going to get a little bit technical. It's in the field of interpretability, and specifically mechanistic interpretability, which Anthropic and certain other researchers have pushed hard on, where essentially you try to discover interpretable parts of a big, messy neural net such that you can say, okay, this combination of elements does this particular thing. That, roughly speaking, is what a circuit is: a combination of smaller parts that together do some piece of what the model is doing. And this paper looks at a specific circuit. There was a previous paper from 2022 where they discovered a circuit for indirect object identification, a set of units in the network that seemed to really specialize in that task. This paper looked at another task and found that the same circuit was actually reused. So this is meant to showcase the usefulness of circuits for interpretability: if you identify a bunch of circuits, then in various situations you can understand what the neural net is doing in terms of which circuits are involved. Yeah, this is actually pretty significant from a safety standpoint. One of the big hopes for people who are pessimistic about how hard it will be to align superintelligent AI systems, AGIs, and make them safe is: look, we may not be able to do that, that may not be possible, but at least we can understand the reasoning of these systems so they don't take us by surprise, by analyzing the goings-on inside these networks. So mechanistic interpretability, which, by the way, is something Anthropic is really big on and has done excellent work in, is focused in part on answering that question. And one concern has been: we are able to identify circuits associated with certain ideas, which is great for the AI safety question, but usually you identify those circuits in the context of a very narrow problem set. You look at, say, identifying cars, and you find that these neurons always fire when there's a car in the picture. The worry is whether that actually generalizes to the more general case where we just have a system whose reasoning process we don't know, a very open-ended language model reasoning across a wide range of tasks. Will the things we learned by studying one specific use case actually generalize? In the most pessimistic case, every different task could be handled uniquely by the model, in a different way. And if that's true, then having a circuit for every task would, as they put it in the paper, leave us no better off from an interpretability standpoint than having the full model itself. Basically, everything is bespoke for every single task, your interpretability work doesn't generalize, and it's kind of useless. This is why they're so focused on the idea of circuit reuse: can we show that in two different tasks we actually find reuse of a given circuit? The tasks they choose are related, or at least they believe they're somewhat related, but not exactly overlapping. And you can see how early on we are in mechanistic interpretability, as people have to come up with very, very controlled experiments to validate that there is, in fact, reuse of a particular circuit. There's a lot more to say about this paper; I think it's really fascinating, and at some point I want to do a deeper dive into this whole field. Maybe that's the stuff of a special episode. But for now, it's worth flagging that this is all research done on a version of GPT-2. A lot of mechanistic interpretability research is done on smaller, older models that are easier to work with, and this is a big part of the reason. The space is moving fast, to be clear; interpretability research is moving by leaps and bounds, especially now that there's so much attention on it for safety reasons.
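For a sense of what this kind of circuit-hunting looks like in practice, here is a coarse sketch of activation patching, one of the basic experimental moves in mechanistic interpretability: run the model on a clean prompt, cache an internal activation, then patch it into a run on a corrupted prompt and see how much of the original behavior comes back. The prompts, the layer choice, and the whole-block patch are illustrative, not taken from the paper:

```python
# Coarse activation-patching sketch: if patching one block's clean activations
# into a corrupted run restores the model's original answer, that block is
# evidence of a circuit involved in the behavior. Prompts and layer are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = "When Mary and John went to the store, John gave a drink to"
corrupt = "When Mary and John went to the store, Mark gave a drink to"
target_id = tok(" Mary")["input_ids"][0]   # the answer we expect on the clean prompt
assert len(tok(clean)["input_ids"]) == len(tok(corrupt)["input_ids"])  # same length

LAYER = 9                                  # which transformer block to patch
cache = {}

def save_hook(module, inputs, output):
    cache["resid"] = output[0].detach()    # stash hidden states from the clean run

def patch_hook(module, inputs, output):
    return (cache["resid"],) + output[1:]  # overwrite with the cached clean activations

def target_logit(prompt):
    ids = tok(prompt, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        return model(ids).logits[0, -1, target_id].item()

# 1) Clean run: cache the chosen block's output.
handle = model.transformer.h[LAYER].register_forward_hook(save_hook)
clean_score = target_logit(clean)
handle.remove()

# 2) Corrupted run with the clean activations patched back in at that block.
handle = model.transformer.h[LAYER].register_forward_hook(patch_hook)
patched_score = target_logit(corrupt)
handle.remove()

# 3) Plain corrupted run for comparison.
corrupt_score = target_logit(corrupt)
print(f"clean {clean_score:.2f} | corrupted {corrupt_score:.2f} | patched {patched_score:.2f}")
# The closer `patched` is to `clean`, the more that block matters for this behavior.
```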
But still, you can tell how early days it is by how tightly controlled these experiments have to be, and how limited the generalization ability of a lot of these tools is suspected to be at this point. And one last story for this section: "A shocking amount of the web is already AI-translated trash." That's the headline of the story. This is about a study by researchers at Amazon, who found that over half of the sentences on the internet have been translated into two or more languages, often with poor quality due to AI or machine translation, thus the "AI-translated trash." They built a corpus of 6.38 billion sentences from the web and found that 57.1% of them had been translated into at least three languages. To be a little more detailed, the trash shows up most often when translation is into low-resource languages, languages that are not as widely spoken or don't have as much data available, unlike English or French. So for low-resource languages, like African languages such as Xhosa, translations are basically much worse, which kind of makes sense: you would expect the models to be worse at translating into these low-resource languages. Yeah, so I guess this is another indicator of the web, and our digital world in general, being, you might say, spammed, or just generally filled with AI-generated stuff, as I'm sure will be the case going forward, pretty much indefinitely. Yeah, and I think there's also an implicit question here about selection bias, right? The worse the generation is, the easier it is to detect as AI-generated content. That makes me wonder whether we're potentially undercounting the amount of high-quality AI-generated content that's out there. Really hard to know, because in the limit it starts to look exactly like human-generated text. So I think this is just one of those additional challenges: it's difficult to even look around and concretely, comprehensively assess what the internet is made of today, because AI has gotten so good. And on to the policy and safety section, starting with a throwback to the very first thing we touched on in the episode: Taylor Swift, once again. The news is that the Taylor Swift AI images have prompted a US bill to tackle nonconsensual sexual deepfakes. This is hot off the press from yesterday evening, just before we started recording: a bipartisan group of US senators introduced a bill Tuesday that would criminalize the spread of nonconsensual, sexualized images generated by AI. This is coming from Dick Durbin, the US Senate Majority Whip, along with Senators Lindsey Graham, Amy Klobuchar, and Josh Hawley. The bill itself is the Disrupt Explicit Forged Images and Non-Consensual Edits Act of 2024, or, for short, the Defiance Act. And they explicitly call out Taylor Swift. To quote: "This month, fake sexually explicit images of Taylor Swift that were generated by AI swept across social media. Although the imagery may be fake, the harm to the victims from redistribution is very real." So this is from a press release, and presumably this was already in the works, but possibly because of the story coming out, they went ahead and introduced it soon after. Yeah. I was just realizing, first of all, that the titles of draft bills are always hilarious. They have to have an acronym, like the CHIPS Act. It's a very common thing.
Also, this is the first time I've realized this is the same thing AI people do with their papers. We come up with elaborate reasons why it has to be called LLaMA, and the acronym is all bungled, and you can tell people are trying really hard. So hey, Congress has something in common with AI researchers. But really, it's interesting to see this come so quickly off the heels of the Taylor Swift thing, which obviously made a really big impact, and it's so hard to predict which of these things will land and which ones won't, because this set of issues has been alive for a long, long time. This is not the first time we've been talking about deepfakes or anything like that. Also interesting is the bipartisan consensus around this. The bipartisan nature of these issues varies a lot from one to the next. On stuff like this, you tend to see a lot of consensus. You tend to see less consensus on anticipatory stuff, before certain risks manifest, up to and including election interference, but also physical risk from AI systems that we may have plenty of reason to think is coming. There tends to be a lot more resistance to that sort of thing. So this is a nice way to flex those bipartisan muscles, and hopefully something like this will end up being incorporated into US law. And I do think it's interesting: just last week we covered the Preventing Deepfakes of Intimate Images Act, which was also bipartisan and also introduced, actually reintroduced, last week. This seemingly is a totally separate effort done in parallel, also dealing with sexually explicit deepfakes. As we covered last week, that other act was inspired by a totally different event involving deepfakes at a high school. Here they do call out Taylor Swift as an example, but it's probably a broader pattern. So the details are probably different in terms of the specifics of the bill, but broadly speaking, this one would explicitly allow people to sue over the spread of nonconsensual AI-generated imagery. Yeah. And the liability piece, we'll talk about another bill in a minute, but the liability piece is so, so important, because that's what causes corporations to move, right? If they have the sense that, wait, we actually have legal exposure for doing X, Y, or Z, then they move and cover that base. That's why you see a lot of lobbying against liability and for the idea that, hey, we don't know enough yet to impose liability; maybe we have a safe harbor where, as long as the corporation takes certain safety-related actions, they're let off the hook. But the problem is that we don't know what safety-related actions are required to prevent things like jailbreaks, as we talked about earlier in this episode. And so all you can really do is say, hey, there's a liability regime; I don't care how you do it, but it's now your responsibility to throw the billions and billions of dollars that you have at solving this problem. And I'm sure things will have a magical way of sorting themselves out once the incentive is in place. That's the position, at least, of a lot of the people who advocate for these sorts of liability frameworks. Next story: OpenAI and Google will be required to notify the government about AI models.
This is an announcement from US Secretary of Commerce Gina Raimondo, who said there will be a new requirement mandating companies to share details every time they train a new large language model, and these companies will have to share safety data for review. This is part of the AI executive order from the Biden administration last year. We've discussed a few times how there were details in there along the lines of: if you're training a very, very big model, there are some additional responsibilities or actions you have to take. So this is an update on that, so to speak, where once again it's being reiterated that this will be a component the companies will have to abide by. Yeah. And you can see the administration reiterating the grounds they claim to have for imposing this requirement on companies. For context, the vehicle they're using to allow the government to do what in the US is highly unusual, having the government step in and tell private companies that they will report on certain activities, especially in a country with a tradition of free-market liberalism, is this thing called the Defense Production Act. It was invoked in the White House executive order that Biden signed fairly recently, and it's going to come up again in a couple of minutes. For context, this is an act that came out in the 1950s in response to the start of the Korean War. It was part of an effort to mobilize civil defense and broader society to support the war effort, something you're only really meant to invoke when there's a national security emergency, and it gives the president the power to require businesses to do certain things. The last time it was invoked was actually in 2021, fairly recently, in the context of COVID, basically to order companies to start producing pandemic-related protective equipment. So this is something that usually needs to be heavily justified. The current administration, of course, believes it is, on the grounds of a number of things, including the weaponization potential of these systems. We've already seen that they're able to carry out autonomous cyberattacks, and AI scaling suggests that's potentially going to get much, much worse very soon, so you kind of need to get ahead of the ball on this. But still, the fundamental basis for this reporting requirement is increasingly coming under fire, and you can see the administration now trying to be clear about the justification for it. It's a justification I actually think is quite strong, just on the basis of some of the national security work I've been doing on AI; I think this is absolutely sensible, but it does mean we have to have a conversation about it. And I do understand the arguments that say, no, this is unprecedented, it's maybe not appropriate in this context. Ultimately it boils down to how well we understand or buy into the risk picture with AI, and that's the core question here: is there a national security emergency?
So Gina Raimondo, the Secretary of Commerce, is now having to answer questions about this and talk about additional requirements, like the know-your-customer requirements being imposed on cloud service providers, which will now require them to ask their customers a whole bunch of questions before letting them do big training runs. And this is a way of being consistent with their export control policy. Basically, as she says, we use export controls on chips; those chips are in American cloud data centers, so we also have to think about closing down that avenue for potential malicious activity, as she puts it, which is basically China potentially using the cloud to bypass the existing controls on high-end chips. So a really interesting package that blends politics and technical stuff together. And this is absolutely the debate right now on the Hill in terms of what's appropriate and what authorities can be invoked to manage this in the short term, before we have legislation that Congress has to pass, hopefully something we'll see in 2024. And speaking of that debate, starting out the lightning round, the story we have is the campaign to take down the Biden AI executive order, which is all about that same topic. It goes into how the use of the Defense Production Act is facing opposition from lawmakers, tech lobbyists, and conservative groups that argue this is basically overreach and not a legitimate way of doing it. There's quite a bit of detail here on the push by different groups against it. One specific detail is that Republican Senate Commerce staff are reportedly slowing down all AI regulation going through their committee. Another is that the Americans for Prosperity Foundation, a nonprofit founded by the Koch brothers, and some lobbyists have filed Freedom of Information Act requests and a lawsuit against the Commerce Department and another department within the federal government, demanding agency records on the Defense Production Act and AI. So yeah, there's some controversy, some pushback over the ability to do this. And this really comes down to the question we were talking about a couple of minutes ago: whether we think there is a genuine national security concern here. That is certainly the position of the White House, and there was actually a quote explicitly saying so from Ben Buchanan, the White House Special Advisor on AI. He says we invoked the Defense Production Act's emergency power because, and this is the quote, "there is, no kidding, a national security concern." We can debate whether that's true or not, but if it is true, then you quickly fall into territory where this can be justified. I definitely get the arguments against invoking the DPA and against this whole line of reasoning. Myself, I'm a libertarian, I'm a tech guy, Silicon Valley, I've started a lot of companies, and I don't like regulation. I think it's generally a bad thing, but we definitely do need regulation for a narrow range of issues, when there are risks generated that can't be controlled by the market. The vast majority of problems have free-market solutions; the question is whether this is one of them.
Certainly, having spent all the time that we have working in the national security space, looking at what these models can do already and at what scale seems poised to deliver in the next year or two, it's really difficult for me to say that there is not a national security emergency here. This is something that's evolving super fast, and if you don't get ahead of it: these models can't be deleted once they're shared on the internet; once people buy very powerful AI processors, you can't take them back, and right now we don't have an easy way of tracking them all. If there's an algorithmic breakthrough that makes it possible for them to be used for very dangerous things, you can't put the toothpaste back in the tube. We do have Senator Mike Rounds, a South Dakota Republican who has worked with Chuck Schumer on some AI legislation, coming out and saying there's not a national emergency on AI. He's opposed to the use of the Defense Production Act because, he says, this is not necessarily what the DPA was made for in the first place. Again, I really understand that. I think this is a very challenging debate, and everybody's trying to be thoughtful about it. From my perspective, this pretty clearly is a significant risk, and arguably, I would say, an emergency, but there's a lot of room for people to figure out what makes sense and how we cover these bases. It may be through the Defense Production Act, or it may be something else, but somehow we'll probably want to cover them. Once again, another story on US politics. The next story is that Representative Jeff Jackson has introduced a bipartisan Cloud AI Act to stop China from remotely using American technology to build AI tools. As we've covered quite a lot, there is an export ban so that China cannot buy the GPUs and other hardware used for running and training AI models, but you can still pay to use that hardware via the cloud. That's what most companies actually do: they don't necessarily buy a ton of hardware, they just pay Amazon or Google or Microsoft or many other companies to use GPUs in the cloud and do their own training and inference without having to set up their own server farms. There's now this Cloud AI Act, the Closing Loopholes for the Overseas Use and Development of Artificial Intelligence Act, that is basically closing that loophole, you could say, by saying that if you're in China, you will not be able to access GPUs via the cloud. Yeah. The premise is that if there's a ban on exporting certain GPUs to China, then there should also be a ban on Chinese-domiciled organizations or individuals accessing those same GPUs when they just happen to be served up in the cloud. There is actually a debate here, by the way; this is not viewed by everyone as strictly a loophole. One reason the cloud compute "loophole" might not be a bad thing is that cloud use is trackable. It means the US can actually monitor Chinese use of AI systems to the extent they use the cloud. It also has the effect of reducing the Chinese domestic incentive to develop an independent compute supply chain, because you're essentially taking business away that would otherwise go to, for example, Huawei and allow them to ratchet up production and ultimately compete more with the Western market.
The flip side of this is that China has actually made it a policy priority to develop a domestic AI supply chain. Maybe this doesn't matter at all, and at the end of the day they're going to do it anyway: they'll prevent people from using Western-domiciled cloud services or try to artificially inflate their domestic cloud industry. It's an interesting debate, and it's not 100% clear either way, but it's an interesting piece of legislation and definitely one way to go on this. By the way, you can read it; I did. It's surprisingly short, four pages long. It talks a lot about the risk of weaponization and things like that. Worth giving it a read if you're a policy nerd. Next up, a follow-up to a story we covered last week. We heard about a seemingly deepfaked robocall out in New Hampshire telling people not to vote in the New Hampshire primary; the call featured AI-generated audio of someone who sounded like Joe Biden. The story is that AI startup ElevenLabs has banned the account blamed for that audio. What we've learned is that the audio used in the robocall was generated with ElevenLabs, the leading provider of text-to-speech voice synthesis; we also covered last week how they reached unicorn status. The audio has been blamed on, or at least traced to, their platform, and now the user's account has been suspended, yet another sign of how the platforms that let you generate various kinds of AI media will have to take on the responsibility of preventing misuse of this sort. We're wrapping up with "How the West can match Russia in drone innovation." This is an article by the very talented and bright Sam Bendett, who, full disclosure, I know, and Jane Pinelis, a senior official responsible for a lot of AI test and evaluation work in the US Department of Defense. Both are very thoughtful people who are deeply tracking, and in Jane's case actually involved in, the state of US DOD policy and the Russia situation in particular. A really quick overview: it's a long piece with a lot of good detail if you're interested, but one of the key differences between Russia and the United States when it comes to the automation of warfare is that the West is focused more on systems that autonomously observe and orient themselves and maybe make some simple decisions, but don't tend to outright act on the battlefield in an uncontrolled way without human intervention, whereas Russia is trying to automate what's known as the whole kill chain: the chain from observing the environment, to orienting, to making decisions, to actually acting on a decision to take out a target. Russia is really trying to automate the full stack, and one of the challenges the United States faces is that it doesn't want to automate all that quickly, because it has a set of responsible AI principles it's tied to as a liberal democracy, the values you might imagine. This is enshrined to a significant degree in DOD Directive 3000.09. We've talked about that one in the past. It's not quite right to interpret it as saying you always need a human in the loop, although it's sometimes slightly misinterpreted that way, but it definitely does impose a lot of ethical, testing, and evaluation requirements on US DOD use of autonomous systems. The other thing is that Russia apparently has a system called Sturm 1.2.
It's a heavy quadcopter drone, and supposedly it can drop projectiles without involving a human at all. It's also used as a kamikaze drone and that sort of thing. They've got a whole bunch of examples of these autonomous systems that Russia is fielding but the US would never field, at least currently, based on the constraints they're facing. One of the most amazing things to me in this article was the reference to Russia actively testing commercial systems in live military operations. Of course, you would only do that if you were unbound by ethical and moral considerations around this stuff, if you just wanted to say, hey, you know what, we've got a new system, let's see how it does, let's just ship it. This also helps them with their development process; they're able to throw these things out without really doing much testing and evaluation. Needless to say, this is not something the US military would ever do. One of the big challenges the authors highlight is the need for the US DOD to modernize its acquisitions process and blend it with testing and evaluation, because they've got to move faster. If the bar is higher for the Department of Defense because of the ethical and moral guidelines they're trying to follow, then they just need to get faster at meeting that bar. I will say, we've seen firsthand just how dedicated the US government is to safety when we deployed GPT-4-powered applications in the US DOD. I think we were actually the first ones ever to do that, by the way, a little bragging point. The amount of focus on safety, testing and evaluation, and integration of those systems is really impressive, and it's in sharp contrast to the approach that seems to be used right now by Russia. That's a really big handicap for the US going into this conflict. Structurally, either that bar has to be lowered or we need to get faster at meeting it. Ultimately, when your adversaries are willing to move faster than you and be more reckless, it creates a race to the bottom, and we need to avoid that, of course, at all costs, especially in these DOD applications. Yeah. We've been doing this podcast since March of 2020, and in that span there have been a couple of news stories on AI-guided drones, drones that autonomously find a target and go for it. It has sort of flown under the radar; to this day, there's no significant automation happening, as far as is generally understood, but I think it's very much something that is waiting to happen, so to speak. It'll be interesting to see whether this year we start seeing much more automation of at least drone attacks with AI, or whether there's going to be more of a conversation about AI-enabled weaponry and regulation around it. And with that, we are done with another episode of Last Week in AI. You can find the articles we discussed today and subscribe to our weekly newsletter with similar ones at lastweekin.ai. You can get in touch with us by emailing contact at lastweekin.ai or Jeremy at, is it hello at Gladstone, or Jeremy? It is hello. That's not my first name, though. That's just hello. Hello at gladstone.ai. Hello at gladstone.ai.
And as always, we appreciate it if you share this with your friends, if you give us nice reviews and generally make us feel nice about how our podcast is going, but more than anything, we love to see that people are actually listening and benefiting from us recording all this for two hours. So please keep tuning in.