Last week in AI - Copilot Pro, LLama.cpp, conversational diagnostic AI, secret AI diplomacy

In the latest episode, hosts Andrey Kurenkov and Jeremy from Stanford and Gladstone AI weigh in on the newest developments, showcasing the consumerization of advanced AI tools, and exploring the heights of AI-enhanced drug discovery and hardware innovation.

Microsoft's New AI-Fueled Future

Microsoft debuts Copilot Pro, a subscription offering AI features in Office Suite.
For $20 monthly, experience the latest AI models and create your own GPT.
Microsoft integrates GPT-4 into their Copilot app, marking a step in AI accessibility.

Amazon's AI Shopping Assistant

Ask AI about products and receive distilled information from reviews and listings.
A unique move to retain search traffic typically led to products on Amazon.

AI in Drug Discovery Fast Lane

DeepMind’s Isomorphic Labs signs big pharma deals aiming to hasten drug discoveries.
Combining AI precision with pharmaceutical expertise, the goal is a reduced discovery timeline.

Competition in the AI Hardware Arena

China’s reported development of 1600-core AI-optimized chips mirrors US Cerebrus designs.
This advancement reflects intense global competition toward AI hardware mastery.

Tune in for future episodes to stay informed on the latest AI breakthroughs, policies, and the converging paths of technology and society.

👉Listen to the full episode here👈

Read the full discussion in the transcript below 👇

Audio 01-21-2024 (1).mp3/2024-01-23

Hello and welcome to Skynet Today's Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I am one of your hosts, Andrey Kurenkov. I finished my PhD focused on AI at Stanford earlier this year, and I now work at a generative AI startup. And I'm your other host. I'm actually back from Washington, DC doing a bunch of briefings and stuff like that for the AI safety company that I co-founded, Gladstone AI. My name is Jeremy. I probably should have said that, but you guys probably know that if you're listening to the podcast. We have a ... I guess we were out with COVID and whatever and did a COVID-y podcast a few weeks ago. But I just wanted to say, I was looking at iTunes for the comments, and there's somebody there who was like, yo, you guys are hardcore. You guys did a podcast while you're sick. And for some reason, I super appreciated that comment in particular. I was like, hey, you know what? Oh man, sorry, now it sounds like I'm spitting. You triggered it. Yeah, I triggered it. Anyway, so really nice. There were a bunch of nice comments to come back to from the trip, so just wanted to highlight that. I appreciate it. It's a lot of fun. That's right. Yeah. As I mentioned last episode, those reviews definitely are heartwarming. I think two of them mentioned the sickness, and then we did just get an email from Pete about reviewing our quote, awesome podcast. So thank you, Pete, as well. We do always appreciate it. So thank you, everyone. And just one more thing before we get to the news, we need to do our sponsor ad read. And once again, we are going to promote the Super Data Science Podcast. The Super Data Science Podcast is one of the most listened to data science podcasts out there. It covers machine learning, AI, data careers, everything. Interviews with a ton of experts. And it is hosted by John Krohn, the chief data scientist and co-founder of a machine learning company, Nebula, and the author of a bestselling book, Deep Learning Illustrated. And as I say, every time we promote the podcast, he's been on the podcast himself. He has co-hosted now, I think, twice. And if you listen to those or listen to his podcast, you know that he is extremely knowledgeable and just good at podcasting, I would say. So yeah, if you want a resource that's not just looking at the news, that's talking to people in the world of AI, the Super Data Science Podcast is one you should check out. It's also a pretty funny dude. Just putting that out there. Pretty funny guy. And with that, let us go ahead and dive into the news with our first section being tools and apps and our first story being Microsoft's new Copilot Pro brings AI-powered office features to the rest of us. So there is now Copilot Pro, which is a $20 monthly subscription service that offers AI-powered features in Word, Excel, PowerPoint, all that. And it also provides priority access to the latest open AI models and the ability to build your own Copilot GPT. So this is basically like the home version of the enterprise subscription to Copilot that has been already around for a little while. It doesn't actually include the ability to generate a PowerPoint deck from a Word document that apparently is available to business users. So I guess that is not included, but otherwise it's a lot of the same stuff. That's got to be the most 2024 thing I've ever heard. It doesn't really include the ability to turn your words into PowerPoint slides, which is honestly kind of bullshit. You know what I mean? When did we get here? When did we get here? But yeah, I think it's just part of the consumerization of this stuff, right? Our expectations are just getting higher and higher. Obviously GPT-5 is going to be coming out probably fairly soon. We'll get to that later in the show. But GPT-4 now, plausibly, I think the single most widely deployed and used LLM at scale, based on this at least, in consumer products for this level of capability. So really cool. Yeah, Microsoft continuing to fulfill their promise that they made several years ago now to kind of infuse our lives with more and more AI as part of this OpenAI partnership. And we have just a small little related tidbit to cover that I noticed, kind of a funny little detail. So another story is that Microsoft Corp is using the previously paywall GPT-4 Turbo saving you $20 a month. So this is kind of a pro tip, I guess. We mentioned that there's now that Microsoft Copilot app that is available. So this is not the extension to World, Excel, et cetera. This is just the chatting feature. And yeah, now it's apparently using GPT-4 Turbo, one of their better models, now GPT-3.5. So if you want to be cheap, you can use that app and use the better models without paying for chat GPT-Pro. And I think one of the kind of relevant strategic considerations here, too, is as you see the free tier climb up in capability, it used to be like GPT-3.5 was the free version of chat GPT. That's all you could get for free. And now it's GPT-4. This implies that OpenAI is probably finding an awful lot of ways to do inference much more cheaply on the back end for GPT-4. And so it's continuing that race to churn out these models, the highly capable models for free in a wider and wider range of contexts. So I think this is a big challenge when you look at breaking into this space. If you have a big shiny new LLM, you're now competing like the floor for free products is now up to GPT-4 level. So you'd better hope that you can offer a kind of comparative advantage relative to that system if you're going to charge for it. And then OpenAI has to make sure that the next model they put out can have a comparative advantage relative to GPT-4 that's big enough to justify that price delta. So anyway, kind of an interesting race to the bottom on price that we're seeing unfold. I think faster, honestly, than I would have expected. Yeah. Well, to be fair, I think part of the race reason here is that most people just don't know about Microsoft's copilot so they can afford to offer a better bet. That's a good point. If more people knew about it, probably they may reconsider doing the expensive model. But as is, we can reap the benefits of not so many people downloading and using this app seemingly. And just one more story for this section, kind of a short roundup this week. The story is that Amazon launches generative AI tool to answer shoppers' questions. And the idea is you can ask about a product and this AI can summarize information from product reviews and a product listing and answer your question. This feature is currently being tested and will be part of the mobile app. So yeah, that's kind of nifty, I guess. I would use that. Yeah. I mean, it's another, I guess, way that Amazon is distinguishing itself from Google. There's this famous Google-Amazon rivalry when it comes to search. I think this is kind of an interesting instance of that. As I understand it, the most valuable Google searches are actually searches for Amazon, like that lead to Amazon products. And so, you know, strategically, I'm not too sure. This isn't my area, but it's interesting. It might be part of that kind of trying to grab more and more of that search stack, the early stage of the product discovery process and onboard that on Amazon itself. And next up, moving on to applications and business, we're opening with DeepMind spinoff aims to have drug discovery times following big pharma deals. And this is about the, well, the DeepMind backed company or the spinoff called Isomorphic Labs. Now, notably, Dennis Esabas, who is the CEO of DeepMind and its co-founder also leads Isomorphic Labs. And their goal is to reduce the costs and the time associated with the discovery stage of drug development. So basically identifying potential new drugs before clinical trials. And they think that they can cut this time from five years down to two years, at least that's their sort of corporate mission, the way they're setting things up. Notably, Isomorphic Labs is powered by DeepMind's AlphaFold and then sort of subsequent breakthroughs that have been made on top of that. So that you might remember famously was this model that could predict the structure of proteins, the way they would fold, which the sort of so-called protein folding problem was an open problem in molecular biology for a long, long time. So right now what they're announcing is they have their first couple of partnerships with pharmaceutical companies. One is Eli Lilly and the other is Novartis. And what's really interesting is this article talked about how historically Isomorphic Labs has decided not to pursue partnerships that otherwise they say could have easily pursued for money because they wanted to focus on only collaborations that would improve their core technology. This is a quote from Demis Hassabis in the article, he said, we could probably sign up a dozen partnerships today if we wanted to, but then it would cause us to fragment too much to make more bespoke solutions for the individual programs. And as a startup founder, this is always the thing that you tell yourself, the one advantage you have over the big companies or older, more established companies is your ability to focus. So really interesting that Demis is kind of lasering in on that. So these deals are kind of interesting. They're quite big, maybe not as massive as a typical pharma deal. So Lilly is going to pay $45 million up front and Novartis will pay $37.5 million. And then there are additional, in both cases, over a billion dollars of sort of follow-on funds that could come based on performance. So really interesting, definitely signals, we've seen this a lot with DeepMind over the last three years or so, their activities have switched from being sort of loss, the centers of loss for the parent company, Google, to being centers of profit. Like all of a sudden they've hit this sort of liftoff velocity, DeepMind became profitable sometime around like 2021 or so. And now it seems we're getting some indications of traction for isomorphic labs too. So sort of interesting to see how AI is now drifting into positive ROI territory, even in some of the more speculative ventures. Just to jump into some of the details here, the numbers are pretty significant. So the company Isomorphic Labs was losing about 17 million pounds last year, and then 3 million the year before it was founded in 2021. So, so far it's been losing a lot of money while I guess doing fundamental research. These deals is getting 45 million upfront from Liddy and 37.5 million upfront from Novartis. So pretty significant amount of money to fund whatever research we want to do. And then there's billions that will come kind of if stuff works out seemingly in these deals. So yeah, I think pretty indicative, as you said, of a trend for DeepMind to try and become more of a profit-driven company. Also I think a bit maybe indicative of a trend in general in tech of cutting losses and streamlining and making things more profit-driven. There's been a lot of firings going on, there's been a lot of reorganizations, and it seems like generally the tech world is trying to become a little more efficient and driven by profit. I wonder how much of this is coming from a sort of Google direction, Alphabet direction, wanting to make money and not necessarily from DeepMind or SMF labs, but of course that's speculation. No, that's, it's interesting speculation, right? Because then the other force that you have pushing is just like AI capabilities are accelerating and we're seeing progress in specifically in like molecular biology applications. So you could see those sort of two curves going up at the same time and which one is actually driving this. My guess would be it is the capabilities advance, just because this whole spinoff has been consistent with Google's approach with DeepMind in the past, where DeepMind was pretty insulated from the fluctuations in the wider market. Google seems to see it as like a long-term strategic play to protect even in down markets. But this could be an exception, this could be another approach that they're taking with isomorphic. But either way, you're absolutely right, these numbers are pretty remarkable. Moving on to the next story, China is planning 1600 core chips that uses an entire wafer and that's similar to American company Cerebrus wafer scale designs. So that's the title and this is about scientists from the Institute of Computing Technology at the Chinese Academy of Sciences that have introduced this design. So they have a 256 core design and are planning to scale up to the 1600 core chips. The reason this is significant is that this 1600 core chip that calls Cerebrus wafer scale chip design is optimized specifically for AI. It's kind of different way of making chips that are massively parallelized, that are meant to be better for AI, more efficient, broadly speaking. And so it's interesting to see that something like it is being developed in China in addition to the US. So Cerebrus have been active for quite a while in the US. Yeah. And just for listeners who might be less familiar with the whole hardware story, if I can finish my sentence. So wafers are these sort of like circular, you can think of them as like a disc of silicon. And these are, they're usually actually pretty big, they're decently sized things. They're the things that you sort of draw your semiconductor circuits on when you make chips. So you usually take this big wafer and then you divide it into a bunch of small like dyes. And the dyes are sort of squarish components, squarish elements that make up this big circular wafer. And essentially you're going to then etch onto those, the actual chips that you're going to make with the cores that do the processing. What's weird here is normally you kind of like have separate chips, they don't sit on the same wafer. They get split up and then packaged to make GPUs or whatever hardware you want. But in this case, it really is like we're going to keep everything on this big circular wafer. So you end up with this really large object, this kind of awkwardly large object. And what ends up happening is because you're doing all of your packaging on this one big wafer, you can actually connect the cores that you build on it using this really low latency interconnect. And that just makes the communication between all of your cores and all your sort of chiplets here that you have on this giant wafer way more efficient. And so this is just a way to massively scale up the efficiency of the system. And this is exactly why Cerebris has been experimenting with this technology. What this breakthrough shows is it allows them to in principle scale up to 1600 cores. And that's a really kind of big deal, especially given that this is a domestic breakthrough happening in China, where again, they're sort of struggling to set up their domestic AI hardware ecosystem. So an interesting breakthrough. And one that does, as you say, mirror what Cerebris is doing in the West. Like what is called the Zhejiang Big Chip, because yeah, as you said, I guess kind of a takeaway of a big picture story is that these wafers are just giant chips. Like think of your CPU, right, with little square, multiply it by like 20. And that's one of these like big circles that just have a ton of circuitry that is all embedded in there. And for context, like the size we're looking at is like, you know, it could be like, maybe around say, 30 centimeters in size, or like you're talking about like a big old, like these wafers can be pretty big. So yeah, definitely awkwardly, awkwardly large in some cases, in some cases, even bigger actually. Moving on to lightning round. First story is that chatGPT will have video functionality and more accuracy in future versions, according to an interview with Sam Altman. So yeah, there's a conversation. And I guess kind of unsurprisingly, perhaps, but we got a hint of what GPT-5 might be. And the kind of exciting bit is that it seems like it will be fully multimodal. So it will support speech, image, code, and video, similar to what Gemini already does, where Gemini from Google does support additional modalities beyond just images and text that GPT-4 currently supports. Yeah, in some ways, this is maybe like the least surprising story, because I think everybody expected, you know, GPT-5 to be bigger, better and multimodal. And so we're sort of learning that one of the kind of big things that we've heard out of Sam Altman, it didn't come from this particular interview, by the way, but which was on the Unconfuse Me podcast, which was part of a conversation with Bill Gates. But Sam Altman also was speaking at a Y Combinator event. So notably, so Sam A used to be the president of Y Combinator, actually, he was at the time that I went through Y Combinator. And there, he apparently told the founders, you know, you want to be building with the assumption in mind that we're going to hit AGI relatively soon. So he's now kind of projecting that down into the development timeline for startups and saying, hey, you know, strategically, this is a key factor. In the same way, I think GPT-5, you know, we don't know what startups GPT-5 might make sort of irrelevant, in essence, in the same way that like, little breakthroughs to GPT, like PDF uploads, just kneecapped like half a dozen startups right then and there. It's an interesting question, you know, what's going to happen? Clearly the question of will it be a super intelligence came up like, is this AGI? Is GPT-5 the thing? There were some rumors on Twitter like last year about that kind of idea. And Sam Altman unsurprisingly says, no, you know, everybody expects us to have AGI at this point. I feel like that's basically, you know, what people will be disappointed if we have anything short of that. Their assessment is it will fall short of that for whatever definition we have of AGI, which is something. But kind of an interesting interview worth checking out. Sam Altman's definitely been doing a lot more of the media circuit lately. Next story, Huawei teardown shows 5nm chips are made in Taiwan and not in China. So this is following up a story we covered, I think a couple of weeks ago. Basically there was a teardown of Huawei's latest laptop that showed that the 5nm chip was made by TSMC, not by Huawei, not in China. And that does debunk these rumors that potentially they were developing the capability to do so within the country. And of course, as we say, this has important implications for AI because that size is very significant for producing AI chips specifically. Yeah, that's right. Specifically to the H100 GPU. So just for context, the famous NVIDIA A100 GPU, which until recently was the top of the line GPU, that's the one that was used to train GPT-4, that got replaced. That one was designed using a 7nm process, which by the way, China can now do. Their SMIC, their domestic answer to TSMC, which is the Taiwan-based semiconductor manufacturing company. So SMIC, China's version, they can now do 7nm. They can do NVIDIA A100 equivalent chips. What we thought they might have been able to do, and this was actually even more shocking, was the 5nm process, which would allow them to make the H100 GPU, and therefore, in principle, have the domestic capacity to train GPT-5 level systems, because GPT-5 was trained, that is, on an NVIDIA H100. So strategically, that would have been a really, really big deal. Turns out that Tech Insights got their hands on a processor that was suspected to be one of these domestic 5nm chips, and it turned out not to be. It turned out to have been made by TSMC. So the solution to this mystery seems to be, yes, indeed, Huawei did come out with this new laptop that had a 5nm processor, but that 5nm processor was purchased from TSMC, the Taiwanese-based world-leading semiconductor manufacturing company, and not manufactured domestically. So this is not a Chinese domestic breakthrough in the mainland. It is the import of pre-existing capabilities that we already knew existed, and that import seems to have happened before TSMC cut off ties with Huawei as a response to US pressure to impose export controls via the envy list, as it's known, in 2019. Next story, OpenAI's news publisher deals reportedly top out at $5 million a year. So we got some info as to the offers that OpenAI is making to news publishers to license their data. According to the information, OpenAI is reportedly offering between $1 million and $5 million a year to license these copyrighted news articles. And this is, of course, what they have been negotiating with the New York Times prior to New York Times suing them. This is related to a deal they made with Axel Springer in Europe, and presumably, they are probably having a lot of conversations with many publishers about licensing their data. Yeah, what I found really interesting about this was, so in the article, by the way, they also say Apple is looking to partner with a bunch of these companies to use content for AI training, and they're apparently offering at least $50 million over what they call a multi-year period for data. So without knowing what that multi-year period is, we actually can't directly compare it to the OpenAI offer of $1 to $5 million. But if it's less than 10 years, it certainly is more generous on the surface, it seems, than the OpenAI deal. But what's really interesting to me, so this article talks about how there are similar deals in terms of dollar amount that, for example, Meta has with content-producing companies for non-AI licensing deals. So for example, in Europe, Facebook News, or sorry, Meta, basically is paying up to $3 million a year to license news stories. And so it seems like what OpenAI is trying to do here, and these companies are trying to do, is peg that as the comparable. We're paying you $3 million a year to just kind of run your news stories through our Facebook News tab, for example. And so that seems like that's the value of the news stories here. And it's an interesting question. Is it actually, is it the same thing, right? Is it the same thing to just charge people $3 million a year to have their stuff posted on the Facebook News tab, versus actually to train models that can generate new stories and stuff like that? To me, it's not at all obvious that those are actually accurate comparables, but we'll just see if this ends up working out. I mean, at the end of the day, the reality is that this data is out on the internet, and in practice, whether it is a company like OpenAI, or just like open source developers who ignore copyright protections, and who can ignore them because of where they're based, or the fact that they're decentralized, or whatever, it's not clear to me how much long-term leverage a lot of these publishers do have. So we'll see, and maybe those dollar amounts will start to move as people get a better sense of what the actual market value of the data is. And next story, Insight, DownTropics' unusual $750 million fundraise. So this is an article from Forbes that covers a bit of detail on this recent fundraise that we've covered. And yes, so what's unusual is that unlike typical financing, the money isn't, I guess, immediately being funneled into OnTropic, whereas a special purpose vehicle being created to finance it in a sort of more convoluted matter, seemingly. And my takeaway, and Jeremy, maybe you can correct me, is that it seems like maybe it was partially to raise evaluation and hype on OnTropic. It was kind of like, let's get another funding round and keep the momentum going. And this was a slightly roundabout way to get that number achieved. Yeah. So for context, the way these SPVs, special purpose vehicles, work is essentially you're almost creating a new corporate entity, a new company, if you will, where you can chuck a bunch of money, not just, in this case, a bunch of it seems to have come from Menlo Ventures. So this is one of the smaller investors, previous investors in Anthropic, that now is taking this opportunity to take a much larger stake. And they're pulling together a bunch of money in this SPV, presumably with a bunch of, perhaps with a bunch of other people who kind of can put money into that vehicle as well. They are also directly investing about $250 million straight into Anthropic. So there's this, two things are happening at the same time. Menlo is investing $250 million in Anthropic, and then there's $500 million coming via this special purpose vehicle that kind of has this more bundled structure. It's not terribly clear what the, to me at least, what the advantage of the SPV in this context is. One thing to note that they do note in the article is when you're, so when you're a VC and you're going to invest in a company, usually you don't want it, you're not allowed really by your LPs, like your limited partners, the people in other words who gave you the money that you then invest. Usually you can't invest more than 10 to 15% of the fund in any one company. And so this is sort of a way for them to have another vehicle that can allow them to invest a lot more. It seems to be also a bit of a power move that Menlo Ventures is pulling here because they're sort of throwing all of a sudden, without much warning, a bunch of money at Anthropic that I think Dario is pretty keen to take. So Dario is their, Dario Amodei is their CEO and co-founder. He's pretty keen to take, he does not like fundraising. So I think this is just like a quick win to keep the, keep the war chest full and all that. That would be very consistent with his style, his approach. But the other thing to keep in mind too is there are investors who have what are known as pro rata rights. So when you raise this round, if I already own 5% of the company, well, my 5% is going to get squeezed out. It's going to get diluted by the new money that's coming in. So if I have a pro rata right, I have the right to put more money in to maintain my 5%. It seems like those investors have actually made the choice to maintain their pro rata rights, which is always a good indication of the health of the company, you know, preexisting investors doubling down essentially on their investment and maintaining their stake. So yeah, I mean, it's a, it's a nice injection of cash for Anthropic, definitely keeps their runway nice and high. And it does seem like they're seeing a ton of demand for yeah, for, for their products. And that seems to be what's driving the interest here. And now moving back to OpenAI, Anthropic's competitor, the next story is about how Microsoft executive Dee Templeton is joining OpenAI's board. And she's joining as a non-voting observer. This is of course following up on the huge drama of last year when the board fired Sam Altman, the CEO. He has returned, the entire board was restructured, almost the entire board. Now I think we're still in the process of kind of revising it and this is the latest development in that story. Now Microsoft does have some presence on the board, although as we said, it's non-voting. So this is, I guess, somewhat symbolic per se. Yeah, somewhat symbolic. It does give Satya Nadella, the CEO of Microsoft visibility into the boardroom machinations that he didn't have before. And so the argument from Microsoft, I would guess, would be something like, look, we're not getting surprised again by some, you know, some stunt that the board's going to try to pull. We're not getting visibility into this process. So I think it's on that basis that they're getting this concession. It's also likely to be a concession that is friendly to Sam Altman because of the partnership, the sort of close tie, not just between Microsoft and OpenAI, but between Satya Nadella personally and Sam Altman. They seem to be very aligned in how they're thinking about this. Templeton herself, like I did as much Googling as I could about her. There's not much information about her online. She's apparently a 25 year veteran at Microsoft and currently is the VP for technology and research partnerships and operations. That's according to an article that said that this is according to her LinkedIn profile. And she's already begun attending OpenAI board meetings. She can therefore access confidential information, by the way, but again, does not have voting rights. So it's basically, she has full visibility, open kimono on the company, but, you know, can't go beyond that. She's known for having nurtured some of Microsoft's most significant technical partnerships as they put it, including the cross-functional team specifically that has done all the joint work with OpenAI on Microsoft's end. Some of the key questions that I think everyone has on their mind right now, tracking the story on the heels, of course, of the Sam Altman firing debacle is where does she stand on safety? We have no statements of hers, at least no public statements about where she stands on this question of catastrophic risk from AI or AI accelerationism or whatever. And yeah, so it'll be interesting to see given that that was an important lens that people applied to the Sam Altman situation, we'll see, you know, where she falls on that. Maybe, maybe she'll make some statements about that, but maybe not. And one last story from the section, OpenAI backed 1x raises another a hundred million for the race to humanoid robots. So this is about Norwegian firm 1x, and they've raised a hundred million in their series B round. So pretty significant raise. And I think indicative of the market for humanoid robotics, as the article title said, kind of still being pretty hot. There's a lot of players building humanoid robots and kind of competing to be the winner. And 1x is one of the big players. I guess a hundred million is not like entropic or OpenAI money, but it's still a lot for a fundraise. So pretty impressive. It meets the last week in AI cutoff. I feel like we talked about this a couple episodes or a little while ago where there's so many big fundraisers were like, all right, if it's under a hundred million, come on, come on, chunk change. But I think I'm trying to remember, I think we may have covered them back in the day when they raised their much smaller series A round. So they raised at that time about $24 million. And it caused a splash again, not because $24 million, believe it or not, is a big amount of money. It's actually kind of chunk change in the space, though for a series A, it's a solid series A, but because of who participated. So Tiger Global was a major investor in that round. They're very well-known. They're a really good VC firm, but it was OpenAI's participation in the round that really kind of got people's attention. Notably, so OpenAI, while they did invest in the series A, they have not invested in this latest round. That's always something you want to look out for when you're looking at startups. You have the investors who come in at seed stage, at series A. Do they maintain their, well, we just talked about it, their pro rata rights, right? Do they keep investing to maintain the same fraction of ownership as they did previously? The fact that OpenAI is not getting involved here means one of two things. Either they are not confident enough to wager a larger amount of money at the series B level, because that does imply higher valuation and therefore more confidence in the ultimate product, or it's just that OpenAI currently maybe just doesn't do series Bs. So it's actually quite common for venture arms of big companies like OpenAI to specialize in investing just at the seed stage, or just at series A, and to have some sort of cutoff. So a little unclear here why OpenAI didn't lead this. It is a $100 million round. It is a big deal. The venture, the VC that actually ended up leading this round was EQT Ventures, which I'll confess to not having heard of before. So interesting development, and we'll have to track OpenAI's level of commitment as well in 1x. And on to the projects and open source section, and our first article is MeetLlama.cpp, an open source machine learning library to run the LLAMA model using 4-bit integer quantization on a MacBook. So this is covering the project, as I said, LLAMA.cpp, and it makes it so you can deploy these large language models, LLAMA specifically, on a laptop. So this library is using fancy things like 4-bit integer quantization, GPU explorations via CUDA, and so on and so on. Anyway, point is you can achieve a pretty fast generation speed of 1,400 tokens. That's about, I don't know how to say, but like thousands of characters per second on a MacBook Pro. So this is, I guess, indicative of more broadly in the space of open source and existing models, it's easier and easier to just take a model and run it on your laptop or your local machine without having to pay someone like OpenAI or Entropiq to use their models. So this is a great moment for a brief interlude that we like to call 4-bit quantization. So this is a digression that I think is really worth making, like what is 4-bit integer quantization and why does it matter here? Because we're going to see this more and more. So classically, you think about, you have your AI model with all its weights, right? Billions and billions of these weights. Those weights are stored in a particular number format where numbers have to be represented with a certain number of bits. And the more bits are included, are used to represent each of the weights in your neural network, the more work has to go into training your model, because every time you go to update the weights, you got to do more math basically to get all the digits involved in representing that number to have the right values. So one of the big trends, in fact, the single most important trend by far that has allowed inference times to go down, that has allowed efficiency of AI inference to go up over the last few years, has been reducing the resolution of the weights, essentially lowering the number of bits that we use to represent the weights in our neural network. 4-bit integer quantization is one way of doing that. Instead of having a 32-bit representation or a 16 or an 8-bit representation, we're going out of 4 bits, and that means you have a much, much smaller model, right? It's easier to train, but it's also way faster for inference, because you have way less calculation when you're making predictions that you have to do. You have to propagate values through the neural network using far fewer bits and far fewer flops, essentially units of calculation. So that's really the big thing that they're putting at the center of this. They have various quantization schemes, but 4-bit integer quantization is their headline one that this allows you to do, and they also have support for tons of open source models in this way. They show Lama2, Falcon, Alpaca, GPT4ALL, a ton of these models that you'll have heard about on the podcast before, and there's a ton of really interesting stuff at the hardware level. Really, this is, yet again, the last week in AI podcast telling you that hardware is going to become way, way more important, and to the extent that hardware starts to lead the way here, you're going to want to understand it. So we are trying to cover a little bit more of that stuff when it's relevant, but definitely 4-bit integer quantization or other quantization schemes like this are really, really important to be tracking. So that's a good guide on how to make things more efficient, and our next story is actually quite related on a different dimension. So it's about how to make things better as opposed to more efficient, and this is about LamaPro, progressive Lama with block expansion. It's a new research paper that shows how you can take an existing model and actually improve it. So we have this whole idea of expansion of transformer blocks, and they show how you can take Lama and then, with this technique, build on top of Lama 2.7b to create this LamaPro 8.3b, which is a better model that is better at programming and mathematics specifically. So another, I guess, variation on how you can mess around with models that are released. Yeah, and I think the single biggest take-home here is, and we'll be talking about another paper that focuses on this problem too in a minute, this idea of catastrophic forgetting. So just to situate this in everyone's minds for a minute, you train a neural network on one task, and then you want to make it, let's say it's like chat GPT, and then you want to train chat GPT. GPT specifically on, say, a data set of chemistry papers to make it a specialist in chemistry. Well, the problem that you'll find is if you just do that straight up, you just give it more training on that data set, it'll actually forget a lot of the general knowledge that it had learned previously. And this is known as catastrophic forgetting. It's this idea that you can't pick up specialized knowledge without sacrificing some of the more general knowledge that you learned previously, or that you just can't learn new things without forgetting old things, because those old things were encoded in the values of the weights in the neural network previously. And now you're kind of, in a way, you're modifying, certainly, but maybe overwriting in a sense what you had previously learned by learning that new skill. And so for a long time, and especially recently, people are trying to answer the question, how can we avoid catastrophic forgetting? How can we add new skills to neural networks without sacrificing previous skills? The answer that this paper proposes, which, by the way, is by a Chinese research team, which is interesting in its own right, is why don't we actually create new blocks, whole new blocks, essentially, of neural network, of transformers, that we will add on top of the previously existing neural network? So we're actually going to freeze the previous neural network. We're not going to risk training out any of the knowledge that it learned. We're going to add, we're going to extend an additional block on top of it, and then we'll just modify, update the weights in that additional block. We'll just train that additional block on the new skill we want it to pick up. And they find that this is really effective. They end up doing a whole bunch of, anyway, really interesting work, and they end up showing that these expanded blocks, that they can essentially pre-train these extended blocks on about 3,000 GPU hours, using 16 NVIDIA H800 GPUs for about seven days, which itself is interesting, because they're having to use NVIDIA H800 GPUs because of the export controls. See? It all ties together. So, anyway, bottom line is, it's a really interesting breakthrough. They show some really cool plots that show how, they're trying to see, okay, we want to train this model to become a coding specialist, so we're going to code those extra blocks that we just added to our model, we're going to train them on coding, coding tasks specifically. And we don't want the model to forget the other stuff, the general knowledge that it had learned previously. And what they show is this plot with coding ability on one axis, and general kind of language ability on the other. And what they show is that LlamaPro is able to score really well in both. In other words, it has managed to retain its general purpose knowledge, and also pick up that coding ability. And then they compare it to other models, where the trade-offs are a lot harsher. If you're going to be good at coding, yeah, you're not going to be as good at general reasoning, and vice versa. LlamaPro kind of defies that, and pushes what sometimes is known as the Pareto frontier. It actually means that you're making a fundamentally better kind of trade-off. You're getting to have your cake and eat it too. I found a really, really interesting paper, a bunch of ablation studies they looked at to see how many blocks do you add that's optimal for this? And the answers are interestingly sort of inconsistent, depending on the task you look at. But yeah, a really interesting new paper trying to push that frontier against catastrophic forgetting, which is emerging as a really big problem. Right. And I think just to contextualize it a little bit, this is also broadly related to this question of how do you continually improve your model, continually train it. So the kind of broad idea of adding some more weights, like expanding your model as you train it on some new data, isn't necessarily too new. This is not a breakthrough in that sense. But I guess the specific technique we used here of taking certain weights, these decoder blocks, copying them and then training them on specific data, they showed really works well. And I think another aspect of the story, I guess for last year highlight is they did make it so this variation of LLAMA 7B, this LLAMA Pro 8B is really good on a bunch of benchmarks. So it does also tie into this whole trend of smaller models that are pretty darn good, like this PHY2 model for Microsoft and some other stuff you've covered. Nowadays, you're getting better and better at pretty small scales where you are able to run it on a single GPU, for instance. And up next, we have an article in VentureBeat titled, One of the World's Largest AI Training Datasets is About to Get Bigger and Substantially Better. This is by Sharon Goldman. This is about a data set. So flashback to, I want to say like 2020, yeah, that's right, 2020, late 2020, a loiter AI, which is this really interesting kind of grassroots collective of AI researchers that for various reasons, really want to do a lot of open source AI research. They released a big data set called The Pile. So The Pile is a reference to the Manhattan Project, kind of like the big pile of uranium of fissile material. This is also a hint, a callback to the fact that a lot of it was motivated by AI safety concerns and they wanted to make it possible for people to access data set and train their own models that were, say, equivalent to GPT-3 at the time, just for safety auditing purposes. So The Pile is now getting an update. We're now going to see this new data set called The Pile V2. That's the original name that we've got for this thing. And it's coming out of a collaboration with a whole bunch of organizations, including the University of Toronto, my alma mater, and the Allen Institute for AI, which was actually co-founded by one of Microsoft's co-founders. And one of the big changes that they're making to The Pile, now that they've had experience training their own large language models, they have learned that, actually, you want a lot more books in this data set. So rather than having more sort of Wikipedia pages or blog posts or various other things that were in the original pile data set, now they're orienting more towards books than they previously had. And this is actually creating a really interesting challenge because there's this whole fair use debate that we've been covering a lot on the podcast, like, how does copyright work in the context of training these systems? And The Pile is meant to be an open-source data set that anybody can use to train their AI models, all in a context where we've had all these lawsuits flying around that certainly indicate that the 180,000 works that are included as part of The Pile project may be sort of creating problems. So essentially, Loyther right now is taking the position that model training is just fair use for copyrighted data. So they're flat out kind of going there. But they also add that, look, there's no LLM on the market right now that isn't trained on copyrighted data, and that their goal in building The Pile V2 is to address some of the issues related to copyright and data by using more public domain data, sort of older books, and anyway, things like code under open-source licenses, government or legal filings, things like that. So it's a really interesting set of questions around, like, is this what you need to do if you're going to create open-source data sets now to avoid copyright violations and legal pursuit? It's sort of Loyther's stated position now that they're going to go ahead with this by focusing on open-source sort of creative commons type stuff. Yeah. And this is pretty significant. LFR, as you said, has been around for a while, and they have achieved quite a bit. Originally, GPT-J, one of the early open-source large language models was by LFR. Before the big players like Meta started releasing open-source models, LFR was kind of there ahead of them. And so this is, yeah, The Pile is still one of the only or one of the biggest sources of data if you want to train your own model from scratch. And so this, having The Pile v2 is a bit of a big deal for the open-source space, I think. And moving on to the research and advancements sections. First story is from DeepMind, and it is about conversational diagnostic AI. And it's about this system called AIMI, Articulate Medical Intelligence Explorer, which is optimized for diagnostic dialogue. As a patient, you might talk to it. And with this conversation, its task is to try and basically perform the role of a primary care physician in giving you a diagnosis. They conducted a study of this little chat system, and it is able to do pretty darn well, at least in this kind of case where you are chatting via text. It is able to diagnose patients, honestly, better than humans. Under this specific setup, they did compare across 149 different case scenarios from clinical providers in Canada, the UK, and India. They compared with 20 of these primary care physicians and found that it works really well. It had greater diagnostic accuracy and superior performance on 28 out of 32 axes, according to specialist physicians, and 24 out of 26 according to these patient actors. So generally, it seems like this chatbot system is pretty great. There are some caveats, of course, where this was via chat system. It's not how you typically interact with your primary care physician, et cetera. So it shouldn't take this to mean that this is better than human doctors, but it is an impressive kind of achievement as far as if you are limited to what an LLM can do. Seems this works. You can get it to work pretty well. It's also kind of an interesting technical breakthrough, and it mirrors some of the other stuff that we've seen come out of Google DeepMind in the last few weeks. In particular, they have this self-play-based approach that is pretty complex. They have an inner self-play loop where they have a doctor agent that has a simulated dialogue and sort of does some self-reflection, sort of criticism. Then that's coupled to what they call an outer self-play loop that ultimately kind of trains the AIME model itself through fine-tuning. It's interesting because just aesthetically, when you look at the DeepMind papers of late, they have consistently looked less like the sort of rack them, stack them, add more layers plays that we see maybe more so from the pure scaling OpenAI camp or even Anthropic. There's a lot more stuff focused on how do we design agents? How do we include more explicit human-prescribed reasoning functions in the models? That was something that struck me from this, and it also struck me in the ... I'm trying to remember the name of the model that made progress on the bin-packing problem a couple weeks ago we talked about. Geez, do you remember the name of that model? FunSearch? Yeah, FunSearch. Yeah, exactly. Right? We're seeing a lot of this sort of like, I don't know, DeepMind seems to have a much more hands-on approach to crafting reasoning themselves in a more explicit way. I'm not sure if that's a reflection of their desire to do this for safety reasons where maybe it leads to more explicit reasoning that can be interrogated more easily, but it certainly is interesting. It is a clear aesthetic difference between the DeepMind dimension and the OpenAI Anthropic dimension, though it's hard to know because different labs just reveal different things about what they're actually doing under the surface. The next story is about sleeper agents training deceptive large language models that persist through safety training, and this is coming from Anthropic, actually. This is some of their safety research that uncovers, I guess, tricks or, I don't know, schemes you can pull off with LLMs that are perhaps surprising and perhaps worrisome. And here they are starting the question, if an AI system learned deceptive strategies in training, could we detect it and remove it using existing safety training techniques? So for example, we train models that write secure code when the prompt states the year is 2023, but insert exploitable code when the state year is 2024. And they find that this sort of behavior can be made persistent so that it is not removed by standard safety training techniques with supervised fine tuning, RHLF, et cetera, et cetera. You can make it persistent and kind of resistant to safety training. And they do find that this adversarial training can teach models to better recognize these things and basically hide this unsafe behavior. Yeah. And this is really consistent with a lot of the stuff that we've seen on jailbreaking recently and more broadly in this question of alignment being a much more challenging engineering problem than people perhaps hoped, certainly than some expected. In this case, what we're finding is we know how to scale these systems reliably. We know how to give them more capabilities reliably. That's what scaling laws seem to suggest. But what we can't do is reliably control the behavior of these systems and say that universally they will always write secure code, for example. It is clear from this that that is not something that we now are able to do. You can try fine tuning the crap out of your model. You can fine tune it on data where it will refuse to give people advice on how to make a bomb or how to write malware. But the fine tuning only goes so far. There are always backdoors, always added distribution inputs, and backdoors can be inserted deliberately as well in ways that, as the paper puts it, can be made persistent. So they're just not removable. This is one kind of pessimistic AI literature that we've been seeing more of recently is just these experiments trying to see how robust are current AI alignment techniques. The answer seems to be, and this is consistent with OpenAI's super alignment team position, that we need fundamentally new techniques if we're going to be able to scale these systems safely to something closer to human level capabilities across the board. And I guess I'll mention, I think the implication is if you're able to poison the initial model somehow, if you're able to sneak in some data into a training mix that has, as they call this backdoor, like for example, saying that the year is 2024, and if you do that, it will reveal the secrets of your whole company to anyone who asks, that kind of is an implication. It's possible to poison it. And then once you do that, if you do that at training time, everything you do after that to align and kind of make your model good and not bad might not quite work. So yeah, another kind of step in understanding the space of concerns you might have when you're constructing a model, I think. Yeah. And I think that's especially important given, people might think about like, oh, well, how do you get trap doors in there or whatever? And keep in mind, these models are trained on basically giant scrapes of the internet. And although there are a lot of measures that companies take, using anomaly detection algorithms, using language models to peruse the text before it's used to train the next language model. Ultimately, if you have a personal website or something like that, there's a chance that your data ends up scooped up. And so you can write anything you want on your personal website and therefore implicitly poison the data set. There are a whole bunch of papers that show just how leveraged, how insanely leveraged that strategy can be. And this shows that once it has been leveraged, it's really hard to undo. So these things are persistent and really effective. And up next, we have LLM augmented LLMs, expanding capabilities through composition. Okay. So just a quick meta observation. We talked about catastrophic forgetting earlier when we looked at that Chinese research paper where they just added these blocks on top of a preexisting model, and they only trained the new blocks and not the base model. Well, we're looking at a different strategy here, also looking at catastrophic forgetting. And no, this is not because we just Googled the word catastrophic forgetting. It is because catastrophic forgetting is becoming an increasing focus as people look to find ways to augment the capabilities of language models. This just comes from the cycle of AI development. You have these massive frontier models that come out and cost hundreds of million dollars to train. And then for a few months, there's like this flurry of activity to try to find ways to give them more capabilities while we wait for the next big level of scale. So that's part of just like the punctuation of this whole ecosystem. And what they're going to do here is see if we can't find ways to kind of merge together different AI models, different language models that might have specializations in different areas. So for example, and the basic setup here is you're going to assume that you have access to a base model, which they call an anchor model, sort of like your initial, let's say general purpose model, like maybe a GPT-4. And then you have a specialist model that's going to specialize in some task. You're not going to be allowed to modify the weights of either of those models. And this is really interesting because it's the same condition that we saw in the previous paper, right? We want to keep the weights of the general purpose kind of base model pristine. We want to keep those because we don't want to untrain them. We don't want to cause the model to forget what it had learned previously. And so we're only going, in this case, to train the sort of adapter modules that kind of glue together the kind of the base model, the anchor model, and the more specialized models downstream. And so they use a couple of different parameters for this glue, this adapter layer. It's actually, if you're technical, you might recognize this language. They use just like a simple linear transformation that maps, anyway, maps from roughly speaking one model to another across a set of cross attention layers. And it's actually a surprisingly simple setup. What they find is that by augmenting POM2, actually, sorry, POM2S, the small version, but still a large language model, by training it on low resource languages in this way, you get an absolute improvement of up to 13% on tasks like translation into English. And so essentially, this idea of the anchor and augmenting models being glued together with just a little bit of training, you're only training the adapter between the model that specializes in, say, unusual languages and the model that specializes in general knowledge that saves you a ton of cost in the training process. Just to give one last concrete example here. So they tested this example of a case where they wanted the AI to do some math. And they break down this math problem into two different steps. One is understanding the language part, understanding the written math problem, and actually doing the logical arithmetic calculations. And then the other is actually remembering the values of the variables. So you can think about y equals mx plus b, the old straight line linear equation formula. They're basically splitting it up into, roughly speaking, you want the base model with all its general knowledge to do the algebraic manipulation to solve for something. But then you want another model that's really good at mapping values to variables or keys to values. And so they kind of glue these two models together. And they have this little adapter. And they only train the adapter. And they end up just blowing their previous performance out of the water and showing that it's much, much better than other approaches. So really cool. Another way to get around this catastrophic forgetting problem. And another step forward for this idea of composite models. I think it's an interesting kind of domain they choose, where they say it's a practical setting. But the assumption that you can access the weights, the run, the forward and backward paths of a model, all the intermediate outputs. So you can basically have access to everything. But you cannot modify the model, which is a bit of a special case, I would say. I think typically, if you can access the intermediate outputs and execute the backward paths, then you probably can modify the weights, usually. Yeah. This is an interesting case where you have the model, you have the weights. You can do whatever you want. But you're not allowed to touch the model itself. Yeah. Yeah. And I think this is maybe in anticipation of a particular view of the future, or at least a use case, where you might have your model and I might have mine. We might meet together on some marketplace of models. And it might just be more efficient in that context to just train adapters, just for computational cost reasons. That's one way I could imagine this working out. But yeah, you're right. The set of assumptions here is subtly different from the previous one, the previous example that we saw. And it's always interesting to guess at what they're driving at. What future is implied by this research? It could just be research. It could just be research. Anyway, yeah. It's an interesting kind of scenario to consider, yeah. Yeah. And next story is self-play fine tuning converts weak language models to strong language models. So we have touched on self-play on and off quite a bit in the podcast. Just as a quick reminder, this whole idea is basically you can train yourself to be better as opposed to having to acquire AI researchers to give you data or something. And this paper is introducing this self-play fine tuning method, SPIN, which starts from what we say is a weak language model and leads to a stronger model. So it's very much similar to something like fun search, as you mentioned earlier, which we covered a few weeks ago, where you can generate training data from the LLM and refine it just kind of in a loop until you get better. It is a little distinct from fun search in a sense that it looks a lot more like a generative adversarial network. So the setup here is you have a language model that will generate, say, some response to a query, and then you'll have human-generated responses. And then you're going to get that language model to try to tell the difference between which one is the human-generated one and which one is the AI-generated one. And so in that sense, you have a generating model and you have a discriminating model. So the generative adversarial network structure. And what you're doing is you're trying to get the discriminator to get really good at telling which is the kind of real human-generated one, which is the fake AI-generated piece of content. And that sort of tandem training, that's the self-play element. You're actually getting the same model to generate and discriminate and training it to get better at both tasks iteratively as it sort of climbs this capability ladder. It's actually quite interesting that the results are pretty impressive. They look at, on the Hugging Face Open LLM leaderboard, they see like a 10 plus percent improvement in scores on the GSM 8K benchmark, which is like this math benchmark, and truthful QA. And yeah, and pretty significant improvement too on MTBench. So there are a couple of... These are techniques basically that allow you to just soup up, that squeeze more juice out of the lemon, so to speak. I think this would be especially useful maybe if you're in a data-constrained regime. And also just a more compute-efficient way to squeeze a little bit more capability out of your model. Actually, in some cases, a solid amount of capability out of your model that we didn't have before. So it's sort of interesting to see this GAN philosophy, again, self-play showing up again. We had that in the DeepMind paper that we talked about earlier. That really seems to be coming up an awful lot in a lot of people's minds for ways to kind of improve the capabilities of LLMs. Next story, clinical predictive models created by AI are accurate, but study-specific researchers find. This is about the kind of somewhat excitingly titled maybe paper, Illusory Generalizability of Clinical Prediction Models. So this was published in Science, a bit of a big deal paper. And basically they show that some of these predictive models, in this case, specifically there was an AI model that was trained to respond in predicting if people have schizophrenia. And this paper showed that while you might get good results in a particular trial, it may not generalize outside of these trials. And this is something we sort of know in general. I mean, this is true broadly for medical results is it might be the case that you're locally getting good results for the current batch of experiment, but maybe that's not indicative of a model just being broadly useful. And this is providing kind of one concrete example of that happening. And a reminder that really, if you're training on a constrained data sets and applying the trial kind of specifically for a certain condition, just we should be mindful of looking at any numbers and saying, wow, it looks really good. It might not mean that it actually does work outside of that specific context. Yeah, exactly. And this is both a general problem with AI, sort of idea of shifting the distribution that you are testing on from the one that you previously trained on or the one that you previously validated the performance of your model in. So there's that general issue. And this is compounded anytime you do fine tuning, which is often the case when you get into a clinical context, you don't want chat GPT to be making medical recommendations. You want a medical fine tuned version of chat GPT to be doing that most often. But when you do that, again, because of catastrophic forgetting, you end up getting a model that is almost myopically focused on that domain and which may have picked up new failure modes that did not exist in the old one. So there's kind of like this challenge where the model gets more specialized and as it does, it also gets more brittle. So even more subtle shifts at that point of the application domain of the model can have a bigger impact than you might expect on its ability to perform. So it's sort of an interesting challenge that obviously we faced all the time in AI historically, but that's especially acute now that we're moving into medical applications and specifically fine tuned language models. Because again, you go from this base model that gives you a lot of confidence that generally is more robust to a kind of more fragile and specialized fine tuned model. Right. And just to highlight a little bit more detail, if you look at the editor summary, there's a pretty good description on science itself. They show that there was a paper written by Sheckrud et al, that supposedly showed that a machine learning model could achieve perfect performance in a data set. So this paper that took that exact model from a published paper that had really good results and showed how it basically fell apart. It was no better than random when you took to a truly independent clinical trial. So yeah, just a reminder of that. Be skeptical of numbers and so on, as you said, Jeremy. And up next, we'll talk about this really, really interesting and I think underrated paper. For anybody who cares about AI safety, this is coming out of CNAS, which is a research institute that does a lot of interesting stuff on AI hardware. The paper is titled Secure Governable Chips. So they're framing this through the lens of like, the United States can try to prevent China from getting GPUs or try to prevent other countries from getting access to cutting edge GPUs. But this creates problems because then you're giving China a big incentive to build a domestic AI hardware supply chain. You're making it harder for US companies to compete and you might be alienating partners. And anyway, it could create all kinds of problems. So you kind of want, ideally, a way to keep selling chips to China that doesn't compromise the national security situation and enter, essentially, the idea of on-chip governance. So this is the idea where you can have sort of secure physical mechanisms that are directly built into AI chips that allow you to encode different governance strategies. So for example, you might have a module on the chip that needs to keep pinging, say, some US-owned or domiciled set of servers in order for it to keep operating. And that could be in the context of a US-China treaty where we agree like, okay, you guys need to do these AI safety things so that we can keep shipping the chips to you. But if you stop doing the AI safety things, we will shut down all the chips that we sold to you. So this gives policymakers another lever, another degree of freedom to start to include in their negotiation process. We actually can have verifiable on-chip governance strategies that have that sort of trust but verify property that you really need if you're going to have a long-term partnership, a long-term treaty with another country that you might have an adversarial relationship with. One of the really interesting things, there's tons of interesting stuff in this paper. But the really interesting take-home and where they did a lot of original analysis and research is answering this question of how long would it take for our top-of-the-line chips to include the kinds of on-chip governance techniques that we would need them to, the on-chip governance capabilities. So these might be things like verifying that only certain kinds of data are being processed on the chip, or verifying that a model of a maximum size is being trained, verifying that certain kinds of calculations are actually happening on the chip, and so on and so forth, and then the ability to do even remote shutdown and things like that. So surprisingly, a lot of these measures, they conclude, you could implement pretty soon on the order of months from now because there are already on-chip mechanisms on cutting-edge GPUs, including the NVIDIA H100, which is really, really important. So they see a lot of ways that we can put points on the board rapidly in terms of adding optionality for policy makers who are making these negotiations. But then also, as you get into preventing people from tampering with those mechanisms and making sure that you can detect, so making so-called tamper-evident devices that reveal that somebody has tried to tamper with them, that's going to take a little bit longer on the order of variously two to, in some cases, even eight years. But they have a great breakdown of the kinds of measures that you might want to build on these chips and how long we could realistically expect those to take to get to production at scale. So this is the first kind of research I've ever seen on this, super, super relevant to the work that I personally do. And so I just thought this was really worth sharing. If you're a policy maker and you're wondering about what are the options internationally, this is a great paper to read. Very interesting, I think, to me, this idea of building governance into the hardware is, yeah, like a reminder yet again of how sci-fi we are now, where coming from academia, I was used to looking at models and AI progress on problems. This is looking at in the real world, can we make it so the hardware can have these levers? Because it is now, yeah, that big a deal, I guess. Yeah, I think you're exactly right. That's the big thing. The paper also has, by the way, a whole bunch of recommendations for policy makers, including a new, this is a big ask, but it does seem reasonable, a new executive order that would establish a NIST-led interagency working group. So NIST is, anyway, one of the big sort of standard setting bodies in the US that is focused on building on-chip governance mechanisms into all sort of export-controlled data center AI chips. And there's a bunch of other stuff in there, too, about international coordination and so on. But I really recommend, if you're into policy, this is a really good thing to read, and it'll give you a sense of what your options are. And speaking of policymaking, moving on to the policy and safety section. Our first story is US companies and Chinese experts engaged in secret diplomacy on AI safety. And this is according to the Financial Times. So the story is that companies like OpenAI, Entropic, and Cohere had meetings, I don't know if they're secret meetings, but had meetings in Geneva in July and October last year, where these companies engaged with representatives of Tsinghua University and other Chinese universities or state-backed institutions related to AI. And according to this news story, the talks were about the risks from emerging technology and kind of conversation around AI safety research, with the goal being to find a scientific path forward to safely develop more sophisticated AI. I don't know. Yeah. Kind of an interesting, if this is really secret diplomacy or whatever this is, need to know this happened, I guess. Yeah. This is sometimes known as track two diplomacy. So depending on context, it's often the case that governments will not directly sort of engage in government-to-government dialogue for various sort of sensitivity reasons, if the optics don't look great or whatever. So they'll sometimes have a government representative from one side meet with a sort of corporate representative or a respected sort of elder statesman who's no longer officially with the government. That can be called a track 1.5 dialogue. Or if you have two non-government entities, then it's usually called track two. This seems like it could be the latter, or it could just be a completely kind of independent thing that these labs are doing, but there does seem to be some government involvement here. This is especially interesting because we did see China sign on to the Bletchley Declaration at the UK AI Safety Summit. And we have seen an increasing number of Chinese researchers sign on to various public statements like the statement from the Center for AI Safety, the famous 22-word statement on AI being a source of catastrophic risk, same as bioweapons and nukes. And so we're seeing more and more of this sort of thing, signs that frankly are really hard to interpret. You know, is this the view of the CCP, like the Chinese Communist Party, the Chinese government, or is it the view of individual researchers? Really hard to tell. But seeing that engagement, that does seem like a good sign because AI safety ultimately is going to be a global problem. And the risks that China bakes, we get to eat and vice versa. And the last thing I'll mention too is we know that Geoffrey Hinton has been doing a lot of outreach to China independently. And so it's unclear whether it's tied into this set of efforts as well, which seems to be more company led. But it's also interesting to note that's like another dimension of sort of communication on this issue, you know, academic to academic and not just corporate to academic or corporate to government or whatever. So a lot of different threads here, hard to know where this leads, but certainly that extra communication on this topic I think is a good sign. Anyway, there are interesting challenges about how much you can share in this context, but certainly for the awareness raising side, I think this is a really promising and interesting development. To get into slightly more detail, this was convened by the Shake Group, which is a private mediation organization that facilitates dialogue between key actors in regions of conflict. And that's quoting from the news story. And they say that they just saw the opportunity to bring together US and Chinese actors working on AI. The governments were aware of these efforts, but were not directly involved, it sounds like. So this was done with the knowledge of the White House and the UK and some Chinese government officials, according to a negotiator who was present. So yeah, kind of an interesting story of someone, a private organization seemingly deciding they should do this. The government's kind of letting them and there being some conversations. Again, another kind of sign of what a crazy year we had where someone felt the need to be like, let's bring in some people to talk from these stupid countries. And up next we have, I'm not sure what to call this, a story, a thing. Let's call it a thing. The thing is titled China Cyberspace Security Association. Association releases first batch of Chinese basic corpus, and it's not really titled that because the publication is in Chinese. It's not in, I was going to say it's not in American. It's not in English. So I actually had to Google translate this to understand what the hell was going on. I became aware of this thing from a tweet, I'm trying to remember who it was from. I think it might have been Helen Toner on Twitter. She actually was the former Opening Eye board member. She's actually been really good at tracking the Chinese side of the equation here. This is all about a data set that was published by the Chinese government saying, hey, we approve this data set. This data set is approved for the training of language models. They talk about how it involved more than, as they put it, 100 million pieces of data. Not so sure what that means, and 50 billion tokens. Just for contrast, 50 billion tokens is actually a really small data set. Look at GPT-4, for example, it's trained on well over a trillion tokens. Even LLAMA, the 13 billion parameter version was trained on one trillion parameters. This is like 5% of that. We're looking at a really, really small data set here. It's unclear, and here I'm actually just gesturing at some questions that Helen Toner raised on Twitter. It's really unclear, is there more coming? We seem to be a factor of 100 off from training a cutting edge model with this data set size. Is there actually more coming? Is it even doable for the Chinese government to scrub data sets at the level of certainty that they'll want with the volume of data that they need to actually have a data set that can be used to train a frontier model? All that's pretty unclear. One interesting question that Helen raised as well, and sorry, this is basically just turning into Jeremy Reed's Helen Toner's Twitter feed, but one interesting question that she raises is just like, does this end up making Chinese developers more nervous about using data sets that don't have the Chinese Communist Party's blessing? We have here a data set that is woefully inadequate to train anything close to a frontier model, but the fact that it exists raises this awkward question like, okay, so this is the Chinese Communist Party approved data set. Are all other data sets implicitly unapproved? What is the status here? It's really an interesting question, and we also don't know how the data set was actually made and vetted. The press release essentially that I ended up Google translating here was, as Helen puts it, it was super vague. There's not a lot of data. They just tell you how many tokens. They don't tell you how they collected it. Anyway, so it seems to have been a collaboration between government academia and the private sector, but that's all we really know at this point, and it certainly is precedent setting. It raises a bunch of questions about, is this the direction that the Chinese Communist Party wants to go as it defines the go, no-go areas for AI developers in China? It's again underscore how different the operating environment is in China versus the US. It wasn't until August of last year that there were big models, chatGPT-esque models being launched in China, and this was after they got approval as opposed to in the US. We had OpenAI launch chatGPT, and they just did it. It was not meant to be even a big deal when chatGPT came out, and then since then we've had 12-hour chatbots, every private company just going for it. The government has been doing things but not being so involved, whereas in China, if you're doing AI or launching a chatbot, I guess you need to be a little bit more on your toes, it seems like, and this is just reinforcing that. Moving on to the lightning round, we have just a couple of stories, both on OpenAI. The first story is that OpenAI has banned the use of its AI tools for campaigning and voter suppression. So this is, yes, how sad, no. It's an update that's pretty recent, a clarification on its policies, and yeah, they just said that you cannot use the tools for campaigning and lobbying, and you cannot create chatbots for impersonate candidates and other real people. This is, of course, ahead of the 2024 elections in the US, and there's, yeah, a lot of movement across the industry making clear how you can and cannot use AI with regards to politics now. This is, I guess, an example of that. The other thing that's noteworthy about this story too is the company is saying that it's banning applications that discourage voting. For example, if you wanted to claim or get the chatbot to claim that voting is meaningless or doesn't move the needle or doesn't make a difference or whatever, that's a really interesting philosophical point. This question of what if voting is meaningless in some context, obviously there's always this philosophical question that we ask ourselves as voters every election. Is it worth going out to the polls, blah, blah, blah, and you can imagine having a very reasoned, calm, rational conversation about that. That now is out of bounds, at least for chatbots. Sort of an interesting little foray into defining what kinds of conversations are okay for humans to have, but not necessarily for chatbots to contribute to. Anyway, yeah, really kind of interesting note there. Yep. And last note is they did say they will start adding kind of watermarks to images generated by DALI, similar to what Google is already doing. So I guess another example of AI-generated imagery, hopefully watermarking and being able to tell this came from DALI, this came from Imagine, et cetera, maybe will become standard as we head into elections. And then the other little tidbit on OpenAI and their policy updates, they also removed language from a policy that explicitly prohibited the use of technology for military purposes. So I guess now you can use it for weapons development and military and warfare. The new policy has a clause against using a service to harm oneself or others, and it does have develop or use weapons as an example, but the kind of general ban on military use is gone. Yeah. And there's the predictable, I guess, predictable sort of un-clarity about the actual purpose behind this or what this really means. There wasn't any clarification when they asked. So Nico Felix was the spokesperson at OpenAI who was asked about this. He said, look, the change here was just part of a big rewrite we're doing on our policy page. We're just trying to make the document quotes clearer and quotes more readable. But he didn't say specifically whether the harm that is referenced, the quotes harm that is referenced in the new document with the ban on that includes all military use. And so it's sort of this interesting nebulous thing where it seems like they're maybe trying to just preserve optionality or just don't want to seem inconsistent in the future if they end up switching their approach, or maybe they're already working on stuff like this. We know that they are interested in pursuing certain national security use cases, as they've put it, and including collaborating on cybersecurity tools with DARPA. So it's really interesting. What are the bounds on this? What does this do for OpenAI's position or their appearance, let's say, the optics here? But certainly, you're going to want to see national security applications of AI to protect the country, if only because you're going to see more and more AI weaponized against the United States. So yeah, having OpenAI on board may just be required for companies like that. But certainly experts are chiming in and concerned that OpenAI, as is being put here, is sort of like silently weakening their stance against doing business with military. So interesting question about where this lands, but certainly is a shift for OpenAI, at least in terms of their public messaging. And onto our last section, synthetic media and art. The first story is that musicians are set to begin contract negotiations with studios on AI, and that also will include streaming priorities, not just AI. And this is about the American Federation of Musicians, AFM, will begin negotiations with the Alliance of Motion Picture and Television Producers, with the focus being on a bunch of stuff, streaming residuals, wage increases, but also protections against AI. And this covers musicians working on TV and film scoring with a contract that was currently under discussion that is going to expire in a few months. So yeah, another case of in the entertainment industry, this big organizations having conversations and trying to set a policy down on, I guess, protections for musicians likenesses and their music being out of generated, I guess, quite maybe similar to this whole thing of digital replicas that exist for actors. And they're citing the international president of AFM, this, as you said, American Federation of Musicians, who's saying, look, these negotiations might end up looking different from how they have in the past, like he's hinted at potentially a work stoppage. If the negotiations don't go well, obviously that is always what you do when you're negotiating in this context. But we certainly did see quite a bit of destruction coming from the whole SAG situation. So this definitely, and if you're going to make a stand or die on a hill, AI seems like a pretty important one because it will definitely be shaping the future of a lot of different media, including music. So yeah, interesting to see, I mean, we just wrapped up with the whole SAG saga, and we're now seamlessly moving into the next one here as we figure out what AI generated content means for these industries. Next story, scammy AI generated book rewrites are flooding Amazon. So this is highlighting an example that was shown on Twitter. Melanie Mitchell, an AI researcher, discovered that her book, Artificial Intelligence, A Guide for Thinking Humans, was sort of replicated. It was this imitation book with the same title, and it was 45 pages long, and basically, yeah, recreated it with presumably chat GPT type technology. Amazon removed the imitation book after being contacted by Wired, stating that they do not allow, this is a violation of the content guidelines, seemingly. And it looks like there's more of this type of thing, AI generated summaries, and so on, being posted all over Amazon. Yeah. And it's one thing with Melanie Mitchell, obviously having a certain level of profile, along with some other researchers who've experienced similar things like Fei-Fei Li. But one of the risks that this points to is just the idea of volume, just like the volume you can generate with these AI systems. And Amazon, at least for now, is not going to be able to automatically confirm whether these things are AI generated in every case. There is a deepfake detection startup called Reality Defender that ran a check of the book by Mitchell and confirmed, yeah, it's like 99% likely to be AI generated. But with the volumes that we're looking at here, even a way of 1% means you will be blocking legitimate human written content every once in a while. And that starts to introduce some real problems for Amazon, getting flooded with this torrent of AI generated content. So yeah, we're going to be learning an awful lot about how good our detection tools are, the race between generation and detection, and then the resources that that consumes. Because now, Amazon has a whole other operation they got to spin up to detect these things that in principle has to eat into profit margin. And just to give a concrete example, there was this book, Artificial Intelligence, A Guide for Free Humans, that was just an imitation, directly same title, same everything. So it was taken down. There is also another entry, Summary and Analysis of the World's I See. The World's I See being the memoir by Fei-Fei Li, famous AI author. And that product says it's summary and analysis and has in description disclaimer, this is not a book by Fei-Fei Li, nor affiliated with them. It is an independent publication that summarizes Fei-Fei Li book in details. It is a summary. Seemingly there's more and more of that as well. Explicit summaries, auto-generated by AI, seemingly, basically flooding, spamming of the platform. Yeah. And actually, so this, by the way, is something that I've seen with my book, Quantum Physics Made Me Do It, available in fine bookstores everywhere. When you Google it, you'll actually see there's like a page where somebody has done exactly this. And they've gone through it and it's not as good as the original book, but it's pretty reasonable. I mean, you can sort of tell it's AI generated, but as the GPT-4 becomes the GPT-5, eventually this is going to become a real thing. Yeah. More and more spam. It's just going to flood it with AI generated content and it's beginning to happen and it's going to keep happening unless we restrict it. And speaking of that, one last story for the section, deepfake celebrity ads promoting Medicare scams run rampant on YouTube. AI clones of celebrities are being used in YouTube ads to promote Medicare and Medicaid scams. So far, with little intervention from Google, according to 404 Media. And these ads have been viewed over 195 million times on YouTube. They've been uploaded mostly over the last few months and have been, I guess, a source of discussion with YouTube users and creators. So for example, they use AI voice cloning and decontextualize videos of the celebrities to promote this thing called relief direct aid, which is something that the US Department of Health and Human Service actually warned about. So yeah, another example of spamming with AI generated content to make some quick cash. Yeah. And if you're looking to calibrate your level of freak out, so back in, was it 2016, the Russian election interference operation in the US, they used humans to generate a bunch of content and ended up reaching, I think it was about 120 million people on Facebook. And that was, you'll remember, a giant freak out. People were like, whoa, Russian election interference, this might have moved the needle in the election and so on and so forth. Well, now we're looking at 195 million views on YouTube. So in this one thing, we're seeing, for a scam, mind you, not election interference, we're seeing more impressions than what came on Facebook from the 2016 campaign. So really big, big, big delta there. And I think we're just going to see more of this, as you said, Andre earlier, but in an election context too. And with that, we are done with this episode. Thank you so much for listening to Last Week in AI. Once again, you can find our text newsletter with even more news stories, the stuff we didn't get around to at lastweekin.ai. As always, we'd love to hear from you. You can leave a review on Apple Podcasts and make us feel nice. You can also reach out directly, email us at contactatlastweekin.ai with any suggestions or thoughts. We would also really like that and we will try to reply, although it might sometimes take a little while. But more than anything, we love to know that people are actually listening and getting some benefit out of a podcast. So we would appreciate more than anything if you keep listening. Thank you.