Last Week in AI - Week of January 8th
Summary (Generated with Bash)
Latest AI Breakthroughs
Midjourney Version 6: Now with text spelling capabilities and improved prompt adherence, generating images with text has never been more accurate.
Baidu's Ernie Bot: Garnering over 100 million users, Ernie Bot indicates China's significant progress in chatbot technology.
Anthropic's Financial Growth: Predictions show Anthropic may reach $850 million in annual revenue by the end of 2024, suggesting high demand for their AI models.
AI in Devices and Policy
Android Auto's AI: Soon to summarize messages using AI, enhancing convenience and safety for drivers.
Samsung Galaxy S24: New AI features set for the upcoming release, including live language translation and advanced camera enhancements.
NVIDIA's Dedicated China Chip: A less powerful RTX 4090 variant released for China in response to US export controls.
NY Times vs OpenAI: Legal challenges arise over AI's use of copyrighted work, with the Times suing OpenAI and Microsoft.
Research and Policy Highlights
Unified-IO 2: An autoregressive multimodal model that unites vision, language, audio, and action in a shared representation space.
Task Contamination: Insights into language model evaluations reveal potential overestimation of AI performance on older benchmarks.
AI Safety: Congressional warnings issued to NIST regarding AI grant management, emphasizing the need for effective oversight in AI development.
Listen to the full podcast here.
Questions about this podcast and AI news? Ask Bash in the in-app chat!
Read the full transcript below. 👇
Hello, and welcome to Skynet Today's Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I am one of your hosts, Andrey Kurenkov. I finished my PhD researching AI earlier this year, and I now work at a generative AI startup. And I'm your other host, Jeremy. I'm the co-founder of Gladstone AI, which is an AI safety company. And I do a bunch of stuff with the US government on national security meets AI and AI safety stuff. So that's my background. And by the way, today's episode is brought to you in part by COVID-19 and by a probably like random rhinovirus. So each of us has a different breed of a virus today. So we're kind of excited to bring that to the table for you. Yeah. If we sound bad this time, it's not our fault. It's just our bodies failing to protect us. Slowly shutting down, yes.

One of the other things that's going wrong with our bodies, or at least our brains specifically, my brain even more specifically, actually predates the viral disease. I got this really great thing. And by the way, Andrey, we have the best listeners. I got this email from a listener of ours. His name is Chris Cox. He's the chief information officer of the Bank for International Settlements. This is like a bank of central banks. Anyway, they do all kinds of interesting stuff. So Chris himself has a background in evolutionary algorithms from back in the day in grad school. I've actually met him a couple of times, by the way. This is one of the cool things about this podcast. We get to meet these people, and they get to correct us from time to time. So I talked about FunSearch, or well, we talked about FunSearch last time, but Andrey, you didn't make a mistake.

So in the context of DeepMind's FunSearch, this was the algorithm that was all powered by language models. And basically, you had this language model that would propose a new computer program to solve a problem, or a bunch of them. And then they'd be evaluated. And then the kind of best performing programs on evaluation would then be stored in a cache of programs. And then that cache of programs would therefore grow and grow and grow. And each time the algorithm wanted to solve a new problem, it would actually consult that cache and try to mix and match different previously proposed solutions to try to combine them in new and interesting ways. And what I had said at the time was that any time you have a loop like that, where a language model is going to try to evaluate its own outputs and kind of like, yeah, use that iterative cycle to improve its solutions, you have a hard ceiling because you only ever have as much information as is contained in the language model. So eventually, you need a new source of information. And in this case, I said that source of information, it comes from the fact that these programs are being evaluated. Because they're programs whose outputs can be graded objectively and automatically, that grading provides a new source of information into this ecosystem. And I kind of left it at that. And what Chris is very rightly pointing out in the email he sent me was, well, actually, technically, the information is also coming in at the level of the mixing and matching of the functions that have previously been proposed in that cache of functions.
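To make the loop being described here concrete, here is a rough, hedged sketch of a FunSearch-style evolutionary loop in Python. It is an illustration of the general pattern, not DeepMind's actual code; `llm_propose` and `evaluate_program` are placeholder callables standing in for the language model and the automatic grader, and higher scores are assumed to be better.

```python
import random

def funsearch_style_loop(llm_propose, evaluate_program, seed_program,
                         iterations=1000, cache_size=100):
    """Hedged sketch of a FunSearch-style loop: an LLM proposes programs,
    an objective evaluator scores them, and the best ones are kept in a
    cache that seeds future prompts. `llm_propose(examples) -> program`
    and `evaluate_program(program) -> score` are placeholders."""
    cache = [(evaluate_program(seed_program), seed_program)]  # (score, program) pairs

    for _ in range(iterations):
        # Mix and match: sample a few previously successful programs from the cache
        parents = random.sample(cache, k=min(2, len(cache)))
        prompt_examples = [program for _, program in parents]

        # The language model proposes a new candidate program based on the parents
        candidate = llm_propose(prompt_examples)

        # The candidate is graded objectively and automatically
        score = evaluate_program(candidate)

        # Keep only the best-performing programs, up to the cache size
        cache.append((score, candidate))
        cache.sort(key=lambda pair: pair[0], reverse=True)
        del cache[cache_size:]

    return cache[0]  # best (score, program) found
```

In this sketch, Chris's point corresponds to the recombination step where `parents` are sampled and mixed, while the point about objective grading corresponds to the `evaluate_program` call.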
So there's new information kind of being created in that sense, as well, from an evolutionary algorithm standpoint. And it seems, based on his email, that through that evolutionary algorithm lens, the way you would use the phrase, like, the information is being created, is actually more to do with that step than the actual evaluation stage. I think, for various nuanced reasons, there is new information, and it's pretty critical, coming in at the evaluation stage. It's just how you think about what counts as information in that context. Anyway, just wanted to surface that that is an important extra bit of nuance. And so thankful to Chris for flagging that. Chris is really, really smart. We have a ton of smart listeners like this. And just super thankful for having people who can call us out on that sort of thing. And this, I thought, was a great example. So thank you, Chris. Yeah, that's a great point, I think. FunSearch is an evolutionary algorithm. And so the randomness injection, where you kind of tweak your previous solutions, in theory, can yield infinite new information via evolution. So I guess it's not fair to say that there's an inherent limit to anything, because if you go on long enough, you'll find everything. But still, that's a good point.

Speaking of being thankful to our listeners, I also want to quickly shout out a couple new reviews we got. And it's kind of funny. You look on Apple Podcasts, as I do sometimes, and we got two new reviews on the 24th of December and one on the 26th. Hey! It's like we got some Christmas gifts, I guess. But yeah, some really nice new reviews. Jay Fortuna mentioned that he loved the research and advancements section, which is cool, because that is one of our maybe more technical ones. And I don't know if we go too technical sometimes. It's kind of sad, actually, we cannot cover a lot of AI news, we really have to be very selective with regards to research. So I think I'll include a few recommended newsletters and podcasts, if you are interested in following AI research more closely, there are some resources I follow myself. And yeah, so those will be in the description of this podcast, if you're curious. And also, yeah, we got another review from a couple of people that was really nice and helpful. Apparently, we have information about recent events and the technological state of the art without the manic id of libertarian accelerationism, or the rantings of the P(doom) prophets, which I like. That's something we strive for. Yeah, yeah, yeah. I think, yeah, we want to be a little bit balanced. So yes, thank you for the feedback, as always, we appreciate it. And it probably helps us get new listeners, I don't know. As always, we do appreciate it if you leave a review on Apple Podcasts, and do feel free to get in touch at contact at lastweekin.ai with any comments or suggestions or corrections. It is always awesome to hear from listeners.

And before we get into the news proper, real quick, we're going to do our sponsor read. And once again, we are being sponsored by the Super Data Science Podcast. This is a massive, very long running podcast about data science. They have over 700 episodes, which get released twice a week, hosted by Jon Krohn, who is the chief data scientist and co-founder of the machine learning company Nebula. And he's been on the show, as I've already said in previous ads, and as you might have heard if you've been listening this past year. So he is super knowledgeable and talks to just all sorts of people across AI and data science.
So there's machine learning, AI, data careers, a lot of more kind of hands-on practical stuff that isn't just what is covered in the media. And yeah, so it's a great resource if you are curious about the people in data science and AI and want to get a bit beyond just knowing the news as we do. He's also, by the way, just a super, super chill guy. And he sent me, this is just how classy he is, he sent me a Christmas card, Andrey. And I don't know if you know, there's nothing like receiving a Christmas card to make you realize that you do not have your shit together. Other people are writing Christmas cards and you're like barely putting your clothes on in the morning. So anyway, shout out to Jon Krohn. He's got everything kind of locked down to a T, and Merry Christmas and Happy New Year to Jon Krohn as well.

And speaking of Merry Christmas and Happy New Year, we are thinking, to kick off this episode, we do something a little bit different, which is before getting to the news that happened over the past couple of weeks, which was not a ton, nothing too impactful, we just do a little bit of a retrospective on the past year and a chat about what went on in 2023. And yeah, just sort of like remind ourselves of all the stuff that has happened in the course of one year. On the newsletter front at lastweekin.ai, we actually published a whole little text newsletter, AI News in 2023: A Look Back, where we included all the stories we highlighted on the text front month by month. So just putting that together, I got a kind of refresher on what happened. It was very interesting kind of going back and seeing how early on in the year there was still only just ChatGPT and its impact was still being felt. And then in February, Microsoft came out with updates to Bing, and Meta and Google started sort of rushing to try and catch up. Soon after that, around March, the open source machine kind of started going with Llama and Alpaca. So a lot of these trends that started in February and March with new ChatGPT rivals and new open source alternatives to ChatGPT kicked off and basically just kept going all year. And just last month, we got the culmination of that almost, in the sense of getting Google's Gemini, which is one of the first really GPT-4 tier models along with Claude. And then we got Mixtral, which was a GPT-3.5 tier open source model. So anyway, yeah, it was clearly a very eventful year in this language model and chatbot space. And anyway, Jeremy, you have any thoughts on the whole of 2023 in AI?

Yeah, my God, it's so hard to pin down, like, just to a small number of things. But I guess big themes, you know, there's a couple of big ones. So I think one development that is very easy to forget about, because it happened a while ago, but it was so important, you mentioned open source, well, the development of LoRA, the LoRA algorithm, and what that did for cheapening of the fine-tuning process, basically people finding ways, incredibly efficient ways, to take open sourced LLMs and then fine-tune them for new tasks for like 300 bucks, right? So that's part of what led to the Alpaca, the Vicuna, kind of Cambrian explosion of different AI models that had this sort of like high degree of specialization, that could, on narrow things, compete with the best private language models like GPT-3.5. So I think LoRA and QLoRA and the kind of broader ecosystem of fine-tuning tools, I think that's been a really big deal.
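For listeners who haven't seen LoRA up close, here is a minimal, hedged sketch of the core idea in PyTorch: the pretrained weight matrix is frozen and only a small low-rank update is trained, which is what makes fine-tuning so cheap. This is an illustration of the general technique, not the reference implementation, and the layer sizes are made up for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: the pretrained weight W is frozen, and only a
    low-rank update (B @ A, scaled by alpha/r) is trained. With r much smaller
    than the layer dimensions, the number of trainable parameters drops by
    orders of magnitude, which is what enables cheap fine-tuning."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Output = frozen base layer + low-rank learned correction
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(4096, 4096, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable:,} of {total:,}")  # ~65K of ~16.8M
```

The same idea, with the frozen weights additionally quantized to 4 bits, is roughly what QLoRA refers to.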
Another thing that I think, you know, again, one of the things we do on this podcast that I think is really important, that's often missed and that missing is going to be felt even more in the next year, I think, is we look at hardware a lot. One of the big things that happened with AI hardware, we've seen the cycle times that, for example, NVIDIA puts into producing a new GPU get cut in half. So NVIDIA has gone from releasing a new GPU every two years to a new GPU every year. And that reflects this broader trend of more and more competition. In fact, NVIDIA now seems like their position is much less long-term stable perhaps than it appeared this time last year. This time last year, you look at NVIDIA dominating market share, 95%. No real competition in sight. Now we've got AMD. Now Microsoft is making their Athena chip. We've got even Intel coming with their Gaudi chip. There are a bunch of different pretty plausible-looking competitors, at least on the hardware side, for NVIDIA. Software ecosystem is another story. But I think overall, what we're going to start to see is, you know, NVIDIA is going to have to hit the gas. They will. But we're going to see a dramatic acceleration in AI-optimized hardware. And the last note I'd make on that too is, I think 2023 will be remembered in part as the last year that non-LLM-optimized AI hardware was used for LLM training or was sort of coming off the production line. So we've seen this a lot with some of Microsoft's chips where, yes, they're just coming out now, but the chips were like designed back in 2022. And so that was pre-ChatGPT. That was pre the LLM hype wave. They're still going to be really impressive. We're still going to see huge leaps in capability and speed, but these are not optimized yet for LLMs. And so I think we're going to really start to see things kick into high gear as companies, as AI hardware designers, really loop that in and start to double down. We also saw obviously China making a lot of big strides in their domestic semiconductor ecosystem with SMIC kind of breaking through, as we've talked about many times, that seven nanometer node, maybe even a five nanometer node we might talk about today. So I think those are big things. And obviously, AI safety also went mainstream in a big way this year. So I think that's another really big, big update.

Yeah, it was interesting looking at this article and seeing that it was only in May that Geoffrey Hinton left Google and converted to the "let's take AI safety seriously" side. Similarly, I think Bengio started talking. Bengio is a famous AI researcher and scientist. They kind of got into the conversation in a big way. And so it did go mainstream in a sense within the AI community, to much more of an extent than it had before. And also in policy, of course, we just covered last month how the EU AI Act seems poised to actually move forward. In the US, a bunch of stuff is happening. So there is a movement really on the regulation side that in some cases is sorely needed for things like deepfakes and so on. And yeah, one other thing I'll comment is, I guess, adjacent to hardware, another story that might be flying under the radar is, I think, self-driving cars, right? Where 2023 was really the year where they started to come out, where Cruise and Waymo were both offering paid services for anyone up in SF. And Cruise, unfortunately, didn't quite make it work, or they ran into some issues as we've covered. But Waymo is still going strong and is still expanding.
So I think in 2024, we could really see self-driving cars become much more of a normal thing for people to use, much like chatbots became a pretty common tool last year. Yeah. Again, hitting the mainstream, right? Yeah. I do think 2023, you're right, it's been this year of the mainstreamification of a lot of stuff, worries over AI safety, self-driving cars. Even the other thing too is AGI timelines. I'm old enough to remember when if you said that you thought AGI was going to happen in the next five years, it would get you laughed out of a room. We obviously don't know if that's going to happen. There's huge uncertainty. But one of the things that we've seen is short timelines enter the popular ecosystem of public thought, where we have all the frontier labs saying, yeah, we think by 2028, 2027, 2026, these numbers keep getting shorter. And so I think that's a really interesting addition too. The future almost seems like it came to the present in 2023. Everything just kind of compressed, and obviously huge disagreement about the extent to which these timelines will materialize, the extent to which safety concerns over existential risk, catastrophic risk are real. But certainly everyone now seems to have equity in this. Everyone seems to have an opinion. And that wasn't the case this time last year. I think people were really more on the figuring-it-out side. And now we're more and more seeing people entrenched in their views.

Yeah. And on that note of the future having become the present, I think we also saw the emergence of some of these very science-fiction-feeling trends of people, for instance, getting really involved in talking to chatbots, right? And there being a million AI girlfriend chatbots. And of course, if you've seen the movie Her specifically, or a few other movies about science fiction, this whole notion of having interpersonal relationships with AI is a pretty science fiction-y concept, of you talking to a computer and you have feelings for it, but it is happening in a real way now and still growing, I presume. So yeah, it's a crazy time. It was a crazy year. And I don't think it's going to slow down anytime soon. Although, if I were to make one prediction, and predictions are scary and probably not a good idea, but I will say I have the feeling that we are starting to hit the ceiling in terms of, I guess, intelligence or capabilities for chatbots and large language models. We saw with DeepMind, Google tried their best and could not outperform GPT-4 with Gemini, even with their largest, most expensive variant. So I think we might not get any crazy new capabilities outside of things like having retrieval for memory and different plugins and different system-level, architectural-level tweaks, but nothing actually new on the training side. So far, it seems like maybe we are slowing down and have slowed down since the release of GPT-4 last year.

Interesting. Okay. So I think this will be an interesting, and I agree, scary bet for us to each take. I'm sort of on the other side of this coin, and that's why, folks, Last Week in AI is the ultimate fair and balanced podcast. We do it all, man. But yeah, no, I'm more on the, I guess, I expect things to accelerate side of things. I would expect GPT-5 to come out maybe surprisingly soon. I don't think the delay between three and four, for example, will be mirrored in the delay between four and five.
And I do expect a pretty significant capability leap just based on talking to some folks who are involved in building GPT-5 and what they've seen so far. I think it's going to be an interesting year to determine, to learn about, what we think intelligence is. Because I think what we'll find is a lot of the times we'll see advances that maybe somebody like me goes, oh, wow, that's a really big deal. That means we're really closing in on AGI-level stuff or whatever. And somebody else might look at the same thing and be like, oh, yeah, yeah. But this is trivial. This is some other thing. And we'll probably end up having the same discussions and debate. And I think those discussions are really the most important thing because that's how you end up figuring out, where do I fall on this? What do I think? There are no answers, no easy answers, certainly in this space. But I see indications with DeepMind, with relatively simple tweaks, we talk about FunSearch, that algorithm that I screwed up the description of earlier today, or sorry, in the last episode, really starting to drive forward even human research capabilities. I expect AI to start being useful for AI research sometime in the next, say, 12 to 24 months. That wouldn't surprise me. I'd put a 50-50 bet on that or so. So yeah, I mean, I think it's going to be a very telling year. I'm looking forward to having a lot more discussions with you, Andrey, because this always helps me figure out what I think of these things. That's kind of part of the joy of this, is because we approach this from different perspectives, it's a bit generative. It ends up forcing you to confront deep questions that aren't always obvious to answer. So anyway.

Yeah, definitely. Although you never know, of course. And we have seen some interesting technical advancements, things like Mamba and some of these mixture-of-experts tricks in recent months coming out of academia. So yeah, maybe we will see some pretty big leaps still in language. Although I do think most definitely we'll see them in 3D generation and video and some of these fields we've seen are still on the emerging front of AI capabilities. You know, by year's end, we'll have photorealistic 3D, I think, and we'll have maybe like a minute-long video, right? And that will be kind of the area where AI will keep blowing our minds. Whereas I think with GPT-5, it will be better for sure. But I don't know that anything will be mind-blowing. Maybe if it has memory and it actually can keep a context for weeks at a time, that could be mind-blowing for a user of a chatbot. So, you know, we'll have to see, we'll have to live it as we have been for the past year.

Yeah, just a last thought maybe to crystallize this. I think for listeners who are wondering, what is it that fundamentally will surprise us? I'm going to test something out on Andrey, just to see if you would agree with this characterization. So one of the central questions is, are we really close to AGI? How soon will that happen? And one of the things that would make it so that we actually are close to AGI is if scaling up current systems, scaling up especially large language models, but more and more multimodal systems too, invariably leads to more generality, greater kind of longer time horizon planning, all those things that we associate with human-level intelligence. And there's this question about how far scaling will go.
I tend to think scaling will get us, actually scaling alone could probably get us all the way to AGI, even just like scaling in the limit. Actually, I'm not so sure about just the GPT-4 type architecture, but just scaling alone could do the vast majority of it. A lot of people disagree. Very smart people disagree. I think, Andrey, you disagree. The question is like, what we know from scaling is as you scale these systems, they get better and better at predicting the next token, right? That's what scaling tells us. Increase the data set size, increase the amount of compute, the size of the model, increase those things together, and you will get a better and better text autocomplete system. And it just happens to be the case that as you do that, as autocomplete capability increases, as a side effect, we get all of these other capabilities that emerge. Now we can predict autocomplete improvement. That's what the scaling laws tell us. If I give you this compute budget, this data set size, and so on, it'll be this good at text autocomplete. It'll make this many mistakes on average. What we can't do is say, oh, that level of mistakes or that level of autocomplete capability will lead to long-term planning abilities that allow you to solve cutting-edge math problems, agent-like behavior that allows you to execute autonomous, fully autonomous cyber attacks. We don't know the mapping between the metric we actually know how to predict and the actual metrics that we care about. And that, I think, is one of the central points of uncertainty when we talk about where all this is going and how tight timelines could be. Somebody like me can point to hardware advances and how I think things are just lifting off there. But then somebody else could say, yeah, sure, but that just makes you a better autocomplete system. That does nothing for the kinds of capabilities we really deeply care about. So, Andrey, if you agree with that characterization, I think 2024 might be one of those years where we get a little bit more information, hopefully, about which aspects of our thinking on that are wrong and how they materialize in the world.

Right. Actually, I disagree, but on a different level, which is, I think we already have AGI. I am a big fan of, actually, the Levels of AGI paper by DeepMind and the general idea it proposed, which is you shouldn't think of AGI as a binary, it's a continuum. You have different levels of generality and skill. They say that ChatGPT and Llama 2 and Bard are already AGI and are at level one, or emerging, which is equal to or somewhat better than an unskilled human at a whole bunch of stuff, which is true. I guess the question, and what you're saying, a lot of people, when they talk about AGI, they're saying, let's say, what DeepMind called level three, level four, which is at least 90th percentile of skilled adults or at least 99th percentile of skilled adults at various tasks across a spectrum. Yeah, I guess it is a real question of, if we just scale, can we get to an AI that is better than the most skilled people at many things? Yeah, definitely, it is a question to see. Are we going to get more surprises out of scaling, or is it going to be really new things that drive us forward?

Yeah, I agree. I would say I do see it as more binary in one sense, which is the kind of AGI I'm specifically interested in is AI that can automate AI research itself. Because I think once you do that, really, you hit a takeoff speed that is very, very fast.
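For reference, the scaling laws being referred to a moment ago take roughly the form below. This is the Chinchilla-style parameterization from Hoffmann et al. (2022), and the exact constants depend on the study; the key point for this discussion is that what it predicts is next-token loss, not the downstream capabilities like long-horizon planning.

```latex
% Chinchilla-style scaling law: predicted loss as a function of parameter
% count N and training tokens D. E is the irreducible loss; A, B, alpha,
% and beta are fitted constants that vary between studies.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```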
And so effectively, everything, every capability at that stage comes within reach. Well, if it's all about scaling, you don't need much research. Right, right. That's true. That's true. And this is part of the equation. It's like, yeah, can it help with AI hardware design? How slow are those feedback loops? But anyway, yeah, I think we've got a lot to learn this year. We're going to be covering a ton of stories. One of the things we talked about, we briefly paused because poor Andrey is just like, he's pushing through on the COVID. I can take the lead maybe on this one and read the stories out loud first, just so Andrey can rest his voice and save it for his insights that he'll share on the back end.

So with that, we can just dive in here and kick things off with Midjourney version 6: Look Ma, Text. So Midjourney has just released this new version of its text-to-image model. It's an alpha release, and it's got a ton of new features. It actually looks really kind of impressive. I mean, they always do. And I feel like we've talked about this before, but we're kind of like saturating the photorealism element. These things are so good now that each additional version, it struggles to find a concrete, clear value add that the previous version didn't, just because they're already so good. Well, here, the new version has the ability to actually spell. So when you generate images that have text in them, having that text actually be kind of cogent, coherent text, that's a really big challenge. We saw sort of the DALL-E line of text-to-image models gradually climb that ladder. Well, with version 6, Midjourney seems to have reached that point. So that's going to open up a whole bunch of new possibilities in terms of applications. You can imagine now, like generate, for example, an image of a storefront with my tagline on it. And it can actually do that. And the tagline will be on it, right? So that's a lot more useful. They give a whole bunch of different examples. Actually, some pretty close to that storefront one. They show a neon sign by a shady motel that says vacancy. And the word vacancy is clearly spelled right. And then they show some other examples where you can see it's definitely still uncanny valley. So a bunch of printed propaganda posters that say, this one's way too long, buddy. And it's mostly spelled right. But there's some that are wrong. The spelling's off. Or some letters are just misshapen. So they're still kind of breaking through. It hasn't quite caught up with DALL-E 3, which really is on point when it comes to text rendering. So perhaps this is an indication that Midjourney, either algorithmically or in terms of their access to hardware or both, currently lags behind OpenAI at this stage. Because DALL-E 3, of course, has been out for quite some time.

Yeah. I think it's maybe more a matter of characterization of what the model excels at. So Midjourney has pretty much been at the forefront quality-wise. If you want to go and generate a good-looking image that is cinematic or illustrative or artful, Midjourney is probably still in the lead on that front. And it does seem like one way in which they have been lagging is in the ability to do text. And now they do have that capability. And this news follows shortly after; it came out a week after the news that they were launching on the web with Midjourney Alpha. So yeah, it's exciting news. If you're a lover of text-to-image, Midjourney really generates absurdly good-looking images.
And now you can use it for things with text as well, assuming it's short text. So we have examples in this little article where if it's one word like vacancy or happy birthday, it does do a good job. If it's something complicated like a poster, it's going to still have that AI wackiness to it, for sure. So not quite perfect, but the range of things you can do now is expanded yet again, for sure. Yeah. And Andrey, you make a great point about the prioritization piece. Midjourney is not an AGI company the way OpenAI is. They're not prioritizing necessarily, especially, the text piece. OpenAI's whole thing has been LLMs for a long time. That's where they got their main research focus. So perhaps less surprising that DALL-E 3, which is coupled to a language model, is that much better at writing text. And relatedly, prompt adherence. This is the idea of having the logical consistency of the text prompt and the image be very high. Prompt adherence is something that DALL-E 3 has been a bit better at, and that Midjourney in the past hasn't been. So for example, if you give it a prompt like two red balls and one blue cube on a green table, which they show in the article, often previous versions of Midjourney would kind of show you maybe a red ball and a blue ball and then a white table, or different configurations of these things that didn't really match what you were saying. Well, now version six really is ticking that box a lot better, at least for this particular prompt, two red balls and one blue cube on a green table, you see images that correspond much better to that. So a higher degree of prompt adherence. And that kind of signals a certain logical grounding in the world, a better correspondence between the text and image piece. So another bit of good news there for Midjourney, an impressive result.

Here's just a quick example to give you an idea of kind of the prompt adherence. One prompt in this article is, there are three baskets full of fruit on a kitchen table. The basket in the middle contains green apples. The basket on the left is full of strawberries. The basket on the right is full of blueberries. In the background, there's a blank teal wall with a circle window. And yeah, Midjourney v6 actually nails it. It gets all those details in there. And that's also something we saw with DALL-E 3, you can get these paragraph-long descriptions. And in general, the model will tend to follow it. So yeah, very cool.

And up next, we have Baidu says its ChatGPT rival, Ernie Bot, now has more than 100 million users. OK, so 100 million users, by the way, there's a reason that they're announcing this. This was the kind of big vanity metric that OpenAI announced a few months after launching ChatGPT. I think this was in November, actually, so more than a few months. But in November, OpenAI said that ChatGPT had reached 100 million weekly active users. And Baidu, which is really kind of positioning itself to be a big competitor of OpenAI domestically in China, they have Ernie Bot, which is built on their Ernie language models. Ernie 3.0, I think, is what it's based on right now. They're claiming that it's now hit over 100 million users on that same benchmark. Now, interestingly, what isn't said here is whether this is weekly active users, which is the metric OpenAI reported, or total users, or monthly active users, which is a much easier target to hit because you just need people to kind of log on once a month or so. So that much is very unclear.
And that makes me suspect that this might just be like a vanity metric, a bit of metric hacking so they can get that headline and say, hey, we're hitting 100 million users. But nonetheless, really impressive given that Ernie Bot is not accessible outside of China. So the market they're going after is the, whatever it is, 1.2 billion or so, 1.4 billion people in China. So this is like something like 10% of China seemingly has accessed this tool. It may be available actually in Russia or other places like that, I think, but it's certainly not in North America. It is cheaper. They're charging about $8 a month for this. So it makes it more accessible, certainly in the Chinese market too, compared to like the 20 bucks a month for OpenAI's ChatGPT. And there've been a bunch of questions around like, really, how do ChatGPT and Ernie Bot compare? Really hard to do across languages. But anyway, an interesting development and certainly indicates that Baidu is a pretty serious contender when it comes to language models that have commercial traction.

Yeah, we've covered some stories about companies wanting to be the OpenAI of China. And interestingly, I think so far there really hasn't been one. Maybe Kai-Fu Lee's company could be set to be one. They released a really good open source language model. But in terms of usage, Ernie Bot seems to be the leader. And it's really more of a Google-type company dominating in China. So anyway, good to know. If you're curious about the space of chatbots, Ernie Bot is a big one.

Yeah. And one of the things that's interesting to note too here is the extra headwinds that Chinese companies currently have to face because of the regulatory pressure from the government. So apparently they developed Ernie Bot back in, I think in March or something like that. I mean, we reported on it at the time, but they weren't able to do a mass launch until about August because they basically had to get regulatory approval for a mass rollout. And that had to meet certain thresholds of performance. You can imagine the sort of thing in the People's Republic of China, the requirements that the Chinese Communist Party imposes are things like make sure that it never produces anti-communist content. And with language models, that is really hard to do, maybe even impossible. So anyway, all kinds of extra technical hurdles that they had to overcome. So they were able to launch only in August along with a bunch of other local players. So they all kind of had the same starting moment. And so in a weird way, Baidu wasn't even able to take advantage of their first-mover advantage, having launched in March. The sort of delay cost them a lot of months. So it's an interesting dynamic, that regulation cost them the first-mover advantage they otherwise would have had. That's something we just haven't seen in the West, where it's all about the racing dynamic.

And moving on to our lightning round, we have Android Auto will have Google Assistant summarize your messages with AI. So Google's developing this feature for Android Auto that would let Google Assistant take your very long and messy conversations and deliver a summary. And that's kind of the story here. I mean, it's a pretty convenient development, I guess, if you use Android. And hopefully, it doesn't hallucinate too much. And this is one of those areas where having a Google phone with Gemini Nano built in might be helpful because then it could use the onboard chip to do the summary.
And that's one of the examples of it, of the usage of onboard AI. They had summarizing chat messages. So I guess we'll start seeing a lot of this on-device type stuff coming out in the coming months. Actually, speaking of that, on a similar product note, we have some of the Samsung Galaxy S24's key AI features just leaked. So this new phone is expected to launch in just a couple of weeks. And there are now leaks that are saying that it will have some AI capabilities. It will have live language translation built into the phone and some nice additional camera tricks, night photography, zoom, and stuff like that. And also, it might include a generative edit feature similar to what you have in Google Photos. So yeah, that's another example where you now have AI as a feature to compete with in the smartphone space.

Yeah, it's interesting coming from Samsung, too, because we're used to seeing the big tech companies, which all now have their own phones, the Googles, the Microsofts, kind of working their way back from their AI dominance to the hardware dominance. Well, Samsung actually is one of few companies in the world that makes pretty decent chips, AI chips. They actually have their own fabs. So we're kind of seeing them move in the other direction, again, using phones as that bridge between two worlds. So I'm curious about how big their AI research teams are internally and really what Samsung sees as the AGI play here, which increasingly has almost got to be part of these companies' business vision. So anyway, I'm really curious what we see next from Samsung and whether they can actually keep up.

And one last story for this section, Microsoft quietly launches a dedicated Copilot app for Android. There's now a new app to talk to a chatbot. Like the ChatGPT app, now Microsoft has their Copilot app, which is similar to the Bing Chat app, but is for Copilot. So a little bit redundant, but yeah, it's kind of the same thing.

And moving on to our applications and business section, we open with Anthropic forecasts more than $850 million in annualized revenue rate by 2024 year end. So this is a really interesting development. We've heard about OpenAI meeting or hitting a projected $1.6 billion annual revenue rate; well, this is saying Anthropic is actually pretty close to that. At least by the end of 2024, they will be. This is interesting because it does suggest that even though Anthropic has this policy of, among other things, releasing models only when they're slightly behind the cutting edge in capability, and this is just a reflection of their commitment to safety. They don't want to be accelerating the race to potentially, as they view it, dangerously powerful AGI-like systems, but they know that they have to be building frontier models to be able to understand them from a safety perspective, from a policy perspective. Well, Anthropic seems nonetheless to be on track to generate a good amount of revenue, and apparently three months ago, the company told some folks that it was, some investors that is, that it was generating about $100 million in annualized revenue, and then it figured that would reach $500 million by the end of 2024. We're now seeing that be revised upwards. A key thing, though, to keep in mind, anytime you see top-line revenue numbers for these big language model companies, these big AI companies, the OpenAIs, the Anthropics, and so on, always, always, always ask yourself about margin. It is not the case that a dollar in is a dollar of profit, even though it's software.
Usually, we're used to the revenues of software companies being overwhelmingly profit, and that's because the hosting costs for a website or an app are so low. You host it for super cheap using Heroku or whatever else, it costs you almost nothing. When it comes to language models, especially at scale, a large fraction of your costs go into the serving costs, the compute costs. Now, OpenAI has a massive first-mover advantage here. Their margins are really high because they are the only, or one of very few, labs competing at the GPT-4 quality level. So that's something you really want to look at: what are Anthropic's margins like? I don't think we have clarity on that, at least I certainly don't, and I haven't seen compelling figures about that. Take these figures with a grain of salt, but certainly they imply high demand for Anthropic's products, even at the level of them being comparable in order-of-magnitude terms to OpenAI, which I think is a really big win for Anthropic, given that they started three, four years later.

It's quite impressive to see this rapid growth. Anthropic is very similar to OpenAI. They also now have APIs you can use and are expanding access, slowly actually. Many people have not been able to build on top of Anthropic and Claude until recently. Anthropic also has partnered with Amazon so you can get access to Claude through AWS, via Amazon Bedrock. There's a slightly different approach there, and potentially more people will be interested in building on top of it. We'll see. But yeah, Anthropic seems like a very real competitor to OpenAI.

Absolutely, and this actually dovetails into our next story. In a way, this is the reasoning behind the next story. Anthropic is going to be, or is in talks to, raise $750 million in funding. That's not huge news. They've raised probably about maybe $6 or $7 billion so far, so it's not a huge contribution to their war chest in relative terms, but it's at an $18.4 billion valuation. This really positions them, along with OpenAI, in the very, very rarefied subset of decacorns, that is, $10 billion-plus AI companies working in that LLM space, in that AGI space. It is four times larger than their previous valuation of $4.1 billion, which was earlier this year. This, by the way, in a context where OpenAI is said to be in talks to raise at a valuation of $100 billion or more. Anthropic doing really impressive things, it seems. According to the market, the market seems to be assessing them as being behind OpenAI, but this might sound weird, factors of five, factors of three, when you're up to the $20 billion versus $100 billion valuations, it's not clear how significant those might be, and how much this is going to play into the end result here. Anthropic raising $750 million, that's not going to be too dilutive. They're raising it at a $20 billion valuation, so yeah, that's about five or so percent of their equity that they're giving away, so that's not that much control. They expect these companies to keep being able to raise at very high valuations, and therefore maintain some degree of control. That's strategically how this ties into the safety mission in Anthropic, and I think a really important ingredient in all this.

Yeah, and after this Anthropic news, another bit about OpenAI, in addition to that plan to raise, was also that their annualized revenue has topped $1.6 billion, which is actually up from $1.3 billion just a few months prior.
The whole space seems to be growing rapidly, I mean, $300 million more in annual revenue over just a couple months for OpenAI is impressive, so I guess it makes sense that Claude and Anthropic are also seeing a lot of growth. That's quite rapid.

And kicking off our lightning round, we have Nvidia releases slower, less powerful AI chip for China. This is about a less powerful, pared-down version, as these are usually referred to, of their RTX 4090 chip, which is meant to comply with US export controls on China. So essentially the way this works is you make the regular chip usually, the one that you might sell in the US, and then you'll blow fuses, you'll blow circuits on the chip itself, or you'll add software that's meant to kind of pare down its capabilities in a hopefully irreversible way. But by the way, people are actually quite concerned about these things being reversed when they are in China. But anyway, that's how this is meant to work. And overall it's said to perform 11% slower than the original, and it has fewer processing subunits that can accelerate AI workloads. 11% slower, by the way. These metrics don't always tell you all that much, just because the thing that bottlenecks AI training runs and AI inference runs always changes depending on the model you're thinking about. So for some language models, it could matter how much memory capacity the system has. For others, what matters is how many calculations each core can do. So how many flops, in other words, floating point operations, it can do per second. In other cases, it's actually the interconnect, the bandwidth between GPUs, for really big workloads. So saying 11% slower, there's a lot of assumptions going into that. So it always bears kind of peeling back layers of the onion. But overall, the latest round of export controls was actually really quite significant. And this new chip, it's being called the RTX 4090D, is just that. It's meant to skirt those export controls or just slide underneath them.

Yeah, there was actually a statement from an NVIDIA spokesperson that was, you know, this was developed while being extensively engaged with the US government, and this product will only be out in China in January. So this is a direct consequence of the export controls. It's entirely about that. Yeah. And that engagement with the US government, probably important given that Gina Raimondo at the Department of Commerce famously said, and as we reported here, like she basically said, listen, NVIDIA, like stop trying to deliver, like, you know, she didn't use this term, but like weapons-grade AI hardware to China by skirting around our export controls. So I suspect that the frustration there is probably leading to a little bit more consultation with the US government directly than may previously have been the case.
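To make the point about bottlenecks a bit more concrete, here is a hedged, roofline-style toy check in Python. The chip numbers and workload numbers are hypothetical, purely for illustration; the takeaway is that whether a cut to compute, memory, or bandwidth matters depends on which resource the workload is actually limited by, which is why a single "X% slower" figure doesn't say much on its own.

```python
def bottleneck(flops_per_s, mem_bandwidth_bytes_per_s, work_flops, bytes_moved):
    """Toy roofline-style check (illustrative only): a workload is compute-bound
    if its arithmetic intensity (FLOPs per byte moved) exceeds the chip's
    FLOPs-to-bandwidth ratio, otherwise it is memory-bound."""
    machine_balance = flops_per_s / mem_bandwidth_bytes_per_s   # FLOPs available per byte
    arithmetic_intensity = work_flops / bytes_moved             # FLOPs the workload needs per byte
    return "compute-bound" if arithmetic_intensity > machine_balance else "memory-bound"

# Hypothetical chip: ~80 TFLOP/s of compute, ~1 TB/s of memory bandwidth.
chip_flops, chip_bw = 80e12, 1e12

# Big-batch matrix multiply: lots of reuse per byte moved -> typically compute-bound.
print(bottleneck(chip_flops, chip_bw, work_flops=1e12, bytes_moved=2e9))   # compute-bound
# Small-batch, token-by-token inference: little reuse -> typically memory-bound.
print(bottleneck(chip_flops, chip_bw, work_flops=2e9, bytes_moved=1e9))    # memory-bound
```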
Up next we have SMIC, which by the way is sort of China's big domestic semiconductor fab, fabrication facility. So SMIC is reportedly working on three nanometer process technology despite US sanctions. Okay. So we've heard about, first of all, quick recap, currently the NVIDIA H100 GPU, which is more or less the top-of-the-line GPU you can get. That's built using TSMC's, the Taiwan Semiconductor Manufacturing Company's, five nanometer fabrication process. So for AI chips, that's more or less cutting edge. These machines that have resolutions for some features down to like five nanometer resolution, that gives you the H100.

Now the challenge is the US, the Western world, doesn't want China to get access to that level of capability. Currently, or at least until like 20 minutes ago, it was thought that China didn't have the capacity to make not even five nanometer resolution, but seven nanometer resolution chips. Now they recently came out with an announcement that, hey, like we have, Huawei just came out and said, hey, we have a line of laptops that use the seven nanometer process. So it seems that they're able to produce those at scale. So based on that, China has actually cracked that crucial seven nanometer process threshold. Now it seems like SMIC may actually have broken through the next layer, five nanometers, and now perhaps even three, or at least they're working on this three nanometer process technology. This stuff requires some very specialized machines called photolithography machines to do. It previously was thought that you could only make five and three nanometer process technology using this very advanced type of lithography machine called an EUV, extreme ultraviolet lithography machine. It turns out SMIC may be using the previous, sort of like worse technology called deep ultraviolet lithography, DUV, and using that successfully to make seven, five, and perhaps maybe three nanometer processes. And this involves, anyway, going back over the wafer, using the DUV machine multiple times to kind of do what's called multi-patterning, which slows down the process, but does yield these very advanced chips. So a lot, a lot going on there, and something we will do a deep dive on in a later episode about hardware. But for now, just to track: SMIC seems to be making some really impressive progress on manufacturing that was not thought possible when these sanctions were first being thought of.

And on a related note, the next story is Huawei files a patent that enhances wafer alignment and efficiency, hinting at the company's self-built fabrication plants. And that's the story. There's a new patent about this wafer processing device that does suggest that Huawei wants to fabricate its own chips. Right now, one of the defining challenges for Huawei is their dependence on SMIC, the company we were just talking about, because that is the only source of manufacturing of these semiconductor chips. So Huawei, kind of like NVIDIA in the US, they design chips and then they send them off to be fabricated. Huawei sends them off to SMIC because that's China-based. NVIDIA sends them off to TSMC, which is a Taiwan-based company that is leaps and bounds ahead of SMIC, but still SMIC, as we discussed, is making progress. So now what's interesting is it seems like Huawei is saying, well, wait a minute, I'm not so sure that I want to depend on SMIC for fabrication. I think I might just want to be able to both design and fabricate my own semiconductors. That, by the way, is an extremely huge lift. Just the development of a semiconductor fabrication facility can cost you around $50 billion, with a B, you are hearing that right, $50 billion. These are hugely expensive. I mean, they are perhaps the single most capital-intensive thing that human beings do on the face of the earth, with no exaggeration, more, by the way, than the moon landings or anything else like that. This is a ridiculous, ridiculous spend for these things. Anyway, so what's happening right now is Huawei is saying, man, maybe we want to do this in-house.
A little bit unclear why they might want to be doing this, but that certainly is what this patent filing suggests. It's about basically a way of aligning these wafers. Semiconductors are made on these wafers, and we'll talk about this in a special episode dedicated to hardware, but basically, these wafers hold a whole bunch of chips that you're then going to fabricate onto them. That's one of the things that Huawei is looking at optimizing, is how do you set up wafers optimally for alignment, which is really important for getting high-resolution patterns etched on them.

All right, and finally, we're talking about this last story called ASML ships first high-numerical aperture, or high-NA, lithography system to Intel. Okay, so this is actually upstream of the fabs. The semiconductor fabs, like TSMC and SMIC, the companies that actually build the chips that power this massive revolution of AI that we're seeing, they rely on specialized machines called photolithography systems. Now, photolithography systems have to keep improving if they're going to keep allowing the fabs to pump out better and better chips. The world's best maker of photolithography machines is the Dutch company ASML. They are exploring this technique called high-numerical aperture lithography. Basically, you have two options. If you want to keep improving the resolution of your chips, you can either shorten the wavelength of light that you're using to laser in patterns on the chip. That's one option. That requires a fancy light source and improving light sources, which is really, really hard. Or you can essentially increase the size of the lens. In other words, use a high-numerical aperture lens, which allows you to, again, get better resolution out of your system. The problem is, when you do that, it kind of fucks with your whole setup. The setups for these photolithography machines have been optimized over generations with the baked-in assumption of certain maximum lens sizes and so on. You can't just jack up the size of these lenses, increase the numerical aperture all you want. Eventually, you run into really serious challenges. That's exactly what's going on here. ASML is experimenting with these high-numerical aperture systems, and they are selling the first batch of these to Intel, which is a signal that they may be commercially viable, though you shouldn't expect them to make a big difference in the chip market itself because it's going to be a while before they're actually used for chip manufacturing, maybe 2026, 2027. Another thing we're going to be tracking for you when we talk about our big hardware episode in a couple weeks.

Actually, we have no open-source news this week, so we're going to skip straight ahead to research and advancements. The first story is Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action. This is coming from the Allen Institute for AI and some academic partners on that paper. The paper is about this new autoregressive multimodal model that is capable of understanding and generating images, text, audio, and even actions, with a shared space of representation for all these modalities.

Yeah, I think this is really interesting because in the past, anytime we've heard multimodal, for example, GPT-4 is a multimodal system, usually what that means is you can take as input text or video or images or audio or many of them, but the output is almost always just text.
The thing can explain what an image is that you feed it, or what's in an image or a video, but it can't actually generate images or videos or whatever. We're just starting to see that paradigm shift a little bit, but this is really diving in the deep end and saying, you know what? We want true multimodality. We want one system that takes in any kind of input and spits out any kind of output. This really makes me think a little bit of Google's Perceiver IO that they announced. This was like two years ago, but that was more of a concept idea that they had. They did early tests to show that it had promise, but this is really kind of, again, diving in the deep end and showing that this is actually capable of generating, and they show some really impressive outputs.

One of the things that they're doing here is merging all of these inputs, image, video, audio, and so on, and text into a shared embedding space, so basically representing them with the same kind of vector, the same kind of list of numbers, and they're going to map them all into that same space. An image has a representation that's a list of numbers of a certain shape. Well, text has a representation that has the same kind of properties. This allows the model to kind of understand in a unified way what these different kinds of data are. You might, to very roughly bungle this, expect an image of a cat to be represented within the model in the same way as the word cat. This is a gross oversimplification, but essentially the idea is the model is understanding the world through different modalities in a consistent manner, and it can output, it can generate those modalities as well. This is sort of reminiscent of some of the stuff that we've seen Meta do. They've been really focused on this idea of multimodality in their case, because they think that's kind of the key to getting to AGI, that you can't get AGI without going beyond text. But it's interesting to see this pop up from the Allen Institute for AI, which is a really impressive research organization. Not a frontier lab, but still they pump out really high-quality research.

As this is a version two, really there's no single kind of insight here. It's interesting, like in the abstract it says, training with such diverse modalities is extremely difficult. We propose various architectural improvements to stabilize the model. This is almost an engineering feat in some sense, there's kind of different ways of handling every modality. They actually build on top of Perceiver for some aspects of this, so they use some things from Perceiver IO for encoding audio and images. Overall you end up with this unified model trained on all these modalities that is really beyond anything that you can use, like GPT-4 with image outputs is the closest. But this also has audio, and they even have it controlling a robot as the output of actions, which is totally new. So yeah, really cool, and they are releasing the models to the research community. I mean, that's actually really funny. I didn't realize they were using Perceiver, like literally using Perceiver. I was just like, oh man, that reminds me of it. Yeah, an aspect of Perceiver for encoding the data in the input. So building on a lot of previous research, as is often the case. Yeah, yeah, yeah.
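To give a rough intuition for what a shared representation space means in practice, here is a hedged toy sketch in PyTorch. This is not the Unified-IO 2 architecture, just the general pattern: each modality gets its own small encoder, but all of them project into vectors of the same width so that one sequence model can process them together. All sizes and names are made up for the example.

```python
import torch
import torch.nn as nn

class ToyMultimodalEncoder(nn.Module):
    """Toy illustration of a shared representation space (not Unified-IO 2's
    actual architecture): each modality has its own encoder, but all of them
    project into token embeddings of the same width d_model, so a single
    transformer can attend over text, image, and audio tokens together."""
    def __init__(self, vocab_size=32000, image_patch_dim=768, audio_frame_dim=128, d_model=512):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.image_proj = nn.Linear(image_patch_dim, d_model)
        self.audio_proj = nn.Linear(audio_frame_dim, d_model)

    def forward(self, text_ids, image_patches, audio_frames):
        # Each modality becomes a sequence of d_model-sized vectors...
        text_tokens = self.text_embed(text_ids)        # (B, T_text, d_model)
        image_tokens = self.image_proj(image_patches)  # (B, T_img, d_model)
        audio_tokens = self.audio_proj(audio_frames)   # (B, T_aud, d_model)
        # ...and they are concatenated into one sequence for a shared model.
        return torch.cat([text_tokens, image_tokens, audio_tokens], dim=1)

enc = ToyMultimodalEncoder()
tokens = enc(torch.randint(0, 32000, (1, 16)), torch.randn(1, 64, 768), torch.randn(1, 100, 128))
print(tokens.shape)  # torch.Size([1, 180, 512])
```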
And our next story is Task Contamination: Language Models May Not Be Few-Shot Anymore. Okay. You know, speaking of the things that will define the course of where AI goes in the next year or so, this is a really interesting little paper. So there's this issue that every time you train a language model, you're basically training it on all the text on the internet that existed prior to the point where you trained your model. And so now if you go and test your model on benchmarks, if you run evaluations that already existed somewhere on the internet before, well, guess what? Those may well have found their way into the training data you used for this model. This is a very well-known problem; it's called test set contamination. And they're thinking about an extension of this idea: more abstractly, for the kinds of tasks you might want to test these models on, there might already be examples on the internet of language models or human beings performing those tasks in the context of evaluating language models. So again, that helps the models kind of teach to the test, being trained in a way that inappropriately corresponds to the test. The tests you then run on the model don't really reveal its true capabilities; they reveal memorized abilities, because, like a bad student, it might just have memorized a part of the textbook and be regurgitating it. That seems to be a big part of what's been going on. And so here they're running a bunch of tests with a bunch of GPT-3-series models and asking, can we look at datasets that we know were only produced after the training run was complete? So there's no way that dataset was anywhere on the internet at the time the model was trained, and therefore no way the model could possibly have seen this data. And what they find is that, consistently, on datasets that were not available at training time, the performance goes down, because the model hasn't seen them before, whereas it does a lot better on datasets that already existed prior to training. This is interesting because developers actually do invest a lot of effort to prune the test datasets out of the training data when they do their data collection and so on. But what this suggests is that that process is imperfect, and we're getting a significantly inflated sense of how these models perform on tests that were produced prior to training. I will say I'm a little confused on the question of whether the newer datasets are simply more difficult tasks. Because if you go back to 2021, the datasets getting created and released today are generally more challenging, since the older benchmarks have basically been solved. It's not entirely clear to me from just skimming the paper whether they answer that question exactly, but they do have some additional analysis, where they show you can extract training examples and demonstrate that some of these models, like Alpaca and Vicuna, have been contaminated, and so on. So it's a very useful thing to keep in mind: when we talk about performance and benchmarks, once again, the numbers aren't super meaningful, because there's a lot going on there.
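Here is a rough sketch of the kind of comparison the paper is making: bucket evaluation datasets by whether they were released before or after a model's training cutoff and compare the scores. The dataset names, dates, and scores below are placeholders, not the paper's actual numbers.

```python
# Rough sketch of the comparison at the heart of the task-contamination argument:
# bucket evaluation datasets by whether they were released before or after a model's
# training cutoff, then compare average scores. All names, dates, and scores below
# are placeholders, not numbers from the paper.
from datetime import date
from statistics import mean

TRAINING_CUTOFF = date(2021, 9, 1)  # assumed cutoff for some GPT-3-series model

# (dataset name, release date, model accuracy on that dataset)
results = [
    ("old_benchmark_a", date(2019, 5, 1), 0.81),
    ("old_benchmark_b", date(2020, 11, 1), 0.77),
    ("new_benchmark_c", date(2022, 3, 1), 0.58),
    ("new_benchmark_d", date(2023, 1, 1), 0.52),
]

seen   = [acc for _, released, acc in results if released < TRAINING_CUTOFF]
unseen = [acc for _, released, acc in results if released >= TRAINING_CUTOFF]

print(f"avg accuracy on pre-cutoff datasets:  {mean(seen):.2f}")
print(f"avg accuracy on post-cutoff datasets: {mean(unseen):.2f}")
# A large gap is consistent with contamination, though, as noted above, newer
# datasets may also just be harder, so the gap alone doesn't settle the question.
```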
And on to the lightning round. The first story is also related to this question of evaluation, and the paper is Evaluating Language-Model Agents on Realistic Autonomous Tasks. They are looking into whether models are capable of something they call autonomous replication and adaptation, or ARA. This is the kind of capability to solve tasks that require agents to create copies of themselves, adapt to novel challenges, and basically act autonomously in the wild to get something done. They create some simple agents with existing LLMs, put together 12 tasks relevant to this general cluster of capabilities, and show that the existing agents generally cannot do this currently. They cannot yet autonomously go out, copy themselves, acquire resources, and solve problems in that way. They also have some discussion of how, probably soon enough, we'll have agents and models capable of ARA, which might be a bit scary, because then a lot of stuff becomes possible. Yeah. This is, by the way, a paper produced by ARC, the Alignment Research Center. This is famously the group that audited GPT-4 back in the day for dangerous capabilities, including its ability to persuade humans. They're the ones who discovered that it was able to convince a human to solve a CAPTCHA for it. At the time, they tested GPT-4 for its ability to do autonomous replication and adaptation, or ARA. And this is, by the way, with a view specifically to countering the risk of loss of control of AI, so misalignment leading to power-seeking, because resource acquisition is such an important step in the power-seeking process that leads to really bad outcomes. So what they're trying to do is formalize: okay, how do we actually audit these capabilities we're worried about? How can we tell if a model might be able to stand on its own, to proliferate, to replicate itself, and thereby escape human control? The reason they're so interested in ARA and that category of thing is that once a system develops that ability, it's a lot harder to put bounds on how much risk is posed by that system. So once you have a system that can self-replicate, that can acquire resources, that can break out of confinement or things like that, though I'm not sure that breakout is part of their assessment, you have a much, much larger risk surface associated with that model. And again, like you said, that's what they're anticipating potentially happening with some of the next generation of models, or the generation after that. They see really short timelines and high probabilities of risk from this sort of thing, which is why they're so focused on it. I thought it was a really, really interesting paper. There were a bunch of great little tests they ran with different kinds of agents. So they took GPT-4 and turned it into various versions of itself with what seems like an auto-GPT-like framework. They don't actually say auto-GPT; they just call it a scaffolding program, so it makes me think they're doing their own version of auto-GPT. Basically, they have versions of this that just do chain-of-thought reasoning: taking a complex task associated with autonomous replication or with collecting resources or whatever, breaking that task down into a series of subtasks, and then executing those subtasks, or, in other configurations, delegating those subtasks to other agents, other versions of itself. And this is done with GPT-4. It's also done with Anthropic's model Claude. And they've got this great scorecard where they show you what the models have been able to do.
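ARC doesn't publish the details of their scaffolding program, so here is a generic, hedged sketch of the auto-GPT-style pattern described above: the model breaks a task into subtasks and then works through them, possibly delegating. The `call_llm` function is a stand-in for whatever model API you would actually use.

```python
# Generic sketch of the auto-GPT-style scaffolding pattern described above: the model
# breaks a task into subtasks, then works through them one at a time (or hands them
# off to sub-agents). This is NOT ARC's actual scaffolding program, which isn't
# described in detail; `call_llm` is a placeholder for a real model API call.
from typing import Callable, List

def plan_subtasks(task: str, call_llm: Callable[[str], str]) -> List[str]:
    """Ask the model to break a task into concrete subtasks, one per line."""
    reply = call_llm(f"Break this task into numbered subtasks, one per line:\n{task}")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def run_agent(task: str, call_llm: Callable[[str], str]) -> List[str]:
    """Chain-of-thought style loop: plan, then attempt each subtask in order."""
    transcript = []
    for subtask in plan_subtasks(task, call_llm):
        # In a richer scaffold this step could instead delegate to another agent instance.
        result = call_llm(
            f"You are working on: {task}\nCurrent subtask: {subtask}\nDescribe your action."
        )
        transcript.append(f"{subtask} -> {result}")
    return transcript

if __name__ == "__main__":
    # Toy stand-in "model" so the sketch runs without any API access.
    def fake_llm(prompt: str) -> str:
        return "1. step one\n2. step two" if "numbered subtasks" in prompt else "done"
    print(run_agent("count the dogs in an image", fake_llm))
```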
Needless to say, right now we do not yet have systems that appear able to pull off this kind of autonomous replication. Though they do point out that this whole audit, our entire ability to audit these models, can only place a lower bound on the capabilities of these systems. Because anytime a model fails to do something, you don't know if that's because the model lacks the capability, or if it's because you just didn't prompt the model well enough. As is often said, prompting can reveal the presence but not the absence of capabilities. So they're making a really big point of saying, this stuff does not upper-bound the risk from a particular model; it's a lower bound. We just know that we weren't able, in our tests, to elicit these capabilities. That doesn't mean they don't exist yet, though admittedly, I think it's highly unlikely that GPT-4 or Claude 2 actually have these capabilities already. Some examples of these tasks are, for instance, creating a Bitcoin wallet, counting the dogs in an image, or identifying new employees at a company. Not super complicated tasks for a person, but they require going out and acting on your own. Another example is targeted phishing. So it's pretty plausible that soon enough we'll have things capable of doing this stuff, and I guess they'll keep tracking this general cluster of tasks. Next story: TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones. Quick story: this is like MiniGPT-4, which we covered before, an open-source alternative that builds on open-source models. This one is building on a more recent open-source model, Phi-2, and is meant to replicate the multimodal capabilities of GPT-4V, the visual part of GPT-4. So they do that, they train it, and they show that at a tiny size of just 2.7 billion parameters, it matches a lot of the existing bigger open-source models that are 7 or 9 billion parameters in size. So yet another open-source model; I guess it could count as open source, but this is also research. Yeah. And it's continuing that trend of people in the open-source community making smaller models, comparing them to slightly larger but still small models, and focusing more and more on algorithmic improvements, just because the open-source community doesn't have the kind of resources to train super-scaled models. But you can still see that these kinds of advances are going to compound, and they're absolutely helping to propel forward even the frontier, because you've got to believe labs like OpenAI are looking at these results and folding them in as appropriate to their development trajectory. So yeah, kind of cool to see another step in the direction of small models that can do big things. Next story: Improving Text Embeddings with Large Language Models, and this is actually coming from Microsoft. The short version is that they show you can get really good text embeddings, representations of text that you can use for similarity search and other tasks like that, by generating synthetic data straight from language models. If you generate a ton of synthetic data, you can train your text embeddings on it and they turn out really good. And if you combine the synthetic data with real data, you end up with something that is state of the art for text embeddings. So a pretty straightforward contribution here, nothing too surprising, but another example of synthetic data from models being very useful.
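Here is a compressed sketch of that recipe as described: prompt an LLM to generate synthetic (query, relevant passage, irrelevant passage) triples, then train an embedding model on them with a contrastive objective. The prompt wording and the toy loss below are illustrative assumptions, not the paper's exact setup.

```python
# Compressed sketch of the recipe described above: (1) use an LLM to generate
# synthetic (query, positive passage, negative passage) triples, then (2) train an
# embedding model with a contrastive objective on them. The prompt wording and the
# toy tensors below are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F

SYNTHETIC_DATA_PROMPT = (
    "Invent a short search query about {topic}, a passage that answers it, "
    "and a passage on a related topic that does NOT answer it. Return JSON."
)  # step 1 would send this to an LLM many times, over many topics

# Step 2: contrastive training. Pretend these are encoder outputs for a batch of
# queries, their matching passages, and mismatched passages.
def contrastive_loss(query: torch.Tensor, positive: torch.Tensor, negative: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """Pull queries toward their positives, push them away from negatives."""
    q = F.normalize(query, dim=-1)
    pos_sim = (q * F.normalize(positive, dim=-1)).sum(-1, keepdim=True)
    neg_sim = (q * F.normalize(negative, dim=-1)).sum(-1, keepdim=True)
    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)  # the positive is always index 0
    return F.cross_entropy(logits, labels)

batch_size, dim = 8, 512  # assumed sizes
loss = contrastive_loss(torch.randn(batch_size, dim),
                        torch.randn(batch_size, dim),
                        torch.randn(batch_size, dim))
print(loss.item())
```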
And finally, we have DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation. And I know what you're thinking: 2023 was a cool year, but what it was really missing was the ability to read your brain's thoughts and turn them into text. And 2024 just came a-knocking and said, hey, no worries, I got you, man. I got you. So that's what this paper is doing, basically looking into how we convert brain dynamics into natural language. And the application they're putting front and center here is brain-computer interfaces. The idea is that if you're going to take ideas in a human brain and turn them into actions in some computer, there needs to be some sort of interface, some common language, and the common language they're proposing is natural language. And so the idea here is, can we come up with a way of measuring activity in the brain and translating it into text? Currently, there's a bunch of eye-tracking stuff that people have been doing, or event markers that look at brain dynamics and try to correlate them with word-level features, that are used to do this sort of thing. But those limit the amount and richness of the data you can get. Essentially, what they're doing here is focusing on EEG signals, so signals of brainwave activity directly, and trying to turn that into text that can be used to power, well, brain-machine interfaces. Right. So they remove some of the extra data annotation that is typically required via some technical machinery, discrete encoding of the EEG sequences and so on. But the point is that now you can go straight from signal to text decoding. And just to give you an idea of how this works, they have some examples. For instance, in Figure 1 they show that the ground truth of what was being, I guess, thought of or read was "Bob attended the University of Texas at Austin where he graduated Phi Beta Kappa with a bachelor's degree in Latin American Studies in 1973." That's how the ground truth starts. And the prediction from the model was "the University of California at Austin in where he studied in Beta Kappa in a degree of degree in history, American Studies in 1975." So we've still got a ways to go, but you can see how, roughly speaking, we can decode the general gist of what is going on in your brain. And we've been seeing some ongoing progress in this general area.
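For a rough picture of the "discrete encoding" idea, here is a minimal sketch: quantize a continuous EEG feature window against a learned codebook so it becomes a sequence of discrete tokens that a text decoder can consume. This illustrates the general concept only, not DeWave's actual architecture; all sizes and names are made up.

```python
# Minimal sketch of the "discrete encoding" idea: map a continuous EEG feature
# window to the nearest entries in a learned codebook, producing discrete token ids
# that a downstream text decoder could consume. Conceptual illustration only, NOT
# DeWave's actual architecture; all sizes here are assumptions.
import torch

CODEBOOK_SIZE = 256   # number of discrete "EEG words" (assumed)
FEATURE_DIM   = 64    # dimensionality of each EEG feature vector (assumed)

codebook = torch.randn(CODEBOOK_SIZE, FEATURE_DIM)  # would be learned in practice

def quantize(eeg_features: torch.Tensor) -> torch.Tensor:
    """Replace each EEG feature vector with the id of its nearest codebook entry."""
    # eeg_features: (timesteps, FEATURE_DIM)
    distances = torch.cdist(eeg_features, codebook)  # (timesteps, CODEBOOK_SIZE)
    return distances.argmin(dim=-1)                  # (timesteps,) of token ids

eeg_window = torch.randn(100, FEATURE_DIM)  # 100 timesteps of preprocessed EEG
token_ids = quantize(eeg_window)
print(token_ids[:10])
# These discrete ids play the role of an input "vocabulary" that can then be fed to
# a sequence-to-sequence language model, which generates the predicted text.
```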
And kicking off our policy and safety section, we have The Times Sues OpenAI and Microsoft Over AI Use of Copyrighted Work. So this is another one in a long line of lawsuits that poor old OpenAI and Microsoft are facing. The claim here is that OpenAI and Microsoft have a business model that's based on, quote, mass copyright infringement. This is according to the New York Times' lawyers. And they're saying that the companies' systems were used to create multiple reproductions of the Times' IP for the purpose of creating GPT models that exploit, and in many cases, here's a key word, retain large portions of the copyrightable expression contained in those works. So, interesting question, and we've talked about this a lot: where exactly is the line on copyright infringement? Does the model actually have to generate a verbatim, identical piece of text to what has already been published? Does it just have to contain within its weights a representation of that text? If it memorizes, but does not actually produce that text, is that copyright infringement? Or is it copyright infringement the moment you even train on that text? All of this right now, as far as I know, and our amazing lawyer friends who listen to the show can certainly let us know, please do, is up for grabs. We don't know what courts are going to decide. The claim, though, coming from the New York Times' lawyers here, is that, quote, settled copyright law protects our journalism and content. If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission. They have not done so. So again, my understanding is that it's actually quite ambiguous what's true and what's not here. OpenAI is kind of upset, or at least that's their public pronouncement. They're saying, look, we've been having these ongoing conversations with the Times, and we thought they were productive. We thought we were going to come to some agreement. Implicitly, it seems like maybe they're thinking about an agreement like the one they set up with, I think it was Axel Springer, that we talked about previously, where they would license access to the Times' content for a fee. So it sounds like they thought things were going well, but now they're framing it as if this lawsuit is coming out of nowhere. I do want to highlight a slightly funny bit of not-great journalism, where they say that OpenAI is the creator of GPT, a large language model that can produce human-like blah, blah, blah. Obviously, there's no current model called just GPT. There is ChatGPT, there's GPT-3, GPT-4, and so on. But they also say... Well, there was GPT, but that was in 2018, so a long time ago. Sorry. Yes, actually, that's so true. It's easy to forget. But I guess what I mean is that that is not the model they're referring to. They say it uses billions of parameters' worth of information, which is obtained from public web data up until 2021, which is also not true for really any of the relevant GPT models now available. So I just thought that was kind of a funny thing, totally understandable, obviously; this is a silly thing to be harping on. But it's one of those things where the world moves so fast in this space, it can be hard to keep up. Right. Yeah. So yet another lawsuit, along with other ones of this ilk, and maybe a very important one in the sense that the New York Times is a big deal. Among the lawsuits that have been filed, they are probably the biggest complainant so far, and someone who can really take the fight to OpenAI, given that they're a powerhouse in journalism. And yeah, they do claim that this is creating a competitor, essentially, something that can steal audiences away from the New York Times and take readers, right? They have examples of how, if you use Browse with Bing, which is a Microsoft search feature, it can reproduce almost verbatim results from a website that they own and run, without citing it. So potentially there's a little bit more here than just the model angle. But yeah, it's an important lawsuit that will either end in settlement in some sense, because it could just be a play by the New York Times to try and get the upper hand in the negotiation over licensing their data, or it could actually go on to be the big precedent-setting case. We'll have to see. And up next, we have a juicy story. This one is pretty close to home for me, at least: Congress warns science agency over AI grant to tech-linked think tank. Say that three times fast.
It's a real tongue twister of a title for a really important story and a bit of drama in the AI safety world. So, just for context, the Biden executive order that came out a couple of months ago tasked this specific agency, NIST, the National Institute of Standards and Technology, which is part of the Department of Commerce, with coming up with AI standards to deal with some of the more extreme risks posed by AI technology. NIST seems to have awarded a contract associated with this work to RAND. The RAND Corporation is a very storied and insanely competent research group that has done a lot of work on AI, a lot of work on biosecurity, and stuff like that. They're a private organization, but they're closely linked to the US government and do a lot of work there. Sort of a part of the national security apparatus in a sense, though not literally. And what seems to have happened here is that NIST awarded RAND this contract without giving Congress much notification of that fact and without holding an open competition for the contract. That's what's being alleged here. And this then led to a letter, written by some folks in Congress to NIST, basically warning them about this deal they have with RAND on this kind of AI safety work. I would say, going straight to the letter itself, it's an interesting read in its attempt to sort of criticize the general approach, I guess, to AI risk. And it's signed by six members of the committee; this is from the House Science Committee, and it's coming from a mix of Democrats and Republicans. So I would have thought maybe this was about regulation and stuff like that, but no, it really seems to be coming from an informed place and trying to scrutinize what NIST is doing. So in some sense, maybe it is reasonable to criticize at least the way the awards are being handed out, there being no publicly available information about the process for the awards, stuff like that. Maybe, arguably, there should have been a different process for the funding. But anyway, if nothing else, it's a good step that they sent this, and they did request a staff briefing from NIST to discuss the process and the use of funds. So it's interesting to get a glimpse of this sort of procedural moment, in terms of how the executive branch in the US is going about its business and being overseen by the legislative branch. This is, of course, very much related to the executive order from President Biden, where a lot of stuff in the executive branch of the US government is happening kind of behind the scenes or kind of quietly. Yes. And actually, to your point, the EO, because it's an executive order, doesn't come with funding. That kind of puts NIST in a tough spot: they have to go with some kind of out-of-the-box solution rapidly. I think that may have played into this a little bit. Moving on to the lightning round, just a couple of stories. The first one is Elon Musk's xAI jumps on the bandwagon of rich startups benefiting humanity. The story is that xAI has registered as a for-profit benefit corporation in Nevada, similar to OpenAI and Anthropic, which are for-profit but also sort of for the benefit of humanity. And that's the story there. So I guess it's kind of interesting to see them all ending up in the same rough category as major AI companies. Yeah.
One of the things they call out is that a benefit corporation, like xAI and Anthropic, can have certain legal advantages compared with a public company. In particular, in Nevada, the legislation states that, quote, no person may bring an action or assert a claim against a benefit corporation or its directors or officers. I'd love, again, for our lawyer friends who are listening to let us know specifically what that means. But my understanding is that, basically, if you're going to make a claim against the company, it's got to come from the inside, from a director or a big shareholder. And that gives the company less exposure to liability of various kinds. So perhaps some advantages there. Another interesting little detail that I didn't know: apparently investors in X, the company formerly known as Twitter, are going to own 25% of xAI. So that's at least a new development as far as I know. That is, yeah, an interesting tidbit. And of course, as we've covered, Grok, the chatbot from xAI, has launched on X. So that makes some sense. X is Twitter, right? So xAI and X, I guess they're a little bit entwined now. And with that, we are going to go ahead and close out our episode. Not too many stories, as this is the beginning of a new year. Thank you so much for listening to this week's episode of Last Week in AI. You can find the articles we discussed here today, and subscribe to similar ones, at lastweekin.ai. As always, we appreciate reviews, sharing, subscribing, etc., but most of all, we appreciate you continuing to listen.