I started writing this grab bag of quotes and thoughts about local AI models a couple of weeks ago, but was inspired to finish it after playing with Meta’s newest nano model, Llama 3.1 8B, on my machine over the last 24 hours.
It is truly amazing to me that a model with capability equivalent to OpenAI’s GPT-3.5 (which blew the world away back in November 2022) is running locally on my year-old MacBook Pro, using less RAM than a web browser.
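To put a rough number on that claim, here’s the back-of-envelope maths. This is a sketch: the quantisation level and overhead figure are assumptions, not measurements of my actual setup.

```python
# Back-of-envelope RAM estimate for a local Llama 3.1 8B.
# Assumes 4-bit quantisation (common for local inference) plus ~20%
# overhead for the KV cache and runtime; both figures are rough
# assumptions, not measurements.
params = 8e9           # 8 billion parameters
bytes_per_param = 0.5  # 4-bit quantisation = half a byte per weight
overhead = 1.2         # KV cache, activations, runtime

ram_gb = params * bytes_per_param * overhead / 1e9
print(f"~{ram_gb:.1f} GB")  # ~4.8 GB, i.e. less than a heavy browser session
```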
Open As It Gets ¯\_(ツ)_/¯
Meta’s decision to release the Llama models as open source to consumers is an extremely important move, and one often missed by commentators on AI. I’ll acknowledge that there is a great deal of hand-wringing about the ‘actual licence’ of the Llama architecture and how ‘Open’ its ‘Open Source’ licence really is. The licence is one of Meta’s own devising, and not approved by the Open Source Initiative. It is, nevertheless, about as open as it gets.
Various estimates put Meta’s AI capital expenditure for 2024 at $35-40 billion, with $16-18 billion of that already spent on GPUs and data centres. Either way, the release of the Llama 3 stable of models, in particular the Llama 3.1 frontier model, is the fruit of this enormous spend.
I highly recommend this 30-minute interview with Zuckerberg on Meta’s AI strategy and the release of Llama 3.1 as he makes some interesting points:
Reading between the lines, Zuckerberg intends Meta’s open models to stand as a kind of bulwark against the other tech giants’ closed products. His points about the future monetisation strategy are interesting too.
I believe the Llama 3.1 release will be an inflection point in the industry where most developers begin to primarily use open source, and I expect that approach to only grow from here.
These models, for all intents and purposes (unless you have 700 million users), are released to the public: free to download, use, retrain, hack on, modify, and extend as you want. OpenAI, Anthropic, and Google, meanwhile, have set the price point for large models at $20 a month. But, as per the title of this post: if I were them, I’d be worried.
Even if Meta were to never release another model of this kind again, Llama 3.1 still represents a significant piece of computing architecture released into the wild. A gift to the world(?).
It’s not going away.
Also, the largest and most capable frontier model released yesterday, Big Llama 3.1 405B, is (as I understand it) ‘just about’ runnable on top-end consumer hardware, though most folks will be running it in the cloud for the foreseeable future.
While reporting and writing in the media about AI tends to focus on the super-large frontier models that require city-scale industrial compute in sheds just off the M4, I think readers of this blog should be interested in the much smaller nano models.
Like I said, I literally have one running on my M2 right now, and it runs great.
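If you want to try the same thing, the setup is roughly this, using the llama-cpp-python bindings. Treat it as a sketch rather than a tutorial: the model path is a placeholder for wherever your quantised GGUF weights live.

```python
# Minimal local chat with a quantised Llama 3.1 8B, via the
# llama-cpp-python bindings (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to Apple silicon's GPU via Metal
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe yourself in one sentence."}]
)
print(response["choices"][0]["message"]["content"])
```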
Intelligence Inside
Last month, I made an episode of my podcast about Little Computer People and said that we are going to see ‘maximal intelligence at all levels’:
A small model like Google’s Nano, plus new optimisation and increased RAM in whatever Pixel hardware they announce soon, may mean we see a performant language model running locally on a phone this year. An AI developer recently said to me that the goal is ‘maximal intelligence at all levels’—on your device, in software, and in the cloud. If your phone thinks an instruction is ‘too big or complex’ it will push it up to a bigger model in the cloud.
And that is exactly what Apple announced a few weeks later with onboard intelligence.
Apple Intelligence is designed to protect your privacy at every step. It’s integrated into the core of your iPhone, iPad, and Mac through on-device processing. So it’s aware of your personal information without collecting your personal information. And with groundbreaking Private Cloud Compute, Apple Intelligence can draw on larger server-based models, running on Apple silicon, to handle more complex requests for you while protecting your privacy.
The only thing we need for lil models to be running everywhere is more RAM in new consumer devices. And as these two graphs show, there’s a lot of headroom:
Here’s a comment on the above from Gruber:
Apple silicon Mac with 8 GB RAM performs as well under memory constraints as an Intel-based Mac with 16 GB. But base model consumer Macs have been stuck at 8 GB for a long time, and it’s impossible to look at Schaub’s charts and not see that regular increases in base RAM effectively stopped when Tim Cook took over as CEO. Apple silicon efficiency notwithstanding, more RAM is better, and certainly more future-proof. And it’s downright bizarre to think that come this fall, all iPhone 16 models will sport as much RAM as base model Macs.
Just a ‘small bump’ in minimum specs is going to open up local nano models over the next few years: in our phones, laptops, tablets, TVs? The fact that I have one running on a mid-spec M2 right now clearly means they will be inside everything soon.
Which is what Apple are already doing, of course. I think it’s pretty instructive that they aren’t launching a chatbot, nor is there any kind of conversational UX wrapped around their models. Critics who say that AI is nothing but ‘spicy autocomplete’ will largely be vindicated, tbh. As the UI on general-purpose intelligence improves, we will barely notice the models at all. Behind the interface, they will help with spell check, arranging lists, creating to-do lists, tidying up wonky copy-and-pasted text, pruning speech-to-text voice notes, and more. A ‘Super Siri’.
Siri draws on Apple Intelligence for all-new superpowers. With an all-new design, richer language understanding, and the ability to type to Siri whenever it’s convenient for you, communicating with Siri is more natural than ever. Equipped with awareness of your personal context, the ability to take action in and across apps, and product knowledge about your devices’ features and settings, Siri will be able to assist you like never before.
When it launches (unlike with the OpenAI conversational demo), I really don’t think many people are going to be thinking ‘WOW, THERE’S AN AI IN MY PHONE NOW.’ Most are just going to think ‘my phone’s a bit smarter, there’s some useful new stuff’ and go back to texting their families on WhatsApp, scrolling TikTok, or whatever people use their phones for most of the time.
The thing is, your Xbox uses more energy. These small models aren’t the energy-hungry, water-guzzling industrial compute everyone is worried about. They are going to be little guys in your phone.
The Only Acceptable Price Point is Free
Matt Webb asked the following back in October 2023:
If future AI models will be more and more intelligent (per watt, or per penny, or per cubic foot, whatever we choose to measure) then we can equivalently say that, in the future, today’s AI models will become cheaper and more abundant.
What happens when intelligence is too cheap to meter?
Too cheap to meter: a commodity so inexpensive that it is cheaper and less bureaucratic to simply provide it for a flat fee or even free.
Last month I also wondered about the falling cost of running state-of-the-art LLMs and what this might mean over the long term.
Right now it costs about 60 bucks a day to house a state of the art little computer person powered by an LLM in a virtual world. People are already doing it, and the idea has been around for decades. But what will the world be like when it costs 60p?
With Llama 3.1 8B running locally on my machine, I totally misjudged how quickly it was going to happen. It REALLY IS worth us asking the question: what happens when intelligence is too cheap to meter?
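To make the question concrete, here’s the kind of back-of-envelope sum behind that 60-bucks-versus-60p guess. The token rate and the per-token prices below are illustrative assumptions, not anyone’s actual rate card:

```python
# Rough daily cost of keeping an LLM-driven little computer person
# "alive", assuming it chews through ~10 tokens/second around the clock.
# Both per-token prices are illustrative assumptions.
tokens_per_day = 10 * 60 * 60 * 24  # ~864,000 tokens a day

frontier = 10.00 / 1_000_000    # assumed $10 per million tokens
small_open = 0.10 / 1_000_000   # assumed $0.10 per million tokens

print(f"frontier API:     ${tokens_per_day * frontier:.2f}/day")
print(f"small open model: ${tokens_per_day * small_open:.2f}/day")
# And a local model on hardware you already own is, roughly, the electricity.
```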
Right now the price point for the frontier models has settled at $20 a month. But Big Llama 3.1 (as I have already mentioned) is just about runnable on consumer hardware. And as Alberto Romero over at The Algorithmic Bridge recently said, this price point is a question of value:
I lurk in alpha AI bubbles. Here’s the most common take I’ve heard in 2024: “Why do people still use the free version of ChatGPT when for a few bucks you have access to substantially better tools like GPT-4, Gemini Advanced, and Claude 3?” (This changed after GPT-4o became the default model but remains a valid question.)
It feels true: No one I know pays for these tools. No one I know online who’s not in the bubble pays for them either. I’d even wager, without proof, that most users haven’t noticed ChatGPT was replaced by a more powerful version.
It’s true—not as an opinion but as a verifiable fact—that you can get surprising amounts of performance improvements (I’m talking about the you-can’t-believe-how-much-better-this-is-until-you-try-it kind) worth much more than twenty dollars. I condemn the use we’re giving these tools but when used for intimate purposes instead of deceptively making money, they’re a bargain if you take the time to learn.
The thing is, until very recently, the free version of ChatGPT people were using had the same level of capability as the model I have running on my machine.
Romero again:
Those are the facts. Here’s the big picture of the current trend—making AI models (up to) 100x smaller instead of larger:
The first AI models were seriously underoptimized. It was a matter of time before they got tiny, fast, and cheap without compromising quality. In other words, they should’ve never been that expensive.
A few companies own the best models—private, open-source, large, and now also tiny—which means they control the entire market.
A few companies own the distribution channels, which means AI isn’t a democratization force but a new element of the same oligopoly.
LLMs are 7 years old, and realistically just 3-4. Optimisation is going to continue, and we are going to keep seeing all sorts of innovation: 1-bit models are being explored, etc. The cost of training frontier models is only going to increase, so the pace of innovation at the bottom end is going to be intense. And as with the ecosystem around Stable Diffusion, much of it is going to happen at the hobbyist level.
Meta has basically created a new market for consumer AI. One totally separate from, and outside of, the API paywalls of Google, OpenAI, et al. As per the strategy:
I believe the Llama 3.1 release will be an inflection point in the industry where most developers begin to primarily use open source, and I expect that approach to only grow from here.
And this is the thing: with the ecosystem that has already sprung up around older Llama models, these new models’ capabilities, and market pressure, it may well be that ‘free seems to be the only acceptable price’.
I have some misgivings about Meta’s strategy, and I also wonder, as with their metaverse project, just how many billions they are going to burn on this over the long term. But for now, I am glad they are.
I’m going to keep playing with the smaller nano models, seeing what they can do. I’m looking into training a LoRA on my journal, or maybe my blog.
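For the curious, the rough shape of that with Hugging Face’s peft library is below. The base model and hyperparameters are placeholder choices, not recommendations, and the journal would need to be prepared as a text dataset first.

```python
# Rough shape of training a LoRA adapter on personal text, using
# Hugging Face transformers + peft.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3.1-8B"  # gated: requires accepting Meta's licence
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # adapt only the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a tiny fraction of the 8B weights

# ...then train on the journal/blog text with a standard Trainer loop.
```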
The only thing these smaller nano models require is battery power. They aren’t burning down the rainforest, using the power of a small city, and the water of a small country. They are just lil’ guys in your phone, and everyone is going to end up with devices that have ‘Intelligence Inside’.
We are already in the era where intelligence is too cheap to meter. The real question is: what are we going to use it for?
I cut several other sections out of this post which I’ll come back to, as they really should be whole posts by themselves:
- Observing that it’s private actors, not nation states, burning all this cash building these things.
- At some point soon we may run out of road with training data, and/or the training of open-source AI models becomes so enormously expensive, costing hundreds of billions of dollars for incremental improvements, that it becomes a generational, civilisational project.
- Funded and worked on at a global scale as an engineering project, similar to global coordination around climate change?
- The first sparks of Silica Anima might get confirmed soon. At which point, the entirety of ‘post-Enlightenment Western civilisation and theory of mind‘ comes tumbling down and we have a much bigger crisis rendering almost everything we know about AI / philosophy of computer science moot.
- Insert “Google Reza Negarestani Meme”
- The railways, telegraph, telephone, internet backbone, and social media companies all lost billions and billions of dollars laying the rails and fibre. If the bubble pops and everyone goes broke, we’ll still have Llama 3.1 and a shit ton of GPU capacity that we can find other uses for.
- Riffing on the idea of Little Computer People and local agents as Tamagotchis, I think Matt Webb’s recent post on the AI landscape is missing a whole category of ‘art games’ and Clippy pets.
- Evals