Wow, this was one of the most thrilling weeks in tech. It feels like a whole year of AI progress happened all last week (not to mention a banking crisis + resolution).
I've used a version of GPT-4 (as did many, if you had access to the New Bing).
Compared to the GPT-3/3.5 series, the key improvements in GPT-4 are:
Significantly improved ability to understand and reason about very long prompts. The GPT-3.5 series allowed ~4k tokens. GPT-4 supports 8k, with up to 32k in testing. This means way less “prompt engineering” and “model steering” is required to get the LLM to do what you want. See additional notes from me on the limits of current gen LLM “reasoning” here.
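To make the context-window jump concrete, here's a minimal sketch of a prompt budget check. The ~4-characters-per-token figure is a rough heuristic for English text, not an exact tokenizer; real token counts require the model's actual tokenizer (e.g. OpenAI's tiktoken), and the `fits_context` helper is invented for illustration.

```python
# Rough check of whether a prompt fits a model's context window.
# Assumes the common ~4-characters-per-token heuristic for English prose;
# use the model's real tokenizer for exact counts.

CONTEXT_WINDOWS = {
    "gpt-3.5": 4_096,     # ~4k tokens
    "gpt-4": 8_192,       # 8k tokens
    "gpt-4-32k": 32_768,  # the 32k variant still in testing
}

def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_context(text: str, model: str, reserve_for_output: int = 500) -> bool:
    """True if the estimated prompt tokens leave room for the reply."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOWS[model]

# A ~25k-character document overflows the 4k window but fits easily in 32k.
prompt = "Summarize the following document: " + "word " * 5000
print(fits_context(prompt, "gpt-3.5"))
print(fits_context(prompt, "gpt-4-32k"))
```

The practical upshot: tasks that previously required chunking a document and stitching answers together can now be done in a single prompt.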
Multi-modal (text AND image input, text output). This hasn't been released yet, but OpenAI demo'd it fully and correctly explaining screenshots, funny images, and memes. This is going to have big implications for software automation because we're about to see LLMs operating in “pixel space” to translate screens into structured language.
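To illustrate what "screens into structured language" could mean for automation, here's a hypothetical sketch. The JSON schema and the `next_action` helper are both invented for illustration; no real vision API is being called. The idea is that once a model can emit a structured description of a screen, ordinary code can drive the UI.

```python
import json

# Hypothetical output of a "pixels -> structured language" step: an LLM looks
# at a screenshot and emits a JSON description of the UI. Schema is invented.
screen_description = json.dumps({
    "window": "Checkout",
    "elements": [
        {"type": "text_field", "label": "Email", "value": ""},
        {"type": "button", "label": "Pay now", "enabled": False},
    ],
})

def next_action(description: str) -> dict:
    """Pick a simple next automation step from the structured screen description."""
    screen = json.loads(description)
    # Fill empty text fields before trying to click anything.
    for el in screen["elements"]:
        if el["type"] == "text_field" and not el["value"]:
            return {"action": "type", "target": el["label"]}
    for el in screen["elements"]:
        if el["type"] == "button" and el["enabled"]:
            return {"action": "click", "target": el["label"]}
    return {"action": "wait"}

print(next_action(screen_description))  # the empty Email field gets filled first
```

The hard part today is the first step (pixels to structure); everything after it is plain programming, which is why multi-modal input matters so much for automation.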
Several weeks ago, Facebook announced LLaMA, an LLM they reported showed performance comparable to GPT-3 series models while being much smaller (65B parameters versus GPT-3's 175B). The model weights were leaked via BitTorrent (the torrent link is still live on GitHub, which suggests at least some level of acceptance by FB).
Last week, Georgi Gerganov open sourced llama.cpp, an inference implementation (in minimal C++) that runs Facebook's leaked LLaMA model weights.
LLaMA actually shipped in several model sizes, from 7B parameters up to 65B parameters, and importantly the 7B parameter version with instruction tuning is demonstrating coherent results! This is truly shocking and opens the door to running strong ChatGPT-like models on local consumer hardware with acceptable speed by quantizing the weights down to 4-bit integers.
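The quantization trick above is what makes the sizes work out. Here's a minimal sketch of the core idea, assuming the simplest possible scheme (one shared absolute-max scale per weight group); real implementations like llama.cpp's block-wise formats are more sophisticated, but the principle is the same: store a scale plus small integers instead of full floats.

```python
# Minimal sketch of 4-bit weight quantization: map floats to signed 4-bit
# integers in [-7, 7] with one shared scale, then reconstruct approximately.

def quantize_4bit(weights: list[float]) -> tuple[float, list[int]]:
    """Quantize with a single absolute-max scale (0.0 guarded to 1.0)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    return scale, [round(w / scale) for w in weights]

def dequantize(scale: float, q: list[int]) -> list[float]:
    return [scale * v for v in q]

weights = [0.12, -0.53, 0.31, 0.02, -0.44, 0.27]
scale, q = quantize_4bit(weights)
restored = dequantize(scale, q)

# Going from 32-bit floats to 4-bit ints is roughly an 8x size reduction
# (ignoring per-group scales): a 7B-parameter model drops from ~28 GB in
# fp32 to roughly 3.5 GB, which fits comfortably in consumer RAM.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2  # error is bounded by half a quantization step
print(f"scale={scale:.4f}, quantized={q}, max_err={max_err:.4f}")
```

The accuracy loss turns out to be surprisingly tolerable for LLM inference, which is why the 7B and 13B LLaMA models became runnable on laptops almost overnight.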
And that's exactly what the internet has been sprinting towards over the last week:
This has several big implications:
Each of the above alone would be a big deal. Together, they are quite revolutionary:
"Llama LLMs running on M1/M2 consumer hardware def surprised me. Thought it would take another 12+ months to achieve that." — Mike Knoop (@mikeknoop), March 12, 2023
To date, the only realistic LLM API provider has been OpenAI. That's starting to change, as Anthropic and Google have both announced new model APIs for developers (and Anthropic's Claude has shipped!)
In 12 months I expect there will be at least 4 or 5 major LLM API providers, offering developers a choice of model (and reduced platform risk) for the first time.