AI Releases, March 2023

Last updated Mar. 15, 2023 — read 1994 times

Wow, this was one of the most thrilling weeks in tech. It feels like a whole year of AI progress happened all last week (not to mention a banking crisis + resolution).

GPT-4 released
LLaMa running on consumer hardware
Claude released
Multi-model nat.dev playground released
Google AI showcase

GPT-4

I've used a version of GPT-4 (as did many, if you had access to the New Bing).

Compared to GPT-3/3.5 series, the key improvements for 4 are:

Significantly improved ability to understand, and reason about very long prompts. GPT-3.5 series allowed ~4k tokens. GPT-4 supports 8k with up to 32k in testing. This means way less “prompt engineering” and “model steering” is required to get the LLM to do what you want. See additional notes from me on the limits of current gen LLM “reasoning” here.
Multi-modal (text AND image input, text ouput). This hasn't been released yet, but OpenAI demo'd fully and correctly explaining screenshots, funny images, and memes. This is going to have big implications for software automation because we're about to see LLMs operating in “pixel space” to translate screens into structured language.

Other notes:

The GPT-4 technical report contains no details on parameter count, architecture, training time, etc. for “competitive and AI safety reasons”. Disappointing that only ~400 folks in the world know how the currently best-performing model works. We can still speculate.
GPT-4 training started 2 years ago and ended in August 2022. A look at historical GPT release dates:
- 2 –> 3: 16 months
- 3 –> 3.5: 18 months (InstructGPT)
- 3.5 –> 4: 15 months
The GPT-4 paper in mostly focussed on alignment and results. OpenAI says they sat on the trained model for 6+ months to do alignment work. Reinforcement Learning with Human Feedback (RLHF) was super important – the base GPT4 was not much better than GPT3.5 series on reasoning and standardized tests until RLHF was performed. Suggests parameter/training data scaling alone is not resulting in huge new capability unlocks but there is a ton of “latent capability” the LLMs are capable of once provided with a clear objective function.
Bing secretly using GPT4 was part of the rollout strategy to test quiet launch vs loud launch. OpenAI found it didn't help much with alignment goals so I'm not expecting many secret launches going forward.
The new “system” tokens are structurally going to be very important for programmatic use cases and preventing prompt injection.
Code gen is one area 4 does dramatically better than 3.5 series. Early demos show 4 can generate much longer, correct, runnable code zero-shot compared to 3.5. If GPT-4 can correctly generate code, it can also generate “no code”.

LLaMa running on consumer hardware

Several weeks above, Facebook announced LLaMa, an LLM they reported showed comparable performance to GPT-3 series models but much smaller in size (65B parameters compared to OpenAI's GPT-3.5 with 165B parameters). The model was leaked on BitTorrent (which is still live on GitHub which indicates at least some level of acceptance by FB).

Last week, Greg Gerganov open sourced an inference implementation (using minimal C++) using Facebooks' LLaMa leaked model weights.

LLaMa actually had several different model sizes from 7B parameters up to 65B parameters and importantly the 7B parameter version with instruction tuning is demonstrating coherent results! This is truly shocking and opens the door to running strong ChatGPT-like models on local consumer hardware with acceptable speed by converting the weights to int4.

And that's exactly what the internet has been sprinting towards the last week:

M1/M2 Mac (4 to 40 tokens/sec speed)
Raspberry Pi (1 token/sec)
Pixel 6 phone (5 tokens/sec)

This has several big implications:

Dramatically cheaper to deploy LLMs for lots of of use cases (pushes inference to client vs server)
Private inference now possible (eliminates sending sensitive data via an API to third-party)
No content filters (many users want raw model access vs sanitized corporate versions)

Each of the above alone would warrant a big deal. But together is quite revolutionary:

Llama LLMs running on M1/M2 consumer hardware def surprised me. Thought it would take another 12+ months to achieve that.
— Mike Knoop (@mikeknoop) March 12, 2023

Claude and a multi-model future

To date, the only realistic LLM API provider has been OpenAI. That's starting to change as Claude and Google both announced new model APIs for developers (and Claude shipped!)

In 12 months I expect there will be at least 4 or 5 major LLM API providers offering choice of model (and reduced platform risk) for the first time.

Nat Friedman released https://nat.dev/ last week which is a multi-model playground to compare LLM performance quickly in the browser.