AI Releases, March 2023

Wow, this was one of the most thrilling weeks in tech. It feels like a whole year of AI progress happened in just the last week (not to mention a banking crisis + resolution).


I've used a version of GPT-4 (as has anyone who had access to the New Bing).

Compared to the GPT-3/3.5 series, the key improvements in GPT-4 are:

  1. Significantly improved ability to understand and reason about very long prompts. The GPT-3.5 series allowed ~4k tokens. GPT-4 supports 8k, with up to 32k in testing. This means way less “prompt engineering” and “model steering” is required to get the LLM to do what you want. See additional notes from me on the limits of current gen LLM “reasoning” here.

  2. Multi-modal (text AND image input, text output). This hasn't been released yet, but OpenAI demoed the model fully and correctly explaining screenshots, funny images, and memes. This is going to have big implications for software automation because we're about to see LLMs operating in “pixel space” to translate screens into structured language.
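
To make the context-window numbers in point 1 concrete, here is a rough sketch of checking whether a prompt fits. It is only an illustration: the dictionary keys are labels I made up (not real API model names), “4k/8k/32k” are taken as powers of two, and the four-characters-per-token rule is a crude assumption, since real tokenizers vary with the text.

```python
# Context window sizes in tokens, as quoted above ("4k", "8k", "32k").
# The keys are hypothetical labels for this example, not API identifiers.
CONTEXT_WINDOWS = {
    "gpt-3.5": 4096,
    "gpt-4-8k": 8192,
    "gpt-4-32k": 32768,
}


def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate (assumption: ~4 characters per token)."""
    return max(1, round(len(text) / chars_per_token))


def fits(text, model, reserve_for_reply=500):
    """True if the prompt plus room reserved for the reply fits in the window."""
    return estimate_tokens(text) + reserve_for_reply <= CONTEXT_WINDOWS[model]


# A ~20,000 character document (roughly 5,000 tokens under the assumption above).
document = "x" * 20_000
for model in CONTEXT_WINDOWS:
    print(model, fits(document, model))
```

Under these assumptions the same document that has to be chopped up or summarized for a 4k window goes into an 8k window whole, which is where the reduced need for prompt engineering comes from.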

Other notes:

LLaMa running on consumer hardware

Several weeks ago, Facebook announced LLaMa, an LLM they reported as showing performance comparable to GPT-3 series models at a much smaller size (65B parameters compared to 175B for OpenAI's GPT-3). The model weights were leaked on BitTorrent (a link to the torrent is still live on GitHub, which indicates at least some level of acceptance by FB).

Last week, Georgi Gerganov open sourced an inference implementation (in minimal C++) that runs Facebook's leaked LLaMa model weights.

LLaMa actually comes in several model sizes, from 7B parameters up to 65B parameters, and importantly the 7B parameter version with instruction tuning is demonstrating coherent results! This is truly shocking and opens the door to running strong ChatGPT-like models on local consumer hardware at acceptable speed by converting the weights to int4.
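
To see why int4 matters, a back-of-the-envelope sketch helps. The snippet below is my own illustration, not the actual quantization format used by any particular implementation: it assumes a simple symmetric scheme with one floating-point scale per block of weights, the function names are made up for this example, and the size estimate ignores the small overhead of storing the scales.

```python
def quantize_block(weights):
    """Symmetric 4-bit quantization of one block of floats.

    Returns (scale, codes), where each code is an integer in [-8, 7].
    Assumed scheme for illustration only.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]


def model_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return n_params * bits_per_weight / 8 / 1e9


block = [0.12, -0.70, 0.33, 0.0, 0.05, -0.41, 0.68, -0.02]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
worst_error = max(abs(a - b) for a, b in zip(block, restored))
print("worst-case error in this block:", worst_error)

for bits in (16, 4):
    print(f"7B parameters at {bits} bits: {model_size_gb(7e9, bits):.1f} GB")
```

Under these assumptions a 7B model drops from about 14 GB at 16 bits per weight to about 3.5 GB at 4 bits, at the cost of a rounding error of at most half a quantization step per weight. That is what puts it within reach of an ordinary laptop's memory.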

And that's exactly what the internet has been sprinting towards over the last week.

This has several big implications:

  1. Dramatically cheaper to deploy LLMs for lots of use cases (pushes inference to client vs server)
  2. Private inference is now possible (eliminates sending sensitive data via an API to a third party)
  3. No content filters (many users want raw model access vs sanitized corporate versions)

Each of the above alone would be a big deal. Together they are quite revolutionary.

Claude and a multi-model future

To date, the only realistic LLM API provider has been OpenAI. That's starting to change as Anthropic and Google both announced new model APIs for developers (and Anthropic's Claude shipped!).

In 12 months I expect there will be at least 4 or 5 major LLM API providers, offering a choice of models (and reduced platform risk) for the first time.

Last week Nat Friedman released a multi-model playground for quickly comparing LLM performance in the browser.
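
The idea behind a multi-model setup can be sketched in a few lines. This is a hypothetical illustration, not how any real playground or provider SDK works: a “provider” here is just a function from prompt to completion, and the two stand-in providers below call no real API.

```python
from typing import Callable, Dict

# Assumption for this sketch: a provider is any function mapping a prompt
# string to a completion string. Real SDKs have richer interfaces.
Provider = Callable[[str], str]


def compare(prompt: str, providers: Dict[str, Provider]) -> Dict[str, str]:
    """Send the same prompt to every provider and collect the answers."""
    results = {}
    for name, complete in providers.items():
        try:
            results[name] = complete(prompt)
        except Exception as exc:
            # One provider failing should not block the others.
            results[name] = f"<error: {exc}>"
    return results


# Stand-in providers so the example runs without network access.
def echo_provider(prompt: str) -> str:
    return f"echo: {prompt}"


def shout_provider(prompt: str) -> str:
    return prompt.upper()


answers = compare("hello", {"echo": echo_provider, "shout": shout_provider})
for name, answer in answers.items():
    print(f"{name}: {answer}")
```

Keeping application code behind a thin interface like this is one way to get the reduced platform risk mentioned above: swapping or adding a provider means adding one function rather than rewriting every call site.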