Every month I turn my open browser tabs into a blog post.
SDVideo is good with clouds and reflections but is otherwise hit or miss
The Year of the Large Language Model
It has been one year since ChatGPT was released. Being in the AI space before then now qualifies as having been “early.” And things still seem to be speeding up.
Although multi-modal capabilities have existed in some fashion for a while (CLIP embeddings, for example, offer a shared latent space for text and images), crossing between modalities seamlessly with a single model has been challenging. No more.
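For the curious, here's roughly what that shared latent space looks like in code, a sketch using the Hugging Face transformers implementation of CLIP; the model ID, image file, and captions are just placeholders:

```python
# A sketch of CLIP's shared text/image embedding space via Hugging Face transformers.
# Model ID, image file, and captions are placeholders.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("mountain.jpg")
texts = ["an oceanside mountain framed by clouds", "a city street at night"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Both modalities land in the same latent space, so cosine similarity
# between an image embedding and a text embedding is meaningful.
image_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
print(image_emb @ text_emb.T)  # higher score = closer text/image match
```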
The GPT-4V lineage of models can both interpret and produce images, which has almost immediately led to some very interesting use cases. Watch the video below from Twitter user @tazsingh to get a sense of what I mean.
The first link on deck is about the platform used to make the demo, tldraw.
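For a sense of the plumbing behind demos like this, here's a minimal sketch of sending an image to a vision-capable model with the OpenAI Python SDK. The model name, prompt, and image URL are assumptions on my part; the tldraw demo's actual internals almost certainly differ.

```python
# A minimal sketch: ask a vision-capable model about an image.
# Assumes the OpenAI Python SDK (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # vision-capable model name at time of writing
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the UI sketched in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/sketch.png"}},
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```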
What’s more, good video generation seems to be right around the corner. Stability AI released SDVideo this month, their model and weights for generating short 1–2 second clips. The video at the top of the page, an oceanside mountain framed by clouds, was first generated in DALL·E 3 and then animated with SDVideo. Of a number of samples I tried, this was the best one. It’s not quite there yet, but it’s getting close.
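If you want to try the same image-to-video step yourself, here's a rough sketch using the diffusers integration; the pipeline class, model ID, and parameters are my assumptions about how the release is packaged, and the input is whatever still image you have on disk.

```python
# A sketch of animating a still image with Stable Video Diffusion via diffusers.
# Pipeline class, model ID, and parameters are assumptions; a GPU with
# float16 support is assumed as well.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# The conditioning image can be any still, e.g. a DALL·E 3 render saved locally.
image = load_image("dalle3_mountain.png").resize((1024, 576))

frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "mountain.mp4", fps=7)
```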
Skeptics may argue that LLMs are neither new nor special, and that the hype around them is just that. But the bottom line is that computers are getting new features and abilities that were far out of reach even four years ago. There is something happening here.
Vite - Vite has been a refreshing “just does the thing” type of tool. It’s fast, versatile, and now seems to be the de facto starting point for new web projects. Remix is considering a move to a Vite plugin; Astro and SvelteKit are already there.