
Image Generation Just Changed (Stable Diffusion First Impressions)

Stable Diffusion launched and open-source image generation is here. Interesting technology, unclear enterprise application.
1 September 2022·5 min read
Mak Khan
Chief AI Officer
Stability AI released Stable Diffusion last week and it's worth talking about. Not because image generation is new (DALL-E 2 has been in limited access since April), but because this one is open source. Anyone can run it. Anyone can modify it. That changes the dynamics entirely.
I've spent the last few days running Stable Diffusion locally and testing its capabilities. The results are uneven but occasionally stunning. Type a description, get an image. Sometimes it's exactly what you described. Sometimes it's surreal in ways that are either creative or broken, depending on your perspective.
The technical achievement is genuine. A model that generates coherent images from text descriptions, running on consumer hardware, available for free. A year ago, this capability existed only in research labs. Now it's on GitHub.
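To give a sense of how little is involved in running it locally, here is a minimal sketch using the Hugging Face `diffusers` wrapper around the released weights. This is an illustrative sketch, not the only way to run the model: it assumes you have a CUDA GPU with enough VRAM, have installed `diffusers` and `torch`, and have accepted the licence for the `CompVis/stable-diffusion-v1-4` weights on the Hugging Face Hub.

```python
def generate(prompt: str, seed: int = 42, out_path: str = "out.png") -> str:
    """Generate one image from a text prompt and save it to out_path.

    Imports are kept inside the function so the heavy dependencies
    (torch, diffusers) are only required when you actually generate.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    # Download (first run) and load the released v1.4 weights.
    # Half precision keeps memory use within reach of consumer GPUs.
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,
    ).to("cuda")

    # Fixing the random seed makes a given prompt reproducible.
    generator = torch.Generator("cuda").manual_seed(seed)

    # guidance_scale trades prompt adherence against image diversity;
    # 7.5 is the commonly used default.
    image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
    image.save(out_path)
    return out_path


if __name__ == "__main__":
    generate("a watercolour of a misty New Zealand harbour at dawn")
```

That's the whole loop: a prompt string in, a PNG out, on hardware you own. The parameter names (`guidance_scale`, `generator`) follow the `diffusers` pipeline API; the original CompVis repository exposes the same controls through a command-line script instead.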

What It's Good At

Landscapes, abstract compositions, and stylised illustrations work surprisingly well. The model has clearly absorbed a vast amount of visual art and can remix it in interesting ways. If you need a concept image, a mood board reference, or a creative starting point, it's genuinely useful.
10 seconds to generate an image on a consumer GPU, compared to minutes or hours for previous open-source approaches. (Source: Stability AI launch benchmarks, August 2022.)
It's also good at things that are hard to articulate. "A watercolour of a misty New Zealand harbour at dawn" produces something that looks like a painting, not a filtered photograph. The model understands style in a way that feels qualitatively different from previous image tools.

What It's Not Good At

Hands. Faces at certain angles. Text within images. Anything requiring precision or accuracy. The model generates plausible images, not accurate ones. For creative and conceptual use, that's fine. For anything that needs to be correct, it's unreliable.
It also has significant ethical dimensions that aren't resolved. The training data includes copyrighted artwork. The model can generate images "in the style of" specific artists without their consent or compensation. The legal and ethical frameworks haven't caught up with the technology.
The technical capability is ahead of the ethical and legal frameworks. That's not unusual for emerging technology, but the gap here is particularly wide because the training data is other people's creative work.

Enterprise Relevance

Honest assessment: limited, for now.
The use cases where image generation could add enterprise value (marketing content, product visualisation, design prototyping) all require a level of control and consistency that the technology doesn't offer yet. You can't reliably generate "our product in a kitchen setting" or "a professional headshot that matches our brand guidelines."
Where I see potential is in the creative workflow. Not replacing designers, but giving them faster iteration on concepts. A designer who can generate twenty visual starting points in ten minutes and then refine the best one has a different workflow than one who starts from scratch each time.
But that's a tooling improvement, not a transformation. The enterprise impact of image generation in its current form is modest.

What's Interesting About Open Source

The open-source nature is what makes this worth watching. DALL-E 2 is controlled by OpenAI. Midjourney is a commercial service. Stable Diffusion is available for anyone to use, modify, and build on.
That means the pace of improvement will be fast. The community is already producing fine-tuned models, specialised for specific styles and use cases. The base model will improve through collective effort in ways that a single company's product can't match.
It also means the technology will find applications that the original creators didn't anticipate. Some of those will be valuable. Some will be problematic. Open-source AI is a new category and we don't have clear precedents for how it plays out.

Where I Stand

Image generation is a genuine technical breakthrough. Stable Diffusion's open-source release accelerates the timeline for widespread adoption. The enterprise applications are limited today but will grow as the technology improves.
For now, it's worth experimenting with, understanding, and keeping an eye on. It's not worth building a business case around.
The pace of progress in generative AI (text, and now images) is accelerating. The NLP improvements we noted earlier this year are part of the same trend. Something is shifting in what AI can do with creative and generative tasks. Exactly where this goes is genuinely unclear.