Three months ago, a company called Stability AI released Stable Diffusion. Open source. Free to download. Runs on a decent graphics card. You type a description of an image and it creates one. Not a fuzzy approximation. A detailed, sometimes photorealistic image. I've been using it since August and I'm still working through what this means for the way we make things.
What Happened
Let me set the timeline, because it matters.
In January 2021, OpenAI published a paper about DALL-E, a system that generated images from text descriptions. It was impressive but not publicly available. A research curiosity.
In April 2022, OpenAI released DALL-E 2. Better quality, still limited access. You could join a waitlist. The images were good - sometimes striking - but the access constraints meant it was a conversation piece, not a tool.
In July 2022, Midjourney launched a public beta. Image generation via Discord. The quality was remarkable, especially for artistic and stylised images. People started using it for actual creative work.
Then in August 2022, Stability AI released Stable Diffusion. Open source. No waitlist. No API costs. Download the model, run it on your own hardware. The quality was comparable to DALL-E 2 and Midjourney, and the open source nature meant anyone could modify it, fine-tune it, or build on top of it.
That last part is what changed everything.
Why Open Source Matters Here
When DALL-E 2 was behind a waitlist, it was interesting. When Midjourney was on Discord, it was fun. When Stable Diffusion was open source and running locally, it became infrastructure.
Within weeks, people had:
- Built web interfaces for it
- Fine-tuned it on specific artistic styles
- Integrated it into image editing tools
- Created plugins for Photoshop and Blender
- Trained it on their own faces (this raised immediate ethical questions)
- Generated thousands of variations to find the perfect one
The barrier went from "apply and wait" to "download and run." And the community moved faster than any company could.
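To give a sense of what "download and run" means in practice, here is a minimal sketch using Hugging Face's diffusers library, one common way to run the released weights on your own GPU. The library, model ID, and parameters are my choices for illustration, not something the release mandates.

```python
# Minimal local text-to-image generation with Stable Diffusion,
# using Hugging Face's diffusers library (pip install diffusers transformers torch).
# Assumes a CUDA-capable GPU; fp16 weights need roughly 8 GB of VRAM.
import torch
from diffusers import StableDiffusionPipeline

# Load the publicly released v1.4 weights (you may need to accept the model
# licence on the Hugging Face hub before downloading).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "concept art of a minimalist home office, soft morning light, watercolour style"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("office_concept.png")
```

That's the whole loop: a prompt in, an image out, no waitlist and no API bill. Everything the community built in those first weeks is some elaboration of this snippet.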
[Figure: image generation accessibility timeline, 2021-2022. Source: public release dates, 2021-2022.]
[Callout: 33,000+ GitHub stars for Stable Diffusion within two months of release. Source: GitHub, October 2022.]
What It Can Do
I've been testing Stable Diffusion for three months now. Here's an honest assessment.
What it does well:
- Concept art and mood boards. You can generate dozens of visual directions in minutes (there's a sketch of this workflow after the limitations list below). For early-stage creative exploration, this is genuinely transformative.
- Stylised illustrations. Abstract, painterly, comic-style - the non-photorealistic output is often remarkable.
- Background and texture generation. Seamless patterns, environmental backdrops, abstract textures.
- Visual prototyping. "Show me a mobile app screen for a fitness tracker, clean minimalist style" produces surprisingly useful starting points.
What it struggles with:
- Hands. It cannot draw hands. Five fingers per hand seems beyond its grasp (sorry). This sounds trivial but it limits photorealistic use cases significantly.
- Text in images. Any text it generates is garbled nonsense. Logos, signage, labels - all wrong.
- Consistency. Generating multiple images of the same character or scene in a consistent style is extremely difficult. Each generation is essentially independent.
- Accuracy. It generates plausible images, not accurate images. If you need something specific, you'll spend a lot of time prompting and filtering.
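To make the concept-exploration point concrete, here's a rough sketch of the kind of loop I've been running. It assumes the same diffusers pipeline shown earlier; the brief, the style list, and the counts are illustrative, not a recommended recipe.

```python
# Batch concept exploration: generate many variations of one brief, then curate.
# Sketch only - prompts, styles, and counts are placeholders for a real brief.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

brief = "mood board image for a fitness tracker app, clean minimalist style"
styles = ["flat illustration", "soft 3D render", "editorial photography", "line art"]

for style in styles:
    for seed in range(10):
        # Fixing the seed makes an individual result reproducible, so you can
        # return to a direction you liked and iterate on the prompt.
        generator = torch.Generator(device="cuda").manual_seed(seed)
        image = pipe(f"{brief}, {style}", generator=generator).images[0]
        image.save(f"explore_{style.replace(' ', '_')}_{seed:02d}.png")
```

The value isn't any single output; it's that forty rough directions cost an hour of GPU time instead of a day of manual work, and the recorded seeds let you revisit the ones worth refining.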
What It Means for Design
This is the question our team has been discussing since August. Here's where we've landed, knowing that our position will probably evolve.
Concept exploration just got 10x faster. Instead of spending a day creating three mood board directions, you can generate fifty in an hour and curate the best. The creative exploration phase is genuinely accelerated.
Stock photography is under threat. Why pay for a generic office photo when you can generate one? The economics of stock imagery change fundamentally when generation is free. This won't happen overnight, but the direction is clear.
Original creative work is more valuable, not less. When anyone can generate a passable image, the value shifts to curation, direction, and original creative vision. The person who knows what to ask for and how to refine it is more valuable than the person who can execute a straightforward brief.
The ethical questions are enormous. Stable Diffusion was trained on billions of images scraped from the internet. Many of those images were created by artists who didn't consent to their work being used as training data. The legal and ethical frameworks haven't caught up. Some artists are furious, and they have a point.
"The last time something this accessible moved this fast was the early web. As makers, we need to get our hands on it, not just watch from the sidelines."
- Rainui Teihotua, Chief Creative Officer
What It Means for Enterprise
For enterprise specifically, the near-term applications are limited but worth watching.
Marketing and content. Generating images for internal presentations, training materials, and concept documents. Low stakes, high volume - a natural fit.
Product visualisation. Generating product mockups and environmental renders faster than traditional 3D workflows. Useful for rapid prototyping, not for final assets.
Training data generation. This one is speculative but interesting. If you need training data for a computer vision model, generating synthetic images might be cheaper and faster than collecting real ones. The quality needs to be good enough for the model to learn from, which depends heavily on the domain.
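As a rough sketch of what that might look like, again assuming the diffusers pipeline from earlier: the class labels and prompt templates here are hypothetical, and whether the resulting images are good enough to train on is exactly the open question.

```python
# Sketch: generating labelled synthetic images for a toy image classifier.
# Class names and prompts are hypothetical; the usefulness of the dataset
# depends entirely on how well the generated images match the real domain.
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

classes = {
    "hard_hat": "worker wearing a hard hat on a construction site, photo",
    "no_hard_hat": "worker without a hard hat on a construction site, photo",
}

for label, prompt in classes.items():
    os.makedirs(f"synthetic/{label}", exist_ok=True)
    for i in range(50):
        generator = torch.Generator(device="cuda").manual_seed(i)
        image = pipe(prompt, generator=generator).images[0]
        image.save(f"synthetic/{label}/{i:03d}.png")
```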
What it's not ready for: anything customer-facing where accuracy matters. The hallucination problem - generating plausible but incorrect images - mirrors the hallucination problem in text-based AI. In enterprise, "plausible but wrong" is not acceptable.
What We're Doing
We're experimenting. Everyone on the team has access and we're exploring how it fits into our existing workflows. I've been testing it for design exploration, concept work, and mood boards. John is looking at whether it's useful for documentation illustrations. We're all trying to understand where this technology will be in twelve to eighteen months.
We're not building products on it yet. The technology is moving too fast for production commitments. What's state of the art in November will be outdated by March. But we need to understand it deeply so that when the right application shows up, we're ready to use it well.
This feels like one of those moments. Not a gimmick. Not a fad. Something that's going to change how visual content gets made. As someone who makes things for a living, I want to understand it properly before it understands me.
