
What ChatGPT Means for Enterprise Software

Two weeks with ChatGPT. A more analytical look at what large language models mean for the kind of work we do. More questions than answers.
15 December 2022·8 min read
Mak Khan
Chief AI Officer
Isaac Rolfe
Managing Director
Two weeks ago I wrote about my first reaction to ChatGPT. That piece was personal and raw. This one is an attempt to be more systematic. Mak and I have been testing ChatGPT against actual work tasks, documenting what it does well and where it falls down. We have observations. We don't have conclusions.

What You Need to Know

  • ChatGPT is genuinely useful for first-draft content, code scaffolding, data transformation scripts, and explaining technical concepts to non-technical audiences
  • It is unreliable for anything requiring accuracy, current information, or domain-specific precision. The "confidently wrong" problem is real and significant
  • For enterprise software, the implications are unclear but potentially substantial. Document processing, code assistance, and customer communication are the most obvious near-term applications
  • We have more questions than answers. This piece is honest about the uncertainty

What We Tested

We spent two weeks using ChatGPT for actual work tasks, not contrived demonstrations. Real projects, real deliverables, real time pressure. Here's what we found.

Content drafting

We used it to draft project scopes, client communications, technical documentation, and internal reports. Results: consistently useful as a starting point. The drafts needed editing, sometimes significant editing, but the time from blank page to working draft fell by roughly 40-60%.
The quality varied by task. Straightforward communications were almost ready to send. Technical documentation needed heavy revision because it would confidently include plausible but incorrect details. Project scopes were structurally good but missed the nuances that come from actually knowing the client.

Code assistance

We tested it against coding tasks ranging from simple utility functions to complex integration logic. Small, well-defined functions: excellent. It produced working code faster than writing it from scratch. Data transformation scripts: good, with occasional errors in edge case handling. Complex business logic: poor. It generated code that looked correct but made assumptions about the domain that were wrong.
1M users within five days of launch, making ChatGPT the fastest-growing consumer application in history at that point (source: OpenAI, Reuters, December 2022)
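To make "small, well-defined function" concrete, the tasks where it performed best looked something like this date-normalisation utility. This is our own illustrative sketch, not ChatGPT output; the edge cases flagged in the comments (empty input, unrecognised formats) are exactly the kind generated code sometimes mishandled in our testing.

```python
from datetime import datetime
from typing import Optional

def normalise_date(raw: str) -> Optional[str]:
    """Normalise a date string to ISO 8601 (YYYY-MM-DD).

    Returns None for values it cannot parse. Empty strings and
    unknown formats are the edge cases that generated code
    occasionally got wrong in our testing.
    """
    if not raw or not raw.strip():
        return None
    accepted_formats = ["%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d", "%d %B %Y"]
    for fmt in accepted_formats:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next accepted format
    return None
```

The spec fits in a docstring and the edge cases can be enumerated up front. That's the profile of a task where the tool shines.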

Data analysis

We gave it data sets and asked for analysis. It was surprisingly good at identifying patterns and generating summary statistics. It was unreliable at interpreting those patterns in a business context. The analysis was technically correct but the conclusions were generic. A human analyst would have asked follow-up questions. ChatGPT gave answers.

Technical explanation

This was the strongest use case. Explaining complex technical concepts to non-technical stakeholders is something we do constantly. ChatGPT produced clear, accurate explanations at the right level of detail, faster than we could write them. We've already used several of these in actual client communications (after review).

The Accuracy Problem

The single biggest limitation for enterprise use is accuracy. ChatGPT doesn't know what it doesn't know. It generates plausible text that may or may not be correct, and it presents both with equal confidence.
I asked it to explain the architecture of a specific NZ government system we've worked with. It produced a confident, detailed description that was largely invented. That's the problem in one example.
Mak Khan
Chief AI Officer
For enterprise applications, this is a fundamental issue. Business decisions based on incorrect information are expensive. Customer communications that contain errors damage trust. Code that looks right but handles edge cases incorrectly creates production bugs.
The current model requires a knowledgeable human reviewing everything the AI produces. That limits the efficiency gain. You save time on generation but spend time on verification. For some tasks, the net benefit is still positive. For others, it's faster to just do it yourself.

What This Might Mean for Enterprise Software

We've been discussing this internally and with a few trusted clients. Here's our current thinking, with the caveat that "current thinking" in December 2022 has a high probability of being wrong.

Document processing

Enterprises process vast amounts of text: contracts, applications, reports, correspondence. Much of this processing is currently manual: read, extract key information, make a decision, file. Large language models could potentially automate the extraction and summarisation steps, routing only exceptions and complex cases to humans.
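A minimal sketch of that routing pattern, under heavy assumptions: the `extract` function here is a stand-in for a language-model extraction call (stubbed with a trivial keyword rule so the sketch runs on its own), and the confidence threshold is invented for illustration. The shape is the point: automate the confident cases, queue the rest for a person.

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    fields: dict       # key information pulled from the document
    confidence: float  # 0.0-1.0, how sure the extractor is

def extract(text: str) -> Extraction:
    # Stand-in for a model-based extraction step. A trivial rule
    # keeps the sketch self-contained; no external service needed.
    fields = {}
    confidence = 0.0
    if "invoice" in text.lower():
        fields["type"] = "invoice"
        confidence = 0.9
    return Extraction(fields, confidence)

def route(text: str, threshold: float = 0.8) -> str:
    """Auto-process high-confidence extractions; queue the rest."""
    result = extract(text)
    return "auto" if result.confidence >= threshold else "human_review"
```

Everything interesting lives in where you set the threshold, and in what happens to the documents that land in the human queue.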
This is the application we're most excited about. Not because the technology is ready today, but because the need is real and the capability gap is closing.

Code assistance

Developers who use AI-assisted coding tools will be faster than those who don't. GitHub Copilot is already demonstrating this. ChatGPT adds a conversational dimension: you can describe what you want in natural language and get code that's often close to correct.
This won't replace developers. The hard parts of software development (understanding requirements, designing architecture, handling edge cases, debugging production issues) aren't the parts that AI assists with. But the mechanical parts (writing boilerplate, implementing well-known patterns, translating between formats) could get significantly faster.
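"Translating between formats" means work like the following, flattening JSON records into CSV. A few lines of standard-library glue, nothing domain-specific; the function and field names are invented for illustration. This is the category a model can produce near-correct on the first attempt.

```python
import csv
import io
import json

def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    if not records:
        return ""
    buffer = io.StringIO()
    # Use the first record's keys as the column headers
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```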

Customer communication

AI-assisted drafting for customer communications, support responses, and documentation could reduce time-to-response and improve consistency. That's with human review before sending; without that review, the accuracy problem makes this dangerous.

The Questions We're Sitting With

These are genuinely open questions. We don't have positions on them yet.
How fast will this improve? The jump from GPT-3 to ChatGPT was substantial. If the next iteration is a similar jump, the limitations we're noting today might not exist in a year. Or improvement might plateau. We don't know.
What does the cost model look like? Running these models is expensive. OpenAI is presumably subsidising ChatGPT currently. Enterprise-grade AI tools will need a pricing model that makes economic sense for the use cases they enable. That's not straightforward.
What about data privacy? Using a third-party AI model means sending your data to that model. For enterprises with data governance requirements, particularly in regulated industries, this is a non-trivial concern. Where does the data go? How is it stored? Can it influence the model's outputs for other users?
Does this change what we build? If AI can handle document processing, do we build different kinds of systems? If code assistance accelerates development, do project timelines compress? If customer communication gets easier, do we invest differently in support tools?
I keep coming back to the same thought: we've spent a decade building enterprise systems that process information through structured workflows. A system you can simply talk to is a fundamentally different kind of system.
Isaac Rolfe
Managing Director

Where We Go from Here

We're going to keep experimenting. Not committing to anything, not building products around this yet, just developing a practical understanding of what the technology can and can't do. The hype will be enormous over the next six months. The useful signal will be in the specifics: which tasks, which domains, which conditions produce reliable results.
We'll share what we learn. Right now, we're trying to stay curious without being credulous, and cautious without being dismissive. That balance is harder than it sounds.