
Why Every AI Team Needs Someone Who Built a Reporting Framework by Hand

AI teams hire ML engineers and data scientists. They should also hire the person who spent years building M&E systems from spreadsheets and field data.
20 August 2024·9 min read
Louise Epa
AI Analyst & Research Consultant
Isaac Rolfe
Managing Director
We've hired ML engineers. We've hired data scientists. We've hired prompt engineers. But the single hire that changed how we deliver AI projects was someone who'd spent years building monitoring and evaluation frameworks from spreadsheets, field surveys, and sheer determination. That person is Louise. And the skills she brought aren't the ones you'd find on a typical AI job description.

The Argument

  • M&E professionals know how to define success before building anything. They've spent careers designing measurement frameworks for programmes where the goals are complex, the data is messy, and stakeholders disagree on what "good" looks like. AI projects without this discipline fail for exactly the same reasons.
  • They know how to collect reliable data from difficult environments. Not clean API responses from well-structured databases, but field data from community health workers with intermittent connectivity and competing priorities. Data quality is always the real bottleneck.
  • They know how to present findings to people who don't care about methodology. Ministers, board members, community leaders - people who need the answer, not the process.
  • These are exactly the skills AI teams lack and can't easily train for.

The Skillset Nobody's Hiring For

AI job listings are predictable. Python, PyTorch, transformer architectures, experience with large language models. The technical bar is clear. And it's necessary - you need people who can build the systems.
But here's the problem we kept running into: we could build AI systems that worked technically, then struggled to answer basic questions from clients.
"How will we know if this is working?"
"What does success look like in six months?"
"How do you measure whether the AI is actually better than what we had before?"
These aren't technical questions. They're evaluation questions. And most AI teams don't have anyone whose job it is to answer them.
M&E professionals answer these questions for a living. They design evaluation frameworks for government programmes, international development projects, health interventions, and education initiatives. They figure out what to measure, how to measure it, and how to interpret the results for decision-makers who want clarity, not caveats.

Louise's Background

Louise built the national Monitoring & Evaluation framework for the National Health Service in Samoa. Think about what that means for a moment.
She had to design a system that tracked health outcomes across an entire country. Not a company's customer base. A country. She had to collect data from clinics with unreliable power. From community health workers who were also parents, farmers, and church leaders. From ministry officials who needed quarterly reports for donor agencies.
The data wasn't sitting in a database waiting to be queried. It was handwritten in registers, stored in folders in rural clinics, sometimes recorded weeks after the fact. Turning that into a functioning national reporting framework required skills that have nothing to do with technology and everything to do with understanding people, processes, and incentive structures.
When I built the M&E framework in Samoa, the hardest part wasn't designing the indicators or the reporting templates. It was getting buy-in from the people who had to collect the data. If the community health worker doesn't see why they're filling in the form, the form doesn't get filled in. The same thing happens with AI. If the user doesn't understand why the system is asking for their input, they'll find a workaround.
Louise Epa
AI Analyst & Research Consultant

Why This Matters for AI

Three specific areas where M&E skills change how AI projects are delivered.
Defining success upfront. AI projects have a habit of starting with "let's see what the model can do" and then trying to retrofit success criteria after the fact. M&E professionals won't let you do that. They insist on defining what you're measuring and why before the first line of code is written. They've seen too many programmes fail because nobody agreed on what success meant until the final evaluation, when it was too late to change course.
This discipline changes everything about AI delivery. When you know what you're measuring from day one, you build differently. You instrument the system for evaluation. You collect baseline data. You design the data pipeline to capture the metrics that matter, not just the ones that are easy to track.
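To make "defining success upfront" concrete, here is a minimal sketch in Python. Everything in it — the metric name, the baseline and target numbers, the `SuccessCriteria` type — is a hypothetical illustration, not a description of any specific RIVER project; the point is only that the criteria are agreed and written down before model work begins, and evaluation happens against that fixed contract rather than retrofitted numbers.

```python
# Hypothetical sketch: agree the metric, baseline, and target with
# stakeholders first, then evaluate against that pre-agreed contract.
from dataclasses import dataclass


@dataclass
class SuccessCriteria:
    metric_name: str  # what we measure, agreed with stakeholders
    baseline: float   # performance of the current (pre-AI) process
    target: float     # what "success in six months" means


def evaluate(criteria: SuccessCriteria, observed: float) -> str:
    """Report against the pre-agreed criteria, not retrofitted ones."""
    if observed >= criteria.target:
        return "target met"
    if observed > criteria.baseline:
        return "better than baseline, target not yet met"
    return "no improvement over baseline"


# Illustrative numbers: criteria signed off before the first line of model code.
criteria = SuccessCriteria(
    metric_name="referrals correctly triaged",
    baseline=0.72,  # measured from the existing manual process
    target=0.85,    # agreed with the client up front
)
print(evaluate(criteria, 0.78))
```

Because the baseline is measured before the system exists, "better than what we had before" becomes an answerable question rather than a retrospective argument.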
Data quality from messy sources. Enterprise AI projects almost never have clean data. They have CRM records with inconsistent formatting. Legacy databases with undocumented field meanings. Spreadsheets maintained by people who've left the organisation. PDF reports that need to be parsed.
An ML engineer looks at this and sees a data cleaning problem. An M&E professional looks at it and sees a data collection problem. They ask: why is the data messy? Who's entering it? What are their incentives? What would need to change for the data to be reliable at source?
That upstream thinking, fixing the collection process rather than just cleaning the output, prevents the same data quality problems from recurring every quarter.
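The upstream fix can be sketched in a few lines. This is a hypothetical illustration — the field names (`clinic_id`, `patient_age`, `recorded_by`) and the checks are invented for the example — but it shows the shift in approach: instead of cleaning bad records downstream every quarter, reject or flag them at the point of entry, where the person who typed them can still correct them.

```python
# Hypothetical sketch: validate records at the point of entry
# (the "collection" fix) rather than cleaning them downstream.
def validate_at_source(record: dict) -> list[str]:
    """Return the reasons a record would be flagged before it enters the pipeline."""
    problems = []
    if not record.get("clinic_id"):
        problems.append("missing clinic_id")
    age = record.get("patient_age")
    if age is None or not (0 <= age <= 120):
        problems.append("implausible or missing patient_age")
    if not record.get("recorded_by"):
        problems.append("no data-entry attribution, so no one to follow up with")
    return problems


# An implausible age is caught while the person entering it can still fix it.
record = {"clinic_id": "C-014", "patient_age": 230, "recorded_by": "hw-07"}
print(validate_at_source(record))
```

The check itself is trivial; the M&E insight is where it runs, and that each flag names someone who can act on it.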
Communicating findings to non-technical stakeholders. M&E professionals write reports for ministers, donors, board members, and community leaders. They've learned to present complex findings in plain language, to lead with the "so what?" rather than the methodology, and to be honest about limitations without drowning the audience in uncertainty.
AI teams routinely fail at this. They present model accuracy metrics to executives who don't know what F1 scores mean. They lead with technical achievements rather than business impact. They either overstate confidence or bury useful findings in so many caveats that the audience tunes out.
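The translation work is mechanical once you decide to do it. As a hedged illustration (the numbers and wording are invented, not from any client report), here is how the same confusion-matrix counts that produce an F1 score can instead produce a sentence an executive can act on:

```python
# Hypothetical sketch: turn raw confusion-matrix counts into the
# plain-language statement a decision-maker needs, not an F1 score.
def plain_language_summary(tp: int, fp: int, fn: int, tn: int) -> str:
    total = tp + fp + fn + tn
    caught = tp / (tp + fn) if (tp + fn) else 0.0        # recall
    false_alarms = fp / (fp + tp) if (fp + tp) else 0.0  # 1 - precision
    return (
        f"Of {total} cases reviewed, the system caught {caught:.0%} of the "
        f"issues that mattered; {false_alarms:.0%} of its alerts were false alarms."
    )


# Illustrative counts only.
print(plain_language_summary(tp=80, fp=20, fn=20, tn=880))
```

Same underlying arithmetic as precision and recall, but the output leads with the "so what?" and leaves the methodology for the appendix.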

The Compound Effect

When Louise joined RIVER, the effect wasn't just additive. It was multiplicative. Her evaluation thinking changed how we scoped projects. Her stakeholder communication skills changed how we presented to clients. Her data quality instincts changed how we designed data pipelines.
And her background meant she asked questions that the rest of us hadn't thought to ask. "Have we talked to the people who'll be entering data into this system?" "What happens to the evaluation when the client's team changes?" "Are we measuring what matters, or what's measurable?"
These questions sound simple. They're not. They come from years of watching programmes succeed or fail based on whether anyone thought to ask them.

The Hiring Implication

If you're building an AI team, look at your roster. You probably have technical depth covered. You might have product management covered. But do you have someone who's spent years defining, collecting, and reporting on outcomes in complex, messy, human environments?
The skills they'd need to learn - prompt engineering, model integration, basic ML concepts - can be picked up in months. The skills they already have - evaluation design, data collection from difficult environments, stakeholder communication, the patience to define success before building - take years of field experience to develop.
You can't shortcut that with a bootcamp. You can't replace it with a framework. You need someone who's built a reporting system from paper registers in a rural clinic and knows exactly how far the gap is between "data collected" and "data that means something."
If you're building an AI team and want to talk about what the right mix looks like, get in touch.