The default assumption in enterprise AI is cloud-first. Send data to the cloud, process it with a large model, return the result. For most use cases, this is fine. For a growing number, it is not. When latency matters, connectivity is unreliable, or data cannot leave the premises, edge AI is not a nice-to-have. It is a requirement.
What Edge AI Means
Edge AI means running AI models on devices or servers at the point of use rather than in a centralised cloud. The "edge" can be a factory floor server, a retail store device, a mobile phone, a vehicle, or any location where data is generated and decisions need to happen locally.
The distinction from cloud AI is practical, not philosophical:
- Latency. Cloud AI involves a round trip: send data, wait for processing, receive results. Edge AI processes locally. For real-time applications (quality inspection on a production line, safety monitoring, interactive interfaces), the round-trip latency of cloud AI is too slow.
- Connectivity. Cloud AI requires reliable internet. Edge AI works offline. For remote sites, mobile applications, and environments with intermittent connectivity, edge AI is the only option.
- Data sovereignty. Cloud AI sends data to external servers. Edge AI keeps data on premises. For sensitive data (medical records, classified information, proprietary manufacturing processes), edge processing may be a compliance requirement.
- Cost at scale. Cloud AI costs scale linearly with usage. Edge AI costs are primarily upfront (hardware) with lower ongoing costs. At high volumes, edge can be significantly cheaper per inference.
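The cost comparison above can be made concrete with a back-of-envelope break-even calculation. All figures below are illustrative assumptions, not vendor pricing:

```python
# Illustrative break-even point: cloud per-inference cost vs. edge upfront cost.
# Every price here is a hypothetical assumption, not a quote.
CLOUD_COST_PER_INFERENCE = 0.002   # USD per inference, assumed API price
EDGE_HARDWARE_COST = 2400.0        # USD, assumed one-off GPU server cost
EDGE_COST_PER_INFERENCE = 0.0001   # USD, assumed power and maintenance

def breakeven_inferences() -> float:
    """Volume at which total edge cost drops below total cloud cost."""
    per_inference_saving = CLOUD_COST_PER_INFERENCE - EDGE_COST_PER_INFERENCE
    return EDGE_HARDWARE_COST / per_inference_saving

print(f"Edge becomes cheaper after ~{breakeven_inferences():,.0f} inferences")
```

Under these assumed numbers the crossover sits around 1.3 million inferences; a production line inspecting five items a second reaches that volume in a few days, while a task run a few times a day never does. That asymmetry is the whole economic argument.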
75% of enterprise data will be created and processed at the edge by 2028, up from 10% in 2021. (Source: Gartner, Edge Computing Predictions, 2025)
Where Edge AI Makes Sense
Manufacturing Quality Inspection
Visual inspection on production lines requires real-time inference. A product passes a camera every 200 milliseconds. The AI model must classify it as pass or fail before the next product arrives. Cloud inference, even at 500ms round trip, is too slow and too dependent on network reliability.
Edge-deployed vision models running on local GPUs handle this reliably. The models are smaller than cloud-based alternatives but purpose-trained for the specific inspection task.
Retail Operations
Inventory counting, shelf compliance checking, and customer flow analysis in retail stores. These applications involve continuous video processing that would be prohibitively expensive to stream to the cloud and unnecessarily risky from a privacy perspective.
Edge processing keeps the video data local, processes it on-site, and sends only the analytical results (counts, compliance scores, flow patterns) to the cloud. Privacy is maintained. Bandwidth costs are minimal.
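A sketch of this pattern, with hypothetical detection and report structures (the field names are assumptions, not a prescribed schema): frames are processed on-site, and only a small aggregate payload leaves the store.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ShelfReport:
    """Aggregate result sent to the cloud; raw video never leaves the store."""
    store_id: str
    items_counted: int
    compliance_score: float  # 0.0-1.0, fraction of shelf positions compliant

def summarise_detections(store_id: str, detections: list[dict]) -> str:
    """Reduce locally processed per-frame detections to a small JSON payload."""
    items = sum(d["count"] for d in detections)
    compliant = sum(1 for d in detections if d["in_position"])
    score = compliant / len(detections) if detections else 0.0
    return json.dumps(asdict(ShelfReport(store_id, items, round(score, 2))))

# A few kilobytes of JSON replaces gigabytes of streamed video.
payload = summarise_detections("store-042", [
    {"count": 12, "in_position": True},
    {"count": 8, "in_position": False},
])
```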
Field Operations
Agriculture, construction, mining, forestry. Operations in locations with limited connectivity need AI that works offline. Crop health assessment from drone imagery. Equipment condition monitoring. Safety compliance checking. These applications cannot depend on cloud connectivity.
Healthcare at the Point of Care
Diagnostic support at the point of care, particularly in remote or resource-constrained settings. Medical imaging analysis that runs on the imaging device itself. Patient monitoring that processes data locally and only alerts when something needs attention.
For NZ and Pacific contexts specifically, edge AI for healthcare is relevant for rural communities, remote islands, and situations where connectivity is intermittent.
The Technical Landscape
Model Optimisation
Large language models and large vision models do not run on edge hardware without significant optimisation. Three techniques make them fit:
Quantisation. Reducing model precision from 32-bit to 8-bit or 4-bit. This reduces model size by 4-8x with modest quality reduction. For many enterprise tasks, the quality tradeoff is acceptable.
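A minimal sketch of symmetric 8-bit quantisation, using plain NumPy rather than any particular inference framework. Each float32 weight is mapped to an int8 value plus a shared scale factor:

```python
import numpy as np

def quantise_int8(weights: np.ndarray):
    """Symmetric quantisation: float32 weights -> int8 values plus one scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantise_int8(w)
error = np.abs(w - dequantise(q, scale)).mean()
# int8 storage is 4x smaller than float32; mean reconstruction error is small
# relative to the weights themselves.
```

Production systems use more sophisticated schemes (per-channel scales, 4-bit group quantisation), but the size/precision tradeoff works on exactly this principle.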
Distillation. Training a smaller "student" model to replicate the behaviour of a larger "teacher" model. The student model is purpose-built for specific tasks and runs efficiently on edge hardware.
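The core of the student-teacher setup is the distillation loss: a KL divergence between temperature-softened teacher and student output distributions, in the style of Hinton-era knowledge distillation. A minimal NumPy sketch:

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax; higher T produces softer distributions."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      T: float = 2.0) -> float:
    """KL divergence from softened teacher to softened student distributions.
    The T*T factor keeps gradient magnitudes comparable across temperatures."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * np.log(p_teacher / p_student), axis=-1)
    return float(kl.mean() * T * T)
```

During training this term is typically mixed with the ordinary task loss, so the student learns both the labels and the teacher's softer judgment between classes.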
Pruning. Removing unnecessary model parameters. Many large models have redundant capacity that can be removed without meaningful performance impact for specific tasks.
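Magnitude pruning, the simplest variant, zeroes the smallest weights. A NumPy sketch (the 50% sparsity figure below is illustrative):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.random.randn(512, 512).astype(np.float32)
p = magnitude_prune(w, 0.5)
# Roughly half the parameters are now zero and can be compressed or skipped.
```

In practice pruning is followed by a short fine-tuning pass so the remaining weights compensate for what was removed.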
Mak has been running benchmarks on quantised models for enterprise use cases. The results are encouraging: for classification, extraction, and structured generation tasks, 4-bit quantised models running on consumer-grade GPUs achieve 90-95% of the quality of full cloud models at a fraction of the cost and latency.
Hardware Options
The edge AI hardware landscape is maturing:
- NVIDIA Jetson. Purpose-built for edge AI. Good performance, reasonable cost, strong software ecosystem.
- Apple Silicon. M-series chips run small to medium AI models efficiently. Relevant for Mac-based enterprise deployments.
- Intel NPUs. Built into recent Intel processors. Limited capability but zero additional hardware cost.
- Commodity GPUs. Consumer GPUs (RTX series) provide substantial AI processing capability at consumer prices.
For most enterprise edge deployments, a commodity GPU in a standard server is sufficient. Purpose-built edge hardware is needed only for demanding or space-constrained applications.
The Hybrid Pattern
Most enterprises will not go fully edge or fully cloud. The pragmatic approach is hybrid: edge for latency-sensitive, offline-critical, or data-sovereign tasks; cloud for complex reasoning, large-context tasks, and capabilities that benefit from the latest models.
The question is never "edge or cloud?" It is "which tasks belong where?" The answer varies by use case, and it changes as edge hardware improves.
John Li, Chief Technology Officer
The hybrid pattern requires an orchestration layer that routes tasks to the appropriate processing location. This adds architectural complexity but provides the flexibility to optimise for cost, latency, and sovereignty simultaneously.
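A toy routing policy illustrates the idea; the task attributes and the latency threshold here are assumptions for illustration, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    max_latency_ms: int        # hard latency budget for this task
    data_sovereign: bool       # data must stay on premises
    needs_large_context: bool  # requires cloud-scale context window

def route(task: Task, cloud_round_trip_ms: int = 400) -> str:
    """Toy routing policy for a hybrid edge/cloud orchestration layer.
    A real deployment would also weigh cost, queue depth, and model freshness."""
    if task.data_sovereign:
        return "edge"    # sovereignty constraints override everything else
    if task.max_latency_ms < cloud_round_trip_ms:
        return "edge"    # a cloud round trip would blow the latency budget
    if task.needs_large_context:
        return "cloud"   # edge hardware cannot hold the context in memory
    return "cloud"       # default: centrally managed, simplest to operate
```

Even this crude policy captures the ordering that matters in practice: sovereignty and latency constraints push tasks to the edge; everything else defaults to the cloud.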
What Does Not Work at the Edge
Not every AI task is a good candidate for edge deployment:
Complex reasoning. Tasks that require large language models with full capability (nuanced analysis, creative generation, multi-step reasoning) still need cloud-scale models. Edge models trade capability for efficiency.
Large context windows. Tasks that require processing large documents or maintaining extensive conversation history need memory that edge hardware typically does not provide.
Rapid model iteration. If you need to update models frequently (weekly or more), edge deployment creates a management burden. Each device needs updating, testing, and validation. Cloud deployments update centrally.
Low-volume tasks. If the task happens a few times a day, cloud processing is simpler and the cost is negligible. Edge AI makes economic sense at volume.
Getting Started
For enterprises considering edge AI:
- Identify latency-critical or offline-critical tasks. These are your strongest edge AI candidates. Everything else can start in the cloud.
- Benchmark edge models against cloud models for your specific tasks. The quality gap varies by task type. For some tasks, it is negligible. For others, it is significant.
- Start with a hybrid architecture. Run edge where it clearly benefits, cloud everywhere else. Avoid the temptation to move everything to the edge at once.
- Plan for model management. Edge AI requires a deployment pipeline for pushing model updates to devices, monitoring performance across devices, and rolling back when updates cause issues.
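One common shape for that pipeline is a canary rollout with automatic rollback: update a fraction of devices first, and revert them if quality regresses. A simplified sketch, with hypothetical thresholds:

```python
from dataclasses import dataclass

@dataclass
class Device:
    device_id: str
    model_version: str
    error_rate: float = 0.0  # observed inference error rate after the update

def rollout(devices: list, new_version: str,
            canary_fraction: float = 0.2,
            max_error_rate: float = 0.05) -> str:
    """Canary rollout sketch. The fraction and error threshold are assumptions;
    a real pipeline would add a soak period, per-device health checks, and audit logs."""
    n_canary = max(1, int(len(devices) * canary_fraction))
    canaries, rest = devices[:n_canary], devices[n_canary:]
    previous = {d.device_id: d.model_version for d in canaries}
    for d in canaries:
        d.model_version = new_version
    # ... monitor the canaries for a soak period, then check their health ...
    if any(d.error_rate > max_error_rate for d in canaries):
        for d in canaries:
            d.model_version = previous[d.device_id]  # roll the canaries back
        return "rolled_back"
    for d in rest:
        d.model_version = new_version
    return "completed"
```

The essential property is that a bad model update touches only the canary fraction of the fleet, and reverting it requires no manual intervention at each site.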
Edge AI is not a trend. It is a pragmatic response to real constraints: latency, connectivity, sovereignty, and cost. The enterprises that understand where edge AI fits, and where it does not, will build more resilient, more capable AI infrastructure than those that default to cloud-only.

