Offline-first is not a feature: building mobile software that survives the field
A logistics dispatcher in Sulawesi sends a driver to an inland depot at the edge of a 4G cell. The driver pulls in, walks 12 metres into the warehouse, and the signal drops. The app freezes mid-form. Forty minutes later the driver leaves, the form is gone, and the dispatch log shows the consignment as undelivered.
This is a routine event in field operations across Indonesia, Vietnam, the Philippines, and large parts of mainland Southeast Asia. It happens at oil and gas sites in East Kalimantan, at mining sites in Sumbawa, at agricultural cooperatives 90 kilometres from the nearest tower, and at construction sites three floors underground. Connectivity is intermittent, latency is high, and the network drops mid-transaction more often than it gracefully degrades.
Enterprise mobile apps built for the office and patched for the field do not survive this environment. They were designed on the assumption that a network connection exists and stays up. When the assumption breaks, the system breaks - silently, partially, and in ways that surface as data quality problems three months later.
Offline-first is not a feature you add to such a system. It is an architectural decision you make at design time, or you don't.
What "offline-first" actually means
The term gets used loosely to mean "the app caches some data so you can read it without a connection." That is read-side offline. It is the easy half. Most enterprise apps already do this in some form.
Real offline-first means the app can do meaningful write work without a connection - create records, update them, queue actions, capture signatures, take photos, process inventory movements - and reconcile all of it correctly when a connection returns. This is the hard half, and it forces four architectural decisions that most "office apps with offline mode" never make.
1. Identity generation has to be client-side.
The classic pattern - server generates the primary key, client receives it - assumes synchronous network access. In offline-first, the client has to generate IDs that will not collide when synced. UUIDs (UUIDv7 if you want time-ordering) or ULIDs work. Sequential numeric IDs do not. This sounds trivial; we have seen 8-month migrations triggered by getting it wrong.
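To make the idea concrete, here is a minimal sketch of a UUIDv7-style generator that runs entirely on the device. The function name `new_record_id` is ours, not from any library, and the sketch skips the monotonic counter some implementations add, so IDs minted within the same millisecond are unique but not ordered relative to each other. It also trusts the device clock, which in the field may be wrong.

```python
import os
import time
import uuid
from typing import Optional


def new_record_id(now_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style ID on the device: 48-bit ms timestamp plus random bits."""
    ts_ms = time.time_ns() // 1_000_000 if now_ms is None else now_ms
    ts_ms &= (1 << 48) - 1                                   # timestamp field is 48 bits
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF   # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    # Layout: timestamp | version 7 | rand_a | variant 0b10 | rand_b
    value = (ts_ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)


# No server round trip is needed to mint either ID.
form_id = new_record_id()
photo_id = new_record_id()
```

Because the timestamp sits in the most significant bits, IDs created in different milliseconds sort in creation order, which keeps index inserts on the server roughly append-only.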
2. State has to be a log, not a snapshot.
When two clients edit the same record offline and reconnect, you cannot just merge their snapshots - the order of operations matters. The data model has to capture intent as a sequence of operations ("driver A signed at 14:32", "driver A added note at 14:34", "driver A updated status at 14:36") that can be replayed against the server state and other clients' operations.
The mature pattern here is event sourcing - storing every state change as an immutable event, and computing current state by replaying events. For domains with concurrent edits (think dispatch boards, shared inventory), this is not optional. For domains with single-user edits (a driver completing their own form), a simpler "operation queue with timestamps" pattern works.
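For the single-user case, the "operation queue with timestamps" pattern might look like the sketch below. The `Operation` fields, the three operation kinds, and the sample values are illustrative assumptions, not a prescribed schema. The point is that state is computed by replaying operations in order, and that replay tolerates duplicates and out-of-order delivery.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class Operation:
    op_id: str          # client-generated, unique per operation
    recorded_at: str    # device time, ISO 8601 UTC so strings sort chronologically
    actor: str
    kind: str           # "sign" | "add_note" | "set_status" (hypothetical kinds)
    value: str


def replay(ops: Iterable[Operation]) -> Dict[str, Any]:
    """Fold operations into current state; safe to call with duplicates."""
    state: Dict[str, Any] = {"signed_by": None, "notes": [], "status": "in_transit"}
    seen = set()
    for op in sorted(ops, key=lambda o: (o.recorded_at, o.op_id)):
        if op.op_id in seen:        # a retried sync may deliver an op twice
            continue
        seen.add(op.op_id)
        if op.kind == "sign":
            state["signed_by"] = op.value
        elif op.kind == "add_note":
            state["notes"].append(f"{op.actor}: {op.value}")
        elif op.kind == "set_status":
            state["status"] = op.value
    return state


# Made-up sample queue: arrives out of order, with one duplicate.
queue = [
    Operation("op-3", "2024-05-01T14:36:00Z", "driver A", "set_status", "delivered"),
    Operation("op-1", "2024-05-01T14:32:00Z", "driver A", "sign", "J. Santoso"),
    Operation("op-2", "2024-05-01T14:34:00Z", "driver A", "add_note", "Left at dock 4"),
    Operation("op-2", "2024-05-01T14:34:00Z", "driver A", "add_note", "Left at dock 4"),
]
print(replay(queue))
```

Sorting by device timestamp is only safe when one device owns the record. Once several devices write to the same entity, clock skew makes timestamps unreliable as the sole ordering key, which is where full event sourcing and domain-level conflict rules come in.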
3. Conflict resolution has to be defined by the domain.
If two warehouse staff scan the same item out of inventory offline, you have a conflict. The right resolution depends on the domain:
- Inventory deductions. Apply both, accept the resulting negative if it occurs, flag for reconciliation. "Last write wins" is the wrong answer here - it loses one of the deductions.
- Customer signatures. Most recent timestamp wins, but the prior signature is archived. You may need both for dispute resolution.
- Status updates. Workflow-aware - "delivered" overrides "in transit", but "returned" overrides both regardless of timestamp.
- Free-text notes. Append both with attribution. Never merge or pick one.
A system that uses one conflict resolution rule for all data is wrong. The resolution logic has to be modeled per entity type, and changing it later requires reprocessing history.
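The four rules above can be sketched as one resolver per entity type. Everything here is an assumption made for illustration: the `Edit` shape, the function names, and in particular the `STATUS_RANK` ordering, which stands in for whatever your real workflow defines.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed workflow precedence: higher rank overrides lower, whatever the timestamp.
STATUS_RANK = {"in_transit": 0, "delivered": 1, "returned": 2}


@dataclass(frozen=True)
class Edit:
    actor: str
    recorded_at: str   # ISO 8601 UTC, so string order is time order
    value: object


def resolve_inventory(on_hand: int, deductions: List[Edit]) -> Tuple[int, bool]:
    """Apply every deduction; flag a negative result instead of dropping one."""
    remaining = on_hand - sum(int(e.value) for e in deductions)
    return remaining, remaining < 0


def resolve_signature(edits: List[Edit]) -> Tuple[Edit, List[Edit]]:
    """Most recent signature wins; earlier ones are archived, not discarded."""
    ordered = sorted(edits, key=lambda e: e.recorded_at)
    return ordered[-1], ordered[:-1]


def resolve_status(edits: List[Edit]) -> str:
    """Workflow rank decides first; timestamp only breaks ties within a rank."""
    winner = max(edits, key=lambda e: (STATUS_RANK[str(e.value)], e.recorded_at))
    return str(winner.value)


def resolve_notes(edits: List[Edit]) -> List[str]:
    """Keep every note, in time order, with attribution."""
    ordered = sorted(edits, key=lambda e: e.recorded_at)
    return [f"{e.actor}: {e.value}" for e in ordered]
```

Note that the four resolvers do not share a return type. That is deliberate in this sketch: forcing them behind one generic "merge" interface is usually the first step toward a single rule for all data.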
4. Sync has to assume hostile networks.
Field networks do not just go down - they degrade in specific ways the app has to handle:
- Partial connectivity. TCP connection establishes but stalls partway. The app must time out, retry with backoff, and not corrupt local state on retry.
- Resumable uploads. A 2 MB photo upload that drops at 80% complete must resume from 80%, not restart. Most file upload libraries do not do this by default.
- Asymmetric paths. The app can reach the API but cannot reach the file storage CDN. Sync logic has to handle each dependency independently.
- Stale token recovery. A field device might be offline for two weeks, and its auth token will have expired in the meantime. The sync handshake has to refresh tokens without losing queued work.
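As an illustration of the first two points, here is a minimal sketch of a resumable upload loop with exponential backoff and jitter. The transport is abstracted behind two callables, `get_offset` and `send_chunk`, which are hypothetical stand-ins for whatever your upload service exposes; `TransientNetworkError` is likewise our own placeholder. The sketch asks the server for its byte count before every chunk, which costs a round trip but means a stalled or half-written chunk never corrupts the upload.

```python
import random
import time
from typing import Callable


class TransientNetworkError(Exception):
    """Raised by the transport on a timeout or dropped connection (hypothetical)."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: a random delay up to min(cap, base * 2^attempt)."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def upload_resumable(
    data: bytes,
    get_offset: Callable[[], int],             # asks the server how many bytes it holds
    send_chunk: Callable[[int, bytes], None],  # sends bytes starting at an offset
    chunk_size: int = 256 * 1024,
    max_retries: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Upload data in chunks, resuming from the server's offset after each failure."""
    failures = 0
    while True:
        try:
            offset = get_offset()   # trust the server's count, not our own
            if offset >= len(data):
                return offset
            send_chunk(offset, data[offset:offset + chunk_size])
            failures = 0            # progress made, reset the backoff
        except TransientNetworkError:
            failures += 1
            if failures > max_retries:
                raise               # caller keeps the item queued for the next sync
            sleep(backoff_delay(failures - 1))
```

Token refresh and per-dependency reachability checks would sit around this loop, not inside it; they are left out to keep the sketch short.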
What gets harder when you decide late
The cost curve here is brutal. The four decisions above are inexpensive at the start of a project. They are expensive in the middle. They are catastrophic at the end.
A specific example: a client retrofitted offline support into a year-old field app. Three things broke:
- The existing data model used server-generated IDs. Every foreign key reference had to be re-keyed during the migration, which broke a year of audit logs and required regulator approval to discard.
- The existing API used PATCH operations on snapshots. The mobile app's queued operations had to be replayed as PATCHes against a state that had moved on - which produced silent data loss when two operations contradicted each other.
- The existing photo upload system used a single multipart POST. The retrofit had to introduce a separate resumable upload service, with its own auth, its own access control, and its own monitoring.
The retrofit cost roughly as much as rebuilding the field component from scratch. The team had to do both - run the retrofit and the rebuild in parallel - because the field operations could not pause.
The decision to build offline-first costs perhaps 20% more in week one of a project. The decision to add it in month twelve costs 100% to 200% more, and creates a window of mixed-mode operation that is the worst of both worlds.
When offline-first is genuinely not needed
To be honest about the trade-off: most enterprise apps do not need this. An office productivity tool used by people sitting at desks with fibre connections does not need event-sourcing and per-entity conflict resolution. The complexity is real, and pretending otherwise is the same mistake in the opposite direction.
The test is environmental, not industry-based:
- Are users frequently in places with no signal or unreliable signal?
- Is meaningful work being created or modified during those periods?
- Is the cost of losing or corrupting that work high enough to justify the architecture?
Yes to all three - you need offline-first. Otherwise, a simple read-cache and a "you are offline" banner are sufficient.
A short diagnostic
Pull your last mobile field app proposal or architecture document. Ask:
- Are primary keys generated client-side?
- Is there a per-entity conflict resolution scheme documented?
- Does the upload system support resumable transfers?
- Does the sync protocol survive a two-week offline gap?
Four yeses - you are looking at offline-first architecture. Three or fewer - you are looking at an office app with an offline patch. They look the same on a screenshot and behave very differently 90 kilometres from the nearest tower.