Wednesday, April 1st, 2026
The ocean covers roughly 71% of the Earth's surface and generates about half of the planet's oxygen. It regulates our climate, feeds billions of people, and supports a web of biodiversity we are still only beginning to catalogue. You would think, given its importance, that we would have a firm computational grip on how marine ecosystems work.
We don't. Not even close.
Modelling the ocean (and specifically the living systems within it) remains one of the hardest problems in environmental science. Harder, in many respects, than modelling the atmosphere. Understanding why that is the case matters for anyone building decision-support tools in this space, and for anyone relying on their outputs.
Weather and climate modelling have a significant head start. Numerical weather prediction dates back to the mid-twentieth century, and decades of sustained investment, driven largely by aviation, agriculture, and defence, produced dense observational networks and well-characterised physics. The atmosphere is, relatively speaking, a tractable fluid. Its major dynamics are governed by equations we understand well: the Navier-Stokes equations, thermodynamic relations, radiative transfer. The models are far from perfect, but the underlying physics is constrained.
Ocean physics, by contrast, operates across a much wider range of spatial and temporal scales. Turbulent eddies tens of kilometres across sit alongside basin-wide thermohaline circulation patterns that unfold over centuries. Resolving all of these processes simultaneously in a single model remains computationally prohibitive. Most ocean general circulation models still rely on parameterisations (mathematical approximations) for sub-grid-scale processes, and those approximations introduce uncertainty at every step.
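To make "parameterisation" concrete, here is a minimal sketch, in Python, of the most common pattern: representing the net effect of unresolved turbulence on a tracer as diffusion with a tuned eddy diffusivity. The numbers (and the language; production circulation models are mostly Fortran) are illustrative only.

```python
import numpy as np

# A minimal sketch of how sub-grid turbulence is parameterised: rather than
# resolving individual turbulent motions, their net effect on a tracer
# (temperature, here) is represented as diffusion with an eddy diffusivity
# kappa. The value of kappa is illustrative, not taken from any real model.

def mix_tracer(tracer, kappa=1e-4, dz=10.0, dt=3600.0):
    """One explicit time step of 1-D vertical eddy mixing.

    tracer : 1-D array of tracer values on a uniform vertical grid
    kappa  : eddy diffusivity in m^2/s (a tuned parameter, not physics)
    dz     : grid spacing in metres
    dt     : time step in seconds
    """
    flux = -kappa * np.diff(tracer) / dz            # turbulent flux between cells
    tendency = -np.diff(flux, prepend=0.0, append=0.0) / dz  # zero-flux boundaries
    return tracer + dt * tendency

# Example: a warm surface layer slowly mixing downward over one day.
temp = np.array([20.0, 18.0, 12.0, 8.0, 6.0, 5.0])
for _ in range(24):
    temp = mix_tracer(temp)
print(temp.round(2))
```

The entire burden of the unresolved physics sits in that single number kappa, which is exactly why tuning it (and choosing its functional form) is a persistent source of uncertainty.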
But physics is the easy part. The real difficulty begins when you add biology.
Atmospheric models deal primarily with gases and energy. Marine ecosystem models must deal with organisms: creatures that grow, reproduce, migrate, compete, adapt, and die in ways that are profoundly context-dependent. A phytoplankton bloom in the North Atlantic behaves differently from one in the Southern Ocean, even when the physics looks similar, because the community composition is different, the grazing pressure is different, and the nutrient stoichiometry is different.
These biological interactions are non-linear. Small changes in one variable (say, a slight shift in water temperature or a modest change in nutrient supply) can trigger disproportionately large responses in population dynamics. Trophic cascades, regime shifts, and sudden collapses are not edge cases in marine ecology; they are recurring features. Traditional linear modelling approaches struggle to capture this behaviour, and even sophisticated non-linear models require parameterisations that carry their own assumptions.
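A toy model makes the point concrete. The harvested logistic equation below is a textbook illustration, not a marine ecosystem model: nudging the harvest rate by less than two percent past a critical threshold flips the outcome from a stable population to total collapse.

```python
# Non-linear response in miniature: logistic growth with constant harvesting,
# dN/dt = r*N*(1 - N/K) - h. The critical harvest rate is r*K/4 = 12.5 here;
# crossing it even slightly changes the qualitative outcome.

def simulate(h, r=0.5, K=100.0, N0=60.0, dt=0.1, steps=2000):
    N = N0
    for _ in range(steps):
        N = max(0.0, N + dt * (r * N * (1 - N / K) - h))
    return N

print(simulate(h=12.4))   # ~54.5: population settles at a stable equilibrium
print(simulate(h=12.6))   # 0.0: a 1.6% increase in harvest causes collapse
```

Real trophic cascades and regime shifts are far messier than this, but the underlying logic is the same: the response is not proportional to the perturbation.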
Then there is the problem of functional diversity. A typical marine biogeochemical model might represent phytoplankton as two or three functional types: perhaps diatoms, small phytoplankton, and diazotrophs. In reality, a single litre of seawater can contain thousands of distinct microbial taxa. Each of those taxa responds differently to light, temperature, and nutrient availability. Aggregating them into a handful of boxes is a necessary simplification, but it is a simplification that obscures real dynamics.
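Here is a hypothetical sketch of what that obscuring looks like. Two imaginary taxa with different thermal optima are lumped into one functional type with the averaged optimum; under the idealised Gaussian responses assumed here, the aggregated box overestimates growth at intermediate temperatures and underestimates it at the extremes, where the real community still has a specialist.

```python
import numpy as np

# What aggregation hides (hypothetical taxa, idealised response curves).
# Two phytoplankton taxa with different thermal optima are lumped into one
# functional type whose optimum is the average. Under warming, the real
# community shifts towards the warm-adapted taxon; the box cannot.

def growth(T, T_opt, width=4.0, mu_max=1.0):
    """Gaussian thermal response curve (a common idealisation)."""
    return mu_max * np.exp(-((T - T_opt) / width) ** 2)

temps = np.array([12.0, 16.0, 20.0])              # cool, intermediate, warm
cold_taxon = growth(temps, T_opt=12.0)
warm_taxon = growth(temps, T_opt=20.0)

community = 0.5 * cold_taxon + 0.5 * warm_taxon   # true mixed community
aggregated = growth(temps, T_opt=16.0)            # single averaged "box"

print("community :", community.round(2))   # productive at both extremes
print("aggregated:", aggregated.round(2))  # peaks mid, collapses at the ends
```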
The observational data problem in oceanography is stark. The Argo programme (the backbone of global ocean observation) maintains roughly 4,000 autonomous floats across the world's oceans. That sounds like a lot until you consider that each float covers, on average, a patch of ocean about 300 kilometres across. Large swathes of the deep ocean, the polar regions, and coastal margins remain chronically under-sampled.
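The arithmetic behind that figure is simple enough to spell out (all numbers rounded):

```python
# The back-of-envelope behind the 300 km figure. All values approximate.
ocean_area_km2 = 361e6        # total ocean surface area, ~361 million km^2
n_floats = 4000               # active Argo floats, roughly

area_per_float = ocean_area_km2 / n_floats   # ~90,000 km^2 each
patch_width = area_per_float ** 0.5          # side of an equivalent square

print(f"{area_per_float:,.0f} km^2 per float, ~{patch_width:.0f} km across")
# -> 90,250 km^2 per float, ~300 km across
```

One profile every 300 kilometres is a remarkable achievement by the standards of ocean observation, and still coarser than many of the features we care about.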
For biological variables, the situation is considerably worse. We have reasonable satellite coverage of surface chlorophyll, but almost nothing in the way of continuous, large-scale observations of zooplankton abundance, microbial community composition, or nutrient cycling at depth. Most biological oceanographic data comes from ship-based surveys — expensive, infrequent, and spatially limited.
The consequence is that marine ecosystem models are often being calibrated and validated against datasets that are sparse, patchy, and biased towards accessible regions and seasons. Model-data mismatch is not just a technical inconvenience; it is a fundamental constraint on what we can credibly claim to know.
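To see why sampling bias bleeds into the score, consider a deliberately artificial sketch: a model evaluated only against a handful of summer observations can look quite different from the same model evaluated against the full seasonal cycle, which in reality we never have.

```python
import numpy as np

# Model-data misfit under biased sampling (all values synthetic).
# The model is scored only where observations exist, so the score reflects
# the sampling pattern as much as the model's actual skill.

rng = np.random.default_rng(0)
days = np.linspace(0, 2 * np.pi, 365)
truth = 5 + 3.0 * np.sin(days)            # the "real" seasonal cycle
model = 5 + 2.5 * np.sin(days - 0.2)      # a model with amplitude/phase bias

# Ship surveys clustered in summer (days 150-240), as coastal data often are.
obs_days = rng.integers(150, 240, size=12)
obs = truth[obs_days] + rng.normal(0, 0.3, size=12)   # sparse, noisy samples

rmse_sampled = np.sqrt(np.mean((model[obs_days] - obs) ** 2))
rmse_true = np.sqrt(np.mean((model - truth) ** 2))

print(f"RMSE against the 12 summer samples:      {rmse_sampled:.2f}")
print(f"RMSE against the full (unknowable) field: {rmse_true:.2f}")
```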
It is difficult to model a system when you do not know all the components. The Census of Marine Life, a decade-long international effort completed in 2010, catalogued around 250,000 known marine species and estimated that at least 750,000 more remain undescribed. In the deep sea and in microbial communities, the proportion of unknown species is almost certainly much higher.
This matters for modelling because unknown species are not merely gaps in a catalogue. They are missing functional roles: organisms that may be fixing nitrogen, recycling carbon, controlling prey populations, or engineering habitats in ways we have not yet accounted for. Every model we build is, to some degree, a model of our ignorance as much as our knowledge.
Given all of these difficulties, the temptation might be to throw up our hands. That would be a serious mistake, for two reasons.
First, imperfect models are still useful. A model that captures the broad dynamics of nutrient cycling in a coastal zone, even if it cannot predict the exact timing of a spring bloom, can still inform fisheries management, pollution control, and marine spatial planning. The goal is not omniscience; it is structured reasoning about complex systems under uncertainty. Models make our assumptions explicit and testable, which is more than can be said for intuition alone.
Second, the tools are improving. Advances in computational capacity allow higher-resolution simulations. New observational platforms (biogeochemical Argo floats, environmental DNA sampling, autonomous underwater vehicles) are beginning to fill some of the data gaps. Machine learning techniques, applied carefully, can help identify patterns in large, noisy datasets and improve parameterisations. Projects like EcoTwin are working to build digital twin frameworks that integrate these advances into practical tools for ecosystem assessment.
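As a rough sketch of the learned-parameterisation idea (with entirely synthetic data, and a simple least-squares regression standing in for the real machine learning), the pattern is: learn a mapping from resolved model variables to an unresolved flux, rather than hand-tuning a fixed formula.

```python
import numpy as np

# The shape of ML-for-parameterisation, on synthetic data. We pretend two
# resolved variables (shear, stratification) control an unresolved flux,
# generate noisy "truth", and fit a data-driven approximation to it.

rng = np.random.default_rng(1)
shear = rng.uniform(0, 1, 500)             # resolved variable 1 (synthetic)
strat = rng.uniform(0, 1, 500)             # resolved variable 2 (synthetic)
flux = 0.3 * shear**2 / (strat + 0.1) + rng.normal(0, 0.02, 500)

# Fit a simple polynomial-feature regression by least squares.
X = np.column_stack([np.ones(500), shear, strat, shear**2, shear * strat])
coef, *_ = np.linalg.lstsq(X, flux, rcond=None)

pred = X @ coef
print(f"R^2 = {1 - np.var(flux - pred) / np.var(flux):.2f}")
```

The caveat "applied carefully" is doing real work here: a learned parameterisation is only as trustworthy as the data it was trained on, which loops straight back to the observation problem above.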
None of this will make the fundamental complexity go away. The ocean will remain a difficult system to model, because it is a difficult system — period. Non-linearity, sparse data, and incomplete taxonomy are not bugs to be fixed; they are features of the problem that any serious modelling effort must confront honestly.
What this means in practice
For researchers and professionals working in this space, the implications are worth stating plainly. Every marine ecosystem model comes with assumptions, and those assumptions matter. Outputs should be treated as one line of evidence among several, not as forecasts with the reliability of a weather prediction. Uncertainty quantification is not a nice-to-have; it is essential. And when models disagree (as they frequently do), the disagreement itself is informative.
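A small, synthetic illustration of that last point: treating ensemble spread as a signal rather than a nuisance.

```python
import numpy as np

# Ensemble disagreement as information (all values illustrative, not real).
# Rows are hypothetical models; columns are regions. Where the spread is
# large relative to the mean, the ensemble is saying the answer is not yet
# well constrained, which is itself a useful result.

projections = np.array([
    [1.02, 0.85, 0.40],    # model A: projected change in primary production
    [0.98, 0.70, -0.35],   # model B
    [1.05, 0.95, 0.90],    # model C
])
regions = ["subtropics", "temperate", "polar"]

mean = projections.mean(axis=0)
spread = projections.std(axis=0)

for name, m, s in zip(regions, mean, spread):
    flag = "models agree" if s < abs(m) * 0.25 else "models disagree"
    print(f"{name:10s} mean {m:+.2f} spread {s:.2f} -> {flag}")
```

In the polar column the three hypothetical models cannot even agree on the sign of the change, and that disagreement is precisely the finding a decision-maker needs to see.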
The complexity of the ocean is not a reason to stop modelling. It is a reason to model carefully, to be transparent about what we know and what we don't, and to keep investing in the observations that make better models possible.
The ocean will not wait for us to finish understanding it. The pressures on marine ecosystems (warming, acidification, overfishing, pollution) are accelerating. We model because we must, and we keep trying because the alternative (managing these systems blind) is far worse.