
VO2 Max: Garmin, Apple, COROS, Suunto, Polar, Pixel, Amazfit or Samsung? (Scientific Review)
AI Summary
Rob, a post-doctoral scientist specializing in biological data analysis, is running a data-driven study of how accurately eight popular wearable brands estimate VO2 Max. The study compares devices from Garmin, COROS, Suunto, Polar, Apple, Samsung, Google, and Amazfit against clinical reference standards. Unlike casual reviews that rely on sporadic measurements, it follows a systematic protocol with multiple devices per person and repeated measurements over an extended period. Rob emphasizes that the current findings are preliminary: they mark the start of a nine-month project that will eventually include at least twenty participants. The goal is to move beyond showing a single number from a single day and to understand how these wearables perform in both absolute accuracy and long-term consistency.
To provide context, the transcript defines VO2 Max as the amount of oxygen the body can use during maximum effort, typically expressed in milliliters of oxygen per kilogram of body weight per minute (ml/kg/min). In a laboratory, it is measured through respiratory gas analysis while the subject performs a progressively harder exercise test. Because wearables cannot measure gas exchange directly, they must infer VO2 Max from a combination of heart rate response, pace, power proxies, GPS data, and proprietary algorithms. The study aims to show how closely these inferences match the lab measurements.
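The per-kilogram normalization in that definition can be made concrete with a minimal sketch. The function name and the example inputs (3,960 ml of oxygen per minute, 75 kg body mass) are illustrative assumptions and do not come from the study.

```python
def relative_vo2max(absolute_ml_per_min: float, body_mass_kg: float) -> float:
    """Convert absolute oxygen uptake (ml/min) to relative VO2 Max (ml/kg/min)."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return absolute_ml_per_min / body_mass_kg


# Hypothetical athlete (not study data): 3,960 ml O2 per minute at peak effort, 75 kg.
example = relative_vo2max(3960, 75)
print(f"{example:.1f} ml/kg/min")  # prints "52.8 ml/kg/min"
```

Dividing by body mass is what makes values comparable between athletes of different sizes, which is why both lab reports and watches quote the relative figure.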
The methodology is designed to be as systematic as possible, to reduce noise and keep comparisons fair. Each participant completes two lab-based reference tests, one for running and one for cycling, at the Sports Medicine Institute in Vienna. The study focuses primarily on running VO2 Max because many watches do not estimate cycling values, but the cycling reference helps identify an individual's true physiological ceiling. Wearable testing is split into three two-week blocks over six weeks. In each block, participants wear a subset of two or three devices and aim for at least six runs per device, so that each watch has enough data to calibrate its estimate. To keep heart rate sensor error out of the comparison, most watches are paired with an external ECG chest strap, which focuses the test on the VO2 Max estimation algorithms alone. Only the Apple Watch, Pixel Watch, and Samsung Galaxy Watch use their internal sensors, because of brand recommendations or hardware limitations. Participants are also required to run on flat routes with stable GPS and to keep training conditions consistent.
The first participant, Raphael, had nearly identical lab results for running (52.8 ml/kg/min) and cycling (52.9 ml/kg/min). Among the wearables, Samsung and Polar were the strongest performers for him, with estimates very close to his lab values. Polar's estimates held up even during a period when Raphael was slightly ill. In the second phase of testing, Google and Garmin gave stable results, though less accurate than those of the first group. Suunto was the largest outlier for Raphael, underestimating his fitness with a value of approximately 44. In the final phase, COROS and the Apple Watch performed decently, ranking ahead of Garmin and Google but slightly behind Samsung and Polar.
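To put a number on "underestimating", the gap between a watch estimate and the lab reference can be expressed as a signed error. The sketch below uses only the two figures quoted above for Raphael (running lab value 52.8, Suunto approximately 44); the function name is an illustrative assumption, and the study's own scoring method is not described in the summary.

```python
def estimation_error(estimate: float, lab_reference: float) -> tuple[float, float]:
    """Return (signed error in ml/kg/min, signed error as a percentage of the lab value)."""
    error = estimate - lab_reference
    return error, 100.0 * error / lab_reference


# Values from the summary: Raphael's running lab test and his Suunto estimate.
RAPHAEL_LAB_RUNNING = 52.8
SUUNTO_ESTIMATE = 44.0  # "approximately 44"; the exact reading is not given

error, percent = estimation_error(SUUNTO_ESTIMATE, RAPHAEL_LAB_RUNNING)
print(f"{error:+.1f} ml/kg/min ({percent:+.1f}%)")  # prints "-8.8 ml/kg/min (-16.7%)"
```

A negative sign means the watch reads below the lab value. Keeping the sign, instead of taking the absolute value, preserves the direction of the miss.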
The second participant, Stefan, presented a contrasting profile. His lab-tested running VO2 Max was 59, while his cycling value was much lower at 42. The discrepancy is attributed to his musculoskeletal system being better adapted to running than to cycling. His wearable results were more varied. Samsung and Polar again performed well, sitting close to his running reference, and Garmin and COROS were also highly accurate for him. The Apple Watch, however, was a notable outlier, significantly overestimating his VO2 Max. Suunto and Amazfit (a brand added specifically for Stefan's round) were further from the reference than the leading group, but neither missed by as much as the Apple Watch did.
Comparing the two participants shows that device accuracy varies from person to person. Samsung and Polar were relatively consistent across both subjects, but other brands performed differently depending on the individual. Garmin, for example, was a top performer for Stefan but one of the weaker ones for Raphael, and Suunto struggled with Raphael's data but was more competitive with Stefan's. These inconsistencies underscore why a large-scale, systematic study is necessary. The early data suggest that while some brands may appear more reliable than others, individual physiological differences can significantly affect a device's estimate.
The initial takeaway is that users should treat wearable VO2 Max values primarily as "trend signals" rather than absolute truths. The numbers are useful for tracking changes in fitness over time, but they should be read with caution when compared against clinical standards or when switching between brands. As the study continues over the next nine months, clearer patterns are expected to emerge. Rob encourages viewers to follow the research as the data set matures and promises more generalizable conclusions once the full cohort of at least twenty participants has completed the protocol. The eventual aim is a well-supported ranking of which devices are the most trustworthy for athletes.
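One way to read a watch value as a trend signal is to track its change from a personal baseline on the same device. The sketch below is a simplified illustration, not the study's method: it assumes a device's bias stays constant over time, which the study has not yet established, and every reading in it is hypothetical.

```python
def changes_from_baseline(readings: list[float]) -> list[float]:
    """Express each reading as a change from the first reading in the series."""
    if not readings:
        return []
    baseline = readings[0]
    return [round(value - baseline, 1) for value in readings]


# Hypothetical weekly estimates from one watch (not study data).
watch_readings = [48.0, 48.5, 49.0, 50.0]
# The same hypothetical series if the watch read a constant 3 ml/kg/min higher.
biased_readings = [value + 3.0 for value in watch_readings]

# Under the constant-bias assumption the offset cancels, so both show the same trend.
print(changes_from_baseline(watch_readings))   # prints [0.0, 0.5, 1.0, 2.0]
print(changes_from_baseline(biased_readings))  # prints [0.0, 0.5, 1.0, 2.0]
```

The same cancellation does not hold across brands, because each device may carry a different offset. That is why a switch between brands can look like a change in fitness when nothing has changed.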