Atrial fibrillation (AF) remains a “hidden” threat, often undiagnosed until complications arise, highlighting the need for predictive tools that can identify at-risk individuals while they are still in sinus rhythm (SR).1 In a recent study published in the European Heart Journal, Jabbour and colleagues developed an innovative deep learning-based approach that leverages ECG records to predict AF years before its onset.2 This ECG-based AI tool outperformed both traditional clinical models, such as the CHARGE-AF score, and polygenic risk scores (AF-PGS), with an AUC-ROC of 0.76 compared with 0.62 and 0.59, respectively. The result showcases the potential of artificial intelligence (AI) to reveal “hidden” indicators within ECG signals that evade conventional methods. Developed and validated internally and externally on datasets of over 400,000 ECG records, the ECG-AI tool identified high-risk individuals who went on to develop AF with a more than 4-fold increased probability over follow-up periods extending up to 15 years.
While previous studies have explored AI-based ECG approaches for AF prediction,3–7 this work stands out for its innovation and methodological rigour, and serves as a paradigm for future efforts. The AI tool was initially developed on the Montreal Heart Institute Biobank, which also provides demographic and genetic data, enabling robust benchmarking of ECG-AI against other clinical and genetic tools. Critically, it was further tested on the open MIMIC-IV cohort to establish external validity, an essential property that AI developments often lack in practice.8 The authors went one step further to enhance the model’s credibility by making the development process transparent, with open access to model weights, structure, and training procedures, thus facilitating replicability from both clinical and data science perspectives.
While impressive, the findings of this and similar efforts should be interpreted within the broader context of the challenges inherent in novel AI approaches.9 AI models often carry a risk of overfitting, potentially identifying patterns tied to demographic or comorbidity-related factors rather than true AF-specific features. This can lead to strong performance in the development cohort for the wrong reasons, a risk mitigated here by external validation on a separate cohort. Model fairness is complex but important to verify, since the model should perform equitably across minority groups as well. Achieving this requires the equal representation of diverse populations to account for deviating characteristics, and the careful evaluation of model accuracy within subpopulations. The authors have taken steps in this direction, examining model performance across subgroups defined by sex, age, and socio-economic status. However, fully assessing fairness remains challenging, particularly for individuals at the extremes of the distribution, where unique combinations of characteristics may impact generalisability. Moreover, the validity of the model is assured only for patient groups with characteristics similar to those of the development population. Explainability also remains a concern. Saliency maps highlight regions like the P wave, yet case-specific differences and complex temporal patterns may limit interpretability. Such patterns may also evolve during the “subclinical” progression until AF manifests. Introducing a time variable to reflect the interval between the index ECG and AF onset could capture these varying “states”, refining the model’s ability to indicate “when” AF occurs rather than simply “if”, and potentially revealing such evolving patterns. Finally, the common issue of undetected AF episodes in patients classified as non-AF persists. This underscores the difficulty of establishing truly AF-negative cases without continuous monitoring (e.g. via implantable or wearable devices), reflecting the broader challenge posed by the intermittent nature of AF in screening contexts.
This study exemplifies both the promise and the challenges of integrating AI into clinical practice, harnessing the “hidden” potential of ECGs to perform prognostic tasks beyond human capability. Complementing previous efforts in the field, it not only confirms the feasibility of such AI-driven approaches but also establishes a methodological benchmark for future studies. Yet, despite AI’s remarkable ability to uncover invisible patterns and provide early alerts for “silent” conditions, the common concerns with these approaches remind us that clinicians must interpret such outputs thoughtfully to ensure their true clinical relevance.