FLEX: Knowledge-Guided Feature Enhancement
A novel framework for robust and fair pathology foundation models
FLEX is a novel framework designed to solve two critical failures in current pathology AI: poor cross-site generalization and demographic bias. By leveraging expert-guided visual and textual knowledge, FLEX disentangles true biological signals from "site-specific signatures," ensuring robust and fair diagnostics across diverse clinical environments.
The Challenge: Shortcut Learning in Pathology
Pathology foundation models often achieve high performance by learning spurious correlations rather than pathology.
-
Site-Specific Signatures: Variations in tissue preparation, staining, and scanning equipment create unique "signatures" for each hospital. Models learn these signatures as shortcuts. They perform well on data from familiar hospitals (In-Domain) but fail catastrophically on data from new sites (Out-of-Domain).
-
Demographic Bias: Models frequently exhibit performance disparities across racial and ancestral groups due to dataset imbalances and biological variations.
Figure: (a-b) Illustration of site-specific signatures causing shortcut learning. (d) The FLEX workflow utilizing visual and textual knowledge.
The Solution: Knowledge-Guided Information Bottleneck
FLEX operates as a model-agnostic module that refines features extracted by Vision-Language Models (VLMs). It employs a Variational Information Bottleneck to filter out noise (site/demographic artifacts) while retaining task-relevant biological information.
The Core Mechanism
FLEX guides feature learning using two forms of expert-verified prior knowledge:
Visual Concepts:
- Representative patch images for specific cancer subtypes are identified by attention mechanisms.
- These patches are filtered by GPT-4o and verified by human pathologists to ensure they represent true biological features.
Textual Concepts:
- Clinically accurate text prompts (e.g., "invasive ductal carcinoma") are generated and approved by board-certified pathologists.
- Why text? Textual descriptions are inherently free of image-based artifacts (like scanner noise or stain variations), providing a clean signal for alignment.
Training Process: An InfoNCE loss aligns patch features with these clean textual concepts, effectively "forcing" the model to unlearn site-specific shortcuts.
Figure: The architecture of FLEX. (a-b) Generation of visual and textual concepts. (c) The variational information bottleneck guided by concept constraints.
3. Key Results & Evidence
Evaluation was conducted across 16 clinical tasks (morphology, biomarkers, gene mutations) using the TCGA dataset for training, and CPTAC and NFH (in-house) datasets for external validation.
A. Superior Cross-Domain Generalization
FLEX significantly closes the gap between in-domain and out-of-domain performance.
- Robustness: In site-preserved cross-validation, FLEX reduced the performance drop seen in baseline models. For the STAD Lauren task, FLEX increased accuracy on unseen sites by 15.13%.
- Zero-Shot Success: When applied to completely unseen external cohorts (CPTAC and NFH) without fine-tuning, FLEX consistently outperformed baselines (see Figure 2).
Figure: (a) UMAP shows FLEX removing site clustering. (c) FLEX outperforms baselines in both In-Domain (IND) and Out-of-Domain (OOD) settings.
B. Improving Demographic Fairness
FLEX creates more equitable diagnostic tools.
- Reduced Disparities: The framework reduced the performance gap between racial groups (e.g., White vs. Black patients) across multiple tasks.
- Equitable Detection: Analysis of True Positive Rate (TPR) disparities shows that FLEX yields diagnostic decisions that are less biased toward specific demographic groups compared to standard stain normalization methods.
Figure: Evaluation of demographic fairness. FLEX (Pink) consistently shows lower fairness gaps and lower TPR disparity compared to baselines.
C. Versatility and Interpretability
- Model Agnostic: FLEX proved effective across different foundation models (CONCH, PathGen-CLIP, QuiltNet) and various MIL architectures.
- Interpretability: Attention maps reveal that while baseline models focus on noise or background tissue, FLEX focuses precisely on tumor regions and relevant histological structures.
4. Conclusion
FLEX establishes a new standard for deploying pathology AI in real-world clinical settings. By strategically using multimodal prior knowledge to suppress site-specific and demographic noise, it offers a pathway to reliable, responsible, and equitable computational pathology.
Citation
@article{huang2025flex,
title={Knowledge-Guided Adaptation of Pathology Foundation Models Effectively Improves Cross-domain Generalization and Demographic Fairness},
author={Huang, Yanyan and others},
journal={Nature Communications},
year={2025}
}