Changing only the main architecture of the segmentation model preserved segmentation performance within ischemic-stroke cohorts while improving classification in broader disease populations. This study highlights the need to validate deep-learning models not only for segmentation performance within target disease cohorts but also across diverse clinical environments to ensure practical utility.