Computational analysis of peripheral blood smears detects disease-associated cytomorphologies

de Almeida, José Guilherme; Gudgin, Emma; Besser, Martin; Dunn, William G.; Cooper, Jonathan; Haferlach, Torsten; Vassiliou, George S.; Gerstung, Moritz

doi:10.1038/s41467-023-39676-y

Article
Open access
Published: 20 July 2023

Nature Communications volume 14, Article number: 4378 (2023)

Abstract

Many hematological diseases are characterized by altered abundance and morphology of blood cells and their progenitors. Myelodysplastic syndromes (MDS), for example, are a group of blood cancers characterised by cytopenias, dysplasia of hematopoietic cells and blast expansion. Examination of peripheral blood slides (PBS) in MDS often reveals changes such as abnormal granulocyte lobulation or granularity and altered red blood cell (RBC) morphology; however, some of these features are shared with conditions such as haematinic deficiency anemias. Definitive diagnosis of MDS requires expert cytomorphology analysis of bone marrow smears and complementary information such as blood counts, karyotype and molecular genetics testing. Here, we present Haemorasis, a computational method that detects and characterizes white blood cells (WBC) and RBC in PBS. Applied to over 300 individuals with different conditions (SF3B1-mutant and SF3B1-wildtype MDS, megaloblastic anemia, and iron deficiency anemia), Haemorasis detected over half a million WBC and millions of RBC and characterized their morphology. These large sets of cell morphologies can be used in diagnosis and disease subtyping, while identifying novel associations between computational morphotypes and disease. We find that hypolobulated neutrophils and large RBC are characteristic of SF3B1-mutant MDS. Additionally, while prevalent in both iron deficiency and megaloblastic anemia, hyperlobulated neutrophils are larger in the latter. By integrating cytomorphological features using machine learning, Haemorasis was able to distinguish SF3B1-mutant MDS from other MDS using cytomorphology and blood counts alone, with high predictive performance. We validate our findings externally, showing that they generalize to other centers and scanners. Collectively, our work reveals the potential for the large-scale incorporation of automated cytomorphology into routine diagnostic workflows.

Introduction

The diagnosis of hematological malignancies relies on expert cytomorphological examination of blood, bone marrow and/or other tissue biopsies, together with molecular analyses that aid subclassification and prognosis¹. For example, anemias, characterized by reduced hemoglobin concentration (Hb) and altered red blood cell (RBC) numbers, can be both a disease in their own right and a feature of other conditions such as myelodysplastic syndromes (MDS), a heterogeneous group of myeloid neoplasms that can progress to acute myeloid leukemia (AML)^2,3,4. For this reason, the diagnosis and further subtyping of MDS requires the detection of cytopenias and of changes in white blood cell (WBC) and RBC maturation through cytomorphologic analysis of bone marrow (BM) and peripheral blood slides (PBS), cyto- and histochemistry, karyotyping and immunophenotyping^4,5,6,7,8.

An accurate diagnosis of MDS and other hematological malignancies is essential to guide treatment: while megaloblastic anemia (MA), which can be confused with MDS^9,10,11, is generally treated with dietary changes or supplements¹², the treatment of MDS generally involves chemotherapeutic agents, blood/platelet transfusions and hypomethylating agents^13,14 and depends on risk stratification, which considers blood counts, BM cytomorphology and cytogenetics¹⁵. MDS prognostication can also benefit from molecular genetics, which defines clinically relevant subtypes such as SF3B1-mutant MDS, associated with longer survival^16,17. Notably, MDS cases with splicing factor mutations, such as SF3B1-mutant MDS, account for over 50% of all cases^18,19, making them an important group of MDS subtypes.

While abnormalities such as an increased prevalence of hypolobulated granulocytes, abnormal granularity in neutrophils or abnormal RBC are common in MDS^8,20,21,22, peripheral blood cell morphology is generally insufficient for MDS diagnosis. This is compounded by challenges in the assessment of subtle cytomorphological alterations and heterogeneity across any given PBS leading to inter-observer variation. While diagnoses stemming from the analysis of a PBS (requiring the analysis of hundreds of cells) typically show high concordance, the classification and characterization of individual WBC is more challenging^23,24. Additionally, the evidence on whether trained experts can distinguish specific cell types is conflicting^25,26, and a study looking specifically at cell type classification concordance among 28 morphologists showed that experts agreed on only 60% of all classified cells²⁷. This creates challenges in identifying relevant cytomorphology-disease associations. Computational methods, which have shown promise in the characterization and prognostication of MDS and AML using bone marrow slides^28,29,30 and identification of abnormal leukocytes³¹, can help address some of these problems.

Here we present Haemorasis, a machine-learning protocol that automatically detects and characterizes blood cells in PBS. We apply it to a cohort of individuals with MDS or anemia, demonstrating its use in predicting diseases and in deriving novel “morphotypes”, that is, associations between cellular morphology and different blood conditions. We show that SF3B1-mutant MDS can be distinguished from other MDS using cytomorphology and blood counts alone with high predictive performance, with hypolobulated neutrophils and large RBC being more prevalent in this MDS subtype. Using expert-annotated WBC and RBC, we show that virtual cell types are enriched in commonly recognized WBC and RBC types/abnormalities. Finally, we externally validate our approach, showing that it largely generalizes to different centers and slide scanners.

Results

The MLL cohort captures previously described clinical features of MDS and anemia

The MLL cohort comprised 362 individuals (203 male and 159 female) with a mean age of 66.1 years. Individuals with MDS were older than the remainder of the MLL cohort and more often male, as previously reported³²: the odds of having MDS in our cohort increased by 12% per year of age, and males were more than twice as likely to have MDS (p = 8 × 10⁻¹⁶ and p = 0.00017, respectively, for the binomial regression of MDS diagnosis on age and sex; Fig. 1a, b; Table 1).
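
The binomial (logistic) regression above models the log-odds of an MDS diagnosis as a linear function of age and sex, so exponentiated coefficients are odds ratios: a 12% increase per year corresponds to exp(β_age) ≈ 1.12. The sketch below is a minimal illustration, not the authors' analysis code. It fits such a model by iteratively reweighted least squares (Newton's method) on synthetic data; the simulated cohort, the coefficient values and the helper name `fit_logistic_irls` are all hypothetical.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by iteratively reweighted least squares.

    X: (n, p) design matrix that already includes an intercept column.
    y: (n,) binary outcome (1 = MDS, 0 = no MDS in this illustration).
    Returns coefficients on the log-odds scale.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                   # linear predictor (log-odds)
        mu = 1.0 / (1.0 + np.exp(-eta))  # predicted probabilities
        w = mu * (1.0 - mu)              # IRLS weights
        # Newton step: solve (X^T W X) delta = X^T (y - mu)
        hessian = X.T @ (X * w[:, None])
        gradient = X.T @ (y - mu)
        delta = np.linalg.solve(hessian, gradient)
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta

# Hypothetical synthetic cohort; values chosen only for illustration.
rng = np.random.default_rng(0)
n = 20000
age = rng.uniform(40, 90, n)
male = rng.integers(0, 2, n)
true_beta = np.array([-8.0, np.log(1.12), np.log(2.2)])  # intercept, per-year, male
X = np.column_stack([np.ones(n), age, male])
p = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = rng.binomial(1, p)

beta_hat = fit_logistic_irls(X, y)
odds_ratios = np.exp(beta_hat[1:])  # per-year and male-vs-female odds ratios
print(np.round(odds_ratios, 2))
```

Because the age effect is multiplicative on the odds scale, a 12% per-year increase compounds to roughly 1.12¹⁰ ≈ 3.1-fold higher odds over a decade. A statistics package (for example R's `glm` with `family = binomial`) would also report standard errors and p-values, which this sketch omits.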

**Fig. 1: General features of anemias and myelodysplastic syndromes (MDS) in the MLL cohort.**

Table 1 Statistical comparisons of different features of the MLL cohort

In a linear regression of WBC count (WBCC) against binary MDS and anemia indicators (vs. normal), MDS and deficiency anemias were associated with leukopenia, with 1,200 (p = 0.04) and 1,800 (p = 0.009) fewer WBC/µL, respectively (Fig. 1c). However, this leukopenic tendency in anemias was driven by MA: whereas iron deficiency anemia (IDA) was indistinguishable from controls, MA had approximately 3,200 fewer WBC/µL than controls (p = 6 × 10⁻¹⁴ for a two-sample t-test), as in previous studies^10,33,34. Hb was also much lower in MDS and anemias (Fig. 1d); indeed, the Hb of these individuals was lower than that of normal individuals by 4.34 and 6.38 g/dL, respectively (p < 2 × 10⁻¹⁶ for both, in the linear regression of Hb against binary MDS and anemia diagnosis indicators). No difference between controls and MDS or anemia cases was observed with regard to platelet counts (Plt), but MA had approximately 146,000 fewer platelets/µL than controls (p = 3 × 10⁻¹² for a two-sample t-test; Fig. 1e), in keeping with previous reports³³.
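
With controls as the reference level, a linear regression of a blood count on mutually exclusive binary diagnosis indicators returns coefficients equal to the difference between each group's mean and the control mean, which is how effects such as "1,200 fewer WBC/µL" can be read. The toy counts below are invented so that the fitted effects mirror those magnitudes; they are not cohort data.

```python
import numpy as np

# Hypothetical WBC counts (cells/µL) for three groups; not real cohort data.
groups = np.array(["control"] * 4 + ["mds"] * 3 + ["anemia"] * 3)
wbcc = np.array([7000, 6500, 7500, 7000,   # controls
                 5800, 5600, 6000,         # MDS
                 5200, 5300, 5100],        # deficiency anemia
                dtype=float)

# Design matrix: intercept plus one-hot indicators, controls as reference.
X = np.column_stack([
    np.ones(len(wbcc)),
    (groups == "mds").astype(float),
    (groups == "anemia").astype(float),
])
coef, *_ = np.linalg.lstsq(X, wbcc, rcond=None)

intercept, mds_effect, anemia_effect = coef
print(intercept, mds_effect, anemia_effect)
```

The intercept recovers the control mean (7,000 WBC/µL) and the two indicator coefficients recover the group differences (−1,200 and −1,800 WBC/µL); adding covariates such as age would turn these into adjusted differences.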

Finally, SF3B1-mutant MDS displayed distinct features compared to SF3B1-wild-type (wt) MDS: in particular, WBC and Plt were comparable to those of controls and higher than those found in SF3B1-wt MDS (p = 0.3 and p = 0.15 for two-sample t-tests comparing WBC and Plt, respectively, between SF3B1-mutant MDS and controls; p = 0.002 and p < 2 × 10⁻¹⁶ for two-sample t-tests comparing WBC and Plt, respectively, between SF3B1-mutant and SF3B1-wt MDS), in keeping with previous reports^16,17.

To validate the disease prediction findings reported below, we also digitized slides for the CUH2 cohort (Methods) and compared it with the MLL cohort in terms of age and blood counts. Among controls, we found statistically significant between-cohort differences in Hb and Plt (p = 0.009 and p = 0.002, respectively, for two-sided t-tests; Table 2), both likely a consequence of the difference in age (p = 4 × 10⁻⁷). We also found relatively small but statistically significant between-cohort differences in Hb for IDA (p = 0.001) and in age for MA (p = 0.0001) and other MDS subtypes (p = 0.001).

Table 2 Differences between cohorts (MLL vs. CUH2) regarding age and blood counts (Hb hemoglobin concentration, Plt platelet count, WBCC WBC counts) stratified by condition (Control; IDA iron deficiency anemia, MA megaloblastic anemia, SF3B1-mutant - SF3B1-mutant MDS; Other - Other MDS subtypes)

Computational cytomorphology of peripheral blood slides

We detected cells in PBS using Haemorasis (Fig. 2a). For the first stage of this method, quality control of PBS tiles, we trained a deep-learning (DL) model to predict whether individual tiles are of “good” or “poor” quality (Supplementary Fig. S1a). This (i) reduces the inclusion of non-cellular objects in downstream analyses, thus reducing artifact-associated variation, and (ii) limits processing to the clinically relevant part of the PBS (hematologists usually consider <20% of the total area; Supplementary Results). Next, we detect both WBC and RBC on “good” quality tiles. To detect WBC, we trained a U-Net-based³⁵ DL model on a dataset of >2800 manually annotated WBC in PBS from CUH1. Extensive data augmentation (random image alterations; Supplementary Table S3) was used to make the model more robust. We validated this model on test sets from CUH1, CUH2 and MLL; test time augmentation (TTA) improved predictions, and post-processing of predictions greatly reduced the number of false positive WBC (Fig. 2b, Supplementary Fig. S2a-c). We confirmed the good performance of the model through visual inspection (Fig. 2b, c, Supplementary Fig. S2d, e) and, while some errors were detected (Fig. 2c), these were small and rare, with the model performing well across different cohorts (Supplementary Results; Fig. 2b). RBC were detected using a simple computer vision protocol, and predictions were filtered using XGBoost, a fast and scalable machine-learning algorithm³⁶ (Supplementary Methods; Supplementary Results; Fig. 2d). This filtering removed non-RBC objects and reduced the rate of false positives from 17.3% (the false positive rate (FPR) among RBC candidates in the training dataset) to 1.9% (the product of the 11% validation FPR of our RBC filtering model and the original 17.3% FPR); in other words, only about 1 in 50 RBC candidates is a false positive that passes the filter.
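
The 1.9% figure is the product of the two rates quoted above, which assumes that the filter's error rate applies uniformly to non-RBC candidates. The short calculation below reproduces it and, for comparison, shows the share of false positives among the objects that are retained. That second quantity also depends on the filter's true positive rate, which is not given here, so the value used is hypothetical.

```python
# Rates quoted in the text.
candidate_fp_fraction = 0.173  # share of RBC candidates that are not RBC
filter_fpr = 0.11              # validation FPR of the XGBoost RBC filter

# Share of all candidates that are non-RBC objects wrongly kept by the filter.
residual_fp = candidate_fp_fraction * filter_fpr
print(f"{residual_fp:.3%}")  # 1.903%, i.e. roughly 1 in 50

# Among retained objects, the share also depends on the filter's true
# positive rate, which is not reported here; 0.95 is a hypothetical value.
filter_tpr = 0.95
retained = residual_fp + (1 - candidate_fp_fraction) * filter_tpr
fp_among_retained = residual_fp / retained
print(f"{fp_among_retained:.3%}")
```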

**Fig. 2: Haemorasis – automated detection and analysis of blood cells in peripheral blood slides (PBS).**

Across all cohorts, we detected an average of 26,000 (range 70 to 133,916) RBC per PBS (a total of 12,042,425 RBC) and around 1,400 (range 12 to 39,862) WBC per PBS (a total of 646,952; Fig. 2e). On average, cellular density in the MLL cohort was 44% lower for RBC/mm² and 10.5% lower for WBC/mm² than in CUH (Supplementary Table S4; Supplementary Fig. S3a). Further heterogeneity was observed across conditions, with controls having the highest WBC density and the lowest RBC density (28.9 WBC/mm², 189 RBC/mm²), anemia having the highest RBC density (383 RBC/mm²) and MDS having the lowest WBC density (13.8 WBC/mm²; Supplementary Fig. S3b). Lastly, after controlling for cohort and condition, automated blood films yielded a 5% higher fraction of good-quality tiles than manually prepared slides, highlighting the utility of standardization (Supplementary Fig. S3c).

In line with the findings in Fig. 1c, we extracted on average more cells in controls than in individuals with either MDS or anemia (Supplementary Table S4), although heterogeneity across slides meant that this trend was not statistically significant. Generally, the density of detected WBC in the PBS correlated with WBCC from automated analysers, validating our detection protocol through an orthogonal approach (robust R² = 0.39, 95% CI = [0.30, 0.49]; Fig. 2f) and demonstrating that we detect a representative number of WBC in PBS. Finally, we characterized all individual cells using morphological features used in other morphometric software programs^37,38,39 (Supplementary Table S5; Supplementary Fig. S3). For each cell, we quantified its size, shape, color distribution and texture, and for WBC we also characterized nuclear size and shape (Supplementary Fig. S4). We note that our method for WBC nucleus segmentation underperforms in conditions of low contrast (where nucleus and cytoplasm are hard to distinguish) or high granularity (particularly for eosinophils and basophils; Supplementary Fig. S5), leading us to focus on cases of high contrast and to avoid conclusions pertaining to eosinophils or basophils.
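
The Methods describe WBC nucleus segmentation as splitting each cell's pixels into two clusters with k-means (k = 2) and taking the darker cluster as the nucleus. The NumPy sketch below implements that idea on grayscale intensities only (an assumption; the pipeline may cluster on color values), using a hypothetical helper `segment_nucleus_kmeans` and a synthetic cell. Because the split relies on intensity alone, it also makes plain why low contrast is a failure mode: when nucleus and cytoplasm intensities overlap, no two-cluster split separates them cleanly.

```python
import numpy as np

def segment_nucleus_kmeans(intensity, cell_mask, n_iter=100):
    """Two-cluster k-means on the grayscale intensities of pixels inside a cell.

    intensity: 2D float array (lower values = darker).
    cell_mask: 2D bool array marking the WBC pixels.
    Returns a bool mask of pixels in the darker cluster (putative nucleus).
    """
    values = intensity[cell_mask]
    # Initialize the two centroids at the darkest and brightest pixels.
    centroids = np.array([values.min(), values.max()], dtype=float)
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid (1D distance).
        labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        new_centroids = np.array([
            values[labels == k].mean() if np.any(labels == k) else centroids[k]
            for k in range(2)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    dark_cluster = centroids.argmin()
    nucleus = np.zeros_like(cell_mask, dtype=bool)
    nucleus[cell_mask] = labels == dark_cluster
    return nucleus

# Synthetic high-contrast cell: bright cytoplasm disc with a darker center.
yy, xx = np.mgrid[:64, :64]
r = np.hypot(yy - 32, xx - 32)
cell_mask = r < 25
intensity = np.where(r < 10, 0.3, 0.8)
nucleus = segment_nucleus_kmeans(intensity, cell_mask)
print(nucleus.sum(), (r < 10).sum())
```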

Morphological heterogeneity informs disease prediction

We tested four distinct tasks to determine whether Haemorasis can be used to meaningfully predict conditions from PBS: (i) disease detection, (ii) disease classification, (iii) MDS genetic subtyping and (iv) anemia classification. Morphometric moments (the mean and variance of each feature across all cells in a PBS) differed across conditions (Supplementary Results; Supplementary Fig. S6). This qualitative assessment was corroborated by fitting a binomial elastic-net regression model (glmnet)⁴⁰ for each task using morphometric moments in addition to WBCC, Hb and Plt. Performance was evaluated using 5-fold cross-validation: the data were split into five non-overlapping validation sets and, for each, the remaining data were used for training, which yields less biased estimates of performance⁴¹.
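
Morphometric moments reduce a variable number of per-cell measurements to a fixed-length vector per slide (the mean and variance of each feature), which can then be concatenated with WBCC, Hb and Plt and passed to a penalized regression. The sketch below computes these moments and builds five non-overlapping validation folds. The array shapes, the helper names `morphometric_moments` and `five_fold_indices`, and the random data are hypothetical, and the glmnet fit itself is not reproduced.

```python
import numpy as np

def morphometric_moments(cell_features):
    """Per-slide vector: mean and variance of each per-cell feature.

    cell_features: (n_cells, n_features) array for one slide.
    Returns a (2 * n_features,) array [means..., variances...].
    """
    return np.concatenate([cell_features.mean(axis=0), cell_features.var(axis=0)])

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs with non-overlapping validation folds."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for k in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, folds[k]

rng = np.random.default_rng(1)
# Hypothetical cohort: 20 slides, varying cell numbers, 3 features per cell.
slides = [rng.normal(size=(rng.integers(50, 500), 3)) for _ in range(20)]
blood_counts = rng.normal(size=(20, 3))  # stand-ins for WBCC, Hb, Plt

moments = np.vstack([morphometric_moments(s) for s in slides])
X = np.column_stack([moments, blood_counts])
print(X.shape)  # (20, 9): 3 means + 3 variances + 3 blood counts

for train_idx, val_idx in five_fold_indices(len(X)):
    # An elastic-net classifier would be fit on X[train_idx] and scored
    # on X[val_idx] here.
    assert len(np.intersect1d(train_idx, val_idx)) == 0
```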

Morphometric regression showed high cross-validated predictive performance across all tasks (Fig. 3a, Supplementary Fig. S7a), including an AUC of 89.7% for MDS genetic subtyping (Supplementary Fig. S7a). Additionally, blood counts are highly predictive of SF3B1-mutant MDS, as indicated in Fig. 1c, e and previous publications^16,17 (Fig. 3, Supplementary Fig. S7a). Notably, morphological feature variance had a significant impact on prediction, revealing that cytomorphological heterogeneity is important for diagnosis (Fig. 3b), as previously suggested for red cell distribution width (RDW)⁴². Finally, the relative importance of different features revealed important trends: for instance, SF3B1-mutant MDS was characterized by higher Plt, larger RBC and smaller WBC nuclear area (Supplementary Fig. S7b). Useful as it is, this approach makes it harder to retrieve illustrative examples of blood cells: larger RBC or more irregular WBC are easily understood morphometric changes, but changes in morphometric variance are difficult to explain satisfactorily, which makes a pictorial demonstration of their importance more elusive.
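
An AUC such as the 89.7% reported for MDS genetic subtyping can be read as the probability that a randomly chosen positive case (for example, SF3B1-mutant MDS) receives a higher predicted score than a randomly chosen negative case. A minimal NumPy sketch of that definition (the Mann-Whitney formulation, counting ties as one half), with made-up labels and scores:

```python
import numpy as np

def auc_rank(y_true, scores):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]  # every positive-negative pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Hypothetical labels (1 = positive class) and model scores.
y = [1, 1, 1, 0, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
print(auc_rank(y, s))  # 11 of 12 pairs correctly ordered
```

The pairwise formulation is quadratic in the number of cases, which is unproblematic at the cohort sizes described here; rank-based implementations scale better for large datasets.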

**Fig. 3: Morphometric moments improve prediction.**

Discovering diagnostically relevant morphotypes

At first inspection, the two-dimensional representation of the distribution of cytomorphological characteristics of different conditions revealed an interwoven landscape without immediately recognizable cell clusters (Fig. 4a). On closer inspection, however, different parts of the cytomorphology space are differentially populated by different conditions. To partition this space and define morphotypes (disease-associated cytomorphological phenotypes), we use a multiple instance learning (MIL) approach that clusters cells based on their cytomorphological characteristics such that the resulting computational morphotypes (CMs) are relevant to the aforementioned diagnostic tasks (Fig. 4b; Supplementary Methods).
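
In a multiple instance learning setting, each slide is a "bag" of cells, and one simple way to turn cell-level clusters into slide-level predictors is to compute, for each slide, the proportion of its cells assigned to each cluster. The sketch below shows only that representation, using nearest-centroid assignment to fixed, hypothetical centroids and a hypothetical helper `morphotype_proportions`. In the approach described here the clusters are learned so that they are relevant to the diagnostic tasks (Supplementary Methods), which this sketch does not attempt.

```python
import numpy as np

def morphotype_proportions(cell_features, centroids):
    """Fraction of a slide's cells assigned to each morphotype centroid.

    cell_features: (n_cells, n_features) array for one slide.
    centroids: (n_morphotypes, n_features) cluster centers (hypothetical).
    """
    d2 = ((cell_features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignment = d2.argmin(axis=1)  # nearest centroid for each cell
    counts = np.bincount(assignment, minlength=len(centroids))
    return counts / counts.sum()

centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
slide = np.array([[0.1, -0.2], [4.8, 5.1], [5.2, 4.9], [9.9, 0.3]])
print(morphotype_proportions(slide, centroids))  # proportions 0.25, 0.5, 0.25
```

The resulting per-slide proportion vectors have the same length for every slide, so they can be fed to any slide-level classifier regardless of how many cells were detected.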

We performed Morphotype analysis considering the four objectives described earlier simultaneously and identified stable morphotypes, that is, morphotypes consistently found across the folds of 5-fold cross-validation (Supplementary Methods). Morphotype analysis performed similarly to morphometric moment prediction when predicting conditions (Fig. 4; Supplementary Fig. S8, Supplementary Fig. S9), with the added benefit of producing human-interpretable, disease-associated cytomorphologies. To illustrate this, we provide an online visualization tool that allows readers to observe the visual cohesion of different morphotypes (https://josegcpa.github.io/haemorasis-umap; Supplementary Methods). This approach revealed 8 stable WBC morphotypes (denoted WCM 1–8), accounting for 60% of WBC in normal samples, as well as 12 stable RBC morphotypes (RCM 1–12) comprising 90% of RBC (Supplementary Fig. S10). These stable WBC and RBC morphotypes displayed distinct cytomorphological characteristics, whereas the remaining morphotypes were heterogeneous in nature. Among the stable morphotypes, 7 WCMs and 7 RCMs exhibited robust associations with specific clinical conditions (Fig. 5a-c, Supplementary Fig. S11).

**Fig. 5: Computational morphotypes across conditions.**

Among the stable WBC morphotypes, four mostly consisted of different neutrophil morphologies (WCM-1, 2, 3 and 4 in Fig. 5a, b), highlighting their cytomorphological diversity and diagnostic relevance. WCM-5 contained small lymphocytes, WCM-6 larger lymphocytes and myeloid progenitors, and WCM-7 diverse myeloid cells. We confirmed clinically relevant cellular phenotypes such as the increased prevalence of abnormal neutrophils in MDS and deficiency anemia (WCM-1 and 2); in MDS, lymphocytes (WCM-5) were less prevalent while immature myeloid cells (WCM-6) were more prevalent, as previously suggested^43,44. Morphotype analysis also identified novel morphotypes: in particular, WCM-3 (normal hypolobulated neutrophils) appeared to be more prevalent in SF3B1-mutant MDS, and larger and/or hyperlobulated neutrophils were more prevalent in MA than in IDA (WCM-2 and 4). We confirmed these findings using single-objective Morphotype analysis, in which models are trained on a single task (Supplementary Fig. S12). Finally, we found that WCM-5 (small lymphocytes) was more prevalent in anemia than in MDS.

Stable RBC morphotypes showed more subtle differences. Some morphotypes were relatively more normal—RCM-1 and 2 contained mostly normal or spherocytic RBCs (Fig. 5a, c)—whereas others (RCM-3 and 4) captured larger RBC and elliptocytes. RCM-5 captured relatively small RBC and some poikilocytes, and RCM-6 and 7 captured hypochromic RBCs. We show that RCM-6 and 7 (hypochromic RBCs) were more typical of anemia than MDS as previously reported in IDA¹, with RCM-7 being more prevalent in IDA compared to MA. RCM-3 and 4 (large RBC and elliptocytes) were more prevalent in SF3B1-mutant MDS compared with SF3B1-wt MDS, while RCM-5 (poikilocytic RBC) were more common in IDA compared to MA.

Notably, morphotype frequency offers more tangible explanations for the associations of morphometric moments with certain diagnoses. If certain morphotypes are, on average, more prevalent in a specific condition, this will manifest as shifts in the means and/or heterogeneity (variances) of different features (Fig. 6, Supplementary Fig. S13). For example, the variance of the WBC nuclear perimeter, shown to be important for disease detection (Supplementary Fig. S13a), can be explained by the differential frequencies of different WCMs (Fig. 6a): the higher prevalence of WCM-7 drives the increased heterogeneity of this feature in normal individuals. In disease classification, the increased variance of RBC shape irregularity (standard deviation of the centroid distance function) in anemia compared to MDS is partly explained by the elevated prevalence of RCM-5, 6 and 7 (relatively circular, some poikilocytes) and the lower prevalence of RCM-3 and 4 (larger and more elliptic; Fig. 6b). Likewise, WBC nuclear convexity exhibits stronger bimodality, and therefore greater variance, in SF3B1-mutant MDS, driven in part by WCM-1 and 3 (Fig. 6c). Finally, the clear increase in mean RBC area in SF3B1-mutant MDS is due to the higher prevalence of RCM-3 and 4 and the lower prevalence of RCM-1 and 7 in this subtype (Fig. 6d).
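
The link between morphotype frequencies and slide-level moments can be made explicit. If a fraction p_k of a slide's cells belongs to morphotype k, whose cells have mean μ_k and variance σ_k² for some feature, then the slide mean is μ = Σ p_k μ_k and, by the law of total variance, the slide variance is Σ p_k σ_k² + Σ p_k (μ_k − μ)². Shifting weight towards morphotypes whose means lie far from the overall mean therefore raises feature variance even when each morphotype is internally homogeneous. A short numerical check with made-up morphotype statistics (the helper `mixture_moments` is hypothetical):

```python
import numpy as np

def mixture_moments(p, mu, var):
    """Slide-level mean and variance of a feature from per-morphotype statistics."""
    p, mu, var = (np.asarray(a, dtype=float) for a in (p, mu, var))
    mean = np.sum(p * mu)
    within = np.sum(p * var)                # average within-morphotype variance
    between = np.sum(p * (mu - mean) ** 2)  # spread of morphotype means
    return float(mean), float(within + between)

# Hypothetical per-morphotype RBC area statistics (arbitrary units).
mu = [40.0, 60.0]
var = [4.0, 4.0]
print(mixture_moments([0.9, 0.1], mu, var))  # mostly normal-sized RBC
print(mixture_moments([0.5, 0.5], mu, var))  # more large RBC: higher mean and variance
```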

**Fig. 6: Computational morphotype proportions explain morphometric feature distributions.**

Computational cytomorphology validation

To confirm the nature of the computational morphotypes and their diagnostic associations, we performed (i) a blinded annotation of cell types by expert clinical hematologists and (ii) a validation of their predictive value in the CUH2 cohort. First, we assessed whether the morphotypes determined by Morphotype analysis were enriched in known cell types. Three hematologists labeled up to 1746 RBC and 1600 WBC. These annotations showed that morphotypes are enriched in known RBC and WBC types (Fig. 7a, b), although inter-expert concordance was limited for some rare cell types (particularly hypolobulated neutrophils and blasts; Supplementary Fig. S14). Similar results were observed with single-objective morphotype analysis, particularly for disease detection and classification models (Supplementary Fig. S15). Furthermore, CMs enriched in artifacts were rarely enriched in known cell types.

**Fig. 7: Expert and external validation of computational cytomorphology.**

To validate the accuracy and robustness of our models, we used a second cohort of 63 slides from CUH2, representing a similar spectrum of diagnoses to the MLL cohort but digitized using a different slide scanner (Aperio AT2). In all cases, we evaluated the best-performing fold from the previous cross-validations. We did this to obtain a clear measure of the real-world performance of such methods in a clinical context, where predictions would come from a single model with interpretable morphotypes rather than from a set of models that may not be available or may yield slightly different results. Both our models (glmnet and multi-objective Morphotype analysis) displayed good generalization, with most external validation AUC intervals overlapping with cross-validated AUC estimates (Fig. 7c, d). We note that including morphotypes found to be statistically unstable in the original discovery step led to a deterioration of validation accuracy in the disease detection task. Finally, single-objective Morphotype analysis generalized worse even when limited to stable morphotypes (Supplementary Fig. S16), indicating that learning multiple tasks simultaneously yields more robust morphotypes.
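
The comparison above rests on interval estimates for AUC. The interval method is not specified in this section; one common choice is a percentile bootstrap over cases, sketched below under that assumption with a hypothetical validation set and hypothetical helper names (`auc_rank`, `bootstrap_auc_ci`).

```python
import numpy as np

def auc_rank(y_true, scores):
    """Mann-Whitney AUC: P(score_pos > score_neg), ties counted as 0.5."""
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling cases with replacement."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, n)
        yb = y_true[idx]
        if yb.all() or not yb.any():  # skip resamples containing one class only
            continue
        aucs.append(auc_rank(yb, scores[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return auc_rank(y_true, scores), (lo, hi)

# Hypothetical external validation set of 60 slides with noisy scores.
rng = np.random.default_rng(42)
y = np.array([1] * 25 + [0] * 35, dtype=bool)
s = np.where(y, rng.normal(1.0, 1.0, y.size), rng.normal(0.0, 1.0, y.size))
auc, (lo, hi) = bootstrap_auc_ci(y, s)
print(f"AUC = {auc:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

With only a few dozen validation slides, such intervals are wide, which is one reason why overlap with cross-validated estimates is a more informative check than a point comparison.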

Discussion

We present an automated protocol for the detection and characterization of thousands of blood cells in PBS linked with machine-learning methods that can use these cellular descriptions to distinguish between clinical conditions and identify novel associations between cytomorphological phenotypes and clinical diagnoses. Importantly, we show that our approach generalizes to other centers and scanners.

Haemorasis, our open-source method to extract and characterize large numbers of WBC and RBC from digitized PBS, demonstrates how this can be automated with no recourse to proprietary software. We make it publicly available as a Docker container, enabling its straightforward application. Using Haemorasis, we detect and characterize over half a million WBC and millions of RBC. With morphometric moments (the mean and variance of morphometric features for each PBS) we show the diagnostic importance of cytomorphological heterogeneity for various conditions. This observation bears similarity to previous reports that associated RDW with increased AML transformation risk^42,45. It is also worth considering that quantifying morphological variation, especially of subtle features, is likely to be challenging to achieve by visual assessment, as it requires the absolute quantification and evaluation of large numbers of cells.

To establish disease-associated cytomorphological changes of RBC and WBC, we developed morphotype analysis and applied it to over half a million WBC and millions of RBC. This showed that MDS cases with a higher prevalence of hypolobulated neutrophils and larger RBC are more likely to harbor SF3B1 mutations. The latter finding corroborates previous reports of higher mean corpuscular volume (MCV) in SF3B1-mutant MDS compared with other MDS subtypes^46,47, highlighting the role of SF3B1 mutations in erythropoiesis⁴⁸ and the potential role of PBS RBC morphology in diagnosis. Additionally, neutrophil hyperlobulation was robustly detectable not only in MA but also in IDA^49,50, demonstrating that computational analysis of large numbers of cells can detect this feature even when it is subtle and would otherwise require experts to enumerate large numbers of neutrophils and their lobe counts⁵⁰. We also observed larger neutrophils in MA, highlighting a common mechanism behind the enlargement of both RBC and neutrophils in this condition. Reassuringly, morphotypes are enriched in known cell types, and our approach validates externally, generalizing to PBS from other centers digitized using different slide scanners. Nonetheless, further studies using more diverse training and validation data are required to validate the clinical relevance of the described morphotypes and to accommodate possible preparation- and scanner-specific artifacts^51,52. Furthermore, while we identify new and previously known morphological trends, trends pertaining to rarer cell types may require approaches more sensitive to rare cells. Here, we tried to maximize the explainability of our models by retrieving morphotypes and to enhance their generalization by using stable morphotypes, which eliminate unlikely solutions specific to subsets (folds) of the training data. However, artificial intelligence models, more often than not, fail in clinical settings^53,54, so the path forward should include the clinical application of these models as point-of-care solutions with larger sample sizes. Additionally, ongoing efforts to establish multicentre cohorts and tissue banks for MDS, such as the National MDS Natural History Study⁵⁵, could provide a better assessment of how well computational cytomorphology performs in routine MDS diagnosis.

It is also important to point out some additional limitations of our method. Firstly, the blood cell detection techniques, ranging from WBC and RBC detection to nuclear segmentation in WBC and morphometric characterization, could be improved. For example, hierarchical multi-resolution deep-learning architectures based on transformers could replace predefined features, which can bias results, as recently shown in histopathology⁵⁶. Our protocol was designed to mimic the typical steps of hematological assessment: detecting relevant regions of the slide/image, identifying individual cells and analyzing their morphology. Other steps were pragmatic: owing to their large numbers, annotating RBC to train a dedicated detection algorithm would be time-consuming, so we opted for a fast and simple, but perhaps less accurate, detection protocol. With appropriately annotated data, RBC detection could be further improved with supervised deep-learning models. Lastly, characterizing blood cells and nuclei with self-supervised or unsupervised deep-learning methods might also help avoid some of the biases introduced by predefined feature sets.

Notwithstanding the limitations discussed above, our work provides proof-of-principle that computational cytomorphology can augment the ability of automated blood cell analyses to identify abnormalities suggestive of hematological disease, with minimal additional cost. This can help identify patients needing further and usually more invasive and expensive testing, such as bone marrow aspirates or genomic sequencing. Recent applications of computational cytomorphology on bone marrow smears have demonstrated how it can identify leukocytes^31,57 and assist diagnostic predictions^28,29,30 in specialized haemato-oncology. By demonstrating that this can now be extended to blood smears/slides, our work reveals the potential for the large-scale incorporation of automated cytomorphology into routine diagnostic workflows.

Methods

Collection and digitization of peripheral blood slides

Three retrospective sets of PBS with coverslips from two different centers were digitized at 40× magnification using two different slide scanners:

Training/discovery

∘ CUH1: used for training of cell detection. 54 PBS from randomly selected cases. PBS were automatically prepared at Cambridge University Hospitals (CUH) and scanned using a Hamamatsu NanoZoomer 2.0 (ndpi format).

∘ MLL: used for training the disease prediction models and discovery of disease-associated morphologies (or computational morphotypes, as detailed below in the Methods). 362 PBS from individuals with MDS with mutations in either SF3B1, SRSF2, U2AF1 or RUNX1, iron deficiency anemia (IDA), megaloblastic anemia (MA) and hematological controls. PBS were manually prepared at the Munich Leukemia Laboratory and scanned using a Hamamatsu NanoZoomer 2.0 (ndpi format).

Validation

∘ CUH2: used for validation of disease predictions. 68 PBS from individuals with MDS with mutations in either SF3B1 or SRSF2, IDA, MA, and hematologically normal controls. The PBS were prepared manually (MDS) or automatically (controls, anemias) at CUH and digitized using an Aperio AT2 (svs format).

MLL data were collected and digitized with individual informed consent for research purposes; the study was reviewed and approved by the Munich Leukemia Laboratory’s internal institutional review board and follows the European Union’s General Data Protection Regulation (GDPR). For Addenbrooke’s (CUH) data, the study was approved by the National Health Service Health Research Authority and Health and Care Research Wales (Research Ethics Committee reference: 23/PR/0578).

We either digitized the entire PBS or selected the region of the PBS containing blood cells. Each slide was inspected and excluded if of insufficient quality (Supplementary Methods); the final cohort composition is presented in Table 3. Details on age, sex, blood counts and clinical diagnosis for MLL samples, and on blood counts and clinical diagnosis for CUH2 samples, are listed in Supplementary Tables S1 and S2, respectively. To the best of our knowledge, no treatment had been administered at the time of sample collection and slide preparation.

Table 3 Condition-specific composition of each cohort. Numbers in brackets represent the total after excluding poor quality digitized PBS

Haemorasis—computational detection and characterization of blood cells

Our blood cell detection and analysis pipeline—Haemorasis—assesses small, computationally tractable parts of the large PBS scans and consists of the following four steps (Supplementary Methods):

(1)
Quality control to detect informative \(512\times 512\) pixel areas (tiles) on each slide, based on DenseNet121⁵⁸. This step removes tiles in which cell density is too low (very few or no cells) or too high (many overlapping cells), or in which the image is blurred (Supplementary Fig. S1). It ensures that the analyzed area corresponds approximately to the monolayer, which is the area hematologists are recommended to analyze⁵⁹.
(2)
Red blood cell detection on tiles using a combination of Canny edge detection⁶⁰ and other simple computer vision operations. Other objects (platelet clumps, groups of RBCs or individual WBCs) are then filtered out with an XGBoost model³⁶. Its features are the morphometric characteristics described in step (4) below and, in more detail, in the Supplementary Methods. In essence, we generated a set of candidate RBCs, which were then categorized according to their morphometry, an approach also taken by other methods⁶¹⁻⁶⁶.
(3)
White blood cell detection and segmentation from tiles was based on U-Net³⁵, a popular and robust algorithm for cell segmentation⁶⁷. WBC nuclei were then segmented by k-means clustering (k = 2) of WBC pixels. This assumes, as shown by others⁶⁸, that the darker cluster corresponds to the nucleus.
(4)
Morphometric characterization of RBC and WBC was performed using well-established morphometric features available in popular bioimage analysis packages³⁷⁻³⁹. In total, 53 features were calculated for each WBC (42 for cellular and 11 for nuclear characterization) and 42 features for each RBC (Supplementary Methods).
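The tiling, nucleus-splitting and shape-description steps above can be illustrated with a minimal pure-Python sketch. It is not the Haemorasis implementation and rests on several assumptions:

* tiles are non-overlapping \(512\times 512\) squares, and incomplete edge strips are skipped;
* the image is grayscale, with lower values being darker;
* the WBC mask is already given by a segmentation model. The DenseNet121 quality-control network, the U-Net and the XGBoost filter are not reproduced.

All function names are hypothetical. The three shape features shown (area, centroid and moment-based eccentricity) are only a small illustrative subset of the 53 WBC and 42 RBC features.

```python
import math
from typing import Dict, Iterator, List, Tuple

Mask = List[List[bool]]


def tile_origins(height: int, width: int, size: int = 512) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (row, col) corner of each non-overlapping size x size tile.

    Edge strips narrower than `size` are skipped (an assumption of this sketch).
    """
    for row in range(0, height - size + 1, size):
        for col in range(0, width - size + 1, size):
            yield row, col


def kmeans_two_clusters(values: List[float], n_iter: int = 50) -> Tuple[List[int], List[float]]:
    """1-D k-means with k = 2, initialised at the darkest and brightest values.

    With this initialisation, cluster 0 always keeps the lower (darker) centre.
    """
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each value to its nearest centre (ties go to cluster 0).
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]
        # Recompute centres; keep the previous centre if a cluster becomes empty.
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def segment_nucleus(gray: List[List[float]], cell_mask: Mask) -> Mask:
    """Split a WBC into nucleus and cytoplasm: the darker intensity cluster is the nucleus."""
    coords = [(r, c) for r, row in enumerate(cell_mask) for c, inside in enumerate(row) if inside]
    labels, _ = kmeans_two_clusters([gray[r][c] for r, c in coords])
    nucleus = [[False] * len(row) for row in cell_mask]
    for (r, c), lab in zip(coords, labels):
        nucleus[r][c] = lab == 0
    return nucleus


def shape_features(mask: Mask) -> Dict[str, object]:
    """Area, centroid and eccentricity from the second-order central moments of a mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, inside in enumerate(row) if inside]
    if not pts:
        raise ValueError("empty mask")
    area = len(pts)
    cr = sum(r for r, _ in pts) / area
    cc = sum(c for _, c in pts) / area
    mu_rr = sum((r - cr) ** 2 for r, _ in pts) / area
    mu_cc = sum((c - cc) ** 2 for _, c in pts) / area
    mu_rc = sum((r - cr) * (c - cc) for r, c in pts) / area
    # Eigenvalues of the 2x2 covariance matrix [[mu_rr, mu_rc], [mu_rc, mu_cc]].
    half_trace = (mu_rr + mu_cc) / 2
    disc = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    lam_major, lam_minor = half_trace + disc, half_trace - disc
    ecc = math.sqrt(max(0.0, 1 - lam_minor / lam_major)) if lam_major > 0 else 0.0
    return {"area": area, "centroid": (cr, cc), "eccentricity": ecc}
```

Initializing the two centres at the darkest and brightest pixel keeps the cluster order fixed. The nucleus can therefore be taken as cluster 0 without an extra sorting step.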

Cytomorphological prediction of clinical conditions

We assess the predictive performance of Haemorasis with four binary prediction tasks:

1.
Disease detection—identifying the presence of either deficiency anemia or MDS vs. normal blood;
2.
Disease classification—distinguishing between deficiency anemia and MDS;
3.
MDS genetic subtyping: distinguishing between SF3B1-mutant and SF3B1-wildtype (SF3B1-wt) MDS;
4.
Anemia classification—distinguishing between IDA and MA.

Machine-learning using morphometric moments

We assess the predictive performance of morphometric moments using elastic-net regression (glmnet)⁴⁰ with 5-fold cross-validation on MLL, reporting the cross-validated area under the receiver operating characteristic curve (AUC). Morphometric moments are the mean and variance of each feature for each cell type (RBC and WBC). They were calculated separately for each PBS and serve as a summary of each feature's distribution on that PBS. All features are standardized before model fitting. We also test how adding blood counts affects classification performance: WBC counts (WBCC; cells/µL), hemoglobin concentration (Hb; g/dL) and platelet counts (Plt; platelets/µL). Finally, we assess the contribution of individual features and groups of features to prediction (Supplementary Methods).
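The per-slide summary and its evaluation can be sketched in pure Python. This is not the authors' code and does not reproduce the glmnet elastic-net fit. It only shows, under stated assumptions, how to:

* compute morphometric moments for one cell type on one slide;
* standardize them across slides;
* split slides into five cross-validation folds;
* score held-out predictions with the AUC.

The function names are hypothetical. Using the population variance and a seeded random fold assignment are assumptions of this sketch.

```python
import math
import random
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def morphometric_moments(cells: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean and (population) variance of every feature across one slide's cells."""
    moments = {}
    for name in sorted(cells[0]):
        values = [cell[name] for cell in cells]
        moments[f"{name}_mean"] = fmean(values)
        moments[f"{name}_var"] = pvariance(values)
    return moments


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each column (feature) across slides; constant columns map to 0."""
    cols = list(zip(*rows))
    means = [fmean(col) for col in cols]
    sds = [math.sqrt(pvariance(col)) or 1.0 for col in cols]  # guard zero variance
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle slide indices once and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In a strict cross-validation, the means and standard deviations used by `standardize` would be estimated on the training folds only and then applied to the held-out fold. This keeps information from the test slides out of model fitting.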

Morphotype analysis

Diagnosing hematological conditions from a PBS can be abstracted as follows. Given a set of objects (cells), each object is assigned to a class (cell type), and the presence or relative prevalence of different cell types indicates specific hematological conditions. This can be framed as a problem of multiple instance learning (MIL), a machine-learning field that focuses on classifying a set of objects based on its composition⁶⁹.

Building on this, we devised Morphotype analysis, an approach that (i) identifies relevant morphological classes of cells (computational morphotypes, CMs) without relying on human cell annotation and (ii) distinguishes between conditions using CM proportions. Morphotype analysis can also incorporate other data types (here, blood counts; Supplementary Methods). We consider WBC and RBC separately, deriving separate WBC and RBC CMs. Because the model is continuous, we optimized Morphotype analysis by gradient descent, specifically with Adam⁷⁰.

We test Morphotype analysis in two settings: with a single set of CMs shared across the four tasks specified above (multiple objectives, MO), and with a different model for each task (single objective, SO). We also assess how the number of CMs affects prediction, using 25 and 50 CMs for MO and 10, 25 and 50 for SO. For validation, we considered only stable CMs, that is, less biased CMs that are consistently detected across different cross-validation folds (Supplementary Methods). As with our models using morphometric moments, we tested the effect of blood counts on classification predictions.
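A minimal forward pass illustrates how CM proportions can turn a bag of cells into a slide-level prediction. This is a hypothetical sketch, not the authors' PyTorch model, which is described in the Supplementary Methods. It assumes that:

* a linear layer followed by a softmax softly assigns each cell to K CMs;
* the slide representation is the average of these assignments (the CM proportions);
* a logistic layer maps the proportions, optionally concatenated with blood counts, to a class probability.

All parameter names are illustrative, and training with Adam is not shown.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cm_proportions(
    cells: List[List[float]], weights: List[List[float]], bias: List[float]
) -> List[float]:
    """Average soft assignment of a slide's cells to K computational morphotypes.

    `weights` has K rows of length d (number of cell features); `bias` has length K.
    The result has K non-negative entries that sum to 1.
    """
    totals = [0.0] * len(weights)
    for x in cells:
        logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b for w, b in zip(weights, bias)]
        for j, p in enumerate(softmax(logits)):
            totals[j] += p
    return [t / len(cells) for t in totals]


def predict_slide(cells, weights, bias, clf_weights, clf_bias, extra=()):
    """Positive-class probability from CM proportions plus optional extra features
    (e.g. standardized blood counts), via a logistic layer."""
    features = cm_proportions(cells, weights, bias) + list(extra)
    score = sum(w * f for w, f in zip(clf_weights, features)) + clf_bias
    return 1.0 / (1.0 + math.exp(-score))
```

Every operation here is differentiable. Gradients from a slide-level loss can therefore reach the cell-level assignment weights, which lets such a model be fitted by gradient descent without any cell-level labels.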

Validation

Expert annotation of blood cells

Three expert clinical hematologists annotated 1746 RBC and 1600 WBC automatically detected in MLL PBS to assess whether CMs were enriched in any expert-annotated blood cell type. RBC and WBC were annotated as belonging to a set of classes including normal and abnormal cell types and artefacts (Supplementary Methods). The enrichment of a given expert-annotated cell type in each CM was calculated as the proportion of that CM's cells annotated as this cell type (type/CM), divided by the proportion of this cell type in the entire set of expert-annotated cells (type/total).
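Reading (type/CM) as the fraction of a CM's cells that carry a given annotation, the enrichment ratio is \(P(\text{type}\mid\text{CM})/P(\text{type})\). Values above 1 indicate that the type is over-represented in that CM. The short sketch below computes this from paired CM assignments and expert labels. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def cm_enrichment(
    cm_labels: Sequence[Hashable], cell_types: Sequence[Hashable]
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Enrichment of each expert-annotated cell type in each CM.

    enrichment = (type/CM) / (type/total), where type/CM is the fraction of the
    CM's cells annotated as `type` and type/total is that type's overall fraction.
    Only (CM, type) pairs that co-occur are returned.
    """
    n = len(cell_types)
    type_totals = Counter(cell_types)
    cm_totals = Counter(cm_labels)
    pairs = Counter(zip(cm_labels, cell_types))
    return {
        (cm, t): (count / cm_totals[cm]) / (type_totals[t] / n)
        for (cm, t), count in pairs.items()
    }


cms = ["A", "A", "A", "B", "B", "B", "B", "B"]
types = ["neu", "neu", "lym", "lym", "lym", "lym", "neu", "lym"]
enrichment = cm_enrichment(cms, types)
# CM "A": 2 of its 3 cells are "neu", vs 3 of 8 overall -> (2/3) / (3/8) = 16/9
print(round(enrichment[("A", "neu")], 3))  # 1.778
```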

External validation

To externally validate the performance of our predictive methodologies (glmnet with morphometric moments and Morphotype analysis), we tested the best-performing models on CUH2. We report their AUC estimates with standard errors (calculated as \(\frac{1}{n}\), where \(n\) is the size of the validation sample).

Statistical analysis

All statistical analyses in this work were conducted using the R statistical software (v3.6.3)⁷¹. The MASS package⁷² was used to calculate robust R and dunn.test⁷³ was used to calculate Dunn-Bonferroni tests. Machine-learning models were implemented in R (glmnet package⁴⁰) or, for Morphotype analysis, in Python⁷⁴ (using PyTorch⁷⁵).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The digitized PBS image data generated in this study and used for training (MLL) have been deposited in the BioImage Archive database under accession code S-BIAD440. The annotated datasets for tile quality classification, white blood cell segmentation and red blood cell filtering are available at https://doi.org/10.6084/m9.figshare.19153760. The machine-learning model parameters are available at https://doi.org/10.6084/m9.figshare.19164209. The data necessary to run Morphotype analysis are available at https://doi.org/10.6084/m9.figshare.19372292. The output of the Morphotype analysis together with the expert-annotated cells is available at https://doi.org/10.6084/m9.figshare.19369391. The data necessary for downstream analysis are available at https://doi.org/10.6084/m9.figshare.19371008. An online platform for morphotype visualization is available at https://josegcpa.github.io/haemorasis-umap, and the data supporting it are available at https://github.com/josegcpa/json-haemorasis.

Code availability

We have made the Haemorasis pipeline available at https://github.com/josegcpa/haemorasis and as a Docker container at https://hub.docker.com/repository/docker/josegcpa/blood-cell-detection (Supplementary Methods). Morphotype analysis (mil-comori) and the statistical analysis and plot generation code (analysis-plotting) are available on Zenodo⁷⁶. The code for the quality control network is available at https://github.com/josegcpa/quality-net. The code for the U-Net is available at https://github.com/josegcpa/u-net-tf2.

References

Bain, B. J. Blood Cells: A Practical Guide. (John Wiley & Sons, 2014).
Valent, P. et al. Proposed minimal diagnostic criteria for myelodysplastic syndromes (MDS) and potential pre-MDS conditions. Oncotarget 8, 73483–73500 (2017).
Hofmann, W.-K. & Koeffler, H. P. Myelodysplastic syndrome. Annu. Rev. Med. 56, 1–16 (2005).
Garcia-Manero, G., Chien, K. S. & Montalban-Bravo, G. Myelodysplastic syndromes: 2021 update on diagnosis, risk stratification and management. Am. J. Hematol. 95, 1399–1420 (2020).
Cremers, E. M. P. et al. Multiparameter flow cytometry is instrumental to distinguish myelodysplastic syndromes from non-neoplastic cytopenias. Eur. J. Cancer 54, 49–56 (2016).
Porwit, A. et al. Revisiting guidelines for integration of flow cytometry results in the WHO classification of myelodysplastic syndromes—proposal from the International/European LeukemiaNet Working Group for Flow Cytometry in MDS. Leukemia 28, 1793–1798 (2014).
Najean, Y. & Lecompte, T. Chronic pure thrombocytopenia in elderly patients. An aspect of the myelodysplastic syndrome. Cancer 64, 2506–2510 (1989).
Campo, E. & Harris, N. L. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. (International Agency for Research on Cancer, 2017).
Kaferle, J. & Strzoda, C. E. Evaluation of macrocytosis. Am. Fam. Physician 79, 203–208 (2009).
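Vašeková, P., Szépe, P., Marcinek, J., Balhárek, T. & Plank, L. Klinicky relevantné možnosti a limity diferenciálnej diagnostiky megaloblastovej anémie a myelodysplastického syndrómu typu refraktérnej anémie v trepanobioptických vzorkách kostnej drene [Clinically relevant possibilities and limits of the differential diagnosis of megaloblastic anemia and myelodysplastic syndrome of the refractory anemia type in bone marrow trephine biopsy samples]. Vnitr. Lek. 62, 692–697 (2016).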
Corey, S. J. et al. Myelodysplastic syndromes: the complexity of stem-cell diseases. Nat. Rev. Cancer 7, 118–129 (2007).
Devalia, V., Hamilton, M. S., Molloy, A. M. & British Committee for Standards in Haematology. Guidelines for the diagnosis and treatment of cobalamin and folate disorders. Br. J. Haematol. 166, 496–513 (2014).
Platzbecker, U. Treatment of MDS. Blood 133, 1096–1107 (2019).
Uy, N., Singh, A., Gore, S. D. & Prebet, T. Hypomethylating agents (HMA) treatment for myelodysplastic syndromes: alternatives in the frontline and relapse settings. Expert Opin. Pharmacother. 18, 1213–1224 (2017).
Greenberg, P. L. et al. Revised international prognostic scoring system for myelodysplastic syndromes. Blood 120, 2454–2465 (2012).
Malcovati, L. et al. SF3B1 mutation identifies a distinct subset of myelodysplastic syndrome with ring sideroblasts. Blood 126, 233–241 (2015).
Malcovati, L. et al. SF3B1-mutant MDS as a distinct disease subtype: a proposal from the International Working Group for the Prognosis of MDS. Blood 136, 157–170 (2020).
Papaemmanuil, E. et al. Clinical and biological implications of driver mutations in myelodysplastic syndromes. Blood 122, 3616–3627 (2013).
Haferlach, T. et al. Landscape of genetic lesions in 944 patients with myelodysplastic syndromes. Leukemia 28, 241–247 (2014).
Langenhuijsen, M. M. Neutrophils with ring-shaped nuclei in myeloproliferative disease. Br. J. Haematol. 58, 227–230 (1984).
Kuriyama, K., Tomonaga, M., Matsuo, T., Ginnai, I. & Ichimaru, M. Diagnostic significance of detecting pseudo-Pelger-Huët anomalies and micro-megakaryocytes in myelodysplastic syndrome. Br. J. Haematol. 63, 665–669 (1986).
Davey, F. R., Erber, W. N., Gatter, K. C. & Mason, D. Y. Abnormal neutrophils in acute myeloid leukemia and myelodysplastic syndrome. Hum. Pathol. 19, 454–459 (1988).
de Swart, L. et al. Cytomorphology review of 100 newly diagnosed lower-risk MDS patients in the European LeukemiaNet MDS (EUMDS) registry reveals a high inter-observer concordance. Ann. Hematol. 96, 1105–1112 (2017).
Howe, R. B., Porwit-MacDonald, A., Wanat, R., Tehranchi, R. & Hellström-Lindberg, E. The WHO classification of MDS does make a difference. Blood 103, 3265–3270 (2004).
Goasguen, J. E. et al. Morphological evaluation of monocytes and their precursors. Haematologica 94, 994–997 (2009).
Foucar, K. et al. Concordance among hematopathologists in classifying blasts plus promonocytes: A bone marrow pathology group study. Int. J. Lab. Hematol. 42, 418–422 (2020).
Zini, G. et al. A European consensus report on blood cell identification: terminology utilized and morphological diagnosis concordance among 28 experts from 17 countries within the European LeukemiaNet network WP10, on behalf of the ELN Morphology Faculty. Br. J. Haematol. 151, 359–364 (2010).
Brück, O. E. et al. Machine Learning of Bone Marrow Histopathology Identifies Genetic and Clinical Determinants in Patients with MDS. Blood Cancer Discov. 2, 238–249 (2021).
Eckardt, J.-N. et al. Deep learning detects acute myeloid leukemia and predicts NPM1 mutation status from bone marrow smears. Leukemia https://doi.org/10.1038/s41375-021-01408-w (2021).
Nagata, Y. et al. Machine learning demonstrates that somatic mutations imprint invariant morphologic features in myelodysplastic syndromes. Blood 136, 2249–2262 (2020).
Matek, C., Krappe, S., Münzenmayer, C., Haferlach, T. & Marr, C. Highly accurate differentiation of bone marrow cell morphologies using deep neural networks on a large image data set. Blood 138, 1917–1927 (2021).
Rollison, D. E. et al. Epidemiology of myelodysplastic syndromes and chronic myeloproliferative disorders in the United States, 2001-2004, using data from the NAACCR and SEER programs. Blood 112, 45–52 (2008).
Castle, W. B. Megaloblastic anemia. Postgrad. Med. 64, 117–122 (1978).
Torrez, M., Chabot-Richards, D., Babu, D., Lockhart, E. & Foucar, K. How I investigate acquired megaloblastic anemia. Int. J. Lab. Hematol. 44, 236–247 (2022).
Falk, T. et al. U-Net: deep learning for cell counting, detection, and morphometry. Nat. Methods 16, 67–70 (2018).
Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. arXiv [cs.LG] (2016).
Sommer, C., Straehle, C., Kothe, U. & Hamprecht, F. A. Ilastik: interactive learning and segmentation toolkit. in Proceedings - International Symposium on Biomedical Imaging https://doi.org/10.1109/ISBI.2011.5872394 (2011).
Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).
Mingqiang, Y., Kidiyo, K. & Joseph, R. A Survey of Shape Feature Extraction Techniques. in Pattern Recognition Techniques, Technology and Applications (2008).
Friedman, J., Hastie, T., Tibshirani, R. & Simon, N. Package ‘glmnet’. (2019).
Stone, M. Cross-Validatory Choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B Stat. Methodol. 36, 111–147 (1974).
Abelson, S. et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature 559, 400–404 (2018).
Pollyea, D. A., Hedin, B. R., O’Connor, B. P. & Alper, S. Monocyte function in patients with myelodysplastic syndrome. J. Leukoc. Biol. 104, 641–647 (2018).
Silzle, T. et al. Lymphopenia at diagnosis is highly prevalent in myelodysplastic syndromes and has an independent negative prognostic value in IPSS-R-low-risk patients. Blood Cancer J. 9, 63 (2019).
Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
Cui, R. et al. Clinical importance of SF3B1 mutations in Chinese with myelodysplastic syndromes with ring sideroblasts. Leuk. Res. 36, 1428–1433 (2012).
Cazzola, M. et al. Natural history of idiopathic refractory sideroblastic anemia. Blood 71, 305–312 (1988).
Clough, C. A. et al. Coordinated missplicing of TMEM14C and ABCB7 causes ring sideroblast formation in SF3B1-mutant myelodysplastic syndrome. Blood 139, 2038–2049 (2022).
Lindenbaum, J. & Nath, B. J. Megaloblastic anaemia and neutrophil hypersegmentation. Br. J. Haematol. 44, 511–513 (1980).
Westerman, D. A., Evans, D. & Metz, J. Neutrophil hypersegmentation in iron deficiency anaemia: a case-control study. Br. J. Haematol. 107, 512–515 (1999).
van der Laak, J., Litjens, G. & Ciompi, F. Deep learning in histopathology: the path to the clinic. Nat. Med. 27, 775–784 (2021).
Kobayashi, S., Saltz, J. H. & Yang, V. W. State of machine and deep learning in histopathological applications in digestive diseases. World J. Gastroenterol. 27, 2545–2575 (2021).
Cohen, J. P. et al. Problems in the deployment of machine-learned models in health care. CMAJ 193, E1391–E1394 (2021).
Volovici, V., Syn, N. L., Ercole, A., Zhao, J. J. & Liu, N. Steps to avoid overuse and misuse of machine learning in clinical research. Nat. Med. 28, 1996–1999 (2022).
Sekeres, M. A. et al. The National MDS Natural History Study: design of an integrated data and sample biorepository to promote research studies in myelodysplastic syndromes. Leuk. Lymphoma 60, 3161–3171 (2019).
Chen, R. J. et al. Scaling vision Transformers to gigapixel images via hierarchical self-supervised learning. arXiv [cs.CV] (2022).
Matek, C., Schwarz, S., Spiekermann, K. & Marr, C. Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nat. Mach. Intell. 1, 538–544 (2019).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2261–2269 (2017).
Abramson, N. Rouleaux formation. Blood 107, 4205 (2006).
Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–698 (1986).
Alomari, Y. M., Sheikh Abdullah, S. N. H., Zaharatul Azma, R. & Omar, K. Automatic detection and quantification of WBCs and RBCs using iterative structured circle detection algorithm. Comput. Math. Methods Med. 2014, 979302 (2014).
Elsalamony, H. A. Healthy and unhealthy red blood cell detection in human blood smears using neural networks. Micron 83, 32–41 (2016).
Tomari, R., Zakaria, W. N. W., Jamil, M. M. A., Nor, F. M. & Fuad, N. F. N. Computer aided system for red blood cell classification in blood smear image. Procedia Comput. Sci. 42, 206–213 (2014).
Delgado-Font, W. et al. Diagnosis support of sickle cell anemia by classifying red blood cell shape in peripheral blood images. Med. Biol. Eng. Comput. 58, 1265–1284 (2020).
Sunarko, B. et al. Red blood cell classification on thin blood smear images for malaria diagnosis. J. Phys. Conf. Ser. 1444, 012036 (2020).
Chadha, G. K., Srivastava, A., Singh, A., Gupta, R. & Singla, D. An automated method for counting red blood cells using image processing. Procedia Comput. Sci. 167, 769–778 (2020).
Caicedo, J. C. et al. Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl. Nat. Methods 16, 1247–1253 (2019).
Andrade, A. R. et al. Recent computational methods for white blood cell nuclei segmentation: a comparative study. Comput. Methods Prog. Biomed. 173, 1–14 (2019).
Amores, J. Multiple instance classification: review, taxonomy and comparative study. Artif. Intell. 201, 81–105 (2013).
Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. arXiv [cs.LG] (2014).
R Core Team. R: A Language and Environment for Statistical Computing. https://www.R-project.org (2020).
Venables, W. N. & Ripley, B. D. MASS: modern applied statistics with S. R package version.
Dinno, A. dunn. test: Dunn’s test of multiple comparisons using rank sums. R package version.
Van Rossum, G. & Drake, F. L. The Python Language Reference. Python Software Foundation.
Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8026–8037 (2019).
Almeida, J. G. josegcpa/wbs-prediction: PBS-Prediction-Code-final. https://doi.org/10.5281/zenodo.7276598 (2022).
Bankhead, P. et al. QuPath: open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).


Acknowledgements

J.G.A. was supported by the NIHR Cambridge BRC and their opinions are not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. G.S.V. is funded by a Cancer Research UK Senior Cancer Fellowship (C22324/A23015) and work in his lab is also funded by the European Research Council, Kay Kendall Leukaemia Fund, Blood Cancer UK and the Wellcome Trust. We would like to acknowledge Drs. Martin Besser, James Russell, and Duncan Brian for their efforts in annotating blood cells.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

José Guilherme de Almeida
Present address: Champalimaud Foundation—Centre for the Unknown, Lisbon, Portugal

Authors and Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
José Guilherme de Almeida & Moritz Gerstung
Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
Emma Gudgin, Martin Besser & William G. Dunn
Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
Jonathan Cooper & George S. Vassiliou
Munich Leukemia Laboratory GmbH, Munich, Germany
Torsten Haferlach
Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
George S. Vassiliou
Department of Haematology, University of Cambridge, Cambridge, UK
George S. Vassiliou
Division of AI in Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
Moritz Gerstung


Contributions

Contribution: M.G., G.S.V., and J.G.A. conceived the project and wrote the manuscript. J.G.A. developed and implemented the project. J.G.A., E.G., and J.C. scanned peripheral blood slides. T.H. provided peripheral blood slides and assisted with scanning and retrieval of blood count data for the Munich Leukemia Laboratory Set. W.G.D. retrieved blood count data. M.B. and E.G. annotated white blood cells.

Corresponding authors

Correspondence to George S. Vassiliou or Moritz Gerstung.

Ethics declarations

Competing interests

G.S.V. is a consultant for AstraZeneca and STRM.BIO. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Guillermo Garcia-Manero and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

de Almeida, J.G., Gudgin, E., Besser, M. et al. Computational analysis of peripheral blood smears detects disease-associated cytomorphologies. Nat Commun 14, 4378 (2023). https://doi.org/10.1038/s41467-023-39676-y


Received: 27 April 2022
Accepted: 22 June 2023
Published: 20 July 2023
DOI: https://doi.org/10.1038/s41467-023-39676-y
