Scientists tackle single-cell data's reliability crisis with new tool 'scICE'

Sadie Harley
scientific editor

Robert Egan
associate editor

The ability to analyze gene expression at the single-cell level, known as single-cell RNA sequencing (scRNA-seq), has transformed life sciences, driving discoveries across immunology, oncology, and developmental biology. Over 40,000 studies have leveraged this technique to map the complex diversity of cells within tissues and organisms.
Yet beneath this explosive growth lies a persistent problem: clustering instability. When researchers attempt to group cells by expression patterns to identify cell types or disease states, they often face inconsistent results, even when analyzing the same dataset repeatedly.
Inaccurate clustering can lead to misclassifying normal cells as cancerous or missing rare but critical cell types, jeopardizing interpretation and therapeutic decisions. This "reliability crisis" forces scientists to rerun analyses or rely on computationally expensive pipelines to extract trustworthy insights.
Now, a research team led by Professor Kim Jae Kyoung of the Korea Advanced Institute of Science and Technology (KAIST) and the Institute for Basic Science (IBS) has developed a solution: a mathematical framework named scICE (single-cell Inconsistency Clustering Estimator). The study is published in the journal Nature Communications.
Traditionally, clustering reliability is assessed by deriving a consensus through repeated analysis of whether individual cell pairs are classified into the same cluster. However, because the number of cell pairs grows with the square of the number of cells, this approach is computationally demanding and ill-suited to large-scale datasets with tens of thousands of cells.
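To make the cost of the traditional approach concrete, here is a minimal Python sketch of a pairwise co-clustering (consensus) matrix. The function name `co_clustering_matrix` and the toy labelings are illustrative assumptions, not taken from the study; the point is only that the matrix has one entry per pair of cells, so memory and time grow quadratically with dataset size.

```python
def co_clustering_matrix(labelings):
    """Return an n_cells x n_cells matrix whose entry (i, j) is the
    fraction of clustering runs in which cells i and j share a cluster.

    `labelings` is a list of runs; each run is a list of cluster labels,
    one label per cell. (Hypothetical helper for illustration.)
    """
    n_runs = len(labelings)
    n_cells = len(labelings[0])
    consensus = [[0.0] * n_cells for _ in range(n_cells)]
    for labels in labelings:
        for i in range(n_cells):
            for j in range(n_cells):
                # Count this run if both cells received the same label.
                if labels[i] == labels[j]:
                    consensus[i][j] += 1.0
    # Convert counts to fractions of runs.
    return [[count / n_runs for count in row] for row in consensus]


# Three toy clustering runs over five cells (made-up labels).
runs = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
consensus = co_clustering_matrix(runs)
```

In this toy input, cells 0 and 1 are grouped together in every run, while cells 0 and 4 never are. With tens of thousands of cells, the same matrix would hold hundreds of millions of entries per dataset, which is the bottleneck the article describes.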
In contrast, scICE can be applied to large-scale datasets as it bypasses the computationally demanding process of pairwise co-clustering. It instead employs a mathematically defined Inconsistency Coefficient (IC) to assess the stability of cell assignments directly. This allows the tool to efficiently detect and filter out unreliable assignments, preserving only the most stable and biologically meaningful clusters.
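The idea of scoring stability from the cluster labels themselves, rather than from a cell-by-cell matrix, can be illustrated with a deliberately simplified Python sketch. This is not scICE's Inconsistency Coefficient, whose exact definition is given in the paper; `toy_instability` is a hypothetical stand-in that treats any two different partitions as completely dissimilar and reports the effective number of distinct partitions observed across repeated runs. A value of 1.0 means every run produced the same grouping.

```python
from collections import Counter


def canonical(labels):
    """Rename clusters by order of first appearance, so labelings that
    differ only in cluster names map to the same tuple."""
    mapping = {}
    return tuple(mapping.setdefault(label, len(mapping)) for label in labels)


def toy_instability(labelings):
    """Effective number of distinct partitions across runs.

    Hypothetical score for illustration only (NOT the scICE IC):
    1 / sum(p_k ** 2), where p_k is the fraction of runs that
    produced partition k.
    """
    counts = Counter(canonical(labels) for labels in labelings)
    n_runs = len(labelings)
    return 1.0 / sum((count / n_runs) ** 2 for count in counts.values())


# Same grouping every time, only the cluster names change.
stable_runs = [[0, 0, 1, 1], [1, 1, 0, 0], [5, 5, 7, 7]]
# Runs that disagree on where the cluster boundary lies.
mixed_runs = [[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]

stable_score = toy_instability(stable_runs)
mixed_score = toy_instability(mixed_runs)
```

The work here scales with the number of runs times the number of cells, with no cell-by-cell matrix, which is the kind of saving the article attributes to avoiding pairwise co-clustering. A real measure would also need to account for how similar two differing partitions are, something this sketch ignores.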

Dr. Kim Hyun, the first author of the paper (IBS), explained, "Reliability in single-cell clustering has long been overlooked. scICE opens a new path for quickly and easily verifying results."
The research team validated the effectiveness of scICE by applying it to 48 real and simulated scRNA-seq datasets collected from various tissues, including the brain, lungs, and blood. The results revealed that approximately two-thirds of existing analyses were statistically unstable and unreliable.
Meanwhile, scICE efficiently selected only a small number of reliable results, saving researchers' time and computational resources while maintaining high accuracy.
scICE provides a way to validate clustering outcomes mathematically, ensuring higher confidence in conclusions drawn from single-cell data. Additionally, scICE has drawn attention for its ability to effectively detect rare cell types, which are often overlooked by conventional clustering methods.
In practice, scICE reliably identified rare immune cells that can be easily missed in conventional analyses, using subclustering based on its framework.
Corresponding author Professor Kim Jae Kyoung stated, "scICE will help researchers swiftly pursue follow-up studies based on reliable results. I hope it becomes a standard tool for trustworthy data interpretation across the life sciences."
The research team made scICE publicly available on .
More information: Hyun Kim et al, scICE: enhancing clustering reliability and efficiency of scRNA-seq data with multi-cluster label consistency evaluation, Nature Communications (2025).
Journal information: Nature Communications
Provided by Institute for Basic Science