July 3, 2025

Scientists tackle single-cell data's reliability crisis with new tool 'scICE'

Quantification of clustering consistency using Inconsistency Coefficients (IC). Credit: Nature Communications (2025). DOI: 10.1038/s41467-025-60702-8

The ability to analyze gene expression at the single-cell level—known as single-cell RNA sequencing (scRNA-seq)—has transformed life sciences, driving discoveries across immunology, oncology, and developmental biology. Over 40,000 studies have leveraged this technique to map the complex diversity of cells within tissues and organisms.

Yet beneath this explosive growth lies a persistent problem: clustering instability. When researchers group cells by expression patterns to identify cell types or disease states, they often get inconsistent results, even when analyzing the same dataset repeatedly.

Inaccurate clustering can lead to misclassifying cells as cancerous or missing rare but critical cell types, jeopardizing interpretation and therapeutic decisions. This "reliability crisis" forces scientists to rerun analyses or rely on computationally expensive pipelines to extract trustworthy insights.

Now, a research team led by Professor Kim Jae Kyoung of the Korea Advanced Institute of Science and Technology (KAIST) and the Institute for Basic Science (IBS) has developed a solution: a mathematical framework named scICE (single-cell Inconsistency Clustering Estimator). The study is published in the journal Nature Communications.

Traditionally, clustering reliability is assessed by deriving a consensus through repeated analysis of whether individual cell pairs are classified into the same cluster. However, this approach is a computationally demanding process, ill-suited for large-scale datasets with tens of thousands of cells.
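To make the cost of the pairwise approach concrete, here is a minimal sketch of a co-clustering consensus computed over repeated runs. It is an illustration only, not code from the study: the function name `co_clustering_fraction` and the toy label lists are hypothetical.

```python
from itertools import combinations

def co_clustering_fraction(runs):
    """For each pair of cells, return the fraction of runs in which
    the two cells were assigned to the same cluster.

    `runs` is a list of label lists, one per repeated clustering run.
    """
    n_cells = len(runs[0])
    fractions = {}
    # Every unordered pair of cells is visited: n * (n - 1) / 2 pairs.
    for i, j in combinations(range(n_cells), 2):
        together = sum(1 for labels in runs if labels[i] == labels[j])
        fractions[(i, j)] = together / len(runs)
    return fractions

# Three hypothetical runs over five cells (toy labels, not real data).
runs = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
]
fractions = co_clustering_fraction(runs)
n_pairs = len(fractions)
```

The number of pairs grows quadratically with the number of cells: five cells give 10 pairs, while 50,000 cells would give roughly 1.25 billion, each of which has to be checked in every run.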

In contrast, scICE can be applied to large-scale datasets as it bypasses the computationally demanding process of pairwise co-clustering. It instead employs a mathematically defined Inconsistency Coefficient (IC) to assess the stability of cell assignments directly. This allows the tool to efficiently detect and filter out unreliable assignments, preserving only the most stable and biologically meaningful clusters.
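The article does not give the formula for the Inconsistency Coefficient, so the sketch below does not implement it. As a stand-in, it uses a much cruder measure, the share of runs that reproduce the most common partition, only to show how label sets from different runs can be compared directly without building a cell-by-cell matrix. The names `canonical` and `modal_partition_share` and the toy labels are hypothetical.

```python
from collections import Counter

def canonical(labels):
    """Relabel clusters by order of first appearance, so that two runs
    that found the same partition under different label names compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)

def modal_partition_share(runs):
    """Return the most frequent partition and the fraction of runs producing it.

    This is a crude stand-in for a stability score, NOT the paper's IC.
    Each run is processed in time linear in the number of cells.
    """
    counts = Counter(canonical(labels) for labels in runs)
    partition, hits = counts.most_common(1)[0]
    return partition, hits / len(runs)

# Four hypothetical runs with different random seeds (toy labels).
runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],  # same partition as the first run, labels swapped
    [0, 0, 1, 1, 0],  # last cell assigned differently
    [2, 2, 5, 5, 5],  # same partition again, different label names
]
partition, share = modal_partition_share(runs)
```

Here three of the four runs agree once label names are ignored, so `share` is 0.75. A real consistency measure such as the IC would also account for how similar the disagreeing partitions are, which this exact-match count ignores.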

Different random seeds in the clustering algorithm yield different clustering outcomes. Credit: Nature Communications (2025). DOI: 10.1038/s41467-025-60702-8

Dr. Kim Hyun of IBS, the paper's first author, explained, "Reliability in single-cell clustering has long been overlooked. scICE opens a new path for quickly and easily verifying results."

The research team validated the effectiveness of scICE by applying it to 48 real and simulated scRNA-seq datasets collected from various tissues, including the brain, lungs, and blood. The results revealed that approximately two-thirds of existing analyses were statistically unstable and unreliable.

Meanwhile, scICE efficiently selected only a small number of reliable results, saving researchers' time and computational resources while maintaining high accuracy.

scICE provides a way to validate clustering outcomes mathematically, ensuring higher confidence in conclusions drawn from single-cell data. Additionally, scICE has drawn attention for its ability to effectively detect rare cell types, which are often overlooked by conventional clustering methods.

In practice, scICE reliably identified rare immune cells that can be easily missed in conventional analyses, using subclustering based on its framework.

Corresponding author Professor Kim Jae Kyoung stated, "scICE will help researchers swiftly pursue follow-up studies based on reliable results. I hope it becomes a standard tool for trustworthy data interpretation across the life sciences."

The research team has made scICE publicly available.

More information: Hyun Kim et al, scICE: enhancing clustering reliability and efficiency of scRNA-seq data with multi-cluster label consistency evaluation, Nature Communications (2025).

Journal information: Nature Communications

