Funded by EU Horizon Europe · OSCARS Initiative

Similarity-Preserving Codes for Bioimaging Data

BIOCODES brings the ISO 24138 International Standard Content Code to bioimaging data. Verify integrity, find duplicates, and trace provenance across platforms — from raw data to publication.


Apache 2.0 · Open Source · ISO 24138:2024 · FAIR Principles

Enhancing AI-Readiness of Bioimaging Data with Content-Based Identifiers

Challenge

Growing volume of data: Bioimaging data exist in different states (raw, repository, publication) with no shared identity across them.
No Audit Trail: No reliable way to verify data integrity or detect manipulation after the fact.
Lost Provenance: Published figures are disconnected from raw data and the processing steps that produced them.

Solution

International Standard Content Code (ISCC, ISO 24138): an open, open-source, and interoperable system for content identification and fingerprinting. Because an ISCC is computed directly from the asset itself, it can never be removed or decoupled from the data.
Anyone can compute ISCCs from available data, independently and without any central authority, for example with the ISCC Generator at iscc.io/resources.
Sign and timestamp ISCCs to create persistent identifiers that securely link repository data with the figures in published papers.

Scientific Impact

Cryptographic verification of figure data via the ISCC audit trail.
Better data integrity and reusability through transparent, verifiable provenance chains.
AI-ready bioimaging datasets with verified origins, so AI models can train on data you can trace.

ISO 24138

International Standard Content Code

A standardised (ISO 24138) multi-component fingerprint for various media types and file formats. Computed from the asset itself — it can never be removed or decoupled from the data.

Semantic level

Detects conceptually related content

Syntactic level

Detects near-duplicate and structurally similar content

Data level

Detects exact copies via cryptographic hash
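
To make the difference between the similarity levels and the data level concrete, here is a minimal Python sketch. It is not the ISCC algorithm: the 64-bit codes are made-up values, and comparing them by Hamming distance is only the general idea behind similarity-preserving codes. SHA-256 stands in for the exact-copy check.

```python
import hashlib


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two equal-length codes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 64-bit similarity codes (illustrative values, not real ISCC units).
original = 0xF0F0F0F0F0F0F0F0
near_copy = original ^ 0b101  # same code with two bits flipped
unrelated = 0x0123456789ABCDEF

dist_near = hamming_distance(original, near_copy)
dist_far = hamming_distance(original, unrelated)

# A cryptographic hash only answers "identical or not":
# a one-byte change yields a completely different digest.
digest_a = hashlib.sha256(b"pixel data").hexdigest()
digest_b = hashlib.sha256(b"pixel data").hexdigest()
digest_c = hashlib.sha256(b"pixel data!").hexdigest()

print("near copy distance:", dist_near)
print("unrelated distance:", dist_far)
print("exact copy detected:", digest_a == digest_b)
print("modified copy detected as exact:", digest_a == digest_c)
```

A small distance suggests near-duplicate content, while matching digests prove a bit-for-bit copy. Any threshold for "near" would have to be chosen per use case.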

ISCC Homepage · Specification · ISO 24138

ISCC-ID

A persistent identifier derived from ISCC content codes. ISCC-IDs connect raw data, processed derivatives, and published figures into one auditable provenance chain.

Fingerprinting

Helps find the metadata for a file from its content, even when filenames or paths have changed.

Digital signing

Proves authenticity of the content and its originator.

Timestamping

Demonstrates when content was created or registered.

Secure linking

Verifies provenance across repositories, publications, and analysis pipelines.
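
The idea of one auditable chain from raw data to published figure can be sketched as follows. This is an illustration under stated assumptions, not the ISCC-ID format: SHA-256 stands in for the content code, the record fields (`code`, `step`, `parent`, `registered`) are hypothetical, the timestamp is self-asserted, and digital signing is left out entirely.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import List, Optional


def content_code(data: bytes) -> str:
    """Stand-in for a content code: a plain SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()


def record_digest(record: dict) -> str:
    """Digest of a whole record, used as the link from child to parent."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def make_record(data: bytes, step: str, parent: Optional[dict] = None) -> dict:
    """Describe one processing state and point at the record it came from."""
    return {
        "code": content_code(data),
        "step": step,
        "parent": record_digest(parent) if parent is not None else None,
        "registered": datetime.now(timezone.utc).isoformat(),
    }


def verify_chain(chain: List[dict], payloads: List[bytes]) -> bool:
    """Check that each payload matches its record and each link is intact."""
    if len(chain) != len(payloads):
        return False
    previous = None
    for record, payload in zip(chain, payloads):
        if record["code"] != content_code(payload):
            return False
        expected = record_digest(previous) if previous is not None else None
        if record["parent"] != expected:
            return False
        previous = record
    return True


raw = b"raw microscope acquisition"
processed = b"deconvolved volume"
figure = b"figure panel"

r1 = make_record(raw, "raw")
r2 = make_record(processed, "processed", parent=r1)
r3 = make_record(figure, "figure", parent=r2)
chain = [r1, r2, r3]

chain_ok = verify_chain(chain, [raw, processed, figure])
tampered_ok = verify_chain(chain, [raw, b"edited volume", figure])
print(chain_ok, tampered_ok)
```

In a real deployment the records would be signed and timestamped by a party other than the data owner. The sketch only shows why changing any payload or record breaks the chain.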

Capabilities

Built for scientific data & bioimaging

Designed for the specific problems of large-scale imaging data in research.

ISO 24138:2024 Compliant

Follows the international standard, so codes generated anywhere are compatible everywhere — across institutions, repositories, and tools.

High Performance

Rust-based engine processes data at 1+ GB/s — up to 184× faster than the pure Python reference implementation, and faster than SHA-256.

Format Agnostic

Works with OME-TIFF, OME-Zarr, CZI, ND2, LIF, DICOM, HDF5 and virtually any binary scientific data format.

FAIR Principles

Meets the Findable, Accessible, Interoperable, and Reusable data requirements from EOSC and European funding bodies.

AI-Ready Data

Content-based identifiers survive format conversions, so provenance stays intact when datasets move into AI training pipelines.

Platform Integration

Native plugins for OMERO and Galaxy, with Napari, CellProfiler, and ImageJ integrations in progress.

Open Source Tools

The BIOCODES toolkit

Three complementary tools covering the full bioimaging identification workflow, all Apache 2.0 licensed.

iscc-sum

Stable v0.1

High-performance ISCC Data-Code and Instance-Code generation. Single-pass processing with a Rust core and Python bindings — a drop-in replacement for md5sum and sha256sum in scientific pipelines. Faster than SHA-256 at any data size.

pip install iscc-sum

Platforms: Linux, macOS, Windows

Formats: Zarr, HDF5, OME-TIFF, NGFF

Docs · Quickstart · GitHub · PyPI

Rust + Python · CLI · Apache 2.0
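
Single-pass processing means the file is read once while more than one code is computed from the same stream. The sketch below shows that pattern with Python's standard library only. It does not use or reproduce the iscc-sum API: SHA-256 and BLAKE2b are stand-ins for the Data-Code and Instance-Code, and `dual_digest` is a hypothetical helper name.

```python
import hashlib
import io


def dual_digest(stream, chunk_size: int = 1024 * 1024):
    """Read the stream once and feed every chunk to two hashers."""
    first = hashlib.sha256()                   # stand-in for one code
    second = hashlib.blake2b(digest_size=32)   # stand-in for the other code
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        first.update(chunk)
        second.update(chunk)
    return first.hexdigest(), second.hexdigest()


# Simulate a file with an in-memory stream; a small chunk size forces several reads.
payload = b"\x00\x01\x02\x03" * 10_000
sha_hex, blake_hex = dual_digest(io.BytesIO(payload), chunk_size=4096)
print(sha_hex)
print(blake_hex)
```

To hash a file on disk, pass a handle opened in binary mode, for example `open(path, "rb")`. Memory use stays bounded by the chunk size regardless of file size.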

iscc-bio

Beta

ISCC processing for multi-dimensional bioimage data. Implements the IMAGEWALK specification — deterministic Z→C→T plane traversal for format-agnostic, reproducible content hashing of microscopy volumes.

Platforms: Linux, macOS, Windows

Formats: OME-TIFF, OME-Zarr, CZI, ND2, LIF, DICOM, HDF5

GitHub · PyPI

Python · OME-TIFF · OME-Zarr · CZI / ND2 / LIF · Apache 2.0
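
Why a fixed plane order makes hashing format-agnostic can be shown with a small sketch. It is not the IMAGEWALK specification: the order used here (Z varies fastest, then C, then T) is one assumed reading of "Z→C→T", SHA-256 stands in for the ISCC computation, and the plane dictionaries are a hypothetical stand-in for a real image reader.

```python
import hashlib
from itertools import product


def canonical_digest(planes: dict, size_z: int, size_c: int, size_t: int) -> str:
    """Hash planes in one fixed order, however the container stored them."""
    hasher = hashlib.sha256()
    # Assumed order: Z varies fastest, then C, then T (last product axis is fastest).
    for t, c, z in product(range(size_t), range(size_c), range(size_z)):
        hasher.update(planes[(z, c, t)])
    return hasher.hexdigest()


def storage_order_digest(planes: dict) -> str:
    """Naive approach: hash planes in whatever order the file yields them."""
    hasher = hashlib.sha256()
    for data in planes.values():
        hasher.update(data)
    return hasher.hexdigest()


SIZE_Z, SIZE_C, SIZE_T = 2, 2, 2
keys = list(product(range(SIZE_Z), range(SIZE_C), range(SIZE_T)))

# Two hypothetical containers holding the same planes in different storage orders.
file_a = {key: bytes(key) * 4 for key in keys}
file_b = {key: bytes(key) * 4 for key in reversed(keys)}

same_canonical = (
    canonical_digest(file_a, SIZE_Z, SIZE_C, SIZE_T)
    == canonical_digest(file_b, SIZE_Z, SIZE_C, SIZE_T)
)
same_naive = storage_order_digest(file_a) == storage_order_digest(file_b)
print("canonical traversal matches:", same_canonical)
print("storage-order hashing matches:", same_naive)
```

A real implementation also has to normalise pixel type and byte order before hashing. The sketch covers only the traversal order.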

omero-iscc

Alpha

OMERO server plugin. Generates and stores ISCC identifiers automatically on image import, so facilities can deduplicate and track provenance without extra steps.

Platforms: Linux, macOS

GitHub

OMERO Plugin · Python · Server · Apache 2.0

Community & Integrations

Built for the open bioimaging ecosystem

BIOCODES integrates with the tools researchers already use — no new infrastructure required.

OMERO

Available

Server plugin that generates ISCCs automatically on image import. Facilities get deduplication and provenance tracking without changing their existing OMERO workflows.

Automatic ISCC generation on import
Facility-level deduplication
FAIR-compliant metadata annotation

omero-iscc
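
Facility-level deduplication amounts to grouping images by their content code. The sketch below illustrates that grouping only. It does not call OMERO or omero-iscc: SHA-256 stands in for the ISCC, so it finds exact copies but not near-duplicates, and the image names and `find_duplicates` helper are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(images: dict) -> dict:
    """Group image names by content digest and keep groups with more than one member."""
    index = defaultdict(list)
    for name, pixels in images.items():
        index[hashlib.sha256(pixels).hexdigest()].append(name)
    return {code: names for code, names in index.items() if len(names) > 1}


# Hypothetical imports: two users uploaded the same acquisition under different names.
imported = {
    "userA/experiment1.ome.tiff": b"\x10\x20\x30\x40",
    "userB/copy_of_experiment1.ome.tiff": b"\x10\x20\x30\x40",
    "userA/experiment2.ome.tiff": b"\x50\x60\x70\x80",
}

duplicates = find_duplicates(imported)
for code, names in duplicates.items():
    print(code[:16], names)
```

Because the key is derived from content, renamed or moved copies land in the same group.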

Galaxy

Available

Galaxy tools for ISCC generation, near-duplicate detection, and content verification within reproducible Galaxy workflows. Part of the BMCV galaxy-image-analysis tool suite.