BIOCODES

Live · Runs entirely in your browser

Generate an ISCC-SUM Client-Side in Your Browser

Drop in any file and watch its 256-bit Data-Code and Instance-Code appear. The bytes never leave this browser tab — hashing runs locally in WebAssembly, so even large microscopy images are fingerprinted entirely client-side, at full speed.

ISCC-SUM reads your file once and derives two complementary fingerprints, then combines them into a single composite code:

Data-Code similarity

A similarity-preserving fingerprint. Near-identical files produce near-identical codes, so variants and near-duplicates stay close, as measured by simple bit distance (the number of bits in which two codes differ).
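To make "bit distance" concrete, here is a minimal Python sketch of the Hamming distance between two equal-length codes. It assumes the codes have already been decoded to raw bytes (decoding ISCC strings is not shown), and the two 256-bit values are made-up examples, not real Data-Codes.

```python
def bit_distance(code_a: bytes, code_b: bytes) -> int:
    """Count the bits that differ between two equal-length codes (Hamming distance)."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    # XOR each byte pair and count the set bits in the result.
    return sum(bin(a ^ b).count("1") for a, b in zip(code_a, code_b))

# Hypothetical 256-bit (32-byte) codes that differ in one bit per 2-byte pair.
a = bytes.fromhex("ff00" * 16)
b = bytes.fromhex("ff01" * 16)
print(bit_distance(a, b))  # 16
```

A distance of 0 means identical codes; small distances indicate near-duplicates.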

Instance-Code identity

An exact BLAKE3 checksum of the bytes. Change a single bit and the code changes completely, which makes it proof of integrity and a reliable test for exact duplicates.
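The "change a single bit" behavior is the avalanche effect shared by cryptographic hashes. Python's standard library does not ship BLAKE3, so this sketch uses hashlib.sha256 as a stand-in; it illustrates the property, not the Instance-Code algorithm itself.

```python
import hashlib

# Stand-in hash: SHA-256 from the standard library, NOT BLAKE3 or the ISCC Instance-Code.
data = bytearray(b"example microscopy plane bytes")
original = hashlib.sha256(data).digest()

data[0] ^= 0x01  # flip the lowest bit of the first byte
flipped = hashlib.sha256(data).digest()

# Count how many of the 256 output bits changed after a single input-bit flip.
changed_bits = sum(bin(x ^ y).count("1") for x, y in zip(original, flipped))
print(f"{changed_bits} of 256 bits changed")
```

Typically about half of the output bits flip, which is why an exact checksum cannot tell you how similar two files are, only whether they are identical.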


Each file yields four results: the composite ISCC-SUM (128-bit units), the Data-Code (similarity), the Instance-Code (identity), and the BLAKE3 datahash encoded as a multihash with prefix 1e20.
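The "1e20" prefix follows the multihash format: a varint hash-function code (0x1e for BLAKE3 in the multicodec table) followed by a varint digest length (0x20 = 32 bytes). A minimal sketch that builds that prefix, assuming a 32-byte digest is already available; the zero bytes below are a placeholder, not a real BLAKE3 output, and the function name is hypothetical.

```python
BLAKE3_MULTICODEC = 0x1E  # multicodec code for blake3

def to_multihash_hex(digest: bytes, code: int = BLAKE3_MULTICODEC) -> str:
    """Prefix a digest with <code><length> as hex; handles single-byte varints only."""
    if code > 0x7F or len(digest) > 0x7F:
        raise ValueError("values >= 0x80 need multi-byte varint encoding")
    # Values below 0x80 encode as a single varint byte, so the prefix is two bytes.
    return bytes([code, len(digest)]).hex() + digest.hex()

placeholder = bytes(32)  # 32 zero bytes standing in for a real BLAKE3 digest
mh = to_multihash_hex(placeholder)
print(mh[:4])  # 1e20
```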

Compare with local ISCC-SUM generation

Reproduce this result natively on your own machine. Install uv once, then run ISCC-SUM on the same file — the codes should match bit for bit.

1. Install uv

macOS / Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Run ISCC-SUM

uvx iscc-sum --units …

Nothing is uploaded

Your file never leaves this tab. Every byte is hashed locally in your browser.

Built for big data

Single-pass streaming keeps memory bounded — the same engine handles multi-GB images.
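Single-pass streaming means each chunk is read once and handed to every hasher before the next chunk is read. A minimal Python sketch of that pattern, using hashlib.sha256 and hashlib.blake2b purely as stand-ins for the Data-Code and Instance-Code hashers (the real engine's algorithms and chunk size are not shown here):

```python
import hashlib
import io
from typing import BinaryIO, Tuple

CHUNK_SIZE = 1 << 20  # 1 MiB per read: memory use stays flat no matter how large the file is

def single_pass_digests(stream: BinaryIO) -> Tuple[str, str]:
    """Read the stream once and feed every chunk to two hashers.

    SHA-256 and BLAKE2b are stand-ins only; they are not the ISCC
    Data-Code or Instance-Code algorithms.
    """
    first = hashlib.sha256()
    second = hashlib.blake2b(digest_size=32)
    while chunk := stream.read(CHUNK_SIZE):
        first.update(chunk)
        second.update(chunk)
    return first.hexdigest(), second.hexdigest()

# Usage with an in-memory 3 MB "file"; for real data, pass open(path, "rb").
sha_hex, blake_hex = single_pass_digests(io.BytesIO(b"x" * 3_000_000))
```

Because no hasher needs the whole file at once, peak memory is roughly one chunk plus each hasher's small internal state.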

ISO 24138 core

The Rust core of the standard, compiled to WebAssembly. Reproducible anywhere.

Powered by @iscc/wasm, the WebAssembly build of iscc-lib (the Rust core of ISO 24138).

Enhancing AI-Readiness of Bioimaging Data with Content-Based Identifiers

Challenge

Growing Data Volume: Bioimaging data exist in different states (raw, repository, published) with no shared identity across them.
No Audit Trail: No reliable way to verify data integrity or detect manipulation after the fact.
Lost Provenance: Published figures are disconnected from raw data and the processing steps that produced them.

Solution

International Standard Content Code (ISCC, ISO 24138): an interoperable, open-source system for content identification and fingerprinting. Because it is computed directly from the asset itself, it can never be removed or decoupled from the data.
Anyone can compute ISCCs from available data, independently and without any central authority.
Sign and timestamp ISCCs to create persistent identifiers that securely link repository data with the figures in published papers.

Scientific Impact

Cryptographic verification of figure data via the ISCC audit trail.
Better data integrity and reusability through transparent, verifiable provenance chains.
AI-ready bioimaging datasets with verified origins, so AI models can train on data you can trace.

ISO 24138

International Standard Content Code

A standardised (ISO 24138), multi-component fingerprint for many media types and file formats. Because it is computed from the asset itself, it can never be removed or decoupled from the data.

Semantic level

Detects conceptually related content

Syntactic level

Detects near-duplicate and structurally similar content

Data level

Detects exact copies via cryptographic hash

ISCC Homepage Specification ISO 24138

ISCC-ID

A persistent identifier derived from ISCC content codes. ISCC-IDs connect raw data, processed derivatives, and published figures into one auditable provenance chain.

Fingerprinting

Helps find metadata even when filenames or paths have changed.
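Content-based lookup works because the key is derived from the bytes, not from the filename or path. A minimal sketch with a plain dictionary as the index; hashlib.sha256 stands in for an ISCC code, and the file contents, paths, and metadata are hypothetical.

```python
import hashlib
from typing import Dict, Optional

def content_key(data: bytes) -> str:
    # Stand-in for an ISCC code: any key computed from the bytes themselves.
    return hashlib.sha256(data).hexdigest()

# Hypothetical index mapping content keys to metadata records.
metadata_index: Dict[str, dict] = {}

original_bytes = b"raw acquisition bytes"
metadata_index[content_key(original_bytes)] = {"sample": "S-001", "path": "/raw/plate1/img_001.tif"}

def find_metadata(data: bytes) -> Optional[dict]:
    # The lookup never looks at the filename or path.
    return metadata_index.get(content_key(data))

# The same bytes, later renamed and moved, still resolve to the record.
renamed_copy = bytes(original_bytes)
record = find_metadata(renamed_copy)
print(record["sample"])  # S-001
```

An exact-key lookup like this only finds identical bytes; finding modified variants relies on the similarity-preserving Data-Code instead.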

Digital signing

Proves authenticity of the content and its originator.

Timestamping

Demonstrates when content was created or registered.

Secure linking

Verifies provenance across repositories, publications, and analysis pipelines.

Capabilities

Built for scientific data & bioimaging

Designed for the specific problems of large-scale imaging data in research.

ISO 24138:2024 Compliant

Follows the international standard, so codes generated anywhere are compatible everywhere — across institutions, repositories, and tools.

High Performance

Rust-based engine processes data at 1+ GB/s — up to 184× faster than the pure Python reference implementation, and faster than SHA-256.

Format Agnostic

Works with OME-TIFF, OME-Zarr, CZI, ND2, LIF, DICOM, HDF5 and virtually any binary scientific data format.

FAIR Principles

Meets the Findable, Accessible, Interoperable, and Reusable data requirements from EOSC and European funding bodies.

AI-Ready Data

Content-based identifiers survive format conversions, so provenance stays intact when datasets move into AI training pipelines.

Platform Integration

Native plugins for OMERO and Galaxy, with Napari, CellProfiler, and ImageJ integrations in progress.

Open Source Tools

The BIOCODES toolkit

Three complementary tools covering the full bioimaging identification workflow, all Apache 2.0 licensed.

iscc-sum

Stable v0.1

High-performance ISCC Data-Code and Instance-Code generation. Single-pass processing with a Rust core and Python bindings — a drop-in replacement for md5sum and sha256sum in scientific pipelines. Faster than SHA-256 at any data size. Core algorithms are integrated into iscc-lib, a polyglot Rust library with native bindings for Python, Node.js, Go, Java, .NET, Swift, Kotlin, Ruby, C++, and WebAssembly.

pip install iscc-sum

Platforms: Linux macOS Windows

Formats: Zarr HDF5 OME-TIFF NGFF

Docs Quickstart GitHub PyPI iscc-lib

Rust + Python CLI Polyglot via iscc-lib Apache 2.0

iscc-bio

Beta

ISCC processing for multi-dimensional bioimage data. Implements the IMAGEWALK specification — deterministic Z→C→T plane traversal for format-agnostic, reproducible content hashing of microscopy volumes.

Platforms: Linux macOS Windows

Formats: OME-TIFF OME-Zarr CZI ND2 LIF DICOM HDF5

Visualization GitHub PyPI

Python OME-TIFF OME-Zarr CZI / ND2 / LIF Apache 2.0

omero-iscc

Alpha

OMERO server plugin. Generates and stores ISCC identifiers automatically on image import, so facilities can deduplicate and track provenance without extra steps.

Platforms: Linux macOS

GitHub

OMERO Plugin Python Server Apache 2.0

iscc-bio

IMAGEWALK — Bio Codes for multi-dimensional bioimages

iscc-bio implements deterministic plane traversal (Z → C → T) for format-agnostic, reproducible content identification of microscopy volumes. Explore the interactive 3D visualization to see how ISCC-SUM and per-plane SIMPRINTs are generated from multi-scene bioimage data.

Deterministic traversal

IMAGEWALK defines a canonical Z → C → T ordering that produces identical content codes regardless of file format or reader library.
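A canonical plane order can be expressed as a small Python generator. The nesting here is an assumption (Z varies fastest, then C, then T, mirroring OME's XYZCT dimension order); the IMAGEWALK specification is authoritative on the exact order, and the function name is hypothetical.

```python
from typing import Iterator, Tuple

def imagewalk_order(size_z: int, size_c: int, size_t: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (z, c, t) plane indices in a fixed order (assumed: Z fastest, then C, then T)."""
    for t in range(size_t):
        for c in range(size_c):
            for z in range(size_z):
                yield (z, c, t)

print(list(imagewalk_order(2, 2, 1)))
# [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
```

The order depends only on the dimension sizes, never on how a given reader library happens to store planes, which is what makes the resulting codes reproducible across formats.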

Per-plane SIMPRINTs

Each 2D plane is extracted, flattened to row-major order, normalized to big-endian bytes, and hashed to produce a similarity-preserving fingerprint.
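The flatten-and-normalize step can be sketched with the standard library alone. This assumes a plane given as nested Python lists of unsigned 16-bit pixel values; the function name is hypothetical, and the SIMPRINT hashing that follows is not shown.

```python
import struct
from typing import List

def plane_to_bytes(plane: List[List[int]], pixel_code: str = "H") -> bytes:
    """Flatten a 2D plane row by row and pack it as big-endian bytes.

    pixel_code is a struct format character, e.g. "H" for uint16 (assumed pixel type).
    """
    pixels = [value for row in plane for value in row]  # row-major: row 0 first, then row 1, ...
    # The ">" prefix forces big-endian byte order regardless of the host machine.
    return struct.pack(f">{len(pixels)}{pixel_code}", *pixels)

plane = [[1, 2], [3, 256]]
print(plane_to_bytes(plane).hex())  # 0001000200030100
```

Fixing both the pixel order and the byte order means the same image content yields the same bytes on any machine, so the downstream fingerprint is stable.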

Multi-scene support

Handles multi-scene containers (CZI, ND2, LIF) — each scene gets its own ISCC-SUM, enabling granular identification within complex acquisitions.

Similarity search at scale

Outputs conform to the iscc-search index schema, so no conversion is needed. ISCC-SUMs and per-plane SIMPRINTs can be indexed directly for asset-level and segment-level similarity search across billions of codes.
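At its core, similarity search ranks stored codes by bit distance to a query. A brute-force Python sketch, assuming codes held as integers; the IDs and 64-bit values are hypothetical, and this is not how iscc-search is implemented.

```python
from typing import Dict, List, Tuple

def bit_distance(a: int, b: int) -> int:
    # Hamming distance between two codes held as integers.
    return bin(a ^ b).count("1")

def nearest(query: int, index: Dict[str, int], k: int = 2) -> List[Tuple[str, int]]:
    """Return the k closest entries as (id, distance), smallest distance first.

    Brute force: compares the query against every entry. Indexes that scale
    to billions of codes typically rely on specialised data structures instead.
    """
    scored = [(item_id, bit_distance(query, code)) for item_id, code in index.items()]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]

# Hypothetical 64-bit codes; not real ISCC units.
index = {
    "plane-a": 0xFFFF_0000_FFFF_0000,
    "plane-b": 0xFFFF_0000_FFFF_0001,
    "plane-c": 0x0000_FFFF_0000_FFFF,
}
print(nearest(0xFFFF_0000_FFFF_0000, index))
# [('plane-a', 0), ('plane-b', 1)]
```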


Launch IMAGEWALK Playground →

View on GitHub

Community & Integrations

Built for the open bioimaging ecosystem

BIOCODES integrates with the tools researchers already use — no new infrastructure required.

OMERO

Available

Server plugin that generates ISCCs automatically on image import. Facilities get deduplication and provenance tracking without changing their existing OMERO workflows.

Automatic ISCC generation on import
Facility-level deduplication
FAIR-compliant metadata annotation

omero-iscc

Galaxy

Available

Galaxy tools for ISCC generation, near-duplicate detection, and content verification within reproducible Galaxy workflows. Part of the BMCV galaxy-image-analysis tool suite.