SCIFIO future directions



As part of a progress report on SCIFIO, I quickly wrote up some future directions for the project. I figure it’s informative enough that the greater community here may be interested, so I reproduce it below. It’s a bit rough, but conveys my current thoughts on where SCIFIO is heading.

The initial architecture of SCIFIO is complete and working. However, there are some architectural improvements which will benefit SCIFIO in terms of scalability and modularity:

  1. SCIFIO currently operates on planes as its fundamental unit of data. This needs to be generalized to N-dimensional blocks. It is increasingly common for systems to produce extremely large image planes. While SCIFIO is capable of reading sub-regions of these planes rather than whole planes at once, the API for doing so is not ideal; it makes more sense for SCIFIO to operate on blocks of data, which may or may not correspond to hyperslices of N-dimensional data. This generalization will facilitate concrete use cases, e.g. for the Keller and Saalfeld labs at Janelia Research Campus. Operating on blocks will also be an important step on ImageJ’s path toward a fully scalable N-dimensional image viewer which loads pixels on demand from data sources, regardless of source format. [see also scifio/scifio#283]

  2. SCIFIO is built on top of the SciJava Common library, and some of its pieces belong upstream there. In particular, we are currently enhancing SciJava Common to better support Locations as extensible data descriptors, coupled with DataHandle plugins which know how to read and write data for each given type of Location. This two-layer scheme was developed to better support use cases such as OMERO servers, which cannot operate with a “random access to bytes” paradigm alone. On the SCIFIO side, its existing DataHandle classes need to be updated and pushed upstream into SciJava Common. This will actually slim down the SCIFIO codebase, so that it can better focus on its primary goal of image I/O. [see also scijava/scijava-common more-handles, scijava/scijava-common#167]

  3. The ImageJ Common library provides the foundation of SCIFIO’s image data model. As SCIFIO developed, it became increasingly clear that SCIFIO should inherit as much as possible from ImageJ Common, rather than reinventing its own metadata structures. For this and other reasons, we have recently been making a major effort toward solidifying and improving the ImageJ data model in ImageJ Common. The SCIFIO library needs to be updated to fully leverage the ImageJ data model, including newly emerging layers such as RichImage and MetaSpace and the updated Dataset which builds on them. This work is a joint effort of the ImgLib2, ImageJ and SCIFIO teams. [see also 2016-06-23 - ImgLib2 hackathon]

  4. In the ImageJ world, support for quite a few image formats is still provided using the ImageJ 1.x plugin paradigm. This approach allows images to be opened via the ImageJ user interface and legacy API, but is fundamentally limited compared to SCIFIO: other tools such as KNIME Image Processing do not benefit from it, and these legacy plugins typically parse little to no metadata. These formats should all be updated to the SCIFIO paradigm. We also want to provide better and more thorough support for movie formats, since this is a very common request from users of ImageJ. [see also fiji/IO, Links#Movie_support, scifio/scifio-javacv]

  5. The size of images continues to explode, so it is vital that working with image formats be as performant as possible. One of the original goals of the SCIFIO grant proposal was to improve the speed of common formats like TIFF. Unfortunately, there is still work to be done on that front. There is nothing inherently performance-limited in SCIFIO’s architecture; it just takes time and effort to identify and address the performance bottlenecks in specific cases.

  6. SCIFIO was originally envisioned as a refactoring of Bio-Formats, and much of the code (especially for specific formats) was forked from the Bio-Formats project. As such, there are still “Bio-Formats-isms” in various places of the SCIFIO code, which need to be refactored away in favor of the N-dimensional SCIFIO/ImageJ data model.

  7. Since multiple actively developed software projects are using SCIFIO, bug reports roll in regularly. Right now, the SCIFIO issue tracker has more than 150 open issues, each of which needs investigation, review and often debugging to resolve. Fixing these issues will make SCIFIO stronger and more robust.

  8. Currently, the ImageJ user interface lets users switch between SCIFIO and legacy (ImageJ 1.x) I/O mechanisms for operations such as File > Open. At the moment, there are tradeoffs between the two modes: SCIFIO generally supports more image types, including some TIFFs which ImageJ 1.x cannot read, while the legacy mode is often faster for formats like TIFF when it works. We want the SCIFIO mechanism to work better and faster than the legacy mode in all cases, so that users no longer need to worry about switching between these two modes manually. Achieving this goal largely depends on items 4, 5 and 7 above.
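
To make item 1 a bit more concrete, here is a rough sketch of what "blocks instead of planes" could mean. It is only an illustration: `BlockSketch`, `readBlock` and `readPlane` are hypothetical names, not SCIFIO API, and the image is assumed to be a flat in-memory array in which the first dimension varies fastest. A block is described by an offset and a size per dimension, and a plane is then just the special case of a block spanning the first two dimensions with size 1 everywhere else.

```java
/** Illustrative only: none of these names exist in SCIFIO. */
final class BlockSketch {

    /**
     * Copies the block [offset, offset + size) out of a flat array whose
     * first dimension varies fastest. Output uses the same ordering.
     */
    static int[] readBlock(int[] data, long[] dims, long[] offset, long[] size) {
        int n = dims.length;
        if (offset.length != n || size.length != n) {
            throw new IllegalArgumentException("offset and size must match dims");
        }
        long count = 1;
        for (int d = 0; d < n; d++) {
            if (offset[d] < 0 || size[d] < 1 || offset[d] + size[d] > dims[d]) {
                throw new IllegalArgumentException("block out of bounds in dimension " + d);
            }
            count *= size[d];
        }
        int[] out = new int[Math.toIntExact(count)];
        long[] pos = new long[n]; // current position inside the block
        for (int i = 0; i < out.length; i++) {
            // Convert the N-dimensional source position to a flat index.
            long src = 0;
            long stride = 1;
            for (int d = 0; d < n; d++) {
                src += (offset[d] + pos[d]) * stride;
                stride *= dims[d];
            }
            out[i] = data[Math.toIntExact(src)];
            // Advance the position, first dimension fastest.
            for (int d = 0; d < n; d++) {
                if (++pos[d] < size[d]) break;
                pos[d] = 0;
            }
        }
        return out;
    }

    /**
     * A plane expressed as a block: full extent in the first two dimensions,
     * a single position (planePos) in all remaining dimensions.
     */
    static int[] readPlane(int[] data, long[] dims, long[] planePos) {
        if (dims.length < 2 || planePos.length != dims.length - 2) {
            throw new IllegalArgumentException("planePos must have dims.length - 2 entries");
        }
        long[] offset = new long[dims.length];
        long[] size = new long[dims.length];
        size[0] = dims[0];
        size[1] = dims[1];
        for (int d = 2; d < dims.length; d++) {
            offset[d] = planePos[d - 2];
            size[d] = 1;
        }
        return readBlock(data, dims, offset, size);
    }
}
```

In a real reader the copy would of course pull bytes from a data source rather than from an array already in memory; the point here is only that a plane-based call can be expressed through a block-based one, but not the other way around.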
