Wrap ImagePlus VirtualStack into imglib2

imglib2
bigdataviewer

#1

Hello,

I am running the following code to view an ImagePlus in the BigDataViewer:

final ImagePlus imp = IJ.getImage();
final Img<UnsignedShortType> image = ImageJFunctions.wrapShort( imp );
BdvSource bdv = BdvFunctions.show(image, "stream");
bdv.setDisplayRange(0, 1000);

First of all, it is awesome that this works! Thanks for implementing this.

I have a question: In my case the ImagePlus imp object is a VirtualStack that I am streaming from disk. Judging from how long the above code takes to run in certain cases, I suspect that either ImageJFunctions.wrapShort or, later, BdvFunctions.show loads the whole data set into RAM. I would understand if it loaded the current time-point, because the BigDataViewer needs the whole volume for display, but why would it load all the time-points?


#2

I think ImageJFunctions.wrapShort is loading the image into memory. See constructors of ShortImagePlus to which wrapShort redirects.

If I am not mistaken, ImageJ2 (imagej-legacy I assume) could take care of converting a VirtualStack to a CellImg that is backed by a file stored on disk.

Best,
Stefan


#3

I think I found the code (see below); and in fact it does load all of it into RAM.

@stelfrich: Do you have any more information about your idea of converting the VirtualStack to a CellImg? And, probably even better, is there somewhere example code of how to directly instantiate a CellImg from, e.g., paths to files on disk?


#4

I have used it with a @Parameter annotation in a plugin. In that case it “just worked”: imagej-legacy took care of everything for me (which is really awesome if you think about it ;-)). You can see that when you open an image as a virtual stack and run the following Groovy script:

// @Dataset img
print(img.getImgPlus().getImg().getClass());

Maybe @ctrueden could help out here on how to access that functionality directly (I’d also be interested in it…)?


#5

I found something related here: http://imagej.net/ImgLib2_Examples
Have a look at Example1b.java

But in the example it streams data from one file. Not sure how to use that kind of code for streaming from data distributed into multiple files. @ctrueden: Would that be possible?

And, as @stelfrich pointed out, something like
Img < T > imageCell = ImageJFunctions.wrapAsCellImg( imp );
would be great of course. Does it exist?


#6
Dataset d = ij.scifio().datasetIO().open("/path/to/my/file.ext")

#7

I just discussed in person with @StephanPreibisch and he said one would need some changes in the imglib2 code for wrapping a VirtualStack to a CellImg. But he was motivated to look into it :slight_smile:


#8

Hi Curtis,
Thanks for the answer, I was wondering about streaming data distributed into multiple files. Is there something like defining the dataset as a file list, probably with some information how the files correspond to the z,c,t?
I guess one could rephrase the question into how to programmatically access the bioformats checkbox “group files with similar names” through scifio?


#9
SCIFIOConfig config = new SCIFIOConfig();
config.groupableSetGroupFiles(true);
Dataset d = ij.scifio().datasetIO().open("/path/to/my/file.ext", config);

API may change in the future, but that’s the current way.


#10

There is something in bigdataviewer-fiji that you might use to do this:

From the javadoc:

 * ImageLoader backed by a ImagePlus. The ImagePlus may be virtual and in
 * contrast to the imglib2 wrappers, we do not try to load all slices into
 * memory. Instead slices are stored in {@link VolatileGlobalCellCache}.

So this does what you want basically.

However, it is not exactly made for this (but instead used by BDV for export from IJ stacks to HDF5).

You get the thing wrapped as a BDV ImgLoader.
Basically you can then get to an individual 3D slice of your image by .getSetupImgLoader( c ).getImage( t, 0 )
Unfortunately there is no easy way to get at the 4D or 5D stack directly.

Maybe you can just start with the above and see how far it gets you.

I’ll consider adding something like this to vistools to handle virtual stacks more directly, but I don’t know when I’ll have time to look into it. Actually, we are currently revising the imglib CellImg and Bdv Cache architecture, which will probably make it very easy to add this. Maybe next week, let’s see…


#11

@tpietzsch
Thanks for the answer! In fact, I was just about to post something concerning exactly the interplay of cellImg, Bdv and caching :slight_smile:

Namely the following; tested with this file: T0006.ome.tif.zip (599.5 KB)

public < T extends RealType< T > & NativeType< T > >
        void testBDVandSCIFIO(String path)
        throws ImgIOException
{
 
    //
    // Test SCIFIO cellImg with BDV
    //

    ImgOpener imgOpener = new ImgOpener();

    // Open as ArrayImg
    java.util.List<SCIFIOImgPlus<?>> imgs = imgOpener.openImgs( path );
    Img<T> img = (Img<T>) imgs.get(0);
    BdvSource bdv = BdvFunctions.show(img, "RAM");
    bdv.setDisplayRange(0,1000);

    // Open as CellImg
    SCIFIOConfig config = new SCIFIOConfig();
    config.imgOpenerSetImgModes( SCIFIOConfig.ImgMode.CELL );
    java.util.List<SCIFIOImgPlus<?>> cellImgs = imgOpener.openImgs( path, config );
    Img<T> cellImg = (Img<T>) cellImgs.get(0);
    BdvSource bdv2 = BdvFunctions.show(cellImg, "STREAM");
    bdv2.setDisplayRange(0,1000);

}

First of all it is awesome that all of that works! Thanks for all the efforts!

1.) First question is of course whether above code is correct or whether I am doing something stupid? It runs, but maybe I still do something sub-optimal?

Anyway, with above code, as probably to be expected, the visualization of the cellImg is much slower. To be honest I was a bit surprised as to how slow it is given that it is only 2.8 MB and was stored on SSD in my case.
I was wondering about a really simple caching strategy for the BDV, in pseudo code:

t = timePointToBeDisplayed
bdvTaskRender(cellImg(t)).start()
if(cellImg(t) fits in RAM):
  arrayImgThisTimePoint = cellImg(t).loadAsArrayImg()
  bdvTaskRender(cellImg(t)).end()
  bdvTaskRender(arrayImgThisTimePoint).start()

Probably super naive, but I think for a long time-series where each time-point is not too big it could help somewhat?!

2.) I just realized that above pseudo code would make no sense if the cellImg itself could be the cache, i.e. hold some parts of the data in RAM (and for others only know how to load them if needed). Can the cellImg be its own cache or do you need another data structure on top?


#12

@Christian_Tischer All images that BDV reads from HDF5/CATMAID/etc are CellImgs backed by a cache. So I would not say that CellImg is slow per se. It is of course slower than ArrayImg, but in your case I think that there is some other problem. I’m in the middle of releasing a lot of stuff currently, so nothing works on my computer. I’ll try your example when I get to a stable state again and see whether I can find out more.


#13

Thanks for looking into this!

Is there somewhere some information on how a CellImg really works? I looked at the imglib2 publication and also looked through the source code on github, but neither really helped me.

For instance, I do not understand how the caching works, i.e. how and where it is implemented that some parts of the data are in RAM and some are not… Is this logic part of the CellImg class itself?


#14

The SCIFIO-backed CellImg implementation lives in io.scif:scifio for the moment. But @tpietzsch has just now rewritten/improved all of that—there is now imglib2-cache whose whole purpose in life is to facilitate this sort of thing. :smile: We will be switching SCIFIO to use the official-ImgLib2-sanctioned way of doing disk-based cell caching, as soon as things stabilize.


#15

Yes! Sorry for the long silence. I was planning to come back to this discussion with a working example. But as usual everything takes longer than expected…


#16

I am wondering what the best way is to open a tif image one frame at a time, using ImageJ2/imglib2/scifio.

Basically I have an op that works on a 2D image, and it needs to process frames from a 3D image that is tens of thousands of frames long.

I was going to start a new post, however it looks like this thread is relevant. So am I to understand that this is all bleeding edge, and there is a working example coming soon?? I may poke around and try to see if I can get a working example going myself, however if there are any updates I should be aware of let me know.

Thanks


#17

The general idea is that using ImgOpener, you can open the dataset as a SCIFIO cell image, which reads pixels on demand. From there, you should be able to access the frames you need, without needing to read all pixels from the entire dataset. Have you tried it? Does the metadata parsing step take too long? Other problems? Happy to discuss further based on your experiences.


#18

Hi @ctrueden I finally got around to testing CellImg code on a 10000+++ frame .tif image and so far it works fine. I seem to be able to go to the middle of the image, process a few frames, and get a result, with little overhead.

Thanks to everybody who contributed to this thread, it was very helpful.


#19

@Christian_Tischer Finally, I have an exhaustive answer for Virtual Stack / CellImg related questions…

The following describes the state of things in imglib2 4.3.0 and imglib2-cache 1.0.0-beta-7, which should make it into Fiji soon, hopefully.


First, regarding how CellImg works:
CellImg, or rather AbstractCellImg, is an imglib2 Img that divides its underlying storage into Cells.
Each Cell represents a (hypercube) block of the full image. It has flat data (like an ArrayImg) and knows where its coordinates are in the full image. The accessors (Cursor, RandomAccess) of the CellImg know in which cell they currently “are” (which cell feeds the pixel data) and how to move between cells when you move them across the whole image.

Technically, the cells of a CellImg are in an image (RandomAccessibleInterval) whose pixel type is Cell.
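
To make the cell decomposition described above a bit more concrete, here is a minimal plain-Java sketch. It does not use the imglib2 classes; the class name CellGridSketch and its methods are made up for illustration, and the layout (dimension 0 varies fastest, border cells are truncated) is an assumption of this sketch. It shows how a pixel position can be mapped to a flattened cell index and to an offset in that cell's flat data array.

```java
// Illustrative sketch only: plain Java, not the imglib2 classes.
final class CellGridSketch {
    final long[] imgDims;   // full image size per dimension
    final int[] cellDims;   // nominal cell size per dimension
    final long[] gridDims;  // number of cells per dimension

    CellGridSketch(long[] imgDims, int[] cellDims) {
        this.imgDims = imgDims.clone();
        this.cellDims = cellDims.clone();
        this.gridDims = new long[imgDims.length];
        for (int d = 0; d < imgDims.length; d++) {
            // round up so that a partial cell at the border is counted
            gridDims[d] = (imgDims[d] + cellDims[d] - 1) / cellDims[d];
        }
    }

    /** Flattened index of the cell containing the pixel (dimension 0 varies fastest). */
    long cellIndex(long... pos) {
        long index = 0;
        for (int d = pos.length - 1; d >= 0; d--) {
            index = index * gridDims[d] + pos[d] / cellDims[d];
        }
        return index;
    }

    /** Offset of the pixel inside that cell's flat data array. */
    int offsetInCell(long... pos) {
        int offset = 0;
        for (int d = pos.length - 1; d >= 0; d--) {
            long cellMin = (pos[d] / cellDims[d]) * cellDims[d];
            // border cells may be smaller than the nominal cell size
            long size = Math.min(cellDims[d], imgDims[d] - cellMin);
            offset = (int) (offset * size + (pos[d] - cellMin));
        }
        return offset;
    }

    public static void main(String[] args) {
        // a 100 x 100 x 10 image split into 32 x 32 x 1 cells
        CellGridSketch grid = new CellGridSketch(new long[] {100, 100, 10}, new int[] {32, 32, 1});
        System.out.println(grid.cellIndex(40, 70, 3));
        System.out.println(grid.offsetInCell(40, 70, 3));
    }
}
```

An accessor that moves across the image only has to switch to another flat array when the cell index changes; inside a cell it behaves like an accessor on a small ArrayImg.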

For the virtual stack, and caching in general, LazyCellImg is interesting. The “image of cells” of a LazyCellImg is lazily evaluated. It relies on an implementation of the LazyCellImg.Get interface

public interface Get< T >
{
	T get( long index );
}

which provides the cell (T) for a given flattened index. Whenever an accessor walks into a cell, this is where it asks for data.
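
As a rough illustration of the lazy evaluation, here is a plain-Java stand-in. It is not the imglib2 LazyCellImg; the nested Get interface only mirrors the shape quoted above, and MemoizingGet is a hypothetical name. The sketch creates a cell the first time its index is requested and hands back the same cell afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a hand-rolled stand-in, not the imglib2 LazyCellImg.
final class LazyCellsSketch {
    /** Same shape as the Get interface quoted above. */
    interface Get<T> {
        T get(long index);
    }

    /** Creates a cell the first time it is requested and remembers it afterwards. */
    static final class MemoizingGet implements Get<short[]> {
        private final Map<Long, short[]> cells = new HashMap<>();
        private final int cellSize;
        int created = 0; // counts how many cells were actually materialized

        MemoizingGet(int cellSize) {
            this.cellSize = cellSize;
        }

        @Override
        public short[] get(long index) {
            return cells.computeIfAbsent(index, i -> {
                created++;
                return new short[cellSize];
            });
        }
    }

    public static void main(String[] args) {
        // think of many planes of 4x4 pixels, one cell per plane
        MemoizingGet get = new MemoizingGet(16);
        get.get(500)[3] = 42;               // "walk into" cell 500 and write a pixel
        System.out.println(get.get(500)[3]); // the same cell is returned again
        System.out.println(get.created);     // number of cells materialized so far
    }
}
```

However large the nominal image is, only the cells that were actually visited take up memory.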

This opens the way for cached CellImgs that implement Get in various ways. These cached images live in imglib2-cache and can be built using ImgFactorys as usual. For example,

		final long[] dimensions = new long[] { 640, 640, 640 };
		final Img< ARGBType > img = new DiskCachedCellImgFactory< ARGBType >()
				.create( dimensions, new ARGBType() );

creates a 640^3 image that is backed by a disk cache:
Initially, cells don’t really exist, so the image takes up no memory (although the full image would take 640^3 * 4 bytes = 1000 MB).
A new empty cell is created whenever an accessor “walks into it”. You can read/write the image as any other image. Cells keep track of whether they are “dirty”, i.e. have been modified from their initial empty state. A cache keeps all the created cells until memory runs full. Then, if a dirty cell is evicted from the cache, it (its data) is written to a temporary folder. When an accessor walks into that cell the next time, it is restored from disk.
imglib2-cache takes care of all of that, you just have to use DiskCachedCellImgFactory, that’s it.
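
The dirty-cell bookkeeping can be pictured with the following toy write-back cache in plain Java. It is a sketch under loud assumptions, not the imglib2-cache implementation: the class DiskBackedCellsSketch, its file naming, and the eviction rule (a fixed number of cells, least recently used first) are all invented here as a simplification of “until memory runs full”.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: a tiny LRU write-back cache, not the imglib2-cache implementation.
final class DiskBackedCellsSketch {
    static final class Cell {
        final int[] data;
        boolean dirty = false;
        Cell(int[] data) { this.data = data; }
    }

    private final int cellSize;
    private final int maxCellsInMemory;
    private final Path dir;
    // access-ordered map: iteration starts at the least recently used cell
    private final LinkedHashMap<Long, Cell> inMemory = new LinkedHashMap<>(16, 0.75f, true);

    DiskBackedCellsSketch(int cellSize, int maxCellsInMemory) {
        this.cellSize = cellSize;
        this.maxCellsInMemory = maxCellsInMemory;
        try {
            this.dir = Files.createTempDirectory("cells");
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    private Path fileOf(long index) { return dir.resolve("cell-" + index + ".bin"); }

    /** Returns the cell, restoring it from disk or creating it empty if needed. */
    Cell get(long index) {
        Cell cell = inMemory.get(index);
        if (cell == null) {
            cell = new Cell(load(index));
            inMemory.put(index, cell);
            evictIfNecessary();
        }
        return cell;
    }

    void set(long index, int offset, int value) {
        Cell cell = get(index);
        cell.data[offset] = value;
        cell.dirty = true; // this cell now differs from its stored (or empty) state
    }

    int cellsInMemory() { return inMemory.size(); }

    private int[] load(long index) {
        int[] data = new int[cellSize]; // stays all zero if the cell was never written out
        Path file = fileOf(index);
        if (Files.exists(file)) {
            try {
                ByteBuffer.wrap(Files.readAllBytes(file)).asIntBuffer().get(data);
            } catch (IOException e) { throw new UncheckedIOException(e); }
        }
        return data;
    }

    private void evictIfNecessary() {
        Iterator<Map.Entry<Long, Cell>> it = inMemory.entrySet().iterator();
        while (inMemory.size() > maxCellsInMemory && it.hasNext()) {
            Map.Entry<Long, Cell> eldest = it.next();
            Cell cell = eldest.getValue();
            if (cell.dirty) { // only modified cells need to be written out
                ByteBuffer buf = ByteBuffer.allocate(4 * cellSize);
                buf.asIntBuffer().put(cell.data);
                try {
                    Files.write(fileOf(eldest.getKey()), buf.array());
                } catch (IOException e) { throw new UncheckedIOException(e); }
            }
            it.remove();
        }
    }

    public static void main(String[] args) {
        DiskBackedCellsSketch cache = new DiskBackedCellsSketch(4, 2);
        cache.set(0, 1, 7);   // cell 0 becomes dirty
        cache.get(1);
        cache.get(2);         // third cell: cell 0 is evicted and written to the temp folder
        System.out.println(cache.get(0).data[1]); // restored from disk
    }
}
```

Clean cells are simply dropped on eviction, because they can be recreated (or reloaded) at any time.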

It becomes even more useful by the fact that you can provide a CellLoader which is used to fill the data of “empty” cells when they are initially created. (After that, the disk cache takes over to track modifications.)

imglib2-cache comprises various variations on that scheme. You can use the factories, maybe with your own CellLoaders or build it all up from scratch using the lower-level building blocks of imglib2-cache.


On to virtual stacks.

What you want to do here is make a CellImg where each cell represents exactly one plane of the image (you can do that by setting cellDimensions to w*h*1) and provide a CellLoader that gets the data from the planes of the virtual stack. Here is how to do it:

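import ij.IJ;
import ij.ImagePlus;
import net.imglib2.cache.img.CellLoader;
import net.imglib2.cache.img.ReadOnlyCachedCellImgFactory;
import net.imglib2.cache.img.ReadOnlyCachedCellImgOptions;
import net.imglib2.cache.img.SingleCellArrayImg;
import net.imglib2.img.Img;
import net.imglib2.type.numeric.integer.UnsignedShortType;

final ImagePlus imp = IJ.openVirtual( "16bit.tif" );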

// assuming we know it is a 3D, 16-bit stack...
final long[] dimensions = new long[] {
		imp.getStack().getWidth(),
		imp.getStack().getHeight(),
		imp.getStack().getSize()
};

// set up cell size such that one cell is one plane
final int[] cellDimensions = new int[] {
		imp.getStack().getWidth(),
		imp.getStack().getHeight(),
		1
};

// make a CellLoader that copies one plane of data from the virtual stack
final CellLoader< UnsignedShortType > loader = new CellLoader< UnsignedShortType >()
{
	@Override
	public void load( final SingleCellArrayImg< UnsignedShortType, ? > cell ) throws Exception
	{
		final int z = ( int ) cell.min( 2 );
		final short[] impdata = ( short[] ) imp.getStack().getProcessor( 1 + z ).getPixels();
		final short[] celldata = ( short[] ) cell.getStorageArray();
		System.arraycopy( impdata, 0, celldata, 0, celldata.length );
	}
};

// create a CellImg with that CellLoader
final Img< UnsignedShortType > img = new ReadOnlyCachedCellImgFactory().create(
		dimensions,
		new UnsignedShortType(),
		loader,
		ReadOnlyCachedCellImgOptions.options().cellDimensions( cellDimensions ) );

This works.
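
The example above assumes a plain 3D stack. For a hyperstack, a loader would additionally have to translate channel/slice/frame coordinates into the 1-based plane index of the underlying stack. The following plain-Java sketch shows that arithmetic only. It assumes ImageJ's default hyperstack order (channel varies fastest, then slice, then frame), which is also what ImagePlus.getStackIndex(channel, slice, frame) computes from 1-based arguments; the class and method names here are hypothetical.

```java
// Illustrative sketch only: index arithmetic in plain Java, no ImageJ classes involved.
final class PlaneIndexSketch {
    /**
     * 1-based stack index of the plane at zero-based (c, z, t), assuming
     * ImageJ's default hyperstack order (channel fastest, then slice, then frame).
     */
    static int stackIndex(int c, int z, int t, int nChannels, int nSlices) {
        return t * nSlices * nChannels + z * nChannels + c + 1;
    }

    public static void main(String[] args) {
        // 2 channels, 5 slices: second channel, fourth slice, third frame (zero-based 1, 3, 2)
        System.out.println(stackIndex(1, 3, 2, 2, 5));
    }
}
```

In a CellLoader for a 5D image with cells of size w*h*1*1*1, the zero-based c, z and t would come from cell.min( d ) of the corresponding dimensions; which dimension holds which axis depends on how the image is set up and is not fixed by this sketch.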

Some variations:

  1. You want to be able to write to the resulting img. Then you should use a DiskCachedCellImgFactory instead of the ReadOnlyCachedCellImgFactory used in the example. The cells (planes) are then initialized from the virtual stack, but all your modifications will be tracked and swapped to disk if necessary.
  2. You don’t want to copy the data, but directly use the short[] arrays underlying the virtual stack. Then you need to use a CacheLoader<Long,Cell<VolatileShortArray>> instead of the CellLoader. But this goes deeper into the imglib2-cache internals and maybe this leads too far here. (If you want an example, I can post the code. The code is not much longer, but the explanation of what is going on would be…)

Finally, https://github.com/imglib/imglib2-cache-examples has a lot of examples going from extremely easy to quite low-level.


#20

Thanks @tpietzsch

Hoping to piggyback on this a little as it feels related

Disabling (or clearing) the cache
Is there a way to disable caching, so that only the current cell is in memory and the previous one is removed as soon as the cursor moves into another cell? I know this seems undesirable, but because of other processes going on in my program I don’t want my CellImg to hold on to cells: I know I am not going back to them, as I am just iterating over the images once. As you say, imglib2-cache takes care of evicting and loading cells to and from the cache, but since other concurrent processes are instantiating objects at the same time, I don’t want to be teetering near my memory limit. If I could disable caching that would be great; if not, a method to flush the cache could work as well.

ReadOnlyCachedCellImgFactory does not extend ImgFactory
ReadOnlyCachedCellImgFactory does not extend ImgFactory. I am loading my images using SCIFIO and passing an ImgFactoryHeuristic to the config to select which ImgFactory SCIFIO uses when it loads an image. This requires a return type of ImgFactory. This isn’t a huge issue yet since, following the examples, I have been using DiskCachedCellImgFactory, but I figured I would mention it, as it might be an issue for someone who is not wrapping a VirtualStack (which I am aware was the topic at hand) but instead using SCIFIO to obtain a CellImg that acts like a virtual stack (without consuming resources beyond its active slice).

As a newcomer I just want to say great library, and thanks for any help you can give.