Reading TIFFs directly from InputStream

spark
scifio

#1

I’m currently using Apache Spark with SCIFIO to read and process tarballs of TIFF files. Unfortunately I couldn’t find a way to open images directly from an input stream of bytes. My temporary solution is to write each image to a temporary file, read it using SCIFIO, and then delete the temporary file.
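For comparison, the temporary-file workaround can at least be made leak-free by deleting the file in a `finally` block. This is a plain-Java sketch, not SCIFIO code. `readViaTempFile` and the reader callback are hypothetical names, and the callback is where a file-based open call would go.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

class TempFileRead {

    /**
     * Writes the bytes to a temporary file, hands its path to the reader, and
     * deletes the file afterwards, even if the reader throws.
     */
    static <T> T readViaTempFile(byte[] bytes, String suffix, Function<Path, T> reader) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("scifio-", suffix);
            Files.write(tmp, bytes);
            return reader.apply(tmp); // hypothetical: open tmp.toString() with a file-based reader here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // Best-effort cleanup; nothing useful to do if deletion fails.
                }
            }
        }
    }

    public static void main(String[] args) {
        byte[] entry = {'I', 'I', 42, 0};
        long size = readViaTempFile(entry, ".tif", p -> p.toFile().length());
        System.out.println(size); // 4
    }
}
```

This still pays for a disk round trip per image, which is what reading from memory avoids.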

Is there any way I’m not aware of to read an image directly from an InputStream instead of a filename? And if not, where can I submit that as a feature request?

Best Wishes.


#2

Hi @tomerk

That sounds awesome. Thanks for using the SCIFIO API - it is very helpful to get developer input here.

We use GitHub for our issue tracking, and there’s actually a topic branch you might be interested in, called robust-io. SCIFIO is still heavily string-based (sources are identified by name), so that branch focuses on redesigning the API to be source-agnostic.

That said, there actually is an API to do this right now; it’s just lower-level. You want to use the setSource(RandomAccessInputStream) method.
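One reason the lower-level API takes a RandomAccessInputStream rather than a plain InputStream: a TIFF header stores the offset of its first IFD (image file directory), and a reader has to jump to that offset. The sketch below is plain Java using `java.nio.ByteBuffer` over an in-memory byte array, not SCIFIO code; `TiffHeaderPeek` is a hypothetical name. It only parses the classic TIFF header and the IFD entry count.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class TiffHeaderPeek {

    /** Returns the number of tag entries in the first IFD of a classic TIFF held in memory. */
    static int firstIfdEntryCount(byte[] bytes) {
        if (bytes.length < 8) throw new IllegalArgumentException("too short for a TIFF header");
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        // Bytes 0-1: "II" = little-endian, "MM" = big-endian.
        if (bytes[0] == 'I' && bytes[1] == 'I') buf.order(ByteOrder.LITTLE_ENDIAN);
        else if (bytes[0] == 'M' && bytes[1] == 'M') buf.order(ByteOrder.BIG_ENDIAN);
        else throw new IllegalArgumentException("not a TIFF byte-order mark");
        // Bytes 2-3: magic number 42 for classic TIFF.
        if (buf.getShort(2) != 42) throw new IllegalArgumentException("bad TIFF magic");
        // Bytes 4-7: unsigned offset of the first IFD, which may be anywhere in the file.
        long ifdOffset = Integer.toUnsignedLong(buf.getInt(4));
        if (ifdOffset + 2 > bytes.length) throw new IllegalArgumentException("IFD offset out of range");
        // Jumping straight to that offset is the random access a forward-only stream lacks.
        return Short.toUnsignedInt(buf.getShort((int) ifdOffset));
    }

    public static void main(String[] args) {
        // Little-endian header pointing at an IFD at offset 8 that declares 3 entries.
        byte[] tiff = {'I', 'I', 42, 0, 8, 0, 0, 0, 3, 0};
        System.out.println(firstIfdEntryCount(tiff)); // 3
    }
}
```

Buffering each tarball entry into a byte array and reading it this way is roughly the idea behind wrapping a byte array in a RandomAccessInputStream.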

The tutorials demonstrate how to get a specific Format and create components (like the Reader you will need).

Off the top of my head, one big difference to be aware of: if you’re writing an ImageJ plugin, there is already a Context available, so you should get your SCIFIO instance as an @Parameter.

If you run into any problems let us know!


#3

Thanks for the response @hinerm!

I’m not writing this as a plugin, but I am calling into SCIFIO from Scala, which may be complicating things a little.

I ran into some issues following the tutorial while using a RandomAccessInputStream wrapped around a byte array.

First, there is no scifio.format().getFormat() method that takes a RandomAccessInputStream instead of a file name / string id. I was able to manually choose the right one from scifio.format().getAllFormats(), but it’s not a robust solution.

The bigger problem is that ImgOpener still seems to be expecting the RandomAccessInputStream to be pointing to a file internally.

Here is some code from the decompiled class file (sorry if the variable names don’t match):

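  private <T extends RealType<T>> SCIFIOImgPlus<T> makeImgPlus(Img<T> img, Reader r, int imageIndex) {
    String id = r.getCurrentFile();      // null when the source is a stream
    File idFile = new File(id);          // new File(null) throws NullPointerException
    String name = idFile.exists()?idFile.getName():id;
    // ... (rest of method omitted)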

Because r.getCurrentFile() returns null, I end up with a NullPointerException.
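A null-safe version of the name logic quoted above could fall back to a caller-supplied label when there is no file. This is a hypothetical helper (`ImageNames.displayName`) for illustration, not the change actually made in SCIFIO.

```java
import java.io.File;

class ImageNames {

    /**
     * Derives a display name from a source id. A reader opened from a stream
     * may report no current file, so fall back to a caller-supplied label.
     */
    static String displayName(String id, String fallback) {
        if (id == null) {
            return fallback; // in-memory source: no file name to show
        }
        File idFile = new File(id);
        return idFile.exists() ? idFile.getName() : id;
    }

    public static void main(String[] args) {
        System.out.println(displayName(null, "in-memory image")); // in-memory image
        try {
            new File((String) null);
        } catch (NullPointerException e) {
            System.out.println("new File(null) throws NullPointerException");
        }
    }
}
```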


#4

You’re absolutely correct. There is a lot of historical code in SCIFIO that still assumes a file source. Thank you for pushing development forward here!

I removed the blocking File-isms and released scifio 0.26.0.

> First, there is no scifio.format().getFormat() method that takes a RandomAccessInputStream instead of a file name / string id. I was able to manually choose the right one from scifio.format().getAllFormats(), but it’s not a robust solution.

I wrote a unit test you may find useful that goes from byte[] to ImgPlus. Of course, I cheated on the format-detection part :sweat_smile:

Actually, this can provide a robust general solution: iterate over that list, create a Checker for each Format, and use those Checkers to see whether they match your InputStream. If one does, you’ve found a format match. This is the same logic used by the string-based signatures.
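The detection loop can be sketched in plain Java. In SCIFIO terms it would run over `scifio.format().getAllFormats()`, calling `createChecker()` on each Format and asking the Checker about the stream; here `ByteChecker`, `detect`, and the two magic-number checks are hypothetical stand-ins that show only the control flow.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class FormatDetection {

    /** Hypothetical stand-in for a format's checker; not SCIFIO's Checker class. */
    static final class ByteChecker {
        final String formatName;
        final Predicate<ByteBuffer> matches;

        ByteChecker(String formatName, Predicate<ByteBuffer> matches) {
            this.formatName = formatName;
            this.matches = matches;
        }
    }

    /** True if the buffer begins with the given magic bytes (absolute reads from index 0). */
    static boolean startsWith(ByteBuffer b, byte[] magic) {
        if (b.limit() < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (b.get(i) != magic[i]) return false;
        }
        return true;
    }

    static List<ByteChecker> defaultCheckers() {
        return Arrays.asList(
            // Classic TIFF: "II" + 42 little-endian, or "MM" + 42 big-endian.
            new ByteChecker("TIFF", b -> startsWith(b, new byte[] {'I', 'I', 42, 0})
                                      || startsWith(b, new byte[] {'M', 'M', 0, 42})),
            // PNG's fixed 8-byte signature.
            new ByteChecker("PNG", b -> startsWith(b, new byte[] {
                (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A})));
    }

    /** Returns the first format whose checker accepts the bytes; list order matters. */
    static Optional<String> detect(byte[] bytes, List<ByteChecker> checkers) {
        for (ByteChecker c : checkers) {
            // Each checker gets its own read-only view, so no checker can
            // disturb the read position seen by the next one.
            if (c.matches.test(ByteBuffer.wrap(bytes).asReadOnlyBuffer())) {
                return Optional.of(c.formatName);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] tiff = {'I', 'I', 42, 0, 8, 0, 0, 0};
        System.out.println(detect(tiff, defaultCheckers()).orElse("unknown")); // TIFF
    }
}
```

Because the first match wins, a permissive checker early in the list can shadow a more specific one, so the ordering of candidates is part of the design.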

I will add InputStream equivalents for these methods. It was silly of me not to put them in 0.26.0; I was just happy to have a working in-memory test :laughing:


#5

Much better! Available now in 0.27.0

Let me know if you run into more problems @tomerk


#6

Thank you @hinerm that worked great!

For some reason the auto-detected format was Micro-Manager instead of TIFF, so I had to get the TIFF format manually by class. Besides that, it works!

At one point we also ran into a strange synchronization bug in the FormatService’s formatCache that occasionally led to a deadlock when using 8 Spark threads to read thousands of images, but only on some machines. We worked around it by getting the format only once per JVM.
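One way to get the format only once per JVM is the initialization-on-demand holder idiom (in Scala, a top-level `object` gives similar once-per-class-loader initialization). A plain-Java sketch follows; `expensiveLookup` is a hypothetical stand-in for resolving the TIFF format, and the String result stands in for the real Format object. This sidesteps the shared cache rather than fixing it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PerJvmFormat {

    static final AtomicInteger creations = new AtomicInteger();

    /** Hypothetical stand-in for an expensive lookup such as resolving the TIFF format. */
    static String expensiveLookup() {
        creations.incrementAndGet();
        return "TIFF";
    }

    /**
     * The JVM runs Holder's static initializer exactly once, on first access,
     * and class initialization is thread-safe, so no explicit locking is needed.
     */
    private static final class Holder {
        static final String FORMAT = expensiveLookup();
    }

    static String format() {
        return Holder.FORMAT;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 1000; i++) {
            pool.execute(() -> format());
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(creations.get()); // 1
    }
}
```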


#7

TIFF-based formats are notoriously annoying to match properly, especially when they have companion metadata files (like Micro-Manager does).

Micro-Manager uses a delegate TIFF reader so in theory it should still read your data correctly. However I talked with @ctrueden about this and we agreed this behavior is confusing.

So I changed Micro-Manager so that it does not match when the RAIS (RandomAccessInputStream) check is called directly. This will be our policy for all multi-file formats for now, which means you may get false negatives (instead of false positives) when detecting from a RAIS alone.

We’ll eventually be moving to a more robust solution that will allow proper modelling of multi-file formats, regardless of where they come from.

Interesting. If you ever want to amuse or horrify yourself, I highly recommend looking at code you wrote years ago :laughing: I was actually looking at the caching yesterday and thinking it looked odd and unsafe. I will take a harder look. If you come up with a reproducible example in the meantime, let me know.

I’ve heard interest from other people in the community in using Spark with ImageJ. The fact that you have something working is really cool; I encourage you to write a bit about it on the ImageJ wiki, whether about your work in particular or about your experiences with Spark + SCIFIO/ImageJ. I think it could be of general interest and use.


#8

I tweaked the locking behavior a bit in the FormatService.

I don’t know if it’s possible, but if you’re able to recreate the deadlock and get a stack trace, that would still be useful.


#9

We’re mainly just using it to read the TIFF files, which we then convert to a Breeze DenseMatrix, but it could still be useful to people. We could probably also add calls to ImageJ transformations at that point.
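One detail when handing pixels to Breeze: `DenseMatrix(rows, cols, data)` interprets `data` in column-major order by default, while a decoded image plane is commonly row-major with x varying fastest. A minimal sketch, assuming a row-major `double[]` plane and a matrix with rows = image height and columns = image width; `PlaneToColumnMajor` is a hypothetical name, so check your own reader’s plane layout first.

```java
import java.util.Arrays;

class PlaneToColumnMajor {

    /**
     * Converts a row-major plane (x fastest, width * height values) into the
     * column-major layout of a height x width matrix.
     */
    static double[] toColumnMajor(double[] rowMajor, int width, int height) {
        if (rowMajor.length != width * height) {
            throw new IllegalArgumentException("expected " + width * height + " values");
        }
        double[] colMajor = new double[rowMajor.length];
        for (int y = 0; y < height; y++) {        // matrix row
            for (int x = 0; x < width; x++) {     // matrix column
                colMajor[x * height + y] = rowMajor[y * width + x];
            }
        }
        return colMajor;
    }

    public static void main(String[] args) {
        // 2 rows x 3 columns:  [1 2 3]
        //                      [4 5 6]
        double[] plane = {1, 2, 3, 4, 5, 6};
        System.out.println(Arrays.toString(toColumnMajor(plane, 3, 2)));
        // [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
    }
}
```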

We grabbed a stack trace earlier (this is with SCIFIO 0.24.0).

One thread was at

  java.util.WeakHashMap.put(WeakHashMap.java:521)
  io.scif.services.DefaultFormatService.getFormat(DefaultFormatService.java:311)
  io.scif.services.DefaultInitializeService.initializeReader(DefaultInitializeService.java:89)
  io.scif.img.ImgOpener.createReader(ImgOpener.java:547)
  io.scif.img.ImgOpener.openImgs(ImgOpener.java:167)
  vectors.MagneticField$.loadFromTiff(MagneticField.scala:24)
  loaders.MagneticFieldLoaderUtils$.loaders$MagneticFieldLoaderUtils$$loadFile(MagneticFieldLoaderUtils.scala:83)
  loaders.MagneticFieldLoaderUtils$$anonfun$loadFiles$1.apply(MagneticFieldLoaderUtils.scala:48)
  loaders.MagneticFieldLoaderUtils$$anonfun$loadFiles$1.apply(MagneticFieldLoaderUtils.scala:48)

And another thread was at

  java.util.WeakHashMap.get(WeakHashMap.java:471)
  io.scif.services.DefaultFormatService.getFormat(DefaultFormatService.java:308)
  io.scif.services.DefaultInitializeService.initializeReader(DefaultInitializeService.java:89)
  io.scif.img.ImgOpener.createReader(ImgOpener.java:547)
  io.scif.img.ImgOpener.openImgs(ImgOpener.java:167)
  vectors.MagneticField$.loadFromTiff(MagneticField.scala:24)
  loaders.MagneticFieldLoaderUtils$.loaders$MagneticFieldLoaderUtils$$loadFile(MagneticFieldLoaderUtils.scala:83)

I don’t remember which was stuck and which was running but frozen.

My guess is it’s related to a known issue where unsynchronized access to WeakHashMaps can cause endless loops.
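WeakHashMap is not thread-safe, and concurrent unsynchronized put/get can leave its internal bucket chains inconsistent. A conventional fix is to serialize all access while keeping the weak keys. The sketch below is generic Java, not the locking change made in FormatService; `SafeWeakCache` and its loader are hypothetical names. A ConcurrentHashMap would avoid the single lock but would hold its keys strongly.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class SafeWeakCache<K, V> {

    // Every call on the wrapper synchronizes on one lock; the weak-key behavior
    // of the underlying WeakHashMap is preserved.
    private final Map<K, V> map = Collections.synchronizedMap(new WeakHashMap<>());
    private final Function<K, V> loader;

    SafeWeakCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // On a synchronized map, computeIfAbsent runs under the map's lock, so the
        // loader runs once per key while that key remains strongly reachable.
        return map.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger loads = new AtomicInteger();
        SafeWeakCache<Class<?>, String> cache =
            new SafeWeakCache<>(c -> { loads.incrementAndGet(); return c.getSimpleName(); });
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) cache.get(String.class);
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(loads.get()); // 1
    }
}
```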