SciJava Batch Processor?

batch-processing
scijava

#1

Hi all,

I’d like to make batch processing (with SciJava scripts, commands or ops) even easier.

The current options for batch processing files in ImageJ are outlined on the batch processing page. There are mainly two options:

  1. Process > Batch > Macro…, the built-in ImageJ 1.x command
  2. Use one of the Process folder templates (currently available in IJ1 macro and Python) and write your own script.

While setting up a script from a template (i.e. option 2) is easy, creating many scripts this way duplicates a lot of code, which increases the maintenance burden whenever those scripts need to be adapted to different use cases, e.g. by changing input parameters.

I therefore would like to improve option 1 from above: create a SciJava Batch Processor that is aware of the @INPUT parameters of a script and performs batch processing over a list of files, collected recursively from one or more folders. Basically, it should do the following:

  • Choose any script, command or op that has at least one File or Img/Dataset/ImagePlus input

  • If multiple compatible inputs exist (file and/or image), offer a choice to either:

      • have them filled by a constant value (e.g. subtract the same shading reference image from all images in a folder), or

      • populate them from the list of files (batch processing input)

  • Ask for an input (and an optional output) directory, and harvest all other input parameters of the script by creating a dialog (usual @Parameter processing)

  • Run the given command for each input file/image with the provided (constant) additional parameters.
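
The steps in the list above can be sketched in plain Java, without any SciJava dependency. This is only a rough illustration under stated assumptions: the class `BatchLoopSketch`, its methods `listFiles` and `runBatch`, and the parameter names used in the usage note are hypothetical, and in a real implementation the per-file action would be a SciJava module run rather than a `Function`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of SciJava.
class BatchLoopSketch {

    /** Recursively collects all regular files below the given folder, sorted by path. */
    static List<Path> listFiles(Path folder) {
        try (Stream<Path> paths = Files.walk(folder)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Runs the action once per file. The constant parameters are copied for each run,
     * and only the entry named by fileKey changes between iterations.
     */
    static <R> List<R> runBatch(List<Path> files, String fileKey,
            Map<String, Object> constants, Function<Map<String, Object>, R> action) {
        List<R> results = new ArrayList<>();
        for (Path file : files) {
            Map<String, Object> inputs = new HashMap<>(constants);
            inputs.put(fileKey, file.toFile()); // the batch input for this iteration
            results.add(action.apply(inputs));
        }
        return results;
    }
}
```

A call could look like `BatchLoopSketch.runBatch(BatchLoopSketch.listFiles(dir), "input", constants, action)`, where `constants` holds the values harvested once from the dialog and `action` stands in for running the chosen script or command.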


I realize this is pretty similar to how ImageJ2 plugins currently work in KNIME, with the column binding option in the advanced settings.

So I think something similar would be nice to have in ImageJ (as not every batch workflow is suitable to be run from KNIME (yet) ;-)). I have some ideas how I would go about creating a small Java wrapper that runs a command for every file in a folder, but I’d need some pointers on:

  • How do I get the inputs of a given script/command? I guess I need ModuleInfo.inputs(), but I’d be grateful if someone had a small illustrative example at hand.

  • Also, I can possibly use ModuleService.getSingleInput() to ask for compatible file/image input parameters, right?

  • How do I add the inputs of the script/command to the current instance of my batch processor, so that a new dialog can be created that asks for those parameters as well as the target folders?

Before I start: does that make sense at all? Do you have comments or objections? (/cc @ctrueden)


#2

I have to say that the SciJava service architecture is plain awesome. Here’s how I get all modules and scripts that have a File input parameter:


That leaves me with just the third question above, how to dynamically add the inputs before generating a new dialog…


#3

There is MutableModule with the associated MutableModuleInfo. Looks like it would suit your use-case, although I don’t know how you’ll implement all this. DynamicCommand implements the MutableModule interface and looks like a good starting point, maybe? (really just thinking out loud here…)


#4

Thanks @stelfrich for the hints, very much appreciated! :slight_smile:

I tried with DynamicCommand, but figured that I might not even need it if I process the inputs of my ScriptModule once and then only update the @File input for every run. I’ll play a bit more.


In any case, my progress is here, so comments and suggestions are always welcome:

https://github.com/fmi-faim/scijava-batch-processor/blob/batch-processor/src/main/java/org/scijava/batch/BatchProcessor.java#L16-L53

This allows a simple script with only a single File input (and currently no other “interactive” inputs):

#!groovy

#@File input
#@LogService log

log.info("Processing $input now!")

to be run using the batch processor, e.g. like this (to be simplified of course):

#!groovy

#@ScriptService ss
#@CommandService cs

import org.scijava.batch.BatchProcessor

// Look up the ScriptInfo for the script that should be run on each file
scriptFile = new File("C:\\path\\to\\my\\script\\Process_Single_File.groovy")
scriptInfo = ss.getScript(scriptFile)

// Run the batch processor on the files in the input folder
cs.run(BatchProcessor.class, true, "inputFolder", new File("C:\\temp"), "script", scriptInfo)

#5

I’ve made some further progress, but now I’m stuck at the point where I’d like to be able to:

  • preprocess a ScriptModule so that all (unresolved) inputs get processed (by showing an interactive dialog, just as if I had run the script directly), but I do not want to run the script just yet;
  • then run the script repeatedly, while newly setting the value of a File input (and marking it as resolved) for each iteration.

The problem is that I get a bunch of NullPointerExceptions, apparently because the script doesn’t get preprocessed as intended, and all the inputs remain null.

Any suggestions anyone?

My current progress is here:

https://github.com/fmi-faim/scijava-batch-processor/commit/c7c116f278e4881981202c5852a9d52f5c451282

Update:
Script execution now seems to work, but I get a NullPointerException, apparently when processing the outputs (I just filed an issue for this):

[ERROR] Module threw exception
java.lang.NullPointerException
	at org.scijava.script.ScriptModule.run(ScriptModule.java:180)
	at org.scijava.module.ModuleRunner.run(ModuleRunner.java:167)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:126)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:65)
	at org.scijava.thread.DefaultThreadService$3.call(DefaultThreadService.java:237)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
[ERROR] Error during module execution
java.util.concurrent.ExecutionException: java.lang.NullPointerException
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.scijava.batch.BatchProcessor.run(BatchProcessor.java:95)
	at org.scijava.module.ModuleRunner.run(ModuleRunner.java:167)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:126)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:65)
	at org.scijava.thread.DefaultThreadService$3.call(DefaultThreadService.java:237)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
	at org.scijava.script.ScriptModule.run(ScriptModule.java:180)
	... 8 more

The current code is on the batch-processor branch:

https://github.com/fmi-faim/scijava-batch-processor/tree/batch-processor


#6

I looked quickly at your branch, but did not have time to test it (yet). How are things working now? Are you still stuck, or does the dialog prompt work the way you want?

From the code, I can imagine that it might actually work, since the first time you invoke the module, the input harvester will harvest all unresolved inputs, and then for subsequent invocations, since the inputs are already all resolved, the dialog does not appear anymore. OTOH, there have been reports in the past that module instances cannot be reused (see e.g. knime-ip/knip-scripting#56), so I would not be shocked if it does not work.
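
To make that reasoning concrete, here is a toy model in plain Java of how a harvester that only asks for unresolved inputs would behave across repeated runs. It is an assumption made for illustration and not SciJava's actual implementation: `ToyModule`, `setResolved` and `harvest` are hypothetical names, and the `prompt` function stands in for the dialog.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy model of input resolution; illustration only, not SciJava's implementation.
class ToyModule {
    private final Map<String, Object> inputs = new LinkedHashMap<>();
    private final Set<String> resolved = new HashSet<>();

    ToyModule(String... inputNames) {
        for (String name : inputNames) inputs.put(name, null);
    }

    /** Sets a value programmatically and marks the input as resolved. */
    void setResolved(String name, Object value) {
        inputs.put(name, value);
        resolved.add(name);
    }

    /** Asks the prompt for every unresolved input; returns how many were asked. */
    int harvest(Function<String, Object> prompt) {
        int asked = 0;
        // Iterate over a copy of the names so the map can be updated safely.
        for (String name : new ArrayList<>(inputs.keySet())) {
            if (resolved.contains(name)) continue; // already filled, no dialog field
            setResolved(name, prompt.apply(name));
            asked++;
        }
        return asked;
    }

    Object get(String name) {
        return inputs.get(name);
    }
}
```

In this model, the first `harvest` call asks for everything except the File input that was set beforehand, and later calls ask for nothing as long as the same instance is reused and only the File input is overwritten. Whether real module instances tolerate that kind of reuse is exactly the open question mentioned above.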

If things are not yet working: did you already check out the ModuleRunner class? That is where the magic happens regarding preprocessing, running, and postprocessing of modules. If you need really deep control over this execution flow, you might have to (partially) recapitulate that code. On the other hand, you may not; even in the SciJava Jupyter Kernel, we are able to use ModuleRunner for the code cells.


#7

Yeah, thanks. As you probably saw from the solved issue, I had problems with context injection, but I have solved them. Your suggestion to use ModuleService to create the module will make it even better.

Otherwise, all seems to be working as intended. I still plan some improvements:

  • Beautify the list of available scripts by displaying their menu path (?) instead of their identifier

  • Also support scripts with multiple File inputs, by offering a choice which of them should be iterated on (i.e. filled with the batch list of files); the remaining File inputs will be harvested as all other inputs, and kept constant over all iterations.

  • Do the same for scripts with Img/ImgPlus/Dataset/ImagePlus inputs. Maybe two menu commands, i.e. Batch Process Files and Batch Process Images?

Suggestions welcome!

@ctrueden you surely noticed that I named this scijava-batch-processor :slight_smile: What do you think about adding this to scijava? Seeing the number of questions like “how can I run my macro on a folder of files?”, I think an improved batch processor could be of general interest. I would be willing to take care of the plugin according to the roles, of course, and try to update the batch processing page accordingly.


#8

Menu path sounds like a good idea, but can you filter the available scripts according to the requirements of your batch processor?

I really like that idea! Couldn’t we even generalize the concepts more on SciJava’s side, such that @Parameter Img opens up a file chooser if no images are open?


#9

Yes, currently I’m filtering the available scripts and keeping only those where modules.getSingleInput(module, File.class) returns non-null:

(My plan was to instead allow any script that has at least one File input, see the second point in my list above.)


That would be nice indeed. But it’s not directly related to batch processing, is it? Or do you mean a file chooser that allows selecting multiple files?
Some basic file multi-selection with wildcards/regexes would also be nice, of course.
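
As a rough idea of what wildcard/regex selection could look like, the standard Java `PathMatcher` already supports both syntaxes. The sketch below is only an assumption about how such a filter might be wired up; `WildcardFilterSketch` and `select` are hypothetical names, and nothing here is existing batch processor code.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of SciJava.
class WildcardFilterSketch {

    /**
     * Keeps the paths whose file name matches the pattern. The pattern carries the
     * "glob:" or "regex:" syntax prefix understood by FileSystem.getPathMatcher.
     */
    static List<Path> select(List<Path> candidates, String syntaxAndPattern) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntaxAndPattern);
        return candidates.stream()
                // Match against the file name only, so folders do not affect the result.
                .filter(p -> p.getFileName() != null && matcher.matches(p.getFileName()))
                .collect(Collectors.toList());
    }
}
```

For example, `select(files, "glob:*.tif")` would keep only the TIFF files, and `select(files, "regex:img_\\d+\\.tif")` would keep names such as `img_001.tif`.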


#10

There is an API call which tries to be smart about giving the best possible human-readable string: ModuleInfo#getTitle().

For the moment, it might be enough to just loop on the first one. Then at least people can write batchable commands which require auxiliary files, as long as they list the primary file first.
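
A small sketch of that "first File input wins" rule, in plain Java. It assumes the inputs are available as an ordered map from input name to type, which is a simplification made for illustration; `FileInputChooserSketch`, `fileInputs` and `firstFileInput` are hypothetical names, not SciJava API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of SciJava.
class FileInputChooserSketch {

    /**
     * Returns the names of all File-typed inputs. The result follows the map's
     * iteration order, so pass an ordered map (e.g. LinkedHashMap) that reflects
     * the order in which the script declares its inputs.
     */
    static List<String> fileInputs(Map<String, Class<?>> inputTypes) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Class<?>> e : inputTypes.entrySet()) {
            if (File.class.isAssignableFrom(e.getValue())) names.add(e.getKey());
        }
        return names;
    }

    /** Picks the first File input as the one to iterate over, or null if there is none. */
    static String firstFileInput(Map<String, Class<?>> inputTypes) {
        List<String> names = fileInputs(inputTypes);
        return names.isEmpty() ? null : names.get(0);
    }
}
```

The full list returned by `fileInputs` could later feed a chooser, so the user can pick which File input is iterated while the remaining ones stay constant.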

Why limit yourself? It would be cool to be able to do it for any input that has a collection of objects available from the ObjectService. Of course, then we need to think of a UI that makes it easy for the user to say she wants to do that, and with which input(s)… I agree that having images be first-class citizens in the menu still makes sense, though the menu items extending the functionality could not live in scijava-batch-processor, since they are image-specific; they would need to go into some imagej component or other.

Sounds good. In years past, I would have said this should belong in scijava-plugins-commands. But now I am not so sure. A separate component probably makes sense here.