How to find memory leaks in plugins / what needs to get disposed?


#1

Hey all,

I am trying to find memory leaks in one of our plugins and wanted to ask for best strategies.

When calling the plugin in a loop over and over again, RAM usage kept growing until the JVM ran out of memory.

I was able to free resources by calling dispose() on some objects. I read that in Java it is generally not necessary to destroy objects explicitly, but DatasetView, for example, has a dispose() method that I think needs to be called at the end.

RAM usage is better now, but calling the plugin in a loop still throws a java.lang.OutOfMemoryError: GC overhead limit exceeded after a while. Running System.gc() after each plugin call did not help.

I read the section about Debugging Memory Leaks in the Wiki. Here are two screenshots; I'm not sure where to go from here.

Screenshot_2018-04-02_10-43-07

Screenshot_2018-04-02_10-43-36

This looks to me like threads are accumulating and causing the issue, but I am new to this, so I'm not sure. The plugin creates threads in this fashion:

ExecutorService pool = Executors.newSingleThreadExecutor();
pool.submit(...);
pool.shutdown();
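For reference, this is the fuller shutdown pattern I have seen recommended. The awaitTermination call is a guess on my part (maybe that is what's missing?), and the submitted lambda is just a stand-in for the real work:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolShutdownDemo {

	// Submits one task, waits for its result, then shuts the pool down
	// so that its worker thread can actually terminate.
	public static int runOnce() throws Exception {
		ExecutorService pool = Executors.newSingleThreadExecutor();
		try {
			Future<Integer> result = pool.submit(() -> 21 * 2); // stand-in for the real work
			return result.get();
		} finally {
			pool.shutdown();                             // stop accepting new tasks
			pool.awaitTermination(10, TimeUnit.SECONDS); // wait for the worker thread to die
		}
	}

	public static void main(String[] args) throws Exception {
		System.out.println(runOnce());
	}
}
```

If shutdown() were never called, every plugin run would leave one idle worker thread alive, which would match the thread accumulation in the screenshots.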

Multiple topics, sorry… These are my general questions:

  • What is the best way to tell if a plugin is leaking memory? Can I do this with a JUnit test or something?
  • How do I know which classes have special dispose functions, especially in ImageJ / imglib2?
• How do I dispose of ExecutorService threads correctly, and could this be a cause of GC overhead limit exceeded?

#2

OK, I will document what I tried so far. Feel free to intervene; this is all pretty new to me and I want to learn and get better.

So I minimized my test code. The first GC overhead error was caused by creating the ImageJ instance inside the loop:

for(int i = 0; i < 1000; i++) {
	ImageJ ij = new ImageJ();
}

Result: java.lang.OutOfMemoryError: GC overhead limit exceeded after ~200 iterations.

The GC is able to handle it if I add a pause:

for(int i = 0; i < 1000; i++) {
	ImageJ ij = new ImageJ();
	Thread.sleep(500);
}

But it makes much more sense to simply create the ImageJ instance outside the loop.


#3

The next test was to create a DatasetView (our plugin takes a DatasetView as input; maybe a bad idea?).

ImageJ ij = new ImageJ();
for(int i = 0; i < 1000; i++) {
	Dataset dataset = ij.dataset().create( new FloatType(), new long[] { 50, 50, 50 }, "", new AxisType[] { Axes.X, Axes.Y, Axes.Z} );
	DatasetView datasetView = new DefaultDatasetView();
	datasetView.setContext( dataset.getContext() );
	datasetView.initialize( dataset );
	datasetView.rebuild();
}

Result: java.lang.OutOfMemoryError: GC overhead limit exceeded.

What helped was disposing the datasetView inside the loop:

	datasetView.dispose();


#4

@frauzufall,

Glad you’re tackling this. Unfortunately, leaks are often very tricky to track down, in part because of how little control the programmer has over garbage collection (it’s usually great not to have to think about it).

In your first screenshot, do you know where that LinkedHashMap is and what it holds?
Certainly looks like a possible candidate.

I’d be surprised if the threads themselves were the problem (unless you see that objects associated with them that should be garbage collected never are), though of course something the threads execute could be causing a leak… not sure.
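One generic way to answer your "is it leaking?" question (nothing ImageJ-specific; the LinkedHashMap below is purely illustrative) is a WeakReference probe, which you could also drop into a JUnit test: hold a weak reference to the suspect object, drop all of your own strong references, and check whether it ever gets collected. A rough sketch:

```java
import java.lang.ref.WeakReference;
import java.util.LinkedHashMap;
import java.util.Map;

public class RetentionProbe {

	// A long-lived map standing in for whatever cache might be holding
	// on to your objects.
	static final Map<String, Object> cache = new LinkedHashMap<>();

	// Returns true once the referent has been garbage collected, false if
	// it is still strongly reachable after repeated GC attempts.
	static boolean collected(WeakReference<?> ref) throws InterruptedException {
		for (int i = 0; i < 50 && ref.get() != null; i++) {
			System.gc(); // only a hint, hence the retry loop
			Thread.sleep(10);
		}
		return ref.get() == null;
	}

	// Runs the probe twice: once while the cache retains the object
	// (simulating a leak), once after the cache is cleared.
	static boolean[] probe() throws InterruptedException {
		Object payload = new byte[1024];
		WeakReference<Object> ref = new WeakReference<>(payload);

		cache.put("entry", payload);
		payload = null; // drop our own strong reference
		boolean whileCached = collected(ref); // false: the map retains it

		cache.clear(); // now nothing holds a strong reference
		boolean afterClear = collected(ref); // true on typical JVMs

		return new boolean[] { whileCached, afterClear };
	}

	public static void main(String[] args) throws InterruptedException {
		boolean[] result = probe();
		System.out.println("collected while cached: " + result[0]);
		System.out.println("collected after clear:  " + result[1]);
	}
}
```

If the weak reference never clears even after you have dropped all references you know of, something (a display, a cache, a listener list) is still holding on to the object, and the heap dump should tell you what.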

Some ideas from here:

  • -XX:+UseConcMarkSweepGC.
    • to use the concurrent GC
  • -verbose:gc -XX:+PrintGCDetails
    • to see what the GC is doing.
  • -XX:-UseGCOverheadLimit
    • to disable the check that gives you the GC overhead limit error.

Details on that last flag, also from that link above:

The concurrent collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.

…TIL. So I guess disabling the overhead limit check might just eat up your CPU without helping much, and if there is a leak, you’ll eventually run out of memory anyway. May be worth trying regardless.

Just some ideas, and maybe not much useful info here. Will be looking forward to what you find though!

John


#5

Next thing I did was creating a super basic test plugin with the same main inputs and outputs I need for our plugin:

@Plugin( type = Command.class )
public class TestPlugin implements Command {

	@Parameter( type = ItemIO.INPUT )
	private DatasetView input;

	@Parameter( type = ItemIO.OUTPUT )
	private List< DatasetView > output = new ArrayList<>();

	@Override
	public void run() {
		output.add(input);
	}
}

… and running it:

ImageJ ij = new ImageJ();
for (int i = 0; i < 1000; i++) {
	Dataset dataset = ij.dataset().create(new FloatType(), new long[]{50, 50, 50}, "", new AxisType[]{Axes.X, Axes.Y, Axes.Z});
	DatasetView datasetView = new DefaultDatasetView();
	datasetView.setContext(dataset.getContext());
	datasetView.initialize(dataset);
	datasetView.rebuild();
	final Future<CommandModule> future = ij.command().run(TestPlugin.class, false, "input", datasetView);
	final Module module = ij.module().waitFor(future);
	List<DatasetView> result = (List<DatasetView>) module.getOutput("output");
	datasetView.dispose();
	// no need to dispose result list entries yet because nothing is copied
}

Result: running smoothly.

@bogovicj Thanks, I appreciate your suggestions a lot! I also no longer consider the threads to be the issue; I commented them out and still get the GC overhead error. I made a few more screenshots of a heap dump from the code above (with the TestPlugin), which seems to have no memory issues (at least it is not crashing; I still don’t know exactly how to determine whether there is a leak or not), and there the LinkedHashMap also appears. I’m not sure, though, what to do with the Instances view: how do I know where these objects come from?


#6

This might be interesting. Again, this is my current code to run the TestPlugin from the last post in a loop:

ImageJ ij = new ImageJ();
for (int i = 0; i < 1000; i++) {
	Dataset dataset = ij.dataset().create(new FloatType(), new long[]{50, 50, 50}, "", new AxisType[]{Axes.X, Axes.Y, Axes.Z});
	DatasetView datasetView = new DefaultDatasetView();
	datasetView.setContext(dataset.getContext());
	datasetView.initialize(dataset);
	datasetView.rebuild();
	final Future<CommandModule> future = ij.command().run(TestPlugin.class, false, "input", datasetView);
	final Module module = ij.module().waitFor(future);
	List<DatasetView> result = (List<DatasetView>) module.getOutput("output");
	datasetView.dispose();
}

This is its memory consumption graph:

When I do the same but execute the command with pre- and postprocessing, I run out of memory:

final Future<CommandModule> future = ij.command().run(TestPlugin.class, true, "input", datasetView);

As far as I can see, TestPlugin has no pre- and postprocessing, does it? Why is this happening?


#7

Current status: the plugin runs in a loop without throwing OOM errors. Memory accumulates over a few plugin runs, but the garbage collector seems to be able to handle that (the peaks in CPU and Threads mark each plugin run; the heap grows over a longer period).

Since the process no longer gets slower or crashes, I am fine with it, though I’m still not sure whether it is leaking or how to tell.

In summary, these are the changes to the original code:

  • call close() on Tensor and SavedModelBundle from the TensorFlow lib
  • call dispose() on DatasetView
  • avoid all instantiations of ImageJ within the loop (instead, I used the services that can be injected into a plugin as parameters)
  • avoid calling a plugin with pre- and postprocessing, do instead
    ij.command().run(Plugin.class, false, ..);

The last two aspects do not seem intentional to me.

Thanks for reading this photo heap story; please comment if you have any notes :slight_smile:


#8

Thank you for following up! :slight_smile:

I am not entirely sure about the implementation, so this is just thinking out loud: spinning up a Context, i.e. creating a new net.imagej.ImageJ(), does a lot of different things, among them service discovery. By creating a thousand Contexts you end up with tens of thousands of housekeeping objects. Creating an ImageJ instance outside of the loop should be pretty similar to using @Parameter (at least that made sense at the time of writing this post).

Again just thinking out loud: the List<DatasetView> that you have defined as output might be an issue for the DisplayPostprocessor. You could try to manually resolve the output using Module.resolveOutput("output") to avoid any postprocessing:

// ...
CommandInfo cmd = ij.command().getCommand(TestPlugin.class);
Module module = ij.module().createModule(cmd);
module.resolveOutput("output");
Future<Module> future = ij.module().run(module, true, "input", datasetView);
future.get();
//...

Best,
Stefan


#9

Hi @frauzufall

The pre- and postprocessors are triggered automatically depending on input/output type. In this case it’s possible the postprocessor is trying to convert and display your outputs, and in the process keeping copies in memory. I don’t know what the postprocessor attempts to do with a List<DatasetView>, but it probably ends up keeping a copy in memory somehow.

If you experiment with a plugin that uses Img as the output, you can see the behavior of the postprocessor. In the example below, if postprocessing is turned on, several images are displayed.

@Plugin(type = Command.class, headless = true, menuPath = "Plugins>PostProcessingTests")
public class PostProcessingTest<T extends RealType<T> & NativeType<T>> implements Command {

	final static String inputName = "./C2-confocal-stack.tif";

	@Parameter
	OpService ops;

	@Parameter
	Img<T> img;

	@Parameter(type = ItemIO.OUTPUT)
	Img<T> out;

	public void run() {
		out = (Img<T>) ops.math().multiply((IterableInterval) img, (IterableInterval) img);
	}

	public static <T extends RealType<T> & NativeType<T>> void main(final String[] args)
			throws InterruptedException, IOException {
		final ImageJ ij = new ImageJ();

		ij.launch(args);

		Img<T> image = (Img<T>) ij.io().open("http://imagej.net/images/fluoview-multi.tif"); // convenient example stack
		for (int i = 0; i < 5; i++) {
			final Future<CommandModule> future = ij.command().run(PostProcessingTest.class, true, "img", image);
			final Module module = ij.module().waitFor(future);
		}

	}

}

#10

@stelfrich @bnorthan Thank you for your helpful comments, I finally had the chance to test the code examples and you must be right about the postprocessing having issues with the DatasetView…

Yes, I was somehow under the impression that ImageJ is implemented as a singleton, but @haesleinhuepf told me that was the case with IJ1 and has been dropped since, so I guess I got that mixed up. Still, is there a way to properly dispose of an ImageJ instance?


#11

The following should work:

ImageJ ij = new ImageJ();
ij.dispose(); // see https://github.com/scijava/scijava-common/blob/master/src/main/java/org/scijava/Context.java#L416-L427

If not, we will have to investigate further.


#12

That’s what I would have expected it to be, but the link in your code led me to the right place:

ij.getContext().dispose();

Thanks :smiley: With this, I can also create the ImageJ instance inside the loop (although it’s way slower, which is no surprise).