Best practice : finding a local maximum along a line

imglib2
code-review
Tags: #<Tag:0x00007fd53ede2b58> #<Tag:0x00007fd53ede2a18>

#1

Hello everybody,

So I have a piece of code where I’m doing the following thing :

I have a list of points in an arraylist (dots, which can be up to a few millions). Each of this point has a position (dot.pos) normal vector (dot.norm) in 3D.

What I would like to do is to find the local maximum for each dot, in a certain range along its normal vector.

Here’s my code currently :

1 - Launching findLocalMax with CPU parallelization for the dots arrayList (inside an “Optimizer” object)

dots.parallelStream().forEach(dot -> {
	// I create these two "temporary" to avoid repeated creation of these objects into the findLocalMax function. Is that ok ?
	RandomAccess rA = image3D.randomAccess();
        long[] posImg = new long[ 3 ];    			
	dot.findLocalMax(rA,posImg);	// call find max function
    }
);

2 - Function for local maximum finding, (inside the “dot” object)


stepSize=0.5;		// Sampling along the normal vector
rangeOfSearch=5;	// range of searching

public < T extends RealType< T > & NativeType< T >> void findLocalMax(RandomAccess< T > rA, long[] posImg ) {
    // Initialisation of position and increments
    float dx,dy,dz,xp,yp,zp;
    dx=stepSize*norm.x;  dy=stepSize*norm.y;  dz=stepSize*norm.z;	// increments
    xp=pos.x;            yp=pos.y;            zp=pos.z;			// float coordinates
    posImg[0]=(int)xp;   posImg[1]=(int)yp;   posImg[2]=(int)zp;	// int coordinates
       
    int NSteps=(int) (rangeOfSearch/stepSize); // Number of steps along the normal line
		
    // Initialisation of the temporary maximum variable
    rA.setPosition(posImg);
    T val_max;
    val_max = rA.get().createVariable();
    val_max.set(rA.get());
    T val_test;

    // Loop along the normal vector
    for (int i=0;i<NSteps;i++) {
        xp+=dx;		    yp+=dy;	        zp+=dz;
        posImg[0]=(int)xp;  posImg[1]=(int)yp;  posImg[2]=(int) zp;
        rA.setPosition( posImg );
        val_test=rA.get();
   
        if (val_test.compareTo(val_max)>0) {
            // updates new found maximum
            val_max.set(val_test);
            currentMaxPosition=i;
        }
    }
}

My questions :

  1. In theory, it should be sufficient that T is a Comparable. However I couldn’t find a way to create an object that store the current local maximal value. Is there a way to do that ?
  2. I create temporary objects before launching the function, to avoid recreating objects for each function call. Does that make sense ?
  3. I’m creating these temp objects inside the parallelstream “loop”. Could I declare them before parallelstream ?
  4. I saw that there a neighbourhood iterator that looks like what I’m doing PeriodicLineNeighborhood. Unfortunately, it seems to be implemented with ints as increments, while I would need to have it with float as increments. What can I do to remedy this ?
  5. Is there a way for me to create the temporary maximum variable outside the findLocalMax function (avoiding creation of millions of Objects) ? Is that useful ?
  6. Any other way to improve the code ? I’ll take it! :slight_smile:

Thanks!


#2

Hi @NicoKiaru, welcome to the forum!

There’s one significant thing about that code I suggest you reconsider, and some other minor ideas (take with a grain of salt).

The big thing:
As you say in (4), ‘findLocalMax’ works on continuous (float) positions rather than int positions. However, the code here measures intensities of the RandomAccess at discrete (int) positions: not what I expected. I recommend that you pass a RealRandomAccess to findLocalMax, so that you can measure intensities at float postions.

Other thoughts:
(1) I guess you’d have create val_max outside ‘findLocalMax’, pass it, and set it.
(2) & (3) I don’t see a huge benefit to creating ‘rA’ outside the function call. Go with what is readable and see 5.
(4) I don’t think such a thing exists, but not sure. Even if it did, I would expect this code to be faster, and am sure it would be easier to understand/read too.
(5) Do what is most readable, don’t prematurely optimize. After all creating objects should be cheap.. Also, since this code is meant for 3d, you may be able to get rid of the posImg array and instead do:

rA.setPosition( xp, 0 );
rA.setPosition( yp, 1 );
rA.setPosition( zp, 2 );

Good idea to do this, I should post some of my code here for review sometime :neutral_face:
John


#3

Thanks so much for looking through this!

I’ll definitely change to RealRandomAccess. If I understand correctly, I may even gain an interpolated value between voxels, which I think will improve the method.

I mentioned it briefly, but the PeriodicLineNeighborhood could even shorten and simplify my code, unfortunately it seems that the “increments” in the implementation of PeriodicLineNeighborhood are integer and not float as I would like (https://github.com/imglib/imglib2-algorithm/blob/master/src/main/java/net/imglib2/algorithm/neighborhood/PeriodicLineNeighborhood.java). Maybe if I have time I can try to code it and suggest a “pull”. Do you think that’s a good idea ?

Good point for the readability… You’re right I should probably stick with good readability right now. On the other side, the link you provide shows that it takes around 1sec to create 4 millions objects, which is in fact not so negligible. I’ll measure to see if it is worth or not.

Thanks again!

Nicolas


#4

Yes, you understand correctly, it will interpolate.

I’d be very interested to see what you find.


#5

Well, it’s actually longer to create the variable outside the function (by 4% in my test case) :confounded: (and messier)

Now I realized that my question and the code was a bit stupid… Indeed there is no benefit to create the variable rA outside the funtion call, there’s really no reason for that. In fact I got confused because of parallelstream that I do not master.

I guess parallelstream parallelizes on each cpu available. So let’s say if I have 10 cpu and 1 million dots, each cpu calls findLocalMax on 100 000 dots, and these calls are sequentials within each cpu. Thus in theory I would need only 10 temporary variables - one for each cpu. However my code here creates 1 million temporary variables. That was the thing I was trying to optimize.

I’ll have a look to see whether it is possible to create a temp variable per cpu (and also if it is worth it).

Regarding the point (4), I’ll also probably stick to my code if you think it is faster.

Thanks,

Nicolas