Convert CSV to TIF - Best practice


#1

Hello everyone,

I would like to convert data from csv files into several stacks.
I have a batch of input files, say Z csv files, one for each slice. Each file has 3 columns and n lines:

double_01, double_02, double_03
double_11, double_12, double_13 
...
double_n1, double_n2, double_n3

My task is to produce a stack with given dimensions W, H, Z from each column, according to the following pattern: the first W rows of a column go into the first line (y=0) of the current slice of the stack, then the following W rows go into the 2nd line (y=1), and so on. Here’s a sketch for a single file/slice:

This is somewhat like the “reshape” function of Matlab, but between a text file and an image. The output stacks are ~200×250×1000 voxels at 32 bits, so memory is an issue.
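
The pattern described above is ordinary row-major indexing: line i of a file (counting from 0) lands at x = i % W, y = i / W. Here is a minimal Java sketch of that arithmetic; the class and method names (`RowMajor`, `xOf`, `yOf`, `indexOf`) are made up for illustration and are not part of ImageJ.

```java
/** Illustrative helper: maps a 0-based CSV line index to pixel coordinates in a slice of width w. */
public class RowMajor {
    /** Column (x) of the pixel that receives line i. */
    static int xOf(int i, int w) { return i % w; }

    /** Row (y) of the pixel that receives line i (integer division). */
    static int yOf(int i, int w) { return i / w; }

    /** Inverse mapping: position of pixel (x, y) in a flat row-major array. */
    static int indexOf(int x, int y, int w) { return y * w + x; }

    public static void main(String[] args) {
        int w = 4; // assumed slice width, for the demo only
        for (int i = 0; i < 10; i++) {
            System.out.println("line " + i + " -> (" + xOf(i, w) + ", " + yOf(i, w) + ")");
        }
    }
}
```

In a Groovy script, note that `/` on two integers yields a decimal, so `i.intdiv(w)` is the direct equivalent of Java's integer division.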

Before going any further: I did it, my code works. I’m just wondering if it can run faster.

My algorithm goes like this:

  • Create 3 empty stacks
  • For each csv file (z)
    • scan the file line by line
      • convert the line into 3 pixel values
      • put 1st value at (x,y,z) in the 1st stack
      • put 2nd value at (x,y,z) in the 2nd stack
      • put 3rd value at (x,y,z) in the 3rd stack
    • Go to next line
  • Go to the next file/slice
  • Save the stacks

So, I’m putting pixel values one by one into each image. My question is: Is there a more efficient option?
For example, I noticed there is a putRow method. Does anyone know if it could run faster?
Or would it be more efficient to first scan the whole csv file, load it into an Array or a List, process it line by line, and write-append each slice into TIF files?

Thank you for your help!


#2

I would not be surprised if you got a decent speed improvement if you imported the csv as a text image (File->Import->Text Image), and then simply copied columns of pixels from the text image to each stack. Be sure to run it in batch mode, or the switching between windows will bog things down tremendously.


#3

Thanks for your answer.
Actually, I’m working with a Groovy script. I can still load the csv file using ij.plugin.TextFileReader but if possible, I would like to avoid loading the whole file into memory.


#4

I answered my own question: Putting pixel values row by row is way faster than one by one.


#5

:thumbsup: Great!
Would you mind sharing the script code as well? I’d be interested to see how you solved it in the end.


#6

Sure, but it is still a work in progress. I find there is too much duplicate code.
Here’s the main part. I do not include the pre-processing of the files (GUI, opening…).
files is my array of input files.
u, v, zncc are the data in the 3 columns of each file.
nbPtsX, nbPtsY and nbSlices are the dimensions of the output stacks (nbSlices = number of input files).

import ij.IJ;
import ij.ImagePlus;

ImagePlus imgU = IJ.createImage("u", "32-bit", nbPtsX, nbPtsY, nbSlices);
ImagePlus imgV = IJ.createImage("v", "32-bit", nbPtsX, nbPtsY, nbSlices);
ImagePlus imgZNCC = IJ.createImage("zncc", "32-bit", nbPtsX, nbPtsY, nbSlices);

stackU = imgU.getStack();
stackV = imgV.getStack();
stackZNCC = imgZNCC.getStack();

// u, v, zncc values are stored into 1D arrays that are put in the images at increasing (0, y) coordinates
float[] uRow = new float[nbPtsX];
float[] vRow = new float[nbPtsX];
float[] znccRow = new float[nbPtsX];

for (int z = 1; z <= nbSlices; z++){
	file = files[z - 1];
	ipU = stackU.getProcessor(z);
	ipV = stackV.getProcessor(z);
	ipZNCC = stackZNCC.getProcessor(z);
	i = 0; // Line counter within the current file
	file.withReader { reader ->
		while ((line = reader.readLine()) != null) {
			String[] uvzncc = line.split(',');
			uRow[i % nbPtsX] = Float.parseFloat(uvzncc[0]);
			vRow[i % nbPtsX] = Float.parseFloat(uvzncc[1]);
			znccRow[i % nbPtsX] = Float.parseFloat(uvzncc[2]);
			// When nbPtsX lines have been scanned, put rows at (0, y)
			if ((i + 1) % nbPtsX == 0) {
				int y = i / nbPtsX;
				ipU.putRow(0, y, uRow, uRow.size());
				ipV.putRow(0, y, vRow, vRow.size());
				ipZNCC.putRow(0, y, znccRow, znccRow.size());
			}
			i++;
		}
	}
	stackU.setProcessor(ipU, z);
	stackV.setProcessor(ipV, z);
	stackZNCC.setProcessor(ipZNCC, z);
	IJ.showProgress(z, nbSlices);
}
imgU.show();
imgV.show();
imgZNCC.show();
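
One possible way to reduce the duplicated per-column statements is to keep the row buffers in a 2D array and loop over the column index. The sketch below is plain Java with no ImageJ dependency. `RowScanner`, `RowSink` and `scan` are hypothetical names, and the demo writes into flat float arrays as a stand-in for image slices; in the script above the sink would wrap the putRow calls instead.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Iterator;

/** Sketch: one loop over the columns instead of three copies of the same statements. */
public class RowScanner {
    /** Hypothetical callback that receives each completed row of one column. */
    interface RowSink {
        void putRow(int column, int y, float[] row);
    }

    /** Buffers `width` lines per column, hands each completed row to the sink, returns the row count. */
    static int scan(Iterator<String> lines, int width, int columns, RowSink sink) {
        float[][] rows = new float[columns][width];
        int i = 0; // line counter
        while (lines.hasNext()) {
            String[] fields = lines.next().split(",");
            for (int c = 0; c < columns; c++) {
                rows[c][i % width] = Float.parseFloat(fields[c]);
            }
            if ((i + 1) % width == 0) { // a full image row is buffered
                for (int c = 0; c < columns; c++) {
                    sink.putRow(c, i / width, rows[c]);
                }
            }
            i++;
        }
        if (i % width != 0) {
            throw new IllegalArgumentException("Last row is incomplete: " + i + " lines, width " + width);
        }
        return i / width;
    }

    public static void main(String[] args) {
        String csv = "1,10,100\n2,20,200\n3,30,300\n4,40,400\n";
        int width = 2, height = 2;
        float[][] slices = new float[3][width * height]; // stand-in for three image slices
        BufferedReader reader = new BufferedReader(new StringReader(csv));
        int rowsRead = scan(reader.lines().iterator(), width, 3,
                (c, y, row) -> System.arraycopy(row, 0, slices[c], y * width, width));
        System.out.println(rowsRead + " rows, v at (1,1) = " + slices[1][3]);
    }
}
```

The sketch also rejects a file whose line count is not a multiple of the width, a case in which the last partial row would otherwise be dropped silently.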

#7

Thanks @Nicolas, that’s a nice example using Groovy’s file.withReader syntax!


Some comments:

In Groovy, you can also write more concisely:

for (z in 1..nbSlices)

Also, you don’t need a semicolon (;) at the end of each line.


I wonder how using only ImageJ2 types (i.e. Ops and ImgLib2) would compare in performance.
If you’d like to benchmark this, here’s a modified version of your script (using Script Parameters and runnable from within Fiji’s Script Editor):

// @File(label="Input file (csv)") csvFile
// @OpService ops
// @StatusService sts
// @OUTPUT Img imgU
// @OUTPUT Img imgV
// @OUTPUT Img imgZNCC

import net.imglib2.type.numeric.real.FloatType
import net.imglib2.img.array.ArrayImgFactory

sizeX = 10
sizeY = 10
sizeZ = 1

imgU = ops.run("create.img", [sizeX, sizeY, sizeZ], new FloatType(), new ArrayImgFactory())
imgV = imgU.copy()
imgZNCC = imgU.copy()

cursorU = imgU.cursor()
cursorV = imgV.cursor()
cursorZNCC = imgZNCC.cursor()

// for (z in 1..sizeZ) { } // optionally loop over slices

csvFile.withReader { reader ->
	while (((line=reader.readLine()) != null)) {
		uvzncc = line.split(',')
		cursorU.next().set(Float.parseFloat(uvzncc[0]))
		cursorV.next().set(Float.parseFloat(uvzncc[1]))
		cursorZNCC.next().set(Float.parseFloat(uvzncc[2]))
	}
	// sts.showProgress(z, sizeZ)
}
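
For the benchmark itself, a crude wall-clock comparison may be enough to see which variant is faster. The helper below is only a sketch: `Stopwatch` and `bestMillis` are hypothetical names, and the array fill in `main` is a placeholder for whichever import routine is being measured. It runs the task a few times untimed first, because the JVM compiles hot code while it runs and the first passes are typically slower.

```java
/** Sketch: crude wall-clock timing with warm-up runs, not a rigorous benchmark. */
public class Stopwatch {
    /** Runs the task `warmups` times untimed, then `runs` times timed; returns the best time in ms. */
    static double bestMillis(Runnable task, int warmups, int runs) {
        for (int k = 0; k < warmups; k++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int k = 0; k < runs; k++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1e6;
    }

    public static void main(String[] args) {
        float[] pixels = new float[1000 * 1000]; // placeholder workload
        double ms = bestMillis(() -> java.util.Arrays.fill(pixels, 1f), 3, 5);
        System.out.println("best of 5: " + ms + " ms");
    }
}
```

For file imports, disk caching by the operating system can dominate the timings, so it is worth measuring both variants on the same files in the same session.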

#8

Thank you @imagejan! I will try that tomorrow.
Your example tells me that learning Ops and ImgLib2 wouldn’t be a waste of my time.