python - Calculating stats across 1000 arrays
I'm writing a Python module that needs to calculate the mean and standard deviation of pixel values across 1000+ arrays (all with identical dimensions), and I'm looking for the fastest way to do this.
Currently I'm looping through the arrays and using numpy.dstack to stack the 1000 arrays into one rather large 3D array, then calculating the mean across the 3rd dimension. Each array has shape (5000, 4000).
This approach is taking quite a long time! Would anyone be able to advise on a more efficient way of approaching this problem?
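For reference, a minimal sketch of the stacking approach described above (read_your_image is a placeholder for whatever loader is actually in use):

import numpy as np

# Stack all images into one (5000, 4000, N) array, then reduce along
# the last axis. Note this holds every image in memory at once:
# roughly 160 GB for 1000 float64 images of this size.
images = [read_your_image(f) for f in filenames]
stacked = np.dstack(images)
mean_image = stacked.mean(axis=2)
std_image = stacked.std(axis=2)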
Maybe you can calculate the mean and std in a cumulative way, something like this (untested):
import numpy as np

im_size = (5000, 4000)
cum_sum = np.zeros(im_size)
cum_sum_of_squares = np.zeros(im_size)
n = 0
for filename in filenames:
    image = read_your_image(filename)
    cum_sum += image
    cum_sum_of_squares += image**2
    n += 1
mean_image = cum_sum / n
# Var(X) = E[X^2] - (E[X])^2, computed per pixel
std_image = np.sqrt(cum_sum_of_squares / n - mean_image**2)
This is limited by how fast you can read the images from disk; it is not limited by memory, since you only have one image in memory at a time. Calculating the std this way may suffer from numerical problems, because you may end up subtracting two large, nearly equal numbers. If that turns out to be a problem, you have to loop over the files twice: first calculate the mean, then accumulate (image - mean_image)**2 in a second pass.
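For completeness, a minimal sketch of that two-pass variant (equally untested; read_your_image is again a stand-in for the actual loader):

import numpy as np

im_size = (5000, 4000)

# First pass: accumulate the per-pixel mean.
cum_sum = np.zeros(im_size)
n = 0
for filename in filenames:
    cum_sum += read_your_image(filename)
    n += 1
mean_image = cum_sum / n

# Second pass: accumulate squared deviations from the mean, which
# avoids subtracting two large, nearly equal numbers.
cum_sq_dev = np.zeros(im_size)
for filename in filenames:
    cum_sq_dev += (read_your_image(filename) - mean_image) ** 2
std_image = np.sqrt(cum_sq_dev / n)

The trade-off is reading every file from disk twice, so this is only worth it if the one-pass formula actually loses precision for your pixel values.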