Python - Quick Upscaling of Array with Numpy, No Image Library Allowed [duplicate]












This question already has an answer here:




  • How to repeat elements of an array along two axes?

    4 answers




Note on duplicate message:



The linked question covers similar themes but is not exactly a duplicate, especially since the loop is still the fastest method. Thanks.



Goal:



Upscale an array from [small,small] to [big,big] by a factor, quickly, without using an image library. The scaling is very simple: each small-array value is copied into a factor x factor block of big-array values and divided by the number of values in that block. In astronomical terms this is "flux conserving": with a factor of 2, a value of 16 in the small array is spread over 4 values in the big array, each equal to 4, so the total amount is retained.



Problem:



I have working code for the upscaling, but it is slow compared to downscaling. Upscaling should be the easier of the two: downscaling requires many sums (in this basic case), while upscaling only requires copying already-known data into large blocks of a preallocated array.



For a working example, take a [2,2] array of [16,24;8,16]:

16 , 24
8 , 16

Upscaled by a factor of 2 to a [4,4] array, it would have the values:

4 , 4 , 6 , 6
4 , 4 , 6 , 6
2 , 2 , 4 , 4
2 , 2 , 4 , 4
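The worked example above can be reproduced in a few lines. This is only a minimal sketch for checking the expected values and the conserved total: it uses `np.kron`, which is not one of the methods timed in this question, and the helper name `upscale_kron` is made up for illustration. No claim is made about its speed.

```python
import numpy as np

def upscale_kron(small, factor):
    """Spread each value over a factor x factor block, conserving the total."""
    block = np.ones((factor, factor))
    # np.kron tiles each input value over the block; dividing by the block
    # size keeps the sum of the output equal to the sum of the input.
    return np.kron(small, block) / (factor * factor)

small = np.array([[16.0, 24.0],
                  [8.0, 16.0]])
big = upscale_kron(small, 2)
print(big)
print(small.sum(), big.sum())  # both totals are 64.0
```

The two printed totals match, which is the "flux conserving" property described in the goal.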




The fastest implementation is a for loop accelerated by numba's jit & prange. I'd like to make better use of Numpy's pre-compiled functions to get this job done. I'll also entertain Scipy solutions, but not its resizing functions.

It seems like a perfect problem for strong matrix manipulation functions, but I haven't managed to make it run quickly.

Additionally, the single-line numpy call is rather convoluted, so don't be surprised. It's what it took to get the output to align correctly.



Code examples:



More optimized versions are given further below. Be warned: the case used here creates a 20480x20480 float64 array, which takes up a fair bit of memory, but it also shows whether a method is too memory intensive (as matrix approaches can be).
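To put a number on "a fair bit of memory", the size of the output array can be estimated from its shape and dtype alone. This is a small arithmetic sketch using the 2048 side length and factor of 10 from the timing script; the variable names are chosen here for illustration.

```python
import numpy as np

in_side = 2048     # side length of the input array in the timing script
factor = 10        # upscaling factor in the timing script
out_side = in_side * factor

itemsize = np.dtype(np.float64).itemsize   # 8 bytes per value
out_bytes = out_side * out_side * itemsize

print(out_side)            # 20480
print(out_bytes / 2**30)   # 3.125 (GiB)
```

That figure covers the output array only. Any method that builds a full-size temporary before producing the result needs roughly that much again.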



Environment: Python 3, Windows, i5-4960K @ 4.5 GHz. On the examples shown, the for loop code takes ~18.9 sec to run and the numpy code takes ~52.5 sec.



% MAIN: To run these

import timeit

#setup code run once before timing: builds a 2048x2048 float64 test array
timeitSetup = '''
from Regridder1 import Regridder1
import numpy as np

factor = 10

inArrayX = np.float64(np.arange(0,2048,1))
inArrayY = np.float64(np.arange(0,2048,1))
[inArray, _] = np.meshgrid(inArrayX,inArrayY)
'''

#total time for 10 calls of the for loop version
print("Time to run 1: {}".format( timeit.timeit(setup=timeitSetup,stmt="Regridder1(inArray, factor)", number = 10) ))

#same setup, but importing the numpy version
timeitSetup = '''
from Regridder2 import Regridder2
import numpy as np

factor = 10

inArrayX = np.float64(np.arange(0,2048,1))
inArrayY = np.float64(np.arange(0,2048,1))
[inArray, _] = np.meshgrid(inArrayX,inArrayY)
'''

#total time for 10 calls of the numpy version
print("Time to run 2: {}".format( timeit.timeit(setup=timeitSetup,stmt="Regridder2(inArray, factor)", number = 10) ))


% FUN: Regridder 1 - for loop

import numpy as np
from numba import prange, jit

@jit(nogil=True)
def Regridder1(inArray,factor):
    inSize = np.shape(inArray)
    outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))]

    outBlockSize = factor*factor #number of outArray pixels that 1 inArray pixel is spread across
    outArray = np.zeros(outSize) #preallocate
    outBlocks = inArray/outBlockSize #precalculate the normalized block values to go faster
    for i in prange(0,inSize[0]):
        for j in prange(0,inSize[1]):
            outArray[i*factor:(i*factor+factor),j*factor:(j*factor+factor)] = outBlocks[i,j] #puts the normalized value in every pixel of the block

    return outArray


% FUN: Regridder 2 - numpy

import numpy as np

def Regridder2(inArray,factor):
    inSize = np.shape(inArray)
    outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))]

    outBlockSize = factor*factor #number of outArray pixels that 1 inArray pixel is spread across

    #repeat along the columns, transpose, repeat along the rows, transpose back, then normalize
    #the second reshape is (columns*factor, rows*factor) so that the final .T also works for non-square inputs
    outArray = inArray.repeat(factor).reshape(inSize[0],factor*inSize[1]).T.repeat(factor).reshape(inSize[1]*factor,inSize[0]*factor).T/outBlockSize

    return outArray
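For the square inputs used in this question, the chained repeat/reshape/transpose call should give the same values as repeating along each axis explicitly, which may be easier to read. The sketch below compares the two on the small worked example. The function names `upscale_repeat` and `upscale_chained` are made up here, and nothing is claimed about which is faster.

```python
import numpy as np

def upscale_repeat(in_array, factor):
    """Repeat rows, then columns, then normalize by the block size."""
    rows_repeated = np.repeat(in_array, factor, axis=0)
    both_repeated = np.repeat(rows_repeated, factor, axis=1)
    return both_repeated / (factor * factor)

def upscale_chained(in_array, factor):
    """The flatten/reshape/transpose chain, written for a square input."""
    n = in_array.shape[0]
    return (in_array.repeat(factor).reshape(n, factor * n).T
            .repeat(factor).reshape(n * factor, n * factor).T
            / (factor * factor))

small = np.array([[16.0, 24.0], [8.0, 16.0]])
print(upscale_repeat(small, 2))
print(np.array_equal(upscale_repeat(small, 2), upscale_chained(small, 2)))
```

Passing `axis` to `np.repeat` avoids the flatten-and-reshape bookkeeping, so the intent (repeat rows, repeat columns) is visible in the code itself.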


I would greatly appreciate any insight into speeding this up. Hopefully the code is correct; I wrote it directly in the text box.



Current best solution:

On my computer, the numba jit for loop implementation (Regridder1), with jit applied only to the part that needs it, runs the timeit test in 18.0 sec, while the numpy-only implementation (Regridder2) runs it in 18.5 sec. The bonus of the numpy-only implementation is that the first call doesn't have to wait for jit to compile the code. Jit's cache=True avoids recompiling on subsequent runs. The other options (nogil, nopython, prange) don't seem to help, but they don't seem to hurt either. Maybe they'll do better in future numba updates.

For simplicity and portability, Regridder2 is the best option. It's nearly as fast and doesn't need numba installed (which I had to install separately for my Anaconda install).



% FUN: Regridder 1 - for loop

import numpy as np

def Regridder1(inArray,factor):
    inSize = np.shape(inArray)
    outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))]

    outBlockSize = factor*factor #number of outArray pixels that 1 inArray pixel is spread across
    outArray = np.empty(outSize) #preallocate
    outBlocks = inArray/outBlockSize #precalculate the normalized block values to go faster
    factor = np.int64(factor) #convert to an integer to be safe (in case it's a float such as 1.0)

    outArray = RegridderUpscale(inSize, factor, outArray, outBlocks) #call a function that has just the loop

    return outArray
#END def Regridder1

from numba import jit, prange
@jit(nogil=True, nopython=True, cache=True) #nopython=True, nogil=True, parallel=True, cache=True
def RegridderUpscale(inSize, factor, outArray, outBlocks ):
    for i in prange(0,inSize[0]):
        for j in prange(0,inSize[1]):
            outArray[i*factor:(i*factor+factor),j*factor:(j*factor+factor)] = outBlocks[i,j]
        #END for j
    #END for i
    #scales the original data up; Python slices exclude the end index, so languages with inclusive ranges need i*factor+factor-1
    return outArray #return the filled array
#END def RegridderUpscale


% FUN: Regridder 2 - numpy based on @ZisIsNotZis's answer

import numpy as np

def Regridder2(inArray,factor):
    inSize = np.shape(inArray)
    #outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))] #whoops, not needed

    outBlockSize = factor*factor #number of outArray pixels that 1 inArray pixel is spread across

    #broadcast each normalized pixel to a factor x factor block, then reshape to the full output size
    outArray = np.broadcast_to( inArray[:,None,:,None]/outBlockSize, (inSize[0], factor, inSize[1], factor)).reshape(np.int64(factor*inSize[0]), np.int64(factor*inSize[1])) #single line call that gets the job done

    return outArray
#END def Regridder2
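The broadcast-then-reshape call in Regridder2 is easier to follow when split into steps. The sketch below is an illustration on the small worked example, not a replacement for the function above, and the variable names are chosen here. It shows that `np.broadcast_to` only creates a read-only view, and that the data is actually copied by the final `reshape`.

```python
import numpy as np

small = np.array([[16.0, 24.0], [8.0, 16.0]])
factor = 2
rows, cols = small.shape

# Shape (rows, 1, cols, 1): one normalized value per input pixel.
normalized = small[:, None, :, None] / (factor * factor)

# Read-only view of shape (rows, factor, cols, factor); the two broadcast
# axes have stride 0, so no data is duplicated yet.
view = np.broadcast_to(normalized, (rows, factor, cols, factor))
print(view.strides)

# Merging axes (0, 1) and (2, 3) cannot be expressed as a view here,
# so reshape copies the data into a full-size array.
big = view.reshape(rows * factor, cols * factor)
print(big)
print(np.shares_memory(big, normalized))
```

Because the division happens on the small array before broadcasting, only one full-size array is written, which may be part of why this version times close to the numba loop.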









marked as duplicate by unutbu arrays
Nov 26 '18 at 19:03


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















  • Is the [4,4] array example the desired output?, or... i am confused?

    – U9-Forward
    Nov 16 '18 at 3:20













  • It could be an output (I included it as a visual example of what type of scaling is desired), but the 2048 -> 20480 in the code shows real world speed limitations much better.

    – user2403531
    Nov 16 '18 at 3:48











  • It's a teensy bit faster to do the division first, before calling repeat in Regridder2 (as you already did in Regridder1). i.e. outArray = (inArray/outBlockSize).repeat(...)...

    – unutbu
    Nov 16 '18 at 3:56













  • No need to compute outSize in Regridder2.

    – unutbu
    Nov 16 '18 at 4:00











  • outArray = ((inArray/outBlockSize).repeat(outBlockSize).reshape(inSize[0],inSize[1],factor,factor).swapaxes(1,2).reshape(inSize[0]*factor,inSize[1]*factor)) is a marginally faster way to compute outArray in Regridder2, but nowhere near as fast as Regridder1.

    – unutbu
    Nov 16 '18 at 4:01
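The swapaxes variant from the last comment can be checked against the broadcast-based Regridder2 on a small input. This is a rough sketch: the function names and the deliberately small, non-square test array are assumptions made here for a quick check, and timings on an array this small say little about the 2048 -> 20480 case.

```python
import timeit
import numpy as np

def upscale_swapaxes(in_array, factor):
    """Variant from the comment: repeat each value factor**2 times, then regroup."""
    rows, cols = in_array.shape
    block = factor * factor
    return ((in_array / block).repeat(block)
            .reshape(rows, cols, factor, factor)
            .swapaxes(1, 2)
            .reshape(rows * factor, cols * factor))

def upscale_broadcast(in_array, factor):
    """Variant from the updated Regridder2: broadcast, then reshape."""
    rows, cols = in_array.shape
    block = factor * factor
    return np.broadcast_to(in_array[:, None, :, None] / block,
                           (rows, factor, cols, factor)
                           ).reshape(rows * factor, cols * factor)

# Small, non-square test input (an assumption for quick checking).
test_in = np.arange(12, dtype=np.float64).reshape(3, 4)
same = np.array_equal(upscale_swapaxes(test_in, 5), upscale_broadcast(test_in, 5))
print(same)

# Rough timing of 100 calls each; results depend on the machine.
for func in (upscale_swapaxes, upscale_broadcast):
    seconds = timeit.timeit(lambda: func(test_in, 5), number=100)
    print(func.__name__, seconds)
```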
















python arrays numpy scaling














edited Dec 3 '18 at 23:35 by user2403531

asked Nov 16 '18 at 3:15 by user2403531




marked as duplicate by unutbu arrays
Users with the  arrays badge can single-handedly close arrays questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 26 '18 at 19:03


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
























  • Is the [4,4] array example the desired output, or...? I am confused.

    – U9-Forward
    Nov 16 '18 at 3:20













  • It could be an output (I included it as a visual example of what type of scaling is desired), but the 2048 -> 20480 in the code shows real world speed limitations much better.

    – user2403531
    Nov 16 '18 at 3:48











  • It's a teensy bit faster to do the division first, before calling repeat in Regridder2 (as you already did in Regridder1). i.e. outArray = (inArray/outBlockSize).repeat(...)...

    – unutbu
    Nov 16 '18 at 3:56













  • No need to compute outSize in Regridder2.

    – unutbu
    Nov 16 '18 at 4:00











  • outArray = ((inArray/outBlockSize).repeat(outBlockSize).reshape(inSize[0],inSize[1],factor,factor).swapaxes(1,2).reshape(inSize[0]*factor,inSize[1]*factor)) is a marginally faster way to compute outArray in Regridder2, but nowhere near as fast as Regridder1.

    – unutbu
    Nov 16 '18 at 4:01































2 Answers


















4














I did some benchmarks about this using a 512x512 byte image (10x upscale):



import numpy as np

a = np.empty((512, 512), 'B')  # uninitialized 512x512 array of unsigned bytes


Repeat Twice



>>> %timeit a.repeat(10, 0).repeat(10, 1)
127 ms ± 979 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Repeat Once + Reshape



>>> %timeit a.repeat(100).reshape(512, 512, 10, 10).swapaxes(1, 2).reshape(5120, 5120)
150 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


The two methods above both copy the data twice, while the two methods below each copy it only once.



Fancy Indexing



Since t can be repeatedly used (and pre-computed), it is not timed.



>>> t = np.arange(512).repeat(10)  # default integer dtype: a byte ('B') cannot hold indices above 255
>>> %timeit a[t[:,None], t]
143 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Viewing + Reshape



>>> %timeit np.broadcast_to(a[:,None,:,None], (512, 10, 512, 10)).reshape(5120, 5120)
29.6 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
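
One way to see why this variant copies the data only once: `np.broadcast_to` returns a read-only view whose repeated axes have stride 0, so nothing is duplicated until `reshape` has to produce a contiguous result. A small sketch (the array `a_small` is illustrative, not the benchmark array):

```python
import numpy as np

a_small = np.arange(6, dtype=np.uint8).reshape(2, 3)
factor = 2

view = np.broadcast_to(a_small[:, None, :, None], (2, factor, 3, factor))
print(view.strides)                     # the two repeated axes have stride 0
print(np.shares_memory(view, a_small))  # True: still the original buffer
print(view.flags.writeable)             # False: broadcast views are read-only

out = view.reshape(2 * factor, 3 * factor)  # this step performs the one copy
print(np.shares_memory(out, a_small))       # False: out owns fresh memory
```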


It seems that viewing + reshape wins (at least on my machine). The results for a 2048x2048 byte image follow (four timings, apparently in the same order as the methods above); viewing + reshape still wins:



2.04 s ± 31.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.4 s ± 18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.3 s ± 25.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
424 ms ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


while the result for 2048x2048 float64 image is



3.14 s ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
5.07 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.56 s ± 64.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.8 s ± 24.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


which, though the itemsize is 8 times larger, took only roughly 1.5 to 4.3 times as long, depending on the method
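
To confirm that the four approaches benchmarked above really compute the same thing, here is a minimal sketch on a small array. The variable names are chosen for this illustration, and the index arrays use NumPy's default integer dtype.

```python
import numpy as np

a = np.arange(12, dtype=np.uint8).reshape(3, 4)  # small non-square test array
f = 2
rows, cols = a.shape

repeat_twice = a.repeat(f, 0).repeat(f, 1)
repeat_once = (a.repeat(f * f)
                .reshape(rows, cols, f, f)
                .swapaxes(1, 2)
                .reshape(rows * f, cols * f))
# Index arrays use the default integer dtype so large indices cannot overflow.
ti = np.arange(rows).repeat(f)
tj = np.arange(cols).repeat(f)
fancy = a[ti[:, None], tj]
viewed = np.broadcast_to(a[:, None, :, None], (rows, f, cols, f)).reshape(rows * f, cols * f)

for name, result in [("repeat once", repeat_once), ("fancy", fancy), ("viewed", viewed)]:
    print(name, np.array_equal(result, repeat_twice))
```

All three comparisons print `True`, so the methods differ only in how much intermediate copying they do.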
































  • Viewing + Reshape was indeed fast - fast enough to match jit (at 2048->20480 the jit for loop takes 18.0 sec and this takes 18.5 sec on my computer)! It also has the bonus that it can handle non-square arrays, while my .repeat.repeat function can't. It's only a hair slower, but it removes the need to rely on numba's at-times finicky jit. Thank you for the insight!

    – user2403531
    Nov 19 '18 at 19:43
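
The non-square case mentioned in this comment can be sketched as follows. The helper `upscale` is a hypothetical name for the broadcast-and-reshape approach, and the block-sum check at the end is an added illustration, not something from the thread.

```python
import numpy as np

def upscale(in_array, factor):
    rows, cols = in_array.shape
    view = np.broadcast_to((in_array / factor**2)[:, None, :, None],
                           (rows, factor, cols, factor))
    return view.reshape(rows * factor, cols * factor)

rect = np.arange(6, dtype=np.float64).reshape(2, 3)  # 2 rows, 3 columns
big = upscale(rect, 4)
print(big.shape)  # (8, 12)

# Summing each 4x4 block back down should recover the input.
recovered = big.reshape(2, 4, 3, 4).sum(axis=(1, 3))
print(np.allclose(recovered, rect))
```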





















3














Some new functions that show the order of operations matters:



import numpy as np
from numba import jit

A=np.random.rand(2048,2048)

@jit
def reg1(A,factor):
    factor2=factor**2
    a,b = [factor*s for s in A.shape]
    B=np.empty((a,b),A.dtype)
    Bf=B.ravel()
    k=0
    for i in range(A.shape[0]):
        Ai=A[i]
        for _ in range(factor):
            for j in range(A.shape[1]):
                x=Ai[j]/factor2
                for _ in range(factor):
                    Bf[k]=x
                    k += 1
    return B

def reg2(A,factor):
    return np.repeat(np.repeat(A/factor**2,factor,0),factor,1)

def reg3(A,factor):
    return np.repeat(np.repeat(A/factor**2,factor,1),factor,0)

def reg4(A,factor):
    shx,shy=A.shape
    stx,sty=A.strides
    B=np.broadcast_to((A/factor**2).reshape(shx,1,shy,1),
                      shape=(shx,factor,shy,factor))
    return B.reshape(shx*factor,shy*factor)


And the timings:



In [47]: %timeit _=Regridder1(A,5)
672 ms ± 27.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [48]: %timeit _=reg1(A,5)
522 ms ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [49]: %timeit _=reg2(A,5)
1.23 s ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [50]: %timeit _=reg3(A,5)
782 ms ± 21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [51]: %timeit _=reg4(A,5)
860 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


























  • Wow, repeating horizontally first is much faster than repeating vertically first, and similar to the reshaped view. I guess that's because the first repeat takes negligible time compared to the second repeat. If the second repeat is done "vertically", numpy is basically copying large contiguous blocks of memory (i.e. a 20480-len row) multiple times, while if done "horizontally", numpy has to copy contiguous 2048-len rows repeatedly (the CPU doesn't natively support non-contiguous arrays).

    – ZisIsNotZis
    Nov 19 '18 at 2:26
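
The effect discussed here can be tried with a small timing harness. This is only a sketch: the function names are invented for the illustration, the array is smaller than the 2048x2048 case to keep the run short, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

A = np.random.rand(512, 512)  # smaller than the 2048x2048 case, to keep the run short
factor = 5

def rows_then_cols(A, factor):
    # Same order as reg2: repeat along axis 0 first, then axis 1.
    return np.repeat(np.repeat(A / factor**2, factor, 0), factor, 1)

def cols_then_rows(A, factor):
    # Same order as reg3: repeat along axis 1 first, then axis 0.
    return np.repeat(np.repeat(A / factor**2, factor, 1), factor, 0)

same = np.array_equal(rows_then_cols(A, factor), cols_then_rows(A, factor))
print("identical output:", same)

for fn in (rows_then_cols, cols_then_rows):
    seconds = min(timeit.repeat(lambda: fn(A, factor), number=1, repeat=5))
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms")
```

Both orders produce the same array, so any difference in the printed times comes from the memory access pattern of the second, larger repeat.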




























First answer (benchmarks): answered Nov 16 '18 at 4:49 by ZisIsNotZis, edited Nov 16 '18 at 5:58





















Second answer (order of operations): answered Nov 16 '18 at 17:35 by B. M., edited Nov 16 '18 at 17:57












