Python - Quick Upscaling of Array with Numpy, No Image Library Allowed [duplicate]
This question already has an answer here:
How to repeat elements of an array along two axes?
4 answers
Note on duplicate message:
Similar themes, but not exactly a duplicate, especially since the loop is still the fastest method. Thanks.
Goal:
Upscale an array from [small,small] to [big,big] by an integer factor, quickly, without using an image library. The scaling is very simple: each value in the small array becomes a factor x factor block of values in the big array, normalized by the number of values it is spread across. In astronomical terms this is "flux conserving": with a factor of 2, a value of 16 in the small array is spread into 4 values of 4 in the big array, so the total amount is retained.
Problem:
I've got working code for the upscaling, but it is slow compared to downscaling. Upscaling is actually simpler than downscaling (which, in this basic case, requires many sums): it only requires already-known data to be written in large chunks into a preallocated array.
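To illustrate writing already-known data into chunks of a preallocated array without a per-pixel loop, one NumPy-only option is to loop over the factor x factor offsets instead of over the input pixels. This is a sketch, not code from the question, and the name `upscale_strided` is hypothetical. Each of the factor² iterations writes one full strided slice of the output, so the Python loop runs only factor² times (100 for factor = 10) regardless of the input size.

```python
import numpy as np

def upscale_strided(in_array, factor):
    """Flux-conserving upscale by an integer factor (hypothetical helper)."""
    factor = int(factor)
    blocks = in_array / (factor * factor)      # normalize once, on the small array
    rows, cols = blocks.shape
    out = np.empty((rows * factor, cols * factor), dtype=blocks.dtype)  # preallocate
    # out[di::factor, dj::factor] picks one pixel from every factor x factor block,
    # and that strided view has exactly the shape of `blocks`
    for di in range(factor):
        for dj in range(factor):
            out[di::factor, dj::factor] = blocks
    return out

small = np.array([[16.0, 24.0], [8.0, 16.0]])
print(upscale_strided(small, 2))
```

With the [2,2] example, `upscale_strided(small, 2)` reproduces the [4,4] array shown in the question.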
For a working example, take the [2,2] array [16, 24; 8, 16]:
16 , 24
8 , 16
Upscaled by a factor of 2 into a [4,4] array, it would have the values:
4 , 4 , 6 , 6
4 , 4 , 6 , 6
2 , 2 , 4 , 4
2 , 2 , 4 , 4
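The worked example can be checked with a one-line reference implementation. This sketch uses `np.kron` with a block of ones, which is a swapped-in technique rather than one of the question's methods. It is handy for verifying faster versions on small inputs, though it is not expected to be the fastest option at 2048 to 20480. The last line confirms the flux-conserving property: the total is unchanged.

```python
import numpy as np

small = np.array([[16.0, 24.0],
                  [8.0, 16.0]])
factor = 2

# Kronecker product with a factor x factor block of ones copies each value
# into a block; dividing by factor**2 spreads the value evenly over that block
big = np.kron(small, np.ones((factor, factor))) / factor**2

print(big)
print(small.sum(), big.sum())  # flux conserved: both totals are 64.0
```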
The fastest implementation so far is a for loop accelerated by numba's jit and prange. I'd like to make better use of NumPy's precompiled functions for this job. I'll also entertain SciPy solutions, but not its resizing functions.
It seems like a perfect problem for NumPy's array-manipulation functions, but I haven't managed to make it fast.
Additionally, the single-line NumPy call is admittedly odd-looking (don't be surprised), but that is what it took to get the values to align correctly.
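Since the question frames this as a matrix-manipulation problem, note that the same upscale can be written as a product of matrices, big = R @ A @ C.T / factor², where R and C are 0/1 "expansion" matrices that copy each row or column factor times. This sketch is for illustration only, and the helper name `upscale_matmul` is hypothetical. For an n x m input it builds (n*factor) x n and (m*factor) x m matrices and multiplies over mostly-zero entries, so it is expected to be much slower and more memory-hungry than the repeat or broadcast approaches at the question's sizes.

```python
import numpy as np

def upscale_matmul(in_array, factor):
    """Flux-conserving upscale written as R @ A @ C.T (hypothetical helper)."""
    rows, cols = in_array.shape
    # R[k, i] == 1 exactly when output row k comes from input row i (k // factor == i)
    R = np.repeat(np.eye(rows), factor, axis=0)
    # C plays the same role for columns
    C = np.repeat(np.eye(cols), factor, axis=0)
    return R @ in_array @ C.T / (factor * factor)

small = np.array([[16.0, 24.0], [8.0, 16.0]])
print(upscale_matmul(small, 2))
```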
Code examples:
See the more optimized versions below. Be warned: the case here creates a 20480x20480 float64 array, which takes up a fair amount of memory, but that also shows whether a method is too memory-intensive (as matrix approaches can be).
Environment: Python 3, Windows, i5-4960K @ 4.5 GHz. On the examples shown, the for loop code takes ~18.9 s and the NumPy code takes ~52.5 s.
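For scale, the benchmark output (2048 x 10 = 20480 on each side, float64 at 8 bytes per element) is about 3.4 GB, and any method that builds intermediate arrays of the same size needs that much again for each intermediate. The arithmetic:

```python
import numpy as np

side = 2048 * 10                            # output side length in the benchmark
itemsize = np.dtype(np.float64).itemsize    # 8 bytes per float64 element
n_bytes = side * side * itemsize

print(n_bytes)            # 3355443200 bytes
print(n_bytes / 2**30)    # 3.125 GiB
```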
# MAIN: to run these (each function is imported from its own module file, e.g. Regridder1.py)
import timeit

timeitSetup = '''
from Regridder1 import Regridder1
import numpy as np
factor = 10
inArrayX = np.float64(np.arange(0, 2048, 1))
inArrayY = np.float64(np.arange(0, 2048, 1))
[inArray, _] = np.meshgrid(inArrayX, inArrayY)
'''
print("Time to run 1: {}".format(timeit.timeit(setup=timeitSetup, stmt="Regridder1(inArray, factor)", number=10)))

timeitSetup = '''
from Regridder2 import Regridder2
import numpy as np
factor = 10
inArrayX = np.float64(np.arange(0, 2048, 1))
inArrayY = np.float64(np.arange(0, 2048, 1))
[inArray, _] = np.meshgrid(inArrayX, inArrayY)
'''
print("Time to run 2: {}".format(timeit.timeit(setup=timeitSetup, stmt="Regridder2(inArray, factor)", number=10)))
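The harness above imports each function from its own module file. A variant that avoids the module files passes callables directly to `timeit` and also checks that the implementations agree before timing them. This is a sketch: `reference` and `candidate` are hypothetical stand-ins (swap in Regridder1 and Regridder2), and the small default size keeps it quick; use size=2048 with factor=10 to match the question's benchmark.

```python
import timeit
import numpy as np

def reference(in_array, factor):
    # hypothetical stand-in: repeat along both axes, then normalize
    return in_array.repeat(factor, axis=0).repeat(factor, axis=1) / (factor * factor)

def candidate(in_array, factor):
    # hypothetical stand-in: broadcast to 4-D blocks, then reshape (copies) to 2-D
    rows, cols = in_array.shape
    blocks = in_array[:, None, :, None] / (factor * factor)
    return np.broadcast_to(blocks, (rows, factor, cols, factor)).reshape(rows * factor, cols * factor)

def benchmark(funcs, size=128, factor=10, number=3):
    in_x = np.arange(size, dtype=np.float64)
    in_array, _ = np.meshgrid(in_x, in_x)
    expected = funcs[0](in_array, factor)
    results = {}
    for f in funcs:
        # verify correctness once before timing
        assert np.allclose(f(in_array, factor), expected), f.__name__
        results[f.__name__] = timeit.timeit(lambda: f(in_array, factor), number=number)
    return results

for name, seconds in benchmark([reference, candidate]).items():
    print("Time to run {}: {:.3f} s".format(name, seconds))
```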
# FUN: Regridder 1 - for loop
import numpy as np
from numba import prange, jit

@jit(nogil=True)
def Regridder1(inArray, factor):
    inSize = np.shape(inArray)
    outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))]
    outBlockSize = factor*factor  # the number of outArray pixels that 1 inArray pixel is spread across
    outArray = np.zeros(outSize)  # preallocate
    outBlocks = inArray/outBlockSize  # precalculate the normalized block values to go faster
    for i in prange(0, inSize[0]):
        for j in prange(0, inSize[1]):
            # put the normalized value into its factor x factor block
            outArray[i*factor:(i*factor+factor), j*factor:(j*factor+factor)] = outBlocks[i, j]
    return outArray
# FUN: Regridder 2 - numpy
import numpy as np

def Regridder2(inArray, factor):
    inSize = np.shape(inArray)
    outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))]
    outBlockSize = factor*factor  # the number of outArray pixels that 1 inArray pixel is spread across
    # repeat along columns, transpose, repeat again, then reshape in the transposed
    # order (inSize[1]*factor rows first) so that non-square inputs also align correctly
    outArray = inArray.repeat(factor).reshape(inSize[0], factor*inSize[1]).T.repeat(factor).reshape(inSize[1]*factor, inSize[0]*factor).T/outBlockSize
    return outArray
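Following unutbu's suggestion in the comments (divide before repeating) and the repeat-twice approach benchmarked in the answer, the two `repeat` calls can target axes directly, which removes the transposes and the reshape bookkeeping. This is a sketch, and the name `Regridder2_axes` is hypothetical. It handles non-square inputs, and dividing first means the division runs on the small array rather than on the full-size output.

```python
import numpy as np

def Regridder2_axes(inArray, factor):
    factor = int(factor)
    outBlocks = inArray / (factor * factor)   # normalize the small array first
    # repeat each row `factor` times (axis 0), then each column (axis 1)
    return outBlocks.repeat(factor, axis=0).repeat(factor, axis=1)

small = np.array([[16.0, 24.0], [8.0, 16.0]])
print(Regridder2_axes(small, 2))
```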
I would greatly appreciate any insight into speeding this up. Hopefully the code is fine; I formulated it in the text box.
Current best solution:
On my computer, the numba jit for-loop implementation (Regridder1), with jit applied only to the part that needs it, runs the timeit test in 18.0 s, while the NumPy-only implementation (Regridder2) runs it in 18.5 s. A bonus is that the NumPy-only implementation doesn't have to wait for jit to compile the code on the first call; jit's cache=True avoids recompiling on subsequent runs. The other options (nogil, nopython, prange) don't seem to help, but they don't seem to hurt either. Maybe future numba updates will change that.
For simplicity and portability, Regridder2 is the best option: it is nearly as fast and doesn't need numba installed (with my Anaconda install, I had to install numba separately).
# FUN: Regridder 1 - for loop
import numpy as np
from numba import jit, prange

def Regridder1(inArray, factor):
    inSize = np.shape(inArray)
    outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))]
    outBlockSize = factor*factor  # the number of outArray pixels that 1 inArray pixel is spread across
    outArray = np.empty(outSize)  # preallocate (every element is overwritten by the loop)
    outBlocks = inArray/outBlockSize  # precalculate the normalized block values to go faster
    factor = np.int64(factor)  # convert to an integer to be safe (in case it's a float like 1.0)
    outArray = RegridderUpscale(inSize, factor, outArray, outBlocks)  # call a function that contains just the loop
    return outArray

@jit(nogil=True, nopython=True, cache=True)  # options tried: nopython=True, nogil=True, parallel=True, cache=True
def RegridderUpscale(inSize, factor, outArray, outBlocks):
    # scale the original data up, one factor x factor block per input pixel
    for i in prange(0, inSize[0]):
        for j in prange(0, inSize[1]):
            outArray[i*factor:(i*factor+factor), j*factor:(j*factor+factor)] = outBlocks[i, j]
    # note: in languages with inclusive slice ends (e.g. MATLAB) the end index is i*factor+factor-1
    return outArray  # return the filled array
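One likely reason prange "doesn't seem to help" here: numba only parallelizes a prange loop when the function is compiled with parallel=True; otherwise prange behaves like range. Below is a sketch of an explicitly parallel variant that writes element by element inside each block instead of assigning slices. It has not been measured against the question's timings, so any speed-up is an assumption to verify. The name `upscale_numba` is hypothetical, and the try/except fallback is an added assumption so the sketch still runs (as plain, slow Python) when numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # fallback so the sketch still runs (slowly) without numba installed
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

@njit(parallel=True)
def upscale_numba(outBlocks, factor, outArray):
    rows, cols = outBlocks.shape
    for i in prange(rows):                 # rows are split across threads; their output rows never overlap
        for j in range(cols):
            value = outBlocks[i, j]
            for di in range(factor):
                for dj in range(factor):
                    outArray[i * factor + di, j * factor + dj] = value
    return outArray

small = np.array([[16.0, 24.0], [8.0, 16.0]])
factor = 2
out = np.empty((small.shape[0] * factor, small.shape[1] * factor))
upscale_numba(small / (factor * factor), factor, out)
print(out)
```

As in the question's Regridder1, the normalization happens once on the small array before the loop.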
# FUN: Regridder 2 - numpy, based on @ZisIsNotZis's answer
import numpy as np

def Regridder2(inArray, factor):
    inSize = np.shape(inArray)
    outBlockSize = factor*factor  # the number of outArray pixels that 1 inArray pixel is spread across
    # view the input as (rows, 1, cols, 1), broadcast it to (rows, factor, cols, factor) without copying,
    # then reshape, which copies it into the final (rows*factor, cols*factor) array; factor must be an integer
    outArray = np.broadcast_to(inArray[:, None, :, None]/outBlockSize, (inSize[0], factor, inSize[1], factor)).reshape(np.int64(factor*inSize[0]), np.int64(factor*inSize[1]))
    return outArray
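A related sketch uses the same 4-D block idea but writes into a preallocated output. Reshaping a C-contiguous (rows*factor, cols*factor) array to (rows, factor, cols, factor) gives a view, so a single broadcast assignment fills it in place. Passing a reused `out` buffer avoids allocating a fresh ~3.4 GB array on every benchmark call; whether that is faster on a given machine is an assumption to measure. The name `upscale_into` and its `out` parameter are hypothetical, not part of the question's code.

```python
import numpy as np

def upscale_into(in_array, factor, out=None):
    """Flux-conserving upscale written into `out` (allocated if not given)."""
    factor = int(factor)
    rows, cols = in_array.shape
    if out is None:
        out = np.empty((rows * factor, cols * factor), dtype=np.float64)
    if out.shape != (rows * factor, cols * factor) or not out.flags.c_contiguous:
        raise ValueError("out must be C-contiguous with shape (rows*factor, cols*factor)")
    # for a C-contiguous array this reshape is a view, so writes land in `out`
    view = out.reshape(rows, factor, cols, factor)
    # normalize the small array, then broadcast (rows, 1, cols, 1) across every block
    view[...] = in_array[:, None, :, None] / (factor * factor)
    return out

buffer = np.empty((4, 4))
upscale_into(np.array([[16.0, 24.0], [8.0, 16.0]]), 2, out=buffer)
print(buffer)
```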
python arrays numpy scaling
marked as duplicate by unutbu
Nov 26 '18 at 19:03
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
Is the [4,4] array example the desired output, or... I am confused?
– U9-Forward
Nov 16 '18 at 3:20
It could be an output (I included it as a visual example of what type of scaling is desired), but the 2048 -> 20480 in the code shows real world speed limitations much better.
– user2403531
Nov 16 '18 at 3:48
It's a teensy bit faster to do the division first, before calling repeat in Regridder2 (as you already did in Regridder1), i.e. outArray = (inArray/outBlockSize).repeat(...)...
– unutbu
Nov 16 '18 at 3:56
No need to compute outSize in Regridder2.
– unutbu
Nov 16 '18 at 4:00
outArray = ((inArray/outBlockSize).repeat(outBlockSize).reshape(inSize[0],inSize[1],factor,factor).swapaxes(1,2).reshape(inSize[0]*factor,inSize[1]*factor)) is a marginally faster way to compute outArray in Regridder2, but nowhere near as fast as Regridder1.
– unutbu
Nov 16 '18 at 4:01
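unutbu's expression above, wrapped as a function for reference (the name `Regridder2_swapaxes` is hypothetical). Because it reshapes to (rows, cols, factor, factor) and then swaps the middle axes, it keeps rows and columns straight for non-square inputs as well; the check compares it against a simple repeat-along-axes version.

```python
import numpy as np

def Regridder2_swapaxes(inArray, factor):
    inSize = np.shape(inArray)
    outBlockSize = factor * factor
    # repeat each normalized value factor*factor times -> (rows, cols, factor, factor),
    # swap to (rows, factor, cols, factor), then merge each pair of axes
    return ((inArray / outBlockSize).repeat(outBlockSize)
            .reshape(inSize[0], inSize[1], factor, factor)
            .swapaxes(1, 2)
            .reshape(inSize[0] * factor, inSize[1] * factor))

r = np.arange(6.0).reshape(2, 3)   # non-square test input
check = r.repeat(3, axis=0).repeat(3, axis=1) / 9
print(np.allclose(Regridder2_swapaxes(r, 3), check))   # True
```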
edited Dec 3 '18 at 23:35 by user2403531
asked Nov 16 '18 at 3:15 by user2403531 (487)
2 Answers
I did some benchmarks on this using a 512x512 byte image (10x upscale):
a = np.empty((512, 512), 'B')
Repeat Twice
>>> %timeit a.repeat(10, 0).repeat(10, 1)
127 ms ± 979 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
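A small sketch can make the double copy in the repeat-twice approach explicit. Its sizes mirror the 512x512, 10x benchmark, and the variable names are illustrative. Each `repeat` allocates and fills a new array, so a full intermediate is built before the final result.

```python
import numpy as np

a = np.zeros((512, 512), dtype=np.uint8)
factor = 10

# First repeat: a full (5120, 512) intermediate array is allocated and filled.
step1 = a.repeat(factor, axis=0)
# Second repeat: the (5120, 5120) result is allocated and filled from step1.
step2 = step1.repeat(factor, axis=1)

print(step1.shape, step1.nbytes)  # (5120, 512) 2621440
print(step2.shape, step2.nbytes)  # (5120, 5120) 26214400
```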
Repeat Once + Reshape
>>> %timeit a.repeat(100).reshape(512, 512, 10, 10).swapaxes(1, 2).reshape(5120, 5120)
150 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The two methods above both copy the data twice, while the two methods below copy it only once.
Fancy Indexing
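Since t can be reused (and pre-computed), building it is not timed. Its dtype must be able to hold the largest index, 511, so the default integer dtype is used here; 'B' (uint8) tops out at 255.
>>> t = np.arange(512).repeat(10)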
>>> %timeit a[t[:,None], t]
143 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
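The sketch below checks that the fancy-indexing form builds the same array as repeat-twice. It uses random data so the comparison is meaningful. The `np.ix_` line is only an equivalent spelling added for illustration; it is not part of the benchmark above.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)
factor = 10

# Each row/column index repeated `factor` times: 0,...,0, 1,...,1, ..., 511.
# The default integer dtype is used so that indices up to 511 fit.
t = np.arange(a.shape[0]).repeat(factor)

fancy = a[t[:, None], t]   # (5120, 1) and (5120,) index arrays broadcast to (5120, 5120)
via_ix = a[np.ix_(t, t)]   # the same selection written with np.ix_
reference = a.repeat(factor, 0).repeat(factor, 1)

print(fancy.shape, np.array_equal(fancy, reference), np.array_equal(via_ix, reference))
# (5120, 5120) True True
```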
Viewing + Reshape
>>> %timeit np.broadcast_to(a[:,None,:,None], (512, 10, 512, 10)).reshape(5120, 5120)
29.6 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
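Why does viewing + reshape copy only once? A small sketch on a non-square 2x3 array shows the mechanism. `np.broadcast_to` only creates a read-only view in which the new axes have stride 0. The following `reshape` cannot merge a zero-stride axis with its neighbour, so it has to materialise a fresh contiguous array, and that is the single copy.

```python
import numpy as np

a = np.arange(6, dtype=np.uint8).reshape(2, 3)
factor = 2

# Insert length-1 axes and broadcast them to `factor`: no data is copied yet,
# the new axes simply get a stride of 0.
view = np.broadcast_to(a[:, None, :, None], (2, factor, 3, factor))
print(view.strides, view.flags.writeable)  # (3, 0, 1, 0) False

# Merging (row, repeat) and (col, repeat) cannot be expressed with these
# strides, so reshape allocates a new C-contiguous array: the one copy.
out = view.reshape(2 * factor, 3 * factor)
print(out.flags['C_CONTIGUOUS'], np.shares_memory(out, a))  # True False
```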
It seems that viewing + reshape wins (at least on my machine). On a 2048x2048 byte image, view + reshape still wins:
Repeat twice: 2.04 s ± 31.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Repeat once + reshape: 2.4 s ± 18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Fancy indexing: 2.3 s ± 25.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Viewing + reshape: 424 ms ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
while the results for a 2048x2048 float64 image are:
Repeat twice: 3.14 s ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Repeat once + reshape: 5.07 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Fancy indexing: 3.56 s ± 64.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Viewing + reshape: 1.8 s ± 24.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
which, although the itemsize is 8 times larger, did not take proportionally longer.
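For readers without IPython, here is a portable sketch of the same four-way comparison using the standard `timeit` module. The function names and the `compare()` helper are hypothetical. Unlike the benchmark above, `fancy_index` builds its index arrays inside the timed call. Every method is first checked against repeat-twice, so a fast but wrong variant cannot win.

```python
import functools
import timeit

import numpy as np


def repeat_twice(a, f):
    return a.repeat(f, 0).repeat(f, 1)


def repeat_once_reshape(a, f):
    n, m = a.shape
    return a.repeat(f * f).reshape(n, m, f, f).swapaxes(1, 2).reshape(n * f, m * f)


def fancy_index(a, f):
    # Default integer dtype, so the index arrays can hold any row/column index.
    rows = np.arange(a.shape[0]).repeat(f)
    cols = np.arange(a.shape[1]).repeat(f)
    return a[rows[:, None], cols]


def view_reshape(a, f):
    n, m = a.shape
    return np.broadcast_to(a[:, None, :, None], (n, f, m, f)).reshape(n * f, m * f)


METHODS = (repeat_twice, repeat_once_reshape, fancy_index, view_reshape)


def compare(shape=(512, 512), factor=10, dtype=np.uint8, number=5):
    """Return {method name: best mean seconds per call}, after checking correctness."""
    rng = np.random.default_rng(0)
    a = (rng.random(shape) * 255).astype(dtype)
    reference = repeat_twice(a, factor)
    timings = {}
    for fn in METHODS:
        assert np.array_equal(fn(a, factor), reference), fn.__name__
        call = functools.partial(fn, a, factor)
        timings[fn.__name__] = min(timeit.repeat(call, number=number, repeat=3)) / number
    return timings


if __name__ == "__main__":
    for name, secs in compare().items():
        print(f"{name:20s} {secs * 1e3:8.2f} ms")
```

Absolute numbers will vary with the machine and NumPy version, so treat them as indicative only.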
edited Nov 16 '18 at 5:58
answered Nov 16 '18 at 4:49
ZisIsNotZis
Viewing + Reshape was indeed fast, fast enough to match jit (at 2048->20480 the jit for loop takes 18.0 s and this takes 18.5 s on my computer)! It also has the bonus that it can handle non-square arrays, while my .repeat.repeat function doesn't. It's only a hair slower, but it removes the need to rely on numba's at-times finicky jit. Thank you for the insight!
– user2403531
Nov 19 '18 at 19:43
Some new functions which show that the order of operations matters:
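```python
import numpy as np
from numba import jit

A = np.random.rand(2048, 2048)

@jit
def reg1(A, factor):
    # Explicit loop: fill the output row by row through a flat view of B.
    factor2 = factor**2
    a, b = factor * A.shape[0], factor * A.shape[1]
    B = np.empty((a, b), A.dtype)
    Bf = B.ravel()  # flat 1-D view of the C-contiguous output
    k = 0
    for i in range(A.shape[0]):
        Ai = A[i]
        for _ in range(factor):              # each input row becomes `factor` output rows
            for j in range(A.shape[1]):
                x = Ai[j] / factor2          # flux-conserving value
                for _ in range(factor):      # each element becomes `factor` output columns
                    Bf[k] = x
                    k += 1
    return B

def reg2(A, factor):
    # Repeat along rows (axis 0) first, then along columns (axis 1).
    return np.repeat(np.repeat(A / factor**2, factor, 0), factor, 1)

def reg3(A, factor):
    # Repeat along columns (axis 1) first, then along rows (axis 0).
    return np.repeat(np.repeat(A / factor**2, factor, 1), factor, 0)

def reg4(A, factor):
    # Broadcast a (shx, 1, shy, 1) array to (shx, factor, shy, factor);
    # the final reshape is where the single copy happens.
    shx, shy = A.shape
    B = np.broadcast_to((A / factor**2).reshape(shx, 1, shy, 1),
                        shape=(shx, factor, shy, factor))
    return B.reshape(shx * factor, shy * factor)
```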
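Here is a minimal sketch, assuming float input, of why the division by factor**2 in reg4 makes it flux conserving. Summing each factor x factor output block (the basic downscaling operation from the question) recovers the input. `upscale_flux` is a hypothetical name for a reg4-style function, written without numba so that it runs on plain NumPy. The input shape is deliberately non-square.

```python
import numpy as np

def upscale_flux(A, factor):
    """Hypothetical reg4-style helper: block-replicate A and divide by factor**2
    so that every factor x factor output block sums to the original cell value."""
    n, m = A.shape
    scaled = A / factor**2  # true division, so the result is floating point
    view = np.broadcast_to(scaled.reshape(n, 1, m, 1), (n, factor, m, factor))
    return view.reshape(n * factor, m * factor)  # materialises the copy

A = np.random.default_rng(1).random((300, 200))
B = upscale_flux(A, 5)

# Downscale by block summing: (300, 5, 200, 5) -> sum over each 5x5 block.
block_sums = B.reshape(300, 5, 200, 5).sum(axis=(1, 3))
print(np.allclose(block_sums, A), np.isclose(B.sum(), A.sum()))  # True True
```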
And the timings:
In [47]: %timeit _=Regridder1(A,5)
672 ms ± 27.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [48]: %timeit _=reg1(A,5)
522 ms ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [49]: %timeit _=reg2(A,5)
1.23 s ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [50]: %timeit _=reg3(A,5)
782 ms ± 21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [51]: %timeit _=reg4(A,5)
860 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
edited Nov 16 '18 at 17:57
answered Nov 16 '18 at 17:35
B. M.
Wow, repeating horizontally first is much faster than repeating vertically first, similar to reshaping the view. I guess that's because the first repeat takes negligible time compared to the second repeat. If the second repeat is done "vertically", numpy is basically copying large contiguous blocks of memory (i.e. 20480-length rows) multiple times, while if it is done "horizontally", numpy has to copy contiguous 2048-length rows repeatedly (CPUs don't natively support non-contiguous arrays).
– ZisIsNotZis
Nov 19 '18 at 2:26
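The sketch below, a small-scale illustration with arbitrary sizes, confirms that the reg2 and reg3 orders produce identical arrays, so the choice between them is purely about performance. It also shows the intermediate shapes. The comments give a plausible reading of the memory-access difference, in line with the comment above; they are not a description of NumPy internals.

```python
import numpy as np

A = np.random.default_rng(2).random((256, 192))
f = 5

# reg2 order: rows (axis 0) first -> intermediate (1280, 192). The second repeat,
# along axis 1, then works on element-sized pieces within every row.
rows_first = np.repeat(A, f, axis=0)
# reg3 order: columns (axis 1) first -> intermediate (256, 960). The second repeat,
# along axis 0, can copy each whole contiguous 960-element row f times.
cols_first = np.repeat(A, f, axis=1)

out2 = np.repeat(rows_first, f, axis=1)
out3 = np.repeat(cols_first, f, axis=0)
print(rows_first.shape, cols_first.shape, np.array_equal(out2, out3))
# (1280, 192) (256, 960) True
```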