Python - Quick Upscaling of Array with Numpy, No Image Library Allowed [duplicate]

This question already has an answer here:

How to repeat elements of an array along two axes?

4 answers

Note on duplicate message:

Similar themes, but not exactly a duplicate, especially since the loop is still the fastest method. Thanks.

Goal:

Quickly upscale an array from [small,small] to [big,big] by a factor, without using an image library. The scaling is very simple: each small value is spread over several big values and divided by the number of big values it covers. In astronomical terms this is "flux conserving": with a factor of 2, a value of 16 in the small array is spread over 4 values in the big array, giving four 4's, so the total amount is retained.

Problem:

I have working code for the upscaling, but it is slow compared to downscaling. Upscaling should actually be easier than downscaling (which requires many sums, in this basic case): it only requires copying already-known data into blocks of a preallocated array.

For a working example, a [2,2] array of [16,24;8,16]:

16 , 24

8 , 16

Upscaled by a factor of 2 to a [4,4] array, it would have the values:

4 , 4 , 6 , 6

4 , 4 , 6 , 6

2 , 2 , 4 , 4

2 , 2 , 4 , 4
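The worked example above can be reproduced with a small reference implementation. This is only a sketch for checking results, not one of the timed methods: `upscale_reference` is a hypothetical helper name, and it uses `np.kron` with a constant block, which is convenient for small inputs but not chosen for speed.

```python
import numpy as np

def upscale_reference(small, factor):
    """Spread each input value evenly over a factor x factor block."""
    # Each block entry carries 1/factor**2 of the original value,
    # so the total over the block equals the original value.
    block = np.full((factor, factor), 1.0 / factor**2)
    return np.kron(small, block)

small = np.array([[16.0, 24.0],
                  [8.0, 16.0]])
big = upscale_reference(small, 2)

print(big)
# Flux conservation: the totals of the small and big arrays match.
print(small.sum(), big.sum())
```

Comparing a faster method against a reference like this on a tiny array is a cheap way to catch alignment mistakes before running the 2048 -> 20480 case.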

The fastest implementation is a for loop accelerated by numba's jit & prange. I'd like to better leverage Numpy's pre-compiled functions to get this job done. I'll also entertain Scipy stuff - but not its resizing functions.

It seems like a perfect problem for strong matrix manipulation functions, but I just haven't managed to make it happen quickly.

Additionally, the single-line numpy call is way funky, so don't be surprised. But it's what it took to get it to align correctly.

Code examples:

See the more optimized calls below. Be warned: the case I have here makes a 20480x20480 float64 array (roughly 3.4 GB), which takes up a fair bit of memory, but that also shows whether a method is too memory intensive (as matrix methods can be).

Environment: Python 3, Windows, i5-4960K @ 4.5 GHz. Time to run for loop code is ~18.9 sec, time to run numpy code is ~52.5 sec on the shown examples.

% MAIN: To run these

import timeit



timeitSetup = ''' 

from Regridder1 import Regridder1

import numpy as np



factor = 10;



inArrayX = np.float64(np.arange(0,2048,1));

inArrayY = np.float64(np.arange(0,2048,1));

[inArray, _] = np.meshgrid(inArrayX,inArrayY);

''';



print("Time to run 1: {}".format( timeit.timeit(setup=timeitSetup,stmt="Regridder1(inArray, factor,)", number = 10) ));



timeitSetup = ''' 

from Regridder2 import Regridder2

import numpy as np



factor = 10;



inArrayX = np.float64(np.arange(0,2048,1));

inArrayY = np.float64(np.arange(0,2048,1));

[inArray, _] = np.meshgrid(inArrayX,inArrayY);

''';



print("Time to run 2: {}".format( timeit.timeit(setup=timeitSetup,stmt="Regridder2(inArray, factor,)", number = 10) ));
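The two setup strings above are identical apart from the import, so the harness can be condensed by timing callables directly. The following is a minimal sketch under stated assumptions: `upscale_repeat` and `upscale_broadcast` are hypothetical stand-ins for the two Regridder modules (they are not the original files), `time_candidates` is an illustrative helper, and the input is deliberately small (128x128) so the snippet runs quickly and uses little memory.

```python
import timeit
import numpy as np

def upscale_repeat(in_array, factor):
    # Hypothetical stand-in: normalize, then repeat along both axes.
    return np.repeat(np.repeat(in_array / factor**2, factor, axis=0), factor, axis=1)

def upscale_broadcast(in_array, factor):
    # Hypothetical stand-in: broadcast a 4-D view, then reshape to 2-D.
    n0, n1 = in_array.shape
    scaled = in_array[:, None, :, None] / factor**2
    return np.broadcast_to(scaled, (n0, factor, n1, factor)).reshape(n0 * factor, n1 * factor)

def time_candidates(candidates, in_array, factor, number=3):
    """Time each callable in `candidates` on the same input."""
    timings = {}
    for name, func in candidates.items():
        timings[name] = timeit.timeit(lambda: func(in_array, factor), number=number)
    return timings

axis_values = np.arange(128, dtype=np.float64)
in_array, _ = np.meshgrid(axis_values, axis_values)

timings = time_candidates(
    {"repeat": upscale_repeat, "broadcast": upscale_broadcast},
    in_array,
    factor=10,
)
for name, seconds in timings.items():
    print("Time to run {}: {:.4f} s".format(name, seconds))
```

Passing the real Regridder1 and Regridder2 functions in the dictionary, with the 2048x2048 input, should reproduce the original measurement setup; absolute numbers will differ by machine.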

% FUN: Regridder 1 - for loop

import numpy as np

from numba import prange, jit



@jit(nogil=True)

def Regridder1(inArray,factor):

    inSize = np.shape(inArray);

    outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))];



    outBlockSize = factor*factor; #the block size where 1 inArray pixel is spread across # outArray pixels

    outArray = np.zeros(outSize); #preallocate

    outBlocks = inArray/outBlockSize; #precalc the resized blocks to go faster

    for i in prange(0,inSize[0]):

        for j in prange(0,inSize[1]):

            outArray[i*factor:(i*factor+factor),j*factor:(j*factor+factor)] = outBlocks[i,j]; #puts normalized value in a bunch of places



    return outArray;

% FUN: Regridder 2 - numpy

import numpy as np



def Regridder2(inArray,factor):

    inSize = np.shape(inArray);

    outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))];



    outBlockSize = factor*factor; #the block size where 1 inArray pixel is spread across # outArray pixels



    outArray = inArray.repeat(factor).reshape(inSize[0],factor*inSize[1]).T.repeat(factor).reshape(inSize[0]*factor,inSize[1]*factor).T/outBlockSize;



    return outArray;

Would greatly appreciate insight into speeding this up. Hopefully the code is correct; I wrote it up in the text box.

Current best solution:

On my computer, the numba jit for-loop implementation (Regridder1), with jit applied only to the part that needs it, runs the timeit test in 18.0 sec, while the numpy-only implementation (Regridder2) runs it in 18.5 sec. A bonus is that the numpy-only implementation doesn't have to wait for jit to compile the code on the first call. Jit's cache=True avoids recompiling on subsequent runs. The other options (nogil, nopython, prange) don't seem to help but also don't seem to hurt. Maybe future numba updates will make them more effective.

For simplicity and portability, Regridder2 is the best option. It's nearly as fast, and it doesn't need numba installed (which I had to install separately on my Anaconda setup).

% FUN: Regridder 1 - for loop

import numpy as np



def Regridder1(inArray,factor):

    inSize = np.shape(inArray);

    outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))];



    outBlockSize = factor*factor #the block size where 1 inArray pixel is spread across # outArray pixels

    outArray = np.empty(outSize) #preallocate

    outBlocks = inArray/outBlockSize #precalc the resized blocks to go faster

    factor = np.int64(factor) #convert to an integer to be safe (in case it's a 1.0 float)



    outArray = RegridderUpscale(inSize, factor, outArray, outBlocks) #call a function that has just the loop



    return outArray;

#END def Regridder1



from numba import jit, prange

@jit(nogil=True, nopython=True, cache=True) #nopython=True, nogil=True, parallel=True, cache=True

def RegridderUpscale(inSize, factor, outArray, outBlocks ):

    for i in prange(0,inSize[0]):

        for j in prange(0,inSize[1]):

            outArray[i*factor:(i*factor+factor),j*factor:(j*factor+factor)] = outBlocks[i,j];

        #END for j

    #END for i

    #scales the original data up; note that languages with inclusive slice ends need i*factor+factor-1 as the upper bound

    return outArray; #return success

#END def RegridderUpscale

% FUN: Regridder 2 - numpy based on @ZisIsNotZis's answer

import numpy as np



def Regridder2(inArray,factor):

    inSize = np.shape(inArray);

    #outSize = [np.int64(np.round(inSize[0] * factor)), np.int64(np.round(inSize[1] * factor))]; #whoops



    outBlockSize = factor*factor; #the block size where 1 inArray pixel is spread across # outArray pixels



    outArray = np.broadcast_to( inArray[:,None,:,None]/outBlockSize, (inSize[0], factor, inSize[1], factor)).reshape(np.int64(factor*inSize[0]), np.int64(factor*inSize[1])); #single line call that gets the job done



    return outArray;

#END def Regridder2

edited Dec 3 '18 at 23:35

asked Nov 16 '18 at 3:15

user2403531


marked as duplicate by unutbu arrays
Users with the arrays badge can single-handedly close arrays questions as duplicates and reopen them as needed.

Nov 26 '18 at 19:03

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

Is the [4,4] array example the desired output? Or... I am confused.

– U9-Forward
Nov 16 '18 at 3:20

It could be an output (I included it as a visual example of what type of scaling is desired), but the 2048 -> 20480 in the code shows real world speed limitations much better.

– user2403531
Nov 16 '18 at 3:48

It's a teensy bit faster to do the division first, before calling repeat in Regridder2 (as you already did in Regridder1). i.e. outArray = (inArray/outBlockSize).repeat(...)...

– unutbu
Nov 16 '18 at 3:56

No need to compute outSize in Regridder2.

– unutbu
Nov 16 '18 at 4:00

outArray = ((inArray/outBlockSize).repeat(outBlockSize).reshape(inSize[0],inSize[1],factor,factor).swapaxes(1,2).reshape(inSize[0]*factor,inSize[1]*factor)) is a marginally faster way to compute outArray in Regridder2, but nowhere near as fast as Regridder1.

– unutbu
Nov 16 '18 at 4:01

python arrays numpy scaling


2 Answers

I did some benchmarks about this using a 512x512 byte image (10x upscale):

a = np.empty((512, 512), 'B')

Repeat Twice

>>> %timeit a.repeat(10, 0).repeat(10, 1)

127 ms ± 979 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Repeat Once + Reshape

>>> %timeit a.repeat(100).reshape(512, 512, 10, 10).swapaxes(1, 2).reshape(5120, 5120)

150 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The two methods above both involve copying twice, while the two methods below each copy only once.

Fancy Indexing

Since t can be repeatedly used (and pre-computed), it is not timed.

>>> t = np.arange(512, dtype='B').repeat(10)

>>> %timeit a[t[:,None], t]

143 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Viewing + Reshape

>>> %timeit np.broadcast_to(a[:,None,:,None], (512, 10, 512, 10)).reshape(5120, 5120)

29.6 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

It seems that viewing + reshape wins (at least on my machine). The results for a 2048x2048 byte image are as follows; view + reshape still wins:

2.04 s ± 31.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2.4 s ± 18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2.3 s ± 25.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

424 ms ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

while the results for a 2048x2048 float64 image are

3.14 s ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

5.07 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

3.56 s ± 64.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

1.8 s ± 24.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

which did not take much longer, even though the itemsize is 8 times larger.

edited Nov 16 '18 at 5:58

answered Nov 16 '18 at 4:49

ZisIsNotZis


Viewing + Reshape was indeed fast, fast enough to match jit (at 2048->20480 the jit for loop takes 18.0 sec and this takes 18.5 sec on my computer)! It also has the bonus that it can handle non-square arrays, while my .repeat.repeat function doesn't. It's only a hair slower, but it removes the need to rely on numba's at-times finicky jit. Thank you for the insight!

– user2403531
Nov 19 '18 at 19:43


Some new functions which show that the order of operations matters:

import numpy as np
from numba import jit

A=np.random.rand(2048,2048)

@jit
def reg1(A,factor):
    # explicit loops that fill the output sequentially through a flat view
    factor2=factor**2
    a,b = [factor*s for s in A.shape]
    B=np.empty((a,b),A.dtype)
    Bf=B.ravel()
    k=0
    for i in range(A.shape[0]):
        Ai=A[i]
        for _ in range(factor):
            for j in range(A.shape[1]):
                x=Ai[j]/factor2
                for _ in range(factor):
                    Bf[k]=x
                    k += 1
    return B

def reg2(A,factor):
    # repeat along axis 0 first, then along axis 1
    return np.repeat(np.repeat(A/factor**2,factor,0),factor,1)

def reg3(A,factor):
    # repeat along axis 1 first, then along axis 0
    return np.repeat(np.repeat(A/factor**2,factor,1),factor,0)

def reg4(A,factor):
    # broadcast to a 4-D read-only view, then reshape (which copies into the final layout)
    shx,shy=A.shape
    B=np.broadcast_to((A/factor**2).reshape(shx,1,shy,1),
                      shape=(shx,factor,shy,factor))
    return B.reshape(shx*factor,shy*factor)

And the timings:

In [47]: %timeit _=Regridder1(A,5)
672 ms ± 27.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [48]: %timeit _=reg1(A,5)
522 ms ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [49]: %timeit _=reg2(A,5)
1.23 s ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [50]: %timeit _=reg3(A,5)
782 ms ± 21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [51]: %timeit _=reg4(A,5)
860 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
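The order of the two repeats changes the speed but not the result. The sketch below checks that on a small non-square input; `rows_then_cols` and `cols_then_rows` are hypothetical names mirroring reg2 and reg3, and the 4x6 test array is an assumption made for illustration.

```python
import numpy as np

def rows_then_cols(A, factor):
    # Same order as reg2: repeat along axis 0, then axis 1.
    return np.repeat(np.repeat(A / factor**2, factor, 0), factor, 1)

def cols_then_rows(A, factor):
    # Same order as reg3: repeat along axis 1, then axis 0.
    return np.repeat(np.repeat(A / factor**2, factor, 1), factor, 0)

rng = np.random.default_rng(0)
A_small = rng.random((4, 6))  # small non-square test input

B2 = rows_then_cols(A_small, 5)
B3 = cols_then_rows(A_small, 5)

same = np.array_equal(B2, B3)                      # identical outputs
flux_ok = np.isclose(B3.sum(), A_small.sum())      # total is conserved
print(same, flux_ok)
```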


edited Nov 16 '18 at 17:57

answered Nov 16 '18 at 17:35

B. M.



Wow, repeating horizontally first is much faster than repeating vertically first, similar to the reshaped view. I guess that's because the first repeat takes negligible time compared to the second. If the second repeat is done "vertically", numpy is basically copying large contiguous blocks of memory (i.e. 20480-element rows) multiple times, while if it is done "horizontally", numpy has to copy contiguous 2048-element rows repeatedly (CPUs don't natively support non-contiguous arrays).

– ZisIsNotZis
Nov 19 '18 at 2:26
