Numpy view contiguous part of non-contiguous array as dtype of bigger size
I was trying to generate an array of trigrams (i.e. contiguous three-letter combinations) from a very long char array:
# the data is actually loaded from a source file
a = np.random.randint(0, 256, 2**28, 'B').view('c')
Since making a copy is inefficient (and causes problems such as cache misses), I generated the trigrams directly using stride tricks:
tri = np.lib.stride_tricks.as_strided(a, (len(a)-2,3), a.strides*2)
This generates a trigram array with shape (2**28-2, 3), where each row is a trigram. Now I want to convert the trigrams to an array of strings (i.e. dtype S3) so that NumPy displays them more "reasonably" (instead of as individual chars).
tri = tri.view('S3')
This raises the exception:
ValueError: To change to a dtype of a different size, the array must be C-contiguous
I understand that data generally has to be contiguous in order to create a meaningful view, but this data is contiguous "where it should be": the three elements of each row are contiguous.
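To make that layout concrete, here is a minimal sketch on an 8-byte toy array (the input b"abcdefgh" is a stand-in chosen for illustration, not the real data). It shows that each row of the strided view occupies adjacent bytes even though the array as a whole is not flagged C-contiguous:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Toy stand-in for the real data: 8 single-byte characters.
a = np.frombuffer(b"abcdefgh", dtype="S1")

# Overlapping windows of length 3: both axes advance one byte at a time.
tri = as_strided(a, shape=(len(a) - 2, 3), strides=a.strides * 2)

print(tri.strides)                # (1, 1)
print(tri.flags["C_CONTIGUOUS"])  # False: rows overlap, so the array is not C-contiguous
print(tri[1].tobytes())           # b'bcd': within a row the bytes are adjacent
print(np.shares_memory(a, tri))   # True: no copy was made
```

The inner stride equals the item size, which is the sense in which the data is contiguous "where it should be".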
So I'm wondering how to view the contiguous part of a non-contiguous np.ndarray as a dtype of bigger size. A more "standard" way would be better, but hackish ways are also welcome. It seems that I can set the shape and strides freely with np.lib.stride_tricks.as_strided, but I can't force the dtype, which is the problem here.
EDIT
A non-contiguous array can also be made by simple slicing. For example:
np.empty((8, 4), 'uint32')[:, :2].view('uint64')
throws the same exception as above (although from a memory point of view I should be able to do this). This case is much more common than my example above.
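As a rough illustration of why this looks feasible from a memory point of view, the sketch below inspects the strides and flags of such a slice. The array contents (np.arange instead of np.empty) are an assumption made only so the example is deterministic:

```python
import numpy as np

# Deterministic stand-in for np.empty((8, 4), 'uint32').
base = np.arange(32, dtype="uint32").reshape(8, 4)
X = base[:, :2]

print(X.strides)                    # (16, 4): rows start 16 bytes apart, 8 bytes used per row
print(X.flags["C_CONTIGUOUS"])      # False: there is an 8-byte gap after every row
print(X.strides[-1] == X.itemsize)  # True: the two uint32 values of a row are adjacent
```

So every row holds exactly 8 adjacent bytes, which is what a single uint64 per row would need.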
python arrays numpy memory-layout
asked Nov 14 '18 at 9:03 by ZisIsNotZis (edited Nov 14 '18 at 9:11)
What about np.ascontiguousarray(tri).view('S3')?
– AndyK Nov 14 '18 at 9:44
@AndyK I believe OP wants to avoid the copy that this forces.
– Paul Panzer Nov 14 '18 at 9:55
The data buffer for any array is contiguous: one long low-level array of bytes. But a view of that buffer might not be 'C' contiguous. In the [:,:2] case there are 2 elements, then a gap, 2 more elements, etc. Look at the flags. Evidently view isn't going the extra step of verifying that the 8 bytes it needs for each uint64 are contiguous.
– hpaulj Nov 14 '18 at 17:43
1 Answer
If you have access to a contiguous array from which your non-contiguous one is derived, it should typically be possible to work around this limitation.
For example, your trigrams can be obtained like so:
>>> import numpy as np
>>> a = np.random.randint(0, 256, 2**28, 'B').view('c')
>>> a
array([b')', b'\xf2', b'\xf7', ..., b'\xf4', b'\xf1', b'z'], dtype='|S1')
>>> np.lib.stride_tricks.as_strided(a[:0].view('S3'), ((2**28)-2,), (1,))
array([b')\xf2\xf7', b'\xf2\xf7\x14', b'\xf7\x14\x1b', ...,
b'\xc9\x14\xf4', b'\x14\xf4\xf1', b'\xf4\xf1z'], dtype='|S3')
In fact, this example demonstrates that all we need for view casting is a contiguous "stub" at the base of the memory buffer. Afterwards, because as_strided performs very few checks, we are essentially free to do whatever we like.
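A small, deterministic version of the same idea may make it easier to follow. This is only a sketch on a toy 8-byte input (b"abcdefgh" is an assumption for illustration). Keep in mind that as_strided does no bounds checking, so the shape has to be chosen so that the last item still ends inside the buffer:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Toy stand-in for the real data: 8 single-byte characters.
a = np.frombuffer(b"abcdefgh", dtype="S1")

# Zero-length slice: trivially contiguous, so the dtype change is accepted.
stub = a[:0].view("S3")

# Re-stride: len(a) - 2 items of 3 bytes each, consecutive items start 1 byte apart.
tri = as_strided(stub, shape=(len(a) - 2,), strides=(1,))

print(tri.tolist())              # [b'abc', b'bcd', b'cde', b'def', b'efg', b'fgh']
print(np.shares_memory(a, tri))  # True: still a view on the original buffer
```

The last item starts at byte 5 and ends at byte 8, exactly the end of the buffer, which is why the length is len(a) - 2.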
It seems we can always get such a stub by slicing to a size 0 array. For your second example:
>>> X = np.empty((8, 4), 'uint32')[:, :2]
>>> np.lib.stride_tricks.as_strided(X[:0].view(np.uint64), (8, 1), X.strides)
array([[140133325248280],
[ 32],
[ 32083728],
[ 31978800],
[ 0],
[ 29686448],
[ 32],
[ 32362720]], dtype=uint64)
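Since np.empty returns arbitrary values, the output above is hard to verify by eye. The sketch below repeats the second example with known contents (np.arange is an assumption made for illustration) and compares the result against the value computed the slow way with copies. As far as I know, NumPy 1.23 and later only require the last axis to be contiguous, so X.view(np.uint64) may work directly there; the stub approach does not rely on that.

```python
import sys
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Deterministic stand-in for np.empty((8, 4), 'uint32').
base = np.arange(32, dtype="uint32").reshape(8, 4)
X = base[:, :2]  # non-contiguous: 2 of every 4 columns

# Zero-row stub carries the data pointer and accepts the dtype change.
stub = X[:0].view(np.uint64)  # shape (0, 1)

# One uint64 per row; rows keep the original 16-byte spacing.
wide = as_strided(stub, shape=(8, 1), strides=(X.strides[0], stub.itemsize))

# Reference result built with copies; which half is "low" depends on byte order.
lo, hi = (0, 1) if sys.byteorder == "little" else (1, 0)
expected = X[:, lo].astype(np.uint64) | (X[:, hi].astype(np.uint64) << np.uint64(32))

print(np.array_equal(wide[:, 0], expected))  # True
print(np.shares_memory(base, wide))          # True: no copy was made
```

Passing X.strides unchanged, as in the answer, also works here because the second axis has length 1 and its stride is never used.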
answered Nov 14 '18 at 9:45 by Paul Panzer (edited Nov 14 '18 at 10:02)
That's interesting, although it is quite difficult to understand why it works. +1
– AndyK Nov 14 '18 at 10:06
Viewing a size-zero array is interesting! I was thinking about somehow creating an array of the correct dtype (like a size-one array obtained by viewing bytes), but a size-zero view is definitely more useful!
– ZisIsNotZis Nov 15 '18 at 1:34