Read specific sections of a binary file containing 32-bit floats
I have a binary file that contains 32-bit floats. I need to be able to read certain sections of the file into a list or other array-like structure. In other words, I need to read a specific number of bytes (specific number of float32s) at a time into my data structure, then use seek() to seek to another point in the file and do the same thing again.
In pseudocode:
    new_list = []
    with open('my_file.data', 'rb') as file_in:
        for idx, offset in enumerate(offset_values):
            # seek in the file by the offset
            # read n float32 values into new_list[idx][:]
What is the most efficient/least confusing way to do this?
python python-3.x file-io
asked Nov 21 '18 at 18:58
questionable_code
Use numpy.memmap to memory-map the file as a numpy array with dtype numpy.float32.
– Warren Weckesser
Nov 21 '18 at 19:08
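A minimal sketch of the numpy.memmap suggestion, under a few assumptions not stated in the question: the offsets are byte offsets that are multiples of 4, the file uses the machine's native byte order, and every section has the same length n. The helper name read_sections and the throwaway demo file are hypothetical.

```python
import os
import tempfile

import numpy as np


def read_sections(path, byte_offsets, n):
    """Return a list with one float32 array of length n per byte offset."""
    # mode='r' maps the file read-only; data is paged in only when indexed.
    data = np.memmap(path, dtype=np.float32, mode='r')
    itemsize = data.dtype.itemsize  # 4 bytes per float32
    sections = []
    for offset in byte_offsets:
        start = offset // itemsize  # byte offset -> element index
        # np.array copies the slice so it outlives the memory map.
        sections.append(np.array(data[start:start + n]))
    del data  # drop the reference to the map
    return sections


# Demo on a throwaway file holding the values 0.0 .. 19.0.
path = os.path.join(tempfile.mkdtemp(), 'my_file.data')
np.arange(20, dtype=np.float32).tofile(path)
sections = read_sections(path, byte_offsets=[0, 40], n=3)
print(sections)
```

In the demo, byte offset 40 corresponds to element index 10, so the two sections hold elements 0 to 2 and 10 to 12. No explicit seek() is needed because indexing the map does the positioning.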
2 Answers
You can convert bytes to and from 32-bit float values using the struct module:
    import random
    import struct

    FLOAT_SIZE = 4
    NUM_OFFSETS = 5
    filename = 'my_file.data'

    # Create some random offsets.
    offset_values = [i*FLOAT_SIZE for i in range(NUM_OFFSETS)]
    random.shuffle(offset_values)

    # Create a test file
    with open(filename, 'wb') as file:
        for offset in offset_values:
            file.seek(offset)
            value = random.random()
            print('writing value:', value, 'at offset', offset)
            file.write(struct.pack('f', value))

    # Read sections of file back at offset locations.
    new_list = []
    with open(filename, 'rb') as file:
        for offset in offset_values:
            file.seek(offset)
            buf = file.read(FLOAT_SIZE)
            value = struct.unpack('f', buf)[0]
            print('read value:', value, 'at offset', offset)
            new_list.append(value)

    print('new_list =', new_list)
Sample output:
    writing value: 0.0687244786128608 at offset 8
    writing value: 0.34336034914481284 at offset 16
    writing value: 0.03658244351244533 at offset 4
    writing value: 0.9733690320097427 at offset 12
    writing value: 0.31991994765615206 at offset 0
    read value: 0.06872447580099106 at offset 8
    read value: 0.3433603346347809 at offset 16
    read value: 0.03658244386315346 at offset 4
    read value: 0.9733690023422241 at offset 12
    read value: 0.3199199438095093 at offset 0
    new_list = [0.06872447580099106, 0.3433603346347809, 0.03658244386315346,
                0.9733690023422241, 0.3199199438095093]
Note that the values read back differ slightly from the ones written: internally Python uses 64-bit float values, so some precision is lost when they are converted to 32 bits and then back.
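The precision loss can be reproduced without a file. The following is only an illustrative sketch; the variable names and the tolerance of 1e-6 are assumptions chosen for the example, not part of the answer above.

```python
import math
import struct

value = 0.1  # not exactly representable in binary floating point
roundtrip = struct.unpack('f', struct.pack('f', value))[0]

# The 64-bit value and its 32-bit round trip are close but not identical.
print(value == roundtrip)
print(math.isclose(value, roundtrip, rel_tol=1e-6))

# A value such as 0.5 is exactly representable in 32 bits and survives intact.
exact = struct.unpack('f', struct.pack('f', 0.5))[0]
print(exact == 0.5)
```

When checking values read from a float32 file against 64-bit references, a tolerance-based comparison such as math.isclose is therefore safer than ==.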
edited Nov 21 '18 at 19:57
answered Nov 21 '18 at 19:44
martineau
This looks very promising. What if I need to read multiple floats at a time (i.e. a whole line of values into a line of my list)? Would I use a for loop containing struct.unpack('f', buf)[0] to run the struct.unpack operation as many times as values I need from the line?
– questionable_code
Nov 22 '18 at 20:19
@questionable_code: Yes, you could do it in a for loop, but it would be much more efficient to let struct.unpack() do it, since it can unpack multiple values per call if you give it the proper format string (e.g. '4f' for four of them). Note that, strictly speaking, there are no "lines" in a binary file, so to use it that way you would seek() to the beginning of the group and then read the desired number of values times FLOAT_SIZE bytes into the buf buffer.
– martineau
Nov 22 '18 at 20:37
What if the number of values I need is variable? How would I write the format string for that?
– questionable_code
Nov 22 '18 at 20:37
@questionable_code: The required format string could easily be constructed on-the-fly if you know the number of 32-bit floats expected at each offset.
– martineau
Nov 22 '18 at 20:41
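The suggestion in these comments could be sketched as follows. The helper name read_floats, the (offset, count) pairs, and the in-memory io.BytesIO stand-in for the file are all assumptions made for illustration; the native 'f' format matches the answer above.

```python
import io
import struct

FLOAT_SIZE = struct.calcsize('f')  # 4 bytes for a 32-bit float


def read_floats(file, offset, count):
    """Seek to a byte offset and unpack `count` float32 values in one call."""
    file.seek(offset)
    buf = file.read(count * FLOAT_SIZE)
    if len(buf) != count * FLOAT_SIZE:
        raise EOFError('not enough data at offset %d' % offset)
    # Build the format string on the fly, e.g. '3f' for three floats.
    return list(struct.unpack('%df' % count, buf))


# Demo: six floats held in memory instead of a real file.
data = struct.pack('6f', 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
sections = [(8, 3), (0, 2)]  # hypothetical (byte offset, number of floats) pairs
with io.BytesIO(data) as file_in:
    new_list = [read_floats(file_in, offset, count) for offset, count in sections]
print(new_list)
```

Because the count is part of the format string, each section can have a different length, and new_list ends up as a list of lists with one row per offset.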
The binary data in your input file can readily be mapped into virtual memory using mmap. From there, you can wrap the buffer in a numpy array, if desired. One note: a 32-bit float is always 4 bytes wide and has no unsigned variant, so the dtype is float32 either way; what can differ between files is the byte order, which can be made explicit with a dtype such as '<f4' (little-endian) or '>f4' (big-endian). The resulting array contains the numbers (as opposed to the raw bytes).
    import mmap
    import numpy as np
    import os

    new_list = []
    with open('my_file.data', 'rb') as file_in:
        size_bytes = os.fstat(file_in.fileno()).st_size
        m = mmap.mmap(file_in.fileno(), length=size_bytes, access=mmap.ACCESS_READ)
        arr = np.frombuffer(m, np.dtype('float32'), offset=0)
        # offset_values holds byte offsets, as in the question
        for idx, offset in enumerate(offset_values):
            # each float32 occupies 4 bytes, so byte offset // 4 is the array index
            new_list.append(arr[offset//4])
I tested this with an n=10000 array of random floats, converted to bytes:
    import random
    import struct

    a = b''  # bytes, not str: struct.pack returns bytes in Python 3
    for i in range(10000):
        a += struct.pack('<f', random.uniform(0, 1000))
Then I read this "a" variable into the numpy array, as you would with the binary information from file.
>>> arr = np.frombuffer(a, np.dtype('float32'), offset=0)
>>> arr[500]
634.24408
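To pull out a run of values rather than a single element, np.frombuffer also accepts count and offset arguments. The sketch below assumes little-endian data and uses a small in-memory bytes object in place of the mmap; the names raw, byte_offset and n are hypothetical.

```python
import struct

import numpy as np

# Eight little-endian float32 values (0.0 .. 7.0) standing in for file contents.
raw = struct.pack('<8f', *(float(i) for i in range(8)))

FLOAT_SIZE = 4
byte_offset = 2 * FLOAT_SIZE  # start at the third value
n = 3                         # number of float32 values to read

# '<f4' means little-endian 4-byte float; use '>f4' for big-endian data.
section = np.frombuffer(raw, dtype='<f4', count=n, offset=byte_offset)
print(section.tolist())
```

Here offset is given in bytes and count in elements, so no division by the item size is needed. The same call should work with the mmap object from the answer in place of raw.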
edited Nov 21 '18 at 20:18
answered Nov 21 '18 at 19:43
AlecZ