Read specific sections of a binary file containing 32-bit floats












2















I have a binary file that contains 32-bit floats. I need to be able to read certain sections of the file into a list or other array-like structure. In other words, I need to read a specific number of bytes (specific number of float32s) at a time into my data structure, then use seek() to seek to another point in the file and do the same thing again.



In pseudocode:



new_list = []

with open('my_file.data', 'rb') as file_in:
    for idx, offset in enumerate(offset_values):
        # seek in the file by the offset
        # read n float32 values into new_list[idx][:]


What is the most efficient/least confusing way to do this?

































  • Use numpy.memmap to memory-map the file as a numpy array with dtype numpy.float32.

    – Warren Weckesser
    Nov 21 '18 at 19:08
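
A minimal sketch of the memory-mapping suggestion in the comment above, assuming NumPy is installed, the file holds nothing but native-byte-order float32 values, and every offset is a byte offset that is a multiple of 4. The helper name read_sections, the demo file name, and the sample offsets are made up for illustration.

```python
import os
import tempfile

import numpy as np

FLOAT_SIZE = 4  # bytes per float32


def read_sections(path, byte_offsets, count):
    """Return one float32 array of `count` values per byte offset (hypothetical helper)."""
    # Map the whole file read-only as a 1-D float32 array; nothing is read yet.
    mm = np.memmap(path, dtype=np.float32, mode='r')
    sections = []
    for offset in byte_offsets:
        start = offset // FLOAT_SIZE  # byte offset -> element index
        # np.array() copies the slice so it outlives the memory map.
        sections.append(np.array(mm[start:start + count]))
    del mm  # drop the map once the copies are made
    return sections


# Demo data (an assumption for this sketch): the values 0.0 .. 19.0 as float32.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_floats.data')
np.arange(20, dtype=np.float32).tofile(demo_path)

sections = read_sections(demo_path, byte_offsets=[0, 40], count=3)
print(sections)
```

Slicing the memory map only touches the pages that back the requested values, so this can suit large files where only a few sections are needed.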


















python python-3.x file-io
















asked Nov 21 '18 at 18:58









questionable_code



























2 Answers


















2














You can convert bytes to and from 32-bit float values using the struct module:



import random
import struct

FLOAT_SIZE = 4  # size of a 32-bit float in bytes
NUM_OFFSETS = 5
filename = 'my_file.data'

# Create some random offsets.
offset_values = [i*FLOAT_SIZE for i in range(NUM_OFFSETS)]
random.shuffle(offset_values)

# Create a test file.
with open(filename, 'wb') as file:
    for offset in offset_values:
        file.seek(offset)
        value = random.random()
        print('writing value:', value, 'at offset', offset)
        file.write(struct.pack('f', value))

# Read sections of the file back at the offset locations.
new_list = []
with open(filename, 'rb') as file:
    for offset in offset_values:
        file.seek(offset)
        buf = file.read(FLOAT_SIZE)
        value = struct.unpack('f', buf)[0]
        print('read value:', value, 'at offset', offset)
        new_list.append(value)

print('new_list =', new_list)


Sample output:



writing value: 0.0687244786128608 at offset 8
writing value: 0.34336034914481284 at offset 16
writing value: 0.03658244351244533 at offset 4
writing value: 0.9733690320097427 at offset 12
writing value: 0.31991994765615206 at offset 0
read value: 0.06872447580099106 at offset 8
read value: 0.3433603346347809 at offset 16
read value: 0.03658244386315346 at offset 4
read value: 0.9733690023422241 at offset 12
read value: 0.3199199438095093 at offset 0
new_list = [0.06872447580099106, 0.3433603346347809, 0.03658244386315346,
0.9733690023422241, 0.3199199438095093]


Note that the values read back are slightly different because Python's float is a 64-bit value internally, so some precision is lost when a value is converted to 32 bits and then back.
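
The precision loss can be seen in isolation with a short round trip. This is only an illustrative sketch; the helper name roundtrip_float32 is made up here.

```python
import struct


def roundtrip_float32(value):
    """Pack a Python float into 4 bytes as a float32, then unpack it again."""
    return struct.unpack('f', struct.pack('f', value))[0]


original = 0.1
restored = roundtrip_float32(original)

# 0.1 has no exact binary representation, and float32 keeps fewer bits of it
# than Python's 64-bit float, so the restored value differs slightly.
print(original, restored, abs(original - restored))

# A value that float32 can represent exactly survives unchanged.
print(roundtrip_float32(0.5) == 0.5)
```

If the file was written as float32 in the first place, reading it back loses nothing further; the difference only shows up when comparing against the original 64-bit values.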
































  • This looks very promising. What if I need to read multiple floats at a time (i.e. a whole line of values into a line of my list)? Would I use a for loop containing struct.unpack('f', buf)[0] to run the struct.unpack operation as many times as values I need from the line?

    – questionable_code
    Nov 22 '18 at 20:19













  • @questionable_code: Yes, you could do it in a for loop, but it would be much more efficient to make a single struct.unpack() call, since it can unpack multiple values at once if you give it the proper format string (e.g. '4f' for four of them). Note that strictly speaking there are no "lines" in a binary file, so after a seek() to the beginning of the group you would need to read the desired number of values times FLOAT_SIZE bytes into the buf buffer.

    – martineau
    Nov 22 '18 at 20:37











  • What if the number of values I need is variable? How would I write the format string for that?

    – questionable_code
    Nov 22 '18 at 20:37











  • @questionable_code: The required format string could easily be constructed on-the-fly if you know the number of 32-bit floats expected at each offset.

    – martineau
    Nov 22 '18 at 20:41
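
One way the on-the-fly format string from the comments above might look. This is a sketch, not the answerer's code: it assumes the file is little-endian (hence the '<' prefix), and the helper name read_floats, the demo file, and the (offset, count) pairs are invented for illustration.

```python
import os
import struct
import tempfile

FLOAT_SIZE = struct.calcsize('<f')  # 4 bytes per 32-bit float


def read_floats(file, offset, count):
    """Seek to a byte offset and return `count` float32 values as a list."""
    file.seek(offset)
    buf = file.read(count * FLOAT_SIZE)
    if len(buf) != count * FLOAT_SIZE:
        raise EOFError('file ended before %d floats could be read' % count)
    # Build the format string for this particular count, e.g. '<3f'.
    return list(struct.unpack('<%df' % count, buf))


# Demo data (an assumption for this sketch): the values 0.0 .. 9.0.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_floats.data')
with open(demo_path, 'wb') as out:
    out.write(struct.pack('<10f', *[float(i) for i in range(10)]))

# Each entry is (byte offset, number of floats to read there).
requests = [(0, 2), (16, 3), (32, 1)]
with open(demo_path, 'rb') as file_in:
    new_list = [read_floats(file_in, offset, count) for offset, count in requests]

print(new_list)
```

Because the count is part of the format string, each group can have a different length and still be unpacked with a single struct.unpack() call.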





















0














The binary information from your input file can readily be mapped to virtual memory using mmap. From there, you can import the buffer into a numpy array, if desired. One note: 32-bit IEEE 754 floats always carry a sign bit, so there is no separate signed or unsigned variant to choose between; what can differ between files is the byte order, which a dtype string such as '<f4' (little-endian) or '>f4' (big-endian) makes explicit. The array that gets populated will contain the numbers (as opposed to the raw bytes).



import mmap
import numpy as np
import os

new_list = []

# offset_values holds the byte offsets, as in the question.
with open('my_file.data', 'rb') as file_in:
    size_bytes = os.fstat(file_in.fileno()).st_size
    m = mmap.mmap(file_in.fileno(), length=size_bytes, access=mmap.ACCESS_READ)
    # View the mapped bytes as float32 values.
    arr = np.frombuffer(m, np.dtype('float32'), offset=0)
    for idx, offset in enumerate(offset_values):
        # offset is in bytes; each float32 occupies 4 bytes.
        new_list.append(arr[offset//4])


I tested this with an n=10000 array of random floats, converted to bytes:



import random
import struct

a = b''  # struct.pack() returns bytes in Python 3
for i in range(10000):
    a += struct.pack('<f', random.uniform(0, 1000))


Then I read this "a" variable into the numpy array, as you would with the binary information from file.



>>> arr = np.frombuffer(a, np.dtype('float32'), offset=0)
>>> arr[500]
634.24408
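
If NumPy is not available, a similar effect can be sketched with the standard library alone, since struct.unpack_from() accepts a memory-mapped file directly. The assumptions here are a little-endian file and a fixed number of floats per section; the helper name read_sections and the demo data are invented for illustration.

```python
import mmap
import os
import struct
import tempfile


def read_sections(path, byte_offsets, count):
    """Return a list of `count` floats for each byte offset, read via mmap."""
    fmt = '<%df' % count  # e.g. '<2f' for two little-endian float32 values
    with open(path, 'rb') as file_in:
        # Length 0 maps the whole file; ACCESS_READ keeps the map read-only.
        with mmap.mmap(file_in.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return [list(struct.unpack_from(fmt, m, offset))
                    for offset in byte_offsets]


# Demo data (an assumption for this sketch): the values 0.0 .. 7.0.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_floats.data')
with open(demo_path, 'wb') as out:
    out.write(struct.pack('<8f', *[float(i) for i in range(8)]))

new_list = read_sections(demo_path, byte_offsets=[0, 20], count=2)
print(new_list)
```

The offset passed to unpack_from() plays the role of seek(), so no explicit file positioning is needed once the file is mapped.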






























Answer using the struct module: answered Nov 21 '18 at 19:44 by martineau, edited Nov 21 '18 at 19:57




































Answer using mmap and numpy: answered Nov 21 '18 at 19:43 by AlecZ, edited Nov 21 '18 at 20:18





























