How can I recover a corrupted, partially pickled file?












My program was killed while serializing data (a dict) to disk with dill, and I can no longer open the partially written file.



Is it possible to partially or fully recover the data? If so, how?



Here's what I've tried:



>>> dill.load(open(filename, 'rb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.4/site-packages/dill/dill.py", line 288, in load
    obj = pik.load()
EOFError: Ran out of input


The file is not empty:



>>> os.stat(filename).st_size
31110059


Note: all of the data in the dictionary consisted of Python built-in types.










Tags: python, pickle, corruption, dill














      asked Mar 11 '18 at 17:17 by eqzx (edited Mar 11 '18 at 17:38)
























          1 Answer














          The pure-Python version of pickle.Unpickler keeps a stack around even if it encounters an error, so you can probably get at least something out of it:



          import io
          import pickle

          # Use the pure-Python version; the C version does not expose its internal state
          pickle.Unpickler = pickle._Unpickler

          # dill is imported after the patch so that it builds on the pure-Python Unpickler
          import dill

          if __name__ == '__main__':
              obj = [1, 2, {3: 4, "5": ('6',)}]
              data = dill.dumps(obj)

              handle = io.BytesIO(data[:-5])  # cut it off to simulate a truncated file

              unpickler = dill.Unpickler(handle)

              try:
                  unpickler.load()
              except EOFError:
                  pass

              # Whatever was decoded before the error is still on the stack
              print(unpickler.stack)


          I get the following output:



          [3, 4, '5', ('6',)]


          The pickle data format isn't that complicated. Read through the Python module's source code and you can probably find a way to hook all of the load_ methods to give you more information.
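
Before hooking into the unpickler, it can help to check how far the truncated stream is still well-formed. The sketch below is not part of the original answer. It assumes the standard library's `pickletools.genops`, which yields `(opcode, arg, position)` tuples and raises `ValueError` when the data runs out, and it uses a small stdlib pickle as a stand-in for the damaged dill file. The helper name `scan_opcodes` is made up for illustration.

```python
import pickle
import pickletools


def scan_opcodes(data):
    """Return (ops, error) for a possibly truncated pickle byte string.

    ops is a list of (position, opcode name, argument) for every opcode
    that could be read completely; error is the ValueError that stopped
    the scan, or None if a STOP opcode was reached.
    """
    ops = []
    try:
        for opcode, arg, pos in pickletools.genops(data):
            ops.append((pos, opcode.name, arg))
    except ValueError as exc:
        return ops, exc
    return ops, None


if __name__ == "__main__":
    # Stand-in for the damaged file: a small pickle with its tail cut off.
    blob = pickle.dumps({"a": 1, "b": [10, 20, 30]}, protocol=2)
    ops, error = scan_opcodes(blob[:-2])
    for pos, name, arg in ops:
        print(pos, name, arg)
    print("scan stopped:", error)
```

The position of the last complete opcode gives a rough idea of how much of the file is readable, and the opcode names show which containers were still open when the writer was killed.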

























          • I'm the dill author. Indeed, pickling is just dumping to a string, so you should be able to recover everything up to the last object that was dumped when the process failed. pickle, and thus dill, pickles recursively, so be warned that the "last object" means "the last object that was the target of a dump".

            – Mike McKerns
            Mar 12 '18 at 12:56
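
The recursion mentioned in this comment is visible in the pure-Python unpickler's state: containers that were still being built when the input ran out are kept separately from the innermost one. Here is a minimal sketch, assuming a recent CPython in which `pickle._Unpickler` keeps the unfinished outer containers on a private `metastack` attribute and the innermost one on `stack`. Both are implementation details rather than documented API, so check them against your interpreter's `pickle.py`. The example uses plain `pickle` so it runs without dill, and `salvage` is a hypothetical helper name.

```python
import io
import pickle


def salvage(data):
    """Run the pure-Python unpickler over possibly truncated pickle bytes.

    Returns ("complete", obj) if loading succeeds, otherwise
    ("partial", frames), where frames lists the unpickler's unfinished
    stacks from outermost to innermost.
    """
    # pickle._Unpickler is CPython's private pure-Python implementation.
    unpickler = pickle._Unpickler(io.BytesIO(data))
    try:
        return "complete", unpickler.load()
    except Exception:
        # Truncation can surface as EOFError, UnpicklingError, IndexError, ...
        pass
    # Assumption: outer containers still under construction sit on the
    # private `metastack`; the innermost one is `stack`.
    frames = list(getattr(unpickler, "metastack", [])) + [unpickler.stack]
    return "partial", frames


if __name__ == "__main__":
    obj = {"a": 1, "b": [10, 20, 30], "c": 99}
    blob = pickle.dumps(obj, protocol=2)

    # Drop the final two opcodes so the dict is never assembled.
    status, frames = salvage(blob[:-2])
    print(status, frames)

    # For a flat dict the innermost frame alternates keys and values.
    items = frames[-1]
    print(dict(zip(items[0::2], items[1::2])))
```

On a real file the innermost frame may belong to a nested list or dict rather than the top-level one, and a value that was cut off mid-way may come back incomplete, so treat anything recovered this way as unverified.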













          answered Mar 11 '18 at 17:46 by Blender







