what is a fastest way to remove nifi flowfile content?












0















I have a workflow where I am getting json files as a response of rest api. I am getting approximately 100k files in a session. total size of all the files is 15GB. I have to save each file to file system, which i am doing. at the end of the process I have to wait for all the files to be present before I send a success message.



Once I save the file in FS, I am calling notify+wait. but I dont need 15 gb data in flowfile anymore. So to release some space, I thought of using either replaceText or ModifyByte to clear content. so notify+wait runs smoothly. Total wait for this process is 3 hrs.



But process is taking too long in both (replaceText or ModifyByte) case.



Can you suggest, fastest way to clear flowfile data.I do not need any attributes too. so is thr a way I can abandon old flowfile and generate kb flowfile, midway?



what i want is something like generateflowfile, but in middle, so for each of my existing flowfile, i can drop old one, and generate blank flowfile for notify and wait.



Thanks










share|improve this question



























    0















    I have a workflow where I am getting json files as a response of rest api. I am getting approximately 100k files in a session. total size of all the files is 15GB. I have to save each file to file system, which i am doing. at the end of the process I have to wait for all the files to be present before I send a success message.



    Once I save the file in FS, I am calling notify+wait. but I dont need 15 gb data in flowfile anymore. So to release some space, I thought of using either replaceText or ModifyByte to clear content. so notify+wait runs smoothly. Total wait for this process is 3 hrs.



    But process is taking too long in both (replaceText or ModifyByte) case.



    Can you suggest, fastest way to clear flowfile data.I do not need any attributes too. so is thr a way I can abandon old flowfile and generate kb flowfile, midway?



    what i want is something like generateflowfile, but in middle, so for each of my existing flowfile, i can drop old one, and generate blank flowfile for notify and wait.



    Thanks










    share|improve this question

























      0












      0








      0








      I have a workflow where I am getting json files as a response of rest api. I am getting approximately 100k files in a session. total size of all the files is 15GB. I have to save each file to file system, which i am doing. at the end of the process I have to wait for all the files to be present before I send a success message.



      Once I save the file in FS, I am calling notify+wait. but I dont need 15 gb data in flowfile anymore. So to release some space, I thought of using either replaceText or ModifyByte to clear content. so notify+wait runs smoothly. Total wait for this process is 3 hrs.



      But process is taking too long in both (replaceText or ModifyByte) case.



      Can you suggest, fastest way to clear flowfile data.I do not need any attributes too. so is thr a way I can abandon old flowfile and generate kb flowfile, midway?



      what i want is something like generateflowfile, but in middle, so for each of my existing flowfile, i can drop old one, and generate blank flowfile for notify and wait.



      Thanks










      share|improve this question














      I have a workflow where I am getting json files as a response of rest api. I am getting approximately 100k files in a session. total size of all the files is 15GB. I have to save each file to file system, which i am doing. at the end of the process I have to wait for all the files to be present before I send a success message.



      Once I save the file in FS, I am calling notify+wait. but I dont need 15 gb data in flowfile anymore. So to release some space, I thought of using either replaceText or ModifyByte to clear content. so notify+wait runs smoothly. Total wait for this process is 3 hrs.



      But process is taking too long in both (replaceText or ModifyByte) case.



      Can you suggest, fastest way to clear flowfile data.I do not need any attributes too. so is thr a way I can abandon old flowfile and generate kb flowfile, midway?



      what i want is something like generateflowfile, but in middle, so for each of my existing flowfile, i can drop old one, and generate blank flowfile for notify and wait.



      Thanks







      apache-nifi






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 15 '18 at 3:37









      Rakesh PrasadRakesh Prasad

      898




      898
























          1 Answer
          1






          active

          oldest

          votes


















          1














          NiFi's Content Repository and FlowFile Repository are based on a copy-on-write mechanism, so if you don't change the contents or metadata, then you are not necessarily "keeping" the 15GB across those processors.



          Having said that, if all you need is the existence of such flow files on disk (but not contents or metadata), try ExecuteScript with the following Groovy script:



          def flowFiles = session.get(1000)
          flowFiles.each {
          session.transfer(session.create(), REL_SUCCESS)
          }
          session.remove(flowFiles)


          This script will grab up to 1000 flow files at a time, and for each one, send an empty flow file downstream. It then removes all the original incoming flow files.



          Note that this (i.e. your use case) will "break" the provenance/lineage chain, so if something goes wrong in your flow, you won't be able to tell which flow files came from which parent flow files, etc. This limitation is one reason why you don't see a full processor that performs this kind of function.






          share|improve this answer
























          • thanks, i think after comparing pros/cons I think i can slower replaceText processor and not loose lineage.

            – Rakesh Prasad
            Nov 15 '18 at 17:08











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53312069%2fwhat-is-a-fastest-way-to-remove-nifi-flowfile-content%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          NiFi's Content Repository and FlowFile Repository are based on a copy-on-write mechanism, so if you don't change the contents or metadata, then you are not necessarily "keeping" the 15GB across those processors.



          Having said that, if all you need is the existence of such flow files on disk (but not contents or metadata), try ExecuteScript with the following Groovy script:



          def flowFiles = session.get(1000)
          flowFiles.each {
          session.transfer(session.create(), REL_SUCCESS)
          }
          session.remove(flowFiles)


          This script will grab up to 1000 flow files at a time, and for each one, send an empty flow file downstream. It then removes all the original incoming flow files.



          Note that this (i.e. your use case) will "break" the provenance/lineage chain, so if something goes wrong in your flow, you won't be able to tell which flow files came from which parent flow files, etc. This limitation is one reason why you don't see a full processor that performs this kind of function.






          share|improve this answer
























          • thanks, i think after comparing pros/cons I think i can slower replaceText processor and not loose lineage.

            – Rakesh Prasad
            Nov 15 '18 at 17:08
















          1














          NiFi's Content Repository and FlowFile Repository are based on a copy-on-write mechanism, so if you don't change the contents or metadata, then you are not necessarily "keeping" the 15GB across those processors.



          Having said that, if all you need is the existence of such flow files on disk (but not contents or metadata), try ExecuteScript with the following Groovy script:



          def flowFiles = session.get(1000)
          flowFiles.each {
          session.transfer(session.create(), REL_SUCCESS)
          }
          session.remove(flowFiles)


          This script will grab up to 1000 flow files at a time, and for each one, send an empty flow file downstream. It then removes all the original incoming flow files.



          Note that this (i.e. your use case) will "break" the provenance/lineage chain, so if something goes wrong in your flow, you won't be able to tell which flow files came from which parent flow files, etc. This limitation is one reason why you don't see a full processor that performs this kind of function.






          share|improve this answer
























          • thanks, i think after comparing pros/cons I think i can slower replaceText processor and not loose lineage.

            – Rakesh Prasad
            Nov 15 '18 at 17:08














          1












          1








          1







          NiFi's Content Repository and FlowFile Repository are based on a copy-on-write mechanism, so if you don't change the contents or metadata, then you are not necessarily "keeping" the 15GB across those processors.



          Having said that, if all you need is the existence of such flow files on disk (but not contents or metadata), try ExecuteScript with the following Groovy script:



          def flowFiles = session.get(1000)
          flowFiles.each {
          session.transfer(session.create(), REL_SUCCESS)
          }
          session.remove(flowFiles)


          This script will grab up to 1000 flow files at a time, and for each one, send an empty flow file downstream. It then removes all the original incoming flow files.



          Note that this (i.e. your use case) will "break" the provenance/lineage chain, so if something goes wrong in your flow, you won't be able to tell which flow files came from which parent flow files, etc. This limitation is one reason why you don't see a full processor that performs this kind of function.






          share|improve this answer













          NiFi's Content Repository and FlowFile Repository are based on a copy-on-write mechanism, so if you don't change the contents or metadata, then you are not necessarily "keeping" the 15GB across those processors.



          Having said that, if all you need is the existence of such flow files on disk (but not contents or metadata), try ExecuteScript with the following Groovy script:



          def flowFiles = session.get(1000)
          flowFiles.each {
          session.transfer(session.create(), REL_SUCCESS)
          }
          session.remove(flowFiles)


          This script will grab up to 1000 flow files at a time, and for each one, send an empty flow file downstream. It then removes all the original incoming flow files.



          Note that this (i.e. your use case) will "break" the provenance/lineage chain, so if something goes wrong in your flow, you won't be able to tell which flow files came from which parent flow files, etc. This limitation is one reason why you don't see a full processor that performs this kind of function.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 15 '18 at 3:56









          mattybmattyb

          7,1311018




          7,1311018













          • thanks, i think after comparing pros/cons I think i can slower replaceText processor and not loose lineage.

            – Rakesh Prasad
            Nov 15 '18 at 17:08



















          • thanks, i think after comparing pros/cons I think i can slower replaceText processor and not loose lineage.

            – Rakesh Prasad
            Nov 15 '18 at 17:08

















          thanks, i think after comparing pros/cons I think i can slower replaceText processor and not loose lineage.

          – Rakesh Prasad
          Nov 15 '18 at 17:08





          thanks, i think after comparing pros/cons I think i can slower replaceText processor and not loose lineage.

          – Rakesh Prasad
          Nov 15 '18 at 17:08


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53312069%2fwhat-is-a-fastest-way-to-remove-nifi-flowfile-content%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          這個網誌中的熱門文章

          Tangent Lines Diagram Along Smooth Curve

          Yusuf al-Mu'taman ibn Hud

          Zucchini