Remove recurring text strings [closed]












I am new to R and have searched the forum for almost two hours without getting this to work.

My problem: I have a long text string scraped from the internet. The scrape also picked up the embed code for images. Each image block starts with "Embed from Getty Images" and ends with "false })});n". I would like to remove each of these blocks, from the start marker through the end marker. I have tried gsub() as follows:

AmericanTexts3 <- gsub("Embed.*})});n", "", AmericanTexts)

But this removes everything from the start of the first picture to the end of the last picture, including the text in between. Does anyone know how to solve this?










closed as off-topic by Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob Nov 16 '18 at 12:24


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob

If this question can be reworded to fit the rules in the help center, please edit the question.





















Tags: r, regex

edited Nov 16 '18 at 8:24 by snoram

asked Nov 16 '18 at 7:57 by Victor





1 Answer

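You need to use a non-greedy (lazy) quantifier.

Try

AmericanTexts3 <- gsub("Embed.*?})});n", "", AmericanTexts)

The ? after .* makes the quantifier lazy: it matches as few characters as possible, so each match stops at the first occurrence of the end marker instead of the last. As a result, only the text between each start marker and its nearest end marker is removed.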
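
To see the difference between the greedy and the lazy pattern, here is a minimal sketch on made-up data. The sample text and the contents of the embed blocks are assumptions for illustration; only the start and end markers come from the question. The end marker is written with bracket expressions ([}] and [)]) so that the braces and parentheses are treated as literal characters. That is a defensive choice in this sketch, not part of the original pattern.

```r
# Hypothetical sample text: two image embed blocks with real content between them.
# The embed bodies are invented; only the start/end markers come from the question.
AmericanTexts <- paste0(
  "Intro. ",
  "Embed from Getty Images window.gie(function(){ tld: 'com', caption: false })});n",
  " Middle paragraph. ",
  "Embed from Getty Images window.gie(function(){ tld: 'com', caption: false })});n",
  " Closing."
)

# Greedy: .* extends to the LAST end marker, so the middle paragraph is removed too.
greedy <- gsub("Embed.*[}][)][}][)];n", "", AmericanTexts)

# Lazy: .*? stops at the FIRST end marker after each start marker,
# so each embed block is removed on its own and the text in between survives.
lazy <- gsub("Embed.*?[}][)][}][)];n", "", AmericanTexts)

print(greedy)
print(lazy)
```

With the lazy version the middle paragraph should still be present in the result, while the greedy version drops it along with both embed blocks.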






answered Nov 16 '18 at 8:05 by LAP














