My regex is matching too much. How do I make it stop?











J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM  J0000010: Project name: E:\foo.pf  J0000011: Job name: MBiek Direct Mail Test  J0000100: Machine name: DEV  J0000100: Project file: E:\mbiek\foo.pf  J0000100: Template file: E:\mbiek\foot.xdt  J0000100: Job name: MBiek  J0000100: Output folder: E:\foo\A0001401  J0000100: Temp folder: E:\foo\Output\A0001401  J0000100: Document 1 - Starting document  J0005000: Document 1 - Text overflowed on page 1 (warning)  J0000101: Document 1 - 1 page(s) composed  J0000102: Document 1 - 1 page(s) rendered at 500 x 647 pixels  J0000100: Document 1 - Completed successfully  J0000020:


I have this gigantic ugly string and I'm trying to extract pieces from it using regex.



In this case, I want to grab everything after "Project Name" up to the part where it says "J0000011:" (the 11 is going to be a different number every time).



Here's the regex I've been playing with:



Project name:\s+(.*)\s+J[0-9]{7}:


The problem is that it doesn't stop until it hits the J0000020: at the end.



How do I make the regex stop at the first occurrence of J[0-9]{7}?
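A minimal reproduction sketch of the problem, using Python's re module as a stand-in flavor (an assumption: the question doesn't name a language, but \s, [0-9]{7} and a greedy (.*) behave the same way there). LOG is a shortened, hypothetical copy of the log line above, assuming two spaces between entries.

```python
import re

# Shortened copy of the log line (assumption: entries separated by two spaces;
# the raw string keeps the backslash in the path).
LOG = (
    "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM  "
    r"J0000010: Project name: E:\foo.pf  "
    "J0000011: Job name: MBiek Direct Mail Test  "
    "J0000100: Document 1 - Completed successfully  "
    "J0000020:"
)

# The pattern from the question: (.*) is greedy, so it first runs to the end
# of the string, then gives characters back until \s+J[0-9]{7}: can match.
# Giving back from the end, the first place that works is the LAST J-code.
match = re.search(r"Project name:\s+(.*)\s+J[0-9]{7}:", LOG)
captured = match.group(1)
print(repr(captured))  # captures everything up to the final J0000020:
```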







regex














edited Dec 16 '17 at 22:32 by eLRuLL










asked Aug 22 '08 at 14:10 by Mark Biek












  • @Jav_Rock: By reformatting the data you've changed the question. The OP's original regex works as desired now because . doesn't match the newlines you added.
    – Alan Moore
    May 22 '12 at 10:24










  • sorry, I step back
    – Jav_Rock
    May 22 '12 at 10:57
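Alan Moore's point can be checked with a short sketch, again assuming Python's re as the flavor (there, as in most flavors, "." does not match a newline by default). LOG_LINES is a hypothetical one-entry-per-line version of part of the data; re.DOTALL is Python's flag that lets "." match newlines too.

```python
import re

# Hypothetical: a few of the entries, one per line (like the reformatted data).
LOG_LINES = "\n".join([
    "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM",
    r"J0000010: Project name: E:\foo.pf",
    "J0000011: Job name: MBiek Direct Mail Test",
    "J0000020:",
])

PATTERN = r"Project name:\s+(.*)\s+J[0-9]{7}:"

# Without flags, the greedy (.*) cannot cross "\n", so it stops at the end of
# the "Project name" line and \s+ consumes the newline.
per_line = re.search(PATTERN, LOG_LINES).group(1)

# With re.DOTALL, "." also matches newlines, so the greedy capture runs on to
# the last J-code again.
dotall = re.search(PATTERN, LOG_LINES, re.DOTALL).group(1)

print(repr(per_line))
print(repr(dotall))
```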






























5 Answers

















            accepted






            Make .* non-greedy by adding '?' after it:



            Project name:\s+(.*?)\s+J[0-9]{7}:
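A minimal sketch of this fix in Python (Python's re module is an assumption here; the lazy *? quantifier works the same way in most flavors). LOG is a shortened, hypothetical copy of the question's log line.

```python
import re

# Shortened copy of the log line (assumption: two spaces between entries).
LOG = (
    "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM  "
    r"J0000010: Project name: E:\foo.pf  "
    "J0000011: Job name: MBiek Direct Mail Test  "
    "J0000100: Document 1 - Completed successfully  "
    "J0000020:"
)

# (.*?) is lazy: it starts empty and grows one character at a time, stopping
# as soon as \s+J[0-9]{7}: can match, i.e. at the FIRST following J-code.
match = re.search(r"Project name:\s+(.*?)\s+J[0-9]{7}:", LOG)
project = match.group(1) if match else None
print(project)  # E:\foo.pf
```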













            edited Dec 15 '16 at 9:27 by shA.t










            answered Aug 22 '08 at 14:12 by jj33
























                Using non-greedy quantifiers here is probably the best solution, also because it is more efficient than the greedy alternative: greedy matches generally go as far as they can (here, until the end of the text!) and then backtrack character by character to try to match the part that comes afterwards.



                However, consider using a negative character class instead:



                Project name:\s+(\S*)\s+J[0-9]{7}:


                \S means “everything except whitespace”, and this is exactly what you want here, since the project path contains no whitespace.
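A sketch of the negated-class version, again assuming Python's re as a stand-in flavor and a shortened, hypothetical copy of the log. It also shows the trade-off: \S* only works while the value contains no whitespace, so a hypothetical project path with a space in it makes the whole search fail.

```python
import re

# Shortened copy of the log line (assumption: two spaces between entries).
LOG = (
    "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM  "
    r"J0000010: Project name: E:\foo.pf  "
    "J0000011: Job name: MBiek Direct Mail Test  "
    "J0000100: Document 1 - Completed successfully  "
    "J0000020:"
)

PATTERN = r"Project name:\s+(\S*)\s+J[0-9]{7}:"

# \S* greedily takes every non-whitespace character and stops at the first
# space, so it can never run on to a later J-code.
project = re.search(PATTERN, LOG).group(1)
print(project)  # E:\foo.pf

# Hypothetical input: a path containing a space. \S* stops at "E:\My", and
# what follows is not \s+J[0-9]{7}:, so there is no match at all.
with_space = r"J0000010: Project name: E:\My Projects\foo.pf  J0000011: Job name: X"
print(re.search(PATTERN, with_space))  # None
```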















                edited Aug 13 at 18:11 by JGFMK










                answered Aug 22 '08 at 14:15 by Konrad Rudolph












                • When possible to implement, a greedy negative (or positive) character class will usually perform notably better than a lazy quantifier. Laziness requires the engine to forward-track character by character, checking the pattern that follows each time until it matches; a greedy character class can mindlessly repeat just the desired characters, which can be a lot quicker. So, you might consider making a stronger case for a negative character class, seeing as this is the greedy-vs-lazy canonical.
                  – CertainPerformance
                  Oct 30 at 9:26





























                    Well, ".*" is greedy. You make it non-greedy by using ".*?". With the latter construct, before the "." consumes each additional character, the regex engine first tries to match whatever comes after the ".*?". This means that if, for instance, nothing comes after the ".*?", it matches nothing.
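The difference can be seen on a tiny hypothetical string, using Python's re as an assumed stand-in flavor:

```python
import re

text = "xAyAz"

# Greedy: .* grabs everything, then backs off until the final "A" can match.
greedy = re.match(r"x.*A", text).group()    # "xAyA"

# Lazy: .*? starts empty and only grows until the next "A" matches.
lazy = re.match(r"x.*?A", text).group()     # "xA"

# Nothing after .*?: the empty match already succeeds, so it matches nothing.
empty = re.match(r".*?", text).group()      # ""

print(greedy, lazy, repr(empty))
```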



                    Here's what I used; s contains your original string. This code is .NET-specific, but most regex flavors have something similar.



                    using System.Text.RegularExpressions;
                    string m = Regex.Match(s, @"Project name: (?<name>.*?) J\d+").Groups["name"].Value;
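A rough Python equivalent of the .NET line, offered as a sketch (assumption: Python's re, using its (?P<name>...) named-group syntax, on a shortened, hypothetical copy of the log). It swaps the single literal spaces for \s+, a deliberate adjustment: if the entries really are separated by two spaces, as in the pasted log, a single space before J\d+ would leave one trailing space in the captured name.

```python
import re

# Shortened copy of the log line (assumption: two spaces between entries).
LOG = (
    "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM  "
    r"J0000010: Project name: E:\foo.pf  "
    "J0000011: Job name: MBiek Direct Mail Test  "
    "J0000100: Document 1 - Completed successfully  "
    "J0000020:"
)

# (?P<name>...) names the group; .*? stays lazy as in the .NET version, and
# \s+ absorbs however many spaces precede the next J-code.
match = re.search(r"Project name:\s+(?P<name>.*?)\s+J\d+", LOG)
name = match.group("name") if match else ""
print(name)  # E:\foo.pf
```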














                    edited Jan 2 '17 at 16:36 by Ani Menon










                    answered Aug 22 '08 at 14:24 by Svend























                        I would also recommend experimenting with regular expressions using "Expresso", a great (and free) utility for regex editing and testing.



                        One of its upsides is that its UI exposes a lot of regex functionality that people inexperienced with regex might not be familiar with, in a way that makes these new concepts easy to learn.



                        For example, when building your regex in the UI and choosing "*", you can check the "As few as possible" checkbox and see the resulting regex, as well as test its behavior, even if you were unfamiliar with non-greedy expressions before.



                        Available for download at their site:
                        http://www.ultrapico.com/Expresso.htm



                        Express download:
                        http://www.ultrapico.com/ExpressoDownload.htm















                        edited Aug 22 '08 at 14:22

























                        answered Aug 22 '08 at 14:17 by Hershi












                        • There are a few great websites out there already. I'd rather visit a bookmark than have another program on my computer.
                          – Matt M.
                          Nov 18 at 4:08





























                            (Project name:\s+[A-Z]:(?:\\\w+)+\.[a-zA-Z]+\s+J[0-9]{7})(?=:)



                            This will work for you.



                            Using (?:\\\w+)+\.[a-zA-Z]+ instead of .* is more restrictive.
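A Python sketch of this answer's pattern (Python's re is assumed as the flavor; LOG is a shortened, hypothetical copy of the log). Group 1 spans the whole "Project name: ... J0000011" text, not just the path; the second pattern is a hypothetical variant that puts the group around the path alone. Because \w covers only letters, digits and underscores, a path segment containing a space or hyphen would match neither pattern.

```python
import re

# Shortened copy of the log line (assumption: two spaces between entries).
LOG = (
    "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM  "
    r"J0000010: Project name: E:\foo.pf  "
    "J0000011: Job name: MBiek Direct Mail Test  "
    "J0000100: Document 1 - Completed successfully  "
    "J0000020:"
)

# The answer's pattern: drive letter, one or more "\segment" parts, a dot and
# an extension, then whitespace and the J-code (the ":" is only looked ahead at).
ANSWER = r"(Project name:\s+[A-Z]:(?:\\\w+)+\.[a-zA-Z]+\s+J[0-9]{7})(?=:)"
whole = re.search(ANSWER, LOG).group(1)
print(whole)  # Project name: E:\foo.pf  J0000011

# Hypothetical variant: capture only the path itself.
PATH_ONLY = r"Project name:\s+([A-Z]:(?:\\\w+)+\.[a-zA-Z]+)\s+J[0-9]{7}(?=:)"
path = re.search(PATH_ONLY, LOG).group(1)
print(path)  # E:\foo.pf
```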















                            edited Jul 16 at 10:44

























                            answered Jul 16 at 8:05 by Shailendra

















                                protected by zx8754 Sep 13 '17 at 10:43





