Python regex, matching pattern over multiple lines.. why isn't this working?












I know that for parsing I should ideally remove all spaces and line breaks, but I was just doing this as a quick fix for something I was trying, and I can't figure out why it's not working. I have wrapped different areas of text in my document with wrappers like "####1" and am trying to parse based on these, but it's just not working no matter what I try. I think I am using multiline correctly. Any advice is appreciated.



This returns no results at all:



string='
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2'

import re
pattern = '.*?####(.*?)####'
returnmatch = re.compile(pattern, re.MULTILINE).findall(string)
return returnmatch









  • It won't run, period, because you're not using multi-line string delimiters (''' or """).
    – Nick T
    Aug 20 '10 at 20:13

  • OK, I missed this concept completely then. I will dig through the re documentation to find where it mentions this. Thanks.
    – Rick
    Aug 20 '10 at 20:15

  • Your assignment to string is a syntax error. Did you mean to use '''?
    – msw
    Aug 20 '10 at 20:15

  • No, I'm new to Python, so I didn't know about the multiline string delimiter.
    – Rick
    Aug 20 '10 at 20:20
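
To make the point from the comments concrete, here is a minimal sketch, assuming the same sample text as in the question. The name find_sections is hypothetical and only wraps the question's own two lines so that the return statement is valid. It shows that the triple-quoted literal parses, and that the original pattern still finds nothing under re.MULTILINE:

```python
import re

# Triple quotes let a string literal span several lines; a single-quoted
# literal that contains raw line breaks is a SyntaxError.
text = '''
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2'''


def find_sections(source):
    # Hypothetical wrapper around the pattern and flag from the question.
    # Under re.MULTILINE, "." still cannot cross a newline, so both "####"
    # markers would have to sit on the same line for this to match.
    pattern = '.*?####(.*?)####'
    return re.compile(pattern, re.MULTILINE).findall(source)


result = find_sections(text)
print(result)
```

With this sample the list comes back empty, because no single line contains two "####" markers.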
















python regex parsing






asked Aug 20 '10 at 20:09 by Rick








2 Answers
Try re.findall(r"####(.*?)\s(.*?)\s####", string, re.DOTALL) (this works with re.compile too, of course).

This regexp returns tuples containing the section number and the section content.

For your example, this will return [('1', 'ttteest'), ('2', ' \n\nttest')] (the exact whitespace in the second tuple depends on the whitespace between the markers in your input).

(BTW: your example won't run as written. For multiline strings, use ''' or """.)
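
A runnable sketch of this suggestion, assuming the sample text from the question rewritten as a triple-quoted string with no trailing spaces after the markers. The variable names are only illustrative:

```python
import re

# Sample text modelled on the question, as a triple-quoted string.
text = """
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2"""

# re.DOTALL lets "." cross newlines, so each lazy group can span lines.
# Group 1 captures the section number, group 2 the section content.
sections = re.findall(r"####(.*?)\s(.*?)\s####", text, re.DOTALL)

for number, body in sections:
    print(repr(number), repr(body))
```

With this exact sample the result is [('1', 'ttteest'), ('2', '\nttest')]. The second body keeps one leading newline because the first \s consumes only the newline directly after "####2".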






  • thanks, this works

    – Rick
    Aug 20 '10 at 20:21



















MULTILINE doesn't mean that . will match a newline; it means that ^ and $ also match at the start and end of each line, not only at the start and end of the whole string.

re.M
re.MULTILINE

When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.

re.S or re.DOTALL makes . match newlines as well.



Source



http://docs.python.org/
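
The difference between the two flags can be sketched on a tiny two-line string. The sample text and variable names are made up for illustration:

```python
import re

text = "first line\nsecond line"

# Without flags, "^" anchors only at the very start of the string.
default_starts = re.findall(r"^\w+", text)

# re.MULTILINE lets "^" also match right after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# re.MULTILINE does not change ".": it still stops at the newline,
# so a pattern that has to cross lines finds nothing.
dot_multiline = re.findall(r"first.*second", text, re.MULTILINE)

# re.DOTALL makes "." match newlines too, so the match spans both lines.
dot_dotall = re.findall(r"first.*second", text, re.DOTALL)

print(default_starts, multiline_starts, dot_multiline, dot_dotall)
```

Here default_starts is ['first'], multiline_starts is ['first', 'second'], dot_multiline is empty, and dot_dotall is ['first line\nsecond']. The two flags can also be combined with | when both behaviours are needed.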






    Answer 1 (re.findall with re.DOTALL): answered Aug 20 '10 at 20:16 by leoluk













    Answer 2 (re.MULTILINE vs. re.DOTALL): answered Aug 20 '10 at 20:16 by Colin Hebert





























