PHP regex: matching email address(es) in text, sometimes right before a full stop












I have texts that can contain one or more email addresses, and I use a regex to match them. First I used this expression (from a previous question):



[A-Za-z0-9_-]+@[A-Za-z0-9_-]+\.([A-Za-z0-9_-][A-Za-z0-9_]+)


This caused two problems: it failed when a . appeared before the @, and it failed when an email address ended in two or more domain extensions (for example ...@domain.co.uk). So I changed the expression to



^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})


This solves both of those problems, but creates a new one: if an email address in the text sits right before a full stop, the full stop is now included in the address. So this text gives me problems:



Please email us at: some@example.com. You can also mail us at some@example.co.uk. Etc...


Is there a way to exclude this last . if it is followed by either a blank space or a line break?



P.S. I do not need to validate email addresses; I only need my expression to find where one or more email addresses occur in a text and where they end.










  • 1





    You could, for example, change your regex to ([a-z0-9_.-]+)@((?:[\da-z.-]+)\.)+([a-z]{2,6}). That repeats the part after the @ sign, including its trailing dot, one or more times, and leaves the dot out of the last part.

    – The fourth bird
    Nov 22 '18 at 21:06








  • 1





    @Jeff - That regex doesn't allow emails with foreign characters, dashes or numbers, like: hello@åä-ö.com, while åä-ö.com actually is a valid domain. You should also be able to have dots in the name part. When matching email addresses (and URLs), you shouldn't be too strict.

    – Magnus Eriksson
    Nov 22 '18 at 21:15








  • 1





    I had them in there: ..@[A-Za-z0-9_-].. - but of course I forgot about subdomains like spam@sub-domain.my-host.com

    – Jeff
    Nov 22 '18 at 21:23






  • 1





    @DirkJ.Faber absolutely right. Maybe take Magnus' comments about mine (for `ä') into account as well.

    – Jeff
    Nov 22 '18 at 21:25






  • 1





    Basically, look for a good ready-made regex for this online. Trying to do it yourself is usually really painful. If you check regexes that take most of the rules into account, they are huge...

    – Magnus Eriksson
    Nov 22 '18 at 21:26
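
As a rough sketch, the pattern suggested in the first comment can be tried with preg_match_all. The backslashes in \d and \. are written out here on the assumption that they were lost when the comment was rendered, and the helper name findEmailsRepeatedLabels is made up for illustration only.

```php
<?php
// Hypothetical helper (not from the thread): returns every full match of the
// pattern from the first comment in $text.
function findEmailsRepeatedLabels(string $text): array
{
    // One or more "label." groups after the @, then a final part that cannot
    // contain a dot, so a trailing full stop is never part of the match.
    $re = '/([a-z0-9_.-]+)@((?:[\da-z.-]+)\.)+([a-z]{2,6})/';
    preg_match_all($re, $text, $matches);
    return $matches[0]; // index 0 holds the full matches
}

$text = 'Please email us at: some@example.com. You can also mail us at some@example.co.uk. Etc...';
print_r(findEmailsRepeatedLabels($text));
// Prints some@example.com and some@example.co.uk, both without the full stop.
```

Note that this pattern only knows ASCII letters, so it would not find an address such as hello@åä-ö.com.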


















php regex














edited Nov 22 '18 at 22:17







Dirk J. Faber

















asked Nov 22 '18 at 21:00









Dirk J. Faber






















1 Answer














You may use



/[\p{L}0-9_.-]+@[0-9\p{L}.-]+\.[a-z.]{2,6}\b/u


Or, to start matching only from a letter or digit:



/[\p{L}0-9][\p{L}0-9_.-]*@[0-9\p{L}.-]+\.[a-z.]{2,6}\b/u


\p{L} matches any Unicode letter (add \p{M} if you also need to match combining diacritical marks, though I doubt there are any here). Add a word boundary (\b) at the end so the match stops before a trailing dot, and remove any groupings you are not using.



See the PHP demo:



$re = '/[\p{L}0-9_.-]+@[0-9\p{L}.-]+\.[a-z.]{2,6}\b/u';
$str = 'Please email us at: some@example.com. You can also mail us at some@example.co.uk. Etc... hello@åä-ö.com
example@so.il.uk';
// $matches[0] holds the full matches
if (preg_match_all($re, $str, $matches)) {
    print_r($matches[0]);
}


Output:



Array
(
    [0] => some@example.com
    [1] => some@example.co.uk
    [2] => hello@åä-ö.com
    [3] => example@so.il.uk
)
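
To make the effect of the word boundary and of the letter-or-digit start easier to see, here is a small comparison sketch. The sample text and the helper allMatches are made up for illustration; the patterns are the two from this answer with their backslashes written out, plus a copy of the first one with \b removed.

```php
<?php
// Hypothetical helper for illustration: collect all full matches of $re in $text.
function allMatches(string $re, string $text): array
{
    preg_match_all($re, $text, $m);
    return $m[0];
}

// Made-up sample: a stray "-" before the address and a full stop after it.
$text = 'Write to -some@example.com. Thanks.';

// First pattern without the trailing \b: the final class [a-z.] is free to
// swallow the full stop.
$noBoundary = '/[\p{L}0-9_.-]+@[0-9\p{L}.-]+\.[a-z.]{2,6}/u';
// First pattern from the answer: \b forces the match to end on a word character.
$withBoundary = '/[\p{L}0-9_.-]+@[0-9\p{L}.-]+\.[a-z.]{2,6}\b/u';
// Second pattern from the answer: the local part must start with a letter or digit.
$alnumStart = '/[\p{L}0-9][\p{L}0-9_.-]*@[0-9\p{L}.-]+\.[a-z.]{2,6}\b/u';

print_r(allMatches($noBoundary, $text));   // -some@example.com.
print_r(allMatches($withBoundary, $text)); // -some@example.com
print_r(allMatches($alnumStart, $text));   // some@example.com
```

The word boundary works here because a full stop followed by a space has no word character on either side, so the engine has to give the dot back and end the match on the last letter.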





        answered Nov 22 '18 at 22:16









        Wiktor Stribiżew































