Get all unnamed groups in a Python match object













I have a number of related regular expressions that use both named and unnamed groups. I want to pass the unnamed groups as positional arguments to a function chosen using the named group.



For example, with the pattern ([abc]+)([123]+)(?P<end>[%#]) matching the string "aaba2321%", I want to get a list containing ["aaba", "2321"], but not "%".



I tried the following:



match_obj.groups()


under the assumption that it wouldn't capture the named groups, since there is a separate method, groupdict, for getting only the named groups. Unfortunately, groups() includes the named groups as well.



Then, I decided to write my own generator for it:



def get_unnamed_groups(match_obj):
    index = 1
    while True:
        try:
            yield match_obj.group(index)
        except IndexError:
            break
        index += 1


Unfortunately the named group can also be accessed as a numbered group. How do I get the numbered groups alone?
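For reference, the behaviour described above can be reproduced with a short, self-contained snippet. This is only an illustrative sketch using the pattern and string from the question; the variable names are arbitrary.

```python
import re

# Pattern and input taken from the question above.
pattern = re.compile(r"([abc]+)([123]+)(?P<end>[%#])")
match_obj = pattern.match("aaba2321%")

# groups() returns every capturing group, named or not.
all_groups = match_obj.groups()
print(all_groups)             # ('aaba', '2321', '%')

# groupdict() returns only the named groups.
print(match_obj.groupdict())  # {'end': '%'}

# The named group is also reachable by number, so iterating
# group(1), group(2), ... yields it as well.
print(match_obj.group(3) == match_obj.group("end"))  # True
```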











  • As far as I know, there's no way to do this. Of course you could iterate the groups in the pattern or the start and end indexes of the groups in the match and skip over the ones that are also in named groups if you really need to, but… why do you want this?

    – abarnert
    May 17 '15 at 23:24











  • Also, what did you expect that generator to do? What it's actually going to do is the same thing as iter(match_obj.groups()), because groups is just defined as returning a tuple of the same subgroups with the same indices that group uses.

    – abarnert
    May 17 '15 at 23:26











  • @abarnert I want it because it would be inconvenient and redundant, and would create potential problems when changing things, if I required the data that includes the related regular expressions to also include the indices of the unnamed groups. I guess I could require all groups to be named, and have the names of the unnamed groups be numbers, but that's what unnamed groups are for.

    – potato
    May 17 '15 at 23:49













  • @abarnert I wasn't sure how match objects work internally. It makes sense for group to use groups, but it would also make sense to have an unnamed groups list.

    – potato
    May 17 '15 at 23:52











  • If you're not sure, just read the documentation. It seems very clear to me, but if it's not clear to you, someone should probably file a documentation bug (because the docs should be clear to everyone, not just one random person…).

    – abarnert
    May 17 '15 at 23:54
















python regex






asked May 17 '15 at 23:10









potato

2 Answers

There is a somewhat horrible way to do what you're asking for. It involves indexing all groups by their span (start and end indices) and discarding those whose span also belongs to a named group in groupdict:

import re

# Match object for the example from the question
mo = re.match(r'([abc]+)([123]+)(?P<end>[%#])', 'aaba2321%')

named = dict()
unnamed = dict()
all_groups = mo.groups()

# Index every named group by its span
for k, v in mo.groupdict().items():
    named[mo.span(k)] = v

# Index every other group by its span, skipping groups with same
# span as a named group
for i, v in enumerate(all_groups):
    sp = mo.span(i + 1)
    if sp not in named:
        unnamed[sp] = v

print(named)    # {(8, 9): '%'}
print(unnamed)  # {(0, 4): 'aaba', (4, 8): '2321'}


Indexing by span is necessary because unnamed and named groups can capture the same value. A group's span (where it starts and ends) still tells them apart, so this code works even when several groups capture identical text. The one exception is an unnamed group that occupies exactly the same span as a named group, which is skipped. Here is a demo: http://ideone.com/9O7Hpb
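The point about equal values can also be checked locally. The following is a minimal sketch, not the linked demo itself: the helper name unnamed_by_span and both test patterns are made up for illustration, and it relies only on Match.span, Match.groupdict and Pattern.groups from the standard re module.

```python
import re

def unnamed_by_span(mo):
    """Values of groups whose span differs from every named group's span."""
    named_spans = {mo.span(name) for name in mo.groupdict()}
    return [
        mo.group(i)
        for i in range(1, mo.re.groups + 1)
        if mo.span(i) not in named_spans
    ]

# An unnamed group and the named group both capture '%', at different positions.
mo = re.match(r"([%#])([abc]+)([123]+)(?P<end>[%#])", "%aaba2321%")
print(unnamed_by_span(mo))   # ['%', 'aaba', '2321']

# Limitation: an unnamed group wrapping a named group has the same span,
# so it is dropped as well.
mo2 = re.match(r"((?P<word>[abc]+))", "abc")
print(unnamed_by_span(mo2))  # []
```

Returning a list in group order, rather than a dict keyed by span, keeps the result directly usable as positional arguments.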



Another way to do it would be to write a function that transforms a regex of the form shown in your question into one where all formerly unnamed groups are named with some prefix and a number. You could match against this regex and pick out of groupdict the groups whose name starts with the prefix.
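A rough sketch of that transformation might look like the following. Everything here is an assumption made for illustration: the function names name_unnamed_groups and unnamed_values, the prefix "_u", and the requirement that no existing group name starts with that prefix. The scanner is deliberately simple; it skips escaped characters and character classes but does not handle a literal ] at the start of a class or comments in re.VERBOSE patterns.

```python
import re

def name_unnamed_groups(pattern, prefix="_u"):
    """Rewrite each unnamed capturing group as (?P<prefixN>...)."""
    out = []
    count = 0
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Copy an escape sequence (backslash plus next char) untouched.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(" and pattern[i + 1:i + 2] != "?":
            # A plain "(" opens an unnamed capturing group: give it a name.
            count += 1
            out.append("(?P<%s%d>" % (prefix, count))
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)

def unnamed_values(match, prefix="_u"):
    """Values of the generated groups, ordered by their generated number."""
    items = [
        (int(name[len(prefix):]), value)
        for name, value in match.groupdict().items()
        if name.startswith(prefix)
    ]
    return [value for _, value in sorted(items)]

new_pattern = name_unnamed_groups(r"([abc]+)([123]+)(?P<end>[%#])")
print(new_pattern)        # (?P<_u1>[abc]+)(?P<_u2>[123]+)(?P<end>[%#])
m = re.match(new_pattern, "aaba2321%")
print(unnamed_values(m))  # ['aaba', '2321']
```

Naming a group does not change its number, so numbered backreferences in the original pattern keep working after the rewrite.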






  • Your first solution, as it seems you are aware, would have problems if named groups and unnamed groups have the same value. In my case, this wouldn't be a big problem, but I don't really like it. However, I didn't really think about transforming the regexes, and I probably should look into doing so. I think I'll select your answer.

    – potato
    May 18 '15 at 0:07











  • @potato I'm pretty sure it wouldn't have problems if named groups and unnamed groups had the same value. That is the whole point of indexing by span.

    – Asad Saeeduddin
    May 18 '15 at 0:10











  • @potato For example, I've modified my demo to add a % at the beginning of the string, and a corresponding group to capture it. Now there is an unnamed group that has the same value as the named group, but they are appropriately separated. See ideone.com/9O7Hpb

    – Asad Saeeduddin
    May 18 '15 at 0:13











  • Oh, I misunderstood what your code was doing. Yeah, apart from some really weird usage like having a named group and an unnamed group in the same place, it works great, and is simple and readable unless you are me a few minutes ago. Sorry about that.

    – potato
    May 18 '15 at 0:29











  • @potato That's probably because there weren't any comments. I've added a couple to hopefully clarify what the code is doing.

    – Asad Saeeduddin
    May 18 '15 at 0:33




















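Here is a cleaner version, using the groupindex attribute of the compiled pattern (Pattern.groupindex, reachable from a match object as match.re.groupindex). The documentation describes it as: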




A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers.






TL;DR: Short copy & paste function:



import re

def grouplist(match):
    named = match.groupdict()
    ignored_groups = set()
    for name, index in match.re.groupindex.items():
        if name in named:  # defensive check; groupindex names should all be in groupdict()
            ignored_groups.add(index)
    return [group for i, group in enumerate(match.groups()) if i + 1 not in ignored_groups]


m = re.match(r'([abc]+)([123]+)(?P<end>[%#])', "aaba2321%")

unnamed = grouplist(m)
print(unnamed)  # ['aaba', '2321']


Full example



With groupindex we get the indexes of the named matches, and can exclude them when building our final list of groups, called unnamed in the code below:



import re

# ===================================================================================
# These are the current matching groups:
# ===================================================================================
regex = re.compile(r"(((?P<first_name>\w+)) (?P<middle_name>\w+)) (?P<last_name>\w+)")
#                    |------------------- #1 -------------------|
#                     |-------- #2 -------|
#                      |------- #3 ------|
#                                           |------- #4 -------|
#                                                                 |------ #5 ------|
# ===================================================================================
# But we want to have the following groups instead (regex line is identical):
# ===================================================================================
regex = re.compile(r"(((?P<first_name>\w+)) (?P<middle_name>\w+)) (?P<last_name>\w+)")
#                    |--------------- unnamed #1 ---------------|
#                     |---- unnamed #2 ---|
#                      |--- first_name --|
#                                           |--- middle_name --|
#                                                                 |--- last_name --|

m = regex.match("Pinkamena Diane Pie")

These are the values we are working with, for reference. Groups are numbered by their opening parenthesis, so first_name is group #3:

assert list(m.groups()) == [
    'Pinkamena Diane',  # group #1
    'Pinkamena',        # group #2
    'Pinkamena',        # group #3 (first_name)
    'Diane',            # group #4 (middle_name)
    'Pie',              # group #5 (last_name)
]

assert dict(m.groupdict()) == {
    'first_name': 'Pinkamena',  # group #3
    'middle_name': 'Diane',     # group #4
    'last_name': 'Pie',         # group #5
}

assert dict(m.re.groupindex) == {
    'first_name': 3,   # Pinkamena
    'middle_name': 4,  # Diane
    'last_name': 5,    # Pie
}

Therefore we can now store the indices of those named groups in an ignored_groups set, and omit those groups when filling unnamed from m.groups():

named = m.groupdict()
ignored_groups = set()
for name, index in m.re.groupindex.items():
    if name in named:  # defensive check; groupindex names should all be in groupdict()
        ignored_groups.add(index)

# groups() is 0-based while group numbers are 1-based, hence i + 1
unnamed = [group for i, group in enumerate(m.groups()) if i + 1 not in ignored_groups]

print(unnamed)  # ['Pinkamena Diane', 'Pinkamena']
print(named)    # {'first_name': 'Pinkamena', 'middle_name': 'Diane', 'last_name': 'Pie'}

So in the end we get:

# unnamed = grouplist(m)
assert unnamed == [
    'Pinkamena Diane',  # group #1
    'Pinkamena',        # group #2
]

# named = m.groupdict()
assert named == {
    'first_name': 'Pinkamena',  # group #3
    'middle_name': 'Diane',     # group #4
    'last_name': 'Pie',         # group #5
}


Try the example yourself: https://ideone.com/pDMjpP
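The same idea can be written a little more compactly, and tied back to the use case in the question (choosing a function by the named group and passing the unnamed groups positionally). This is only a sketch: unnamed_groups, handle_percent, handle_hash and HANDLERS are hypothetical names invented for the example, and it relies on Pattern.groupindex and Pattern.groups from the standard re module.

```python
import re

def unnamed_groups(match):
    """Return the values of all capturing groups that have no name."""
    named_indexes = set(match.re.groupindex.values())
    return [
        match.group(i)
        for i in range(1, match.re.groups + 1)
        if i not in named_indexes
    ]

# Hypothetical handlers, keyed by the value of the named group "end".
def handle_percent(letters, digits):
    return "percent: %s/%s" % (letters, digits)

def handle_hash(letters, digits):
    return "hash: %s/%s" % (letters, digits)

HANDLERS = {"%": handle_percent, "#": handle_hash}

m = re.match(r"([abc]+)([123]+)(?P<end>[%#])", "aaba2321%")
result = HANDLERS[m.group("end")](*unnamed_groups(m))
print(result)  # percent: aaba/2321
```

Because the filter works on group numbers rather than on captured values or spans, an unnamed group that wraps a named group is still returned.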






First answer (the span-based approach): answered May 17 '15 at 23:54 by Asad Saeeduddin, edited May 18 '15 at 0:33













    • Your first solution, as it seems you are aware, would have problems if named groups and unnamed groups have the same value. In my case, this wouldn't be a big problem, but I don't really like it. However, I didn't really think about transforming the regexes, and I probably should look into doing so. I think I'll select your answer.

      – potato
      May 18 '15 at 0:07











    • @potato I'm pretty sure it wouldn't have problems if named groups and unnamed groups had the same value. That is the whole point of indexing by span.

      – Asad Saeeduddin
      May 18 '15 at 0:10











    • @potato For example, I've modified my demo to add a % at the beginning of the string, and a corresponding group to capture it. Now there is an unnamed group that has the same value as the named group, but they are appropriately separated. See ideone.com/9O7Hpb

      – Asad Saeeduddin
      May 18 '15 at 0:13











    • Oh, I misunderstood what your code was doing. Yeah, apart from some really weird usage like having a group and an unnamed group in the same place it works great, and is simple and readable unless you are me a few minutes ago. Sorry about that.

      – potato
      May 18 '15 at 0:29











    • @potato That's probably because there weren't any comments. I've added a couple to hopefully clarify what the code is doing.

      – Asad Saeeduddin
      May 18 '15 at 0:33
































    0














    Here is a clean version, using the compiled pattern's groupindex attribute (Pattern.groupindex, reachable from a match object as match.re.groupindex):




    A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers.
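
For the pattern from the question, the mapping can be inspected along these lines. This is a small sketch; the variable name pattern is only illustrative.

```python
import re

pattern = re.compile(r"([abc]+)([123]+)(?P<end>[%#])")

# groupindex maps each group name to its 1-based group number.
print(dict(pattern.groupindex))  # {'end': 3}

# groups is the total number of capturing groups, named or not.
print(pattern.groups)  # 3
```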






    TL;DR: Short copy & paste function:



    import re

    def grouplist(match):
        named = match.groupdict()
        ignored_groups = set()
        for name, index in match.re.groupindex.items():
            if name in named:  # sanity check: groupdict() holds every named group
                ignored_groups.add(index)
        # groups() is 0-based here, while group numbers are 1-based, hence i+1
        return [group for i, group in enumerate(match.groups()) if i+1 not in ignored_groups]


    m = re.match('([abc]+)([123]+)(?P<end>[%#])', "aaba2321%")

    unnamed = grouplist(m)
    print(unnamed)  # ['aaba', '2321']
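
Because groupdict() lists every named group of the pattern (unmatched ones map to None), a shorter variant is possible. The following is a sketch of that idea rather than part of the original answer; the function name unnamed_groups is made up for illustration.

```python
import re

def unnamed_groups(match):
    """Return the groups whose number is not bound to a name (hypothetical helper)."""
    named_indices = set(match.re.groupindex.values())
    return [
        match.group(i)
        for i in range(1, match.re.groups + 1)
        if i not in named_indices
    ]

m = re.match(r"([abc]+)([123]+)(?P<end>[%#])", "aaba2321%")
print(unnamed_groups(m))  # ['aaba', '2321']
```

Iterating over group numbers directly avoids the i+1 offset that enumerate(match.groups()) needs.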


    Full example



    With groupindex we get the indexes of the named matches, and can exclude them when building our final list of groups, called unnamed in the code below:



    import re

    # ===================================================================================
    # These are the current matching groups:
    # ===================================================================================
    regex = re.compile(r"(((?P<first_name>\w+)) (?P<middle_name>\w+)) (?P<last_name>\w+)")
    #                    |------------------- #1 -------------------|
    #                     |-------- #2 -------|
    #                      |------- #3 ------|
    #                                           |------- #4 -------|
    #                                                                 |------ #5 ------|
    # ===================================================================================
    # But we want to have the following groups instead (regex line is identical):
    # Labels read: new name or position (original group number).
    # ===================================================================================
    regex = re.compile(r"(((?P<first_name>\w+)) (?P<middle_name>\w+)) (?P<last_name>\w+)")
    #                    |----------------- #1 (#1) ----------------|
    #                     |----- #2 (#2) -----|
    #                      | first_name (#3) |
    #                                           | middle_name (#4) |
    #                                                                 | last_name (#5) |

    m = regex.match("Pinkamena Diane Pie")


    These are the values we will be working with, for reference:



    assert list(m.groups()) == [
        'Pinkamena Diane',  # group #1
        'Pinkamena',        # group #2
        'Pinkamena',        # group #3 (first_name)
        'Diane',            # group #4 (middle_name)
        'Pie',              # group #5 (last_name)
    ]

    assert dict(m.groupdict()) == {
        'first_name': 'Pinkamena',  # group #3
        'middle_name': 'Diane',     # group #4
        'last_name': 'Pie',         # group #5
    }

    assert dict(m.re.groupindex) == {
        'first_name': 3,   # Pinkamena
        'middle_name': 4,  # Diane
        'last_name': 5,    # Pie
    }


    Therefore we can store the indices of the named groups in an ignored_groups set and omit those groups when filling unnamed from m.groups():



    named = m.groupdict()
    ignored_groups = set()
    for name, index in m.re.groupindex.items():
        if name in named:  # sanity check: groupdict() holds every named group
            ignored_groups.add(index)
        # end if
    # end for
    unnamed = [group for i, group in enumerate(m.groups()) if i+1 not in ignored_groups]

    print(unnamed)
    print(named)


    So in the end we get:



    # unnamed = grouplist(m)
    assert unnamed == [
        'Pinkamena Diane',  # group #1 (#1)
        'Pinkamena',        # group #2 (#2)
    ]

    # named = m.groupdict()
    assert named == {
        'first_name': 'Pinkamena',  # group #3
        'middle_name': 'Diane',     # group #4
        'last_name': 'Pie',         # group #5
    }


    Try the example yourself: https://ideone.com/pDMjpP
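
To connect this back to the question's use case (passing the unnamed groups as positional arguments to a function chosen by the named group), a dispatch step might look like the sketch below. Everything except the pattern and the input string is an assumption made for illustration: the handler functions, the HANDLERS table and the helper names are hypothetical.

```python
import re

def unnamed_groups(match):
    """Return the groups whose number is not bound to a name (hypothetical helper)."""
    named_indices = set(match.re.groupindex.values())
    return [
        match.group(i)
        for i in range(1, match.re.groups + 1)
        if i not in named_indices
    ]

# Hypothetical handlers, keyed by the text captured by the named group "end".
def handle_percent(letters, digits):
    return ("percent", letters, digits)

def handle_hash(letters, digits):
    return ("hash", letters, digits)

HANDLERS = {"%": handle_percent, "#": handle_hash}

def dispatch(match):
    """Pick a handler from the named group and call it with the unnamed groups."""
    handler = HANDLERS[match.group("end")]
    return handler(*unnamed_groups(match))

m = re.match(r"([abc]+)([123]+)(?P<end>[%#])", "aaba2321%")
print(dispatch(m))  # ('percent', 'aaba', '2321')
```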












































        edited Dec 19 '18 at 13:45

























        answered Nov 20 '18 at 3:27









        luckydonald





























