Get all unnamed groups in a Python match object

I have a number of related regular expressions that use both named and unnamed groups. I want to plug the unnamed groups as positional arguments to a function chosen using the named group.

For example, with the pattern ([abc]+)([123]+)(?P<end>[%#]) matching the string "aaba2321%", I want to get a list containing ["aaba", "2321"], but not "%".

I tried the following:

match_obj.groups()

under the assumption that it wouldn't capture the named groups as there is a separate method, groupdict, for getting only the named groups. Unfortunately, groups included named groups.

Then, I decided to write my own generator for it:

def get_unnamed_groups(match_obj):
    index = 1
    while True:
        try:
            yield match_obj.group(index)
        except IndexError:
            break
        index += 1

Unfortunately the named group can also be accessed as a numbered group. How do I get the numbered groups alone?

asked May 17 '15 at 23:10

potato

1428

As far as I know, there's no way to do this. Of course you could iterate the groups in the pattern or the start and end indexes of the groups in the match and skip over the ones that are also in named groups if you really need to, but… why do you want this?

– abarnert
May 17 '15 at 23:24

Also, what did you expect that generator to do? What it's actually going to do is the same thing as iter(match_obj.groups()), because groups is just defined as returning a tuple of the same subgroups with the same indices that group uses.

– abarnert
May 17 '15 at 23:26

@abarnert I want it because it would be inconvenient and redundant, and would create potential problems when changing things, if I required the data that includes the related regular expressions to also include the indices of the unnamed groups. I guess I could require all groups to be named, and have the names of the unnamed groups be numbers, but that's what unnamed groups are for.

– potato
May 17 '15 at 23:49

@abarnert I wasn't sure how match objects work internally. It makes sense for group to use groups, but it would also make sense to have an unnamed groups list.

– potato
May 17 '15 at 23:52

If you're not sure, just read the documentation. It seems very clear to me, but if it's not clear to you, someone should probably file a documentation bug (because the docs should be clear to everyone, not just one random person…).

– abarnert
May 17 '15 at 23:54


python regex


2 Answers

There is a somewhat horrible way to do what you're asking for. It involves indexing all matches by their span (start and end indices) and removing the ones that occur in both groupdict and groups:

import re

# The pattern and string from the question
mo = re.match(r"([abc]+)([123]+)(?P<end>[%#])", "aaba2321%")

named = dict()
unnamed = dict()
all_groups = mo.groups()

# Index every named group by its span
for k, v in mo.groupdict().items():
    named[mo.span(k)] = v

# Index every other group by its span, skipping groups with the same
# span as a named group
for i, v in enumerate(all_groups):
    sp = mo.span(i + 1)
    if sp not in named:
        unnamed[sp] = v

print(named)    # {(8, 9): '%'}
print(unnamed)  # {(0, 4): 'aaba', (4, 8): '2321'}

Indexing by span is necessary because unnamed and named groups can have the same value. A group's span (where it starts and ends) tells the two apart, so this code works even when groups share a value; it only misclassifies an unnamed group that covers exactly the same span as a named group. Here is a demo: http://ideone.com/9O7Hpb
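To see both cases concretely, here is a minimal sketch. The helper name unnamed_by_span and the two patterns are made up for illustration; the filtering logic is the span comparison from the code above.

```python
import re

def unnamed_by_span(mo):
    """Span-based filter, wrapped in a hypothetical helper for easy reuse."""
    # Spans of all named groups; mo.span() accepts a group name.
    named_spans = {mo.span(name) for name in mo.groupdict()}
    return [
        value
        for index, value in enumerate(mo.groups(), start=1)
        if mo.span(index) not in named_spans
    ]

# Same value, different position: the unnamed '%' is kept.
mo = re.match(r"([%#])([abc]+)(?P<end>[%#])", "%aba%")
print(unnamed_by_span(mo))  # ['%', 'aba']

# Same position: the unnamed wrapper shares the named group's span and is dropped.
mo = re.match(r"((?P<word>[abc]+))([123]+)", "aba12")
print(unnamed_by_span(mo))  # ['12']
```

The second call loses the outer unnamed group, which is the "same place" edge case mentioned in the comments.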

Another way to do it would be to write a function that transforms a regex of the form shown in your question into one where all formerly unnamed groups are named with some prefix and a number. You could then match against this regex and pick out of groupdict the groups whose names start with the prefix.
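A rough sketch of that transformation might look like the following. The function names and the "_pos" prefix are assumptions chosen for illustration, and the scanner is deliberately simple: it skips escaped characters and character classes, but it does not handle (?#...) comments, conditional groups such as (?(1)...), or comments in re.VERBOSE patterns.

```python
import re

def name_unnamed_groups(pattern, prefix="_pos"):
    """Rewrite each unnamed capturing group as (?P<prefixN>...), N counting from 1."""
    out = []
    count = 0
    i = 0
    in_class = False
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Copy an escape sequence untouched, e.g. \( or \w
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            # A ']' directly after '[' or '[^' is a literal, not the end of the class
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                out.append(pattern[i:j + 1])
                i = j + 1
                continue
        elif ch == "(" and pattern[i + 1:i + 2] != "?":
            # A plain '(' opens an unnamed capturing group: give it a name
            count += 1
            out.append(f"(?P<{prefix}{count}>")
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)

def positional_groups(match, prefix="_pos"):
    """Collect the values of the generated groups, ordered by their number."""
    numbered = [
        (int(name[len(prefix):]), value)
        for name, value in match.groupdict().items()
        if name.startswith(prefix) and name[len(prefix):].isdigit()
    ]
    return [value for _, value in sorted(numbered, key=lambda item: item[0])]

renamed = name_unnamed_groups(r"([abc]+)([123]+)(?P<end>[%#])")
print(renamed)  # (?P<_pos1>[abc]+)(?P<_pos2>[123]+)(?P<end>[%#])

m = re.match(renamed, "aaba2321%")
print(positional_groups(m))  # ['aaba', '2321']
print(m.group("end"))        # %
```

The prefix has to be one that none of your own group names start with, otherwise those groups would be picked up as positional ones.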

edited May 18 '15 at 0:33

answered May 17 '15 at 23:54

Asad Saeeduddin

37.2k460108

Your first solution, as it seems you are aware, would have problems if named groups and unnamed groups have the same value. In my case, this wouldn't be a big problem, but I don't really like it. However, I didn't really think about transforming the regexes, and I probably should look into doing so. I think I'll select your answer.

– potato
May 18 '15 at 0:07

@potato I'm pretty sure it wouldn't have problems if named groups and unnamed groups had the same value. That is the whole point of indexing by span.

– Asad Saeeduddin
May 18 '15 at 0:10

@potato For example, I've modified my demo to add a % at the beginning of the string, and a corresponding group to capture it. Now there is an unnamed group that has the same value as the named group, but they are appropriately separated. See ideone.com/9O7Hpb

– Asad Saeeduddin
May 18 '15 at 0:13

Oh, I misunderstood what your code was doing. Yeah, apart from some really weird usage like having a named group and an unnamed group in the same place, it works great, and is simple and readable unless you are me a few minutes ago. Sorry about that.

– potato
May 18 '15 at 0:29

@potato That's probably because there weren't any comments. I've added a couple to hopefully clarify what the code is doing.

– Asad Saeeduddin
May 18 '15 at 0:33


Here is a clean version, using the pattern object's groupindex attribute (available from a match as match.re.groupindex):

A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers.

TL;DR: Short copy & paste function:

import re

def grouplist(match):
    named = match.groupdict()
    ignored_groups = set()
    for name, index in match.re.groupindex.items():
        if name in named:  # double-check that the name really is a named group
            ignored_groups.add(index)
    return [group for i, group in enumerate(match.groups()) if i + 1 not in ignored_groups]


m = re.match(r'([abc]+)([123]+)(?P<end>[%#])', "aaba2321%")

unnamed = grouplist(m)
print(unnamed)  # ['aaba', '2321']

Full example

With groupindex we get the indexes of the named matches, and can exclude them when building our final list of groups, called unnamed in the code below:

import re

# Capturing groups are numbered by their opening parenthesis, left to right:
#   #1  ((?P<first_name>(\w+)) (?P<middle_name>\w+))  unnamed, wanted as unnamed #1
#   #2  (?P<first_name>(\w+))                         named first_name
#   #3  (\w+)                                         unnamed, wanted as unnamed #2
#   #4  (?P<middle_name>\w+)                          named middle_name
#   #5  (?P<last_name>\w+)                            named last_name
regex = re.compile(r"((?P<first_name>(\w+)) (?P<middle_name>\w+)) (?P<last_name>\w+)")

m = regex.match("Pinkamena Diane Pie")

These are the values the match gives us, for reference:

assert list(m.groups()) == [
    'Pinkamena Diane',  # group #1
    'Pinkamena',        # group #2 (first_name)
    'Pinkamena',        # group #3
    'Diane',            # group #4 (middle_name)
    'Pie',              # group #5 (last_name)
]

assert dict(m.groupdict()) == {
    'first_name':  'Pinkamena',  # group #2
    'middle_name': 'Diane',      # group #4
    'last_name':   'Pie',        # group #5
}

assert dict(m.re.groupindex) == {
    'first_name':  2,  # Pinkamena
    'middle_name': 4,  # Diane
    'last_name':   5,  # Pie
}

Therefore we can now store the indices of those named groups in an ignored_groups set, and omit those groups when filling unnamed from m.groups():

named = m.groupdict()
ignored_groups = set()
for name, index in m.re.groupindex.items():
    if name in named:  # double-check that the name really is a named group
        ignored_groups.add(index)

unnamed = [group for i, group in enumerate(m.groups()) if i + 1 not in ignored_groups]

print(unnamed)
print(named)

So in the end we get:

# unnamed = grouplist(m)
assert unnamed == [
    'Pinkamena Diane',  # unnamed #1 (group #1)
    'Pinkamena',        # unnamed #2 (group #3)
]

# named = m.groupdict()
assert named == {
    'first_name':  'Pinkamena',  # group #2
    'middle_name': 'Diane',      # group #4
    'last_name':   'Pie',        # group #5
}

Try the example yourself: https://ideone.com/pDMjpP
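For the use case in the question (unnamed groups as positional arguments to a function chosen by the named group), a shorter variant could look like this. It is only a sketch: the handler functions and the HANDLERS table are hypothetical, and it takes the named indexes straight from groupindex.values() and the group count from the pattern's groups attribute.

```python
import re

def unnamed_groups(match):
    """Return the values of the capturing groups that have no name, in order."""
    named_indexes = set(match.re.groupindex.values())
    return [
        match.group(i)
        for i in range(1, match.re.groups + 1)
        if i not in named_indexes
    ]

# Hypothetical handlers, keyed by the value captured by the named group "end".
def percent(letters, digits):
    return "%s is %s percent" % (letters, digits)

def hashed(letters, digits):
    return "%s #%s" % (letters, digits)

HANDLERS = {"%": percent, "#": hashed}

m = re.match(r"([abc]+)([123]+)(?P<end>[%#])", "aaba2321%")
result = HANDLERS[m.group("end")](*unnamed_groups(m))
print(result)  # aaba is 2321 percent
```

Unlike the span-based approach, this filters on group numbers, so it still works when a named and an unnamed group cover the same text.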

edited Dec 19 '18 at 13:45

answered Nov 20 '18 at 3:27

luckydonald

1,51411329



Here is a clean version, using re.regex.groupindex:

A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers.

TL;DR: Short copy & paste function:

def grouplist(match):

    named = match.groupdict()

    ignored_groups = set()

    for name, index in match.re.groupindex.items():

        if name in named:  # check twice, if it is really the named attribute.

            ignored_groups.add(index)

    return [group for i, group in enumerate(match.groups()) if i+1 not in ignored_groups]





m = re.match('([abc]+)([123]+)(?P<end>[%#])', "aaba2321%")



unnamed = grouplist(m)

print(unnamed)

Full example

With groupindex we get the indexes of the named matches, and can exclude them when building our final list of groups, called unnamed in the code below:

import re



# ===================================================================================

# This are the current matching groups:

# ===================================================================================

regex = re.compile("(((?P<first_name>w+)) (?P<middle_name>w+)) (?P<last_name>w+)") 

#                   |-------------------- #1 ------------------|

#                    |------- #2 -------|

#                     |------ #3 ------|

#                                          |------- #4 -------|

#                                                                |------ #5 ------|

# ===================================================================================

# But we want to have the following groups instead (regex line is identical):

# ===================================================================================

regex = re.compile("(((?P<first_name>w+)) (?P<middle_name>w+)) (?P<last_name>w+)")

#                   |---------------- #1 (#1) -----------------|

#                    |- first_name (#2) -|

#                     |---- #2 (#3) ----|

#                                          |- middle_name (#4)-|

#                                                                | last_name (#5) |



m = regex.match("Pinkamena Diane Pie")

This are the values we want to use, for your convenience:

assert list(m.groups()) == [

    'Pinkamena Diane',  # group #1

    'Pinkamena',        # group #2 (first_name)

    'Pinkamena',        # group #3

    'Diane',            # group #4 (middle_name)

    'Pie',              # group #5 (last_name)

]



assert dict(m.groupdict()) == {

    'first_name':  'Pinkamena',  # group #2

    'middle_name': 'Diane',      # group #4

    'last_name':   'Pie',        # group #5

}



assert dict(m.re.groupindex) == {

    'first_name':  2,  # Pinkamena

    'middle_name': 4,  # Diane

    'last_name':   5,  # Pie

}

Therefore we can now store the indices of those named groups in a ignored_groups set, to omit those groups when filling unnamed with m.groups():

named = m.groupdict()

ignored_groups = set()

for name, index in m.re.groupindex.items():

    if name in named:  # check twice, if it is really the named attribute.

        ignored_groups.add(index)

    # end if

unnamed = [group for i, group in enumerate(m.groups()) if i+1 not in ignored_groups]

# end for



print(unnamed)

print(named)

So in the end we get:

# unnamed = grouplist(m)

assert unnamed == [

    'Pinkamena Diane',  # group #1 (#1)

    'Pinkamena',        # group #2 (#3)

]



# named = m.groupdict()

assert named == {

    'first_name':  'Pinkamena',  # group #2

    'middle_name': 'Diane',      # group #4

    'last_name':   'Pie',        # group #5

}

Try the example yourself: https://ideone.com/pDMjpP

edited Dec 19 '18 at 13:45

answered Nov 20 '18 at 3:27

luckydonald

