Get all unnamed groups in a Python match object
I have a number of related regular expressions that use both named and unnamed groups. I want to pass the unnamed groups as positional arguments to a function chosen using the named group.

For example, with the pattern `([abc]+)([123]+)(?P<end>[%#])` matching the string `"aaba2321%"`, I want to get a list containing `["aaba", "2321"]`, but not `"%"`.

I tried the following:

    match_obj.groups()

under the assumption that it wouldn't capture the named groups, since there is a separate method, `groupdict`, for getting only the named groups. Unfortunately, `groups` included the named groups.

Then I decided to write my own generator for it:

    def get_unnamed_groups(match_obj):
        index = 1
        while True:
            try:
                yield match_obj.group(index)
            except IndexError:
                break
            index += 1

Unfortunately, the named group can also be accessed as a numbered group. How do I get the numbered groups alone?
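The behaviour described in the question can be reproduced with a short sketch. It uses only the pattern and string from the question; the variable name `all_groups` is just for illustration.

```python
import re

match_obj = re.match(r"([abc]+)([123]+)(?P<end>[%#])", "aaba2321%")

# groups() returns every capturing group in order, named or not.
all_groups = match_obj.groups()
print(all_groups)             # ('aaba', '2321', '%')

# groupdict() returns only the named groups.
print(match_obj.groupdict())  # {'end': '%'}

# A named group is also reachable through its group number.
print(match_obj.group(3) == match_obj.group("end"))  # True
```

So a named group is an ordinary numbered group that additionally has a name, which is why neither `groups()` nor `group(index)` skips it.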
python regex
asked May 17 '15 at 23:10 by potato
As far as I know, there's no way to do this. Of course you could iterate the groups in the pattern or the start and end indexes of the groups in the match and skip over the ones that are also in named groups if you really need to, but… why do you want this?
– abarnert, May 17 '15 at 23:24

Also, what did you expect that generator to do? It's actually going to do the same thing as `iter(match_obj.groups())`, because `groups` is just defined as returning a tuple of the same subgroups with the same indices that `group` uses.
– abarnert, May 17 '15 at 23:26

@abarnert I want it because it would be inconvenient and redundant, and would create potential problems when changing things, if I required the data that includes the related regular expressions to also include the indices of the unnamed groups. I guess I could require all groups to be named, and have the names of the unnamed groups be numbers, but that's what unnamed groups are for.
– potato, May 17 '15 at 23:49

@abarnert I wasn't sure how match objects work internally. It makes sense for `group` to use `groups`, but it would also make sense to have an unnamed groups list.
– potato, May 17 '15 at 23:52

If you're not sure, just read the documentation. It seems very clear to me, but if it's not clear to you, someone should probably file a documentation bug (because the docs should be clear to everyone, not just one random person…).
– abarnert, May 17 '15 at 23:54
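
The point made in the comments about the generator can be checked directly. This is a minimal sketch that reuses the generator and the example from the question; the names `from_generator` and `from_groups` are only illustrative.

```python
import re

def get_unnamed_groups(match_obj):
    # The generator from the question: walks group numbers until one is missing.
    index = 1
    while True:
        try:
            yield match_obj.group(index)
        except IndexError:
            break
        index += 1

match_obj = re.match(r"([abc]+)([123]+)(?P<end>[%#])", "aaba2321%")
from_generator = list(get_unnamed_groups(match_obj))
from_groups = list(match_obj.groups())
print(from_generator == from_groups)  # True
```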
2 Answers
There is a somewhat horrible way to do what you're asking for. It involves indexing all groups by their span (start and end indices) and dropping the ones that occur in both `groupdict` and `groups`:

    import re

    mo = re.match(r"([abc]+)([123]+)(?P<end>[%#])", "aaba2321%")

    named = dict()
    unnamed = dict()
    all_groups = mo.groups()

    # Index every named group by its span
    for k, v in mo.groupdict().items():
        named[mo.span(k)] = v

    # Index every other group by its span, skipping groups with the same
    # span as a named group
    for i, v in enumerate(all_groups):
        sp = mo.span(i + 1)
        if sp not in named:
            unnamed[sp] = v

    print(named)    # {(8, 9): '%'}
    print(unnamed)  # {(0, 4): 'aaba', (4, 8): '2321'}

Indexing by span is necessary because unnamed and named groups can have the same value. What distinguishes a group is where it starts and ends, so this code works even when several groups captured the same text; it only misfiles an unnamed group that covers exactly the same span as a named one. Here is a demo: http://ideone.com/9O7Hpb
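
To illustrate the point about equal values, here is a small sketch of the same span-based idea wrapped in a function. The helper name `split_by_span`, the extra leading `(%)` group, and the second pattern `((?P<word>\w+))` are illustrative assumptions, not part of the original demo.

```python
import re

def split_by_span(mo):
    """Separate named and unnamed groups of a match, keyed by span."""
    named = {mo.span(name): value for name, value in mo.groupdict().items()}
    unnamed = {}
    for number, value in enumerate(mo.groups(), start=1):
        span = mo.span(number)
        if span not in named:
            unnamed[span] = value
    return named, unnamed

# An unnamed group and the named group both capture "%", at different positions.
mo = re.match(r"(%)([abc]+)([123]+)(?P<end>[%#])", "%aaba2321%")
named, unnamed = split_by_span(mo)
print(named)                   # {(9, 10): '%'}
print(list(unnamed.values()))  # ['%', 'aaba', '2321']

# Limitation: an unnamed group with exactly the same span as a named one is dropped.
named2, unnamed2 = split_by_span(re.match(r"((?P<word>\w+))", "abc"))
print(unnamed2)                # {}
```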
Another way to do it would be to write a function that transforms a regex of the form shown in your question into one where all formerly unnamed groups are named with some prefix and a number. You could then match against this regex and pick out of `groupdict` the groups whose names start with the prefix.
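
A rough sketch of that transformation might look like the following. The function names `name_unnamed_groups` and `positional_groups` and the `_pos` prefix are hypothetical. The scanner is deliberately naive: it skips escaped characters and character classes, but it does not handle edge cases such as a literal `]` placed first in a character class or comments in verbose mode, and it assumes no existing group name starts with the prefix.

```python
import re

def name_unnamed_groups(pattern, prefix="_pos"):
    """Rewrite each plain "(" group as a named group: prefix1, prefix2, ..."""
    out = []
    count = 0
    in_class = False  # True while scanning inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Copy an escape sequence such as \( or \w untouched.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(" and pattern[i + 1:i + 2] != "?":
            # A "(" not followed by "?" opens an unnamed capturing group.
            count += 1
            out.append("(?P<%s%d>" % (prefix, count))
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)

def positional_groups(match, prefix="_pos"):
    """Return the values of the prefixed groups in group-number order."""
    by_number = sorted(match.re.groupindex.items(), key=lambda item: item[1])
    return [match.group(name) for name, _ in by_number if name.startswith(prefix)]

pattern = name_unnamed_groups(r"([abc]+)([123]+)(?P<end>[%#])")
print(pattern)               # (?P<_pos1>[abc]+)(?P<_pos2>[123]+)(?P<end>[%#])
m = re.match(pattern, "aaba2321%")
print(positional_groups(m))  # ['aaba', '2321']
```

Because named groups keep their group numbers, numeric backreferences in the original pattern should still refer to the same groups after the rewrite.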
answered May 17 '15 at 23:54 by Asad Saeeduddin (edited May 18 '15 at 0:33)
Your first solution, as it seems you are aware, would have problems if named groups and unnamed groups have the same value. In my case, this wouldn't be a big problem, but I don't really like it. However, I didn't really think about transforming the regexes, and I probably should look into doing so. I think I'll select your answer.
– potato, May 18 '15 at 0:07

@potato I'm pretty sure it wouldn't have problems if named groups and unnamed groups had the same value. That is the whole point of indexing by span.
– Asad Saeeduddin, May 18 '15 at 0:10

@potato For example, I've modified my demo to add a `%` at the beginning of the string, and a corresponding group to capture it. Now there is an unnamed group that has the same value as the named group, but they are appropriately separated. See ideone.com/9O7Hpb
– Asad Saeeduddin, May 18 '15 at 0:13

Oh, I misunderstood what your code was doing. Yeah, apart from some really weird usage like having a named group and an unnamed group in the same place, it works great, and is simple and readable unless you are me a few minutes ago. Sorry about that.
– potato, May 18 '15 at 0:29

@potato That's probably because there weren't any comments. I've added a couple to hopefully clarify what the code is doing.
– Asad Saeeduddin, May 18 '15 at 0:33
Here is a clean version, using `Pattern.groupindex` (reachable from a match object as `match.re.groupindex`), which the documentation describes as:

"A dictionary mapping any symbolic group names defined by `(?P<id>)` to group numbers."
TL;DR: Short copy & paste function:

    import re

    def grouplist(match):
        named = match.groupdict()
        ignored_groups = set()
        for name, index in match.re.groupindex.items():
            if name in named:  # double-check that the name really is a named group
                ignored_groups.add(index)
        return [group for i, group in enumerate(match.groups()) if i + 1 not in ignored_groups]

    m = re.match('([abc]+)([123]+)(?P<end>[%#])', "aaba2321%")
    unnamed = grouplist(m)
    print(unnamed)  # ['aaba', '2321']
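
As a sketch of how this could serve the use case in the question (unnamed groups as positional arguments to a function chosen by the named group), something like the following might work. The handlers, the `HANDLERS` table, and `dispatch` are hypothetical names invented for illustration; `grouplist` follows the same logic as the function above.

```python
import re

def grouplist(match):
    named = match.groupdict()
    ignored_groups = set()
    for name, index in match.re.groupindex.items():
        if name in named:
            ignored_groups.add(index)
    return [group for i, group in enumerate(match.groups()) if i + 1 not in ignored_groups]

# Hypothetical handlers, selected by the value of the named group "end".
def handle_percent(letters, digits):
    return "percent: %s/%s" % (letters, digits)

def handle_hash(letters, digits):
    return "hash: %s/%s" % (letters, digits)

HANDLERS = {"%": handle_percent, "#": handle_hash}

def dispatch(pattern, text):
    match = re.match(pattern, text)
    if match is None:
        return None
    handler = HANDLERS[match.group("end")]
    # The unnamed groups become the positional arguments.
    return handler(*grouplist(match))

result = dispatch(r"([abc]+)([123]+)(?P<end>[%#])", "aaba2321%")
print(result)  # percent: aaba/2321
```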
Full example

With `groupindex` we get the indexes of the named groups and can exclude them when building our final list of groups, called `unnamed` in the code below:
import re
# ===================================================================================
# These are the groups as the re module numbers them (by opening parenthesis):
# ===================================================================================
regex = re.compile(r"(((?P<first_name>\w+)) (?P<middle_name>\w+)) (?P<last_name>\w+)")
#                    |-------------------- #1 ------------------|
#                     |-------- #2 -------|
#                      |------- #3 ------|
#                                           |------- #4 -------|
#                                                                 |------ #5 ------|
# ===================================================================================
# But we want to number only the unnamed groups (regex line is identical;
# the original group number is given in parentheses):
# ===================================================================================
regex = re.compile(r"(((?P<first_name>\w+)) (?P<middle_name>\w+)) (?P<last_name>\w+)")
#                    |---------------- #1 (#1) -----------------|
#                     |----- #2 (#2) -----|
#                      | first_name (#3) |
#                                           | middle_name (#4) |
#                                                                 | last_name (#5) |
m = regex.match("Pinkamena Diane Pie")
These are the values we will be working with, for your convenience:
assert list(m.groups()) == [
    'Pinkamena Diane',  # group #1
    'Pinkamena',        # group #2
    'Pinkamena',        # group #3 (first_name)
    'Diane',            # group #4 (middle_name)
    'Pie',              # group #5 (last_name)
]
assert dict(m.groupdict()) == {
    'first_name': 'Pinkamena',  # group #3
    'middle_name': 'Diane',     # group #4
    'last_name': 'Pie',         # group #5
}
assert dict(m.re.groupindex) == {
    'first_name': 3,   # Pinkamena
    'middle_name': 4,  # Diane
    'last_name': 5,    # Pie
}
Therefore we can now store the indices of those named groups in an ignored_groups set, to omit those groups when filling unnamed with m.groups():
named = m.groupdict()
ignored_groups = set()
for name, index in m.re.groupindex.items():
    if name in named:  # double-check that this really is a named group
        ignored_groups.add(index)
# groups() is 0-based while group numbers are 1-based, hence i + 1
unnamed = [group for i, group in enumerate(m.groups()) if i + 1 not in ignored_groups]
print(unnamed)
print(named)
So in the end we get:
# unnamed = grouplist(m)
assert unnamed == [
    'Pinkamena Diane',  # unnamed group #1 (originally #1)
    'Pinkamena',        # unnamed group #2 (originally #2)
]
# named = m.groupdict()
assert named == {
    'first_name': 'Pinkamena',  # group #3
    'middle_name': 'Diane',     # group #4
    'last_name': 'Pie',         # group #5
}
Try the example yourself: https://ideone.com/pDMjpP
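To connect this back to the use case in the question (passing the unnamed groups as positional arguments to a function chosen by the named group), here is a minimal sketch. The handler functions, the HANDLERS table and the dispatch helper are all hypothetical names invented for this illustration; only the pattern and the input string come from the question.

```python
import re

def unnamed_groups(match):
    """Values of all groups without a name, in group order."""
    named_indexes = set(match.re.groupindex.values())
    return [match.group(i) for i in range(1, match.re.groups + 1)
            if i not in named_indexes]

# Hypothetical handlers; the real ones would do whatever the application needs.
def handle_percent(letters, digits):
    return ("percent", letters, digits)

def handle_hash(letters, digits):
    return ("hash", letters, digits)

# Hypothetical lookup table: value of the named group "end" -> function.
HANDLERS = {"%": handle_percent, "#": handle_hash}

def dispatch(pattern, text):
    """Match text, pick a handler via the named group, pass the unnamed groups."""
    match = re.match(pattern, text)
    if match is None:
        return None
    handler = HANDLERS[match.group("end")]
    return handler(*unnamed_groups(match))

result = dispatch(r'([abc]+)([123]+)(?P<end>[%#])', "aaba2321%")
print(result)
```

Because the handler receives only the unnamed groups, the related patterns can add or reorder named groups without changing the handler signatures.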
edited Dec 19 '18 at 13:45
answered Nov 20 '18 at 3:27
luckydonald
As far as I know, there's no way to do this. Of course you could iterate the groups in the pattern or the start and end indexes of the groups in the match and skip over the ones that are also in named groups if you really need to, but… why do you want this?
– abarnert
May 17 '15 at 23:24
Also, what did you expect that generator to do? What it's actually going to do is the same thing as iter(match_obj.groups()), because groups is just defined as returning a tuple of the same subgroups with the same indices that group uses.
– abarnert
May 17 '15 at 23:26
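The equivalence described in this comment can be checked with a short sketch. The generator below is the one from the question with its indentation restored; the pattern and input string are also taken from the question.

```python
import re

def get_unnamed_groups(match_obj):
    """The generator from the question: walks group(1), group(2), ..."""
    index = 1
    while True:
        try:
            yield match_obj.group(index)
        except IndexError:  # raised once index exceeds the number of groups
            break
        index += 1

m = re.match(r'([abc]+)([123]+)(?P<end>[%#])', "aaba2321%")
from_generator = list(get_unnamed_groups(m))
from_groups = list(m.groups())
print(from_generator == from_groups)
```

Both lists include the value of the named group, because a named group still occupies a numbered position.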
@abarnert I want it because it would be inconvenient and redundant, and would create potential problems when changing things, if I required the data that includes the related regular expressions to also include the indices of the unnamed groups. I guess I could require all groups to be named, and have the names of the unnamed groups be numbers, but that's what unnamed groups are for.
– potato
May 17 '15 at 23:49
@abarnert I wasn't sure how match objects work internally. It makes sense for group to use groups, but it would also make sense to have an unnamed groups list.
– potato
May 17 '15 at 23:52
If you're not sure, just read the documentation. It seems very clear to me, but if it's not clear to you, someone should probably file a documentation bug (because the docs should be clear to everyone, not just one random person…).
– abarnert
May 17 '15 at 23:54