Python regex remove everything except strings from list
I have a string:
bdv. mot. g. vns. kilm.
and a known list of strings:
important_strings_lst = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']
I want a regex that keeps only the strings from the list, so the result is:
bdv. mot. g.
I joined the list into a pattern and tried this (idea from here):
regex = re.compile(r'\b(?!bdv.|dktv.|mot. g.|vyr. g.)\w+', re.UNICODE)
regex.sub("", 'bdv. mot. g. vns. kilm.')
Got
'bdv. mot. . . .'
Using \s in the regex instead also didn't work. How can I do this?
I could use something like [x for x in important_strings_lst if x in my_string],
but I need good performance, because this will run on a million rows of a pandas DataFrame with str.replace.
python regex list replace
edited Nov 10 at 17:38
Sandeep Kadapa
asked Nov 10 at 17:37
Lukas
2 Answers
The . character has special meaning in regular expressions. You can use re.escape to make a string "safe" for use in a regular expression.
>>> import re
>>> important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']
>>> regex = re.compile('|'.join(re.escape(s) for s in important_strings))
>>> regex.findall('bdv. mot. g. vns. kilm.')
['bdv.', 'mot. g.']
pandas has its own Series.str.findall, which should work like re.findall.
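As a hedged sketch building on the snippet above: regex alternation tries the alternatives from left to right, so if one listed string were a prefix of another (say a hypothetical 'mot.' next to 'mot. g.'), the shorter one could win. Sorting the strings longest-first avoids that, and joining the matches turns the list back into one string. The helper names build_pattern and keep_important are assumptions made for illustration, not part of the original answer.

```python
import re

important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']

def build_pattern(strings):
    # Escape each literal, and try longer strings first so that a
    # shorter prefix never shadows a longer entry.
    ordered = sorted(strings, key=len, reverse=True)
    return re.compile('|'.join(re.escape(s) for s in ordered))

def keep_important(text, pattern):
    # findall returns the matches in the order they occur in the text;
    # joining them with a space rebuilds a single string.
    return ' '.join(pattern.findall(text))

pattern = build_pattern(important_strings)
result = keep_important('bdv. mot. g. vns. kilm.', pattern)
print(result)
```

For the list in the question the ordering makes no difference, because none of the strings is a prefix of another; it only matters once such entries appear.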
edited Nov 10 at 18:17
answered Nov 10 at 17:51
Håken Lid
@perreal, your comment above is not clear. Could you please clarify it?
– pygo
Nov 10 at 18:02
A pandas Series does indeed have a str.findall method, and re.escape takes care of the dots. The result is a list rather than a string, but I can probably work with that.
– Lukas
Nov 10 at 18:17
.str.findall('|'.join(re.escape(s) for s in important_strings)).str.join(' ')
– Lukas
Nov 10 at 18:23
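The one-liner in the comment above can be sketched on a small frame as follows. The DataFrame, its column names tags and kept, and the sample rows are hypothetical and only there to make the example runnable; it assumes pandas is installed.

```python
import re
import pandas as pd

important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']
pattern = '|'.join(re.escape(s) for s in important_strings)

# Hypothetical sample data; the real frame would have many more rows.
df = pd.DataFrame({'tags': ['bdv. mot. g. vns. kilm.', 'dktv. vns.', 'kilm.']})

# str.findall yields a list of matches per row; str.join collapses each
# list back into one space-separated string (empty when nothing matched).
df['kept'] = df['tags'].str.findall(pattern).str.join(' ')
kept = df['kept'].tolist()
print(kept)
```

Rows without any match end up as empty strings rather than missing values, which may or may not be what the downstream code expects.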
You can benchmark to test if findall is faster than your original negative lookahead. I try to avoid using lookaround assertions in my regular expressions because they are often hard to read/understand and in some cases they can be very slow, if the regex engine is forced to do a lot of backtracking.
– Håken Lid
Nov 10 at 18:38
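A minimal way to run such a benchmark with the standard timeit module is sketched below. It compares findall against the list comprehension mentioned in the question; the function names are assumptions made for illustration, and the timings depend on the machine and on the real data, so a sample of the actual DataFrame would give a more meaningful answer.

```python
import re
import timeit

important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']
text = 'bdv. mot. g. vns. kilm.'
regex = re.compile('|'.join(re.escape(s) for s in important_strings))

def with_findall():
    return regex.findall(text)

def with_membership():
    # The list comprehension from the question. It reports entries in
    # list order and at most once each, unlike findall.
    return [x for x in important_strings if x in text]

# Both approaches agree on this sample, so the timings are comparable.
same_result = with_findall() == with_membership()

for func in (with_findall, with_membership):
    seconds = timeit.timeit(func, number=100_000)
    print(f'{func.__name__}: {seconds:.3f} s per 100,000 calls')
```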
Maybe split the string
bdv. mot. g. vns. kilm.
using your list, and then remove from the original string whatever is left after splitting.
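One possible reading of this suggestion, sketched with re.split: wrapping the escaped alternatives in a capturing group makes split keep the separators, so the listed strings and the leftovers alternate in the result. This is an interpretation under that assumption, not necessarily what the answer's author had in mind.

```python
import re

important_strings = ['bdv.', 'dktv.', 'mot. g.', 'vyr. g.']

# A capturing group makes re.split keep the separators in its result,
# so matched strings land at the odd indices and leftovers at the even ones.
splitter = re.compile('(' + '|'.join(re.escape(s) for s in important_strings) + ')')

parts = splitter.split('bdv. mot. g. vns. kilm.')
kept = ' '.join(parts[1::2])
print(kept)
```

Dropping the even-indexed pieces is the "remove what is left after splitting" step; the outcome is the same as joining the findall matches.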
answered Nov 10 at 18:08
user10403681