Regular expressions: Ensuring b doesn't come between a and c
Here's something I'm trying to do with regular expressions, and I can't figure out how. I have a big file, and the strings abc, 123 and xyz appear multiple times throughout the file.
I want a regular expression to match a substring of the big file that begins with abc, contains 123 somewhere in the middle, ends with xyz, and contains no other instances of abc or xyz besides the ones at the start and the end.
Is this possible with regular expressions?
regex python-2.7
edited May 15 '16 at 16:16 by Jorge Campos
asked May 15 '16 at 15:53 by Ram Rachum
5
Since regular expressions are not fully standardized, all questions with this tag should also include a tag specifying the applicable programming language or tool. That said, is there any particular reason you want to use regular expressions here? It's possible, but in most environments, it's more complicated than not using regexes.
– user743382
May 15 '16 at 15:56
Should line breaks be considered or not? The big file will be read line by line or as one big string?
– Jorge Campos
May 15 '16 at 15:59
Regex flavor is python 2.7, newlines should be included.
– Ram Rachum
May 15 '16 at 16:16
4 Answers
You need a tempered greedy token:
abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz
See the regex demo.
To make sure it matches across lines, pass the re.DOTALL flag when compiling the regex.
Note that for better performance with such a heavy pattern, you should consider unrolling it. That can be done with negated character classes and negative lookaheads.
Pattern details:
abc - match abc
(?:(?!abc|xyz|123).)* - match any character that is not the starting point of an abc, xyz or 123 character sequence
123 - a literal string 123
(?:(?!abc|xyz).)* - match any character that is not the starting point of an abc or xyz character sequence
xyz - a trailing substring xyz
The answer's diagram (not reproduced here) shows . as AnyChar when re.S is used.
See the Python demo:
import re
p = re.compile(r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL)
s = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
print(p.findall(s))
# => ['abc 123 xyz', 'abc 123 xyz', 'abc text 123 xyz']
edited Dec 15 '17 at 9:14 by Eric Leschinski
answered May 15 '16 at 16:20 by Wiktor Stribiżew
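The unrolling that the answer recommends could look roughly like the sketch below. The unrolled pattern is an illustrative expansion, not something given in the answer: each tempered token becomes a negated character class over the first letters of the forbidden strings, plus alternatives that let one of those letters through only when it does not start a forbidden string. The names TEMPERED, UNROLLED and SAMPLE are made up for the example.

```python
import re

# Tempered greedy token pattern from the answer (needs DOTALL so . matches newlines).
TEMPERED = re.compile(
    r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL
)

# Illustrative unrolled form (an assumption, not taken from the answer).
# [^ax1] can never start abc, xyz or 123; an a, x or 1 is accepted only
# when the lookahead shows it does not begin one of those strings.
UNROLLED = re.compile(
    r'abc'
    r'[^ax1]*(?:(?:a(?!bc)|x(?!yz)|1(?!23))[^ax1]*)*'
    r'123'
    r'[^ax]*(?:(?:a(?!bc)|x(?!yz))[^ax]*)*'
    r'xyz'
)

SAMPLE = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"

tempered_matches = TEMPERED.findall(SAMPLE)
unrolled_matches = UNROLLED.findall(SAMPLE)

print(unrolled_matches)
print(tempered_matches == unrolled_matches)
```

Negated character classes match newlines on their own, so the unrolled form does not need re.DOTALL.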
Can you please link the site from where you generated that state machine? I know a site exists with similar UI but can't find it though. Sorry for the irrelevant comment. I'll delete it soon :)
– rafid059
Mar 18 '17 at 18:24
3
See jex.im/regulex
– Wiktor Stribiżew
Mar 18 '17 at 20:10
Why the |123?
– Stefan Pochmann
Jan 24 '18 at 20:22
@StefanPochmann Well, you may get rid of that if you use a lazy quantifier in the first case: r'abc(?:(?!abc|xyz).)*?123(?:(?!abc|xyz).)*xyz'. It will work similarly.
– Wiktor Stribiżew
Jan 24 '18 at 20:26
Both are just optimizations, though, right? Not necessary?
– Stefan Pochmann
Jan 24 '18 at 20:26
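The three forms discussed in these comments can be compared side by side. The sketch below is a spot check on a made-up sample (SAMPLE and VARIANTS are illustrative names), not a proof of equivalence. The intuition is that every match starts at an abc and ends at the first xyz after it, so the variants differ only in which 123 the engine settles on and how much backtracking it takes to get there.

```python
import re

# Hypothetical sample; the last line contains two 123s between abc and xyz.
SAMPLE = (
    "abc 123 xyz\n"
    "abc abc 123 xyz\n"
    "abc text 123 xyz\n"
    "abc text xyz xyz\n"
    "abc 123 mid 123 xyz"
)

VARIANTS = {
    "with |123": r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz',
    "lazy": r'abc(?:(?!abc|xyz).)*?123(?:(?!abc|xyz).)*xyz',
    "plain greedy": r'abc(?:(?!abc|xyz).)*123(?:(?!abc|xyz).)*xyz',
}

# Run every variant with DOTALL so that . also matches newlines.
results = {
    name: re.findall(pattern, SAMPLE, re.DOTALL)
    for name, pattern in VARIANTS.items()
}

for name, found in results.items():
    print(name, found)
```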
Using PCRE, a solution would be:
abc(?!.*(abc|xyz).*123).*123(?!.*(abc|xyz).*xyz).*xyz
This uses the m flag. If you want to match only from the start to the end of a line, add ^ and $ at the beginning and end respectively.
Debuggex Demo
answered May 15 '16 at 16:15 by Jorge Campos
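A quick way to see how this lookahead-based pattern behaves is to run it next to the tempered greedy token pattern from the other answer. The sketch below uses Python's re rather than PCRE (the constructs involved behave the same way in both, as far as this example goes), and the sample strings and the helper spans are made up for illustration. Without a dot-all flag, . stops at newlines, so each lookahead inspects only the rest of the current line. With one candidate per line the two patterns agree, but when two candidates share a line the first lookahead rejects the earlier one.

```python
import re

LOOKAHEAD = re.compile(r'abc(?!.*(abc|xyz).*123).*123(?!.*(abc|xyz).*xyz).*xyz')
TEMPERED = re.compile(r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz')


def spans(pattern, text):
    """Return (start index, matched text) for every match of pattern in text."""
    return [(m.start(), m.group(0)) for m in pattern.finditer(text)]


# Hypothetical inputs: one candidate per line, then two candidates on one line.
one_per_line = "abc 123 xyz\nabc abc 123 xyz"
two_on_a_line = "abc 123 xyz abc 123 xyz"

print(spans(LOOKAHEAD, one_per_line))
print(spans(TEMPERED, one_per_line))
print(spans(LOOKAHEAD, two_on_a_line))
print(spans(TEMPERED, two_on_a_line))
```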
which software is used to draw diagram
– Er. Amit Joshi
Jun 3 '18 at 10:15
@Er.AmitJoshi It is not a software, it is the Debuggex Site there is a link in the answer
– Jorge Campos
Jun 4 '18 at 1:50
The comment by hvd is quite appropriate, and this just provides an example. In SQL, for instance, I think it would be clearer to do:
where val like 'abc%123%xyz' and
val not like 'abc%abc%' and
val not like '%xyz%xyz'
I imagine something quite similar is simple to do in other environments.
edited May 15 '16 at 16:11 by Jonathan Leffler
answered May 15 '16 at 16:01 by Gordon Linoff
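For the Python flavor the question asks about, the same three conditions can be sketched with plain string operations. This is an illustrative translation of the SQL above, not part of the answer, and matches_without_regex is a hypothetical name. Like the SQL, it tests one candidate value as a whole rather than searching a big file for matching substrings.

```python
def matches_without_regex(val):
    """Mirror the three LIKE conditions with plain string operations."""
    return (
        val.startswith('abc')
        and val.endswith('xyz')
        and '123' in val[3:-3]      # like 'abc%123%xyz'
        and 'abc' not in val[3:]    # not like 'abc%abc%'
        and 'xyz' not in val[:-3]   # not like '%xyz%xyz'
    )


# Hypothetical candidate values.
for candidate in ["abc 123 xyz", "abc abc 123 xyz", "abc text xyz xyz", "abc text xyz"]:
    print(repr(candidate), matches_without_regex(candidate))
```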
You could use lookaround.
/^abc(?!.*abc).*123.*(?<!xyz.*)xyz$/g
(I've not tested it.)
edited May 15 '16 at 16:13
answered May 15 '16 at 15:56 by Kenny Lau
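Since the answer says the pattern is untested, here is a small check against Python's re module, the flavor named in the question. The pattern is written as a JavaScript-style literal and its lookbehind contains .*, which makes it variable-width. Python's re accepts only fixed-width lookbehind, so the pattern is expected to be rejected at compile time. UNTESTED and error_message are illustrative names.

```python
import re

# The answer's pattern with the /.../g delimiters stripped.
UNTESTED = r'^abc(?!.*abc).*123.*(?<!xyz.*)xyz$'

try:
    re.compile(UNTESTED)
    error_message = None
except re.error as exc:
    # Python's re refuses lookbehind whose length is not fixed.
    error_message = str(exc)

print(error_message)
```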