Filter elements with a regex, only if they are in a certain block











In a string (in reality it's much bigger):



s = """
BeginA
Qwerty
Element 11 35
EndA

BeginB
Element 12 38
...
Element 198 38
EndB

BeginA
Element 81132 38
SomethingElse
EndA

BeginB
Element 12 39
Element 198 38
EndB
"""


How can I replace every Element <anythinghere> 38 that is inside a BeginB...EndB block (and only those!) with Element ABC?



I was trying with:



s = re.sub(r'Element .* 38', 'Element ABC', s)


but this replaces every match, whether or not it is inside a BeginB...EndB block. How can I restrict it to those blocks?







python regex string






asked Nov 8 at 17:03









Basj

  • Your code is actually working. I don't see how the output is different from what you want.
    – ninesalt
    Nov 8 at 17:08










  • @ninesalt I want to replace only the elements which are inside a BeginB...EndB block, not those which are in BeginA...EndA blocks.
    – Basj
    Nov 8 at 17:10


















2 Answers
accepted






Use two expressions:



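import re

# Matches one whole BeginB...EndB block (lazy, so it stops at the first EndB).
block = re.compile(r'BeginB[\s\S]+?EndB')
# Matches "Element <anything> 38" on a single line.
element = re.compile(r'Element.*?\b38\b')

def repl(match):
    # Called once per block; only the text of that block is rewritten.
    return element.sub('Element ABC', match.group(0))

nstring = block.sub(repl, s)  # s is the string from the question
print(nstring)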


This yields



BeginA
Qwerty
Element 11 35
EndA

BeginB
Element ABC
...
Element ABC
EndB

BeginA
Element 81132 38
SomethingElse
EndA

BeginB
Element 12 39
Element ABC
EndB


See a demo on ideone.com.





Without re.compile (just to get the idea):



def repl(match):
    return re.sub(r'Element.*?\b38\b', 'Element ABC', match.group(0))

print(re.sub(r'BeginB[\s\S]+?EndB', repl, s))


The important idea here is that re.sub's second argument can be a function: it is called with the match object for each match, and the string it returns is used as the replacement.
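To make the callable-replacement idea concrete, here is a minimal sketch that wraps the two-step substitution in a helper. The name replace_in_block and its parameters are hypothetical (they are not part of the answer or of the re module), and the sample string is a shortened version of the one in the question.

```python
import re

# Shortened sample modelled on the question's string.
s = """
BeginA
Element 11 35
EndA

BeginB
Element 12 38
Element 198 38
EndB

BeginA
Element 81132 38
EndA
"""

def replace_in_block(text, name, pattern, replacement):
    """Hypothetical helper: apply `pattern` -> `replacement` only inside
    Begin<name> ... End<name> blocks."""
    escaped = re.escape(name)
    block = re.compile(r'Begin' + escaped + r'[\s\S]+?End' + escaped)
    inner = re.compile(pattern)

    def repl(match):
        # re.sub calls this once per block with a match object;
        # the returned string replaces that whole block.
        return inner.sub(replacement, match.group(0))

    return block.sub(repl, text)

result = replace_in_block(s, 'B', r'Element.*?\b38\b', 'Element ABC')
print(result)
```

Because the block name goes through re.escape, the same helper could presumably be pointed at BeginC...EndC blocks by passing 'C' instead of 'B'.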





You could very well do it without a function, but you'd need the newer regex module, which supports \G and \K:



import regex  # third-party module (pip install regex); the stdlib re lacks \G and \K

rx = regex.compile(r'''
    (?:\G(?!\A)|BeginB)
    (?:(?!EndB)[\s\S])+?\K
    Element.+?\b38\b''', regex.VERBOSE)

s = rx.sub('Element ABC', s)
print(s)


See another demo for this one on regex101.com as well.







edited Nov 9 at 7:42

























answered Nov 8 at 18:19









Jan

  • Wonderful, I forgot that we could use a function as the second parameter of re.sub! I edited your answer to add these details, I hope you don't mind @Jan.
    – Basj
    Nov 8 at 22:19


















Try the following:



r'(?s)(?<=BeginB)\s+Element\s+(\d+)\s+\d+.*?(?=EndB)'


You can test it here.



For your example, I would echo @Jan's answer and use two separate regular expressions:



import re

restrict = re.compile(r'(?s)(?<=BeginB).*?(?=EndB)')
pattern = re.compile(r'Element\s+(\d+)\s+38')

def repl(block):
    return pattern.sub('Element ABC', block.group(0))

out = restrict.sub(repl, s)






edited Nov 8 at 19:08

























answered Nov 8 at 17:24









rahlf23

  • How do you do the replace? s = re.sub(r'(?s)(?<=BeginB)\s+Element\s+(\d+)\s+\d+.*?(?=EndB)', r'Element ABC', s) doesn't work directly, so I guess we should modify the replacement string r'Element ABC', how?
    – Basj
    Nov 8 at 17:28












  • Ok thanks @rahlf23 !
    – Basj
    Nov 8 at 17:36










  • To clarify, from your example, you would want to replace 12, 198 and 12 again correct?
    – rahlf23
    Nov 8 at 17:37










  • Yes indeed. All the lines inside a BeginB...EndB block which are of the form Element .... 38. IRL I have a few (but not many) BeginB...EndB blocks, thousands of elements inside them, and other blocks BeginA...EndA, BeginC...EndC, etc.
    – Basj
    Nov 8 at 17:40
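For input like the one described in the last comment (a few BeginB...EndB blocks holding thousands of elements, alongside other block types), the same restriction can also be sketched as a single pass over the lines that tracks whether the current line is inside the target block. This is a different technique from the two answers above, not a restatement of them. It assumes the Begin/End markers each sit on a line of their own, as in the sample, and the function name replace_elements is hypothetical.

```python
import re

# The asker's original pattern, applied to one whole line at a time.
ELEMENT_38 = re.compile(r'Element .* 38')

def replace_elements(text, begin='BeginB', end='EndB'):
    """Hypothetical helper: rewrite matching lines only between the
    `begin` and `end` marker lines."""
    inside = False
    out = []
    for line in text.split('\n'):
        stripped = line.strip()
        if stripped == begin:
            inside = True
        elif stripped == end:
            inside = False
        elif inside and ELEMENT_38.fullmatch(stripped):
            # Assumption: the line holds nothing but the element itself,
            # so the whole line is replaced.
            line = 'Element ABC'
        out.append(line)
    return '\n'.join(out)

sample = "BeginA\nElement 1 38\nEndA\nBeginB\nElement 12 38\n...\nElement 12 39\nEndB\n"
print(replace_elements(sample))
```

The trade-off is that this version only looks at one line at a time, so it would not handle markers or elements that share a line with other text.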



















