Python regular expression, ignoring characters until some character is matched a number of times

I'm renaming a batch of files I downloaded from a torrent and want to extract each episode's name, so I figured regex would do the trick. I'm kinda new to regex, so I'd appreciate the help. This is what I've come up with so far:

I have a class with other renaming functions, so the function shown here is a method of that class. The class is initialized with the path to the files' directory, the expression to rename to, and the file extension.

I'm using glob to access all files with the extension ".mkv".

For debugging I printed out all the file names:

Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv









def strip_ep_name(self):
    for i, f in enumerate(self.files):
        f_list = f.split("\\")
        name, ext = os.path.splitext(f_list[-1])
        ep_name = name.strip(r'(.*?)".720p.WEB-DL.x264-[MULVAcoded]"')
        print(ep_name)

The goal is to get the episode's name, with or without the episode number, because I can give the episode a new name later on.

And the output is:

r.Robot.S02E01.eps2.0_unm4sk-pt1.t

r.Robot.S02E02.eps2.0_unm4sk-pt2.t

r.Robot.S02E03.eps2.1_k3rnel-pan1c.ks

r.Robot.S02E04.eps2.2_init_1.as

r.Robot.S02E05.eps2.3.logic-b0mb.h

r.Robot.S02E06.eps2.4.m4ster-s1ave.aes

r.Robot.S02E07.eps2.5_h4ndshake.sm

r.Robot.S02E08.eps2.6.succ3ss0r.p1

r.Robot.S02E09.eps2.7_init_5.fv

r.Robot.S02E10.eps2.8_h1dden-pr0cess.a

r.Robot.S02E11.eps2.9_pyth0n-pt1.p7z

r.Robot.S02E12.eps2.9_pyth0n-pt2.p7z

I wanted to strip the ".eps2.2"-style prefix before each episode's name, but those prefixes aren't consistent from file to file.

Now I don't know how to move on from here. Can anyone help?

edited Nov 18 '18 at 21:00

Jan

24.8k52448

asked Nov 18 '18 at 14:22

Gustavo Barros

485

python regex regex-group


3 Answers

Do it all in one step:

\.eps\d+\.\d+[-_.](.+?)(?:\.720p.+)\.(\w+)$

Broken down, this reads:

\.eps\d+\.\d+ # ".eps", followed by digits, a dot and other digits
[-_.]         # one of -, _ or .
(.+?)         # anything else lazily afterwards
(?:\.720p.+)  # until .720p is found (might need some tweaking)
\.            # a dot
(\w+)$        # some word characters (aka the file extension) at the end

This needs to be replaced by .\1.\2 to get your desired format in the end.
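The annotated breakdown above can be compiled almost as written by using the re.VERBOSE flag, which ignores whitespace and #-comments inside the pattern. This is only a sketch on a single filename from the question, with the backslash escapes spelled out; the variable names are illustrative, not part of the answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
rx = re.compile(r"""
    \.eps\d+\.\d+   # .eps, digits, a dot, more digits
    [-_.]           # one separator: -, _ or .
    (.+?)           # group 1: everything up to the release info, lazily
    (?:\.720p.+)    # the release info, starting at .720p
    \.              # the dot before the extension
    (\w+)$          # group 2: the file extension at the end
""", re.VERBOSE)

name = "Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv"

m = rx.search(name)
print(m.group(1), m.group(2))    # the two captured groups
print(rx.sub(r".\1.\2", name))   # the shortened filename
```

Keeping the comments inside the pattern makes it easier to tweak the ".720p" part later if other releases use a different resolution tag.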

Everything in Python:

import re

filenames = """
Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv
"""

# re.M lets $ match at the end of every line, not just the end of the string
rx = re.compile(r'\.eps\d+\.\d+[-_.](.+?)(?:\.720p.+)\.(\w+)$', re.M)

filenames = rx.sub(r".\1.\2", filenames)
print(filenames)

Which yields

Mr.Robot.S02E01.unm4sk-pt1.tc.mkv

Mr.Robot.S02E02.unm4sk-pt2.tc.mkv

Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv

Mr.Robot.S02E04.init_1.asec.mkv

Mr.Robot.S02E05.logic-b0mb.hc.mkv

Mr.Robot.S02E06.m4ster-s1ave.aes.mkv

Mr.Robot.S02E07.h4ndshake.sme.mkv

Mr.Robot.S02E08.succ3ss0r.p12.mkv

Mr.Robot.S02E09.init_5.fve.mkv

Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv

Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv

Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv

See a demo on regex101.com.
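Since the question is about renaming files on disk rather than editing a string, here is a minimal sketch of how the same pattern might be applied per file with pathlib. The function name rename_episodes and the dry_run parameter are made up for illustration, and the demo runs in a temporary directory so no real files are touched.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as in the answer; re.M is not needed for single filenames.
RX = re.compile(r'\.eps\d+\.\d+[-_.](.+?)(?:\.720p.+)\.(\w+)$')

def rename_episodes(directory, dry_run=True):
    """Return (old, new) name pairs; rename on disk only if dry_run is False."""
    pairs = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(directory).glob("*.mkv")):
        new_name = RX.sub(r".\1.\2", path.name)
        if new_name == path.name:
            continue  # pattern did not match; leave the file alone
        pairs.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return pairs

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = "Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv"
    (Path(tmp) / sample).touch()
    print(rename_episodes(tmp))              # preview only
    rename_episodes(tmp, dry_run=False)      # actually rename
    print([p.name for p in Path(tmp).iterdir()])
```

Running with dry_run=True first and checking the printed pairs is a cheap safeguard before anything is renamed for real.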

answered Nov 18 '18 at 21:00

Jan

24.8k52448

you're a god man!

– Gustavo Barros
Nov 20 '18 at 14:32


First, import Python's regex module:

import re

Then use this to strip the prefix from a name such as "r.Robot.S02E01.eps2.0_unm4sk-pt1.t":

ep_name = re.sub(r"eps2\.\d{1,2}(\.|_)", "", episode_name)

Run this in a loop, passing each episode name in as episode_name, and print ep_name.

The output will look like:

r.Robot.S02E01.unm4sk-pt1.t
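The loop described above might look like the sketch below. It assumes the names are the file stems (extension already removed) and cuts the release tag off as a literal suffix first, because str.strip treats its argument as a set of characters to remove from both ends, which would explain the clipped "r.Robot...t" names in the question. RELEASE_TAG, EPS_MARKER and clean_name are illustrative names, not part of the answer.

```python
import re

# Literal release tag copied from the filenames in the question.
RELEASE_TAG = ".720p.WEB-DL.x264-[MULVAcoded]"
# Same pattern as in the answer: "eps2.", one or two digits, then "." or "_".
EPS_MARKER = re.compile(r"eps2\.\d{1,2}(\.|_)")

def clean_name(name):
    """Drop the release tag (if present) and the eps marker from a stem."""
    # Cut the suffix explicitly instead of using str.strip, which removes
    # individual characters rather than a substring.
    if name.endswith(RELEASE_TAG):
        name = name[:-len(RELEASE_TAG)]
    return EPS_MARKER.sub("", name)

stems = [
    "Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded]",
    "Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded]",
]

for stem in stems:
    print(clean_name(stem))
```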

edited Nov 18 '18 at 18:04

user6910411

33.9k979101

answered Nov 18 '18 at 15:14

Nagar16

112


I'm not sure I understand correctly; I don't know the series, and hence not the titles either. But do you really need re?

for f in files:

    print(f[23:-35].split('.')[0])

results in

unm4sk-pt1

unm4sk-pt2

k3rnel-pan1c

init_1

logic-b0mb

m4ster-s1ave

h4ndshake

succ3ss0r

init_5

h1dden-pr0cess

pyth0n-pt1

pyth0n-pt2

Edit:

I still don't see an actual target format definition in your post, but in case @Jan is right, here's a re-less solution for that, too:

for f in files:

    print(f[:16] + '.'.join(f[23:].split('.')[:2]) + '.mkv')



Mr.Robot.S02E01.unm4sk-pt1.tc.mkv

Mr.Robot.S02E02.unm4sk-pt2.tc.mkv

Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv

Mr.Robot.S02E04.init_1.asec.mkv

Mr.Robot.S02E05.logic-b0mb.hc.mkv

Mr.Robot.S02E06.m4ster-s1ave.aes.mkv

Mr.Robot.S02E07.h4ndshake.sme.mkv

Mr.Robot.S02E08.succ3ss0r.p12.mkv

Mr.Robot.S02E09.init_5.fve.mkv

Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv

Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv

Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv
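The slicing approach relies on every filename having the same layout: a 16-character "Mr.Robot.SxxExx." prefix followed by a 7-character marker such as "eps2.0_". Below is a small sketch that wraps the second snippet in a function and checks that assumption before slicing; the constant names and the shorten function are hypothetical, added only for illustration.

```python
PREFIX_LEN = 16   # len("Mr.Robot.S02E01."), assumed constant for every file
TITLE_START = 23  # PREFIX_LEN + len("eps2.0_"), i.e. just past the eps marker

def shorten(filename):
    """Fixed-offset rename; raises if the assumed layout does not hold."""
    if filename[PREFIX_LEN:PREFIX_LEN + 3] != "eps":
        raise ValueError("unexpected layout: " + filename)
    # Keep the title and the short tag that follows it, drop the rest.
    title_and_tag = filename[TITLE_START:].split(".")[:2]
    return filename[:PREFIX_LEN] + ".".join(title_and_tag) + ".mkv"

print(shorten("Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv"))
```

The guard makes the function fail loudly on a file from another show or season with a different prefix length, instead of silently producing a wrong name.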

edited Nov 18 '18 at 23:06

answered Nov 18 '18 at 15:22

SpghttCd

4,5072313

Yes, because the titles are mixed with the episode numbers, and some of them have a "." before the title and others don't. But your answer, alongside the other one, did manage to do it!

– Gustavo Barros
Nov 18 '18 at 15:34

Sorry, I still don't get it. Is your list of filenames representative? So yes, there are some with a period before and others without. But afaics in any case you'll end up with a string directly starting with an episode name simply by cutting the first 23 characters away.

– SpghttCd
Nov 18 '18 at 16:31


3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Do it all in one step:

.epsd+.d+[-_.](.+?)(?:.720p.+).(w+)$

Broken down, this reads:

.epsd+.d+ # ".eps", followed by digits, a dot and other digits

[-_.]         # one of -, _ or .

(.+?)         # anything else lazily afterwards

(?:.720p.+)  # until .720p is found (might need some tweaking)

.            # a dot

(w+)$        # some word characters (aka the file extension) at the end

This needs to be replaced by .1.2 to get your desired format in the end.

Everything in Python:

import re



filenames = """

Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv

"""



rx = re.compile(r'.epsd+.d+[-_.](.+?)(?:.720p.+).(w+)$', re.M)



filenames = rx.sub(r".1.2", filenames)

print(filenames)

Which yields

Mr.Robot.S02E01.unm4sk-pt1.tc.mkv

Mr.Robot.S02E02.unm4sk-pt2.tc.mkv

Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv

Mr.Robot.S02E04.init_1.asec.mkv

Mr.Robot.S02E05.logic-b0mb.hc.mkv

Mr.Robot.S02E06.m4ster-s1ave.aes.mkv

Mr.Robot.S02E07.h4ndshake.sme.mkv

Mr.Robot.S02E08.succ3ss0r.p12.mkv

Mr.Robot.S02E09.init_5.fve.mkv

Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv

Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv

Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv

See a demo on regex101.com.

answered Nov 18 '18 at 21:00

Jan

24.8k52448

you're a god man!

– Gustavo Barros
Nov 20 '18 at 14:32

add a comment |

Do it all in one step:

.epsd+.d+[-_.](.+?)(?:.720p.+).(w+)$

Broken down, this reads:

.epsd+.d+ # ".eps", followed by digits, a dot and other digits

[-_.]         # one of -, _ or .

(.+?)         # anything else lazily afterwards

(?:.720p.+)  # until .720p is found (might need some tweaking)

.            # a dot

(w+)$        # some word characters (aka the file extension) at the end

This needs to be replaced by .1.2 to get your desired format in the end.

Everything in Python:

import re



filenames = """

Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv

"""



rx = re.compile(r'.epsd+.d+[-_.](.+?)(?:.720p.+).(w+)$', re.M)



filenames = rx.sub(r".1.2", filenames)

print(filenames)

Which yields

Mr.Robot.S02E01.unm4sk-pt1.tc.mkv

Mr.Robot.S02E02.unm4sk-pt2.tc.mkv

Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv

Mr.Robot.S02E04.init_1.asec.mkv

Mr.Robot.S02E05.logic-b0mb.hc.mkv

Mr.Robot.S02E06.m4ster-s1ave.aes.mkv

Mr.Robot.S02E07.h4ndshake.sme.mkv

Mr.Robot.S02E08.succ3ss0r.p12.mkv

Mr.Robot.S02E09.init_5.fve.mkv

Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv

Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv

Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv

See a demo on regex101.com.

answered Nov 18 '18 at 21:00

Jan

24.8k52448

you're a god man!

– Gustavo Barros
Nov 20 '18 at 14:32

add a comment |

Do it all in one step:

.epsd+.d+[-_.](.+?)(?:.720p.+).(w+)$

Broken down, this reads:

.epsd+.d+ # ".eps", followed by digits, a dot and other digits

[-_.]         # one of -, _ or .

(.+?)         # anything else lazily afterwards

(?:.720p.+)  # until .720p is found (might need some tweaking)

.            # a dot

(w+)$        # some word characters (aka the file extension) at the end

This needs to be replaced by .1.2 to get your desired format in the end.

Everything in Python:

import re



filenames = """

Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv

"""



rx = re.compile(r'.epsd+.d+[-_.](.+?)(?:.720p.+).(w+)$', re.M)



filenames = rx.sub(r".1.2", filenames)

print(filenames)

Which yields

Mr.Robot.S02E01.unm4sk-pt1.tc.mkv

Mr.Robot.S02E02.unm4sk-pt2.tc.mkv

Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv

Mr.Robot.S02E04.init_1.asec.mkv

Mr.Robot.S02E05.logic-b0mb.hc.mkv

Mr.Robot.S02E06.m4ster-s1ave.aes.mkv

Mr.Robot.S02E07.h4ndshake.sme.mkv

Mr.Robot.S02E08.succ3ss0r.p12.mkv

Mr.Robot.S02E09.init_5.fve.mkv

Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv

Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv

Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv

See a demo on regex101.com.

answered Nov 18 '18 at 21:00

Jan

24.8k52448

Do it all in one step:

.epsd+.d+[-_.](.+?)(?:.720p.+).(w+)$

Broken down, this reads:

.epsd+.d+ # ".eps", followed by digits, a dot and other digits

[-_.]         # one of -, _ or .

(.+?)         # anything else lazily afterwards

(?:.720p.+)  # until .720p is found (might need some tweaking)

.            # a dot

(w+)$        # some word characters (aka the file extension) at the end

This needs to be replaced by .1.2 to get your desired format in the end.

Everything in Python:

import re



filenames = """

Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv

Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv

"""



rx = re.compile(r'.epsd+.d+[-_.](.+?)(?:.720p.+).(w+)$', re.M)



filenames = rx.sub(r".1.2", filenames)

print(filenames)

Which yields

Mr.Robot.S02E01.unm4sk-pt1.tc.mkv

Mr.Robot.S02E02.unm4sk-pt2.tc.mkv

Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv

Mr.Robot.S02E04.init_1.asec.mkv

Mr.Robot.S02E05.logic-b0mb.hc.mkv

Mr.Robot.S02E06.m4ster-s1ave.aes.mkv

Mr.Robot.S02E07.h4ndshake.sme.mkv

Mr.Robot.S02E08.succ3ss0r.p12.mkv

Mr.Robot.S02E09.init_5.fve.mkv

Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv

Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv

Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv

See a demo on regex101.com.

answered Nov 18 '18 at 21:00

Jan

24.8k52448

answered Nov 18 '18 at 21:00

Jan

24.8k52448

answered Nov 18 '18 at 21:00

Jan

24.8k52448

answered Nov 18 '18 at 21:00

Jan

24.8k52448

you're a god man!

– Gustavo Barros
Nov 20 '18 at 14:32

add a comment |

you're a god man!

– Gustavo Barros
Nov 20 '18 at 14:32

you're a god man!

– Gustavo Barros
Nov 20 '18 at 14:32

add a comment |

Firstly import the regex module of Python:

import re

Then use this to replace from "r.Robot.S02E01.eps2.0_unm4sk-pt1.t" :

ep_name = re.sub(r"eps2.d{1,2}(.|_)","",episode_name)

use ep_name in loop and pass episode name to episode_name one by one and then print ep_name.

Output will be like:

r.Robot.S02E01.unm4sk-pt1.t

edited Nov 18 '18 at 18:04

user6910411

33.9k979101

answered Nov 18 '18 at 15:14

Nagar16

112

add a comment |

Firstly import the regex module of Python:

import re

Then use this to replace from "r.Robot.S02E01.eps2.0_unm4sk-pt1.t" :

ep_name = re.sub(r"eps2.d{1,2}(.|_)","",episode_name)

use ep_name in loop and pass episode name to episode_name one by one and then print ep_name.

Output will be like:

r.Robot.S02E01.unm4sk-pt1.t

edited Nov 18 '18 at 18:04

user6910411

33.9k979101

answered Nov 18 '18 at 15:14

Nagar16

112

add a comment |

Firstly import the regex module of Python:

import re

Then use this to replace from "r.Robot.S02E01.eps2.0_unm4sk-pt1.t" :

ep_name = re.sub(r"eps2.d{1,2}(.|_)","",episode_name)

use ep_name in loop and pass episode name to episode_name one by one and then print ep_name.

Output will be like:

r.Robot.S02E01.unm4sk-pt1.t

edited Nov 18 '18 at 18:04

user6910411

33.9k979101

answered Nov 18 '18 at 15:14

Nagar16

112

Firstly import the regex module of Python:

import re

Then use this to replace from "r.Robot.S02E01.eps2.0_unm4sk-pt1.t" :

ep_name = re.sub(r"eps2.d{1,2}(.|_)","",episode_name)

use ep_name in loop and pass episode name to episode_name one by one and then print ep_name.

Output will be like:

r.Robot.S02E01.unm4sk-pt1.t

edited Nov 18 '18 at 18:04

user6910411

33.9k979101

answered Nov 18 '18 at 15:14

Nagar16

112

edited Nov 18 '18 at 18:04

user6910411

33.9k979101

edited Nov 18 '18 at 18:04

user6910411

33.9k979101

edited Nov 18 '18 at 18:04

user6910411

33.9k979101

answered Nov 18 '18 at 15:14

Nagar16

112

answered Nov 18 '18 at 15:14

Nagar16

112

answered Nov 18 '18 at 15:14

Nagar16

112

add a comment |

I'm not sure if I understand correctly, I don't know the series hence nor do I the titles. But do you really need re?

for f in files:

    print(f[23:-35].split('.')[0])

results in

unm4sk-pt1

unm4sk-pt2

k3rnel-pan1c                                                

init_1                                                      

logic-b0mb                                                  

m4ster-s1ave                                                

h4ndshake                                                   

succ3ss0r                                                  

init_5                                                      

h1dden-pr0cess                                              

pyth0n-pt1                                                  

pyth0n-pt2

Edit:

I still don't see an actual target format definition in your post, but just in case that @Jan is right, here's the re-less solution for that, too:

for f in files:

    print(f[:16] + '.'.join(f[23:].split('.')[:2]) + '.mkv')



Mr.Robot.S02E01.unm4sk-pt1.tc.mkv

Mr.Robot.S02E02.unm4sk-pt2.tc.mkv

Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv

Mr.Robot.S02E04.init_1.asec.mkv

Mr.Robot.S02E05.logic-b0mb.hc.mkv

Mr.Robot.S02E06.m4ster-s1ave.aes.mkv

Mr.Robot.S02E07.h4ndshake.sme.mkv

Mr.Robot.S02E08.succ3ss0r.p12.mkv

Mr.Robot.S02E09.init_5.fve.mkv

Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv

Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv

Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv

edited Nov 18 '18 at 23:06

answered Nov 18 '18 at 15:22

SpghttCd

4,5072313

yes, because the titles are mixed with the episodes number, and some of them have "." before the title and others don't. But your answear alongside the other one did manage to do it!

– Gustavo Barros
Nov 18 '18 at 15:34

Sorry, I still don't get it. Is your list of filenames representative? So yes, there are some with a period before and others without. But afaics in any case you'll end up with a string directly starting with an episode name simply by cutting the first 23 characters away.

– SpghttCd
Nov 18 '18 at 16:31

add a comment |

I'm not sure if I understand correctly, I don't know the series hence nor do I the titles. But do you really need re?

for f in files:

    print(f[23:-35].split('.')[0])

results in

unm4sk-pt1

unm4sk-pt2

k3rnel-pan1c                                                

init_1                                                      

logic-b0mb                                                  

m4ster-s1ave                                                

h4ndshake                                                   

succ3ss0r                                                  

init_5                                                      

h1dden-pr0cess                                              

pyth0n-pt1                                                  

pyth0n-pt2

Edit:

I still don't see an actual target format definition in your post, but just in case that @Jan is right, here's the re-less solution for that, too:

for f in files:

    print(f[:16] + '.'.join(f[23:].split('.')[:2]) + '.mkv')



Mr.Robot.S02E01.unm4sk-pt1.tc.mkv

Mr.Robot.S02E02.unm4sk-pt2.tc.mkv

Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv

Mr.Robot.S02E04.init_1.asec.mkv

Mr.Robot.S02E05.logic-b0mb.hc.mkv

Mr.Robot.S02E06.m4ster-s1ave.aes.mkv

Mr.Robot.S02E07.h4ndshake.sme.mkv

Mr.Robot.S02E08.succ3ss0r.p12.mkv

Mr.Robot.S02E09.init_5.fve.mkv

Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv

Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv

Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv

edited Nov 18 '18 at 23:06

answered Nov 18 '18 at 15:22

SpghttCd

4,5072313

yes, because the titles are mixed with the episodes number, and some of them have "." before the title and others don't. But your answear alongside the other one did manage to do it!

– Gustavo Barros
Nov 18 '18 at 15:34

Sorry, I still don't get it. Is your list of filenames representative? So yes, there are some with a period before and others without. But afaics in any case you'll end up with a string directly starting with an episode name simply by cutting the first 23 characters away.

– SpghttCd
Nov 18 '18 at 16:31

add a comment |

I'm not sure if I understand correctly, I don't know the series hence nor do I the titles. But do you really need re?

for f in files:

    print(f[23:-35].split('.')[0])

results in

unm4sk-pt1

unm4sk-pt2

k3rnel-pan1c                                                

init_1                                                      

logic-b0mb                                                  

m4ster-s1ave                                                

h4ndshake                                                   

succ3ss0r                                                  

init_5                                                      

h1dden-pr0cess                                              

pyth0n-pt1                                                  

pyth0n-pt2

Edit:

I still don't see an actual target format definition in your post, but just in case that @Jan is right, here's the re-less solution for that, too:

for f in files:

    print(f[:16] + '.'.join(f[23:].split('.')[:2]) + '.mkv')



Mr.Robot.S02E01.unm4sk-pt1.tc.mkv

Mr.Robot.S02E02.unm4sk-pt2.tc.mkv

Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv

Mr.Robot.S02E04.init_1.asec.mkv

Mr.Robot.S02E05.logic-b0mb.hc.mkv

Mr.Robot.S02E06.m4ster-s1ave.aes.mkv

Mr.Robot.S02E07.h4ndshake.sme.mkv

Mr.Robot.S02E08.succ3ss0r.p12.mkv

Mr.Robot.S02E09.init_5.fve.mkv

Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv

Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv

Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv
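If the goal is to actually rename the files on disk, the same slicing could be wrapped as below. This is a sketch under the same fixed-layout assumption; `short_name` and `rename_all` are hypothetical helpers, not part of the question's class, and the dry-run default only prints the planned renames so they can be checked first.

```python
import os


def short_name(path):
    """Build 'Mr.Robot.SxxEyy.<title>.<tag>.mkv', keeping the original folder."""
    folder, name = os.path.split(path)
    # First 16 characters: "Mr.Robot.SxxEyy."; from index 23 on: "<title>.<tag>...."
    new_name = name[:16] + '.'.join(name[23:].split('.')[:2]) + '.mkv'
    return os.path.join(folder, new_name)


def rename_all(paths, dry_run=True):
    """Rename every path; with dry_run=True only show what would happen."""
    for old in paths:
        new = short_name(old)
        if dry_run:
            print(old, '->', new)
        else:
            os.rename(old, new)


if __name__ == "__main__":
    rename_all(["Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv"])
```

Calling `rename_all(files, dry_run=False)` would then perform the renames, where `files` is the list obtained from glob.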

edited Nov 18 '18 at 23:06

answered Nov 18 '18 at 15:22

SpghttCd

