Python regular expression, ignoring characters until some character is matched a number of times



























I'm renaming a batch of files I downloaded from a torrent and want to extract each episode's name, so I figured regex would do the trick. I'm fairly new to regex, so I'd appreciate the help. This is what I could come up with:



I have a class that holds other renaming functions, so the function shown here is a method of that class. The class is initialized with the path to the files' directory, the expression to rename to, and the file extension.



I'm using glob to access all files with the extension ".mkv".



For debugging, I printed out all the file names:



Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv




def strip_ep_name(self):
    for i, f in enumerate(self.files):
        f_list = f.split("\\")
        name, ext = os.path.splitext(f_list[-1])
        ep_name = name.strip(r'(.*?)".720p.WEB-DL.x264-[MULVAcoded]"')
        print(ep_name)


The goal is to get the episode's name, with or without the episode number, because I can give the episode a new name later on.



The output is:



r.Robot.S02E01.eps2.0_unm4sk-pt1.t
r.Robot.S02E02.eps2.0_unm4sk-pt2.t
r.Robot.S02E03.eps2.1_k3rnel-pan1c.ks
r.Robot.S02E04.eps2.2_init_1.as
r.Robot.S02E05.eps2.3.logic-b0mb.h
r.Robot.S02E06.eps2.4.m4ster-s1ave.aes
r.Robot.S02E07.eps2.5_h4ndshake.sm
r.Robot.S02E08.eps2.6.succ3ss0r.p1
r.Robot.S02E09.eps2.7_init_5.fv
r.Robot.S02E10.eps2.8_h1dden-pr0cess.a
r.Robot.S02E11.eps2.9_pyth0n-pt1.p7z
r.Robot.S02E12.eps2.9_pyth0n-pt2.p7z


I wanted to strip the ".eps2.2"-style prefix that precedes each episode's name, but these prefixes don't follow a consistent pattern.



Now I don't know how to move on from here. Can anyone help?

















      python regex regex-group














      edited Nov 18 '18 at 21:00









      Jan











      asked Nov 18 '18 at 14:22









      Gustavo Barros
























          3 Answers
































          Do it all in one step:



          \.eps\d+\.\d+[-_.](.+?)(?:\.720p.+)\.(\w+)$


          Broken down, this reads:



          \.eps\d+\.\d+ # ".eps", followed by digits, a dot and other digits
          [-_.] # one of -, _ or .
          (.+?) # anything else lazily afterwards
          (?:\.720p.+) # until .720p is found (might need some tweaking)
          \. # a dot
          (\w+)$ # some word characters (aka the file extension) at the end


          This needs to be replaced by .\1.\2 to get your desired format in the end.
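To make the breakdown concrete, here is a minimal sketch that writes the same pattern with `re.VERBOSE` and inspects the two captured groups for one file name from the question. The names `EPISODE_RX` and `sample` are made up for this illustration, and the pattern assumes the backslash-escaped form (`\.`, `\d`, `\w`).

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part carries a comment.
EPISODE_RX = re.compile(r"""
    \.eps\d+\.\d+    # ".eps", digits, a literal dot, more digits
    [-_.]            # one separator: -, _ or .
    (.+?)            # group 1: the episode title, matched lazily
    (?:\.720p.+)     # everything from ".720p" onwards (not captured)
    \.(\w+)$         # group 2: the file extension at the end
""", re.VERBOSE)

sample = "Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv"

match = EPISODE_RX.search(sample)
title, extension = match.groups()
print(title)      # logic-b0mb.hc
print(extension)  # mkv

# The replacement string refers back to both groups.
renamed = EPISODE_RX.sub(r".\1.\2", sample)
print(renamed)    # Mr.Robot.S02E05.logic-b0mb.hc.mkv
```

Group 1 stops at the first ".720p" because it is lazy, which is why the resolution tag works as the end marker for the title.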




          Everything in Python:

          import re

          filenames = """
          Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv
          Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv
          """

          rx = re.compile(r'\.eps\d+\.\d+[-_.](.+?)(?:\.720p.+)\.(\w+)$', re.M)

          filenames = rx.sub(r".\1.\2", filenames)
          print(filenames)


          Which yields



          Mr.Robot.S02E01.unm4sk-pt1.tc.mkv
          Mr.Robot.S02E02.unm4sk-pt2.tc.mkv
          Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv
          Mr.Robot.S02E04.init_1.asec.mkv
          Mr.Robot.S02E05.logic-b0mb.hc.mkv
          Mr.Robot.S02E06.m4ster-s1ave.aes.mkv
          Mr.Robot.S02E07.h4ndshake.sme.mkv
          Mr.Robot.S02E08.succ3ss0r.p12.mkv
          Mr.Robot.S02E09.init_5.fve.mkv
          Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv
          Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv
          Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv


          See a demo on regex101.com.
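Since the question is about files on disk rather than one multi-line string, the same substitution could be applied per file name. The following is only a sketch of a dry run: `plan_renames` and the `downloads/` directory are hypothetical, and nothing is actually renamed.

```python
import re
from pathlib import PurePath

RX = re.compile(r"\.eps\d+\.\d+[-_.](.+?)(?:\.720p.+)\.(\w+)$")

def plan_renames(paths):
    """Return (old_path, new_path) pairs; names the pattern does not match are skipped."""
    plan = []
    for path in paths:
        path = PurePath(path)
        # subn also reports how many substitutions were made
        new_name, count = RX.subn(r".\1.\2", path.name)
        if count:
            plan.append((path, path.with_name(new_name)))
    return plan

# Hypothetical input; in the question this list would come from glob.
files = [
    "downloads/Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv",
    "downloads/notes.txt",
]

for old, new in plan_renames(files):
    print(old.name, "->", new.name)
```

Once the printed plan looks right, each pair could be passed to `os.rename(old, new)`; keeping the planning step separate makes it easy to review the new names first.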






























          • you're a god man!

            – Gustavo Barros
            Nov 20 '18 at 14:32

































          First, import Python's regex module:



          import re


          Then use this to strip the prefix from a name such as "r.Robot.S02E01.eps2.0_unm4sk-pt1.t":



          ep_name = re.sub(r"eps2\.\d{1,2}(\.|_)", "", episode_name)


          Run this in a loop, passing each episode name in as episode_name, and print ep_name.



          The output will look like:




          r.Robot.S02E01.unm4sk-pt1.t
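The loop described above might look like the following sketch. `episode_names` is a hypothetical list holding two of the names printed in the question's output, and the pattern is written with escaped dots and `\d`, which is an assumption about the intended regex.

```python
import re

# Hypothetical input: names as printed in the question's output.
episode_names = [
    "r.Robot.S02E01.eps2.0_unm4sk-pt1.t",
    "r.Robot.S02E05.eps2.3.logic-b0mb.h",
]

# "eps2.", one or two digits, then a "." or "_" separator.
prefix_rx = re.compile(r"eps2\.\d{1,2}(\.|_)")

cleaned = []
for episode_name in episode_names:
    ep_name = prefix_rx.sub("", episode_name)
    cleaned.append(ep_name)
    print(ep_name)
```

Note that this only removes the "eps2.N" marker; the mangled start and end of each name come from the earlier `strip` call and are not repaired here.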






















































            I'm not sure I understand correctly; I don't know the series, so I don't know the titles either. But do you really need re?



            for f in files:
                print(f[23:-35].split('.')[0])


            results in



            unm4sk-pt1
            unm4sk-pt2
            k3rnel-pan1c
            init_1
            logic-b0mb
            m4ster-s1ave
            h4ndshake
            succ3ss0r
            init_5
            h1dden-pr0cess
            pyth0n-pt1
            pyth0n-pt2




            Edit:



            I still don't see an actual target format definition in your post, but in case @Jan is right, here's the re-less solution for that, too:



            for f in files:
                print(f[:16] + '.'.join(f[23:].split('.')[:2]) + '.mkv')

            Mr.Robot.S02E01.unm4sk-pt1.tc.mkv
            Mr.Robot.S02E02.unm4sk-pt2.tc.mkv
            Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv
            Mr.Robot.S02E04.init_1.asec.mkv
            Mr.Robot.S02E05.logic-b0mb.hc.mkv
            Mr.Robot.S02E06.m4ster-s1ave.aes.mkv
            Mr.Robot.S02E07.h4ndshake.sme.mkv
            Mr.Robot.S02E08.succ3ss0r.p12.mkv
            Mr.Robot.S02E09.init_5.fve.mkv
            Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv
            Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv
            Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv
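The slices above rely on fixed offsets: the first 16 characters are taken to be the show name and episode code, and the next 7 the "epsN.N" marker plus its separator. A small sketch of how one might check that assumption before trusting the slices; the helper `fits_fixed_layout` and the third file name are made up for illustration.

```python
def fits_fixed_layout(filename):
    """Check the layout that the slices f[:16] and f[23:] assume."""
    marker = filename[16:23]          # expected to look like "eps2.0_" or "eps2.3."
    return (
        len(filename) > 23
        and marker.startswith("eps")
        and marker[-1] in "-_."       # separator right before the title
    )

files = [
    "Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv",
    "Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv",
    "Some.Other.Show.S01E01.pilot.720p.mkv",  # hypothetical name with a different layout
]

for f in files:
    if fits_fixed_layout(f):
        print(f[:16] + '.'.join(f[23:].split('.')[:2]) + '.mkv')
    else:
        print("skipped:", f)
```

For the twelve names in the question the check passes every time, which is why plain slicing is enough there; a batch with a longer show name or a two-digit "eps" number would need the regex approach instead.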































            • Yes, because the titles are mixed with the episode numbers, and some of them have a "." before the title while others don't. But your answer, alongside the other one, did manage to do it!

              – Gustavo Barros
              Nov 18 '18 at 15:34











            • Sorry, I still don't get it. Is your list of filenames representative? So yes, there are some with a period before and others without. But afaics in any case you'll end up with a string directly starting with an episode name simply by cutting the first 23 characters away.

              – SpghttCd
              Nov 18 '18 at 16:31





















            answered Nov 18 '18 at 21:00









            Jan









                edited Nov 18 '18 at 18:04 by user6910411

                answered Nov 18 '18 at 15:14 by Nagar16























                    0














                    I'm not sure I understand correctly, and I don't know the series, so I don't know the titles either. But do you really need re?



                    # files: bare file names, without the directory part
                    for f in files:
                        print(f[23:-35].split('.')[0])


                    results in



                    unm4sk-pt1
                    unm4sk-pt2
                    k3rnel-pan1c
                    init_1
                    logic-b0mb
                    m4ster-s1ave
                    h4ndshake
                    succ3ss0r
                    init_5
                    h1dden-pr0cess
                    pyth0n-pt1
                    pyth0n-pt2




                    Edit:



                    I still don't see an actual target format definition in your post, but in case @Jan is right, here's the re-less solution for that, too:



                    for f in files:
                        print(f[:16] + '.'.join(f[23:].split('.')[:2]) + '.mkv')

                    Mr.Robot.S02E01.unm4sk-pt1.tc.mkv
                    Mr.Robot.S02E02.unm4sk-pt2.tc.mkv
                    Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv
                    Mr.Robot.S02E04.init_1.asec.mkv
                    Mr.Robot.S02E05.logic-b0mb.hc.mkv
                    Mr.Robot.S02E06.m4ster-s1ave.aes.mkv
                    Mr.Robot.S02E07.h4ndshake.sme.mkv
                    Mr.Robot.S02E08.succ3ss0r.p12.mkv
                    Mr.Robot.S02E09.init_5.fve.mkv
                    Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv
                    Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv
                    Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv
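The slices depend on fixed offsets: 16 characters for "Mr.Robot.SxxEyy." and 7 more for a tag like "eps2.0_", so they would break on a two-digit tag such as "eps2.10_". A minimal sketch that spells those assumptions out (the helper `short_name` and both constants are hypothetical names, not part of the answer above):

```python
PREFIX_LEN = 16   # len("Mr.Robot.S02E01."), assumed constant for every file
TITLE_START = 23  # PREFIX_LEN + len("eps2.0_"), assumes a one-digit tag number


def short_name(filename):
    """Return 'Mr.Robot.SxxEyy.<title>.<tag>.mkv' using fixed offsets."""
    # Fail loudly if the name does not have the layout the offsets assume.
    if not filename[PREFIX_LEN:].startswith("eps2."):
        raise ValueError("unexpected file name layout: " + filename)
    # Keep only the title and the short tag that follows it.
    title_and_tag = filename[TITLE_START:].split(".")[:2]
    return filename[:PREFIX_LEN] + ".".join(title_and_tag) + ".mkv"


print(short_name("Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv"))
```

If the layout can vary, the regex answer is likely the safer choice; the slicing version is only reasonable while every name has the same prefix width.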































                    • Yes, because the titles are mixed with the episode numbers, and some of them have "." before the title and others don't. But your answer, alongside the other one, did manage to do it!

                      – Gustavo Barros
                      Nov 18 '18 at 15:34











                    • Sorry, I still don't get it. Is your list of filenames representative? So yes, there are some with a period before and others without. But afaics in any case you'll end up with a string directly starting with an episode name simply by cutting the first 23 characters away.

                      – SpghttCd
                      Nov 18 '18 at 16:31
















                    edited Nov 18 '18 at 23:06

                    answered Nov 18 '18 at 15:22 by SpghttCd






























