Find missing filenames in a sequence of numbers stored in a list











I have a list of strings containing timestamp-based filenames (date_millisecondtime.csv), like these:

    [..., file_20181105_110001.csv, file_20181105_120002.csv, file_20181105_130002.csv,
     file_20181105_140002.csv, file_20181105_150003.csv, file_20181105_160002.csv,
     file_20181105_170002.csv, file_20181105_200002.csv, file_20181105_210002.csv,
     file_20181106_010002.csv, file_20181106_020002.csv, file_20181106_030002.csv, ...]

So for the date 2018-11-05 (Nov 5, 2018) there are files with hours 11, 12, 13, 14, 15, 16, 17, 20 and 21.

I want to print only the filenames for hours 18 and 19, since they are missing. The valid hour range is 1 to 23, so if an hour in this range has no file for a given day (here, 2018-11-05), I want to print the filename for that missing hour.







python














asked Nov 9 at 19:11 by Atihska, edited Nov 9 at 19:36












  • One way to do it is to iterate through both sequences (the timestamps you want and the filenames) together. For that you need to sort the list of filenames and have a sorted list of all the timestamps you expect. For the second input, you can pre-compute the list or generate it iteratively. Then iterate through the expected timestamps and check whether a file exists for each one. If the file exists, (do something) and advance both inputs. If no filename exists for that timestamp, (do something for the missing case) and advance only the timestamp input.
    – wendelbsilva
    Nov 9 at 19:20
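
The walk described in this comment could be sketched roughly as follows. This is only an illustration: the function name missing_hours, the file_<date>_<HH> prefix layout, and the default 1 to 23 hour range are assumptions taken from the question's example, not something the comment specifies.

```python
def missing_hours(filenames, date, hours=range(1, 24)):
    """Return the hours in `hours` that have no file for `date`.

    Assumes names look like file_<date>_<HHMMSS>.csv (layout inferred from
    the question) and that `hours` is given in ascending order.
    """
    prefix = "file_" + date + "_"
    # Sort so that filenames and expected hours advance in the same order.
    files = sorted(f for f in filenames if f.startswith(prefix))
    missing = []
    pos = 0  # index of the next unmatched filename
    for h in hours:
        expected = prefix + str(h).zfill(2)
        # Skip files that sort before the expected hour
        # (e.g. an hour outside the range, or a second file for the same hour).
        while pos < len(files) and files[pos][:len(expected)] < expected:
            pos += 1
        if pos < len(files) and files[pos].startswith(expected):
            pos += 1           # file exists: advance both inputs
        else:
            missing.append(h)  # no file: advance only the expected hour
    return missing


names = ["file_20181105_120002.csv", "file_20181105_110001.csv",
         "file_20181105_150003.csv"]
print(missing_hours(names, "20181105", hours=range(11, 16)))  # [13, 14]
```

Because both inputs are sorted, each filename is visited once, so the walk stays linear after the initial sort.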






























2 Answers

















One solution is to use a set comprehension to extract the hours present. If I understand your requirement, you can then calculate the min and max hours and take the difference from a set derived from a range:

    L = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_130002.csv',
         'file_20181105_140002.csv', 'file_20181105_150003.csv', 'file_20181105_160002.csv',
         'file_20181105_170002.csv', 'file_20181105_200002.csv', 'file_20181105_210002.csv']

    # the hour is the first two digits after the last underscore
    present = {int(i.rsplit('_', 1)[-1][:2]) for i in L}

    min_time, max_time = min(present), max(present)

    # max_time itself is present, so the half-open range is enough
    res = set(range(min_time, max_time)) - present  # {18, 19}

You can then build your filenames from the missing hours. I'll leave this as an exercise [hint: list comprehension].
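
As one possible take on that exercise: the minutes and seconds of a missing file cannot be known, so the sketch below builds only the file_<date>_<HH> prefix for each missing hour. The date string and the names missing and missing_prefixes are assumptions for illustration.

```python
L = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_130002.csv',
     'file_20181105_140002.csv', 'file_20181105_150003.csv', 'file_20181105_160002.csv',
     'file_20181105_170002.csv', 'file_20181105_200002.csv', 'file_20181105_210002.csv']

# Hours that appear in the list (first two digits of the time part).
present = {int(name.rsplit('_', 1)[-1][:2]) for name in L}

# Hours between the smallest and largest present hour that have no file.
missing = sorted(set(range(min(present), max(present))) - present)

# Assumed date; the minutes and seconds of a missing file are unknown,
# so only the prefix up to the hour is built.
date = '20181105'
missing_prefixes = ['file_{}_{:02d}'.format(date, h) for h in missing]

print(missing_prefixes)  # ['file_20181105_18', 'file_20181105_19']
```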

















answered Nov 9 at 19:22 by jpp
























Another solution, in case you also need to check for files missing at the beginning or end of the list (e.g. hours 0-10, 22 and 23):

    # filenames must be sorted for this single pass to work
    filenames = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']
    pos = 0  # index of the next filename to match
    for h in range(24):  # hours 0..23
        n = "file_20181105_" + str(h).zfill(2)
        if pos < len(filenames) and filenames[pos].startswith(n):
            print("Found", h)
            pos += 1
        else:
            print("Not found", h)

Of course, you can build n for the day you want to check in several different ways. If needed, you can add another loop to go through the days.

Edit:

If we want to check more than one day, we can loop through the days, checking each day's files and hours.

IMHO, I would suggest many changes to the following code depending on the use case, number of days, number of filenames, preferences, code style, etc.

    filenames = ['file_20181104_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']
    pos = 0
    missing = []  # (day, hour) pairs with no matching file
    for d in (4, 5):
        for h in range(24):
            n = "file_201811" + str(d).zfill(2) + "_" + str(h).zfill(2)
            if pos < len(filenames) and filenames[pos].startswith(n):
                pos += 1
                print("Found", d, h)
            else:
                missing.append((d, h))
                print("Not Found", d, h)
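
For lists that span several days, one alternative to hard-coding the days is to read the date out of each filename and group the hours by date. The following is a rough sketch, assuming every name follows the file_<YYYYMMDD>_<HHMMSS>.csv pattern and that hours 1 to 23 are expected on each day (both taken from the question). The function name missing_by_date is made up for illustration.

```python
from collections import defaultdict


def missing_by_date(filenames, hours=range(1, 24)):
    """Map each date found in `filenames` to its sorted list of missing hours."""
    present = defaultdict(set)
    for name in filenames:
        # 'file_20181105_110001.csv' -> ['file', '20181105', '110001.csv']
        _, date, time_part = name.split('_')
        present[date].add(int(time_part[:2]))
    return {date: sorted(set(hours) - seen)
            for date, seen in sorted(present.items())}


filenames = ['file_20181104_110001.csv', 'file_20181105_120002.csv',
             'file_20181105_150003.csv']
for date, hrs in missing_by_date(filenames, hours=range(11, 16)).items():
    for h in hrs:
        print('Not found', 'file_{}_{:02d}'.format(date, h))
```

This version does not need the input to be sorted, but it only reports on dates that have at least one file. A day with no files at all would have to be added to the result separately.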














answered Nov 9 at 19:32 by wendelbsilva, edited Nov 9 at 20:06












            • Thank you for your reply. When there is more than one date in the same list (assume it is sorted), would I add another for loop on top of this? How would that work? In my example, I have dates Nov 5 and Nov 6.
              – Atihska
              Nov 9 at 19:44










            • Yes, you will need an outer loop over the dates you want to check. I will update the answer with that outer loop.
              – wendelbsilva
              Nov 9 at 19:48



































