Find missing filenames in a sequence of numbers stored in a list
I have a list of timestamp-based filenames (file_YYYYMMDD_HHMMSS.csv) like these:
[..., file_20181105_110001.csv, file_20181105_120002.csv, file_20181105_130002.csv, file_20181105_140002.csv, file_20181105_150003.csv, file_20181105_160002.csv, file_20181105_170002.csv, file_20181105_200002.csv,
file_20181105_210002.csv, file_20181106_010002.csv, file_20181106_020002.csv, file_20181106_030002.csv...]
So here there are files dated 2018-11-05 (Nov 5, 2018) for hours 11, 12, 13, 14, 15, 16, 17, 20 and 21.
I want to print only the filenames for hours 18 and 19, because they are missing. The valid hour range is 1 to 23, so for a given day (here 2018-11-05) I want to print the filenames for every hour in that range that is not present.
python
asked Nov 9 at 19:11 by Atihska (edited Nov 9 at 19:36)
One way to do it is to iterate through both inputs (the timestamps you want and the filenames) together, in order. For that, you need to sort the list of filenames and have a sorted list of all the timestamps you want; you can precompute that second list or generate it iteratively. Then iterate through your list of timestamps and check whether the file exists. If it does, (do something) and advance both inputs. If there is no filename for that timestamp, (do something for the missing case) and advance only the timestamp input.
– wendelbsilva
Nov 9 at 19:20
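The approach in this comment is essentially a two-pointer merge of two sorted sequences. Below is a minimal sketch of it, assuming the file_YYYYMMDD_HHMMSS.csv naming from the question and the 1 to 23 hour range for a single day; `find_missing_two_pointer` is a hypothetical name. Because a missing file's seconds are unknown, it reports only the `file_YYYYMMDD_HH` prefix.

```python
def find_missing_two_pointer(filenames, date, hours=range(1, 24)):
    """Walk the sorted filenames for `date` and the expected hours in lockstep.

    Returns the 'file_YYYYMMDD_HH' prefixes that have no matching file.
    """
    # Keep only this day's files, sorted so they line up with the hours.
    day_files = sorted(f for f in filenames if f.startswith("file_" + date + "_"))
    missing = []
    i = 0  # pointer into day_files
    for h in hours:  # pointer into the expected hours
        prefix = "file_{}_{:02d}".format(date, h)
        # Skip files for hours below the current one (e.g. hour 00, or a
        # second file within an hour that was already matched).
        while i < len(day_files) and day_files[i] < prefix:
            i += 1
        if i < len(day_files) and day_files[i].startswith(prefix):
            i += 1  # file exists: advance both inputs
        else:
            missing.append(prefix)  # no file: advance only the hour
    return missing


# Usage with the filenames shown in the question.
files = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_130002.csv',
         'file_20181105_140002.csv', 'file_20181105_150003.csv', 'file_20181105_160002.csv',
         'file_20181105_170002.csv', 'file_20181105_200002.csv', 'file_20181105_210002.csv',
         'file_20181106_010002.csv', 'file_20181106_020002.csv', 'file_20181106_030002.csv']
print(find_missing_two_pointer(files, "20181105"))
```

Sorting is what makes the single forward pass valid; with a fixed range of 23 hours, a set lookup would work just as well.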
2 Answers
One solution is to use a set comprehension to extract the hours present. If I understand your requirement, you can then calculate the minimum and maximum hours and take the difference between a set built from a range and the set of hours present:
L = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_130002.csv',
     'file_20181105_140002.csv', 'file_20181105_150003.csv', 'file_20181105_160002.csv',
     'file_20181105_170002.csv', 'file_20181105_200002.csv', 'file_20181105_210002.csv']
# The hour is the first two digits after the last underscore, e.g. '110001.csv' -> 11
present = {int(i.rsplit('_', 1)[-1][:2]) for i in L}
min_time, max_time = min(present), max(present)
# range() excludes max_time, which is fine because max_time is always present
res = set(range(min_time, max_time)) - present  # {18, 19}
You can then build your filenames from the missing times. I'll leave this as an exercise [hint: list comprehension].
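One way to complete that exercise, as a sketch: it keeps the same set-difference idea but groups hours by date and compares against the fixed 1 to 23 range stated in the question instead of the min/max hours, so gaps at the start or end of a day are reported too. It assumes the `file_` prefix contains no further underscores, and `missing_by_date` is a hypothetical name. Since a missing file's seconds are unknown, only the `file_YYYYMMDD_HH` prefix is built.

```python
from collections import defaultdict


def missing_by_date(filenames, hours=range(1, 24)):
    """Map each date found in `filenames` to the prefixes of its missing hours."""
    present = defaultdict(set)  # 'YYYYMMDD' -> hours that have a file
    for name in filenames:
        # 'file_20181105_110001.csv' -> ['file', '20181105', '110001']
        _, date, time_part = name.rsplit(".", 1)[0].split("_")
        present[date].add(int(time_part[:2]))
    # The hinted list comprehension: one prefix per expected hour not seen.
    return {
        date: ["file_{}_{:02d}".format(date, h) for h in hours if h not in seen]
        for date, seen in sorted(present.items())
    }


files = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_130002.csv',
         'file_20181105_140002.csv', 'file_20181105_150003.csv', 'file_20181105_160002.csv',
         'file_20181105_170002.csv', 'file_20181105_200002.csv', 'file_20181105_210002.csv',
         'file_20181106_010002.csv', 'file_20181106_020002.csv', 'file_20181106_030002.csv']
result = missing_by_date(files)
print(result["20181105"])
```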
answered Nov 9 at 19:22 by jpp
Another solution, in case you also need to check for files missing at the beginning or end of the list (e.g. hours 0-10, 22 and 23):
filenames = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']  # must be sorted
pos = 0  # index of the next filename not yet matched
for h in range(0, 24):  # hours 0..23 inclusive
    n = "file_20181105_" + str(h).zfill(2)  # expected prefix, e.g. "file_20181105_11"
    if pos < len(filenames) and filenames[pos].startswith(n):
        print("Found", h)
        pos += 1
    else:
        print("Not found", h)
Of course, you can build n for the day you want to check in several different ways. If needed, you can add another loop to go through multiple days.
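For example, one alternative (a sketch, assuming the same file_YYYYMMDD_HH prefix) is to let `datetime.strftime` do the zero-padding instead of calling `zfill` by hand; `hour_prefix` is a hypothetical helper name.

```python
from datetime import date, datetime


def hour_prefix(day, hour):
    """Build the 'file_YYYYMMDD_HH' prefix for a date and an hour (0-23)."""
    # datetime() rejects invalid hours, and strftime zero-pads every field.
    return datetime(day.year, day.month, day.day, hour).strftime("file_%Y%m%d_%H")


# The same prefix the loop builds with string concatenation and zfill.
print(hour_prefix(date(2018, 11, 5), 7))  # file_20181105_07
```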
Edit:
If we want to check more than one day, we can loop through the days and check the files/hours for each one.
IMHO, I would make quite a few changes to the following code depending on the use case, the number of days and filenames, preferences, code style, etc.
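filenames = ['file_20181104_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']  # must be sorted
pos = 0  # index of the next filename not yet matched
missing = []  # (day, hour) pairs with no matching file
for d in (4, 5):  # days of November 2018 to check
    for h in range(0, 24):  # hours 0..23 inclusive
        n = "file_201811" + str(d).zfill(2) + "_" + str(h).zfill(2)
        if pos < len(filenames) and filenames[pos].startswith(n):
            pos += 1
            print("Found", d, h)
        else:
            missing.append((d, h))
            print("Not Found", d, h)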
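To avoid hard-coding the days (e.g. when a list covers Nov 5 and Nov 6, as in the question), here is a sketch that derives the first and last date from the filenames themselves, steps through every day in between with `datetime.timedelta` (so month and year boundaries work), and uses a set lookup instead of the sorted `pos` pointer. It assumes the file_YYYYMMDD_HHMMSS.csv naming and checks hours 0 to 23 by default; `missing_hours` is a hypothetical name, and passing `hours=range(1, 24)` would match the range in the question.

```python
from datetime import date, timedelta


def missing_hours(filenames, hours=range(0, 24)):
    """List 'file_YYYYMMDD_HH' prefixes with no file, for every day from the
    earliest to the latest date found in `filenames`."""
    present = set()  # (date, hour) pairs that have at least one file
    for name in filenames:
        _, day_str, time_str = name.split("_")  # 'file', 'YYYYMMDD', 'HHMMSS.csv'
        day = date(int(day_str[:4]), int(day_str[4:6]), int(day_str[6:8]))
        present.add((day, int(time_str[:2])))
    if not present:
        return []
    missing = []
    day = min(d for d, _ in present)
    last = max(d for d, _ in present)
    while day <= last:
        for h in hours:
            if (day, h) not in present:
                missing.append("file_{:%Y%m%d}_{:02d}".format(day, h))
        day += timedelta(days=1)  # date arithmetic handles month/year ends
    return missing


files = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_130002.csv',
         'file_20181105_140002.csv', 'file_20181105_150003.csv', 'file_20181105_160002.csv',
         'file_20181105_170002.csv', 'file_20181105_200002.csv', 'file_20181105_210002.csv',
         'file_20181106_010002.csv', 'file_20181106_020002.csv', 'file_20181106_030002.csv']
result = missing_hours(files)
print(len(result))  # 36
```

Unlike the pointer-based loop, this version does not require the input list to be sorted.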
answered Nov 9 at 19:32 by wendelbsilva (edited Nov 9 at 20:06)
Thank you for your reply. When there is more than one date in the same list (assume it is sorted), do I need another for loop on top of this? How would that work? In my example, I have dates Nov 5 and Nov 6.
– Atihska
Nov 9 at 19:44
Yes, you will need an outer loop over the dates you want to check. I will update the answer with that outer loop.
– wendelbsilva
Nov 9 at 19:48