Recommended Algorithim for time based clustering
I am not very knowledgeable on time based clustering and wondering if any algorithms are well suited for my use case.
I have a set of exertion data (range from 0-500) and I want to cluster them along time intervals.
My problem is that I want to find point the points of time where there is major exertion differences on the time interval. I will know exactly how many grouping their should be (e.g. 5 separate clusters) but wont know where one ends and the next one starts.
Is there a good algorithm to apply in this case? I was looking at K-Means but it appears to be very good at clustering disregarding the time and I am more looking for the boundaries looking at exertion data.
algorithm k-means data-science hierarchical-clustering
add a comment |
I am not very knowledgeable on time based clustering and wondering if any algorithms are well suited for my use case.
I have a set of exertion data (range from 0-500) and I want to cluster them along time intervals.
My problem is that I want to find point the points of time where there is major exertion differences on the time interval. I will know exactly how many grouping their should be (e.g. 5 separate clusters) but wont know where one ends and the next one starts.
Is there a good algorithm to apply in this case? I was looking at K-Means but it appears to be very good at clustering disregarding the time and I am more looking for the boundaries looking at exertion data.
algorithm k-means data-science hierarchical-clustering
add a comment |
I am not very knowledgeable on time based clustering and wondering if any algorithms are well suited for my use case.
I have a set of exertion data (range from 0-500) and I want to cluster them along time intervals.
My problem is that I want to find point the points of time where there is major exertion differences on the time interval. I will know exactly how many grouping their should be (e.g. 5 separate clusters) but wont know where one ends and the next one starts.
Is there a good algorithm to apply in this case? I was looking at K-Means but it appears to be very good at clustering disregarding the time and I am more looking for the boundaries looking at exertion data.
algorithm k-means data-science hierarchical-clustering
I am not very knowledgeable on time based clustering and wondering if any algorithms are well suited for my use case.
I have a set of exertion data (range from 0-500) and I want to cluster them along time intervals.
My problem is that I want to find point the points of time where there is major exertion differences on the time interval. I will know exactly how many grouping their should be (e.g. 5 separate clusters) but wont know where one ends and the next one starts.
Is there a good algorithm to apply in this case? I was looking at K-Means but it appears to be very good at clustering disregarding the time and I am more looking for the boundaries looking at exertion data.
algorithm k-means data-science hierarchical-clustering
algorithm k-means data-science hierarchical-clustering
edited Nov 12 at 5:39
Kromster
4,77164886
4,77164886
asked Nov 11 at 22:25
mornindew
5363721
5363721
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
I think you could get good results from a dynamic program. For each interval [i, j)
, let C(i, j)
be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r)
be the minimum loss for up to k
clusters of elements [0, r)
, we have equations
L(1, r) = C(0, r)
L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).
If there are O(1)
values of k
needed, evaluating these equations with memoization takes O(n^2)
time and O(n)
space where n
is the number of samples.
A plausible first choice for C(i, j)
would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3)
time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s
from its greatest value to its least, so the overall algorithm would still be O(n^2)
.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53253857%2frecommended-algorithim-for-time-based-clustering%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I think you could get good results from a dynamic program. For each interval [i, j)
, let C(i, j)
be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r)
be the minimum loss for up to k
clusters of elements [0, r)
, we have equations
L(1, r) = C(0, r)
L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).
If there are O(1)
values of k
needed, evaluating these equations with memoization takes O(n^2)
time and O(n)
space where n
is the number of samples.
A plausible first choice for C(i, j)
would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3)
time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s
from its greatest value to its least, so the overall algorithm would still be O(n^2)
.
add a comment |
I think you could get good results from a dynamic program. For each interval [i, j)
, let C(i, j)
be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r)
be the minimum loss for up to k
clusters of elements [0, r)
, we have equations
L(1, r) = C(0, r)
L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).
If there are O(1)
values of k
needed, evaluating these equations with memoization takes O(n^2)
time and O(n)
space where n
is the number of samples.
A plausible first choice for C(i, j)
would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3)
time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s
from its greatest value to its least, so the overall algorithm would still be O(n^2)
.
add a comment |
I think you could get good results from a dynamic program. For each interval [i, j)
, let C(i, j)
be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r)
be the minimum loss for up to k
clusters of elements [0, r)
, we have equations
L(1, r) = C(0, r)
L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).
If there are O(1)
values of k
needed, evaluating these equations with memoization takes O(n^2)
time and O(n)
space where n
is the number of samples.
A plausible first choice for C(i, j)
would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3)
time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s
from its greatest value to its least, so the overall algorithm would still be O(n^2)
.
I think you could get good results from a dynamic program. For each interval [i, j)
, let C(i, j)
be a loss function that is lower when the interval values are more likely to be one cluster. Then letting L(k, r)
be the minimum loss for up to k
clusters of elements [0, r)
, we have equations
L(1, r) = C(0, r)
L(k, r), k > 1 = min over s in [0, r) of L(k-1, s) + C(s, r).
If there are O(1)
values of k
needed, evaluating these equations with memoization takes O(n^2)
time and O(n)
space where n
is the number of samples.
A plausible first choice for C(i, j)
would be the statistical variance of the samples in that interval. Naively, this requires Theta(n^3)
time to compute for each interval, but Welford's algorithm can be used to compute variance online if you iterate s
from its greatest value to its least, so the overall algorithm would still be O(n^2)
.
answered Nov 12 at 13:01
David Eisenstat
36.6k73373
36.6k73373
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53253857%2frecommended-algorithim-for-time-based-clustering%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown