How to calculate columns that have circular dependency in pandas dataframe?
I have a pandas dataframe like this:
                Tstamp   Token     LTP  Cum_bsdiffs  Cum_ltpdiffs  counts  Entry  Correl  Exit  ltpchange  ltpcumchange  ltppercumchange
0  2018-10-29 11:40:33  415745  138.40          NaN           NaN       0      0     NaN     0          0             0                0
1  2018-10-29 11:40:34  415745  138.40       -200.0          0.00       1      0     NaN     0          0             0                0
2  2018-10-29 11:40:34  415745  138.35      -1437.0         -0.05       2      0     NaN     0          0             0                0
3  2018-10-29 11:40:36  415745  138.35      -1337.0         -0.05       3      0     NaN     0          0             0                0
Now the columns Entry, Exit, ltpchange and ltpcumchange are interdependent as follows:
- Entry becomes "Buy" or "Sell" based on a condition depending on other columns. Otherwise it remains 0.
- As soon as Entry becomes non-zero, ltpchange starts recording the changes in subsequent values of LTP. Otherwise it remains 0.
- ltpcumchange takes the cumulative sum of ltpchange.
- As soon as ltpcumchange reaches a target value (in either direction), Exit becomes 1.
- Entry remains "Buy" or "Sell", carried over from its previous row, until Exit becomes 1, after which it reverts to 0.
I have used iterrows() to implement this logic, but it is extremely slow. My dataframe contains more than 2 million rows, and the loop processes only about 5 rows per second.
I tried using column-wise dataframe operations but failed to get the desired result. Can anyone help me out here?
python pandas dataframe
It seems that your dataframe changes based only on those 4 columns, so you can exclude the rest. Also, given the above example, what would the desired output look like? Should you need help editing your question, take a look at how to create a Minimal, Complete, and Verifiable example.
– zipa
Nov 16 '18 at 12:04
I guess you want to create a trading robot and you don't want to share your strategy. But if you do not share your code, we cannot optimize it. iterrows() is slow; there might be a way to avoid using it.
– Charles R
Nov 16 '18 at 12:43
@CharlesR Yes, you are right, but I have mentioned all the logic that I want to be 'vectorized'.
– Sagar Upadhyay
Nov 16 '18 at 13:20
Maybe try to use .loc as much as possible to filter the rows you want to modify at each step of your code. Also, numpy is good for vectorizing your code; try to use np.where, np.select, np.choose.
– Charles R
Nov 16 '18 at 13:38
iterrows() will be slow indeed. I recommend using numpy as well, and modifying the elements of the array as you move along its length.
– kon_u
Nov 16 '18 at 14:34
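The rules in the question form a small state machine, where each row depends on the state left by the previous row, so a plain column-wise expression is hard to write. One way to follow the numpy suggestion above is to keep the row-by-row logic but run it over NumPy arrays instead of iterrows(). The following is only a minimal sketch under stated assumptions: the entry condition is not given in the question, so a precomputed signal column (+1 for Buy, -1 for Sell, 0 for none) is assumed, and the function name, the signal column and the target value are all hypothetical.

```python
import numpy as np
import pandas as pd


def run_state_machine(ltp, signal, target):
    """Apply the Entry/Exit rules in one pass over NumPy arrays.

    ltp    : float array of LTP values
    signal : int array, +1 = Buy, -1 = Sell, 0 = no signal (assumed encoding)
    target : absolute cumulative LTP move that triggers an exit (assumed)
    """
    n = len(ltp)
    entry = np.zeros(n, dtype=np.int8)
    exit_ = np.zeros(n, dtype=np.int8)
    ltpchange = np.zeros(n)
    ltpcumchange = np.zeros(n)

    position = 0   # 0 = flat, +1 = in a Buy, -1 = in a Sell
    cum = 0.0      # running sum of LTP changes since the entry row
    for i in range(n):
        if position == 0:
            # Flat: open a position only when a signal fires on this row.
            if signal[i] != 0:
                position = signal[i]
                cum = 0.0
                entry[i] = position
        else:
            # In a position: record the LTP change and its running sum.
            change = ltp[i] - ltp[i - 1]
            cum += change
            ltpchange[i] = change
            ltpcumchange[i] = cum
            entry[i] = position
            # Exit once the cumulative move reaches the target in either direction.
            if abs(cum) >= target:
                exit_[i] = 1
                position = 0   # Entry reverts to 0 from the next row on
    return entry, exit_, ltpchange, ltpcumchange


# Toy data; the signal column stands in for the undisclosed entry condition.
df = pd.DataFrame({
    "LTP": [100.0, 100.0, 100.5, 101.0, 101.0, 100.0],
    "signal": [0, 1, 0, 0, 0, -1],
})
entry, exit_, chg, cum = run_state_machine(
    df["LTP"].to_numpy(), df["signal"].to_numpy(), target=1.0
)
df["Entry"] = pd.Series(entry, index=df.index).map({1: "Buy", -1: "Sell", 0: 0})
df["Exit"] = exit_
df["ltpchange"] = chg
df["ltpcumchange"] = cum
print(df)
```

The loop is still sequential, but it indexes arrays directly and avoids building a Series for every row, which is where most of the iterrows() overhead comes from. With real prices such as 138.35, floating-point rounding can leave the cumulative sum a hair short of the target, so a small tolerance in the comparison may be needed. If numba is available, decorating the function with numba.njit is a further option, though that is not tested here.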
asked Nov 16 '18 at 11:49
Sagar Upadhyay