Reading JSON lines into a pandas DataFrame: appending thousands of rows is slow
I have a text file where I have cleaned up each line into JSON format. I can read each line, clean it, and convert it into a pandas DataFrame.
My problem is that I want to combine them all into one DataFrame, but there are more than 200k lines.
I am reading each line in as d = '{"test1":"test2","data":{"key":{"isin":"test3"},"creationTimeStamp":1541491884194,"signal":0,"hPreds":[0,0,0,0],"bidPrice":6.413000,"preferredBidSize":1,"offerPrice":6.415000,"preferredOfferSize":1,"averageTradeSize":1029,"averageTradePrice":0.065252,"changedValues":true,"test4":10,"snapshot":false}}'
Assuming I can convert each line into a DataFrame, is there a fast way to append each line to the combined DataFrame? Right now, with more than 200k lines, the appending takes hours, while reading the file itself takes less than 5 minutes.
import pandas as pd

file = 'fileName.txt'
with open(file) as f:
    content = f.readlines()
content = [x.strip() for x in content]

data = pd.DataFrame()
for line in content:
    # strip the non-JSON prefix and the stray trailing character it leaves behind
    line = line.replace('{"string1', '')
    z = line.splitlines()
    z[0] = z[0][:-1]
    z = pd.read_json('[%s]' % ','.join(z))
    data = data.append(z)  # appending row by row is the slow part
json python-3.x pandas
asked Nov 19 '18 at 15:27 by Kiann, edited Nov 20 '18 at 16:45
1 Answer
You may check with Series:
pd.Series(d)
Out[154]:
averageTradePrice 0.065
averageTradeSize 109
bidPrice 6.13
changedValues True
creationTimeStamp 15414994
Preds [0, 0, 0, 0]
key {'epic': 'XXX'}
dataLevel 10
offerPrice 3.333
dtype: object
The values of Preds and key are a list and a dict, which is why passing d straight to DataFrame fails with:
ValueError: arrays must all be same length
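As a minimal illustration of that error (not the original data, just two array-valued fields of different lengths; newer pandas words the message "All arrays must be of the same length"):
import pandas as pd

# two columns of unequal length reproduce the ValueError the answer refers to
pd.DataFrame({"hPreds": [0, 0, 0, 0], "key": ["test3"]})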
Since you mention json, use json_normalize:
from pandas.io.json import json_normalize
json_normalize(d)
Out[157]:
          Preds  averageTradePrice  ...  key.epic  offerPrice
0  [0, 0, 0, 0]              0.065  ...       XXX       3.333
[1 rows x 9 columns]
answered Nov 19 '18 at 15:36 by Wen-Ben
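A minimal sketch of applying this to the whole file (assuming every cleaned line is valid JSON after the '{"string1' cleanup; the file name and cleanup rule are copied from the question): parse each line with json.loads, collect the dicts in a list, and call json_normalize once instead of appending row by row.
import json
from pandas.io.json import json_normalize  # pd.json_normalize in newer pandas

records = []
with open('fileName.txt') as f:
    for raw in f:
        line = raw.strip()
        if not line:
            continue
        # same cleanup as the question's loop: drop the prefix and the trailing character
        line = line.replace('{"string1', '')[:-1]
        records.append(json.loads(line))

# one call builds the whole DataFrame; nested keys become dotted column names
data = json_normalize(records)
Building the list of dicts first and normalizing once avoids copying the growing DataFrame on every iteration, which is what makes the per-row append take hours.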
thanks @W-B. I tried your solution, and while it worked for the test string I provided, unfortunately it didn't seem to work for the (very long) actual text I really do pull in.
– Kiann Nov 19 '18 at 16:22
To give some context: my file f is actually a text file of more than 250k rows, and I am trying to read each line, convert it into a DataFrame, and then append it. I had some original code using pd.read_json(...), but it no longer works...
– Kiann Nov 19 '18 at 16:23
@Kiann did you try json_normalize?
– Wen-Ben Nov 19 '18 at 16:34
stackoverflow.com/users/7964527/w-b; yes, I did try json_normalize.. error message: AttributeError: 'str' object has no attribute 'values'
– Kiann Nov 19 '18 at 16:35
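The AttributeError in the last comment is what json_normalize raises when it is given the raw string rather than a parsed object; a short sketch using the sample string d from the question:
import json
from pandas.io.json import json_normalize

# json_normalize(d) on the raw string raises
#   AttributeError: 'str' object has no attribute 'values'
# because d has not been parsed yet; parse it into a dict first
json_normalize(json.loads(d))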