Python: Cleaning the data from the csv file that is mismatched
I am new to programming. I am trying to clean the data from a csv file for a further project extension. The csv file given as input is really messy, and I only need particular portions of it.
Input File is as follows:
Required Format:
So far I am trying to extract the values for 'OBSERVATION_MODE', 'LON' and 'LAT', but I am not sure how to append them to the values that come later in the file.
This is what I have tried so far:
    import csv
    import re

    file = csv.reader(open('1mvn_kp_iuvs_2018_01_r01.tab.csv', 'r'))
    mode = []
    lat = []
    for row in file:
        for values in row:
            if 'OBSERVATION_MODE' in values:
                print("n")
                mode.append(row)
            if re.search('LAT', values):
                lat.append(row)
    print(mode)
    print(lat)
I am pretty sure the logic I am working on is not useful at all. Can someone please give me a better overview of this? I tried searching online too, but I found nothing on cleaning data when both the rows and the columns are mismatched. Any help is appreciated!
Thank You
Link to the input csv file and expected output: https://drive.google.com/open?id=1LJxxbDcplSCPVWKnOC3usx7kZE8dS32H
Please note that the expected output 'Cleaned_sample.xlsx' is something I generated manually, and I want a similar output using Python.
python pandas csv
asked Nov 15 '18 at 6:50, edited Nov 15 '18 at 7:13
P. Wania
Is it visible now?
– P. Wania
Nov 15 '18 at 6:54
Yes, but it's better not to share images. Share it as formatted code instead.
– Mayank Porwal
Nov 15 '18 at 6:54
I tried sharing it as formatted code when I submitted, but nothing was visible, so I switched to images.
– P. Wania
Nov 15 '18 at 6:56
@P.Saini - Do you want to remove the metadata from your file? I mean, do you want to keep only the part starting from ALTITUDE, CO2, CO2+, ...?
– Mohamed Thasin ah
Nov 15 '18 at 6:58
Actually, I want the values of LON, LAT and OBSERVATION_MODE appended to the values of ALTITUDE, CO2, CO2+, and so on!
– P. Wania
Nov 15 '18 at 6:59
2 Answers
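Try this:

    import pandas as pd

    # Metadata block: the first 18 rows, with the key in column 0 and the value in column 2.
    df1 = pd.read_csv('1mvn_kp_iuvs_2018_01_r01.tab (1).csv', header=None, nrows=18)
    dic = df1.set_index(0)[2].to_dict()
    # Wrap each value in a list so that the dict becomes a one-row DataFrame.
    for u, v in dic.items():
        dic[u] = [v]
    df1 = pd.DataFrame(dic)

    # Data block: skip the first 19 rows so that the table header comes first.
    df2 = pd.read_csv('1mvn_kp_iuvs_2018_01_r01.tab (1).csv', skiprows=19)

    # Repeat the metadata row once per data row, then join the two side by side.
    df1 = pd.concat([df1] * len(df2), ignore_index=True)
    df3 = pd.concat([df1, df2], axis=1)
    print(df3.head())

Note: I removed a few rows from the original file so that it matches your sample.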
Input:
Output:
              LAT  LAT_MSO  LOCAL_TIME       LON  LON_MSO  MARS_SEASON_LS
    0  -19.512522      NaN    8.083779  6.757075      NaN       108.81089
    1  -19.512522      NaN    8.083779  6.757075      NaN       108.81089
    2  -19.512522      NaN    8.083779  6.757075      NaN       108.81089
    3  -19.512522      NaN    8.083779  6.757075      NaN       108.81089
    4  -19.512522      NaN    8.083779  6.757075      NaN       108.81089

       MARS_SUN_DIST  ORBIT_NUMBER      SC_ALT  SC_GEO_LAT  ...
    0       1.630965        6330.0  203.680405  -17.815445  ...
    1       1.630965        6330.0  203.680405  -17.815445  ...
    2       1.630965        6330.0  203.680405  -17.815445  ...
    3       1.630965        6330.0  203.680405  -17.815445  ...
    4       1.630965        6330.0  203.680405  -17.815445  ...

       SUBSOL_GEO_LON        SZA  ALTITUDE          CO2         CO2+            O
    0         65.4571  71.790688        80  -9999999000  -9999999000  -9999999000
    1         65.4571  71.790688        90  -9999999000  -9999999000  -9999999000
    2         65.4571  71.790688       100  -9999999000  -9999999000  -9999999000
    3         65.4571  71.790688       110  -9999999000  -9999999000  -9999999000
    4         65.4571  71.790688       120  -9999999000  -9999999000    551467460

                N2            C            N            H
    0  -9999999000  -9999999000  -9999999000  -9999999000
    1  -9999999000  -9999999000  -9999999000  -9999999000
    2  -9999999000  -9999999000  -9999999000  -9999999000
    3  -9999999000  -9999999000  -9999999000  -9999999000
    4    710188930  -9999999000  -9999999000  -9999999000
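The same idea (read the metadata block, then attach it to every data row) can probably be written more compactly with DataFrame.assign, which broadcasts a scalar down a whole column. The following is only a minimal sketch and not the answerer's code: the SAMPLE text, the "periapse" value, the KEY,=,VALUE layout (guessed from the set_index(0)[2] call above) and the helper name attach_metadata are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: metadata rows laid out as KEY,=,VALUE,
# one separator row, then a regular table. The real file may differ.
SAMPLE = """OBSERVATION_MODE,=,periapse
LON,=,6.757075
LAT,=,-19.512522
----
ALTITUDE,CO2,O
80,-9999999000,-9999999000
90,-9999999000,-9999999000
100,-9999999000,551467460
"""


def attach_metadata(text, n_meta, n_skip):
    """Return the data table with one constant column per metadata key."""
    meta = pd.read_csv(io.StringIO(text), header=None, nrows=n_meta)
    # Column 0 holds the key and column 2 the value (column 1 is the '=').
    meta = dict(zip(meta[0], meta[2]))
    data = pd.read_csv(io.StringIO(text), skiprows=n_skip)
    # assign() broadcasts each scalar value down the whole new column.
    return data.assign(**meta)


df = attach_metadata(SAMPLE, n_meta=3, n_skip=4)
# The metadata values are read as text, so convert the numeric ones explicitly.
df[["LON", "LAT"]] = df[["LON", "LAT"]].apply(pd.to_numeric)
print(df)
```

For the real file, n_meta and n_skip would presumably be 18 and 19, the values used in the answer above.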
answered Nov 15 '18 at 8:16, edited Nov 15 '18 at 10:26
Mohamed Thasin ah
You should try the read_csv function from pandas. It has multiple keyword arguments, such as header, skiprows and usecols, that let you set where your data starts in the file, skip a number of rows, read only specific columns, and so on. The returned object is a DataFrame, which is similar to an array, and you can easily access your data.
Example based on the file you provided:

    import pandas
    data = pandas.read_csv(path_to_file, skiprows=44, skipfooter=378, engine='python', dtype='float')

This call will read the first set of data in your file. To access the fifth value in the ALTITUDE column, you can, for example, do:

    data['ALTITUDE'][4]

Then you would have to make similar read_csv calls with different values of skiprows and skipfooter to access the other sets of data. Once you have them all, a call to concatenate from numpy should let you combine all your data into one array. Be careful with the headers.
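Several read_csv calls followed by a concatenation might look like the minimal sketch below. It is not the answerer's code: the SAMPLE text, the BLOCKS offsets and the extra BLOCK column are made up for illustration, and pandas.concat is used here in place of numpy's concatenate so that the column names are kept.

```python
import io

import pandas as pd

# Hypothetical miniature file with two stacked tables; every line has three
# fields so that the parser never meets a ragged row.
SAMPLE = """LAT,=,-19.5
ALTITUDE,CO2,O
80,1,2
90,3,4
LAT,=,10.0
ALTITUDE,CO2,O
80,5,6
90,7,8
"""

# (skiprows, skipfooter) for each table, worked out by hand for SAMPLE.
BLOCKS = [(1, 4), (5, 0)]

frames = []
for block_id, (skip, foot) in enumerate(BLOCKS):
    frame = pd.read_csv(
        io.StringIO(SAMPLE),
        skiprows=skip,
        skipfooter=foot,
        engine="python",  # skipfooter is only supported by the python engine
    )
    frame["BLOCK"] = block_id  # remember which table each row came from
    frames.append(frame)

# Each frame was read with its own header row, so concat aligns columns by name.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Reading each table together with its own header row is one way to handle the headers: no header line ends up among the data rows.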
Note that skiprows also accepts a callable, such as a lambda expression, which is evaluated against each row index. This may allow you to call read_csv() only once, if you can find a pattern that specifies which rows you do not want.
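As a rough illustration of the callable form of skiprows: pandas passes the function each row index, not the row's text, so one option is to scan the file once to collect the unwanted line numbers and then read everything in a single read_csv call. The SAMPLE text, the HEADER string and the rule for spotting metadata lines (a ",=," inside the line) are assumptions made for this sketch.

```python
import io

import pandas as pd

# Hypothetical miniature file: two tables, each preceded by a metadata line.
SAMPLE = """LAT,=,-19.5
ALTITUDE,CO2,O
80,1,2
90,3,4
LAT,=,10.0
ALTITUDE,CO2,O
80,5,6
90,7,8
"""

HEADER = "ALTITUDE,CO2,O"  # assumed header line of every table

lines = SAMPLE.splitlines()
first_header = lines.index(HEADER)

# Line numbers to drop: metadata lines and every header after the first one.
skip = {
    i
    for i, line in enumerate(lines)
    if i != first_header and (line == HEADER or ",=," in line)
}

# The callable returns True for each row index that should be skipped.
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=lambda i: i in skip)
print(df)
```

The extra pass over the file is the price of a single read_csv call; if the unwanted rows followed a purely arithmetic pattern, the lambda alone would be enough.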
answered Nov 15 '18 at 7:04, edited Nov 15 '18 at 7:45
Patol75
Can you give an example of how that can be done?
– P. Wania
Nov 15 '18 at 7:14
I have updated my answer, hope it helps.
– Patol75
Nov 15 '18 at 7:46