Python: Cleaning the data from the csv file that is mismatched












I am new to programming. I am trying to clean the data from a CSV file for a later stage of a project. The CSV file I am given as input is really messy, and I need only particular portions of it.

Input file is as follows: (image)

Required format: (image)

So far I am trying to extract the values for 'OBSERVATION_MODE', 'LON' and 'LAT', but I am not sure how to append them to the later values.
This is what I have tried so far:



import csv
import re

file = csv.reader(open('1mvn_kp_iuvs_2018_01_r01.tab.csv', 'r'))
mode = []  # rows that mention OBSERVATION_MODE
lat = []   # rows that mention LAT
for row in file:
    for values in row:
        if 'OBSERVATION_MODE' in values:
            print("n")
            mode.append(row)

        if re.search('LAT', values):
            lat.append(row)

print(mode)
print(lat)


I am fairly sure the logic I am working on is not useful. Can someone please give me a better overview of how to approach this? I searched online too, but I found nothing on cleaning data when both the rows and the columns are mismatched. Any help is appreciated!

Thank you.

Link to the input CSV file and the expected output: https://drive.google.com/open?id=1LJxxbDcplSCPVWKnOC3usx7kZE8dS32H

Please note that the expected output 'Cleaned_sample.xlsx' is something I generated manually; I want to produce a similar output with Python.










python pandas csv






edited Nov 15 '18 at 7:13
P. Wania

asked Nov 15 '18 at 6:50
P. Wania













  • Is it visible now?
    – P. Wania
    Nov 15 '18 at 6:54

  • Yes, but it's better not to share images. Share it as code-formatted text instead.
    – Mayank Porwal
    Nov 15 '18 at 6:54

  • I tried sharing it as code-formatted text when I submitted, but nothing was visible, so I changed it to images.
    – P. Wania
    Nov 15 '18 at 6:56

  • @P.Saini - Do you want to remove the metadata in your file? I mean, do you want to keep only the part of the file from ALTITUDE, CO2, CO2+, ... onward?
    – Mohamed Thasin ah
    Nov 15 '18 at 6:58

  • Actually, I want the values of LON, LAT and OBSERVATION_MODE appended to the values of ALTITUDE, CO2, CO2+, ... and so on!
    – P. Wania
    Nov 15 '18 at 6:59



















2 Answers














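Try this:

import pandas as pd

path = '1mvn_kp_iuvs_2018_01_r01.tab (1).csv'

# The first 18 rows hold the metadata: key in column 0, value in column 2.
df1 = pd.read_csv(path, header=None, nrows=18)
dic = df1.set_index(0)[2].to_dict()
# Wrap each value in a list so the dict becomes a one-row DataFrame.
for u, v in dic.items():
    dic[u] = [v]
df1 = pd.DataFrame(dic)

# The data table starts after the metadata block.
df2 = pd.read_csv(path, skiprows=19)

# Repeat the metadata row once per data row, then join side by side.
df1 = pd.concat([df1] * len(df2), ignore_index=True)
df3 = pd.concat([df1, df2], axis=1)
print(df3.head())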


Note: I removed a few rows from the original file so that it matches your sample.



Input:

(image)

Output:

         LAT  LAT_MSO  LOCAL_TIME       LON  LON_MSO  MARS_SEASON_LS
0 -19.512522      NaN    8.083779  6.757075      NaN       108.81089
1 -19.512522      NaN    8.083779  6.757075      NaN       108.81089
2 -19.512522      NaN    8.083779  6.757075      NaN       108.81089
3 -19.512522      NaN    8.083779  6.757075      NaN       108.81089
4 -19.512522      NaN    8.083779  6.757075      NaN       108.81089

   MARS_SUN_DIST  ORBIT_NUMBER      SC_ALT  SC_GEO_LAT  ...
0       1.630965        6330.0  203.680405  -17.815445  ...
1       1.630965        6330.0  203.680405  -17.815445  ...
2       1.630965        6330.0  203.680405  -17.815445  ...
3       1.630965        6330.0  203.680405  -17.815445  ...
4       1.630965        6330.0  203.680405  -17.815445  ...

   SUBSOL_GEO_LON        SZA  ALTITUDE          CO2         CO2+            O
0         65.4571  71.790688        80  -9999999000  -9999999000  -9999999000
1         65.4571  71.790688        90  -9999999000  -9999999000  -9999999000
2         65.4571  71.790688       100  -9999999000  -9999999000  -9999999000
3         65.4571  71.790688       110  -9999999000  -9999999000  -9999999000
4         65.4571  71.790688       120  -9999999000  -9999999000    551467460

            N2            C            N            H
0  -9999999000  -9999999000  -9999999000  -9999999000
1  -9999999000  -9999999000  -9999999000  -9999999000
2  -9999999000  -9999999000  -9999999000  -9999999000
3  -9999999000  -9999999000  -9999999000  -9999999000
4    710188930  -9999999000  -9999999000  -9999999000
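For readers who want to try the idea without the original file, here is a minimal self-contained sketch. The sample text, its "KEY,=,VALUE" metadata layout, the blank separator line and the helper `to_number` are all assumptions made up for illustration; the real file may be laid out differently. It also swaps the repeat-and-concat step for `DataFrame.assign`, which broadcasts each scalar to every row and places the metadata columns after the data columns.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: "KEY,=,VALUE" metadata rows, a blank
# line, then a regular table. The real file's layout may differ.
raw = """OBSERVATION_MODE,=,periapse
LAT,=,-19.512522
LON,=,6.757075

ALTITUDE,CO2,O
80,-9999999000,-9999999000
90,-9999999000,-9999999000
100,-9999999000,551467460
"""


def to_number(text):
    """Return float(text) when possible, otherwise the original value."""
    try:
        return float(text)
    except ValueError:
        return text


# Split the text at the first blank line: metadata above, table below.
meta_text, table_text = raw.split("\n\n", 1)

# Metadata: key in column 0, value in column 2.
meta = pd.read_csv(io.StringIO(meta_text), header=None)
meta_values = {key: to_number(value) for key, value in meta.set_index(0)[2].items()}

table = pd.read_csv(io.StringIO(table_text))

# assign() repeats each metadata value on every row of the table.
combined = table.assign(**meta_values)
print(combined)
```

If the metadata should come first, as in the output above, reorder the columns afterwards, for example with `combined[list(meta_values) + list(table.columns)]`.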





edited Nov 15 '18 at 10:26

answered Nov 15 '18 at 8:16
Mohamed Thasin ah

























You should try the read_csv function from pandas. It accepts multiple keywords, such as header, skiprows or usecols, that let you set where your data starts in the file, skip a number of rows, use only specific columns, and so on. The returned object is a DataFrame, which behaves much like a table or array, so you can easily access your data.
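As a rough illustration of those three keywords, here is a small sketch on an in-memory sample. The sample text and its column values are invented for the example and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical sample: two junk lines, then a header row and data.
raw = """generated by some instrument
units: km
ALTITUDE,CO2,O,H
80,1.0,2.0,3.0
90,4.0,5.0,6.0
"""

# skiprows=2 drops the two junk lines, header=0 takes the next line as the
# column names, and usecols keeps only the listed columns.
data = pd.read_csv(
    io.StringIO(raw),
    skiprows=2,
    header=0,
    usecols=["ALTITUDE", "O"],
)
print(data)
```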



Example based on the file you provided:

import pandas

data = pandas.read_csv(path_to_file, skiprows=44, skipfooter=378, engine='python', dtype='float')

This call reads the first set of data in your file. To access the fifth value in the ALTITUDE column, you can, for example, do

data['ALTITUDE'][4]

Then you would use similar read_csv calls with different values of skiprows and skipfooter to access the other sets of data. Once you have them all, a call to concatenate from numpy (or concat from pandas, to stay with DataFrames) should let you combine all your data into one array. Be careful with the headers.
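A minimal sketch of that block-by-block approach follows. The two-block sample, the helper name `read_block` and the skiprows/skipfooter values are assumptions chosen to fit the sample, not the real file. Each call consumes its own header line as column names, so no header text ends up among the data rows.

```python
import io

import pandas as pd

# Hypothetical file with two tables, each preceded by one metadata line.
raw = """OBSERVATION_MODE,periapse
ALTITUDE,CO2
80,1.0
90,2.0
OBSERVATION_MODE,apoapse
ALTITUDE,CO2
80,3.0
90,4.0
"""


def read_block(skiprows, skipfooter):
    """Read one table, ignoring the lines before and after it."""
    return pd.read_csv(
        io.StringIO(raw),
        skiprows=skiprows,
        skipfooter=skipfooter,
        engine="python",  # skipfooter is only supported by the python engine
    )


first = read_block(skiprows=1, skipfooter=4)   # lines 1-3 of the sample
second = read_block(skiprows=5, skipfooter=0)  # lines 5-7 of the sample

# Stack the blocks and renumber the rows.
combined = pd.concat([first, second], ignore_index=True)
print(combined)
```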



Note that skiprows also accepts a callable (for example a lambda expression) that is evaluated against each row index; this may let you call read_csv() only once if you can find a pattern that identifies the rows you do not want.
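Because the callable only sees the row index, not the row's text, one way to use it is to scan the lines first and collect the indices to drop. The sketch below does that on an invented two-block sample; the sample and the rule "skip metadata lines and repeated headers" are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical file with two tables, each preceded by one metadata line.
raw = """OBSERVATION_MODE,periapse
ALTITUDE,CO2
80,1.0
90,2.0
OBSERVATION_MODE,apoapse
ALTITUDE,CO2
80,3.0
90,4.0
"""

lines = raw.splitlines()
header = "ALTITUDE,CO2"
first_header = lines.index(header)

# Rows to drop: metadata lines and every repeat of the header row.
unwanted = {
    i
    for i, line in enumerate(lines)
    if line.startswith("OBSERVATION_MODE") or (line == header and i != first_header)
}

# The callable returns True for each row index that should be skipped.
data = pd.read_csv(io.StringIO(raw), skiprows=lambda i: i in unwanted)
print(data)
```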







edited Nov 15 '18 at 7:45

answered Nov 15 '18 at 7:04
Patol75













  • Can you give an example of how that can be done?
    – P. Wania
    Nov 15 '18 at 7:14

  • I have updated my answer, hope it helps.
    – Patol75
    Nov 15 '18 at 7:46


















