Python: Pandas Module - Nested IF Statement that Fills in NaN (Empty Values) in a Dataframe











up vote
0
down vote

favorite












I've created a function that tests multiple IF statements given the data in the 'Name' column.



Criteria 1: If 'Name' is blank, return the 'Secondary_Name'. However, if 'Secondary_Name' is also blank, return the 'Third_Name'.



Criteria 2: If 'Name' == 'GENERAL', return the 'Secondary_Name'. However, if 'Secondary_Name' is also blank, return the 'Third_Name'



Else: Return the 'Name'



def account_name(row):
if row['Name'] == None and row['Secondary_Name'] == None:
return row['Third_Name']

elif row['Name'] == 'GENERAL':
if row['Secondary_Name'] == None:
return row['Third_Name']

else:
return row['Name']


I've tried == None, == np.NaN, == Null, .isnull(), == '', == '0'.



Nothing seems to replace the empty values to what I want.



Edit:



Example of DF










share|improve this question
























  • can you provide us with a sample df?
    – wpercy
    Nov 7 at 18:38










  • Example provided in original post: 'Example of DF'
    – Matthew
    Nov 7 at 18:51















up vote
0
down vote

favorite












I've created a function that tests multiple IF statements given the data in the 'Name' column.



Criteria 1: If 'Name' is blank, return the 'Secondary_Name'. However, if 'Secondary_Name' is also blank, return the 'Third_Name'.



Criteria 2: If 'Name' == 'GENERAL', return the 'Secondary_Name'. However, if 'Secondary_Name' is also blank, return the 'Third_Name'



Else: Return the 'Name'



def account_name(row):
if row['Name'] == None and row['Secondary_Name'] == None:
return row['Third_Name']

elif row['Name'] == 'GENERAL':
if row['Secondary_Name'] == None:
return row['Third_Name']

else:
return row['Name']


I've tried == None, == np.NaN, == Null, .isnull(), == '', == '0'.



Nothing seems to replace the empty values to what I want.



Edit:



Example of DF










share|improve this question
























  • can you provide us with a sample df?
    – wpercy
    Nov 7 at 18:38










  • Example provided in original post: 'Example of DF'
    – Matthew
    Nov 7 at 18:51













up vote
0
down vote

favorite









up vote
0
down vote

favorite











I've created a function that tests multiple IF statements given the data in the 'Name' column.



Criteria 1: If 'Name' is blank, return the 'Secondary_Name'. However, if 'Secondary_Name' is also blank, return the 'Third_Name'.



Criteria 2: If 'Name' == 'GENERAL', return the 'Secondary_Name'. However, if 'Secondary_Name' is also blank, return the 'Third_Name'



Else: Return the 'Name'



def account_name(row):
if row['Name'] == None and row['Secondary_Name'] == None:
return row['Third_Name']

elif row['Name'] == 'GENERAL':
if row['Secondary_Name'] == None:
return row['Third_Name']

else:
return row['Name']


I've tried == None, == np.NaN, == Null, .isnull(), == '', == '0'.



Nothing seems to replace the empty values to what I want.



Edit:



Example of DF










share|improve this question















I've created a function that tests multiple IF statements given the data in the 'Name' column.



Criteria 1: If 'Name' is blank, return the 'Secondary_Name'. However, if 'Secondary_Name' is also blank, return the 'Third_Name'.



Criteria 2: If 'Name' == 'GENERAL', return the 'Secondary_Name'. However, if 'Secondary_Name' is also blank, return the 'Third_Name'



Else: Return the 'Name'



def account_name(row):
if row['Name'] == None and row['Secondary_Name'] == None:
return row['Third_Name']

elif row['Name'] == 'GENERAL':
if row['Secondary_Name'] == None:
return row['Third_Name']

else:
return row['Name']


I've tried == None, == np.NaN, == Null, .isnull(), == '', == '0'.



Nothing seems to replace the empty values to what I want.



Edit:



Example of DF







python-3.x pandas dataframe null






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 7 at 18:50

























asked Nov 7 at 18:28









Matthew

42




42












  • can you provide us with a sample df?
    – wpercy
    Nov 7 at 18:38










  • Example provided in original post: 'Example of DF'
    – Matthew
    Nov 7 at 18:51


















  • can you provide us with a sample df?
    – wpercy
    Nov 7 at 18:38










  • Example provided in original post: 'Example of DF'
    – Matthew
    Nov 7 at 18:51
















can you provide us with a sample df?
– wpercy
Nov 7 at 18:38




can you provide us with a sample df?
– wpercy
Nov 7 at 18:38












Example provided in original post: 'Example of DF'
– Matthew
Nov 7 at 18:51




Example provided in original post: 'Example of DF'
– Matthew
Nov 7 at 18:51












2 Answers
2






active

oldest

votes

















up vote
0
down vote













Depending on column dtype, NULLs dont interact well with NaN for whatever reason.
None would mean that the field is blank, which it is not since you have "a value" for a given dtype.



The simple way to ensure you can identify NaNs is to see if the field is equal to itself



def isNaN(value):
if value != value:
return True
else:
return False


And to provide an example:



df = pd.DataFrame(data={'ClientId':[1,2,3,4] , 'SomeNULLs':['main','main',None,None], 'NewNULLs':[1,None,0,1]})

df['Test'] = df.NewNULLs.apply(isNaN)


The resulting dataset should be



   ClientId SomeNULLs  NewNULLs   Test
0 1 main 1.0 False
1 2 main NaN True
2 3 None 0.0 False
3 4 None 1.0 False





share|improve this answer






























    up vote
    0
    down vote













    Consider this df



    df = pd.DataFrame({'Name':['a', 'GENERAL', None],'Secondary_Name':['e','f',None], 'Third_Name':['x', 'y', 'z']})

    Name Secondary_Name Third_Name
    0 a e x
    1 GENERAL f y
    2 None None z


    Since you are writing the function in python, you can use is None



    def account_name(row):

    if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None):
    return row['Third_Name']

    elif row['Name'] is None or row['Name'] == 'GENERAL':
    return row['Secondary_Name']

    else:
    return row['Name']

    df['Name'] = df.apply(account_name, axis = 1)


    You get



        Name    Secondary_Name  Third_Name
    0 a e x
    1 f f y
    2 z None z


    You can get same output using pandas and nested np.where



    cond1 = (df['Name'].isnull()) | (df['Name'] == 'GENERAL')
    cond2 = (cond1) & (df['Secondary_Name'].isnull())
    np.where(cond2, df['Third_Name'], np.where(cond1, df['Secondary_Name'], df['Name']))





    share|improve this answer





















    • Dear Vaishali, Thank you for the function provided above. However, when I tested it, the first IF statement didn't return the Third_Name. if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None): return row['Third_Name']
      – Matthew
      Nov 7 at 19:26












    • Its working fine with the example df I created. Can you post your sample df on the question, not as an image?
      – Vaishali
      Nov 7 at 20:27










    • Vaishali, Does it make a difference if I use the df.apply on a new column? For instance, df['Target_Col'] = df.apply(account_name, axis=1) as opposed to df['Name']... My target col doesn't return the ['Third_Name'] during the first TRUE condition.
      – Matthew
      Nov 7 at 20:45










    • That should not make any difference because your are still apply the function on original df but assigning the result to a new column
      – Vaishali
      Nov 7 at 22:06










    • I figured. I'm not sure why when I export the DataFrame, there is still null values in the new column. All of which fall under the first condition and should return 'Third_Name' Info.
      – Matthew
      Nov 7 at 22:15











    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53195614%2fpython-pandas-module-nested-if-statement-that-fills-in-nan-empty-values-in%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    0
    down vote













    Depending on column dtype, NULLs dont interact well with NaN for whatever reason.
    None would mean that the field is blank, which it is not since you have "a value" for a given dtype.



    The simple way to ensure you can identify NaNs is to see if the field is equal to itself



    def isNaN(value):
    if value != value:
    return True
    else:
    return False


    And to provide an example:



    df = pd.DataFrame(data={'ClientId':[1,2,3,4] , 'SomeNULLs':['main','main',None,None], 'NewNULLs':[1,None,0,1]})

    df['Test'] = df.NewNULLs.apply(isNaN)


    The resulting dataset should be



       ClientId SomeNULLs  NewNULLs   Test
    0 1 main 1.0 False
    1 2 main NaN True
    2 3 None 0.0 False
    3 4 None 1.0 False





    share|improve this answer



























      up vote
      0
      down vote













      Depending on column dtype, NULLs dont interact well with NaN for whatever reason.
      None would mean that the field is blank, which it is not since you have "a value" for a given dtype.



      The simple way to ensure you can identify NaNs is to see if the field is equal to itself



      def isNaN(value):
      if value != value:
      return True
      else:
      return False


      And to provide an example:



      df = pd.DataFrame(data={'ClientId':[1,2,3,4] , 'SomeNULLs':['main','main',None,None], 'NewNULLs':[1,None,0,1]})

      df['Test'] = df.NewNULLs.apply(isNaN)


      The resulting dataset should be



         ClientId SomeNULLs  NewNULLs   Test
      0 1 main 1.0 False
      1 2 main NaN True
      2 3 None 0.0 False
      3 4 None 1.0 False





      share|improve this answer

























        up vote
        0
        down vote










        up vote
        0
        down vote









        Depending on column dtype, NULLs dont interact well with NaN for whatever reason.
        None would mean that the field is blank, which it is not since you have "a value" for a given dtype.



        The simple way to ensure you can identify NaNs is to see if the field is equal to itself



        def isNaN(value):
        if value != value:
        return True
        else:
        return False


        And to provide an example:



        df = pd.DataFrame(data={'ClientId':[1,2,3,4] , 'SomeNULLs':['main','main',None,None], 'NewNULLs':[1,None,0,1]})

        df['Test'] = df.NewNULLs.apply(isNaN)


        The resulting dataset should be



           ClientId SomeNULLs  NewNULLs   Test
        0 1 main 1.0 False
        1 2 main NaN True
        2 3 None 0.0 False
        3 4 None 1.0 False





        share|improve this answer














        Depending on column dtype, NULLs dont interact well with NaN for whatever reason.
        None would mean that the field is blank, which it is not since you have "a value" for a given dtype.



        The simple way to ensure you can identify NaNs is to see if the field is equal to itself



        def isNaN(value):
        if value != value:
        return True
        else:
        return False


        And to provide an example:



        df = pd.DataFrame(data={'ClientId':[1,2,3,4] , 'SomeNULLs':['main','main',None,None], 'NewNULLs':[1,None,0,1]})

        df['Test'] = df.NewNULLs.apply(isNaN)


        The resulting dataset should be



           ClientId SomeNULLs  NewNULLs   Test
        0 1 main 1.0 False
        1 2 main NaN True
        2 3 None 0.0 False
        3 4 None 1.0 False






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 7 at 18:49

























        answered Nov 7 at 18:41









        rogersdevop

        112




        112
























            up vote
            0
            down vote













            Consider this df



            df = pd.DataFrame({'Name':['a', 'GENERAL', None],'Secondary_Name':['e','f',None], 'Third_Name':['x', 'y', 'z']})

            Name Secondary_Name Third_Name
            0 a e x
            1 GENERAL f y
            2 None None z


            Since you are writing the function in python, you can use is None



            def account_name(row):

            if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None):
            return row['Third_Name']

            elif row['Name'] is None or row['Name'] == 'GENERAL':
            return row['Secondary_Name']

            else:
            return row['Name']

            df['Name'] = df.apply(account_name, axis = 1)


            You get



                Name    Secondary_Name  Third_Name
            0 a e x
            1 f f y
            2 z None z


            You can get same output using pandas and nested np.where



            cond1 = (df['Name'].isnull()) | (df['Name'] == 'GENERAL')
            cond2 = (cond1) & (df['Secondary_Name'].isnull())
            np.where(cond2, df['Third_Name'], np.where(cond1, df['Secondary_Name'], df['Name']))





            share|improve this answer





















            • Dear Vaishali, Thank you for the function provided above. However, when I tested it, the first IF statement didn't return the Third_Name. if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None): return row['Third_Name']
              – Matthew
              Nov 7 at 19:26












            • Its working fine with the example df I created. Can you post your sample df on the question, not as an image?
              – Vaishali
              Nov 7 at 20:27










            • Vaishali, Does it make a difference if I use the df.apply on a new column? For instance, df['Target_Col'] = df.apply(account_name, axis=1) as opposed to df['Name']... My target col doesn't return the ['Third_Name'] during the first TRUE condition.
              – Matthew
              Nov 7 at 20:45










            • That should not make any difference because your are still apply the function on original df but assigning the result to a new column
              – Vaishali
              Nov 7 at 22:06










            • I figured. I'm not sure why when I export the DataFrame, there is still null values in the new column. All of which fall under the first condition and should return 'Third_Name' Info.
              – Matthew
              Nov 7 at 22:15















            up vote
            0
            down vote













            Consider this df



            df = pd.DataFrame({'Name':['a', 'GENERAL', None],'Secondary_Name':['e','f',None], 'Third_Name':['x', 'y', 'z']})

            Name Secondary_Name Third_Name
            0 a e x
            1 GENERAL f y
            2 None None z


            Since you are writing the function in python, you can use is None



            def account_name(row):

            if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None):
            return row['Third_Name']

            elif row['Name'] is None or row['Name'] == 'GENERAL':
            return row['Secondary_Name']

            else:
            return row['Name']

            df['Name'] = df.apply(account_name, axis = 1)


            You get



                Name    Secondary_Name  Third_Name
            0 a e x
            1 f f y
            2 z None z


            You can get same output using pandas and nested np.where



            cond1 = (df['Name'].isnull()) | (df['Name'] == 'GENERAL')
            cond2 = (cond1) & (df['Secondary_Name'].isnull())
            np.where(cond2, df['Third_Name'], np.where(cond1, df['Secondary_Name'], df['Name']))





            share|improve this answer





















            • Dear Vaishali, Thank you for the function provided above. However, when I tested it, the first IF statement didn't return the Third_Name. if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None): return row['Third_Name']
              – Matthew
              Nov 7 at 19:26












            • Its working fine with the example df I created. Can you post your sample df on the question, not as an image?
              – Vaishali
              Nov 7 at 20:27










            • Vaishali, Does it make a difference if I use the df.apply on a new column? For instance, df['Target_Col'] = df.apply(account_name, axis=1) as opposed to df['Name']... My target col doesn't return the ['Third_Name'] during the first TRUE condition.
              – Matthew
              Nov 7 at 20:45










            • That should not make any difference because your are still apply the function on original df but assigning the result to a new column
              – Vaishali
              Nov 7 at 22:06










            • I figured. I'm not sure why when I export the DataFrame, there is still null values in the new column. All of which fall under the first condition and should return 'Third_Name' Info.
              – Matthew
              Nov 7 at 22:15













            up vote
            0
            down vote










            up vote
            0
            down vote









            Consider this df



            df = pd.DataFrame({'Name':['a', 'GENERAL', None],'Secondary_Name':['e','f',None], 'Third_Name':['x', 'y', 'z']})

            Name Secondary_Name Third_Name
            0 a e x
            1 GENERAL f y
            2 None None z


            Since you are writing the function in python, you can use is None



            def account_name(row):

            if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None):
            return row['Third_Name']

            elif row['Name'] is None or row['Name'] == 'GENERAL':
            return row['Secondary_Name']

            else:
            return row['Name']

            df['Name'] = df.apply(account_name, axis = 1)


            You get



                Name    Secondary_Name  Third_Name
            0 a e x
            1 f f y
            2 z None z


            You can get same output using pandas and nested np.where



            cond1 = (df['Name'].isnull()) | (df['Name'] == 'GENERAL')
            cond2 = (cond1) & (df['Secondary_Name'].isnull())
            np.where(cond2, df['Third_Name'], np.where(cond1, df['Secondary_Name'], df['Name']))





            share|improve this answer












            Consider this df



            df = pd.DataFrame({'Name':['a', 'GENERAL', None],'Secondary_Name':['e','f',None], 'Third_Name':['x', 'y', 'z']})

            Name Secondary_Name Third_Name
            0 a e x
            1 GENERAL f y
            2 None None z


            Since you are writing the function in python, you can use is None



            def account_name(row):

            if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None):
            return row['Third_Name']

            elif row['Name'] is None or row['Name'] == 'GENERAL':
            return row['Secondary_Name']

            else:
            return row['Name']

            df['Name'] = df.apply(account_name, axis = 1)


            You get



                Name    Secondary_Name  Third_Name
            0 a e x
            1 f f y
            2 z None z


            You can get same output using pandas and nested np.where



            cond1 = (df['Name'].isnull()) | (df['Name'] == 'GENERAL')
            cond2 = (cond1) & (df['Secondary_Name'].isnull())
            np.where(cond2, df['Third_Name'], np.where(cond1, df['Secondary_Name'], df['Name']))






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 7 at 18:52









            Vaishali

            16.5k3927




            16.5k3927












            • Dear Vaishali, Thank you for the function provided above. However, when I tested it, the first IF statement didn't return the Third_Name. if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None): return row['Third_Name']
              – Matthew
              Nov 7 at 19:26












            • Its working fine with the example df I created. Can you post your sample df on the question, not as an image?
              – Vaishali
              Nov 7 at 20:27










            • Vaishali, Does it make a difference if I use the df.apply on a new column? For instance, df['Target_Col'] = df.apply(account_name, axis=1) as opposed to df['Name']... My target col doesn't return the ['Third_Name'] during the first TRUE condition.
              – Matthew
              Nov 7 at 20:45










            • That should not make any difference because your are still apply the function on original df but assigning the result to a new column
              – Vaishali
              Nov 7 at 22:06










            • I figured. I'm not sure why when I export the DataFrame, there is still null values in the new column. All of which fall under the first condition and should return 'Third_Name' Info.
              – Matthew
              Nov 7 at 22:15


















            • Dear Vaishali, Thank you for the function provided above. However, when I tested it, the first IF statement didn't return the Third_Name. if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None): return row['Third_Name']
              – Matthew
              Nov 7 at 19:26












            • Its working fine with the example df I created. Can you post your sample df on the question, not as an image?
              – Vaishali
              Nov 7 at 20:27










            • Vaishali, Does it make a difference if I use the df.apply on a new column? For instance, df['Target_Col'] = df.apply(account_name, axis=1) as opposed to df['Name']... My target col doesn't return the ['Third_Name'] during the first TRUE condition.
              – Matthew
              Nov 7 at 20:45










            • That should not make any difference because your are still apply the function on original df but assigning the result to a new column
              – Vaishali
              Nov 7 at 22:06










            • I figured. I'm not sure why when I export the DataFrame, there is still null values in the new column. All of which fall under the first condition and should return 'Third_Name' Info.
              – Matthew
              Nov 7 at 22:15
















            Dear Vaishali, Thank you for the function provided above. However, when I tested it, the first IF statement didn't return the Third_Name. if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None): return row['Third_Name']
            – Matthew
            Nov 7 at 19:26






            Dear Vaishali, Thank you for the function provided above. However, when I tested it, the first IF statement didn't return the Third_Name. if (row['Name'] is None or row['Name'] == 'GENERAL') and (row['Secondary_Name'] is None): return row['Third_Name']
            – Matthew
            Nov 7 at 19:26














            Its working fine with the example df I created. Can you post your sample df on the question, not as an image?
            – Vaishali
            Nov 7 at 20:27




            Its working fine with the example df I created. Can you post your sample df on the question, not as an image?
            – Vaishali
            Nov 7 at 20:27












            Vaishali, Does it make a difference if I use the df.apply on a new column? For instance, df['Target_Col'] = df.apply(account_name, axis=1) as opposed to df['Name']... My target col doesn't return the ['Third_Name'] during the first TRUE condition.
            – Matthew
            Nov 7 at 20:45




            Vaishali, Does it make a difference if I use the df.apply on a new column? For instance, df['Target_Col'] = df.apply(account_name, axis=1) as opposed to df['Name']... My target col doesn't return the ['Third_Name'] during the first TRUE condition.
            – Matthew
            Nov 7 at 20:45












            That should not make any difference because your are still apply the function on original df but assigning the result to a new column
            – Vaishali
            Nov 7 at 22:06




            That should not make any difference because your are still apply the function on original df but assigning the result to a new column
            – Vaishali
            Nov 7 at 22:06












            I figured. I'm not sure why when I export the DataFrame, there is still null values in the new column. All of which fall under the first condition and should return 'Third_Name' Info.
            – Matthew
            Nov 7 at 22:15




            I figured. I'm not sure why when I export the DataFrame, there is still null values in the new column. All of which fall under the first condition and should return 'Third_Name' Info.
            – Matthew
            Nov 7 at 22:15


















             

            draft saved


            draft discarded



















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53195614%2fpython-pandas-module-nested-if-statement-that-fills-in-nan-empty-values-in%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            這個網誌中的熱門文章

            Xamarin.form Move up view when keyboard appear

            Post-Redirect-Get with Spring WebFlux and Thymeleaf

            Anylogic : not able to use stopDelay()