Remove repetitive values from a list based on certain conditions



























I have a list of variables as below



roll_off_m4**
ov_offer_desc_m4
curr_ov_tier_desc2_m4
income
age
vid_offer_up_flag_m3
vidpromo_rng_m4*
ovpromo_rng_m4*
ovpromo_rng_m3*
roll_off_m3
roll_off_m2
oolpromo_rng_m3*
ov_offer_group_v2_desc_m4
oolpromo_rng_m2*
rsdvr_orig_m2
vidpromo_rng_m2*
ovpromo_rng_m2*


Some, like the ones marked with a *, are essentially the same variable, but the month in which its value is taken can differ.
For example, roll_off in Feb is m2, in Mar is m3 and in Apr is m4.
I need to pick only the variable corresponding to m2 when multiple month values are present.
If only a single month value is present, I pick that one.



For variables like age and income, which have no month info associated with them, I just pick them as is.



All these picked values are appended to a final list of variables.



Can someone please help me do this in Python?

















Tags: python, list, unique
















asked Nov 15 '18 at 3:51 by Shuvayan Das
























          1 Answer














          EDIT:
          I have updated the function to achieve the results you have specified in the comments.



          I'm using a pretty long list comprehension, so I'll quickly outline what's being checked to add the item to the output:




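          1. The variable name contains a digit and ("_m" + month_number) is in the name, so it already belongs to the requested month

          2. The variable name contains no digit at all, so it carries no month info (e.g. 'age')

          3. The list has no entry for the same variable with the requested month (m2 in this example), so this entry is kept as is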
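
          The three checks above can also be written as an explicit predicate, which may be easier to read than one long condition. This is a minimal sketch rather than the answer's own code: the names `MONTH_SUFFIX` and `keep_variable` are hypothetical, and it keys on a trailing `_m<digits>` suffix (optionally followed by `*` markers) instead of on any digit in the name.

```python
import re

# Hypothetical pattern for this sketch: a trailing month suffix such as
# "_m4", optionally followed by annotation characters like "*".
MONTH_SUFFIX = re.compile(r"_m(\d+)(\**)$")


def keep_variable(name, month_number, all_names):
    """Return True if `name` passes any of the three checks."""
    match = MONTH_SUFFIX.search(name)
    if match is None:
        # Check 2: no month suffix at all (e.g. 'age').
        return True
    if match.group(1) == str(month_number):
        # Check 1: the name already belongs to the requested month.
        return True
    # Check 3: keep the name only if its requested-month twin is absent.
    twin = name[:match.start()] + "_m" + str(month_number) + match.group(2)
    return twin not in all_names
```

          For a list of names `names`, the filter would then be `[v for v in names if keep_variable(v, 2, names)]`.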


          So running the code under "Full code" below outputs the following for a month value of 2:



          ['roll_off_m4**', 'ov_offer_desc_m4', 'curr_ov_tier_desc2_m4', 'income', 'age', 'vid_offer_up_flag_m3', 'roll_off_m2', 'ov_offer_group_v2_desc_m4', 'oolpromo_rng_m2*', 'rsdvr_orig_m2', 'vidpromo_rng_m2*', 'ovpromo_rng_m2*']


          Full code:



          # 're' is used to check whether a variable name contains a digit
          import re

          # Your initial list of variables
          my_list = ['roll_off_m4**',
                     'ov_offer_desc_m4',
                     'curr_ov_tier_desc2_m4',
                     'income',
                     'age',
                     'vid_offer_up_flag_m3',
                     'vidpromo_rng_m4*',
                     'ovpromo_rng_m4*',
                     'ovpromo_rng_m3*',
                     'roll_off_m3',
                     'roll_off_m2',
                     'oolpromo_rng_m3*',
                     'ov_offer_group_v2_desc_m4',
                     'oolpromo_rng_m2*',
                     'rsdvr_orig_m2',
                     'vidpromo_rng_m2*',
                     'ovpromo_rng_m2*']

          # Return the variables to keep for the month specified
          def get_data_for_month(month_number, variable_list):
              month_tag = "_m" + str(month_number)
              return [
                  variable for variable in variable_list
                  # 1. has a digit and belongs to the requested month
                  if (re.search(r'\d', variable) and month_tag in variable)
                  # 2. has no digit at all, so no month info (e.g. 'age')
                  or not re.search(r'\d', variable)
                  # 3. no requested-month version of this variable is in the list
                  #    (the +3 slice assumes a single-digit month)
                  or variable.replace(
                      variable[variable.find("_m"):variable.find("_m") + 3],
                      month_tag) not in variable_list
              ]

          # Function call
          output = get_data_for_month(2, my_list)

          # Print the output
          print(output)
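
          As an alternative to the single comprehension, the selection rule from the question can be sketched by grouping names on their base (the part before `_m<number>`). This is only a sketch under assumptions that are not in the original: the function name `pick_preferred_month` and the `strip_markers` flag are hypothetical, trailing `*` characters are treated as annotations rather than part of the name, and every month is kept when the preferred one is absent for a base.

```python
import re
from collections import defaultdict

# Splits "roll_off_m4" into base "roll_off" and month 4.
SUFFIX = re.compile(r"^(?P<base>.*)_m(?P<month>\d+)$")


def pick_preferred_month(names, preferred=2, strip_markers=True):
    """Drop non-preferred months for bases that also have the preferred month."""
    # Assumption: trailing '*' characters are annotations, not part of the name.
    cleaned = [n.rstrip("*") if strip_markers else n for n in names]

    # First pass: record which months occur for each base name.
    months_by_base = defaultdict(set)
    for name in cleaned:
        m = SUFFIX.match(name)
        if m:
            months_by_base[m.group("base")].add(int(m.group("month")))

    # Second pass: keep names in their original order.
    result = []
    for name in cleaned:
        m = SUFFIX.match(name)
        if m is None:
            result.append(name)  # no month info, e.g. 'age'
            continue
        months = months_by_base[m.group("base")]
        # Keep the preferred month, or every month if the preferred one is absent.
        if int(m.group("month")) == preferred or preferred not in months:
            result.append(name)
    return result
```

          With the list from the question and the defaults, this keeps `roll_off_m2` and drops `roll_off_m4` (once its `**` marker is stripped), while a lone month such as `ov_offer_group_v2_desc_m4` or a hypothetical single `x_m6` is kept unchanged.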





          answered Nov 15 '18 at 4:10 by PL200 (edited Nov 15 '18 at 5:40)













          • Hello @PL200, thanks a lot. However, this does not take care of variables like ov_offer_group_v2_desc_m4, which should also appear in the output since there are no other months associated with it.

            – Shuvayan Das
            Nov 15 '18 at 4:18











          • @ShuvayanDas Apologies, but isn't the '_m4' at the end indicating that month 4 (April) is associated with it?

            – PL200
            Nov 15 '18 at 4:27











          • Yes, it does. But as I mentioned in the question, where only a single month value is present, it is taken as is. Only when other months are present along with m2 do I need to select the m2 ones. I am sorry if the question is not clear.

            – Shuvayan Das
            Nov 15 '18 at 4:29













          • That is not very clear. So are you saying that if there is only one example of 'm6', it should also be picked up?

            – PL200
            Nov 15 '18 at 4:35











          • Yes @PL200, that is right. The reason is that several months appearing together basically means duplicates, so I need to pick only m2.

            – Shuvayan Das
            Nov 15 '18 at 4:37


















