nltk bags of words showing emotions











I am working on NLP using Python and NLTK.

Is there any dataset of bags of words with keywords relating to emotions such as happiness, joy, anger, sadness, and so on?

From what I dug up in the NLTK corpus, there are some sentiment analysis corpora containing positive and negative reviews, but those are not quite the same as keywords expressing emotions.

Is there any way I could build my own dictionary of emotion words for this purpose? If so, how do I do it, and is there an existing collection of such words?

Any help would be greatly appreciated.










Tags: python, nlp, nltk






asked Nov 8 at 2:59 by Calvin
1 Answer






I'm not aware of any dataset that associates emotions with keywords, but you can easily build one starting from a generic sentiment analysis dataset:

1) Clean the dataset of stopwords and of any other terms you don't want to associate with a sentiment.

2) Count the occurrences of each word in the two sentiment classes and normalize the counts. This gives each word a probability of belonging to each class. Suppose the word "love" appears 300 times in the positive sentences and 150 times in the negative ones. Normalizing, "love" belongs to the positive class with probability 66.7% (300/(150+300)) and to the negative class with probability 33.3%.

3) To make the dictionary more robust against borderline terms, set a threshold and treat as neutral every word whose maximum class probability falls below it.

This is a simple way to build the dictionary you are looking for. You could also use a more sophisticated approach such as Term Frequency-Inverse Document Frequency (TF-IDF).
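The three steps above can be sketched in Python. The toy sentences, stopword list, and 0.7 threshold below are made up for illustration; a real run would use a labeled corpus such as NLTK's movie_reviews:

```python
from collections import Counter

# Toy labeled corpus; substitute a real sentiment dataset in practice.
positive_sents = ["i love this movie", "what a happy joyful day", "love it"]
negative_sents = ["i hate this movie", "such a sad angry mess", "hate it"]

# Step 1: terms we don't want to associate with any sentiment.
STOPWORDS = {"i", "a", "this", "what", "such", "it"}

def count_words(sentences):
    counts = Counter()
    for sent in sentences:
        counts.update(w for w in sent.split() if w not in STOPWORDS)
    return counts

pos_counts = count_words(positive_sents)
neg_counts = count_words(negative_sents)

# Step 3: below this max class probability a word is considered neutral.
THRESHOLD = 0.7

lexicon = {}
for word in set(pos_counts) | set(neg_counts):
    # Step 2: normalize per-class counts into a class probability.
    total = pos_counts[word] + neg_counts[word]
    p_pos = pos_counts[word] / total
    if max(p_pos, 1.0 - p_pos) < THRESHOLD:
        lexicon[word] = "neutral"
    else:
        lexicon[word] = "positive" if p_pos > 0.5 else "negative"

# "love" occurs only in positive sentences; "movie" occurs equally in
# both classes, so it falls under the threshold and becomes neutral.
print(lexicon)
```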






• I don't quite get what you mean. For example, if I use the SentiWordNet corpus, it has many entries such as: better_off#1 "in a more fortunate or prosperous condition; 'she would have been better off if she had stuck with teaching'; 'is better off than his classmate'" a 01048406 0.75 0, and happy#2 felicitous#2 "marked by good fortune; 'a felicitous life'; 'a happy outcome'" a 01048587 0.5 0. So if I am only interested in words representing emotion, could I just remove all the other entries I am not interested in, such as the better_off entry?
  – Calvin
  Nov 9 at 13:04












• Hello @Calvin. I took a look at SentiWordNet; I didn't know this dataset. It seems to contain what you were looking for. Each term is characterized by these fields: POS: "a"; ID: "01048202"; POSITIVE_score: "0.875"; NEGATIVE_score: 0; SynsetTerms: "better_off#1"; Glossary: "in a more fortunate [...]". The tuple (POS, ID) uniquely identifies the term in WordNet (3.0). POSITIVE_score/NEGATIVE_score measure the degree of association of the term with each sentiment. SynsetTerms is the term under analysis. Glossary is a list of sentences containing the term (from the original corpora?).
  – Roberto
  Nov 9 at 13:23












• Basically, SentiWordNet already contains the result of the steps I suggested. In SentiWordNet you are interested in the SynsetTerms (e.g. "better_off#1", "happy#2", "nonadaptive#1", "dysfunctional#2") and their POSITIVE_score/NEGATIVE_score.
  – Roberto
  Nov 9 at 13:25
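Putting the comments together, filtering SentiWordNet down to emotion terms can be sketched like this. It assumes the standard SentiWordNet 3.0 tab-separated column order (POS, ID, PosScore, NegScore, SynsetTerms, Gloss); the two sample lines use the scores quoted above, and the EMOTION_WORDS set is an illustrative filter list you would write yourself:

```python
# Two sample entries in SentiWordNet 3.0's tab-separated layout:
# POS <tab> ID <tab> PosScore <tab> NegScore <tab> SynsetTerms <tab> Gloss
sample = (
    "a\t01048406\t0.75\t0\tbetter_off#1\tin a more fortunate or prosperous condition\n"
    "a\t01048587\t0.5\t0\thappy#2 felicitous#2\tmarked by good fortune\n"
)

# Hypothetical list of the terms you consider emotion words.
EMOTION_WORDS = {"happy", "felicitous", "sad", "angry"}

emotion_lexicon = {}
for line in sample.splitlines():
    pos_tag, syn_id, pos_score, neg_score, terms, gloss = line.split("\t")
    for term in terms.split():
        word = term.rsplit("#", 1)[0]  # strip the sense number: "happy#2" -> "happy"
        if word in EMOTION_WORDS:
            emotion_lexicon[word] = (float(pos_score), float(neg_score))

# better_off is dropped; only the emotion terms survive.
print(emotion_lexicon)
```

As an aside, NLTK also ships a reader for this data (nltk.corpus.sentiwordnet, available after downloading the sentiwordnet and wordnet corpora), which exposes the same scores per synset.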











answered Nov 8 at 15:33 by Roberto











