nltk bags of words showing emotions
I am working on NLP using Python and NLTK.
I was wondering whether there is any dataset with bags of words containing keywords related to emotions such as happiness, joy, anger, and sadness.
From what I dug up in the NLTK corpora, there are some sentiment analysis corpora containing positive and negative reviews, but those don't exactly relate to keywords expressing emotions.
Is there any way I could build my own dictionary of emotion words for this purpose? If so, how do I do it, and is there an existing collection of such words?
Any help would be greatly appreciated.
python nlp nltk
asked Nov 8 at 2:59
Calvin
1 Answer
I'm not aware of a dataset that associates emotions with keywords, but you can easily build one starting from a generic sentiment analysis dataset:
1) Clean the dataset of stopwords and any terms you don't want to associate with a sentiment.
2) Compute the count of each word in the two sentiment classes and normalize it. This associates each word with a probability of belonging to each class. Suppose the word "love" appears 300 times in the positive sentences and 150 times in the negative sentences. After normalizing, "love" belongs to the positive class with probability 66% (300/(300+150)) and to the negative class with probability 33%.
3) To make the dictionary more robust to borderline terms, you can set a threshold and consider neutral every word whose maximum class probability falls below it.
This is a simple approach to building the dictionary you are looking for. You could use a more sophisticated weighting such as term frequency–inverse document frequency (TF–IDF).
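The three steps above can be sketched as follows. This is a minimal illustration: the two tiny document lists stand in for a real labelled sentiment dataset, and the 0.6 threshold is an arbitrary choice.

```python
from collections import Counter

# Hypothetical toy corpus of tokenised sentences; stop-word removal (step 1)
# is assumed to have happened already.
positive_docs = [["love", "this", "movie"], ["love", "great", "acting"]]
negative_docs = [["hate", "this", "movie"], ["love", "lost", "boring"]]

# Step 2: per-class word counts.
pos_counts = Counter(w for doc in positive_docs for w in doc)
neg_counts = Counter(w for doc in negative_docs for w in doc)

def sentiment_probability(word):
    """Return (p_positive, p_negative), normalised by the word's total count."""
    total = pos_counts[word] + neg_counts[word]
    if total == 0:
        return None  # word never seen in either class
    return pos_counts[word] / total, neg_counts[word] / total

def label(word, threshold=0.6):
    """Step 3: call a word neutral unless one class clearly dominates."""
    probs = sentiment_probability(word)
    if probs is None:
        return "unknown"
    p_pos, p_neg = probs
    if max(p_pos, p_neg) < threshold:
        return "neutral"
    return "positive" if p_pos > p_neg else "negative"

print(label("love"))   # 2 positive vs 1 negative -> p_pos = 0.67 -> "positive"
print(label("movie"))  # 1 vs 1 -> max probability 0.5 < 0.6 -> "neutral"
```

The same counting could be replaced by TF–IDF weights (e.g. scikit-learn's `TfidfVectorizer`) without changing the thresholding logic.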
I don't quite get what you mean. For example, if I use the SentiWordNet corpus, it has many entries such as: better_off#1 in a more fortunate or prosperous condition; "she would have been better off if she had stuck with teaching"; "is better off than his classmate" a 01048406 0.75 0 — and happy#2 felicitous#2 marked by good fortune; "a felicitous life"; "a happy outcome" a 01048587 0.5 0. So if I'm only interested in words representing emotion, could I just remove the other entries I'm not interested in, such as the better_off entry?
– Calvin
Nov 9 at 13:04
Hello @Calvin. I took a look at SentiWordNet; I didn't know this dataset. It seems to contain what you were looking for. Each entry is characterized by these fields: POS: "a"; ID: "01048202"; POSITIVE_score: "0.875"; NEGATIVE_score: 0; SynsetTerms: "better_off#1"; Gloss: "in a more fortunate [...]". The tuple (POS, ID) uniquely identifies the term in WordNet (3.0). POSITIVE_score and NEGATIVE_score measure the degree of association of the term with each sentiment. SynsetTerms is the term under analysis, and the gloss is a list of sentences containing the term (from the original corpora?)
– Roberto
Nov 9 at 13:23
Basically, SentiWordNet already contains the result of the steps I suggested. In SentiWordNet you are interested in the SynsetTerms (e.g. ["better_off#1", "happy#2", "nonadaptive#1", "dysfunctional#2"]) and their POSITIVE/NEGATIVE scores.
– Roberto
Nov 9 at 13:25
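The filtering discussed in these comments can be sketched against the raw SentiWordNet 3.0 tab-separated format (POS, ID, PosScore, NegScore, SynsetTerms, Gloss). The two sample lines reproduce the entries quoted above, and `EMOTION_WORDS` is a hand-picked illustrative list, not part of the dataset.

```python
# Hand-picked emotion vocabulary (illustrative only).
EMOTION_WORDS = {"happy", "joy", "anger", "sadness", "felicitous"}

# Two lines in the raw SentiWordNet tab-separated layout, mirroring the
# entries quoted in the comments above.
sample_lines = [
    "a\t01048406\t0.75\t0\tbetter_off#1\tin a more fortunate or prosperous condition",
    'a\t01048587\t0.5\t0\thappy#2 felicitous#2\tmarked by good fortune; "a felicitous life"',
]

def parse_entries(lines):
    """Yield (word, pos_score, neg_score) for synset terms in the emotion list."""
    for line in lines:
        pos_tag, synset_id, pos_score, neg_score, terms, gloss = line.split("\t")
        for term in terms.split():
            word = term.rsplit("#", 1)[0]  # strip the sense number: happy#2 -> happy
            if word in EMOTION_WORDS:
                yield word, float(pos_score), float(neg_score)

# better_off is filtered out; happy and felicitous are kept with their scores.
emotion_lexicon = {w: (p, n) for w, p, n in parse_entries(sample_lines)}
print(emotion_lexicon)
```

NLTK also ships a reader for this data: after downloading the `sentiwordnet` (and `wordnet`) corpora, `nltk.corpus.sentiwordnet.senti_synsets(word)` exposes the same positive/negative scores per synset, so you may not need to parse the raw file yourself.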
answered Nov 8 at 15:33
Roberto