nltk bags of words showing emotions











I am working on NLP using Python and NLTK.

Is there any dataset of bags of words with keywords relating to emotions such as happiness, joy, anger, sadness, and so on?

From what I dug up in the NLTK corpus, there are some sentiment analysis corpora containing positive and negative reviews, but those are not quite the same as keywords expressing emotions.

Is there any way I could build my own dictionary of emotion words for this purpose? If so, how do I do it, and is there an existing collection of such words?

Any help would be greatly appreciated.










Tags: python, nlp, nltk






asked Nov 8 at 2:59 by Calvin
1 Answer






I'm not aware of any dataset that associates emotions with keywords, but you can easily build one starting from a generic sentiment analysis dataset:

1) Clean the dataset of stopwords and of any other terms you don't want to associate with a sentiment.

2) Count the occurrences of each word in the two sentiment classes and normalize the counts. This gives each word a probability of belonging to each class. Suppose the word "love" appears 300 times in the positive sentences and 150 times in the negative ones. Normalizing, "love" belongs to the positive class with probability 66.7% (300/(150+300)) and to the negative class with probability 33.3%.

3) To make the dictionary more robust against borderline terms, set a threshold and treat as neutral every word whose maximum class probability falls below it.

This is a simple way to build the dictionary you are looking for. You could also use a more sophisticated approach such as Term Frequency-Inverse Document Frequency (TF-IDF).
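The three steps above can be sketched in Python. The toy sentences, stopword list, and 0.7 threshold below are made up for illustration; a real run would use a labeled corpus such as NLTK's movie_reviews:

```python
from collections import Counter

# Toy labeled corpus; substitute a real sentiment dataset in practice.
positive_sents = ["i love this movie", "what a happy joyful day", "love it"]
negative_sents = ["i hate this movie", "such a sad angry mess", "hate it"]

# Step 1: terms we don't want to associate with any sentiment.
STOPWORDS = {"i", "a", "this", "what", "such", "it"}

def count_words(sentences):
    counts = Counter()
    for sent in sentences:
        counts.update(w for w in sent.split() if w not in STOPWORDS)
    return counts

pos_counts = count_words(positive_sents)
neg_counts = count_words(negative_sents)

# Step 3: below this max class probability a word is considered neutral.
THRESHOLD = 0.7

lexicon = {}
for word in set(pos_counts) | set(neg_counts):
    # Step 2: normalize per-class counts into a class probability.
    total = pos_counts[word] + neg_counts[word]
    p_pos = pos_counts[word] / total
    if max(p_pos, 1.0 - p_pos) < THRESHOLD:
        lexicon[word] = "neutral"
    else:
        lexicon[word] = "positive" if p_pos > 0.5 else "negative"

# "love" occurs only in positive sentences; "movie" occurs equally in
# both classes, so it falls under the threshold and becomes neutral.
print(lexicon)
```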






• I don't quite get what you mean. For example, if I use the SentiWordNet corpus, it has many entries such as: better_off#1 "in a more fortunate or prosperous condition; 'she would have been better off if she had stuck with teaching'; 'is better off than his classmate'" a 01048406 0.75 0, and happy#2 felicitous#2 "marked by good fortune; 'a felicitous life'; 'a happy outcome'" a 01048587 0.5 0. So if I am only interested in words representing emotion, could I just remove all the other entries I am not interested in, such as the better_off entry?
  – Calvin
  Nov 9 at 13:04












• Hello @Calvin. I took a look at SentiWordNet; I didn't know this dataset. It seems to contain what you were looking for. Each term is characterized by these fields: POS: "a"; ID: "01048202"; POSITIVE_score: "0.875"; NEGATIVE_score: 0; SynsetTerms: "better_off#1"; Glossary: "in a more fortunate [...]". The tuple (POS, ID) uniquely identifies the term in WordNet (3.0). POSITIVE_score/NEGATIVE_score measure the degree of association of the term with each sentiment. SynsetTerms is the term under analysis. Glossary is a list of sentences containing the term (from the original corpora?).
  – Roberto
  Nov 9 at 13:23












• Basically, SentiWordNet already contains the result of the steps I suggested. In SentiWordNet you are interested in the SynsetTerms (e.g. "better_off#1", "happy#2", "nonadaptive#1", "dysfunctional#2") and their POSITIVE_score/NEGATIVE_score.
  – Roberto
  Nov 9 at 13:25
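Putting the comments together, filtering SentiWordNet down to emotion terms can be sketched like this. It assumes the standard SentiWordNet 3.0 tab-separated column order (POS, ID, PosScore, NegScore, SynsetTerms, Gloss); the two sample lines use the scores quoted above, and the EMOTION_WORDS set is an illustrative filter list you would write yourself:

```python
# Two sample entries in SentiWordNet 3.0's tab-separated layout:
# POS <tab> ID <tab> PosScore <tab> NegScore <tab> SynsetTerms <tab> Gloss
sample = (
    "a\t01048406\t0.75\t0\tbetter_off#1\tin a more fortunate or prosperous condition\n"
    "a\t01048587\t0.5\t0\thappy#2 felicitous#2\tmarked by good fortune\n"
)

# Hypothetical list of the terms you consider emotion words.
EMOTION_WORDS = {"happy", "felicitous", "sad", "angry"}

emotion_lexicon = {}
for line in sample.splitlines():
    pos_tag, syn_id, pos_score, neg_score, terms, gloss = line.split("\t")
    for term in terms.split():
        word = term.rsplit("#", 1)[0]  # strip the sense number: "happy#2" -> "happy"
        if word in EMOTION_WORDS:
            emotion_lexicon[word] = (float(pos_score), float(neg_score))

# better_off is dropped; only the emotion terms survive.
print(emotion_lexicon)
```

As an aside, NLTK also ships a reader for this data (nltk.corpus.sentiwordnet, available after downloading the sentiwordnet and wordnet corpora), which exposes the same scores per synset.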











answered Nov 8 at 15:33 by Roberto











