Segmenting sentence into subsentences with CoreNLP

I am working on the following problem: I would like to split sentences into subsentences using Stanford CoreNLP. The example sentence could be:

"Richard is working with CoreNLP, but does not really understand what he is doing"

I would now like the sentence to be split at each "S" node, as shown in the tree diagram below:

[image: parse tree of the example sentence, with the "S" nodes highlighted]

I would like the output to be a list with one entry per "S", as follows:

['Richard is working with CoreNLP', ', but', 'does not really understand what', 'he is doing']

I would be really thankful for any help :)

edited Nov 5 at 17:33

David Batista

1,11911023

asked Nov 5 at 13:07

moritz

164

nlp stanford-nlp dependency-parsing natural-language-processing pycorenlp

2 Answers
I suspect the tool you're looking for is Tregex, described in more detail in the power point here or the Javadoc of the class itself.

In your case, I believe the pattern you're looking for is simply S. So, something like:

tregex.sh "S" <path_to_file>

where the file is a Penn Treebank formatted tree -- that is, something like (ROOT (S (NP (NNS dogs)) (VP (VB chase) (NP (NNS cats))))).

As an aside: I believe the fragment ", but" is not actually a sentence, even though you've highlighted it as one in the figure. Rather, the node you've highlighted subsumes the whole sentence "Richard is working with CoreNLP, but does not really understand what he is doing". Tregex would then print out this whole sentence as one of the matches. Similarly, "does not really understand what" is not a sentence unless it subsumes the entire SBAR: "does not really understand what he is doing".

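If you want just the "leaf" sentences (i.e., a sentence that does not itself contain another sentence), you can try a pattern more like:

S !<< S

In Tregex, A << B means "A dominates B" and A >> B means "A is dominated by B", so the reverse pattern S !>> S would instead match only the outermost S, the one not subsumed by any other sentence.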

Note: I haven't tested the patterns -- use at your own risk!
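
To make the difference between "every S" and "only the leaf S" concrete without needing Tregex itself, here is a minimal pure-Python sketch. It is not the Tregex implementation: it parses a Penn Treebank bracketed string by hand and walks the tree. The helper names (parse_tree, all_clauses, innermost_clauses) are made up for this illustration, and the nested example tree is hand-written, not CoreNLP output.

```python
def parse_tree(text):
    """Parse a bracketed tree into (label, children) tuples; words stay plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a word (leaf of the tree)
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1  # skip the closing ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def subtrees(node):
    """Yield the node and all labelled nodes below it, in pre-order."""
    if isinstance(node, str):
        return
    yield node
    for child in node[1]:
        yield from subtrees(child)


def contains_s(node):
    """True if some node strictly below this one is labelled S."""
    return any(sub[0] == "S" for child in node[1] for sub in subtrees(child))


def all_clauses(tree):
    """Text of every S node (roughly what the pattern S matches)."""
    return [" ".join(leaves(n)) for n in subtrees(tree) if n[0] == "S"]


def innermost_clauses(tree):
    """Text of S nodes that contain no other S (the 'leaf' sentences)."""
    return [" ".join(leaves(n)) for n in subtrees(tree)
            if n[0] == "S" and not contains_s(n)]


flat = parse_tree("(ROOT (S (NP (NNS dogs)) (VP (VB chase) (NP (NNS cats)))))")
nested = parse_tree(
    "(ROOT (S (NP (PRP he)) (VP (VBZ knows) "
    "(SBAR (WHNP (WP what)) (S (NP (PRP he)) (VP (VBZ does)))))))"
)
print(all_clauses(flat))
print(all_clauses(nested))
print(innermost_clauses(nested))
```

On the nested tree, the outer S covers the whole sentence and only the embedded S survives the "contains no other S" filter, which is why the list in the question cannot come from S nodes alone.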

answered Nov 6 at 6:39

Gabor Angeli

4,86611124

Thank you for your response. We are working in Python. Do you know how we could integrate your solution here? That would be really great!
– moritz
Nov 6 at 9:40

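OK, I found that one can do this as follows, by sending the text to the /tregex endpoint of a running CoreNLP server:

import requests

# Assumes a CoreNLP server is already running locally on port 9000.
url = "http://localhost:9000/tregex"
request_params = {"pattern": "S"}
text = "Pusheen and Smitha walked along the beach."

# The text goes in the POST body; the Tregex pattern is a URL parameter.
r = requests.post(url, data=text, params=request_params)
print(r.json())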
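
Because the pattern travels in the query string, characters such as ! and < have to be URL-encoded; requests does this for anything passed via params. The sketch below shows the same request built with only the standard library, so the encoding is visible. It assumes the same local endpoint as above, and build_tregex_request is a hypothetical helper name, not part of CoreNLP or requests. Nothing is sent until the request is passed to urllib.request.urlopen.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tregex_request(text, pattern, base_url="http://localhost:9000/tregex"):
    """Build (but do not send) a POST request carrying the text and the pattern."""
    query = urlencode({"pattern": pattern})  # percent-encodes "!", "<" and spaces
    # Encode the body explicitly so non-ASCII text is sent as UTF-8.
    return Request(f"{base_url}?{query}", data=text.encode("utf-8"))


req = build_tregex_request("Pusheen and Smitha walked along the beach.", "S !<< S")
print(req.get_method())  # POST, because a request body is attached
print(req.full_url)      # http://localhost:9000/tregex?pattern=S+%21%3C%3C+S
```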

Does anybody know how to use other languages (I need German)?

answered Nov 6 at 10:21

moritz

164
