Segmenting sentence into subsentences with CoreNLP























I am working on the following problem: I would like to split sentences into subsentences using Stanford CoreNLP. The example sentence could be:



"Richard is working with CoreNLP, but does not really understand what he is doing"


I would now like my sentence to be split into its individual "S" constituents, as shown in the tree diagram below:



[Image: parse tree diagram of the example sentence]



I would like the output to be a list with one entry per "S", as follows:



['Richard is working with CoreNLP', ', but', 'does not really understand what', 'he is doing']


I would be really thankful for any help :)










      nlp stanford-nlp dependency-parsing natural-language-processing pycorenlp














asked Nov 5 at 13:07 by moritz
edited Nov 5 at 17:33 by David Batista
























2 Answers

















I suspect the tool you're looking for is Tregex, described in more detail in the Tregex PowerPoint slides or the Javadoc of the class itself.

In your case, I believe the pattern you're looking for is simply S. So, something like:

    tregex.sh "S" <path_to_file>

where the file contains a tree in Penn Treebank format, that is, something like (ROOT (S (NP (NNS dogs)) (VP (VB chase) (NP (NNS cats))))).

As an aside: I believe the fragment ", but" is not actually a sentence, as you've highlighted in the figure. Rather, the node you've highlighted subsumes the whole sentence "Richard is working with CoreNLP, but does not really understand what he is doing". Tregex would then print out this whole sentence as one of the matches. Similarly, "does not really understand what" is not a sentence unless it subsumes the entire SBAR: "does not really understand what he is doing".

If you want just the "leaf" sentences (i.e., a sentence that's not subsumed by another sentence), you can try a pattern more like:

    S !>> S

Note: I haven't tested the patterns -- use at your own risk!
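
To make the difference between these patterns concrete without running Tregex, here is a rough pure-Python sketch that walks a bracketed tree and reports, for every S node, whether it has an S above it or below it. In Tregex, `>>` reads "is dominated by" and `<<` reads "dominates", so `S !>> S` keeps the S nodes with no S ancestor, while S nodes that contain no nested S would correspond to `S !<< S`. The helper names (`parse_tree`, `find_clauses`) and the example tree are made up for illustration; the tree is hand-written, not CoreNLP output.

```python
def parse_tree(text):
    """Parse a Penn Treebank bracketed string into (label, children) tuples.

    Leaves (words) are kept as plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()


def words(node):
    """Return the words covered by a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def contains_label(node, label):
    """True if some proper descendant of node carries the given label."""
    if isinstance(node, str):
        return False
    return any(
        not isinstance(child, str)
        and (child[0] == label or contains_label(child, label))
        for child in node[1]
    )


def find_clauses(node, label="S", has_label_above=False):
    """Yield (text, has_label_above, has_label_below) for each matching node."""
    if isinstance(node, str):
        return
    if node[0] == label:
        yield " ".join(words(node)), has_label_above, contains_label(node, label)
    for child in node[1]:
        yield from find_clauses(child, label, has_label_above or node[0] == label)


# Hand-written illustrative tree (an assumption, not parser output).
TREE = (
    "(ROOT (S (NP (NNS dogs))"
    " (VP (VBP know)"
    " (SBAR (IN that)"
    " (S (NP (NNS cats)) (VP (VBP run)))))))"
)

clauses = list(find_clauses(parse_tree(TREE)))
all_s = [text for text, _, _ in clauses]                         # like "S"
outermost = [text for text, above, _ in clauses if not above]    # like "S !>> S"
innermost = [text for text, _, below in clauses if not below]    # like "S !<< S"

print(all_s)
print(outermost)
print(innermost)
```

Note that none of these three selections produces the non-overlapping pieces asked for in the question; each match is the full span of an S node.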

















answered Nov 6 at 6:39 by Gabor Angeli












            • Thank you for your response. We are working in Python. Do you know how we could integrate your solution here? That would be really great!
              – moritz
              Nov 6 at 9:40






























OK, I found that one can do this as follows:

    import requests

    # Assumes a CoreNLP server is already running locally on port 9000.
    url = "http://localhost:9000/tregex"
    request_params = {"pattern": "S"}
    text = "Pusheen and Smitha walked along the beach."
    r = requests.post(url, data=text, params=request_params)
    print(r.json())

Does anybody know how to use other languages (I need German)?
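
For anyone who would rather avoid the `requests` dependency, the same call can probably be sketched with the standard library alone. This version URL-encodes the pattern (which matters once it contains spaces or `>`, as in `S !>> S`) and sends the text as UTF-8 bytes, which should also be the safer choice for non-ASCII input such as German umlauts. The function names `build_tregex_request` and `run_tregex` are my own, and the default base URL simply mirrors the one above; I am not assuming anything about the shape of the JSON that comes back.

```python
import json
import urllib.parse
import urllib.request


def build_tregex_request(text, pattern, base_url="http://localhost:9000"):
    """Build (but do not send) a POST request for the server's /tregex endpoint."""
    query = urllib.parse.urlencode({"pattern": pattern})
    return urllib.request.Request(
        f"{base_url}/tregex?{query}",
        data=text.encode("utf-8"),   # request body: the raw text to parse
        method="POST",
    )


def run_tregex(text, pattern, base_url="http://localhost:9000"):
    """Send the request and return the decoded JSON response.

    Requires a CoreNLP server listening at base_url.
    """
    request = build_tregex_request(text, pattern, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server, so the URL can be inspected first.
req = build_tregex_request("Pusheen and Smitha walked along the beach.", "S !>> S")
print(req.full_url)
```

With the server running, `run_tregex(text, "S")` should return the same data as `r.json()` in the snippet above.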

















answered Nov 6 at 10:21 by moritz






























                     
