Matcher for keyword and its children spacy












I have a set of keywords that I am already matching. The real context is medical, so I've made up an equivalent scenario, at least for the parsing I'm trying to do:



I have a car with chrome 100-inch rims.



Let's say I want to return, as a phrase, the keyword rims together with all of its child tokens, where rims is already marked by spaCy as an entity of type CARPART.



In Python, this is what I'm doing:



    test_phrases = nlp("""I have a car with chrome 100-inch rims.""")
    print(test_phrases.cats)
    for t in test_phrases:
        print('Token: {} || POS: {} || DEP: {} CHILDREN: {} || ent_type: {}'.format(t,t.pos_,t.dep_,[c for c in t.children],t.ent_type_))

    Token: I || POS: PRON || DEP: nsubj CHILDREN: [] || ent_type:
    Token: have || POS: VERB || DEP: ROOT CHILDREN: [I, car, .] || ent_type:
    Token: a || POS: DET || DEP: det CHILDREN: [] || ent_type:
    Token: car || POS: NOUN || DEP: dobj CHILDREN: [a, with] || ent_type:
    Token: with || POS: ADP || DEP: prep CHILDREN: [rims] || ent_type:
    Token: chrome || POS: ADJ || DEP: amod CHILDREN: [] || ent_type:
    Token: 100-inch || POS: NOUN || DEP: compound CHILDREN: [] || ent_type:
    Token: rims || POS: NOUN || DEP: pobj CHILDREN: [chrome, 100-inch] || ent_type: CARPART
    Token: . || POS: PUNCT || DEP: punct CHILDREN: [] || ent_type:


So, what I want to do is use something like:



    test_matcher = Matcher(nlp.vocab)

    test_phrase = ['']
    patterns = [[{'ENT':'CARPART',????}] for kp in test_phrase]
    test_matcher.add('CARPHRASE', None, *patterns)


Then call test_matcher on test_phrases and have it return:

    chrome 100-inch rims









Tags: parsing, nlp, spacy






asked Nov 21 '18 at 0:44 by Aus_10
























1 Answer














I think I found a satisfactory solution that will also work when wrapped in a spaCy component class. Test it first to make sure it works for your case, then add something like this to your spaCy pipeline:

    import spacy
    from spacy.matcher import Matcher

    # Assumption: the model name is not given in the question; any pipeline
    # with a dependency parser should do.
    nlp = spacy.load('en_core_web_sm')
    test_matcher = Matcher(nlp.vocab)

    keyword_list = ['rims']
    patterns = [[{'LOWER': kw}] for kw in keyword_list]

    # spaCy 2.x signature; in spaCy 3.x use test_matcher.add('TESTPHRASE', patterns)
    test_matcher.add('TESTPHRASE', None, *patterns)


    def add_children_matches(doc, keyword_matcher):
        '''Add children to match on original single-token keyword.'''
        matches = keyword_matcher(doc)
        for match_id, start, end in matches:
            tokens = doc[start:end]
            print('keyword:', tokens)
            # Since we are getting children for keyword, there should only be one token
            if len(tokens) != 1:
                print('Skipping {}. Too many tokens to match.'.format(tokens))
                continue
            keyword_token = tokens[0]
            # Token indices of the keyword and its direct children
            sorted_children = sorted([c.i for c in keyword_token.children] + [keyword_token.i], reverse=False)
            print('keyphrase:', doc[min(sorted_children):max(sorted_children)+1])


    doc = nlp("""I have a car with chrome 1000-inch rims.""")
    add_children_matches(doc, test_matcher)

This gives:

    keyword: rims
    keyphrase: chrome 1000-inch rims
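
The span arithmetic in the function above can be checked without spaCy or a model. The following is only a library-free sketch: `Tok` and `keyphrase_bounds` are hypothetical stand-ins invented for illustration, and the parse is written by hand to mirror the dependency output shown in the question.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Tok:
    """Hypothetical stand-in for a parsed token: text, position, direct children."""
    text: str
    i: int
    children: List["Tok"] = field(default_factory=list)


def keyphrase_bounds(keyword: Tok) -> Tuple[int, int]:
    """Return (start, end) token indices covering the keyword and its direct children."""
    indices = [child.i for child in keyword.children] + [keyword.i]
    return min(indices), max(indices) + 1  # end is exclusive, like a slice


words = "I have a car with chrome 1000-inch rims .".split()
tokens = [Tok(text, i) for i, text in enumerate(words)]

# Hand-written parse (assumption): 'chrome' and '1000-inch' depend on 'rims'.
rims = tokens[7]
rims.children = [tokens[5], tokens[6]]

start, end = keyphrase_bounds(rims)
keyphrase = " ".join(words[start:end])
print(keyphrase)  # chrome 1000-inch rims
```

Because the slice runs from the smallest to the largest index, any token that sits between two children is included even if it is not itself a child. Only direct children are considered; if deeper descendants are wanted, spaCy's `Token.subtree`, `Token.left_edge` and `Token.right_edge` may be a better fit.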


Edit: To fully answer my own question, you'd have to use something like:

    def add_children_matches(doc, keyword_matcher):
        '''Add children to match on original single-token keyword.'''
        matches = keyword_matcher(doc)
        spans = []
        for match_id, start, end in matches:
            tokens = doc[start:end]
            print('keyword:', tokens)
            # Since we are getting children for keyword, there should only be one token
            if len(tokens) != 1:
                print('Skipping {}. Too many tokens to match.'.format(tokens))
                continue
            keyword_token = tokens[0]
            sorted_children = sorted([c.i for c in keyword_token.children] + [keyword_token.i], reverse=False)
            keyphrase = doc[min(sorted_children):max(sorted_children)+1]
            print('keyphrase:', keyphrase)

            # Rebuild the phrase as a labelled span from its character offsets
            span = doc.char_span(keyphrase.start_char, keyphrase.end_char, label='CARPHRASE')
            if span is not None:
                spans.append(span)

        # NOTE: the labelled spans are collected in `spans` but are not attached
        # to the doc here; the original leaves that step out.
        return doc
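
If the collected spans are meant to become entities, one thing to watch is that spaCy's `doc.ents` does not accept overlapping spans, and here rims (CARPART) lies inside the new CARPHRASE span. Below is a minimal, library-free sketch of one possible resolution, where the longest span wins. The tuple layout and the `drop_overlaps` helper are assumptions made for illustration and are not part of the original answer; spaCy also ships `spacy.util.filter_spans`, which serves a similar purpose on real `Span` objects.

```python
from typing import List, Tuple

SpanTuple = Tuple[int, int, str]  # (start token, end token exclusive, label)


def drop_overlaps(spans: List[SpanTuple]) -> List[SpanTuple]:
    """Keep the longest spans first and discard any span overlapping a kept one."""
    kept: List[SpanTuple] = []
    taken = set()
    # Longest first; ties go to the span that starts earlier.
    for start, end, label in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        positions = set(range(start, end))
        if positions & taken:
            continue  # overlaps a span that was already kept
        kept.append((start, end, label))
        taken |= positions
    return sorted(kept)


existing = [(7, 8, "CARPART")]   # 'rims'
new = [(5, 8, "CARPHRASE")]      # 'chrome 1000-inch rims'
merged = drop_overlaps(existing + new)
print(merged)  # [(5, 8, 'CARPHRASE')]
```

With this choice the original CARPART label is lost once the longer phrase replaces it, so whether longest-wins is the right policy depends on which label matters downstream.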





answered Nov 21 '18 at 2:21 by Aus_10, edited Nov 21 '18 at 2:40































