R script - PDF error: Illegal character in hex string; when I am searching for keywords





I am trying to count how often a set of keywords occurs in multiple PDF files.



library(tm)
library(pdftools)

files <- list.files(pattern = "pdf$")
Rpdf <- readPDF(control = list(text = "-layout"))
corp <- Corpus(URISource(files), readerControl = list(reader = Rpdf))

words <- c("example", "keyword", "test")
dt <- DocumentTermMatrix(corp, control=list(dictionary=words))


When I run the code, I always get these errors:



PDF error: May not be a PDF file (continuing anyway)
PDF error (3): Illegal character <21> in hex string
PDF error (5): Illegal character <4f> in hex string
PDF error (7): Illegal character <54> in hex string
PDF error (8): Illegal character <59> in hex string
PDF error (9): Illegal character <50> in hex string
PDF error: Couldn't find trailer dictionary
PDF error: Couldn't find trailer dictionary
PDF error: Couldn't read xref table
Error in poppler_pdf_text(loadfile(pdf), opw, upw) : PDF parsing failure.
In addition: There were 12 warnings (use warnings() to see them)


If you have any suggestions, please let me know. Thank you!
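
One reading of the log above, offered as a guess rather than a confirmed diagnosis: the reported byte values 21, 4f, 54, 59 and 50 are the ASCII codes for `!`, `O`, `T`, `Y` and `P`, and the positions 3, 5, 7, 8 and 9 fit the string `<!DOCTYPE`, where `D`, `C` and `E` are valid hex digits and so are not reported. Together with "May not be a PDF file", this suggests that at least one file matched by `pattern = "pdf$"` is an HTML page saved under a `.pdf` name. A minimal sketch for finding such files follows. The helper name `is_pdf` is hypothetical, and the check assumes the `%PDF-` header sits at the very first byte of the file.

```r
# Decode the byte values reported in the error log.
rawToChar(as.raw(c(0x21, 0x4f, 0x54, 0x59, 0x50)))  # "!OTYP"

# Hypothetical helper: TRUE when a file begins with the PDF magic bytes "%PDF-".
# Simplification: only the first five bytes are inspected.
is_pdf <- function(path) {
  magic <- readBin(path, what = "raw", n = 5L)
  identical(magic, charToRaw("%PDF-"))
}

# List files that are named like PDFs but do not start with "%PDF-".
files <- list.files(pattern = "\\.pdf$", ignore.case = TRUE)
ok <- vapply(files, is_pdf, logical(1))
files[!ok]
```

Any file listed by the last line is worth opening in a text editor; if it starts with `<!DOCTYPE` or `<html`, it needs to be downloaded again or dropped from `files` before building the corpus.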






























  • I can't reproduce your error. You will have to point to an example PDF that generates this error. Also, please add the output of warnings() to your question.

    – phiver
    Nov 24 '18 at 11:07











  • You did a library(pdftools). What happens when you try to use it?

    – hrbrmstr
    Nov 24 '18 at 17:44











  • library(pdftools) works fine; there is no error at all.

    – Daniel Meyer
    Nov 24 '18 at 18:46











  • @DanielMeyer - did you manage to find a solution to this? I am getting a similar error on one specific PDF file in a large set of files: "PDF error (21): Illegal character '{'". It aborts all my processing up to that point. How did you get around this error?

    – Sanjay Mehrotra
    Dec 13 '18 at 16:35
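
For the case raised in the last comment, where a single bad file aborts the whole run, one possible workaround is to read each file separately and catch the error per file. This is a sketch, not a confirmed fix from this thread: the function name `read_pdfs_safely` and its `reader` argument are hypothetical, and it assumes `pdftools::pdf_text()` is the function doing the extraction.

```r
# Read each file on its own so one unreadable file does not abort the run.
# `reader` defaults to pdftools::pdf_text; any function that takes a path and
# returns a character vector of pages can be passed instead.
read_pdfs_safely <- function(files, reader = pdftools::pdf_text) {
  texts <- vapply(files, function(f) {
    tryCatch(
      # Collapse the pages of one document into a single string.
      paste(reader(f), collapse = "\n"),
      error = function(e) {
        message("Skipping ", f, ": ", conditionMessage(e))
        NA_character_
      }
    )
  }, character(1))
  # Keep only the files that could be read; names are the file paths.
  texts[!is.na(texts)]
}
```

The result is a named character vector, so something like `tm::VCorpus(tm::VectorSource(read_pdfs_safely(files)))` could then feed `DocumentTermMatrix()` with the same `dictionary` as in the question, while the skipped files are reported through `message()`.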

























r pdf text-mining














asked Nov 24 '18 at 5:25 by Daniel Meyer

edited Nov 24 '18 at 9:54 by phiver


























