R script - PDF error: Illegal character in hex string; when I am searching for keywords
I am trying to count the number of keywords in multiple pdf files.
library(tm)
library(pdftools)
files <- list.files(pattern = "pdf$")
Rpdf <- readPDF(control = list(text = "-layout"))
corp <- Corpus(URISource(files), readerControl = list(reader = Rpdf))
words <- c("example", "keyword", "test")
dt <- DocumentTermMatrix(corp, control=list(dictionary=words))
When I run the code I always get these errors:
PDF error: May not be a PDF file (continuing anyway)
PDF error (3): Illegal character <21> in hex string
PDF error (5): Illegal character <4f> in hex string
PDF error (7): Illegal character <54> in hex string
PDF error (8): Illegal character <59> in hex string
PDF error (9): Illegal character <50> in hex string
PDF error: Couldn't find trailer dictionary
PDF error: Couldn't find trailer dictionary
PDF error: Couldn't read xref table
Error in poppler_pdf_text(loadfile(pdf), opw, upw) : PDF parsing failure.
In addition: There were 12 warnings (use warnings() to see them)
If you have any suggestions, please let me know. Thank you!
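Note on the log above: the two-digit codes in the "Illegal character" messages are hexadecimal byte values. 21, 4f, 54, 59 and 50 decode to "!", "O", "T", "Y" and "P", which would be consistent with a file that starts with "<!DOCTYPE", that is, an HTML page saved under a .pdf name. Together with "May not be a PDF file", this suggests that at least one matched file is not a real PDF, although that is an inference from the log and not something confirmed in this thread. A minimal sketch in base R for finding such files follows; the helper name looks_like_pdf is hypothetical, and the check is deliberately strict (the "%PDF-" signature must be the very first bytes, whereas some readers tolerate leading junk).

```r
# Decode the byte values reported in the error messages.
codes <- c("21", "4f", "54", "59", "50")
decoded <- rawToChar(as.raw(strtoi(codes, base = 16L)))
print(decoded)

# Hypothetical helper: TRUE only if the file starts with the "%PDF-" signature.
looks_like_pdf <- function(path) {
  signature <- charToRaw("%PDF-")
  header <- readBin(path, what = "raw", n = length(signature))
  identical(header, signature)
}

# Check every file in the working directory whose name ends in ".pdf".
files <- list.files(pattern = "\\.pdf$", ignore.case = TRUE)
is_pdf <- vapply(files, looks_like_pdf, logical(1))

# Files that matched the name pattern but do not start like a PDF.
suspect_files <- files[!is_pdf]
print(suspect_files)
```

Any file listed in suspect_files is worth opening in a text editor before passing it to readPDF().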
r pdf text-mining
I can't reproduce your error. You will have to point to an example pdf that generates this error. Also please add the results of the warnings() to your question.
– phiver
Nov 24 '18 at 11:07
You did a library(pdftools). What happens when you try to use it?
– hrbrmstr
Nov 24 '18 at 17:44
library(pdftools) works fine; there is no error at all.
– Daniel Meyer
Nov 24 '18 at 18:46
@DanielMeyer - did you manage to get a solution to this? I am also getting a similar error on a specific pdf file in a large set of files: PDF error (21): Illegal character '{'. This aborts all my processing up to that point. How did you manage to get around this error?
– Sanjay Mehrotra
Dec 13 '18 at 16:35
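One possible way to keep a single bad file from aborting a whole batch, sketched under the assumption that pdftools::pdf_text() is called directly on each file instead of going through tm's reader: wrap the call in tryCatch() and skip files that fail. The helper name safe_pdf_text is hypothetical, and this is not a fix confirmed by anyone in this thread.

```r
library(pdftools)

# Hypothetical helper: return the page texts of one file, or NULL (with a
# message) when pdftools raises an error while parsing it.
safe_pdf_text <- function(path) {
  tryCatch(
    pdf_text(path),
    error = function(e) {
      message("Skipping ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

files <- list.files(pattern = "\\.pdf$", ignore.case = TRUE)
texts <- lapply(files, safe_pdf_text)
names(texts) <- files

# Separate the files that could not be parsed from the ones that could.
unreadable <- vapply(texts, is.null, logical(1))
failed_files <- files[unreadable]
texts <- texts[!unreadable]
```

The "PDF error" lines are still printed by the underlying parser, but the loop continues, and failed_files records which inputs need a closer look.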
edited Nov 24 '18 at 9:54 by phiver
asked Nov 24 '18 at 5:25 by Daniel Meyer