Unable to set character encoding in java.util.Scanner
I use Apache Tika to get the encoding of a file.
FileInputStream fis = new FileInputStream(my_file);
final AutoDetectReader detector = new AutoDetectReader(fis);
fis.close();
System.out.println("Encoding:" + detector.getCharset().toString());
I use Scanner to read values from the file.
Scanner scanner = new Scanner(my_file, detector.getCharset().toString());
Map<String, String> values = new HashMap<>();
String line, key = null, value = null;
while (scanner.hasNextLine()) {
    line = scanner.nextLine();
    if (line.contains(":")) {
        if (key != null) {
            values.put(key, value.trim());
            key = null;
            value = null;
        }
        int indexOfColon = line.indexOf(":");
        key = line.substring(0, indexOfColon);
        value = line.substring(indexOfColon + 1);
    } else {
        value += " " + line;
    }
}
Scanner is unable to read text from files encoded as windows-1252; I get an empty string.
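A possible diagnostic, not something this thread confirms: Scanner does not throw when its underlying source fails. It records the exception and then behaves as if the input had ended. As far as I know, a Scanner built from a File plus a charset name reports undecodable bytes as an I/O error internally, so hasNextLine() can return false on the very first call. The sketch below reads the same windows-1252 bytes twice, once with the matching charset and once with UTF-8, and checks scanner.ioException() afterwards. The class ScannerEncodingCheck and its countLines helper are made-up names for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Scanner;

class ScannerEncodingCheck {

    /**
     * Writes data to a temporary file, reads it with Scanner(File, charsetName),
     * and returns the number of lines read, or -1 if Scanner recorded an I/O error.
     */
    static int countLines(byte[] data, String charsetName) {
        try {
            File file = File.createTempFile("scanner-check", ".txt");
            try {
                Files.write(file.toPath(), data);
                try (Scanner scanner = new Scanner(file, charsetName)) {
                    int lines = 0;
                    while (scanner.hasNextLine()) {
                        scanner.nextLine();
                        lines++;
                    }
                    // Scanner swallows IOExceptions from its source;
                    // ioException() returns the last one, or null if there was none.
                    return scanner.ioException() == null ? lines : -1;
                }
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two lines containing characters that are single bytes in windows-1252.
        byte[] data = "name: caf\u00e9\nnote: na\u00efve\n"
                .getBytes(Charset.forName("windows-1252"));
        System.out.println("windows-1252: " + countLines(data, "windows-1252"));
        System.out.println("UTF-8: " + countLines(data, "UTF-8"));
    }
}
```

If countLines returns -1 for the charset that Tika reported, the detected encoding probably does not match the bytes in the file.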
UPDATE 2018.11.07.
I have the same problem with BufferedReader.
Map<String, String> values = new HashMap<>();
String line, key = null, value = null;
FileInputStream is = new FileInputStream(my_file);
InputStreamReader isr = new InputStreamReader(is, getEncoding(my_file));
BufferedReader buffReader = new BufferedReader(isr);
while (buffReader.readLine() != null) {
    line = buffReader.readLine();
    if (line.contains(":")) {
        if (key != null) {
            values.put(key, value.trim());
            key = null;
            value = null;
        }
        int indexOfColon = line.indexOf(":");
        key = line.substring(0, indexOfColon);
        value = line.substring(indexOfColon + 1);
    } else {
        value += " " + line;
    }
}
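Independent of the encoding, both loops above have details that may explain an empty or incomplete result. A key/value pair is only stored when a later line containing a colon arrives, so the last pair in the file never reaches the map. The BufferedReader version also calls readLine() twice per iteration, which discards every other line. Finally, appending to value while it is still null produces the text "null". Here is a minimal sketch of the same parsing with those points changed. The class name KeyValueParser and the sample text are assumptions for illustration, and in real use the reader would wrap the file with the detected charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueParser {

    /** Parses "key: value" lines; a line without a colon continues the previous value. */
    static Map<String, String> parse(BufferedReader reader) {
        Map<String, String> values = new LinkedHashMap<>();
        String key = null;
        StringBuilder value = new StringBuilder();
        try {
            String line;
            // Call readLine() exactly once per iteration so no line is skipped.
            while ((line = reader.readLine()) != null) {
                int indexOfColon = line.indexOf(':');
                if (indexOfColon >= 0) {
                    if (key != null) {
                        values.put(key, value.toString().trim());
                    }
                    key = line.substring(0, indexOfColon);
                    value.setLength(0);
                    value.append(line.substring(indexOfColon + 1));
                } else if (key != null) {
                    // Continuation line: append it to the current value.
                    value.append(' ').append(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Store the final pair; no later "key:" line will trigger it.
        if (key != null) {
            values.put(key, value.toString().trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "title: Hello\nbody: first line\nsecond line\n";
        System.out.println(parse(new BufferedReader(new StringReader(text))));
    }
}
```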
java java.util.scanner apache-tika
edited Nov 7 at 8:23
asked Nov 6 at 15:27
plaidshirt
1 Answer
Rather than reading lines, I would try reading characters using the following approach:
ByteArrayOutputStream line = new ByteArrayOutputStream();
Scanner scanner = new Scanner(my_file);
while (scanner.hasNextInt()) {
    int c = 0;
    // read every line
    while (c != newline) { // TODO: Check for a newline char
        c = scanner.nextInt();
        line.write((byte) c);
    }
    byte[] array = line.toByteArray();
    String output = new String(array, "Windows-1252"); // This should do the trick
    // We have a string here, do your logic
    line.reset();
}
This approach is ugly, but it uses new String, which lets you specify an encoding. I have not tested or run this code at all, but it should at least show you whether any content is actually being read properly.
answered Nov 6 at 15:51
Giovanni Terlingen
It has the same effect: the string is empty. I also tried this: Scanner scanner = new Scanner(new FileInputStream(my_file), detector.getCharset().toString());
– plaidshirt Nov 7 at 7:13
Ah, that's sad! Does it have hasNextInt() at all?
– Giovanni Terlingen Nov 7 at 12:07
No, it hasn't, but I replaced it with the hasNextLine() method.
– plaidshirt Nov 7 at 12:20
I see, but I recommend checking that the file has any content and that .nextInt() works.
– Giovanni Terlingen Nov 7 at 12:22
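The idea in the answer, decoding the raw bytes with an explicitly named charset, can be tried without Scanner at all. The sketch below is one way to do that, assuming the file fits in memory. DecodeBytes and decodeLines are hypothetical names, and for a real file the byte array could come from Files.readAllBytes.

```java
import java.nio.charset.Charset;

class DecodeBytes {

    /** Decodes the raw bytes with the given charset and splits the text into lines. */
    static String[] decodeLines(byte[] raw, Charset charset) {
        // new String(byte[], Charset) decodes with exactly the charset passed in.
        String text = new String(raw, charset);
        // \R matches any line break sequence (\n, \r\n, \r, ...).
        return text.split("\\R");
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // 0xE9 is the single-byte windows-1252 encoding of U+00E9.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9, '\n', 'o', 'k'};
        for (String line : decodeLines(raw, cp1252)) {
            System.out.println(line);
        }
    }
}
```

Printing the length of the byte array before decoding is a quick way to confirm that the file has any content at all, which is what the last comment asks for.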