Unable to set character encoding in java.util.Scanner











I use Apache Tika to detect the encoding of a file.



    FileInputStream fis = new FileInputStream(my_file);
    final AutoDetectReader detector = new AutoDetectReader(fis);
    fis.close();
    System.out.println("Encoding:" + detector.getCharset().toString());


I use Scanner to read the values from the file.



    Scanner scanner = new Scanner(my_file, detector.getCharset().toString());
    Map<String, String> values = new HashMap<>();
    String line, key = null, value = null;
    while (scanner.hasNextLine()) {
        line = scanner.nextLine();
        if (line.contains(":")) {
            if (key != null) {
                values.put(key, value.trim());
                key = null;
                value = null;
            }
            int indexOfColon = line.indexOf(":");
            key = line.substring(0, indexOfColon);
            value = line.substring(indexOfColon + 1);
        } else {
            value += " " + line;
        }
    }
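A note on the parsing loop itself, separate from the encoding problem: as written, a line without a colon that appears before any key is appended to a null value (string concatenation turns it into the text "null ..."), and the last key/value pair is never stored because nothing flushes it after the loop. Below is a minimal sketch of the same loop with those two points addressed. The class name `KeyValueScannerSketch` and the method `parse` are made up for illustration, and the charset is passed in directly instead of being detected with Tika.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class KeyValueScannerSketch {

    // Parses "key: value" lines; a line without a colon continues the previous value.
    static Map<String, String> parse(Path file, Charset charset) throws IOException {
        Map<String, String> values = new HashMap<>();
        String key = null;
        StringBuilder value = new StringBuilder();
        try (Scanner scanner = new Scanner(file, charset.name())) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                int colon = line.indexOf(':');
                if (colon >= 0) {
                    if (key != null) {
                        values.put(key, value.toString().trim());
                    }
                    key = line.substring(0, colon);
                    value.setLength(0);
                    value.append(line.substring(colon + 1));
                } else if (key != null) {
                    // Continuation lines are ignored until the first key is seen
                    value.append(' ').append(line);
                }
            }
        }
        // Store the final pair, which the loop alone never reaches
        if (key != null) {
            values.put(key, value.toString().trim());
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data, written in windows-1252 for the demonstration
        Charset cp1252 = Charset.forName("windows-1252");
        Path file = Files.createTempFile("values", ".txt");
        Files.write(file, "name: caf\u00e9\nnote: first\nsecond line\n".getBytes(cp1252));
        System.out.println(parse(file, cp1252));
    }
}
```

This does not by itself explain the empty result, but it removes two sources of confusing output while the encoding issue is being investigated.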


Scanner is unable to read text from files encoded as windows-1252; I get an empty string.
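One thing worth checking when Scanner appears to return nothing: Scanner does not throw when the underlying read fails. It treats the failure as end of input, so `hasNextLine()` simply returns false, and the most recent error is only available through `ioException()`. Whether that is what happens with this particular file is an assumption, but the check is cheap. A minimal sketch, with the hypothetical names `ScannerDiagnosticsSketch` and `countLines`:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class ScannerDiagnosticsSketch {

    // Counts the lines Scanner can read, then rethrows any I/O or decoding
    // error that Scanner recorded instead of throwing.
    static int countLines(Path file, String charsetName) throws IOException {
        int lines = 0;
        try (Scanner scanner = new Scanner(file, charsetName)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                lines++;
            }
            IOException swallowed = scanner.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file written in windows-1252
        Path file = Files.createTempFile("diag", ".txt");
        Files.write(file, "a: 1\nb: caf\u00e9\n".getBytes(Charset.forName("windows-1252")));
        System.out.println("lines read: " + countLines(file, "windows-1252"));
    }
}
```

If this throws, the exception type and message should point at the real cause (for example a decoding error or a file that cannot be opened) rather than leaving only an empty result.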



UPDATE 2018.11.07.
I have the same problem with BufferedReader.



    Map<String, String> values = new HashMap<>();
    String line, key = null, value = null;
    FileInputStream is = new FileInputStream(my_file);
    InputStreamReader isr = new InputStreamReader(is, getEncoding(my_file));
    BufferedReader buffReader = new BufferedReader(isr);

    while (buffReader.readLine() != null) {
        line = buffReader.readLine();
        if (line.contains(":")) {
            if (key != null) {
                values.put(key, value.trim());
                key = null;
                value = null;
            }
            int indexOfColon = line.indexOf(":");
            key = line.substring(0, indexOfColon);
            value = line.substring(indexOfColon + 1);
        } else {
            value += " " + line;
        }
    }
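Note that the BufferedReader loop above calls `readLine()` twice per iteration: once in the `while` condition and once in the body. The line read in the condition is discarded, so every other line is skipped, and the second call can return null at the end of the file. A minimal sketch of the usual idiom, which reads each line exactly once; the names `ReadLinesSketch` and `readLines` are hypothetical, and the charset is passed in rather than obtained from `getEncoding`:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadLinesSketch {

    // Reads every line of the file once, decoding with the given charset.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            String line;
            // Assign and test in one step so no line is consumed and thrown away
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file written in windows-1252
        Charset cp1252 = Charset.forName("windows-1252");
        Path file = Files.createTempFile("lines", ".txt");
        Files.write(file, "name: caf\u00e9\nnote: first\nsecond line\n".getBytes(cp1252));
        System.out.println(readLines(file, cp1252));
    }
}
```

The try-with-resources block also closes the reader, which the original snippet leaves open.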









Tags: java, java.util.scanner, apache-tika














asked Nov 6 at 15:27 by plaidshirt, edited Nov 7 at 8:23
























1 Answer













Instead of reading lines, I would try reading the file character by character, using the following approach:



    ByteArrayOutputStream line = new ByteArrayOutputStream();
    Scanner scanner = new Scanner(my_file);
    final int newline = '\n'; // value that marks the end of a line

    // Caveat: nextInt() parses integer tokens from the text; it does not return raw bytes
    while (scanner.hasNextInt()) {
        int c = 0;
        // read every line
        while (c != newline && scanner.hasNextInt()) { // TODO: Check for a newline char
            c = scanner.nextInt();
            line.write((byte) c);
        }
        byte[] array = line.toByteArray();
        String output = new String(array, "Windows-1252"); // This should do the trick

        // We have a string here, do your logic

        line.reset();
    }


This approach is ugly, but it uses the String constructor, which lets you specify the encoding explicitly. I have not tested or run this code, but it should at least show whether any content is actually being read.
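The underlying idea, decoding the raw bytes with an explicitly named charset, can also be tried without Scanner at all. The following is a minimal sketch under that assumption, not a tested fix for the question: it reads the whole file as bytes, reports how many there are, and decodes them with `new String(bytes, charset)`. The names `RawDecodeSketch` and `decode` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class RawDecodeSketch {

    // Reads the whole file as bytes and decodes it with the given charset.
    static String decode(Path file, Charset charset) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        // A length of zero here would mean the file itself is empty
        System.out.println("bytes read: " + raw.length);
        return new String(raw, charset);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file written in windows-1252
        Charset cp1252 = Charset.forName("windows-1252");
        Path file = Files.createTempFile("raw", ".txt");
        Files.write(file, "name: caf\u00e9\n".getBytes(cp1252));
        System.out.print(decode(file, cp1252));
    }
}
```

If the byte count is non-zero and the decoded text looks right, the file content and charset are fine and the problem lies in how the reader is set up; if the count is zero, the wrong file (or an empty one) is being opened.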






answered Nov 6 at 15:51 by Giovanni Terlingen












• It has the same effect: the string is empty. I also tried this: Scanner scanner = new Scanner(new FileInputStream(my_file), detector.getCharset().toString());
  – plaidshirt, Nov 7 at 7:13

• Ah, that's sad! Does hasNextInt() ever return true at all?
  – Giovanni Terlingen, Nov 7 at 12:07

• No, it doesn't, but I replaced it with the hasNextLine() method.
  – plaidshirt, Nov 7 at 12:20

• I see, but I recommend checking whether the file has any content and whether .nextInt() works.
  – Giovanni Terlingen, Nov 7 at 12:22

















