Convert garbled Japanese text back to readable Japanese

I have a file containing garbled Japanese text and need to convert it back to readable Japanese. The problem is that a) I don't know which encoding the original text used, and b) I don't know much about encodings or how to go about converting from one to another.



If I view the file with less, it shows:



ã<U+0081>“ã‚“ã<U+0081>«ã<U+0081>¡ã<U+0081>¯



If I open it in a text editor I see



ã“ã‚“ã«ã¡ã¯



I'm on a Mac and know there is a command called iconv, but so far all my attempts to decode the file with it have failed.



How can I convert that back to readable Japanese?










    If garbled, it might not be possible. Text files are a sequence of bytes that represent integers called code units that are produced by a character encoding from codepoints in a character set. The fundamental rule is to read with the encoding the text was written with. To do that, you obviously need metadata, which is probably not stored with the bytes in the file. Any program that you don't tell which encoding to use is just going to guess. Please edit to show the bytes from the file. EUC-JP → 釃釩"祀磧祚
    – Tom Blodget
    Nov 28 '17 at 4:23
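
To show the bytes the comment asks for, a hex dump is enough. Below is a minimal Python sketch of that, plus a small demonstration of what "guessing" an encoding does to the same bytes. The helper names `hex_dump` and `try_decodings`, the file name in the comment, and the list of candidate encodings are all assumptions made for illustration, and the sample is a stand-in rather than the contents of the file in question.

```python
def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated, two-digit uppercase hex values."""
    return " ".join(f"{b:02X}" for b in data)


def try_decodings(data: bytes, encodings=("utf-8", "euc_jp", "shift_jis", "cp1252")):
    """Decode the same bytes under each candidate encoding.

    Returns a dict of encoding name -> decoded text, or None where the
    bytes are not valid in that encoding.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# For a real file, read it in binary mode so nothing is decoded implicitly:
#     data = open("garbled.txt", "rb").read()   # hypothetical file name
data = "日本語".encode("utf-8")  # stand-in sample: known-good UTF-8 bytes

print(hex_dump(data))
for name, text in try_decodings(data).items():
    print(f"{name:10} {text!r}")
```

The same nine bytes read back as 日本語 under UTF-8 but as Latin-looking junk under CP1252, which is the kind of wrong guess that produces garbled text in the first place.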
















encoding character-encoding decoding japanese






asked Nov 27 '17 at 0:26
Alex Ixeras

1 Answer
The text you pasted appears to be UTF-8 bytes displayed as CP1252 (Windows-1252) characters. In other words, the underlying text is UTF-8.

On many Linux systems, you can run 'man cp1252' to see the code points defined in CP1252. Here are the characters I'm seeing in your pasted text:



Oct   Dec   Hex   Char   Description
343   227   E3    ã      LATIN SMALL LETTER A WITH TILDE
202   130   82    ‚      SINGLE LOW-9 QUOTATION MARK
223   147   93    “      LEFT DOUBLE QUOTATION MARK
253   171   AB    «      LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
241   161   A1    ¡      INVERTED EXCLAMATION MARK
257   175   AF    ¯      MACRON
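
If the man page is not available (it may well not be on a Mac), the same rows can be produced with a few lines of Python. This is only a sketch: `cp1252_row` is a made-up helper name, and it relies on Python's built-in `cp1252` codec and the `unicodedata` module rather than on the man page itself.

```python
import unicodedata


def cp1252_row(ch: str) -> str:
    """Format one character as: octal, decimal, hex, character, Unicode name."""
    # Encoding a single CP1252 character yields exactly one byte;
    # this raises UnicodeEncodeError if ch has no CP1252 mapping.
    (byte,) = ch.encode("cp1252")
    return f"{byte:03o}   {byte:3d}   {byte:02X}   {ch}   {unicodedata.name(ch)}"


# The printable characters that appear in the pasted text.
for ch in "ã‚“«¡¯":
    print(cp1252_row(ch))
```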


The text you pasted:



ã<U+0081>“ã‚“ã<U+0081>«ã<U+0081>¡ã<U+0081>¯


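Mapping each character back to its CP1252 byte value (each <U+0081> is the raw byte 0x81, which CP1252 leaves undefined) gives:



\xE3\x81\x93 \xE3\x82\x93 \xE3\x81\xAB \xE3\x81\xA1 \xE3\x81\xAF


We can ask Perl, for example, to print these bytes:



perl -e 'print "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF"'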
こんにちは
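
To repair a whole file rather than one hand-typed string, the same mapping can be applied programmatically. The following is a minimal Python sketch, assuming the file really is UTF-8 text that was misread as CP1252 and then saved again as UTF-8; `fix_mojibake`, `repair_file` and the file names are hypothetical. The special case for 0x81 and the other bytes CP1252 leaves undefined is there because a strict CP1252 conversion rejects them, which may also be why a plain iconv call fails on this input.

```python
# Bytes that CP1252 leaves undefined; decoders often pass them through
# unchanged as the control characters U+0081, U+008D, U+008F, U+0090, U+009D.
CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def fix_mojibake(garbled: str) -> str:
    """Undo 'UTF-8 bytes read as CP1252' and return the intended text."""
    raw = bytearray()
    for ch in garbled:
        if ord(ch) in CP1252_UNDEFINED:
            raw.append(ord(ch))          # pass the raw byte straight through
        else:
            raw += ch.encode("cp1252")   # map the character back to its byte
    return raw.decode("utf-8")


def repair_file(src: str, dst: str) -> None:
    """Read a garbled UTF-8 file and write the repaired text to dst."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(fix_mojibake(text))


garbled = "ã\u0081“ã‚“ã\u0081«ã\u0081¡ã\u0081¯"
print(fix_mojibake(garbled))
# repair_file("garbled.txt", "fixed.txt")   # hypothetical file names
```

If the text was garbled through a different pair of encodings, `fix_mojibake` will raise an encode or decode error rather than silently produce more junk, so it is worth running on a copy of the file first.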





answered Nov 12 '18 at 4:46
sneep