Convert garbled Japanese text back to readable Japanese

I have a file with garbled Japanese text and need to convert it back to readable Japanese. The problem is that a) I don't know which encoding the original text used, and b) I don't know much about encodings and decodings and how to even go about converting one to the other.

If I do a less on the file's content it shows as

ã<U+0081>“ã‚“ã<U+0081>«ã<U+0081>¡ã<U+0081>¯

If I open it in a text editor I see

ã“ã‚“ã«ã¡ã¯

I'm on a Mac and know there's one command called iconv, but so far all attempts to decode failed.

How can I convert that back to readable Japanese?

asked Nov 27 '17 at 0:26

Alex Ixeras

1165

2

If garbled, it might not be possible. Text files are a sequence of bytes that represent integers called code units that are produced by a character encoding from codepoints in a character set. The fundamental rule is to read with the encoding the text was written with. To do that, you obviously need metadata, which is probably not stored with the bytes in the file. Any program that you don't tell which encoding to use is just going to guess. Please edit to show the bytes from the file. EUC-JP → 釃釩"祀磧祚
– Tom Blodget
Nov 28 '17 at 4:23
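One way to produce the bytes requested in the comment above is a short hex dump. This is only a sketch: `hex_dump` is a helper name invented here, and `garbled.txt` in the comment is a placeholder for the real file. It may help tell two cases apart: a file that is already valid UTF-8 and is merely being displayed with the wrong encoding, versus a file whose garbled characters were themselves saved as UTF-8.

```python
def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


# For a real file, read it in binary mode so no decoding happens, e.g.
#   data = open("garbled.txt", "rb").read()   # placeholder file name
# Here two literal byte strings stand in for the two possible cases.

# Case 1: the file holds plain UTF-8 for the first character of the greeting.
plain_utf8 = "こ".encode("utf-8")
print(hex_dump(plain_utf8))        # e3 81 93

# Case 2: the garbled characters (ã, U+0081, “) were saved as UTF-8 again.
double_encoded = "\u00e3\u0081\u201c".encode("utf-8")
print(hex_dump(double_encoded))    # c3 a3 c2 81 e2 80 9c
```

If the dump starts with e3 81 93, the file itself is probably fine and only the viewer needs to be told to use UTF-8. If it starts with c3 a3 c2 81, the text was likely converted a second time and has to be repaired.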


encoding character-encoding decoding japanese


1 Answer

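The text you pasted appears to be UTF-8 bytes displayed as if they were CP1252 (Windows-1252). In other words, your text is most likely UTF-8, and each of its bytes is being shown as a separate CP1252 character.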

On many Linux systems, you can execute 'man cp1252' to get the codepoints defined in CP1252. Here are the characters I'm seeing in your pasted text:

   Oct   Dec   Hex   Char   Description
   343   227   E3     ã     LATIN SMALL LETTER A WITH TILDE
   202   130   82     ‚     SINGLE LOW-9 QUOTATION MARK
   223   147   93     “     LEFT DOUBLE QUOTATION MARK
   253   171   AB     «     LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
   241   161   A1     ¡     INVERTED EXCLAMATION MARK
   257   175   AF     ¯     MACRON

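CP1252 assigns no character to byte 0x81, so it is not in the list above; it comes through as the control character U+0081, which less prints as <U+0081> and your text editor apparently does not display at all.

The text you pasted: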

ã<U+0081>“ã‚“ã<U+0081>«ã<U+0081>¡ã<U+0081>¯

Thus becomes:

\xE3\x81\x93 \xE3\x82\x93 \xE3\x81\xAB \xE3\x81\xA1 \xE3\x81\xAF

We can ask e.g. perl to print these bytes like this:

perl -e 'print "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF"'

こんにちは
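The same reversal can be automated rather than done by hand. The following is a minimal Python sketch, not a general-purpose repair tool: `fix_mojibake` is a name made up for this example, and it assumes the garbling really is UTF-8 bytes read as CP1252. Python's cp1252 codec has no mapping for U+0081, so such characters are passed through as the raw byte of the same value.

```python
def fix_mojibake(text: str) -> str:
    """Undo 'UTF-8 bytes displayed as CP1252' garbling.

    Assumption: every character in `text` stands for exactly one
    original byte. Anything else raises an exception.
    """
    raw = bytearray()
    for ch in text:
        try:
            # Normal case: map the character back to its CP1252 byte.
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Bytes such as 0x81 are unassigned in CP1252 and show up as
            # the control character U+0081; Latin-1 maps it back to 0x81.
            raw += ch.encode("latin-1")
    # The recovered bytes should now be valid UTF-8.
    return raw.decode("utf-8")


# The garbled sample from the question, written with escapes so that the
# invisible U+0081 characters are explicit.
garbled = (
    "\u00e3\u0081\u201c"   # ã <U+0081> “
    "\u00e3\u201a\u201c"   # ã ‚ “
    "\u00e3\u0081\u00ab"   # ã <U+0081> «
    "\u00e3\u0081\u00a1"   # ã <U+0081> ¡
    "\u00e3\u0081\u00af"   # ã <U+0081> ¯
)
print(fix_mojibake(garbled))   # こんにちは
```

If the input was not produced by this particular mix-up, the function raises UnicodeEncodeError or UnicodeDecodeError instead of returning wrong text, which is a reasonable signal to try a different pair of encodings.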

answered Nov 12 '18 at 4:46

sneep

1,382715
