Convert garbled Japanese text back to readable Japanese
I have a file with garbled Japanese text and need to convert it back to readable Japanese. The problem is that a) I don't know which encoding the original text used, and b) I don't know much about encodings and decodings and how to even go about converting one to the other.
If I run less on the file, its content shows as
ã<U+0081>“ã‚“ã<U+0081>«ã<U+0081>¡ã<U+0081>¯
If I open it in a text editor I see
ã“ã‚“ã«ã¡ã¯
I'm on a Mac and know there's a command called iconv, but so far all my attempts to decode the file have failed.
How can I convert that back to readable Japanese?
encoding character-encoding decoding japanese
2
If garbled, it might not be possible. Text files are a sequence of bytes that represent integers called code units that are produced by a character encoding from codepoints in a character set. The fundamental rule is to read with the encoding the text was written with. To do that, you obviously need metadata, which is probably not stored with the bytes in the file. Any program that you don't tell which encoding to use is just going to guess. Please edit to show the bytes from the file. EUC-JP → 釃釩"祀磧祚
– Tom Blodget
Nov 28 '17 at 4:23
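One way to show the bytes the comment asks for is to dump the start of the file in hex. This is only a sketch: the helper name hex_preview and the file name garbled.txt are made up for illustration and do not come from the question.

```python
def hex_preview(data: bytes, limit: int = 32) -> str:
    """Return the first `limit` bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data[:limit])


# Clean UTF-8 for the first character of the greeting:
print(hex_preview("\u3053".encode("utf-8")))              # E3 81 93

# The same character after being misread as CP1252 and saved again as UTF-8:
print(hex_preview("\u00e3\u0081\u201c".encode("utf-8")))  # C3 A3 C2 81 E2 80 9C

# For a real file (hypothetical name), read it in binary mode:
# with open("garbled.txt", "rb") as fh:
#     print(hex_preview(fh.read()))
```

If the dump starts with E3 81 93, the file is probably plain UTF-8 and only the viewer is guessing wrong. If it starts with C3 A3 C2 81, the garbling has likely been written into the file itself.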
asked Nov 27 '17 at 0:26
Alex Ixeras
1 Answer
The text you pasted appears to be UTF-8 bytes displayed as if they were CP1252 (Windows-1252). In other words, the underlying text is UTF-8.
On many Linux systems, you can run 'man cp1252' to see the code points defined in CP1252. Here are the characters I'm seeing in your pasted text:
Oct   Dec   Hex   Char   Description
343   227   E3    ã      LATIN SMALL LETTER A WITH TILDE
202   130   82    ‚      SINGLE LOW-9 QUOTATION MARK
223   147   93    “      LEFT DOUBLE QUOTATION MARK
253   171   AB    «      LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
241   161   A1    ¡      INVERTED EXCLAMATION MARK
257   175   AF    ¯      MACRON
The text you pasted:
ã<U+0081>“ã‚“ã<U+0081>«ã<U+0081>¡ã<U+0081>¯
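Mapping each character back to its byte (0x81 is unassigned in CP1252, which is why it shows up as <U+0081>), this becomes:
\xE3\x81\x93 \xE3\x82\x93 \xE3\x81\xAB \xE3\x81\xA1 \xE3\x81\xAF
We can ask, e.g., perl to print these bytes:
perl -e 'print "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF"'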
こんにちは
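The same round trip can be sketched in Python for a whole string instead of hand-typed bytes. This is a rough sketch, not a general-purpose repair tool: the function name fix_mojibake is made up, and it assumes the only damage is UTF-8 bytes having been read as CP1252, with unassigned bytes such as 0x81 passed through as the code points U+0081 and so on.

```python
def fix_mojibake(garbled: str) -> str:
    """Undo 'UTF-8 bytes read as CP1252' and return the original text."""
    raw = bytearray()
    for ch in garbled:
        try:
            # Normal case: the character has a CP1252 byte, e.g. U+201C -> 0x93.
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Bytes such as 0x81 are unassigned in CP1252 and usually survive
            # as the control characters U+0081 etc.; map those straight back.
            if ord(ch) > 0xFF:
                raise
            raw.append(ord(ch))
    return raw.decode("utf-8")


# The garbled string from the question, written with escapes so that this
# source file's own encoding cannot interfere.
garbled = (
    "\u00e3\u0081\u201c"   # E3 81 93
    "\u00e3\u201a\u201c"   # E3 82 93
    "\u00e3\u0081\u00ab"   # E3 81 AB
    "\u00e3\u0081\u00a1"   # E3 81 A1
    "\u00e3\u0081\u00af"   # E3 81 AF
)
print(fix_mojibake(garbled))  # こんにちは
```

If the file itself holds the garbled characters encoded as UTF-8 (which the <U+0081> shown by less hints at, though the question does not confirm it), reading the file with encoding="utf-8" and passing the text through fix_mojibake should give back the original Japanese.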
answered Nov 12 '18 at 4:46
sneep