D Unicode string literals: can't print specific Unicode character





I'm just trying to pick up D having come from C++. I'm sure it's something very basic, but I can't find any documentation to help me. I'm trying to print the character à, which is U+00E0. I am trying to assign this character to a variable and then use write() to output it to the console.



I'm told by this website that U+00E0 is encoded as 0xC3 0xA0 in UTF-8, 0x00E0 in UTF-16 and 0x000000E0 in UTF-32.



Note that, for everything below, I've also tried replacing string with char and wstring with wchar, and I've tried with and without the w or d suffixes after the wide-string literals.



These methods return the compiler error, "Invalid trailing code unit":



string str = "à";
wstring str = "à"w;
dstring str = "à"d;


These methods print a totally different character (Ò U+00D2):



string str = "\xE0";
string str = hexString!"E0";


And all these methods print what looks like ˧á (note á ≠ à!), which is UTF-16 0x2E7 0x00E1:



string str = "\xC3\xA0";
wstring str = "\u00E0"w;
dstring str = "\U000000E0"d;


Any ideas?










Tags: unicode, d, unicode-string, unicode-escapes






asked Nov 23 '18 at 17:28 by Joe C, edited Nov 23 '18 at 20:53 by 0xdd

  • What encoding are you saving the source file in and what encoding is your output terminal set to? And what operating system are you on? The language itself defines this stuff, but reading from source and writing to screen can introduce misunderstandings.

    – Adam D. Ruppe
    Nov 23 '18 at 17:31






  • The bottommost result looks like it thinks the encoding is IBM437.

    – Mr Lister
    Nov 23 '18 at 20:56











  • Thanks for responding! I'm on 64-bit Windows 10.0.17134. Trying to find or alter the source file encoding in Code::Blocks is a bit unclear. It seems to have previously been encoded in a WINDOWS encoding, but I've now switched it to UTF-32LE, recreated the project and issues continue. I find it quite likely that the issue is just in writing to the console, but this is essential to my needs. There seems to be a solution for C (docs.microsoft.com/en-us/windows/console/setconsoleoutputcp) - is there a D equivalent?

    – Joe C
    Nov 24 '18 at 14:32











  • You want the source encoded as UTF-8. The D compiler is a bit picky on that. Though if you can't do that, you can also stick to ASCII in the source and use \uXXXX escapes to write the other characters. For the output, that same function is the answer: remember, D can call C functions the same as C. So yeah, SetConsoleOutputCP(65001) before doing output should work. You can import core.sys.windows.windows; to make that function defined.

    – Adam D. Ruppe
    Nov 24 '18 at 21:50



















2 Answers
































I confirmed it works on my Windows box, so gonna type this up as an answer now.



In the source code, if you copy/paste the characters directly, make sure your editor is saving the file in UTF-8 encoding. The D compiler insists on it, so if it gives a compile error about a UTF thing, that's probably why. I have never used Code::Blocks, but an old answer on the web said Edit -> Encodings... it is a setting somewhere in the editor regardless.



Or, you can replace the characters in your source code with \uXXXX escapes in the strings. Do NOT use the hexString thing, that is for binary bytes, but your example of "\u00E0" is good, and will work for any type of string (not just wstring like in your example).
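For instance, here is a minimal sketch (not part of the original answer, just an illustration of that point): the same \u00E0 escape can appear in a string, wstring, or dstring literal, and the compiler emits the encoding that matches the literal's type.

import std.stdio;

void main() {
    string  s = "\u00E0";  // UTF-8: stored as the two bytes 0xC3 0xA0
    wstring w = "\u00E0"w; // UTF-16: stored as the single code unit 0x00E0
    dstring d = "\u00E0"d; // UTF-32: stored as the single code unit 0x000000E0
    writeln(s.length, " ", w.length, " ", d.length); // prints "2 1 1"
}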



Then, on the output side, it depends on your target because the program just outputs bytes, and it is up to the recipient program to interpret it correctly. Since you said you are on Windows, the key is to set the console code page to utf-8 so it knows what you are trying to do. Indeed, the same C function can be called from D too. Leading to this program:



import core.sys.windows.windows;
import std.stdio;

void main() {
    SetConsoleOutputCP(65001);
    writeln("Hi \u00E0");
}


printing it successfully. On older Windows versions, you might need to change your font to see the character too (as opposed to the generic box it shows because some fonts don't have all the characters), but on my Windows 10 box, it just worked with the default font.



BTW, technically the console code page is a shared setting (after the program runs and exits, you can still hit Properties on your console window and see the change reflected there), so you should perhaps set it back when your program exits. You could get the current value at startup with the get function ( https://docs.microsoft.com/en-us/windows/console/getconsoleoutputcp ), store it in a local variable, and set it back on exit: auto ccp = GetConsoleOutputCP(); scope(exit) SetConsoleOutputCP(ccp); SetConsoleOutputCP(65001); right at startup - the scope(exit) will run when the function exits, so doing it in main would be kinda convenient. Just add some error checking if you want.
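Put together as a small sketch (Windows-only, using the same Win32 calls mentioned above; not from the original answer):

import core.sys.windows.windows;
import std.stdio;

void main() {
    // Remember the current console code page and restore it when main exits.
    immutable oldCP = GetConsoleOutputCP();
    scope(exit) SetConsoleOutputCP(oldCP);

    SetConsoleOutputCP(65001); // 65001 is the UTF-8 code page
    writeln("Hi \u00E0");
}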



The Microsoft docs don't say anything about setting it back, so it probably doesn't actually matter, but I still wanna mention it just in case. The fact that it is shared and persists can also help in debugging - if the output still works after you comment the call out, it isn't because the code is unnecessary, it is just because the code page was set previously and hasn't been unset yet!



Note that running it from an IDE might not be exactly the same, because IDEs often pipe the output instead of running it right out to the Windows console. If that happens, lemme know and we can type up some stuff about that for future readers too. But you can also open your own copy of the console (run the program outside the IDE) and it should show correctly for you.






answered Nov 25 '18 at 0:51 by Adam D. Ruppe

  • Brilliant, works a charm! Just to note that the UTF-8 encoding "\xC3\xA0" works just as well as "\u00E0", which is the same character in UTF-16.

    – Joe C
    Nov 26 '18 at 14:24











  • Right, you can do it byte by byte, but the compiler will translate the various code points (strictly speaking, the \uxxxx is not UTF-16, it is the Unicode code point number) into the correct encoding for the given string. So using the \u stuff will make the right UTF-8 bytes in that context, or UTF-16 bytes in that context, etc.

    – Adam D. Ruppe
    Nov 26 '18 at 16:42
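To illustrate that last comment with a minimal sketch (not from the thread): \x spells out raw code units while \u names a code point, and for a string literal the compiler encodes the code point as UTF-8, so the two spellings of à produce identical data.

import std.stdio;

void main() {
    // In a string (UTF-8), the code point U+00E0 is the byte pair 0xC3 0xA0,
    // so the \u and \x spellings are the same literal:
    static assert("\u00E0" == "\xC3\xA0");

    // In a wstring (UTF-16) the same code point is a single code unit:
    static assert("\u00E0"w.length == 1);

    writeln("\u00E0"); // prints à once the console is in a UTF-8 code page
}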

































D source code needs to be encoded as UTF-8.
My guess is that you're putting a UTF-16 character into the UTF-8 source file.



E.g.



import std.stdio;
void main() {
    writeln(cast(char)0xC3, cast(char)0xA0);
}


This will output the character you seek, encoded as UTF-8.



Which you can then hard code like so:



import std.stdio;
void main() {
    string str = "à";
    writeln(str);
}





answered Nov 23 '18 at 17:36 by Richard Andrew Cattermole

  • Thanks for having a go, but sadly these have the same problems as the methods I already tried...

    – Joe C
    Nov 24 '18 at 14:33











