How to safely pass arbitrary text as a parameter to a program in a shell script?



























I'm writing a GUI application for character recognition that uses Tesseract. I want to allow the user to specify a custom shell command to be executed with /bin/sh -c when the text is ready.
The problem is that the recognized text can contain literally anything, for example && rm -rf some_dir.



My first thought was to do it the way many other programs do: the user types the command in a text entry, and special strings in the command (like the format specifiers of printf()) are replaced by the appropriate data (in my case, it might be %t). Then the whole string is passed to /bin/sh -c via execvp(). For example, here is a screenshot from qBittorrent:
[Screenshot from qBittorrent]



The problem is that even if I properly escape the text before substituting it for %t, nothing prevents the user from adding extra quotes around the specifier:



echo '%t' >> history.txt


So the full command to be executed is:



echo ''&& rm -rf some_dir'' >> history.txt


Obviously, that's a bad idea.



The second option is to let the user choose only an executable (with a file selection dialog), so that I can pass the text from Tesseract as argv[1] to execvp(). The idea is that the executable can be a script in which users can put anything they want and access the text with "$1". That way, command injection is not possible (I think). Here's an example script a user could create:



#!/bin/sh
echo "$1" >> history.txt


Are there any pitfalls with this approach? Or is there a better way to safely pass arbitrary text as a parameter to a program in a shell script?



































  • I'm a bit lost. Are you asking about how your C program should pass arguments to a shell script, or about how a shell script should handle arbitrary arguments? Or about something yet else?

    – John Bollinger
    Nov 18 '18 at 20:05











  • I'm also lost by this description. Code is more helpful to understand what you're trying to do, especially since this question is tagged as [c]. In general, the way to pass arbitrary data into another program is with a pipe.

    – paddy
    Nov 18 '18 at 20:06






  • 1





    What's the problem with the user specifying rm as the command to run? Also, what data?

    – melpomene
    Nov 18 '18 at 20:08






  • 2





    If you're letting the user specify an arbitrary command to execute, then the game is already over. Arbitrary code execution leads to arbitrary code execution. If you want to control or whitelist the command to execute while still allowing for arbitrary arguments, then that's a workable problem. The solution is to avoid letting the shell interpret your command line, by keeping all the arguments as distinct strings the whole way.

    – Daniel Pryden
    Nov 18 '18 at 20:26






  • 1





    BTW -- thank you for thinking about this. People who don't are why we (as an industry) have command injection vulnerabilities everywhere.

    – Charles Duffy
    Nov 19 '18 at 2:01


















c bash sh posix














edited Nov 18 '18 at 20:33







danpla

















asked Nov 18 '18 at 19:58









danpla




























2 Answers
































In-Band: Escaping Arbitrary Data In An Unquoted Context



Don't do this. See the "Out-Of-Band" section below.



To make an arbitrary C string (containing no NULs) evaluate to itself when used in an unquoted context in a strictly POSIX-compliant shell, you can use the following steps:




  • Prepend a ' (moving from the required initial unquoted context to a single-quoted context).

  • Replace each literal ' within the data with the string '"'"'. These characters work as follows:



    1. ' closes the initial single-quoted context.


    2. " enters a double-quoted context.


    3. ' is, in a double-quoted context, literal.


    4. " closes the double-quoted context.


    5. ' re-enters single-quoted context.



  • Append a ' (returning to the required initial unquoted context).


This works correctly in a POSIX-compliant shell because the only character that is not literal inside of a single-quoted context is '; even backslashes are parsed as literal in that context.



However, this only works correctly when the sigils (placeholders such as %t) are used only in an unquoted context (thus putting the onus on your users to get things right), and when the shell is strictly POSIX-compliant. Also, in the worst case, the string generated by this transform can be up to 5x longer than the original (plus two bytes for the enclosing quotes); one thus needs to be careful about how the memory used for the transform is allocated.
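A minimal C sketch of the transform described above, for illustration only (this answer itself recommends the out-of-band approach below). The function name sh_single_quote() is hypothetical; it computes the exact buffer size up front (each ' grows to five bytes, plus the two enclosing quotes and the terminating NUL) and refuses inputs whose size calculation would overflow.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed string that a POSIX shell, reading
 * it in an unquoted context, parses back to exactly `s`. Every ' in `s` becomes
 * '"'"' and the whole result is wrapped in single quotes. Returns NULL if the
 * size computation would overflow or allocation fails; the caller must free() it. */
char *sh_single_quote(const char *s)
{
    size_t len = strlen(s), quotes = 0;
    for (const char *p = s; *p != '\0'; p++)
        if (*p == '\'')
            quotes++;

    /* Each ' grows from 1 to 5 bytes (+4); add 2 enclosing quotes and the NUL. */
    if (quotes > (SIZE_MAX - len - 3) / 4)
        return NULL;
    char *out = malloc(len + 4 * quotes + 3);
    if (out == NULL)
        return NULL;

    char *o = out;
    *o++ = '\'';                          /* enter the single-quoted context */
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\'') {
            memcpy(o, "'\"'\"'", 5);      /* close ', literal ' inside "...", reopen ' */
            o += 5;
        } else {
            *o++ = *p;                    /* everything else is literal inside '...' */
        }
    }
    *o++ = '\'';                          /* return to the unquoted context */
    *o = '\0';
    return out;
}
```

For example, sh_single_quote("it's") returns 'it'"'"'s', which a POSIX shell reads back as it's when it appears unquoted. Because the function adds its own quotes, it still breaks in exactly the way the question describes if the user wraps the placeholder in quotes of their own.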



(One might ask why '"'"' is advised instead of '\''; this is because backslashes change their meaning when used inside legacy backtick command substitution syntax, so the longer form is more robust.)





Out-Of-Band: Environment Variables, Or Command-Line Arguments



Data should only be passed out-of-band from code, such that it's never run through the parser at all. When invoking a shell, there are two straightforward ways to do this (other than using files): Environment variables, and command-line arguments.



In both of the below mechanisms, only the user_provided_shell_script need be trusted (though this also requires that it be trusted not to introduce new or additional vulnerabilities; invoking eval or any moral equivalent thereto voids all guarantees, but that's the user's problem, not yours).



Using Environment Variables



Excluding error handling (if setenv() returns a nonzero result, this should be treated as an error, and perror() or similar should be used to report to the user), this will look like:



setenv("torrent_name", torrent_name_str, 1);
setenv("torrent_category", torrent_category_str, 1);
setenv("save_path", path_str, 1);

/* the shell script should refer to "$torrent_name", etc. */
system(user_provided_shell_script);
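A slightly fuller sketch of the environment-variable approach, including the error handling mentioned above. The helper name run_with_env() and the variable name ocr_text are hypothetical; system() runs the command with /bin/sh -c, and that shell inherits the environment set by setenv().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: exposes `text` to the user's command as $ocr_text
 * (a hypothetical, deliberately lowercase variable name) and runs the command
 * with system(), which uses /bin/sh -c. The text is never spliced into the
 * command string, so the shell never parses it as code.
 * Returns the command's exit status, or -1 on error or abnormal termination. */
int run_with_env(const char *user_command, const char *text)
{
    if (setenv("ocr_text", text, 1) != 0) {
        perror("setenv");
        return -1;
    }

    int status = system(user_command);
    if (status == -1) {
        perror("system");
        return -1;
    }
    if (!WIFEXITED(status))
        return -1;                 /* e.g. the shell was killed by a signal */
    return WEXITSTATUS(status);
}
```

The user's command then reads the text as "$ocr_text". Since the text is never part of the command string, forgetting the double quotes can at worst cause word splitting or globbing inside the user's own command, not the execution of new commands.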


A few notes:




  • While values can be arbitrary C strings, it's important that the variable names be restricted -- either hardcoded constants as above, or prefixed with a constant (lowercase 7-bit ASCII) string and tested to contain only characters that are permissible in shell variable names. (A lower-case prefix is advised because POSIX-compliant shells use only all-caps names for variables that modify their own behavior; see the POSIX spec on environment variables, particularly the note that "The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities").

  • Environment space is a limited resource. On modern Linux, the combined size of environment variables and command-line arguments is capped (typically at a quarter of the stack size limit, about 2 MB with the common 8 MB default), and any single argument or environment string (such as name=value) is capped at 128 KiB; thus, setting large environment variables can cause execve()-family calls to fail with E2BIG. Validating that the length is within reasonable domain-specific limits is wise.
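The two checks from these notes might look like this in C. All three helpers are hypothetical; the prefix and the maximum value length are parameters because the right values depend on the application.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `c` may appear in a POSIX shell variable name
 * ([A-Za-z_] first, then [A-Za-z0-9_]). ASCII ranges are tested explicitly so
 * the result does not depend on the current locale. */
static bool is_name_char(char c, bool first)
{
    if (c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return true;
    return !first && c >= '0' && c <= '9';
}

/* Hypothetical helper: accept `name` only if it starts with the fixed
 * (lowercase) `prefix` and is a valid shell variable name overall. */
bool is_valid_env_name(const char *name, const char *prefix)
{
    if (name[0] == '\0' || strncmp(name, prefix, strlen(prefix)) != 0)
        return false;
    for (size_t i = 0; name[i] != '\0'; i++)
        if (!is_name_char(name[i], i == 0))
            return false;
    return true;
}

/* Hypothetical helper: enforce an application-chosen cap on the value length. */
bool is_value_within_limit(const char *value, size_t max_len)
{
    return strlen(value) <= max_len;
}
```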




Using Command-Line Arguments



This version requires an explicit API, such that the user configuring the trigger command knows which value will be passed in $1, which will be passed in $2, etc.



/* You'll need to do the usual fork() before this, and the usual waitpid() after
 * if you want to let it complete before proceeding.
 * Lots of Q&A entries on the site already show the context.
 */
execl("/bin/sh", "sh", "-c", user_provided_shell_script, /* argv[0] comes first */
      "sh",                 /* this is $0 in the script */
      torrent_name_str,     /* this is $1 in the script */
      torrent_category_str, /* this is $2 in the script */
      path_str,             /* this is $3 in the script */
      (char *)NULL);        /* execl() needs a NULL pointer terminator */
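A self-contained sketch of the command-line-argument approach, including the fork() and waitpid() steps the comment refers to. The helper name run_with_arg() is hypothetical and passes a single value as $1. With execl(), the first argument after the path is argv[0], so "-c" has to come after it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `user_script` with /bin/sh -c and passes `text` as $1.
 * The text travels as its own argv element, so the shell never parses it as code.
 * Returns the script's exit status, or -1 on error or abnormal termination. */
int run_with_arg(const char *user_script, const char *text)
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* argv[0] is "sh"; after "-c" and the script, the next argument ("sh")
         * becomes $0 inside the script and `text` becomes $1. */
        execl("/bin/sh", "sh", "-c", user_script, "sh", text, (char *)NULL);
        perror("execl");                 /* only reached if exec failed */
        _exit(127);
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this helper, a user-configured command such as echo "$1" >> history.txt receives the OCR text as a single argument even if it contains && rm -rf some_dir; the && is never seen by the shell's parser.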




















































    Any time you're running commands with even the possibility of user input making its way into them, you must escape for the shell context.



    There's no built-in function in C to do this, so you're on your own, but the basic idea is to render user parameters either as properly escaped strings or as separate arguments to some kind of execution function (e.g. the exec family).
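    A sketch of the "separate arguments to an exec-family function" idea, which is also the question's second option: the user picks an executable, and the program runs it directly with execv() and the text as argv[1], so no shell ever parses the text. The helper run_and_capture() is hypothetical; it additionally collects the child's standard output through a pipe, which makes the behavior easy to check.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the executable at `path` directly (no shell in
 * between) with `text` as argv[1] and returns its standard output as a
 * malloc()ed, NUL-terminated string. Returns NULL on any error, or if the
 * program does not exit normally with status 0. The caller must free() it. */
char *run_and_capture(const char *path, const char *text)
{
    int fds[2];
    if (pipe(fds) == -1)
        return NULL;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    if (pid == 0) {
        /* Child: route stdout into the pipe, then exec with an explicit argv. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) == -1)
                _exit(127);
            close(fds[1]);
        }
        char *const argv[] = { (char *)path, (char *)text, NULL };
        execv(path, argv);
        _exit(127);                           /* only reached if execv() failed */
    }

    /* Parent: read everything the child writes until EOF. */
    close(fds[1]);
    size_t cap = 256, len = 0;
    char *buf = malloc(cap);
    int failed = (buf == NULL);
    while (!failed) {
        if (len + 1 == cap) {                 /* grow, keeping room for the NUL */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                failed = 1;
                break;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fds[0], buf + len, cap - len - 1);
        if (n > 0)
            len += (size_t)n;
        else if (n == 0)
            break;                            /* EOF: the child closed its stdout */
        else if (errno != EINTR)
            failed = 1;
    }
    close(fds[0]);

    /* Always reap the child, even if reading failed. */
    int status = 0;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);

    if (failed || w == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

    One caveat for the script in the question: if the text starts with - or contains backslashes, echo "$1" may not print it verbatim, because echo's handling of options and escape sequences differs between shells; printf '%s\n' "$1" avoids that.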






























    • I strongly disagree -- instead of trying to escape data to pass it inline with code, it should be passed out-of-band from that code, so no escaping is needed.

      – Charles Duffy
      Nov 18 '18 at 20:35













    • @CharlesDuffy That's what I'm trying to say with the second part and exec which has options of avoiding escaping.

      – tadman
      Nov 18 '18 at 20:41











    • @CharlesDuffy: Why? That's purely a style preference. Shell escaping is trivial and not subject to botching.

      – R..
      Nov 18 '18 at 20:41






    • 1





      @R.., the "why" is the long history of people and projects that get this "trivial" task wrong. I've found a few such exploitable bugs in the wild myself (years ago, now fixed, but one of them in a major Apache library and another in a tool intended to be used for remote execution of commands from a Java frontend).

      – Charles Duffy
      Nov 18 '18 at 20:44













    • @R.., ...POSIX escaping is pretty easy to do right, but the other thing to keep in mind is that not every shell implements only POSIX-baseline syntax; folks who try to be clever and only do the minimal amount of escaping needed can run afoul of extensions.

      – Charles Duffy
      Nov 18 '18 at 20:46




































    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3














    In-Band: Escaping Arbitrary Data In An Unquoted Context



    Don't do this. See the "Out-Of-Band" section below.



    To make an arbitrarily C string (containing no NULs) evaluate to itself when used in an unquoted context in a strictly POSIX-compliant shell, you can use the following steps:




    • Prepend a ' (moving from the required initial unquoted context to a single-quoted context).

    • Replace each literal ' within the data with the string '"'"'. These characters work as follows:



      1. ' closes the initial single-quoted context.


      2. " enters a double-quoted context.


      3. ' is, in a double-quoted context, literal.


      4. " closes the double-quoted context.


      5. ' re-enters single-quoted context.



    • Append a ' (returning to the required initial single-quoted context).


    This works correctly in a POSIX-compliant shell because the only character that is not literal inside of a single-quoted context is '; even backslashes are parsed as literal in that context.



    However, this only works correctly when sigils are used only in an unquoted context (thus putting onus on your users to get things right), and when a shell is strictly POSIX-compliant. Also, in a worst-case scenario, you can have the string generated by this transform be up to 5x longer than the original; one thus needs to be cautious around how the memory used for the transform is allocated.



    (One might ask why '"'"' is advised instead of '''; this is because backslashes change their meaning used inside legacy backtick command substitution syntax, so the longer form is more robust).





    Out-Of-Band: Environment Variables, Or Command-Line Arguments



    Data should only be passed out-of-band from code, such that it's never run through the parser at all. When invoking a shell, there are two straightforward ways to do this (other than using files): Environment variables, and command-line arguments.



    In both of the below mechanisms, only the user_provided_shell_script need be trusted (though this also requires that it be trusted not to introduce new or additional vulnerabilities; invoking eval or any moral equivalent thereto voids all guarantees, but that's the user's problem, not yours).



    Using Environment Variables



    Excluding error handling (if setenv() returns a nonzero result, this should be treated as an error, and perror() or similar should be used to report to the user), this will look like:



    setenv("torrent_name", torrent_name_str, 1);
    setenv("torrent_category", torrent_category_str, 1);
    setenv("save_path", path_str, 1);

    # shell script should use "$torrent_name", etc
    system(user_provided_shell_script);


    A few notes:




    • While values can be arbitrary C strings, it's important that the variable names be restricted -- either hardcoded constants as above, or prefixed with a constant (lowercase 7-bit ASCII) string and tested to contain only characters which are permissible shell variable names. (A lower-case prefix is advised because POSIX-compliant shells use only all-caps names for variables that modify their own behavior; see the POSIX spec on environment variables, particularly the note that "The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities").

    • Environment space is a limited resource; on modern Linux, the maximum combined storage for both environment variables and command-line arguments is typically on the scale of 128kb; thus, setting large environment variables will cause execve()-family calls with large command lines to fail. Validating that length is within reasonable domain-specific limits is wise.




    Using Command-Line Arguments:



    This version requires an explicit API, such that the user configuring the trigger command knows which value will be passed in $1, which will be passed in $2, etc.



    /* You'll need to do the usual fork() before this, and the usual waitpid() after
    * if you want to let it complete before proceeding.
    * Lots of Q&A entries on the site already showing the context.
    */
    execl("/bin/sh", "-c", user_provided_shell_script,
    "sh", /* this is $0 in the script */
    torrent_name_str, /* this is $1 in the script */
    torrent_category_str, /* this is $2 in the script */
    path_str, /* this is $3 in the script */
    NUL);





    share|improve this answer






























      3














      In-Band: Escaping Arbitrary Data In An Unquoted Context



      Don't do this. See the "Out-Of-Band" section below.



      To make an arbitrarily C string (containing no NULs) evaluate to itself when used in an unquoted context in a strictly POSIX-compliant shell, you can use the following steps:




      • Prepend a ' (moving from the required initial unquoted context to a single-quoted context).

      • Replace each literal ' within the data with the string '"'"'. These characters work as follows:



        1. ' closes the initial single-quoted context.


        2. " enters a double-quoted context.


        3. ' is, in a double-quoted context, literal.


        4. " closes the double-quoted context.


        5. ' re-enters single-quoted context.



      • Append a ' (returning to the required initial single-quoted context).


      This works correctly in a POSIX-compliant shell because the only character that is not literal inside of a single-quoted context is '; even backslashes are parsed as literal in that context.



      However, this only works correctly when sigils are used only in an unquoted context (thus putting onus on your users to get things right), and when a shell is strictly POSIX-compliant. Also, in a worst-case scenario, you can have the string generated by this transform be up to 5x longer than the original; one thus needs to be cautious around how the memory used for the transform is allocated.



      (One might ask why '"'"' is advised instead of '''; this is because backslashes change their meaning used inside legacy backtick command substitution syntax, so the longer form is more robust).





      Out-Of-Band: Environment Variables, Or Command-Line Arguments



      Data should only be passed out-of-band from code, such that it's never run through the parser at all. When invoking a shell, there are two straightforward ways to do this (other than using files): Environment variables, and command-line arguments.



      In both of the below mechanisms, only the user_provided_shell_script need be trusted (though this also requires that it be trusted not to introduce new or additional vulnerabilities; invoking eval or any moral equivalent thereto voids all guarantees, but that's the user's problem, not yours).



      Using Environment Variables



      Excluding error handling (if setenv() returns a nonzero result, this should be treated as an error, and perror() or similar should be used to report to the user), this will look like:



      setenv("torrent_name", torrent_name_str, 1);
      setenv("torrent_category", torrent_category_str, 1);
      setenv("save_path", path_str, 1);

      # shell script should use "$torrent_name", etc
      system(user_provided_shell_script);


      A few notes:




      • While values can be arbitrary C strings, it's important that the variable names be restricted -- either hardcoded constants as above, or prefixed with a constant (lowercase 7-bit ASCII) string and tested to contain only characters which are permissible shell variable names. (A lower-case prefix is advised because POSIX-compliant shells use only all-caps names for variables that modify their own behavior; see the POSIX spec on environment variables, particularly the note that "The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities").

      • Environment space is a limited resource; on modern Linux, the maximum combined storage for both environment variables and command-line arguments is typically on the scale of 128kb; thus, setting large environment variables will cause execve()-family calls with large command lines to fail. Validating that length is within reasonable domain-specific limits is wise.




      Using Command-Line Arguments



      This version requires an explicit API, such that the user configuring the trigger command knows which value will be passed in $1, which will be passed in $2, etc.



      /* You'll need to do the usual fork() before this, and the usual waitpid() after
       * if you want to let it complete before proceeding.
       * Lots of Q&A entries on the site already show the surrounding context.
       */
      execl("/bin/sh",
            "sh",                        /* argv[0] of the shell process itself */
            "-c", user_provided_shell_script,
            "sh",                        /* this is $0 in the script */
            torrent_name_str,            /* this is $1 in the script */
            torrent_category_str,        /* this is $2 in the script */
            path_str,                    /* this is $3 in the script */
            (char *)NULL);               /* execl()'s argument list ends with a null pointer */
      /* execl() returns only on failure */
      _exit(127);
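Putting the fork(), execl() and waitpid() steps together, a minimal runnable sketch of the command-line-argument approach looks like this. The helper name run_trigger is hypothetical, and the single text argument stands in for the OCR result from the question; more values would simply follow it as $2, $3, and so on.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the user's trigger command with /bin/sh -c, passing
 * `text` as $1. The text is an ordinary argv element, so the shell never parses
 * it as code. Returns the script's exit status, or -1 on error. */
static int run_trigger(const char *user_script, const char *text)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", user_script,
              "sh",        /* $0 inside the script */
              text,        /* $1 inside the script */
              (char *)NULL);
        _exit(127);        /* only reached if execl() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the user script printf '%s\n' "$1" >> history.txt, the injection attempt from the question ('' && rm -rf some_dir '') is written to the file literally. As with the environment approach, the guarantee holds only as long as the script itself keeps "$1" quoted and never evaluates it.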
















































        edited Nov 19 '18 at 5:12

        answered Nov 18 '18 at 20:36 by Charles Duffy







































            Any time you're running commands with even the possibility of user input making its way into them, you must escape for the shell context.



            There's no built-in function in C to do this, so you're on your own, but the basic idea is to render user parameters as either properly escaped strings or as separate arguments to some kind of execution function (e.g. exec family).
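For the "properly escaped strings" route, a minimal C sketch of POSIX single-quote escaping follows: wrap the text in single quotes and replace every embedded ' with '"'"' (close the quote, emit a double-quoted ', reopen the quote). The function name shell_quote is hypothetical. Keep in mind the limitation raised in the question: the result is only safe when substituted into an unquoted position of the command; if the user writes '%t', the surrounding quotes cancel the escaping, which is why the comments below argue for passing data out-of-band instead.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: wrap `s` in single quotes for a POSIX shell, replacing
 * each ' with '"'"'. Returns a malloc()ed string that the caller must free(),
 * or NULL if allocation fails. */
static char *shell_quote(const char *s)
{
    size_t quotes = 0;
    for (const char *p = s; *p != '\0'; p++)
        if (*p == '\'')
            quotes++;

    /* each ' grows from 1 to 5 bytes; plus 2 surrounding quotes and the NUL */
    char *out = malloc(strlen(s) + 4 * quotes + 3);
    if (out == NULL)
        return NULL;

    char *o = out;
    *o++ = '\'';
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\'') {
            memcpy(o, "'\"'\"'", 5); /* the five characters '"'"' */
            o += 5;
        } else {
            *o++ = *p;
        }
    }
    *o++ = '\'';
    *o = '\0';
    return out;
}
```

In the worst case (a string made entirely of single quotes) the output is about five times the input length, so the buffer is sized from a first counting pass rather than guessed.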






























            • I strongly disagree -- instead of trying to escape data to pass it inline with code, it should be passed out-of-band from that code, so no escaping is needed.

              – Charles Duffy
              Nov 18 '18 at 20:35













            • @CharlesDuffy That's what I'm trying to say with the second part and exec which has options of avoiding escaping.

              – tadman
              Nov 18 '18 at 20:41











            • @CharlesDuffy: Why? That's purely a style preference. Shell escaping is trivial and not subject to botching.

              – R..
              Nov 18 '18 at 20:41






            • @R.., the "why" is the long history of people and projects that get this "trivial" task wrong. I've found a few such exploitable bugs in the wild myself (years ago, now fixed, but one of them in a major Apache library and another in a tool intended to be used for remote execution of commands from a Java frontend).

              – Charles Duffy
              Nov 18 '18 at 20:44













            • @R.., ...POSIX escaping is pretty easy to do right, but the other thing to keep in mind is that not every shell implements only POSIX-baseline syntax; folks who try to be clever and only do the minimal amount of escaping needed can run afoul of extensions.

              – Charles Duffy
              Nov 18 '18 at 20:46


























            answered Nov 18 '18 at 20:19 by tadman






























