Why is random noise assumed to be normally distributed? [duplicate]
























This question already has an answer here:




  • Why is Gaussian noise called so?

    2 answers




Why is everything from residuals in linear regression to noise in signal processing assumed to be normally distributed? By modeling the noise as normally distributed we are, in a sense, describing a pattern in it, but shouldn't noise be random? This seems contradictory to me: on one side it is random, yet on the other side its distribution is considered to be normal. Shouldn't the noise distribution be "just random"?



I believe there is some gap in my understanding of the concept of a statistical distribution which has led me to this confusion, or I am looking at it all wrong.



One more example: when one augments data by adding Gaussian noise, it is not expected to change the overall distribution of the data. Why?










noise gaussian

asked Nov 6 at 23:25 by zeal













marked as duplicate by MBaz, lennon310, A_A, AlexTP, Community Nov 8 at 0:20


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.











  • Some questions first: are you familiar with the central limit theorem? It helps to understand why many processes in our natural environment are Gaussian distributed. To answer your second question: the distributions will convolve, so depending on the distribution of the data, adding noise will change the distribution. However, in this context we often consider the data to be "signal", and we are interested in how the noise compares to the signal. In this case the noise would be each sample's deviation from where the signal should be, which is the original noise, so it has the same distribution. – Dan Boschen, Nov 6 at 23:37






  • I was going to leave an answer along the lines of the physical phenomena, but @MBaz's answer covers that. Given the way this question is posed, I think it is better to look at "reality" first and then at the mathematics used to describe it. Check out, for example, the Gaussian as a solution to the diffusion equation. This can help you, conceptually, to see why it applies to so many things in nature. – A_A, Nov 7 at 8:55








  • While noise is often assumed to be Gaussian, it's not universally assumed. If the physical process generating the noise is known, a more appropriate model can be used. – MSalters, Nov 7 at 11:05






  • One common case where noise is not Gaussian is quantization noise. When you digitize an analog signal, there will be a difference between the analog value and the nearest digital value that can be represented at the resolution of the A/D converter (8 bits, 12 bits, 16 bits, etc.). That noise is distributed uniformly across the quantization interval. (A sketch illustrating this follows these comments.) – Dave, Nov 7 at 21:58










  • Sometimes this is done just to make a problem mathematically tractable, although one hopes that it is also a 'realistic' assumption. – JosephDoggie, Nov 7 at 22:05
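
A minimal numerical sketch of Dave's quantization-noise point above; the 8-bit word length, the [-1, 1) input range, and the uniform test signal are illustrative assumptions, not anything stated in the comment:

```python
import numpy as np

rng = np.random.default_rng(0)

# A test signal that exercises many quantization levels.
x = rng.uniform(-1.0, 1.0, size=100_000)

# Round-to-nearest quantization to 8 bits over [-1, 1): step q = 2 / 2**8.
q = 2.0 / 2**8
err = np.round(x / q) * q - x

# The error stays within [-q/2, q/2] and is spread evenly across it,
# i.e. uniform, not Gaussian.
print(err.min() / (q / 2), err.max() / (q / 2))  # ~ -1 and ~ +1
counts, _ = np.histogram(err, bins=16)
print(counts / counts.sum())                     # each bin ~ 1/16
```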















5 Answers






























Starting at an even more basic level than the other (much smarter) answers, I'd like to pick up on this part of the question:




This seems contradictory to me as on one side it is random then on the other side their distribution is considered normally distributed.




Perhaps the issue here is what ‘random’ means?



To be clear: ‘random’ and ‘normally-distributed’ do not contradict each other.  ‘Random’ simply means that we can't predict exactly what the next value will be.  But that doesn't mean we can't make probabilistic statements about it.



Consider two experiments:




  1. If you throw a (fair) die, then it could show any number from 1 to 6.  We can't tell which number will come up, but we can say that all numbers are equally likely (i.e. the distribution is uniform).


  2. If you throw two dice and take their sum, that can be any number from 2 to 12.  Again, the sum is still random — we can't predict what it will be — but we can say that those values are not equally likely.  (For example, 7 is six times more likely than 12.)  So in this case it has a non-uniform distribution.  (You can plot all the probabilities; they take on a peaked shape a bit like a normal distribution.)



So there's no contradiction: both cases are random and have a known distribution.



In fact, most things that are random tend to have a non-uniform distribution: electrical noise, weather, the wait for the next bus, voting patterns…  Being able to make general statements about them without being able to predict the exact values is one of the strengths of statistics.



(As for why you often end up with a normal distribution, that's a result of the Central Limit Theorem, which says that when you combine many independent random variables, the result tends towards a Gaussian (normal) distribution.  So you see that crop up a lot.)
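
(If it helps to see this numerically, here is a minimal NumPy sketch of the three experiments above; the sample sizes and random seed are arbitrary illustrative choices, not anything from the answer itself.)

```python
import numpy as np

rng = np.random.default_rng(0)

# One die: every face 1..6 is (roughly) equally likely, i.e. uniform.
one_die = rng.integers(1, 7, size=100_000)
print(np.bincount(one_die)[1:] / one_die.size)    # each ~0.167

# Sum of two dice: still random, but 7 is six times as likely as 12.
two_dice = rng.integers(1, 7, size=(100_000, 2)).sum(axis=1)
print(np.bincount(two_dice)[2:] / two_dice.size)  # peaked at 7

# Sum of many dice: the histogram approaches a bell curve (CLT).
many = rng.integers(1, 7, size=(100_000, 50)).sum(axis=1)
print(many.mean(), many.std())                    # ~175 and ~12.1
```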






– gidds, answered Nov 7 at 10:22 (edited Nov 7 at 15:59)



















  • This is the answer that addresses the confusion the original questioner has. More like this please. – JonathanZ, Nov 7 at 21:17










  • @gidds Thank you for your answer; it clears up my doubt somewhat. Could you relate this to a binary classification problem where both classes' data come from Gaussian distributions, but with different parameters (otherwise the classification task is just random and no mathematical model can learn any pattern in the data)? What would a neural-network-like model inherently learn from such data in order to perform classification? – zeal, Nov 8 at 1:04































The place to look is the weak and strong laws of large numbers, which underlie the central limit theorem. The central limit theorem states that if you add a large number of independent random variables, with some mild conditions on their variances, the sum becomes indistinguishable from a Normal Distribution.



A Normal Distribution also has the property of maximum entropy among all distributions with bounded variance.



The Normal Distribution is key in linear estimation, but it should be noted that it isn't the only distribution considered in Signal Processing, though it may seem so to a newcomer.



The Normal is often a good model: many physical noise mechanisms are Normally distributed, and it tends to admit closed-form solutions.



One also encounters situations where the Normal assumption works despite not being a fully accurate assumption.



I don't understand your last statement. Data has a distribution, and adding Normal noise doesn't change that underlying distribution. The distribution of signal plus noise reflects both.
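
As a side note on that last point (and on Dan Boschen's comment under the question): the density of data plus independent noise is the convolution of the two densities. A minimal sketch, where the grid spacing and the two example densities are illustrative assumptions:

```python
import numpy as np

dx = 0.01
x = np.arange(-8.0, 8.0, dx)

# Example "data" density: two point masses at -1 and +1 (a binary signal).
data_pdf = np.zeros_like(x)
data_pdf[np.searchsorted(x, -1.0)] = 0.5 / dx
data_pdf[np.searchsorted(x, +1.0)] = 0.5 / dx

# Gaussian noise density with sigma = 0.5.
sigma = 0.5
noise_pdf = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Density of data + noise: the convolution of the two densities.
sum_pdf = np.convolve(data_pdf, noise_pdf, mode="same") * dx
print(sum_pdf.sum() * dx)  # ~1.0: still a valid density, now two Gaussian bumps
```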



There are also "refinements" or corrections to Normal Distributions, like the Gram-Charlier series.






– Stanley Pawlukiewicz, answered Nov 6 at 23:41 (edited Nov 7 at 2:15 by robert bristow-johnson)



















  • I think his last statement is observing the classical binary modulation distribution: the distribution is of course changed, but it becomes two Gaussian curves, one centered at a mean of $+\sqrt{E}$ and the other at $-\sqrt{E}$, each with the same distribution about its mean. – Dan Boschen, Nov 6 at 23:46










  • The weak and/or strong laws of large numbers have nothing to do with the matter, and they are not needed in proving the central limit theorem either. – Dilip Sarwate, Nov 8 at 4:21










  • The law of large numbers preceded the central limit theorem by 30 years. To say it had nothing to do with the matter is not correct. – Stanley Pawlukiewicz, Nov 8 at 7:45































normal distribution (i like to call it "gaussian") remains normal after addition of normally distributed numbers. so if gaussian noise goes into an LTI filter, a gaussian distribution comes out. but because of the central limit theorem, even if a uniform p.d.f. random process goes into an LTI filter with a long and dense impulse response, what comes out tends to be normally distributed. so the LTI system really only changes some parameters of the noise, like its power spectrum or autocorrelation. an LTI filter can turn a uniform p.d.f. white random process into gaussian p.d.f. pink noise.
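
A minimal sketch of that claim, assuming a plain moving-average FIR filter as a stand-in for the "long and dense impulse response" (the filter length, sample count, and seed are illustrative choices):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)

# White noise with a uniform p.d.f., which is decidedly non-gaussian.
x = rng.uniform(-1.0, 1.0, size=200_000)

# A long, dense FIR impulse response: a 64-tap moving average.
h = np.ones(64) / 64
y = lfilter(h, [1.0], x)

def excess_kurtosis(v):
    """0 for a Gaussian, -1.2 for a uniform distribution."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

# Input is clearly uniform; output is close to Gaussian.
print(excess_kurtosis(x))  # ~ -1.2
print(excess_kurtosis(y))  # ~ 0
```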






– robert bristow-johnson, answered Nov 7 at 2:26









































    I'll try to clear one possible source of confusion. If picking each sample value from a single distribution feels "not random enough", then let's try to make things "more random" by adding another layer of randomness. This will be found to be futile.



    Imagine that for each sample the noise is random in the sense that it comes from a distribution that is randomly selected for that sample from a list of possible distributions, each with their own probability of occurrence and a list of probabilities for the possible sample values. Keeping it simple with just three distributions and four possible sample values:



$$\begin{array}{l|llll}
& \rlap{\text{Sample value and its prob-}} \\
\text{Probability} & \rlap{\text{ability in the distribution}} \\
\text{of distribution} & -2 & -1 & 0 & 1 \\
\hline
\color{blue}{0.3} & 0.4 & 0.2 & 0.3 & 0.1 \\
\color{blue}{0.2} & 0.5 & 0.1 & 0.2 & 0.2 \\
\color{blue}{0.5} & 0.1 & 0.4 & 0.4 & 0.1
\end{array}$$



Here we actually have a distribution of distributions. But there is a single distribution that says everything about the probabilities of the values for that sample:



$$\begin{array}{llll}
\rlap{\text{Sample value and}} \\
\rlap{\text{its total probability}} \\
-2 & -1 & 0 & 1 \\
\hline
0.27 & 0.28 & 0.33 & 0.12
\end{array}$$



    The total probabilities were obtained as sums of conditional probabilities of the sample values over the possible distributions:



$$0.4\times\color{blue}{0.3} + 0.5\times\color{blue}{0.2} + 0.1\times\color{blue}{0.5} = 0.27 \\
0.2\times\color{blue}{0.3} + 0.1\times\color{blue}{0.2} + 0.4\times\color{blue}{0.5} = 0.28 \\
0.3\times\color{blue}{0.3} + 0.2\times\color{blue}{0.2} + 0.4\times\color{blue}{0.5} = 0.33 \\
0.1\times\color{blue}{0.3} + 0.2\times\color{blue}{0.2} + 0.1\times\color{blue}{0.5} = 0.12$$



    The laws of probability that were applied:



$$P(A_i \cap B_j) = P(A_i \mid B_j)\,\color{blue}{P(B_j)} \quad \text{(conditional probability)}$$
$$P(A_i) = \sum_j P(A_i \cap B_j) \quad \text{(total probability)}$$



where $A_i$ are the events of the $i\text{th}$ sample value occurring, and $B_j$ are mutually exclusive and exhaustive events of choosing the $j\text{th}$ distribution.



    With continuous distributions, similar things would take place, because those can be modeled as discrete distributions in the limit that the number of possible events approaches infinity.
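
The same bookkeeping in a few lines of NumPy, just re-deriving the numbers in the tables above (nothing here is new information):

```python
import numpy as np

# P(B_j): probability of selecting each distribution (the blue column).
p_dist = np.array([0.3, 0.2, 0.5])

# P(A_i | B_j): rows are distributions, columns are values -2, -1, 0, 1.
p_value_given_dist = np.array([
    [0.4, 0.2, 0.3, 0.1],
    [0.5, 0.1, 0.2, 0.2],
    [0.1, 0.4, 0.4, 0.1],
])

# Law of total probability: P(A_i) = sum_j P(A_i | B_j) P(B_j).
p_value = p_dist @ p_value_given_dist
print(p_value)  # [0.27 0.28 0.33 0.12]
```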

















































      Noise is not random. It is fractal in nature.



      Mandelbrot discovered that while working at IBM. And knowing that led to the improvement of dial-up modems, among other things. Before that, 9600 baud was out of reach.


































        5 Answers
        5






        active

        oldest

        votes








        5 Answers
        5






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes








        up vote
        11
        down vote













        Starting at an even more basic level than the other (much smarter) answers, I'd like to pick up on this part of the question:




        This seems contradictory to me as on one side it is random then on the other side their distribution is considered normally distributed.




        Perhaps the issue here is what ‘random’ means?



        To be clear: ‘random’ and ‘normally-distributed’ do not contradict each other.  ‘Random’ simply means that we can't predict exactly what the next value will be.  But that doesn't mean we can't make probabilistic statements about it.



        Consider two experiments:




        1. If you throw a (fair) die, then it could show any number from 1 to 6.  We can't tell which number will come up, but we can say that all numbers are equally likely (i.e. the distribution is uniform).


        2. If you throw two dice and take their sum, that can be any number from 2 to 12.  Again, the sum is still random — we can't predict what it will be — but we can say that those values are not equally likely.  (For example, 7 is six times more likely than 12.)  So in this case it has a non-uniform distribution.  (You can plot all the probabilities; they take on a peaked shape a bit like a normal distribution.)



        So there's no contradiction: both cases are random and have a known distribution.



        In fact, most things that are random tend to have a non-uniform distribution: electrical noise, weather, the wait for the next bus, voting patterns…  Being able to make general statements about them without being able to predict the exact values is one of the strengths of statistics.



        (As for why you often end up with a normal distribution, that's a result of the Central Limit Theorem, which says that when you combine many independent random variables, the result tends towards a Gaussian (normal) distribution.  So you see that crop up a lot.)






        share|improve this answer



















        • 3




          This is the answer that addresses the confusion the original questioner has. More like this please.
          – JonathanZ
          Nov 7 at 21:17










        • @gidds Thank you for your answer this clear my doubt mildly. Could you please relate this to a problem where we have a binary classification task and both class data come from Gaussian distribution but I guess different parameter (else the classification task is just random and no mathematical can learn any pattern in such data) then what neural-network like model would learn inherently with respect to learning some pattern in data to perform classification?
          – zeal
          Nov 8 at 1:04















        up vote
        11
        down vote













        Starting at an even more basic level than the other (much smarter) answers, I'd like to pick up on this part of the question:




        This seems contradictory to me as on one side it is random then on the other side their distribution is considered normally distributed.




        Perhaps the issue here is what ‘random’ means?



        To be clear: ‘random’ and ‘normally-distributed’ do not contradict each other.  ‘Random’ simply means that we can't predict exactly what the next value will be.  But that doesn't mean we can't make probabilistic statements about it.



        Consider two experiments:




        1. If you throw a (fair) die, then it could show any number from 1 to 6.  We can't tell which number will come up, but we can say that all numbers are equally likely (i.e. the distribution is uniform).


        2. If you throw two dice and take their sum, that can be any number from 2 to 12.  Again, the sum is still random — we can't predict what it will be — but we can say that those values are not equally likely.  (For example, 7 is six times more likely than 12.)  So in this case it has a non-uniform distribution.  (You can plot all the probabilities; they take on a peaked shape a bit like a normal distribution.)



        So there's no contradiction: both cases are random and have a known distribution.



        In fact, most things that are random tend to have a non-uniform distribution: electrical noise, weather, the wait for the next bus, voting patterns…  Being able to make general statements about them without being able to predict the exact values is one of the strengths of statistics.



        (As for why you often end up with a normal distribution, that's a result of the Central Limit Theorem, which says that when you combine many independent random variables, the result tends towards a Gaussian (normal) distribution.  So you see that crop up a lot.)






        share|improve this answer



















        • 3




          This is the answer that addresses the confusion the original questioner has. More like this please.
          – JonathanZ
          Nov 7 at 21:17










        • @gidds Thank you for your answer this clear my doubt mildly. Could you please relate this to a problem where we have a binary classification task and both class data come from Gaussian distribution but I guess different parameter (else the classification task is just random and no mathematical can learn any pattern in such data) then what neural-network like model would learn inherently with respect to learning some pattern in data to perform classification?
          – zeal
          Nov 8 at 1:04













        up vote
        11
        down vote










        up vote
        11
        down vote









        Starting at an even more basic level than the other (much smarter) answers, I'd like to pick up on this part of the question:




        This seems contradictory to me as on one side it is random then on the other side their distribution is considered normally distributed.




        Perhaps the issue here is what ‘random’ means?



        To be clear: ‘random’ and ‘normally-distributed’ do not contradict each other.  ‘Random’ simply means that we can't predict exactly what the next value will be.  But that doesn't mean we can't make probabilistic statements about it.



        Consider two experiments:




        1. If you throw a (fair) die, then it could show any number from 1 to 6.  We can't tell which number will come up, but we can say that all numbers are equally likely (i.e. the distribution is uniform).


        2. If you throw two dice and take their sum, that can be any number from 2 to 12.  Again, the sum is still random — we can't predict what it will be — but we can say that those values are not equally likely.  (For example, 7 is six times more likely than 12.)  So in this case it has a non-uniform distribution.  (You can plot all the probabilities; they take on a peaked shape a bit like a normal distribution.)



        So there's no contradiction: both cases are random and have a known distribution.



        In fact, most things that are random tend to have a non-uniform distribution: electrical noise, weather, the wait for the next bus, voting patterns…  Being able to make general statements about them without being able to predict the exact values is one of the strengths of statistics.



        (As for why you often end up with a normal distribution, that's a result of the Central Limit Theorem, which says that when you combine many independent random variables, the result tends towards a Gaussian (normal) distribution.  So you see that crop up a lot.)






        share|improve this answer














        Starting at an even more basic level than the other (much smarter) answers, I'd like to pick up on this part of the question:




        This seems contradictory to me as on one side it is random then on the other side their distribution is considered normally distributed.




        Perhaps the issue here is what ‘random’ means?



        To be clear: ‘random’ and ‘normally-distributed’ do not contradict each other.  ‘Random’ simply means that we can't predict exactly what the next value will be.  But that doesn't mean we can't make probabilistic statements about it.



        Consider two experiments:




        1. If you throw a (fair) die, then it could show any number from 1 to 6.  We can't tell which number will come up, but we can say that all numbers are equally likely (i.e. the distribution is uniform).


        2. If you throw two dice and take their sum, that can be any number from 2 to 12.  Again, the sum is still random — we can't predict what it will be — but we can say that those values are not equally likely.  (For example, 7 is six times more likely than 12.)  So in this case it has a non-uniform distribution.  (You can plot all the probabilities; they take on a peaked shape a bit like a normal distribution.)



        So there's no contradiction: both cases are random and have a known distribution.



        In fact, most things that are random tend to have a non-uniform distribution: electrical noise, weather, the wait for the next bus, voting patterns…  Being able to make general statements about them without being able to predict the exact values is one of the strengths of statistics.



        (As for why you often end up with a normal distribution, that's a result of the Central Limit Theorem, which says that when you combine many independent random variables, the result tends towards a Gaussian (normal) distribution.  So you see that crop up a lot.)







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 7 at 15:59

























        answered Nov 7 at 10:22









        gidds

        2113




        2113








        • 3




          This is the answer that addresses the confusion the original questioner has. More like this please.
          – JonathanZ
          Nov 7 at 21:17










        • @gidds Thank you for your answer this clear my doubt mildly. Could you please relate this to a problem where we have a binary classification task and both class data come from Gaussian distribution but I guess different parameter (else the classification task is just random and no mathematical can learn any pattern in such data) then what neural-network like model would learn inherently with respect to learning some pattern in data to perform classification?
          – zeal
          Nov 8 at 1:04














        • 3




          This is the answer that addresses the confusion the original questioner has. More like this please.
          – JonathanZ
          Nov 7 at 21:17










        • @gidds Thank you for your answer this clear my doubt mildly. Could you please relate this to a problem where we have a binary classification task and both class data come from Gaussian distribution but I guess different parameter (else the classification task is just random and no mathematical can learn any pattern in such data) then what neural-network like model would learn inherently with respect to learning some pattern in data to perform classification?
          – zeal
          Nov 8 at 1:04








        3




        3




        This is the answer that addresses the confusion the original questioner has. More like this please.
        – JonathanZ
        Nov 7 at 21:17




        This is the answer that addresses the confusion the original questioner has. More like this please.
        – JonathanZ
        Nov 7 at 21:17












        @gidds Thank you for your answer this clear my doubt mildly. Could you please relate this to a problem where we have a binary classification task and both class data come from Gaussian distribution but I guess different parameter (else the classification task is just random and no mathematical can learn any pattern in such data) then what neural-network like model would learn inherently with respect to learning some pattern in data to perform classification?
        – zeal
        Nov 8 at 1:04




        @gidds Thank you for your answer this clear my doubt mildly. Could you please relate this to a problem where we have a binary classification task and both class data come from Gaussian distribution but I guess different parameter (else the classification task is just random and no mathematical can learn any pattern in such data) then what neural-network like model would learn inherently with respect to learning some pattern in data to perform classification?
        – zeal
        Nov 8 at 1:04










        up vote
        9
        down vote













        the place to look are the weak and strong law of large numbers, which is the basis of the central limit theorem, which states that if you add a large number of independent random variable with some mild conditions on the variance of those random numbers, the sum will become indistinguishable from a Normal Distribution.



        A Normal Distribution also has the property of the maximum entropy of all distributions with bound variance.



        The Normal Distribution is key in linear estimation but it should be noted that it isn’t the only distribution considered in Signal Processing while it may seem so to a newcomer.



        The Normal is often a good model. Many physical noise mechanisms are Normally distributed. It also tends to admit closed form solutions.



        One also encounters situations where the Normal assumption works despite not be a fully accurate assumption.



        I don’t understand your last statement. Data has a distribution and adding Normal noise doesn’t change that distribution. The Signal and Noise distribution reflects both.



        There are are also “refinements” or corrections to Normal Distributions like Gram Chalier series.






        share|improve this answer



















        • 1




          I think his last statement is observing the classical binary modulation distribution-- the distribution is of course changed, but represents two Gaussian curves one centered at a mean of $+sqrt{E}$ and the other at $-sqrt{E}$, with the same distribution from each mean.
          – Dan Boschen
          Nov 6 at 23:46










        • The weak and/or strong laws of large numbers have nothing to do with the matter, and they are not needed in proving the central limit theorem either.
          – Dilip Sarwate
          Nov 8 at 4:21










        • The law of large numbers preceded the central limit theorem by 30 years. to say it had nothing to do with the matter is not correct
          – Stanley Pawlukiewicz
          Nov 8 at 7:45















        up vote
        9
        down vote













        the place to look are the weak and strong law of large numbers, which is the basis of the central limit theorem, which states that if you add a large number of independent random variable with some mild conditions on the variance of those random numbers, the sum will become indistinguishable from a Normal Distribution.



        A Normal Distribution also has the property of the maximum entropy of all distributions with bound variance.



        The Normal Distribution is key in linear estimation but it should be noted that it isn’t the only distribution considered in Signal Processing while it may seem so to a newcomer.



        The Normal is often a good model. Many physical noise mechanisms are Normally distributed. It also tends to admit closed form solutions.



        One also encounters situations where the Normal assumption works despite not be a fully accurate assumption.



        I don’t understand your last statement. Data has a distribution and adding Normal noise doesn’t change that distribution. The Signal and Noise distribution reflects both.



        There are are also “refinements” or corrections to Normal Distributions like Gram Chalier series.






        share|improve this answer



















        • 1




          I think his last statement is observing the classical binary modulation distribution-- the distribution is of course changed, but represents two Gaussian curves one centered at a mean of $+sqrt{E}$ and the other at $-sqrt{E}$, with the same distribution from each mean.
          – Dan Boschen
          Nov 6 at 23:46










        • The weak and/or strong laws of large numbers have nothing to do with the matter, and they are not needed in proving the central limit theorem either.
          – Dilip Sarwate
          Nov 8 at 4:21










        • The law of large numbers preceded the central limit theorem by 30 years. to say it had nothing to do with the matter is not correct
          – Stanley Pawlukiewicz
          Nov 8 at 7:45













        up vote
        9
        down vote










        up vote
        9
        down vote









        the place to look are the weak and strong law of large numbers, which is the basis of the central limit theorem, which states that if you add a large number of independent random variable with some mild conditions on the variance of those random numbers, the sum will become indistinguishable from a Normal Distribution.



        A Normal Distribution also has the property of the maximum entropy of all distributions with bound variance.



        The Normal Distribution is key in linear estimation but it should be noted that it isn’t the only distribution considered in Signal Processing while it may seem so to a newcomer.



        The Normal is often a good model. Many physical noise mechanisms are Normally distributed. It also tends to admit closed form solutions.



        One also encounters situations where the Normal assumption works despite not be a fully accurate assumption.



        I don’t understand your last statement. Data has a distribution and adding Normal noise doesn’t change that distribution. The Signal and Noise distribution reflects both.



        There are are also “refinements” or corrections to Normal Distributions like Gram Chalier series.






        share|improve this answer














        the place to look are the weak and strong law of large numbers, which is the basis of the central limit theorem, which states that if you add a large number of independent random variable with some mild conditions on the variance of those random numbers, the sum will become indistinguishable from a Normal Distribution.



        A Normal Distribution also has the property of the maximum entropy of all distributions with bound variance.



        The Normal Distribution is key in linear estimation but it should be noted that it isn’t the only distribution considered in Signal Processing while it may seem so to a newcomer.



        The Normal is often a good model. Many physical noise mechanisms are Normally distributed. It also tends to admit closed form solutions.



        One also encounters situations where the Normal assumption works despite not be a fully accurate assumption.



        I don’t understand your last statement. Data has a distribution and adding Normal noise doesn’t change that distribution. The Signal and Noise distribution reflects both.



        There are are also “refinements” or corrections to Normal Distributions like Gram Chalier series.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 7 at 2:15









        robert bristow-johnson

        10.3k31448




        10.3k31448










        answered Nov 6 at 23:41









        Stanley Pawlukiewicz

        5,6962421




        5,6962421








        • 1




          I think his last statement is observing the classical binary modulation distribution-- the distribution is of course changed, but represents two Gaussian curves one centered at a mean of $+sqrt{E}$ and the other at $-sqrt{E}$, with the same distribution from each mean.
          – Dan Boschen
          Nov 6 at 23:46










        • The weak and/or strong laws of large numbers have nothing to do with the matter, and they are not needed in proving the central limit theorem either.
          – Dilip Sarwate
          Nov 8 at 4:21










        • The law of large numbers preceded the central limit theorem by 30 years. to say it had nothing to do with the matter is not correct
          – Stanley Pawlukiewicz
          Nov 8 at 7:45














        • 1




          I think his last statement is observing the classical binary modulation distribution-- the distribution is of course changed, but represents two Gaussian curves one centered at a mean of $+sqrt{E}$ and the other at $-sqrt{E}$, with the same distribution from each mean.
          – Dan Boschen
          Nov 6 at 23:46










        • The weak and/or strong laws of large numbers have nothing to do with the matter, and they are not needed in proving the central limit theorem either.
          – Dilip Sarwate
          Nov 8 at 4:21










        • The law of large numbers preceded the central limit theorem by 30 years. to say it had nothing to do with the matter is not correct
          – Stanley Pawlukiewicz
          Nov 8 at 7:45








        1




        1




        I think his last statement is observing the classical binary modulation distribution-- the distribution is of course changed, but represents two Gaussian curves one centered at a mean of $+sqrt{E}$ and the other at $-sqrt{E}$, with the same distribution from each mean.
        – Dan Boschen
        Nov 6 at 23:46




        I think his last statement is observing the classical binary modulation distribution-- the distribution is of course changed, but represents two Gaussian curves one centered at a mean of $+sqrt{E}$ and the other at $-sqrt{E}$, with the same distribution from each mean.
        – Dan Boschen
        Nov 6 at 23:46












        The weak and/or strong laws of large numbers have nothing to do with the matter, and they are not needed in proving the central limit theorem either.
        – Dilip Sarwate
        Nov 8 at 4:21




        The weak and/or strong laws of large numbers have nothing to do with the matter, and they are not needed in proving the central limit theorem either.
        – Dilip Sarwate
        Nov 8 at 4:21












        The law of large numbers preceded the central limit theorem by 30 years. to say it had nothing to do with the matter is not correct
        – Stanley Pawlukiewicz
        Nov 8 at 7:45




        The law of large numbers preceded the central limit theorem by 30 years. to say it had nothing to do with the matter is not correct
        – Stanley Pawlukiewicz
        Nov 8 at 7:45










        up vote
        8
        down vote













        normal distribution (i like to call it "gaussian") remains normal after addition of normally distributed numbers. so if gaussian goes into an LTI filter, a gaussian distribution comes out. but because of this central limit theorem, even if uniform p.d.f. random process goes into an LTI filter with a long and dense impulse response, what will come out tends to be normally distributed. so the LTI system really only changes some parameters, like the power spectrum or autocorrelation of the signal. an LTI filter can turn a uniform p.d.f. white random process into gaussian p.d.f. pink noise.






        share|improve this answer

























          up vote
          8
          down vote













          normal distribution (i like to call it "gaussian") remains normal after addition of normally distributed numbers. so if gaussian goes into an LTI filter, a gaussian distribution comes out. but because of this central limit theorem, even if uniform p.d.f. random process goes into an LTI filter with a long and dense impulse response, what will come out tends to be normally distributed. so the LTI system really only changes some parameters, like the power spectrum or autocorrelation of the signal. an LTI filter can turn a uniform p.d.f. white random process into gaussian p.d.f. pink noise.






          share|improve this answer























            up vote
            8
            down vote










            up vote
            8
            down vote









            normal distribution (i like to call it "gaussian") remains normal after addition of normally distributed numbers. so if gaussian goes into an LTI filter, a gaussian distribution comes out. but because of this central limit theorem, even if uniform p.d.f. random process goes into an LTI filter with a long and dense impulse response, what will come out tends to be normally distributed. so the LTI system really only changes some parameters, like the power spectrum or autocorrelation of the signal. an LTI filter can turn a uniform p.d.f. white random process into gaussian p.d.f. pink noise.






            share|improve this answer












            normal distribution (i like to call it "gaussian") remains normal after addition of normally distributed numbers. so if gaussian goes into an LTI filter, a gaussian distribution comes out. but because of this central limit theorem, even if uniform p.d.f. random process goes into an LTI filter with a long and dense impulse response, what will come out tends to be normally distributed. so the LTI system really only changes some parameters, like the power spectrum or autocorrelation of the signal. an LTI filter can turn a uniform p.d.f. white random process into gaussian p.d.f. pink noise.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 7 at 2:26









            robert bristow-johnson

            10.3k31448




            10.3k31448






















                up vote
                7
                down vote













                I'll try to clear one possible source of confusion. If picking each sample value from a single distribution feels "not random enough", then let's try to make things "more random" by adding another layer of randomness. This will be found to be futile.



                Imagine that for each sample the noise is random in the sense that it comes from a distribution that is randomly selected for that sample from a list of possible distributions, each with their own probability of occurrence and a list of probabilities for the possible sample values. Keeping it simple with just three distributions and four possible sample values:



                $$begin{array}{l|llll}&rlap{text{Sample value and its prob-}}\
                text{Probability}&rlap{text{ability in the distribution}}\
                text{of distribution}&-2&-1&0&1\
                hline
                color{blue}{0.3}&0.4&0.2&0.3&0.1\
                color{blue}{0.2}&0.5&0.1&0.2&0.2\
                color{blue}{0.5}&0.1&0.4&0.4&0.1end{array}$$



                Here we have actually a distribution of distributions. But there is a single distribution that says everything about the probabilities of the values for that sample:



                $$begin{array}{llll}rlap{text{Sample value and}}\
                rlap{text{its total probability}}\
                -2&-1&0&1\
                hline
                0.27&0.28&0.33&0.12
                end{array}$$



                The total probabilities were obtained as sums of conditional probabilities of the sample values over the possible distributions:



                $$0.4timescolor{blue}{0.3} + 0.5timescolor{blue}{0.2} + 0.1timescolor{blue}{0.5} = 0.27\
                0.2timescolor{blue}{0.3} + 0.1timescolor{blue}{0.2} + 0.4timescolor{blue}{0.5} = 0.28\
                0.3timescolor{blue}{0.3} + 0.2timescolor{blue}{0.2} + 0.4timescolor{blue}{0.5} = 0.33\
                0.1timescolor{blue}{0.3} + 0.2timescolor{blue}{0.2} + 0.1timescolor{blue}{0.5} = 0.12$$



                The laws of probability that were applied:



                $$P(A_icap B_j) = P(A_i|B_j)color{blue}{P(B_j)}quadtext{conditional probability}$$
                $$P(A_i) = sum_jP(A_icap B_j)quadtext{total probability}$$



                where $A_i$ are the events of the $itext{th}$ sample value occurring, and $B_j$ are mutually exclusive and exhaustive events of choosing the $jtext{th}$ distribution.



                With continuous distributions, similar things would take place, because those can be modeled as discrete distributions in the limit that the number of possible events approaches infinity.






                share|improve this answer



























                  up vote
                  7
                  down vote













                  I'll try to clear one possible source of confusion. If picking each sample value from a single distribution feels "not random enough", then let's try to make things "more random" by adding another layer of randomness. This will be found to be futile.



                  Imagine that for each sample the noise is random in the sense that it comes from a distribution that is randomly selected for that sample from a list of possible distributions, each with their own probability of occurrence and a list of probabilities for the possible sample values. Keeping it simple with just three distributions and four possible sample values:



                  $$begin{array}{l|llll}&rlap{text{Sample value and its prob-}}\
                  text{Probability}&rlap{text{ability in the distribution}}\
                  text{of distribution}&-2&-1&0&1\
                  hline
                  color{blue}{0.3}&0.4&0.2&0.3&0.1\
                  color{blue}{0.2}&0.5&0.1&0.2&0.2\
                  color{blue}{0.5}&0.1&0.4&0.4&0.1end{array}$$



                  Here we have actually a distribution of distributions. But there is a single distribution that says everything about the probabilities of the values for that sample:



                  $$begin{array}{llll}rlap{text{Sample value and}}\
                  rlap{text{its total probability}}\
                  -2&-1&0&1\
                  hline
                  0.27&0.28&0.33&0.12
                  end{array}$$



                  The total probabilities were obtained as sums of conditional probabilities of the sample values over the possible distributions:



                  $$0.4timescolor{blue}{0.3} + 0.5timescolor{blue}{0.2} + 0.1timescolor{blue}{0.5} = 0.27\
                  0.2timescolor{blue}{0.3} + 0.1timescolor{blue}{0.2} + 0.4timescolor{blue}{0.5} = 0.28\
                  0.3timescolor{blue}{0.3} + 0.2timescolor{blue}{0.2} + 0.4timescolor{blue}{0.5} = 0.33\
                  0.1timescolor{blue}{0.3} + 0.2timescolor{blue}{0.2} + 0.1timescolor{blue}{0.5} = 0.12$$



                  The laws of probability that were applied:



                  $$P(A_icap B_j) = P(A_i|B_j)color{blue}{P(B_j)}quadtext{conditional probability}$$
                  $$P(A_i) = sum_jP(A_icap B_j)quadtext{total probability}$$



                  where $A_i$ are the events of the $itext{th}$ sample value occurring, and $B_j$ are mutually exclusive and exhaustive events of choosing the $jtext{th}$ distribution.



                  With continuous distributions, similar things would take place, because those can be modeled as discrete distributions in the limit that the number of possible events approaches infinity.






edited Nov 7 at 9:01

answered Nov 7 at 8:41

Olli Niemitalo



































Noise is not random. It is fractal in nature.

Mandelbrot discovered this while working at IBM, studying error bursts on data transmission lines. Knowing it led to improvements in dial-up modems, among other things; before that, 9600 baud was out of reach.






answered Nov 7 at 21:03

Mike Waters














