Why is random noise assumed to be normally distributed? [duplicate]
This question already has an answer here:
Why is Gaussian noise called so? (2 answers)
From residuals in linear regression to noise in signal processing, why is noise assumed to be normally distributed? By modeling it as normally distributed we seem to be ascribing a pattern to the noise, but shouldn't noise be random? This seems contradictory to me: on one side it is random, yet on the other side its distribution is considered normal. Shouldn't the noise distribution just be random?
I believe there is some gap in my understanding of the concept of a statistical distribution that has led me to this confusion, or I am looking at it all wrong.
One more example: when one augments data by adding Gaussian noise, it is not expected to change the overall distribution of the data. Why?
Tags: noise, gaussian
asked Nov 6 at 23:25 by zeal
marked as duplicate by MBaz, lennon310, A_A, AlexTP, Community♦ Nov 8 at 0:20
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
Some questions first: are you familiar with the central limit theorem? It helps explain why many processes in our natural environment are Gaussian distributed. To answer your second question, the distributions will convolve, so depending on the distribution of the data, adding noise will change that distribution. However, in this context we often consider the data to be "signal", and we are interested in how the noise compares to the signal. In that case the noise is each sample's deviation from where the signal should be, which is the original noise, so it has the same distribution.
– Dan Boschen
Nov 6 at 23:37
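A quick numerical sketch of that convolution point (my illustration, not part of the comment; the bimodal data, noise level, and seed are made up): adding independent Gaussian noise smears the data's density with the Gaussian, so the overall distribution changes, while the noise about each underlying value keeps its original distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "signal" data: a bimodal mixture, clearly non-Gaussian.
data = np.concatenate([rng.normal(-3, 0.5, 50_000),
                       rng.normal(+3, 0.5, 50_000)])

# Adding independent Gaussian noise convolves the two densities.
noisy = data + rng.normal(0, 1.0, data.size)

# Variances add under independence (0.25 + 9 = 9.25, then + 1),
# and the bimodal shape is smeared by, not replaced with, a Gaussian.
print(np.var(data), np.var(noisy))  # ~9.25 and ~10.25
```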
I was going to leave an answer along the lines of the physical phenomena, but @MBaz's answer covers that. Given the way this question is posed, I think it is better to look at "reality" first and then at the mathematics used to describe it. Check out, for example, the Gaussian as a solution to the diffusion equation. This can help you see, conceptually, why it applies to so many things in nature.
– A_A
Nov 7 at 8:55
While noise is often assumed to be Gaussian, it's not universally assumed. If the physical process generating the noise is known, a more appropriate model can be used.
– MSalters
Nov 7 at 11:05
One common case where noise is not Gaussian is quantization noise. When you digitize an analog signal, there is a difference between the analog value and the nearest digital value that the resolution of the A/D converter (8 bits, 12 bits, 16 bits, etc.) can represent. That noise is distributed uniformly across the quantization interval.
– Dave
Nov 7 at 21:58
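A small sketch of that claim (mine, not part of the comment; the sine test signal and the 8-bit mid-tread quantizer are arbitrary choices): the quantization error stays within half a step, and its variance lands near the textbook step^2 / 12 of a uniform distribution.

```python
import numpy as np

def quantize(x, bits, full_scale=1.0):
    """Mid-tread uniform quantizer over [-full_scale, +full_scale]."""
    step = 2 * full_scale / 2 ** bits
    return step * np.round(x / step)

# A test signal that exercises many quantization levels.
x = 0.9 * np.sin(2 * np.pi * 0.01234 * np.arange(200_000))
err = quantize(x, bits=8) - x

step = 2 / 2 ** 8
print(err.min(), err.max())         # within about -step/2 .. +step/2
print(np.var(err), step ** 2 / 12)  # both ~ 5.1e-6: uniform-like error
```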
Sometimes this is done just to make a problem mathematically tractable, although one hopes that it is also a 'realistic' assumption.
– JosephDoggie
Nov 7 at 22:05
5 Answers
Answer (score 11)
answered Nov 7 at 10:22 by gidds (edited Nov 7 at 15:59)
Starting at an even more basic level than the other (much smarter) answers, I'd like to pick up on this part of the question:
This seems contradictory to me as on one side it is random then on the other side their distribution is considered normally distributed.
Perhaps the issue here is what ‘random’ means?
To be clear: ‘random’ and ‘normally-distributed’ do not contradict each other. ‘Random’ simply means that we can't predict exactly what the next value will be. But that doesn't mean we can't make probabilistic statements about it.
Consider two experiments:
If you throw a (fair) die, then it could show any number from 1 to 6. We can't tell which number will come up, but we can say that all numbers are equally likely (i.e. the distribution is uniform).
If you throw two dice and take their sum, that can be any number from 2 to 12. Again, the sum is still random — we can't predict what it will be — but we can say that those values are not equally likely. (For example, 7 is six times more likely than 12.) So in this case it has a non-uniform distribution. (You can plot all the probabilities; they take on a peaked shape a bit like a normal distribution.)
So there's no contradiction: both cases are random and have a known distribution.
In fact, most things that are random tend to have a non-uniform distribution: electrical noise, weather, the wait for the next bus, voting patterns… Being able to make general statements about them without being able to predict the exact values is one of the strengths of statistics.
(As for why you often end up with a normal distribution, that's a result of the Central Limit Theorem, which says that when you combine many independent random variables, the result tends towards a Gaussian (normal) distribution. So you see that crop up a lot.)
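For the curious, here is a brief simulation of both experiments (my sketch, not part of the answer; sample sizes and seed are arbitrary): the two-dice sum is random yet far from uniform, and summing many dice pushes the result toward a normal shape, as the Central Limit Theorem predicts.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Two dice: still random, but 7 is about six times as likely as 12.
two = rng.integers(1, 7, size=(n, 2)).sum(axis=1)
print(np.mean(two == 7) / np.mean(two == 12))  # ~ 6

# Fifty dice: the sum's histogram is already close to a normal curve.
many = rng.integers(1, 7, size=(n, 50)).sum(axis=1)
print(many.mean(), many.std())  # ~ 175 and ~ sqrt(50 * 35/12) ~ 12.1
```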
This is the answer that addresses the confusion the original questioner has. More like this please.
– JonathanZ
Nov 7 at 21:17
@gidds Thank you for your answer; it clears up my doubt somewhat. Could you relate this to a binary classification task in which both classes' data come from Gaussian distributions, presumably with different parameters (otherwise the classification task is purely random and no mathematical model can learn any pattern in such data)? What would a neural-network-like model inherently learn, in terms of patterns in the data, to perform the classification?
– zeal
Nov 8 at 1:04
Answer (score 9)
answered Nov 6 at 23:41 by Stanley Pawlukiewicz (edited Nov 7 at 2:15 by robert bristow-johnson)
The places to look are the weak and strong laws of large numbers and the central limit theorem, which states that if you add a large number of independent random variables, with some mild conditions on their variances, the sum becomes indistinguishable from a normal distribution.
A normal distribution also has the maximum entropy of all distributions with bounded variance.
The normal distribution is key in linear estimation, but it should be noted that it isn't the only distribution considered in signal processing, though it may seem so to a newcomer.
The normal is often a good model: many physical noise mechanisms are normally distributed, and it tends to admit closed-form solutions.
One also encounters situations where the normal assumption works despite not being a fully accurate assumption.
I don't understand your last statement. The data has a distribution, and adding normal noise doesn't change that underlying distribution; the distribution of the observed signal plus noise reflects both.
There are also "refinements" or corrections to normal distributions, such as the Gram–Charlier series.
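As a rough check of the CLT statement above (a sketch under my own choice of distribution and sample sizes, not part of the answer), one can sum independent uniform variables and watch the distance to a standard normal shrink:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def clt_sample(n, size=100_000):
    """Sum n independent uniform(-0.5, 0.5) variables, scaled to unit variance."""
    u = rng.uniform(-0.5, 0.5, size=(size, n))
    return u.sum(axis=1) / np.sqrt(n / 12)  # each uniform has variance 1/12

for n in (1, 2, 12):
    # Kolmogorov-Smirnov distance to the standard normal shrinks with n.
    d, _ = stats.kstest(clt_sample(n), "norm")
    print(n, round(d, 4))
```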
I think his last statement is about the classical binary modulation distribution: the distribution is of course changed, but it becomes two Gaussian curves, one centered at a mean of $+\sqrt{E}$ and the other at $-\sqrt{E}$, with the same spread about each mean.
– Dan Boschen
Nov 6 at 23:46
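A hypothetical sketch of that binary modulation picture (the symbol energy $E = 1$ and the noise level are assumptions of mine): the overall received distribution is a mixture of two Gaussians at $\pm\sqrt{E}$, while conditioned on the transmitted symbol the noise is a single Gaussian.

```python
import numpy as np

rng = np.random.default_rng(4)

E = 1.0                                    # assumed symbol energy
bits = rng.integers(0, 2, 100_000)
symbols = np.where(bits == 1, +np.sqrt(E), -np.sqrt(E))
received = symbols + rng.normal(0, 0.3, symbols.size)

# Overall: bimodal. Conditioned on each symbol: one Gaussian of std 0.3.
print(received[bits == 1].mean(), received[bits == 0].mean())  # ~ +1, -1
print(received[bits == 1].std(), received[bits == 0].std())    # both ~ 0.3
```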
The weak and/or strong laws of large numbers have nothing to do with the matter, and they are not needed in proving the central limit theorem either.
– Dilip Sarwate
Nov 8 at 4:21
The law of large numbers preceded the central limit theorem by 30 years. To say it had nothing to do with the matter is not correct.
– Stanley Pawlukiewicz
Nov 8 at 7:45
Answer (score 8)
answered Nov 7 at 2:26 by robert bristow-johnson
The normal distribution (I like to call it "Gaussian") remains normal after addition of normally distributed numbers, so if Gaussian noise goes into an LTI filter, Gaussian noise comes out. But because of the central limit theorem, even if a random process with a uniform p.d.f. goes into an LTI filter with a long and dense impulse response, what comes out tends to be normally distributed. So the LTI system really only changes parameters such as the power spectrum or autocorrelation of the signal. An LTI filter can turn a white random process with a uniform p.d.f. into pink noise with a Gaussian p.d.f.
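A minimal sketch of that effect (the FIR length and cutoff are arbitrary choices of mine, not from the answer): uniform white noise through a long FIR filter comes out with nearly Gaussian statistics, as the excess kurtosis shows.

```python
import numpy as np
from scipy import signal, stats

rng = np.random.default_rng(5)

# Uniform white noise in, long dense FIR filter, near-Gaussian noise out.
white = rng.uniform(-1, 1, 200_000)
taps = signal.firwin(201, cutoff=0.1)      # a long lowpass FIR (arbitrary)
filtered = signal.lfilter(taps, [1.0], white)

# Excess kurtosis: about -1.2 for a uniform p.d.f., 0 for a Gaussian.
print(stats.kurtosis(white))     # ~ -1.2
print(stats.kurtosis(filtered))  # ~ 0: the output is close to Gaussian
```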
Answer (score 7)
answered Nov 7 at 8:41 by Olli Niemitalo (edited Nov 7 at 9:01)
I'll try to clear up one possible source of confusion. If picking each sample value from a single distribution feels "not random enough", let's try to make things "more random" by adding another layer of randomness. This will turn out to be futile.
Imagine that for each sample the noise is random in the sense that it comes from a distribution that is randomly selected for that sample from a list of possible distributions, each with their own probability of occurrence and a list of probabilities for the possible sample values. Keeping it simple with just three distributions and four possible sample values:
$$\begin{array}{l|llll}
& \rlap{\text{Sample value and its prob-}} \\
\text{Probability} & \rlap{\text{ability in the distribution}} \\
\text{of distribution} & -2 & -1 & 0 & 1 \\
\hline
\color{blue}{0.3} & 0.4 & 0.2 & 0.3 & 0.1 \\
\color{blue}{0.2} & 0.5 & 0.1 & 0.2 & 0.2 \\
\color{blue}{0.5} & 0.1 & 0.4 & 0.4 & 0.1
\end{array}$$
Here we actually have a distribution of distributions. But there is a single distribution that says everything about the probabilities of the values for that sample:
$$\begin{array}{llll}
\rlap{\text{Sample value and}} \\
\rlap{\text{its total probability}} \\
-2 & -1 & 0 & 1 \\
\hline
0.27 & 0.28 & 0.33 & 0.12
\end{array}$$
The total probabilities were obtained as sums of conditional probabilities of the sample values over the possible distributions:
$$0.4\times\color{blue}{0.3} + 0.5\times\color{blue}{0.2} + 0.1\times\color{blue}{0.5} = 0.27 \\
0.2\times\color{blue}{0.3} + 0.1\times\color{blue}{0.2} + 0.4\times\color{blue}{0.5} = 0.28 \\
0.3\times\color{blue}{0.3} + 0.2\times\color{blue}{0.2} + 0.4\times\color{blue}{0.5} = 0.33 \\
0.1\times\color{blue}{0.3} + 0.2\times\color{blue}{0.2} + 0.1\times\color{blue}{0.5} = 0.12$$
The laws of probability that were applied:
$$P(A_i \cap B_j) = P(A_i \mid B_j)\,\color{blue}{P(B_j)} \quad \text{(conditional probability)}$$
$$P(A_i) = \sum_j P(A_i \cap B_j) \quad \text{(total probability)}$$
where $A_i$ are the events of the $i$th sample value occurring, and $B_j$ are the mutually exclusive and exhaustive events of choosing the $j$th distribution.
With continuous distributions, similar things would take place, because those can be modeled as discrete distributions in the limit that the number of possible events approaches infinity.
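The same computation as a short sketch (mirroring the tables above; using numpy is my choice): the law of total probability here is just a matrix-vector product between the distribution-selection probabilities and the conditional value probabilities.

```python
import numpy as np

# Probabilities of selecting each of the three distributions (rows).
p_dist = np.array([0.3, 0.2, 0.5])

# Conditional probabilities of the sample values -2, -1, 0, 1 (columns).
p_value_given_dist = np.array([
    [0.4, 0.2, 0.3, 0.1],
    [0.5, 0.1, 0.2, 0.2],
    [0.1, 0.4, 0.4, 0.1],
])

# Law of total probability: marginal distribution of the sample value.
p_value = p_dist @ p_value_given_dist
print(p_value)  # [0.27 0.28 0.33 0.12], a single ordinary distribution
```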
Answer (score -1)
answered Nov 7 at 21:03 by Mike Waters
Noise is not random. It is fractal in nature.
Mandelbrot discovered that while working at IBM. And knowing that led to the improvement of dial-up modems, among other things. Before that, 9600 baud was out of reach.