PCA error infinite or missing values in 'x'
I have been struggling to use R to analyze financial data. I am new to programming in general, though I am very accustomed to working in Excel. I have therefore spent a lot of time (probably too much) formatting my CSV file to minimize the hassle of working in R, but this hasn't worked.
Here is my code for the PCA. I have only gotten it to work with smaller data files that contain no N/As or blanks, but I need to know how to handle these in R.
returns <- read.csv("PCA Data File.csv", skip = 1, header = TRUE)
#standardize the variables
returns.pca <- prcomp(returns[2:ncol(returns)], scale = TRUE)
The result is:
Error in svd(x, nu = 0) : infinite or missing values in 'x'
This raises several questions. First, how do I resolve this error? Second, how do I explore my data to make sure missing values are properly addressed or replaced? Is the problem that my data is a data.frame rather than a matrix?
I am not sure how to attach the CSV file, but here are the first few rows from the file (there are 241 rows):
Date Returns Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23 Var24 Var25 Var26 Var27 Var28 Var29 Var30 Var31 Var32 Var33 Var34 Var35 Var36 Var37 Var38 Var39 Var40 Var41 Var42 Var43 Var44 Var45 Var46 Var47 Var48 Var49 Var50 Var51 Var52 Var53 Var54 Var55 Var56 Var57 Var58 Var59 Var60 Var61
6/30/2014 0.48 18.12 9.44 107.43 19.53 1.92 11.54 0.99 3.33 98.83 0.44 2.59 3.42 105.15 308.59 80.44 1.36 0.94 102.07 1.69 331.47 53656.02 21897.39 11022.87 23144.90 15131.80 0.59 2.70 1.35 0.58 0.33 0.25 103.38 1.67 2.59 3.42 1.75 0.10 1.09 2.00 -0.11 1.24 2.08 0.22 138780.00
5/31/2014 1.52 17.63 9.44 107.18 14.36 1.96 12.48 1.01 3.49 98.60 0.37 2.55 3.39 101.79 306.79 79.96 1.37 0.93 101.84 1.68 324.69 53122.21 21159.31 10558.07 22584.93 14343.14 0.59 2.62 1.40 0.52 0.41 0.11 103.39 1.58 2.55 3.39 1.81 0.09 1.11 1.96 -0.07 1.15 2.29 0.47 3.50 1.49 138492.00 171.04 11302.80 4322654.00 55.40 -44.39 441.59 1000.70 117.44 11.60 6.50 1.50 0.50
4/30/2014 1.07 17.40 9.45 107.11 22.93 1.96 14.20 1.02 3.49 98.24 0.40 2.69 3.52 102.03 308.63 79.85 1.38 0.93 102.51 1.67 323.24 51470.08 21660.07 10399.85 22598.44 14475.33 0.61 2.67 1.53 0.53 0.47 0.06 103.47 1.69 2.69 3.52 1.82 0.09 1.49 2.08 0.02 1.16 2.04 -4.63 0.04 3.50 1.42 138268.00 171.58 11227.50 4296049.00 54.90 -47.04 425.02 204.90 117.57 11.60 27.30 6.60 1.80 1.40
3/31/2014 0.50 17.51 9.51 106.40 25.98 1.95 14.84 1.09 3.65 98.40 0.38 2.72 3.62 100.51 303.49 79.87 1.38 0.91 102.36 1.66 316.98 47046.98 20839.70 10097.38 21980.77 14694.83 0.61 2.72 1.59 0.52 0.48 0.04 103.44 1.63 2.72 3.62 1.99 0.08 1.73 2.10 0.00 1.13 2.02 0.91 3.30 1.20 137964.00 171.47 11169.00 4226971.00 53.70 -44.18 452.77 608.80 117.39 11.70 15.10 27.30 6.80 1.60 0.20
2/28/2014 1.76 17.10 9.52 106.27 25.35 1.96 15.47 1.13 3.88 98.46 0.31 2.70 3.66 100.68 294.91 80.44 1.37 0.90 102.12 1.66 315.92 47367.89 20039.38 10048.23 22188.31 14617.57 0.60 2.74 1.66 0.44 0.44 0.01 103.45 1.50 2.69 3.66 2.16 0.07 1.82 2.10 -0.05 1.04 1.87 0.91 3.10 1.08 137761.00 169.34 11133.50 4159972.00 53.20 -42.59 383.36 -48.40 116.28 11.70 27.30 6.90 1.70 1.70
r
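On the question of how to explore the data for missing values, the following is a minimal base R sketch rather than a definitive recipe. The data frame `returns_demo` and its values are made up for illustration; with the real file, `returns` would take its place. It counts missing and infinite cells per column, and it also illustrates that `prcomp()` accepts a data frame, so the data.frame-versus-matrix distinction is unlikely to be the cause.

```r
# Toy data frame standing in for `returns`; the values and the gaps are made up.
returns_demo <- data.frame(
  Date = c("6/30/2014", "5/31/2014", "4/30/2014", "3/31/2014"),
  Var1 = c(18.12, 17.63, 17.40, 17.51),
  Var2 = c(9.44, NA, 9.45, 9.51),
  Var3 = c(107.43, 107.18, Inf, 106.40)
)

num <- returns_demo[-1]                      # drop the Date column

# Count problem cells per column: NA/NaN and +/-Inf are reported separately.
na_count  <- colSums(is.na(num))
inf_count <- sapply(num, function(col) sum(is.infinite(col)))
print(rbind(na_count, inf_count))

# Rows in which every value is finite; only such rows are usable by prcomp().
ok_rows <- rowSums(!sapply(num, is.finite)) == 0
print(sum(ok_rows))

# prcomp() takes the data frame as-is; it fails because of the bad cells.
res <- tryCatch(prcomp(num, scale. = TRUE),
                error = function(e) conditionMessage(e))
print(res)
```

If `sum(ok_rows)` is zero for the real data, every row has at least one gap, and removing incomplete rows would leave nothing to analyze.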
Paste in the output of dput(head(returns, 10)) rather than the current copy-paste.
– Thomas
Jul 29 '14 at 19:48
Have a look here stat.ethz.ch/pipermail/r-help/2008-January/150896.html
– konvas
Jul 30 '14 at 7:56
I think I have been to that page before. In any case, now I get this error: Error in prcomp.default(na.omit(returns[2:ncol(returns)]), scale = TRUE) : cannot rescale a constant/zero column to unit variance
– user2662565
Jul 30 '14 at 13:34
Found another post addressing this error:
Error in prcomp.default(na.omit(returns[2:ncol(returns)]), scale = TRUE) : cannot rescale a constant/zero column to unit variance
Updated with the following:
> returns.pca <- prcomp(na.omit(returns[,apply(returns[2:ncol(returns)], 2, var, na..rm=TRUE) != 0], scale = TRUE))
Error in FUN(newX[, i], ...) : unused argument (na..rm = TRUE)
With na.rm spelled correctly, I received this error:
> returns.pca <- prcomp(na.omit(returns[,apply(returns[2:ncol(returns)], 2, var, na.rm=TRUE) != 0], scale = TRUE))
Error in svd(x, nu = 0) : a dimension is zero
– user2662565
Jul 30 '14 at 15:22
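In the last call quoted above, `scale = TRUE` ends up inside `na.omit()` rather than `prcomp()`, and the logical column index is computed from `returns[2:ncol(returns)]` but applied to all columns of `returns`, so it is misaligned. The "a dimension is zero" error is consistent with `na.omit()` dropping every row, although that cannot be confirmed without the full file. The sketch below shows one possible order of cleanup steps on made-up data; `returns_demo` and the 50% cutoff are assumptions for illustration, not values from the question.

```r
# Made-up stand-in for `returns`: a date column, one column that is mostly
# missing, one constant column, and two usable columns.
set.seed(1)
returns_demo <- data.frame(
  Date = sprintf("%d/28/2014", 1:12),
  Var1 = rnorm(12),
  Var2 = rnorm(12),
  Var3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.5, 2.5),  # mostly missing
  Var4 = rep(3.5, 12)                                          # constant
)
returns_demo$Var1[2] <- NA                 # a single gap in a usable column

num <- returns_demo[-1]                    # exclude Date

# 1. Drop columns that are mostly missing (the 50% cutoff is an arbitrary choice).
keep_cols <- colMeans(is.na(num)) <= 0.5
num <- num[, keep_cols, drop = FALSE]

# 2. Drop rows that still contain a missing value.
num <- na.omit(num)

# 3. Drop columns whose variance is zero in the remaining rows.
col_var <- sapply(num, var)
num <- num[, col_var > 0, drop = FALSE]

stopifnot(nrow(num) > 1, ncol(num) > 0)    # fail early instead of inside svd()

pca_demo <- prcomp(num, scale. = TRUE)     # scale. is passed to prcomp() itself
print(dim(num))
print(summary(pca_demo))
```

Dropping sparse columns before `na.omit()` matters here: a single mostly-empty column would otherwise cause nearly every row to be discarded.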
edited Dec 22 '18 at 15:10 by Ben Bolker
asked Jul 29 '14 at 19:46 by user2662565
2 Answers
It looks like your data is missing values for some of the dates, so you will have to do some data cleanup. The code below is an example of how you might do this for the rows you provided. Only two dates appear to be complete, so continuing on to the PCA didn't make much sense.
I've loaded your input data from above into the variable xx.
xx <- sub("\n"," ",xx) # replace \n in the data with a space
xy <- unlist(strsplit(xx,split=" ")) # change string to character vector
start_of_new_date <- grep("[0-9]/[0-9]{2}/2014",xy) # find start of new dates in data
diff(start_of_new_date) # the number of values between dates is not always 62, so some lines are missing values
ar <- matrix(c(c("Date", xy[1:61]), xy[168:291]), nrow=3,byrow=TRUE ) # convert only the complete dates, March and April, to a matrix
df <- data.frame(Date=ar[2:3,1], ar[2:3,2:62], stringsAsFactors=FALSE) # convert dates and data to data frame
colnames(df) <- c("Date",ar[1,2:62]) # make var strings column names in data frame
df[,2:62] <- sapply(df[,2:62], as.numeric) # convert data columns from character to numeric
dfs <- scale(df[,2:62]) # example only; running scale on two-row data columns is meaningless since all will scale to the same values
answered Jul 30 '14 at 17:08 by WaltS
I am only using columns 2:ncol(returns) so that I exclude the date. Shouldn't this make it so the date is irrelevant to this?
– user2662565
Jul 30 '14 at 17:28
Sorry, I had taken the data string in your post to be the value of returns, not the file contents. You're using read.csv to bring this in, but there aren't any commas, so it wouldn't separate the values properly. Two thoughts: first, look at the contents of returns to see whether they look correct. Second, explain a little more about how you're generating this file from Excel.
– WaltS
Jul 30 '14 at 21:44
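The suggestion to look at the contents of returns could be sketched as follows. This is only an illustration under stated assumptions: the CSV text, the title row, and the placeholder strings in `na.strings` are all hypothetical, since the real file was never posted. The idea is to tell `read.csv()` which cell contents mean "missing" and then confirm that every variable column was read as numeric.

```r
# Write a tiny, made-up CSV to a temporary file so the sketch is self-contained.
# The first line imitates a title row, which skip = 1 jumps over.
csv_lines <- c(
  "Monthly data (hypothetical title row)",
  "Date,Returns,Var1,Var2",
  "6/30/2014,0.48,18.12,",
  "5/31/2014,1.52,N/A,9.44",
  "4/30/2014,1.07,17.40,9.45"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Treat empty cells and common Excel placeholders as missing values.
returns_demo <- read.csv(path, skip = 1, header = TRUE,
                         na.strings = c("", "NA", "N/A", "#N/A"),
                         stringsAsFactors = FALSE)

str(returns_demo)                          # every VarN column should be num or int

# Columns that are not numeric usually contain stray text.
not_numeric <- names(returns_demo)[!sapply(returns_demo, is.numeric)]
print(not_numeric)                         # only "Date" is expected here

print(colSums(is.na(returns_demo)))        # missing cells per column
```

If a variable column shows up in `not_numeric` for the real file, some cell in it holds text that `read.csv()` could not parse as a number, and that is worth fixing before any PCA.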
Possible duplicate of Error in svd(x, nu = 0) : 0 extent dimensions
Negative infinity values can be replaced after a log transform as below.
log_features <- log(data_matrix[,1:8])
log_features[is.infinite(log_features)] <- -99999
edited May 23 '17 at 10:31 by Community♦
answered Jan 10 '16 at 21:31 by Joshua Burkhart
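As a variation on the log-transform answer above, non-finite values can be marked as missing instead of being replaced with a sentinel, since a value such as -99999 can dominate the variance that PCA tries to explain. This is a hedged sketch of that alternative, not the answerer's method; `data_matrix` here is a made-up matrix.

```r
# Made-up matrix containing a zero, so that log() produces -Inf.
data_matrix <- matrix(c(1, 10, 7,
                        0,  5, 3,
                        2, 30, 9,
                        4, 25, 2,
                        3,  8, 6),
                      nrow = 5, byrow = TRUE,
                      dimnames = list(NULL, c("a", "b", "c")))

log_features <- log(data_matrix)
n_inf <- sum(is.infinite(log_features))    # number of -Inf cells
print(n_inf)

# Mark non-finite cells as missing rather than using a sentinel value.
log_features[!is.finite(log_features)] <- NA

# Keep complete rows only, then run the PCA.
clean <- log_features[complete.cases(log_features), , drop = FALSE]
pca_demo <- prcomp(clean, scale. = TRUE)
print(dim(clean))
```

Whether dropping rows is acceptable depends on how many rows are affected; with many gaps, imputation may be a better fit than deletion.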