Introduction

In statistical modeling and machine learning, regression methods play a crucial role in understanding relationships between variables and making predictions. One powerful technique that has gained significant attention is LASSO, which stands for Least Absolute Shrinkage and Selection Operator. LASSO is a type of linear regression that aims not only to improve prediction accuracy but also to enhance the interpretability of the model by enforcing sparsity in the coefficients.

The importance of regression methods in statistical modeling cannot be overstated. They are essential tools for data scientists and researchers to make informed decisions, understand underlying patterns, and predict future outcomes. Traditional regression methods like Ordinary Least Squares (OLS) often struggle with high-dimensional data and multicollinearity, leading to overfitting and complex models that are hard to interpret.

LASSO addresses these challenges by incorporating both variable selection and regularization into the regression process. This results in simpler models that are more generalizable to new data. By setting some coefficients to zero, LASSO effectively selects a subset of relevant features, making the model easier to understand and interpret.

This article will delve into the details of LASSO regression, exploring its mathematical formulation, key features, and advantages over other methods. We will also discuss when to use LASSO, its practical applications, and the scenarios where it might not be the best choice. Finally, we will look at alternatives to LASSO, such as Elastic Net, which addresses some of its limitations.

The structure of the article is as follows:

  • What is LASSO?: Definition, mathematical formulation, and key features
  • Why Use LASSO?: Advantages and comparison with other methods
  • When to Use LASSO?: Suitable scenarios and practical applications
  • When Not to Use LASSO?: Limitations and alternatives
  • Conclusion: Summary of key points and final thoughts

By the end of this article, readers will have a comprehensive understanding of LASSO regression and its role in statistical modeling and data analysis.

What is LASSO?

Definition

LASSO, which stands for Least Absolute Shrinkage and Selection Operator, is a type of linear regression that enhances the prediction accuracy and interpretability of statistical models by performing both variable selection and regularization. It is particularly useful when dealing with high-dimensional data where the number of predictors is large, possibly even greater than the number of observations.

LASSO regression modifies the typical linear regression model by adding a constraint on the sum of the absolute values of the model parameters (coefficients). This constraint has the effect of shrinking some coefficients to exactly zero, effectively selecting a simpler model that includes only the most significant predictors. The LASSO optimization problem can be expressed as:

\[\min_{\beta} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right)\]

where:

  • \(y_i\) is the response variable,
  • \(X_i\) are the predictor variables,
  • \(\beta\) are the coefficients of the model,
  • \(\lambda\) is the regularization parameter that controls the amount of shrinkage applied to the coefficients.
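
To make this concrete, here is a minimal sketch of fitting a LASSO model in Python with scikit-learn, whose Lasso estimator minimizes this same objective (its alpha argument plays the role of \(\lambda\)); the synthetic dataset and the value of alpha are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 100 observations, 20 predictors, only 5 of them truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# alpha corresponds to the regularization parameter lambda in the formula above
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

print("Non-zero coefficients:", np.sum(lasso.coef_ != 0))
print("Coefficient values:", np.round(lasso.coef_, 2))
```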

Comparison with Traditional Regression Methods

Traditional regression methods, such as Ordinary Least Squares (OLS), aim to minimize the sum of squared residuals to fit a linear model:

\[\min_{\beta} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2 \right)\]

While OLS provides unbiased estimates, it often suffers from high variance, especially in the presence of multicollinearity (when predictors are highly correlated) and when dealing with high-dimensional data. This can lead to overfitting, where the model performs well on training data but poorly on new, unseen data.

LASSO addresses these issues by adding a penalty term \(\lambda \sum_{j=1}^{p} |\beta_j|\) to the regression objective. This penalty term encourages sparsity, meaning it drives some of the coefficients to zero, thus performing variable selection. The key benefits of LASSO over traditional methods include:

  • Improved prediction accuracy: By reducing overfitting, LASSO tends to produce models that generalize better to new data.
  • Enhanced interpretability: By selecting only a subset of the available predictors, LASSO models are often simpler and easier to interpret.

LASSO regression modifies the standard linear regression approach by adding a regularization term that helps in both variable selection and reducing model complexity, making it a powerful tool for dealing with high-dimensional data and improving model generalizability.

Mathematical Formulation

LASSO regression incorporates a regularization term to the traditional linear regression model, which not only aims to minimize the residual sum of squares but also imposes a constraint on the sum of the absolute values of the coefficients. This regularization encourages sparsity in the model parameters, effectively performing variable selection.

The LASSO optimization problem can be introduced as follows:

Given a set of observations \((X_i, y_i)\) for \(i = 1, 2, \ldots, n\), where \(X_i\) represents the predictor variables and \(y_i\) represents the response variable, the objective is to minimize the following cost function:

\[\min_{\beta} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right)\]

Here:

  • \(y_i\) is the response variable for the \(i\)-th observation.
  • \(X_i\) is the vector of predictor variables for the \(i\)-th observation.
  • \(\beta\) represents the vector of coefficients for the model.
  • \(\lambda\) is the regularization parameter that controls the amount of shrinkage applied to the coefficients.
  • \(n\) is the number of observations.
  • \(p\) is the number of predictor variables.

The first term in the cost function, \(\frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2\), represents the residual sum of squares, which measures the fit of the model to the data. The second term, \(\lambda \sum_{j=1}^{p} |\beta_j|\), is the L1 penalty term that imposes a constraint on the sum of the absolute values of the coefficients.

The regularization parameter \(\lambda\) plays a crucial role in the LASSO model:

  • When \(\lambda = 0\), the LASSO model reduces to the ordinary least squares (OLS) regression.
  • As \(\lambda\) increases, more coefficients are shrunk towards zero, leading to a sparser model.

By solving this optimization problem, LASSO not only fits the data but also performs variable selection by setting some of the coefficients to zero, thereby simplifying the model and improving its interpretability.
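
The effect of \(\lambda\) on sparsity is easy to observe empirically. The sketch below (using scikit-learn’s Lasso on synthetic data, with an arbitrary grid of alpha values standing in for \(\lambda\)) refits the model at several regularization strengths and counts the surviving coefficients.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data with 20 predictors, only 5 of which are truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Larger alpha (lambda) means stronger shrinkage and therefore fewer non-zero coefficients
for alpha in [0.001, 0.1, 1.0, 10.0, 100.0]:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    print(f"alpha = {alpha:7.3f}  non-zero coefficients: {np.sum(model.coef_ != 0)}")
```

At the smallest value the fit behaves much like OLS (the \(\lambda = 0\) limit), and the number of surviving coefficients typically falls as alpha grows.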

Description of the Variables and Parameters Involved

In the LASSO regression optimization problem, several variables and parameters play crucial roles. Understanding these components is essential for grasping how LASSO works and how it achieves variable selection and regularization.

Response Variable \(y_i\)

  • Definition: The response variable, also known as the dependent variable, is the outcome or the variable being predicted or explained.
  • Role: In the context of LASSO regression, \(y_i\) represents the actual observed value for each observation \(i\).

Predictor Variables \(X_i\)

  • Definition: Predictor variables, also known as independent variables or features, are the inputs used to predict the response variable.
  • Role: \(X_i\) denotes the vector of predictor variables for the \(i\)-th observation. Each \(X_i\) can contain multiple features (e.g., \(X_{i1}, X_{i2}, \ldots, X_{ip}\)).

Coefficients \(\beta\)

  • Definition: Coefficients are the parameters of the regression model that represent the relationship between each predictor variable and the response variable.
  • Role: \(\beta\) is the vector of coefficients \(\beta_1, \beta_2, \ldots, \beta_p\) in the model. In LASSO regression, the values of \(\beta\) are determined in such a way that some coefficients can be exactly zero, leading to variable selection.

Regularization Parameter \(\lambda\)

  • Definition: The regularization parameter controls the amount of shrinkage applied to the coefficients.
  • Role: \(\lambda\) determines the strength of the penalty applied to the sum of the absolute values of the coefficients \(\left(\sum_{j=1}^{p} |\beta_j|\right)\). A higher value of \(\lambda\) results in more coefficients being shrunk to zero, thereby increasing sparsity.

Number of Observations \(n\)

  • Definition: The total number of data points or samples in the dataset.
  • Role: \(n\) normalizes the residual sum of squares in the cost function, so that the balance between the fit term and the penalty does not depend on the size of the dataset.

Number of Predictors \(p\)

  • Definition: The total number of predictor variables or features in the dataset.
  • Role: \(p\) indicates the dimensionality of the coefficient vector \(\beta\). In high-dimensional data settings, \(p\) can be much larger than \(n\), making LASSO particularly useful.

LASSO Optimization Problem

Revisiting the LASSO optimization problem with these components in mind:

\[\min_{\beta} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right)\]
  • The term \(\frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2\) represents the residual sum of squares, measuring the fit of the model.
  • The term \(\lambda \sum_{j=1}^{p} |\beta_j|\) is the regularization component, encouraging sparsity in the coefficients.

Understanding these variables and parameters helps in comprehending how LASSO regression achieves a balance between fitting the model to the data and maintaining model simplicity through variable selection and regularization.
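
To tie the notation to something executable, here is a direct NumPy transcription of the cost function above. It is only a sketch of the objective itself (not of the optimization algorithm), and the small data arrays are made-up values for illustration.

```python
import numpy as np

def lasso_cost(X, y, beta, lam):
    """Value of the LASSO objective for coefficients beta and penalty lam."""
    n = X.shape[0]                                  # number of observations
    residuals = y - X @ beta                        # y_i - X_i beta
    rss_term = np.sum(residuals ** 2) / (2 * n)     # (1/2n) * residual sum of squares
    l1_penalty = lam * np.sum(np.abs(beta))         # lambda * sum_j |beta_j|
    return rss_term + l1_penalty

# Tiny example with n = 3 observations and p = 2 predictors
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])
y = np.array([2.0, 3.0, 5.0])
beta = np.array([1.5, 0.0])
print(lasso_cost(X, y, beta, lam=0.1))
```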

Key Features

LASSO regression distinguishes itself from other regression methods through its unique ability to perform both variable selection and regularization. Here are the key features that make LASSO a powerful tool for statistical modeling:

Variable Selection

  • Definition: Variable selection refers to the process of identifying and including only the most relevant predictors in the model.
  • Mechanism: In LASSO, the L1 penalty term \(\lambda \sum_{j=1}^{p} |\beta_j|\) in the optimization problem encourages the coefficients of less important variables to shrink to zero. This effectively excludes these variables from the model, simplifying it by retaining only the predictors that have significant contributions.
  • Benefit: By selecting a subset of predictors, LASSO improves the interpretability of the model. It helps in identifying the most influential variables, making the model easier to understand and analyze.

Regularization

  • Definition: Regularization is the technique of adding a penalty to the model to prevent overfitting by discouraging excessively large coefficients.
  • Mechanism: The L1 penalty in LASSO imposes a constraint on the sum of the absolute values of the coefficients. This penalty term controls the complexity of the model by shrinking the coefficients, which reduces the model’s variance at the cost of only a modest increase in bias.
  • Benefit: Regularization improves the generalizability of the model by ensuring that it performs well not only on the training data but also on new, unseen data. It mitigates the risk of overfitting, especially in high-dimensional datasets where the number of predictors \(p\) is large.

Sparsity of Coefficients

  • Definition: Sparsity refers to the condition where most of the coefficients in the model are zero.
  • Mechanism: The L1 penalty in LASSO promotes sparsity by driving many of the coefficients to zero. This results in a model that includes only a few non-zero coefficients, corresponding to the most important predictors.
  • Benefit: Sparse models are not only easier to interpret but also computationally efficient. They require less storage space and fewer computational resources, making them suitable for high-dimensional data analysis. Additionally, sparse models often have better predictive performance due to their simplicity and reduced risk of overfitting.

LASSO’s key features of variable selection, regularization, and sparsity of coefficients work together to create models that are both interpretable and generalizable. These features make LASSO an attractive choice for many statistical modeling and machine learning applications, particularly in high-dimensional settings.
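
In code, the selected variables are simply the predictors whose fitted coefficients are non-zero. A minimal sketch with scikit-learn, using synthetic data and placeholder feature names:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=1)
feature_names = [f"x{j}" for j in range(X.shape[1])]   # placeholder names

lasso = Lasso(alpha=1.0).fit(X, y)

# Variable selection in practice: keep only the predictors with non-zero coefficients
selected = [name for name, coef in zip(feature_names, lasso.coef_) if coef != 0]
print("Selected predictors:", selected)
```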

Why Use LASSO?

Advantages

LASSO regression offers several compelling advantages that make it a valuable tool in the realm of statistical modeling and machine learning. Here are the key benefits:

Improved Prediction Accuracy

  • Mechanism: LASSO incorporates a regularization term that penalizes the size of the coefficients, which helps to prevent overfitting. By shrinking the coefficients of less important predictors towards zero, LASSO reduces the model’s complexity.
  • Outcome: This regularization leads to models that generalize better to new, unseen data, resulting in improved prediction accuracy. In high-dimensional datasets, where overfitting is a significant concern, LASSO’s ability to produce more robust models is particularly beneficial.

Enhanced Interpretability of Models

  • Mechanism: Through its variable selection capability, LASSO sets many coefficients to exactly zero. This exclusion of irrelevant predictors simplifies the model.
  • Outcome: The resulting model is easier to interpret and understand since it only includes the most significant variables. Enhanced interpretability is crucial for domains where understanding the relationship between predictors and the response variable is as important as making accurate predictions, such as in biomedical research and social sciences.

Effective Handling of High-Dimensional Data

  • Mechanism: High-dimensional data refers to datasets with a large number of predictor variables (p) relative to the number of observations (n). Traditional regression methods like Ordinary Least Squares (OLS) struggle in such scenarios due to overfitting and computational challenges.
  • Outcome: LASSO’s regularization and variable selection capabilities make it well-suited for high-dimensional data. By selecting a sparse set of predictors, LASSO reduces the dimensionality of the problem, leading to more manageable and computationally efficient models. This is particularly useful in fields like genomics, where datasets with thousands of predictors are common.

LASSO regression provides significant advantages in terms of prediction accuracy, model interpretability, and handling high-dimensional data. These benefits make LASSO a preferred choice for many practical applications, from finance and marketing to biology and engineering, where robust and interpretable models are essential.
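
As a quick illustration of the high-dimensional case, the sketch below fits a LASSO model to synthetic data with ten times more predictors than observations; the dimensions and the value of alpha are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# p = 500 predictors but only n = 50 observations; 5 features are truly informative
X, y = make_regression(n_samples=50, n_features=500, n_informative=5,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0, max_iter=10000).fit(X, y)
print("Non-zero coefficients:", np.sum(lasso.coef_ != 0), "out of", X.shape[1])
```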

Comparison with Other Methods

Comparison with Ordinary Least Squares (OLS)

  • Ordinary Least Squares (OLS): OLS is a traditional regression method that aims to minimize the sum of squared residuals between observed and predicted values.

    \[\min_{\beta} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2 \right)\]
  • Advantages of OLS:
    • Simplicity: OLS is straightforward to implement and understand.
    • Unbiased Estimates: Under certain conditions, OLS provides unbiased estimates of the coefficients.
  • Limitations of OLS:
    • Overfitting: OLS can overfit the data, especially when the number of predictors (p) is large relative to the number of observations (n).
    • Multicollinearity: OLS performs poorly when predictor variables are highly correlated, leading to inflated variance of the coefficient estimates.
  • LASSO vs. OLS:
    • Regularization: LASSO adds an L1 penalty to the OLS objective, which helps to prevent overfitting by shrinking some coefficients to zero.
    • Variable Selection: Unlike OLS, LASSO performs variable selection by setting some coefficients to zero, resulting in a simpler and more interpretable model.

Comparison with Ridge Regression

  • Ridge Regression: Ridge regression, like LASSO, aims to prevent overfitting by adding a regularization term to the OLS objective. However, it uses an L2 penalty, which shrinks the coefficients but does not set them to zero.

    \[\min_{\beta} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right)\]
  • Advantages of Ridge Regression:
    • Multicollinearity: Ridge regression can handle multicollinearity better than OLS by stabilizing the coefficient estimates.
    • Regularization: The L2 penalty helps to prevent overfitting, similar to LASSO.
  • Limitations of Ridge Regression:
    • No Variable Selection: Ridge regression does not set coefficients to zero, so it does not perform variable selection. All predictors remain in the model, which can be a drawback for interpretability.
  • LASSO vs. Ridge Regression:
    • Penalty Type: LASSO uses an L1 penalty, which can set coefficients to zero, while Ridge uses an L2 penalty, which only shrinks coefficients.
    • Variable Selection: LASSO performs variable selection, making it more suitable for models where interpretability and simplicity are important. Ridge regression retains all predictors, making it less suitable for variable selection but potentially more stable in the presence of multicollinearity.

While OLS is a straightforward method prone to overfitting and issues with multicollinearity, LASSO and Ridge Regression introduce regularization to address these problems. LASSO, with its L1 penalty, offers the added benefit of variable selection, creating simpler and more interpretable models. Ridge Regression, with its L2 penalty, does not perform variable selection but provides a stable solution in the presence of multicollinearity.
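
The contrast between the three estimators can be seen empirically. The sketch below fits OLS, Ridge, and LASSO to the same synthetic data and reports how many coefficients each one sets exactly to zero (only LASSO is expected to produce exact zeros; the alpha values are arbitrary).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

models = {
    "OLS": LinearRegression(),
    "Ridge (L2)": Ridge(alpha=1.0),
    "LASSO (L1)": Lasso(alpha=1.0, max_iter=10000),
}

for name, model in models.items():
    model.fit(X, y)
    # OLS and Ridge shrink or estimate coefficients but do not zero them out;
    # LASSO's L1 penalty sets many of them exactly to zero.
    n_zero = np.sum(model.coef_ == 0)
    print(f"{name:12s} coefficients exactly zero: {n_zero} / {X.shape[1]}")
```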

When to Use LASSO?

Suitable Scenarios

LASSO regression is particularly useful in several specific scenarios where traditional regression methods may fall short.

One key scenario is high-dimensional data. High-dimensional data refers to datasets where the number of predictor variables \(p\) is large, potentially even larger than the number of observations \(n\). In such cases, traditional regression methods like Ordinary Least Squares (OLS) can suffer from overfitting, leading to poor generalization on new data. LASSO effectively handles high-dimensional data by incorporating regularization, which shrinks the coefficients of less important variables to zero, reducing the model’s complexity and improving its generalizability.

Another scenario where LASSO proves advantageous is in the presence of multicollinearity. Multicollinearity occurs when predictor variables are highly correlated with each other, leading to unstable coefficient estimates in regression models. OLS regression models can exhibit high variance in the presence of multicollinearity, making the model’s predictions unreliable. By adding an L1 penalty to the regression objective, LASSO reduces the impact of multicollinearity by shrinking some coefficients to zero and effectively removing redundant predictors. This stabilizes the model and enhances its predictive performance.

LASSO is also beneficial when there is a need for feature selection. Feature selection involves identifying and retaining only the most relevant predictor variables in the model. Including too many irrelevant predictors can lead to complex models that are difficult to interpret and may not generalize well. LASSO’s L1 penalty encourages sparsity in the model by setting the coefficients of irrelevant predictors to zero. This results in a simpler model that includes only the most significant features, enhancing both interpretability and predictive accuracy.

Finally, LASSO is advantageous for improving the interpretability of the model. Interpretability refers to the ease with which the relationships between predictor variables and the response variable can be understood. Complex models with many predictors can be difficult to interpret, especially when the goal is to understand the underlying data-generating process. By selecting a subset of relevant predictors and setting others to zero, LASSO produces a more parsimonious model. This makes it easier to interpret the influence of individual predictors on the response variable, which is particularly important in fields like biomedical research, finance, and social sciences.

LASSO is highly suitable for scenarios involving high-dimensional data, multicollinearity, the need for feature selection, and the desire for model interpretability. Its ability to produce simpler, more interpretable models while maintaining predictive performance makes it a valuable tool in various applications such as genomics, finance, and machine learning.
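
In practice, the regularization parameter \(\lambda\) is usually chosen by cross-validation rather than fixed in advance. A minimal sketch using scikit-learn’s LassoCV, which exposes the selected value as alpha_ (the synthetic data and the number of folds are arbitrary choices):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=50, n_informative=8,
                       noise=10.0, random_state=0)

# LassoCV searches a grid of alpha (lambda) values using k-fold cross-validation
lasso_cv = LassoCV(cv=5, random_state=0, max_iter=10000).fit(X, y)

print("Selected alpha:", lasso_cv.alpha_)
print("Non-zero coefficients:", np.sum(lasso_cv.coef_ != 0))
```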

When Not to Use LASSO?

Limitations

While LASSO regression offers many advantages, there are certain scenarios where it may not be the best choice.

One limitation of LASSO is its handling of highly correlated predictors. When predictor variables are highly correlated, LASSO tends to select one predictor from the group and ignore the others, which can lead to unstable and non-unique solutions. This behavior can be problematic in situations where it is important to consider all correlated predictors, as ignoring some may result in loss of valuable information.
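
This behavior is easy to reproduce. In the sketch below, two predictors are nearly identical copies of each other, and LASSO typically retains only one of them; which one survives can change with small perturbations of the data (the data-generating choices here are arbitrary).

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is almost an exact copy of x1
x3 = rng.normal(size=n)                    # an unrelated predictor
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
print("Coefficients (x1, x2, x3):", np.round(lasso.coef_, 3))
```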

Another potential limitation of LASSO is its propensity to produce non-unique solutions. Since LASSO can shrink multiple coefficients to zero, there can be several models with different sets of predictors that provide similar predictive performance. This lack of uniqueness can complicate the interpretation of the model and make it difficult to determine which predictors are truly important.

LASSO also faces challenges when dealing with small data sets. In situations where the number of observations is small relative to the number of predictors, LASSO may not perform well because the regularization term can dominate the objective function, leading to an oversimplified model that fails to capture the underlying relationships in the data. In such cases, alternative methods or modifications to the LASSO approach, such as Elastic Net, may be more appropriate.

LASSO may not be the best choice when dealing with highly correlated predictors, potential non-unique solutions, and small data sets. Being aware of these limitations is crucial for selecting the most appropriate regression method for a given dataset and research question.

Alternatives

While LASSO regression is a powerful tool, its limitations necessitate the consideration of alternative methods that can address some of its drawbacks.

One prominent alternative to LASSO is the Elastic Net. The Elastic Net combines the properties of both LASSO and Ridge Regression by incorporating both L1 and L2 penalties in the regression objective. The Elastic Net optimization problem can be formulated as:

\[\min_{\beta} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2 \right)\]

In this formulation, \(\lambda_1\) controls the L1 penalty, promoting sparsity and variable selection, while \(\lambda_2\) controls the L2 penalty, which helps in dealing with multicollinearity and stabilizing the coefficient estimates.

The Elastic Net is particularly useful when dealing with datasets that have highly correlated predictors. Unlike LASSO, which tends to select one predictor from a group of correlated predictors and discard the others, the Elastic Net tends to select or shrink groups of correlated predictors together, providing a more balanced and robust model. This makes the Elastic Net a more versatile and effective choice in many practical scenarios.
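
A minimal Elastic Net sketch with scikit-learn follows. Note that scikit-learn’s ElasticNet is parameterized by a single alpha and an l1_ratio rather than by \(\lambda_1\) and \(\lambda_2\) directly (its penalty is alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2), so the two formulations are related by a simple reparameterization. On data with a nearly duplicated predictor, the Elastic Net tends to keep both copies, whereas LASSO tends to keep only one.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly duplicate predictor
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # mixes L1 and L2 penalties

print("LASSO coefficients:      ", np.round(lasso.coef_, 3))
print("Elastic Net coefficients:", np.round(enet.coef_, 3))
```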

Besides the Elastic Net, other regularization methods include:

  • Ridge Regression: Ridge Regression uses an L2 penalty, which shrinks coefficients but does not set them to zero. This method is useful for handling multicollinearity but does not perform variable selection, leading to models that include all predictors.

  • Principal Component Regression (PCR): PCR involves transforming the predictors into a set of uncorrelated components using Principal Component Analysis (PCA) and then performing regression on these components. This method helps in dealing with multicollinearity and reducing dimensionality but can be less interpretable due to the transformation of predictors.

  • Partial Least Squares Regression (PLSR): PLSR is similar to PCR but aims to find components that not only explain the predictors’ variance but also have a high correlation with the response variable. This method can be more interpretable than PCR and useful for dealing with multicollinearity.

While LASSO is a valuable regression method, alternatives like the Elastic Net, Ridge Regression, PCR, and PLSR offer different strengths and can be more suitable depending on the specific characteristics of the dataset and the research goals. Each method has its own advantages and limitations, and the choice of method should be guided by the nature of the data and the specific requirements of the analysis.
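
For completeness, here is a minimal sketch of the last two alternatives with scikit-learn: PCR as a pipeline of PCA followed by ordinary linear regression, and PLSR via PLSRegression. The number of components is an arbitrary choice for illustration and would normally be tuned by cross-validation.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# PCR: regress the response on the leading principal components of the predictors
pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
pcr.fit(X, y)

# PLSR: components are chosen to covary with the response, not just to explain X
pls = PLSRegression(n_components=5)
pls.fit(X, y)

print("PCR  in-sample R^2:", round(pcr.score(X, y), 3))
print("PLSR in-sample R^2:", round(pls.score(X, y), 3))
```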

Conclusion

In summary, LASSO (Least Absolute Shrinkage and Selection Operator) regression is a powerful tool for statistical modeling and machine learning, particularly useful in high-dimensional data scenarios. By incorporating an L1 penalty, LASSO performs both variable selection and regularization, leading to models that are not only accurate but also interpretable and generalizable. We have explored the definition and mathematical formulation of LASSO, discussed its key features such as variable selection, regularization, and sparsity of coefficients, and highlighted its advantages over traditional methods like Ordinary Least Squares (OLS) and Ridge Regression.

LASSO’s ability to handle multicollinearity, perform feature selection, and produce simpler, more interpretable models makes it a valuable tool in various applications, from genomics and finance to social sciences and engineering. However, it is essential to be aware of its limitations, such as issues with highly correlated predictors, potential for non-unique solutions, and challenges with small datasets. In such cases, alternatives like the Elastic Net, which combines the strengths of LASSO and Ridge Regression, or other regularization methods like Ridge Regression, Principal Component Regression (PCR), and Partial Least Squares Regression (PLSR) might be more appropriate.

LASSO’s versatility and effectiveness underscore its utility in modern data analysis. It provides a robust framework for building predictive models that balance complexity and interpretability. Researchers and practitioners are encouraged to delve deeper into the theoretical foundations and practical implementations of LASSO to fully leverage its capabilities.

For further reading and exploration, the following resources are recommended:

  • Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.
  • Zou, H., & Hastie, T. (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.

These resources provide comprehensive insights into LASSO regression and its applications, helping readers to develop a deeper understanding and practical skills in using this valuable technique.

References

  • Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.
  • Zou, H., & Hastie, T. (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer.
  • Buhlmann, P., & Van De Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer.
  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
  • Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least Angle Regression. The Annals of Statistics, 32(2), 407-499.
  • Bühlmann, P., Kalisch, M., & Meier, L. (2014). High-Dimensional Statistics with a View Toward Applications in Biology. Annual Review of Statistics and Its Application, 1, 255-278.
  • Tibshirani, R., & Taylor, J. (2011). The Solution Path of the Generalized Lasso. The Annals of Statistics, 39(3), 1335-1371.
  • Fan, J., & Li, R. (2001). Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456), 1348-1360.
  • Meinshausen, N., & Bühlmann, P. (2010). Stability Selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473.
  • Tibshirani, R. J. (2013). The Lasso Problem and Uniqueness. Electronic Journal of Statistics, 7, 1456-1490.
  • Zou, H., Hastie, T., & Tibshirani, R. (2007). On the “Degrees of Freedom” of the Lasso. The Annals of Statistics, 35(5), 2173-2192.
  • Donoho, D. L., & Johnstone, I. M. (1994). Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika, 81(3), 425-455.
  • Hoerl, A. E., & Kennard, R. W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(1), 55-67.
  • Candès, E. J., & Tao, T. (2007). The Dantzig Selector: Statistical Estimation When p is Much Larger than n. The Annals of Statistics, 35(6), 2313-2351.
  • Chen, S., Donoho, D. L., & Saunders, M. A. (2001). Atomic Decomposition by Basis Pursuit. SIAM Review, 43(1), 129-159.
  • Wainwright, M. J. (2009). Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using \(\ell_1\)-Constrained Quadratic Programming (Lasso). IEEE Transactions on Information Theory, 55(5), 2183-2202.
  • Figueiredo, M. A. T., Nowak, R. D., & Wright, S. J. (2007). Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems. IEEE Journal of Selected Topics in Signal Processing, 1(4), 586-597.