
An Example of Implementing Lasso Regression and Ridge Regression with TensorFlow


This article presents an example of implementing the lasso and ridge regression algorithms with TensorFlow; it may serve as a useful reference for readers who need it.

There are also regularization methods that limit the influence of the coefficients on a regression algorithm's output; the two most commonly used are lasso regression and ridge regression.

The lasso and ridge regression algorithms are very similar to the ordinary linear regression algorithm; the one difference is that a regularization term is added to the formula to constrain the slope (or net slope). The main reason for doing this is to limit the features' influence on the dependent variable, which is accomplished by adding a loss term that depends on the slope A.
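As a minimal sketch of the idea (plain NumPy, my own illustration rather than the article's code): the penalized loss is just the usual mean squared error plus a slope-dependent term, and the two algorithms differ only in the choice of that term.

import numpy as np

def penalized_loss(A, b, x, y, penalty, lam=1.0):
    # ordinary least-squares term for the model y = Ax + b
    mse = np.mean((y - (A * x + b)) ** 2)
    # regularization term that depends only on the slope A
    return mse + lam * penalty(A)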

For the lasso regression algorithm, we add a term to the loss function: a given multiple of the slope A. We could use TensorFlow's logical operations here, but they have no gradients associated with them; instead we use a continuous approximation of the step function, also called a continuous heaviside step function, which jumps up sharply at a chosen cutoff. We will see how to apply it to lasso regression in a moment.
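To make the step behavior concrete, here is a small NumPy sketch (my own illustration, reusing the constants that appear in the code below: cutoff 0.9, sigmoid steepness 50, penalty scale 99). The steep sigmoid is nearly 0 below the cutoff and nearly 1 above it, so the scaled penalty jumps from about 0 to about 99 around A = 0.9:

import numpy as np

def continuous_heaviside(A, cutoff=0.9, steepness=50.0):
    # steep logistic sigmoid: ~0 well below the cutoff, ~1 well above it
    return 1.0 / (1.0 + np.exp(-steepness * (A - cutoff)))

for A in [0.5, 0.85, 0.9, 0.95, 1.2]:
    print(A, round(99.0 * continuous_heaviside(A), 2))
# prints roughly: 0.0, 7.51, 49.5, 91.49, 99.0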

For the ridge regression algorithm, we instead add an L2 norm term, i.e. L2 regularization of the slope coefficient.
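For contrast, a minimal sketch of the ridge penalty (again my own illustration): rather than jumping at a cutoff, it grows smoothly and quadratically with the slope.

def ridge_penalty(A, ridge_param=1.0):
    # L2 regularization: quadratic in the slope, differentiable everywhere
    return ridge_param * A ** 2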

# LASSO and Ridge Regression
#
# This function shows how to use TensorFlow to solve LASSO or
# Ridge regression for
# y = Ax + b
#
# We will use the iris data, specifically:
#   y = Sepal Length
#   x = Petal Width

# import required libraries
import matplotlib.pyplot as plt
import sys
import numpy as np
import tensorflow as tf
from sklearn import datasets
from tensorflow.python.framework import ops

# Specify 'Ridge' or 'LASSO'
regression_type = 'LASSO'

# clear out old graph
ops.reset_default_graph()

# Create graph
sess = tf.Session()

###
# Load iris data
###

# iris.data = [(Sepal Length, Sepal Width, Petal Length, Petal Width)]
iris = datasets.load_iris()
x_vals = np.array([x[3] for x in iris.data])
y_vals = np.array([y[0] for y in iris.data])

###
# Model Parameters
###

# Declare batch size
batch_size = 50

# Initialize placeholders
x_data = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)

# make results reproducible
seed = 13
np.random.seed(seed)
tf.set_random_seed(seed)

# Create variables for linear regression
A = tf.Variable(tf.random_normal(shape=[1, 1]))
b = tf.Variable(tf.random_normal(shape=[1, 1]))

# Declare model operations
model_output = tf.add(tf.matmul(x_data, A), b)

###
# Loss Functions
###

# Select appropriate loss function based on regression type
if regression_type == 'LASSO':
    # Declare Lasso loss function
    # Add a loss term built from a smoothed continuous step function,
    # with the lasso cutoff set at 0.9. This effectively constrains
    # the slope coefficient to stay below 0.9.
    # Lasso Loss = L2_Loss + heavyside_step,
    # where heavyside_step ~ 0 if A < cutoff, otherwise ~ 1 (scaled by 99 below)
    lasso_param = tf.constant(0.9)
    heavyside_step = tf.truediv(1., tf.add(1., tf.exp(tf.multiply(-50., tf.subtract(A, lasso_param)))))
    regularization_param = tf.multiply(heavyside_step, 99.)
    loss = tf.add(tf.reduce_mean(tf.square(y_target - model_output)), regularization_param)
elif regression_type == 'Ridge':
    # Declare the Ridge loss function
    # Ridge loss = L2_loss + L2 norm of slope
    ridge_param = tf.constant(1.)
    ridge_loss = tf.reduce_mean(tf.square(A))
    loss = tf.expand_dims(tf.add(tf.reduce_mean(tf.square(y_target - model_output)), tf.multiply(ridge_param, ridge_loss)), 0)
else:
    print('Invalid regression_type parameter value', file=sys.stderr)

###
# Optimizer
###

# Declare optimizer
my_opt = tf.train.GradientDescentOptimizer(0.001)
train_step = my_opt.minimize(loss)

###
# Run regression
###

# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)

# Training loop
loss_vec = []
for i in range(1500):
    rand_index = np.random.choice(len(x_vals), size=batch_size)
    rand_x = np.transpose([x_vals[rand_index]])
    rand_y = np.transpose([y_vals[rand_index]])
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
    loss_vec.append(temp_loss[0])
    if (i + 1) % 300 == 0:
        print('Step #' + str(i + 1) + ' A = ' + str(sess.run(A)) + ' b = ' + str(sess.run(b)))
        print('Loss = ' + str(temp_loss))
        print('\n')

###
# Extract regression results
###

# Get the optimal coefficients
[slope] = sess.run(A)
[y_intercept] = sess.run(b)

# Get best fit line
best_fit = []
for i in x_vals:
    best_fit.append(slope * i + y_intercept)

###
# Plot results
###

# Plot regression line against data points
plt.plot(x_vals, y_vals, 'o', label='Data Points')
plt.plot(x_vals, best_fit, 'r-', label='Best fit line', linewidth=3)
plt.legend(loc='upper left')
plt.title('Sepal Length vs Petal Width')
plt.xlabel('Petal Width')
plt.ylabel('Sepal Length')
plt.show()

# Plot loss over time
plt.plot(loss_vec, 'k-')
plt.title(regression_type + ' Loss per Generation')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()

Output:

Step #300 A = [[ 0.77170753]] b = [[ 1.82499862]]
Loss = [[ 10.26473045]]
Step #600 A = [[ 0.75908542]] b = [[ 3.2220633]]
Loss = [[ 3.06292033]]
Step #900 A = [[ 0.74843585]] b = [[ 3.9975822]]
Loss = [[ 1.23220456]]
Step #1200 A = [[ 0.73752165]] b = [[ 4.42974091]]
Loss = [[ 0.57872057]]
Step #1500 A = [[ 0.72942668]] b = [[ 4.67253113]]
Loss = [[ 0.40874988]]


The lasso regression algorithm is implemented by adding a continuous step function on top of the standard linear regression estimate. Because the step function is so steep, we have to be careful with the step size: a step size that is too large will keep the model from converging.
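A quick back-of-the-envelope check (my own, based on the constants used above) shows why. The penalty 99 * sigmoid(50 * (A - 0.9)) has derivative 99 * 50 * s * (1 - s) with respect to A, which peaks when the sigmoid value s equals 0.5, i.e. exactly at the cutoff:

max_grad = 99.0 * 50.0 * 0.25   # = 1237.5, steepest slope of the penalty term
print(max_grad * 0.001)         # ~1.24: how far one update can move A even at learning rate 0.001

With a noticeably larger learning rate, a single step near the cutoff can overshoot by several units, and training oscillates instead of converging.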
