当前位置:Gxlcms > Python > python实战之实现excel读取、统计、写入的示例讲解

python实战之实现excel读取、统计、写入的示例讲解

时间:2021-07-01 10:21:17 帮助过:3人阅读

这篇文章主要介绍了关于python实战之实现excel读取、统计、写入的示例讲解,有着一定的参考价值,现在分享给大家,有需要的朋友可以参考一下

背景

图像领域内的一个国内会议快要召开了,要发各种邀请邮件,之后要录入、统计邮件回复(参会还是不参会等)。如此重要的任务,老师就托付给我了。ps: 统计回复邮件的时候,能知道谁参会或谁不参会。

而我主要的任务,除了录入邮件回复,就是统计理事和普通会员的参会情况了(参会的、不参会的、没回复的)。录入邮件回复信息没办法只能人工操作,但如果统计也要人工的话,那工作量就太大了(比如在上百人的列表中搜索另外上百人在不在此列表中!!),于是就想到了用python来帮忙,花两天时间不断修改,写了6个版本。。。

摘要

version_1 基本实现了excel读取、统计、显示功能,但问题也有不少,像显示出来后还要自已复制、粘贴到excel表,而且set中还有nan这样的bug。

version_2 相比较version_1而言,此版本用set代替list,可以自动去重。

version_3 解决了set中出现nan的bug,而且还加入的excel写入的功能,但一次只能写入一张表,所以要运行两次才能写入两张表(sheet)。

version_4 的改进在于将version_3中写入两张表格的操作,集成在一个程序里,只需要运行一次便可写入两张表,但也总是会写入两张表,万一你只想写入一张表呢??

version_5 相对之前版本的最大改进在于将程序模块化,更具可读性了; 对修复set中出现nan的方法也进行了改进和简化; 而且可以自由控制写入多少张表了。

version_final 相比较version_5,修复了一个bug,之前需要先验知识,现在更通用一点(prep函数取代了set2list函数)。

version_1

基本实现了excel读取、统计、显示功能,但问题也有不少,像显示出来后还要自已复制、粘贴到excel表,而且set中还有nan这样的值。

  1. #version_1
  2. import os
  3. import numpy as np
  4. import pandas as pd
  5. os.chdir('C:\\Users\\dell\\Desktop\\0711任务')
  6. print(os.getcwd())
  7. data = pd.read_excel('for_python.xlsx','Sheet2')
  8. return_set = set(data['回执名单'])
  9. demand_set = set(data['理事名单'])
  10. answer_list = []
  11. unanswer_list = []
  12. for each in demand_set:
  13. if each in return_set:
  14. answer_list.append(each)
  15. else:
  16. unanswer_list.append(each)
  17. notattend_set = set(data['回执名单'][-15:])
  18. nt = []
  19. for each in notattend_set:
  20. if each in answer_list:
  21. nt.append(each)
  22. def disp(ll, cap, num = True):
  23. print(cap)
  24. if num:
  25. for i, each in enumerate(ll):
  26. print(i+1,each)
  27. else:
  28. for each in enumerate(ll):
  29. print(each)
  30. disp(answer_list,'\n理事回执名单')
  31. disp(unanswer_list,'\n理事未回执名单')
  32. disp(nt,'\n理事回执说不参加名单')

version_2

相比较上一个版本,此版本用set代替list,可以自动去重。

  1. #version_2
  2. import os
  3. import numpy as np
  4. import pandas as pd
  5. os.chdir('C:\\Users\\dell\\Desktop\\0711任务')
  6. print(os.getcwd())
  7. data = pd.read_excel('for_python.xlsx','Sheet2')
  8. return_set = set(data['回执名单'])
  9. demand_set = set(data['理事名单'])
  10. answer_set = set([]) #理事回执名单
  11. unanswer_set = set([]) #理事未回执名单
  12. for each in demand_set:
  13. if each in return_set:
  14. answer_set.add(each)
  15. else:
  16. unanswer_set.add(each)
  17. notattend_set = set(data['回执名单'][-17:])
  18. nt = set([]) #理事回执说不参加名单
  19. for each in notattend_set:
  20. if each in answer_set:
  21. nt.add(each)
  22. ans_att_set = answer_set - nt #理事回执参加名单
  23. def disp(ss, cap, num = False):
  24. print(cap)
  25. if num:
  26. for i, each in enumerate(ss):
  27. print(i+1,each)
  28. else:
  29. for each in ss:
  30. print(each)
  31. #disp(answer_set,'\n理事回执名单')
  32. disp(ans_att_set,'\n理事回执说参加名单')
  33. disp(nt,'\n理事回执说不参加名单')
  34. disp(unanswer_set,'\n理事未回执名单')
  35. print(len(ans_att_set),len(nt),len(unanswer_set))

version_3

此版本解决了set中出现nan的bug,而且还加入的excel写入的功能,但一次只能写入一张表,所以要运行两次才能写入两张表(sheet)。

step_1

  1. import os
  2. import numpy as np
  3. import pandas as pd
  4. os.chdir('C:\\Users\\dell\\Desktop')
  5. print('work_directory: ', os.getcwd())
  6. data = pd.read_excel('理事与会员名单.xlsx','理事与会员名单')
  7. #1.载入excel,得到三个名单
  8. ans_attend_set = set(data['回执参加']) #回执参会名单
  9. N = len(ans_attend_set)
  10. ans_notatt_idx = [i for i in range(N) if type(data['回执不参加'][i]) == np.float][0]
  11. ans_notatt_set = set(data['回执不参加'][:ans_notatt_idx])#回执不参会名单
  12. concil_idx = [i for i in range(N) if type(data['理事名单'][i]) == np.float][0]
  13. concil_set = set(data['理事名单'][:concil_idx]) #理事名单
  14. #2.统计理事参会情况
  15. concil_attend_set = set([]) #理事回执参会名单
  16. concil_notatt_set = set([]) #理事回执不参会名单
  17. concil_notans_set = set([]) #理事未回执名单
  18. for each in concil_set:
  19. if each in ans_attend_set:
  20. concil_attend_set.add(each)
  21. elif each in ans_notatt_set:
  22. concil_notatt_set.add(each)
  23. else:
  24. concil_notans_set.add(each)
  25. #3. 显示结果
  26. def disp(ss, cap, num = True):
  27. #ss: 名单集合
  28. #cap: 开头描述
  29. print(cap,'({})'.format(len(ss)))
  30. for i in range(np.ceil(len(ss)/5).astype(int)):
  31. pre = i * 5
  32. nex = (i+1) * 5
  33. #调整显示格式
  34. dd = ''
  35. for each in list(ss)[pre:nex]:
  36. if len(each) == 2:
  37. dd = dd + ' ' + each
  38. elif len(each) == 3:
  39. dd = dd + ' ' + each
  40. else:
  41. dd = dd + '' + each
  42. print('{:3.0f} -{:3.0f} {}'.format(i*5+1,(i+1)*5,dd))
  43. disp(concil_attend_set,'\n参会理事')
  44. disp(concil_notatt_set,'\n不参会理事')
  45. disp(concil_notans_set,'\n未回执理事')
  46. #4. 将理事参会情况,写入excel
  47. df = pd.DataFrame(list(concil_attend_set),columns = ['参会理事'])
  48. df['']=pd.DataFrame([''])
  49. df['序号1'] = pd.DataFrame(np.arange(len(concil_notatt_set))+1)
  50. df['不参会理事'] = pd.DataFrame(list(concil_notatt_set))
  51. df['_']=pd.DataFrame([''])
  52. df['序号2'] = pd.DataFrame(np.arange(len(concil_notans_set))+1)
  53. df['未回执理事'] = pd.DataFrame(list(concil_notans_set))
  54. df.index = df.index + 1
  55. df.to_excel('理事和会员回执统计.xlsx', sheet_name='理事回执统计')
  56. print('\n\n写入excel成功~~')

step_2

  1. import os
  2. import numpy as np
  3. import pandas as pd
  4. os.chdir('C:\\Users\\dell\\Desktop')
  5. print('work_directory: ', os.getcwd())
  6. data = pd.read_excel('理事与会员名单.xlsx','理事与会员名单')
  7. #1.载入excel,得到三个名单
  8. ans_attend_set = set(data['回执参加']) #回执参会名单
  9. N = len(ans_attend_set)
  10. ans_notatt_idx = [i for i in range(N) if type(data['回执不参加'][i]) == np.float][0]
  11. ans_notatt_set = set(data['回执不参加'][:ans_notatt_idx])#回执不参会名单
  12. mem_idx = [i for i in range(N) if type(data['被推荐人'][i]) == np.float][0]
  13. mem_set = set(data['被推荐人'][:mem_idx]) #被推荐为会员代表名单
  14. #2.统计会员参会情况
  15. mem_attend_set = set([]) #回执参会会员
  16. mem_notatt_set = set([]) #回执不参会会员
  17. mem_notans_set = set([]) #未回执会员
  18. for each in mem_set:
  19. if each in ans_attend_set:
  20. mem_attend_set.add(each)
  21. elif each in ans_notatt_set:
  22. mem_notatt_set.add(each)
  23. else:
  24. mem_notans_set.add(each)
  25. #3. 显示结果
  26. def disp(ss, cap, num = True):
  27. #ss: 名单集合
  28. #cap: 开头描述
  29. print(cap,'({})'.format(len(ss)))
  30. for i in range(np.ceil(len(ss)/5).astype(int)):
  31. pre = i * 5
  32. nex = (i+1) * 5
  33. #调整显示格式
  34. dd = ''
  35. for each in list(ss)[pre:nex]:
  36. if len(each) == 2:
  37. dd = dd + ' ' + each
  38. elif len(each) == 3:
  39. dd = dd + ' ' + each
  40. else:
  41. dd = dd + '' + each
  42. print('{:3.0f} -{:3.0f} {}'.format(i*5+1,(i+1)*5,dd))
  43. disp(mem_attend_set,'\n参会会员')
  44. disp(mem_notatt_set,'\n不参会会员')
  45. disp(mem_notans_set,'\n未回执会员')
  46. #4. 将会员参会情况,写入excel
  47. if len(mem_attend_set) > len(mem_notans_set):
  48. print('#1')
  49. L = len(mem_attend_set)
  50. mem_notans_list = list(mem_notans_set)
  51. mem_notans_list.extend([''] * (L - len(mem_notans_set)))
  52. mem_attend_list = list(mem_attend_set)
  53. else:
  54. print('#2')
  55. L = len(mem_notans_set)
  56. mem_attend_list = list(mem_attend_set)
  57. mem_attend_list.extend([''] * (L - len(mem_attend_set)))
  58. mem_notans_list = list(mem_notans_set)
  59. df = pd.DataFrame(mem_attend_list,columns = ['参会会员'])
  60. df['']=pd.DataFrame([''])
  61. if len(mem_notatt_set) == 0:
  62. df['序号1'] = np.NaN
  63. df['不参会会员'] = np.NaN
  64. else:
  65. df['序号1'] = pd.DataFrame(np.arange(len(mem_notatt_set))+1)
  66. df['不参会会员'] = pd.DataFrame(list(mem_notatt_set))
  67. df['_']=pd.DataFrame([''])
  68. df['序号2'] = pd.DataFrame(np.arange(len(mem_notans_set))+1)
  69. df['未回执会员'] = pd.DataFrame(mem_notans_list)
  70. df.index = df.index + 1
  71. df0 = pd.read_excel('理事和会员回执统计.xlsx',sheet_name='理事回执统计')
  72. writer = pd.ExcelWriter('理事和会员回执统计.xlsx')
  73. df0.to_excel(writer, sheet_name='理事回执统计')
  74. df.to_excel(writer, sheet_name='会员回执统计')
  75. writer.save()
  76. print('\n\n写入excel成功~~')

version_4

version_4的改进在于将version_3中写入两张表格的操作,集成在一个程序里,只需要运行一次便可写入两张表,也总是会写入两张表。问题是要是你只想写入一张表呢??

  1. import os
  2. import numpy as np
  3. import pandas as pd
  4. os.chdir('C:\\Users\\dell\\Desktop')
  5. print('work_directory: ', os.getcwd())
  6. loadfile_sheet = ['理事与会员名单.xlsx','理事与会员名单']
  7. columns = ['回执参加','回执不参加','理事','会员']
  8. savefile_sheet = ['理事和会员回执统计.xlsx','理事回执统计','会员回执统计']
  9. display = [1,1]
  10. def main(loadfile_sheet,columns,savefile_sheet,display):
  11. #1. 载入excel,得到名单
  12. data = pd.read_excel(loadfile_sheet[0],loadfile_sheet[1])
  13. def first_nan_index(pd):
  14. for i, each in enumerate(pd):
  15. if type(each) == np.float:
  16. return i
  17. return i
  18. idx = first_nan_index(data[columns[0]])
  19. ans_attend_set = set(data[columns[0]][:idx])#回执参会名单
  20. idx = first_nan_index(data[columns[1]])
  21. ans_notatt_set = set(data[columns[1]][:idx])#回执不参会名单
  22. idx = first_nan_index(data[columns[2]])
  23. concil_set = set(data[columns[2]][:idx])#理事名单
  24. idx = first_nan_index(data[columns[3]])
  25. mem_set = set(data[columns[3]][:idx])#会员名单
  26. #2. 统计参会情况
  27. concil_attend_set = set([]) #回执参会理事
  28. concil_notatt_set = set([]) #回执不参会理事
  29. concil_notans_set = set([]) #未回执理事
  30. for each in concil_set:
  31. if each in ans_attend_set:
  32. concil_attend_set.add(each)
  33. elif each in ans_notatt_set:
  34. concil_notatt_set.add(each)
  35. else:
  36. concil_notans_set.add(each)
  37. mem_attend_set = set([]) #回执参会会员
  38. mem_notatt_set = set([]) #回执不参会会员
  39. mem_notans_set = set([]) #未回执会员
  40. for each in mem_set:
  41. if each in ans_attend_set:
  42. mem_attend_set.add(each)
  43. elif each in ans_notatt_set:
  44. mem_notatt_set.add(each)
  45. else:
  46. mem_notans_set.add(each)
  47. #3. 是否显示中间结果
  48. def disp(ss, cap, num = True):
  49. #ss: 名单集合
  50. #cap: 开头描述
  51. print(cap,'({})'.format(len(ss)))
  52. for i in range(np.ceil(len(ss)/5).astype(int)):
  53. pre = i * 5
  54. nex = (i+1) * 5
  55. #调整显示格式
  56. dd = ''
  57. for each in list(ss)[pre:nex]:
  58. if len(each) == 2:
  59. dd = dd + ' ' + each
  60. elif len(each) == 3:
  61. dd = dd + ' ' + each
  62. else:
  63. dd = dd + '' + each
  64. print('{:3.0f} -{:3.0f} {}'.format(i*5+1,(i+1)*5,dd))
  65. if display[0]:
  66. disp(concil_attend_set,'\n参会理事')
  67. disp(concil_notatt_set,'\n不参会理事')
  68. disp(concil_notans_set,'\n未回执理事')
  69. if display[1]:
  70. disp(mem_attend_set,'\n参会会员')
  71. disp(mem_notatt_set,'\n不参会会员')
  72. disp(mem_notans_set,'\n未回执会员')
  73. #4. 写入excel
  74. def trans_pd(df,ss,cap,i=1):
  75. if len(ss) == 0:
  76. df['序号{}'.format(i)] = np.NaN
  77. df[cap] = np.NaN
  78. else:
  79. df['序号{}'.format(i)] = pd.DataFrame(np.arange(len(ss))+1)
  80. df[cap] = pd.DataFrame(list(ss))
  81. df['_'*i]=pd.DataFrame([''])
  82. return df
  83. def set2list(mem_attend_set,mem_notans_set):
  84. if len(mem_attend_set) > len(mem_notans_set):
  85. L = len(mem_attend_set)
  86. mem_notans_list = list(mem_notans_set)
  87. mem_notans_list.extend([''] * (L - len(mem_notans_set)))
  88. mem_attend_list = list(mem_attend_set)
  89. else:
  90. L = len(mem_notans_set)
  91. mem_attend_list = list(mem_attend_set)
  92. mem_attend_list.extend([''] * (L - len(mem_attend_set)))
  93. mem_notans_list = list(mem_notans_set)
  94. return mem_attend_list,mem_notans_list
  95. mem_attend_list, mem_notans_list = set2list(mem_attend_set, mem_notans_set)
  96. df1 = pd.DataFrame(mem_attend_list,columns = ['参会会员'])
  97. df1['']=pd.DataFrame([''])
  98. df1 = trans_pd(df1,mem_notatt_set,'不参会会员')
  99. df1 = trans_pd(df1,mem_notans_set,'未回执会员',2)
  100. df1.index = df1.index + 1
  101. concil_attend_list, concil_notans_list = set2list(concil_attend_set, concil_notans_set)
  102. df2 = pd.DataFrame(concil_attend_list,columns = ['参会理事'])
  103. df2['']=pd.DataFrame([''])
  104. df2 = trans_pd(df2,concil_notatt_set,'不参会理事')
  105. df2 = trans_pd(df2,concil_notans_list,'未回执理事',2)
  106. df2.index = df2.index + 1
  107. writer = pd.ExcelWriter(savefile_sheet[0])
  108. df2.to_excel(writer, sheet_name=savefile_sheet[1])
  109. df1.to_excel(writer, sheet_name=savefile_sheet[2])
  110. writer.save()
  111. print('\n\n写入excel成功~~')
  112. if __name__ == '__main__':
  113. main(loadfile_sheet,columns,savefile_sheet,display)

version_5

version_5对修复set中出现nan的方法进行了改进和简化; 而且将程序模块化,更具可读性; 可以自由控制写入多少张表了。

  1. import os
  2. import numpy as np
  3. import pandas as pd
  4. os.chdir('C:\\Users\\dell\\Desktop')
  5. print('work_directory: ', os.getcwd())
  6. loadfile_sheet = ['理事与会员名单.xlsx','理事与会员名单']
  7. common_columns = ['回执参加','回执不参加']
  8. concerned_columns = ['理事','会员']
  9. disp_columns = ['参会','不参会','未回执']
  10. savefile_sheet = ['理事和会员回执统计.xlsx','理事回执统计','会员回执统计']
  11. def disp(ss, cap, num = True):
  12. #ss: 名单集合
  13. #cap: 开头描述
  14. print(cap,'({})'.format(len(ss)))
  15. for i in range(np.ceil(len(ss)/5).astype(int)):
  16. pre = i * 5
  17. nex = (i+1) * 5
  18. #调整显示格式
  19. dd = ''
  20. for each in list(ss)[pre:nex]:
  21. if len(each) == 2:
  22. dd = dd + ' ' + each
  23. elif len(each) == 3:
  24. dd = dd + ' ' + each
  25. else:
  26. dd = dd + '' + each
  27. print('{:3.0f} -{:3.0f} {}'.format(i*5+1,(i+1)*5,dd))
  28. def trans_pd(df,ss,cap,i=1):
  29. df['_'*i]=pd.DataFrame([''])
  30. if len(ss) == 0:
  31. df['序号{}'.format(i)] = np.NaN
  32. df[cap] = np.NaN
  33. else:
  34. df['序号{}'.format(i)] = pd.DataFrame(np.arange(len(ss))+1)
  35. df[cap] = pd.DataFrame(list(ss))
  36. return df
  37. def set2list(ss1,ss2):
  38. if len(ss1) > len(ss2):
  39. L = len(ss1)
  40. ss2_list = list(ss2)
  41. ss2_list.extend([''] * (L - len(ss2)))
  42. ss1_list = list(ss1)
  43. else:
  44. L = len(ss2)
  45. ss1_list = list(ss1)
  46. ss1_list.extend([''] * (L - len(ss1)))
  47. ss2_list = list(ss2)
  48. return ss1_list,ss2_list
  49. def get_df(loadfile_sheet,common_columns,concerned_column,disp_columns, display = True):
  50. #1. 载入excel
  51. data = pd.read_excel(loadfile_sheet[0],loadfile_sheet[1])
  52. common_set1 = set(data[common_columns[0]])
  53. common_set1.discard(np.NaN)
  54. common_set2 = set(data[common_columns[1]])
  55. common_set2.discard(np.NaN)
  56. concerned_set = set(data[concerned_column])
  57. concerned_set.discard(np.NaN)
  58. #2. 统计
  59. concerned_in_set_1 = set([])
  60. concerned_in_set_2 = set([])
  61. concerned_in_no_set = set([])
  62. for each in concerned_set:
  63. if each in common_set1:
  64. concerned_in_set_1.add(each)
  65. elif each in common_set2:
  66. concerned_in_set_2.add(each)
  67. else:
  68. concerned_in_no_set.add(each)
  69. #3. 显示
  70. if display:
  71. disp(concerned_in_set_1,'\n'+disp_columns[0]+concerned_column)
  72. disp(concerned_in_set_2,'\n'+disp_columns[1]+concerned_column)
  73. disp(concerned_in_no_set,'\n'+disp_columns[2]+concerned_column)
  74. #4. 返回DataFrame
  75. concerned_in_set_1_list, concerned_in_set_2_list = set2list(concerned_in_set_1, concerned_in_no_set)
  76. df = pd.DataFrame(concerned_in_set_1_list,columns = [disp_columns[0]])
  77. df = trans_pd(df,concerned_in_set_2,disp_columns[1])
  78. df = trans_pd(df,concerned_in_no_set,disp_columns[2],2)
  79. df.index = df.index + 1
  80. return df
  81. def save2excel(df, concerned_column, savefile_sheet):
  82. L = len(savefile_sheet) - 1
  83. idx = 0
  84. for i in np.arange(L)+1:
  85. if concerned_column in savefile_sheet[i]:
  86. idx = i
  87. break
  88. if idx != 0:
  89. names = locals()
  90. for i in np.arange(L)+1:
  91. if i != idx:
  92. names['df%s' % i] = pd.read_excel(savefile_sheet[0], sheet_name=savefile_sheet[i])
  93. writer = pd.ExcelWriter(savefile_sheet[0])
  94. for i in np.arange(L)+1:
  95. if i != idx:
  96. names['df%s' % i].to_excel(writer, sheet_name=savefile_sheet[i])
  97. else:
  98. df.to_excel(writer, sheet_name=savefile_sheet[i])
  99. writer.save()
  100. else:
  101. names = locals()
  102. for i in np.arange(L)+1:
  103. names['df%s' % i] = pd.read_excel(savefile_sheet[0], sheet_name=savefile_sheet[i])
  104. writer = pd.ExcelWriter(savefile_sheet[0])
  105. for i in np.arange(L)+1:
  106. names['df%s' % i].to_excel(writer, sheet_name=savefile_sheet[i])
  107. df.to_excel(writer, sheet_name=concerned_column)
  108. writer.save()
  109. print('writing success')
  110. if __name__ == '__main__':
  111. for concerned_column in concerned_columns:
  112. df = get_df(loadfile_sheet,common_columns,
  113. concerned_column,disp_columns, display = True)
  114. save2excel(df, concerned_column, savefile_sheet)

version_final

相比较version_5,修复了一个bug,之前需要先验知识,现在更通用一点(prep函数取代了set2list函数)。

  1. import os
  2. import numpy as np
  3. import pandas as pd
  4. os.chdir('C:\\Users\\dell\\Desktop')
  5. print('work_directory: ', os.getcwd())
  6. loadfile_sheet = ['理事与会员名单.xlsx','理事与会员名单']
  7. common_columns = ['回执参加','回执不参加']
  8. concerned_columns = ['理事','会员']
  9. disp_columns = ['参会','不参会','未回执']
  10. savefile_sheet = ['理事和会员回执统计.xlsx','理事回执统计','会员回执统计']
  11. def disp(ss, cap, num = True):
  12. #功能:显示名单
  13. #ss : 名单集合
  14. #cap :开头描述
  15. print(cap,'({})'.format(len(ss)))
  16. for i in range(np.ceil(len(ss)/5).astype(int)):
  17. pre = i * 5
  18. nex = (i+1) * 5
  19. #调整显示格式
  20. dd = ''
  21. for each in list(ss)[pre:nex]:
  22. if len(each) == 2:
  23. dd = dd + ' ' + each
  24. elif len(each) == 3:
  25. dd = dd + ' ' + each
  26. else:
  27. dd = dd + '' + each
  28. print('{:3.0f} -{:3.0f} {}'.format(i*5+1,(i+1)*5,dd))
  29. def trans_pd(df,ll,cap,i=1):
  30. #功能:生成三列--空列、序号列、数据列
  31. #df : DataFrame结构
  32. #ll : 列表
  33. #cap : 显示的列名
  34. #i : 控制空列的名字
  35. df['_'*i]=pd.DataFrame([''])
  36. if len(set(ll)) == 1:
  37. df['序号{}'.format(i)] = np.NaN
  38. df[cap] = np.NaN
  39. else:
  40. df['序号{}'.format(i)] = pd.DataFrame(np.arange(len(set(ll))-1)+1)
  41. df[cap] = pd.DataFrame(ll)
  42. return df
  43. def prep(ss, N):
  44. #功能:预处理,生成列表,并补齐到长度N
  45. #ss : 集体
  46. #N :长度
  47. ll = list(ss)
  48. L = len(ll)
  49. ll.extend([np.NaN] * (N-L))
  50. return ll
  51. def get_df(loadfile_sheet,common_columns,concerned_column,disp_columns, display = True):
  52. #1. 载入excel
  53. data = pd.read_excel(loadfile_sheet[0],loadfile_sheet[1])
  54. common_set1 = set(data[common_columns[0]])
  55. common_set2 = set(data[common_columns[1]])
  56. concerned_set = set(data[concerned_column])
  57. common_set1.discard(np.NaN)
  58. common_set2.discard(np.NaN)
  59. concerned_set.discard(np.NaN)
  60. #2. 统计
  61. concerned_in_set_1 = set([])
  62. concerned_in_set_2 = set([])
  63. concerned_in_no_set = set([])
  64. for each in concerned_set:
  65. if each in common_set1:
  66. concerned_in_set_1.add(each)
  67. elif each in common_set2:
  68. concerned_in_set_2.add(each)
  69. else:
  70. concerned_in_no_set.add(each)
  71. #3. 显示
  72. if display:
  73. disp(concerned_in_set_1,'\n'+disp_columns[0]+concerned_column)
  74. disp(concerned_in_set_2,'\n'+disp_columns[1]+concerned_column)
  75. disp(concerned_in_no_set,'\n'+disp_columns[2]+concerned_column)
  76. #4. 返回DataFrame
  77. N = np.max([len(concerned_in_set_1),len(concerned_in_set_2),len(concerned_in_no_set)])
  78. concerned_in_set_1_list = prep(concerned_in_set_1,N)
  79. concerned_in_set_2_list = prep(concerned_in_set_2,N)
  80. concerned_in_no_list = prep(concerned_in_no_set,N)
  81. df = pd.DataFrame(concerned_in_set_1_list,columns = [disp_columns[0]])
  82. df = trans_pd(df,concerned_in_set_2_list,disp_columns[1])
  83. df = trans_pd(df,concerned_in_no_list,disp_columns[2],2)
  84. df.index = df.index + 1
  85. return df
  86. def save2excel(df, concerned_column, savefile_sheet):
  87. L = len(savefile_sheet) - 1
  88. idx = 0
  89. for i in np.arange(L)+1:
  90. if concerned_column in savefile_sheet[i]:
  91. idx = i
  92. break
  93. if idx != 0: #如果有对应sheet
  94. names = locals()
  95. for i in np.arange(L)+1:
  96. if i != idx:
  97. names['df%s' % i] = pd.read_excel(savefile_sheet[0], sheet_name=savefile_sheet[i])
  98. writer = pd.ExcelWriter(savefile_sheet[0])
  99. for i in np.arange(L)+1:
  100. if i != idx:
  101. names['df%s' % i].to_excel(writer, sheet_name=savefile_sheet[i])
  102. else:
  103. df.to_excel(writer, sheet_name=savefile_sheet[i])
  104. writer.save()
  105. else: #如果没有对应sheet,创建一个新sheet
  106. names = locals()
  107. for i in np.arange(L)+1:
  108. names['df%s' % i] = pd.read_excel(savefile_sheet[0], sheet_name=savefile_sheet[i])
  109. writer = pd.ExcelWriter(savefile_sheet[0])
  110. for i in np.arange(L)+1:
  111. names['df%s' % i].to_excel(writer, sheet_name=savefile_sheet[i])
  112. df.to_excel(writer, sheet_name=concerned_column)
  113. writer.save()
  114. print('writing success')
  115. if __name__ == '__main__':
  116. for concerned_column in concerned_columns:
  117. df = get_df(loadfile_sheet,common_columns,
  118. concerned_column,disp_columns, display = True)
  119. save2excel(df, concerned_column, savefile_sheet)

相关推荐:

python 实现在Excel末尾增加新行


以上就是python实战之实现excel读取、统计、写入的示例讲解的详细内容,更多请关注Gxl网其它相关文章!

人气教程排行