Some Python Scripts
1. 利用map()函数,转换list中元素的类型
map(function, list)函数有点函数式编程的意思,对list列表执行function操作:
ground_truth = map(int, truth_line.split()[1:]) # 将str第二个开始的元素转换为int数组
2. 利用heapq,求list的topk
import heapq
tmp = heapq.nlargest(topk_num, list) # 得到topk list
for i in range(topk_num)
topk_index.append(list.index(tmp[i])) # 保存下标
3. 利用numpy,求list的average
import numpy as np
print np.array(result_list).mean()
4. 利用random,随机取list的某个值
import random
value = lines[random.randint(0, len(lines)-1)]
5. 对格式稍微复杂的3个文件,对应的域求平均
需求是对3个feature文件的对应域求平均,feature文件的格式如下:
0 0:0.0 1:0.0003 2:0.0 ….
(label featureID0:value featureID1:value featureID2:value ….)
label都相同,后边每一个featureID的值求平均。用shell写稍复杂,故用python进行处理:
import sys
outfile = open(sys.argv[1], 'w')
file1_lines = open(sys.argv[2], 'r').readlines() #大的数组按行存放所有的文本
file2_lines = open(sys.argv[3], 'r').readlines()
file3_lines = open(sys.argv[4], 'r').readlines()
for i in range(len(file1_lines)): #循环每一行
if i % 100 == 0:
print i
list1 = file1_lines[i].split()
list2 = file2_lines[i].split()
list3 = file3_lines[i].split()
outfile.write(str(list1[0]))
for j in range(1, len(list1)): #循环处理每一行的feature value
val1 = list1[j].split(':')[1]
val2 = list2[j].split(':')[1]
val3 = list3[j].split(':')[1]
avg = (float(val1) + float(val2) + float(val3)) / 3
outfile.write('\t' + str(j) + ':' + str(avg))
outfile.write('\n') #don't forget to end this line
outfile.close()