Asked by: kyle stegman; asked: 9/30/2023; last edited by: marc_s, kyle stegman; updated: 9/30/2023; views: 77
Looping functions over a series of CSVs
Question:
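I have code that finds the average size and stdev of specific peaks in an intensity plot. It works for a single file, but I would like to run it on multiple files at once and combine the per-file averages and stdevs into one average and one stdev. I have been having trouble getting it to work with a directory. I also can't merge everything into one giant data file, because the overlap would mess up my data, so I need to calculate the statistics for each file separately and then combine them. Any help would be greatly appreciated!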
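For reference, per-file results can be merged exactly if each file reports its number of peak widths N_i, its mean \bar{x}_i and its population standard deviation \sigma_i (what np.std returns by default). The combined variance needs the spread of the file means around the overall mean in addition to the within-file variances:

```latex
\bar{x} = \frac{\sum_i N_i\,\bar{x}_i}{\sum_i N_i},
\qquad
\sigma^2 = \frac{\sum_i N_i\left(\sigma_i^2 + (\bar{x}_i - \bar{x})^2\right)}{\sum_i N_i}
```

Dropping the (\bar{x}_i - \bar{x})^2 term leaves only the average within-file variance, which underestimates the overall spread whenever the file means differ.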
Here is my code:
import os
import math  # only needed for the commented-out 1/e^2 variant
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from shapely.geometry import LineString

def process_file(file_name):
    df = pd.read_csv(file_name, skiprows=0)  # CSV file for one intensity plot
    Distance = df.iloc[:, 0]  # first column (distance), selected by position
    intensity = df.iloc[:, 1]  # second column (intensity), selected by position
    intensity_norm = (intensity - intensity.min()) / (intensity.max() - intensity.min())  # normalize the intensity to [0, 1]
    x = Distance
    y = intensity_norm
    # y2 = np.full_like(intensity_norm, 1 / math.exp(2))  # same size as the intensity, filled with the scalar 1/e^2
    y2 = np.full_like(intensity_norm, 0.5)  # same size as the intensity, filled with the scalar 0.5 (full width at half maximum)
    plt.plot(x, intensity_norm)  # intensity plot
    # plt.axhline(y=1 / math.exp(2), color='r', linestyle='-')  # horizontal line at 1/e^2
    plt.axhline(y=0.5, color='r', linestyle='-')  # horizontal line at 0.5 (full width at half maximum)
    first_line = LineString(np.column_stack((x, y2)))  # linestring for the horizontal line
    second_line = LineString(np.column_stack((x, y)))  # linestring for the normalized intensity plot
    intersection = first_line.intersection(second_line)  # intersection points of the two linestrings
    xValues = sorted(p.x for p in intersection.geoms)  # x values of the intersection points, in increasing order
    diff_list = []
    for i in range(1, len(xValues)):  # start at 1 because the first value has no predecessor
        diff_list.append(xValues[i] - xValues[i - 1])  # difference between each value and the previous one
    size = []
    for d in diff_list:
        if 35 < d < 70:  # keep only differences inside a window you choose
            size.append(d)
    N = len(size)  # number of peak widths measured in this file
    average = np.average(size)  # average of all the sizes
    stdev = np.std(size)  # population stdev of all the sizes
    return average, stdev, N

root_folder = "C:\\Users\\Kyle\\Desktop\\test_test_test"
# files = ['file1.csv', 'file2.csv']
files = [f for f in os.listdir(root_folder) if f.lower().endswith('.csv')]  # os.listdir, not os.path.listdir
results = []
for file in files:
    results.append(process_file(os.path.join(root_folder, file)))
results = [r for r in results if r[2] > 0]  # drop files where no peak passed the filter

total_N = sum(N for (_, _, N) in results)
total_average = sum(average * N for (average, _, N) in results) / total_N
# combined variance = within-file variance + spread of the file means around the overall mean
total_var = sum(N * (std_dev**2 + (average - total_average)**2) for (average, std_dev, N) in results)
total_std_dev = np.sqrt(total_var / total_N)
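I have tried combining the files into one, but that causes problems: the results are no longer accurate for what I'm looking for. I have also tried putting the files in one directory and running the code on it, but that didn't work either.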
Answer:
2 votes
Ahmed AEK
9/30/2023
#1
You need to learn to use functions: basically, wrap your code in a function that works on a single file.
import pandas as pd

def process_file(file_name):
    df = pd.read_csv(file_name, skiprows=0)
    ...
    return average, std_dev, N  # N = number of peak widths kept in this file
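If it helps to test the per-file logic without shapely or real CSV files, here is a minimal sketch that swaps the LineString intersection for plain NumPy linear interpolation of the half-maximum crossings. The names half_max_crossings and peak_widths are hypothetical, and the sketch assumes the distance column is sorted in increasing order; the 35 to 70 window is the one from the question.

```python
import numpy as np

def half_max_crossings(x, y, level=0.5):
    """x positions where the normalized curve crosses `level`.

    NumPy-only alternative to intersecting two shapely LineStrings.
    Assumes x is sorted in increasing order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Normalize to [0, 1] the same way the question does.
    y = (y - y.min()) / (y.max() - y.min())
    d = y - level
    # Segments [i, i+1] whose end points lie on opposite sides of `level`
    # (a sample landing exactly on `level` is not caught by this strict test).
    idx = np.nonzero(d[:-1] * d[1:] < 0)[0]
    # Linear interpolation of the crossing inside each such segment.
    t = d[idx] / (d[idx] - d[idx + 1])
    return x[idx] + t * (x[idx + 1] - x[idx])

def peak_widths(x, y, lo=35, hi=70, level=0.5):
    """Differences between consecutive crossings that fall inside (lo, hi)."""
    diffs = np.diff(half_max_crossings(x, y, level))
    return diffs[(diffs > lo) & (diffs < hi)]

# Synthetic triangular peak centred at 50 with its half-maximum points at 29.5 and 70.5.
x = np.arange(0, 101, dtype=float)
y = np.maximum(0, 1 - np.abs(x - 50) / 41)
print(half_max_crossings(x, y))  # approximately [29.5 70.5]
print(peak_widths(x, y))         # approximately [41.]
```

Like the question's code, this takes differences between every pair of consecutive crossings, so both peak widths and the gaps between peaks show up in the differences; the (lo, hi) window is what keeps only the widths.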
Then you need to run it on multiple files:
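import os

root_folder = r'C:\my_folder'  # raw string, so the backslash is not treated as an escape
# files = ['file1.csv', 'file2.csv']
files = [f for f in os.listdir(root_folder) if f.lower().endswith('.csv')]  # only the CSV files in the folder
results = []
for file in files:
    results.append(process_file(os.path.join(root_folder, file)))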
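A slightly more defensive sketch of the same loop, assuming you only want files whose names end in .csv (so stray files such as desktop.ini are skipped) and want them in a reproducible order. collect_results is a hypothetical helper that takes your process_file as an argument; the demo builds a throwaway folder with made-up file names and a stand-in process function that just counts rows.

```python
import os
import tempfile

import pandas as pd

def collect_results(root_folder, process_file):
    """Apply process_file to every .csv file in root_folder, in sorted order."""
    results = []
    for name in sorted(os.listdir(root_folder)):
        if name.lower().endswith(".csv"):  # case-insensitive extension check
            results.append(process_file(os.path.join(root_folder, name)))
    return results

# Usage sketch with a temporary folder and hypothetical file names.
with tempfile.TemporaryDirectory() as folder:
    for name, rows in [("a.csv", 3), ("b.CSV", 5), ("notes.txt", 1)]:
        df = pd.DataFrame({"Distance": range(rows), "Intensity": range(rows)})
        df.to_csv(os.path.join(folder, name), index=False)
    # Stand-in for the real process_file: returns the number of data rows.
    demo = collect_results(folder, lambda path: len(pd.read_csv(path)))

print(demo)  # [3, 5]
```

Passing process_file in as a parameter keeps the folder handling separate from the peak analysis, so either part can be tested on its own.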
Finally, you need to combine the results:
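import numpy as np

results = [r for r in results if r[2] > 0]  # skip files where no peak passed the filter
total_average = 0
total_N = 0
for (average, std_dev, N) in results:
    total_average += average * N
    total_N += N
total_average = total_average / total_N
total_var = 0
for (average, std_dev, N) in results:
    # within-file variance plus the spread of this file's mean around the overall mean
    total_var += N * (std_dev**2 + (average - total_average)**2)
total_std_dev = np.sqrt(total_var / total_N)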
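To check a combination step, it helps to compare it against the statistics of all values pooled together. The sketch below (combine_stats is a hypothetical name) uses the exact pooled formula, within-file variance plus the spread of the file means, and assumes each stdev is a population stdev as returned by np.std with its default ddof=0.

```python
import numpy as np

def combine_stats(results):
    """Merge per-file (mean, population std, N) triples into overall values.

    Entries with N == 0 are ignored. If every entry has N == 0 the
    result is nan (0/0), so check for that before relying on it.
    """
    rows = [(m, s, n) for (m, s, n) in results if n > 0]
    means = np.array([r[0] for r in rows], dtype=float)
    stds = np.array([r[1] for r in rows], dtype=float)
    counts = np.array([r[2] for r in rows], dtype=float)
    total_n = counts.sum()
    total_mean = np.sum(counts * means) / total_n
    # Within-file variance plus between-file spread of the means.
    total_var = np.sum(counts * (stds**2 + (means - total_mean) ** 2)) / total_n
    return total_mean, np.sqrt(total_var), int(total_n)

# Two made-up files of peak widths, plus one file where nothing passed the filter.
a = np.array([40.0, 42.0, 45.0])
b = np.array([50.0, 55.0])
m, s, n = combine_stats([(a.mean(), a.std(), len(a)),
                         (b.mean(), b.std(), len(b)),
                         (0.0, 0.0, 0)])
pooled = np.concatenate([a, b])
print(n, np.isclose(m, pooled.mean()), np.isclose(s, pooled.std()))  # 5 True True
```

Because np.std in the question uses ddof=0, the combined value matches np.std of the concatenated widths; a version built on sample stdevs (ddof=1) would need different weights.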
Comments:
0 votes
kyle stegman
9/30/2023
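If I do this, how do I set the path where it looks for the files? Also, is there a way to set it up so that I don't need to type in the file names and it just runs on every file in a specific folder?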
0 votes
Ahmed AEK
9/30/2023
@kylestegman I've modified the answer to process every file in the folder.
0 votes
kyle stegman
9/30/2023
I edited my question to show how I implemented the steps you laid out, but I keep getting AttributeError: module 'ntpath' has no attribute 'listdir' from the listdir call.
0 votes
Ahmed AEK
9/30/2023
@kylestegman Use os.listdir, not os.path.listdir.
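The AttributeError comes from the fact that os.path (the ntpath module on Windows, posixpath elsewhere) only manipulates path strings; listing a directory is done by os itself. A quick check:

```python
import os

# Directory listing lives in os; os.path has no listdir function.
print(hasattr(os, "listdir"))       # True
print(hasattr(os.path, "listdir"))  # False
print(os.path.__name__)             # ntpath on Windows, posixpath elsewhere
```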
0 votes
kyle stegman
9/30/2023
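Hi, sorry, I'm new to coding and I really appreciate your help. I fixed that, and now I'm getting the error "N is not defined". How would I set it up without knowing the number of files?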
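In this setup N is not the number of files: it is the number of peak widths kept in one file, i.e. len(size) inside process_file, so the file count never has to be known in advance. A minimal sketch of that last step follows; summarize_widths is a hypothetical helper, and returning 0.0 placeholders for a file with no widths is an assumption chosen so that such a file carries zero weight in any N-weighted combination.

```python
import numpy as np

def summarize_widths(size):
    """Return (average, stdev, N) for one file's list of peak widths."""
    N = len(size)  # number of widths that passed the filter in this file
    if N == 0:
        # Placeholders: multiplied by N == 0 they add nothing to weighted sums,
        # whereas np.average / np.std of an empty list would produce nan.
        return 0.0, 0.0, 0
    return float(np.average(size)), float(np.std(size)), N

print(summarize_widths([40.0, 50.0]))  # (45.0, 5.0, 2)
print(summarize_widths([]))            # (0.0, 0.0, 0)
```

With a helper like this, process_file can simply end with return summarize_widths(size).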