Asked by: Lavy Asked: 11/14/2023 Last edited by: Trenton McKinney, Lavy Updated: 11/15/2023 Views: 40
How do I get data from several files in one histogram? [duplicate]
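Q:
For data analysis, I currently plot the data from each file in a folder as a separate histogram. To get a better picture of the spread of the measured values, I would now like to show all the data in the folder in a single histogram. Unfortunately, this is beyond my technical abilities. The catch is that I did not write this code myself; a colleague sent it to me. I have already modified it a little to fit my problem, but my Python knowledge is not enough to go further.
Here is the code so far: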
# import libraries
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import itertools
import pandas as pd
import scipy.interpolate as interp
import ptu_reader_JHU as ptu
import csv
from scipy.optimize import curve_fit
from glob import glob
# read all .ptu files in folder
files = glob("P:/schlaefke/WRS_Test/New_Data/Multirange_Grid_smallest_Area4/*.ptu")
for file in files:
    # create array with the timestamps of all markers
    header, markertime = ptu.ptu_markertime(file)
    # create array with the distances of the markers
    markertimeshift=np.empty_like(markertime)
    markertimeshift[:1]=0
    markertimeshift[1:]=markertime[:-1]
    markerdistance=np.subtract(markertime, markertimeshift)
    # convert markerdistance into seconds
    markerdistance = markerdistance*header['MeasDesc_GlobalResolution']
    # remove the first marker (sometimes it's very fast)
    markerdist_removed=markerdistance[1:]
    # create arrays with long and short distances
    markerdist_group1=markerdist_removed[::2]
    markerdist_group2=markerdist_removed[1::2]
    # calculate variance of the line length
    if max(markerdist_group1)>max(markerdist_group2):
        average = np.average(markerdist_group1)
        variance = np.var(markerdist_group1)
        relvariance = variance/average
        outlier = max((average-min(markerdist_group1), max(markerdist_group1)-average))
        reloutlier = outlier/average
    else:
        average = np.average(markerdist_group2)
        variance = np.var(markerdist_group2)
        relvariance = variance/average
        outlier = max((average-min(markerdist_group2), max(markerdist_group2)-average))
        reloutlier = outlier/average
    # find average of long distance
    average_line=max((np.average(markerdist_group1), np.average(markerdist_group2)))
    # determine the number of bins. The bin width is 10% of the larger distance (10% of the maximum duration of one line)
    binwidth = average_line/10
    binnr = np.floor((max(markerdistance)-min(markerdistance))/binwidth)
    # create and save plots
    for i in files:
        plt.hist(markerdistance, bins=int(binnr))
        plt.xlabel('Time of markers (s)')
        plt.ylabel('Number of markers')
        plt.show()
    # export data in csv file
    with open(r"P:\schlaefke\WRS_Test\PTU_Data.csv", "a", newline = "") as f:
        writer = csv.writer(f)
        writer.writerow([variance, relvariance, outlier, reloutlier])
I also tried using seaborn, but that did not work either. I think I am trying to call the wrong variables:
import seaborn as sns
data_histogramm = pd.DataFrame(markerdistance, bins=int(binnr))
sns.histplot(data_histogramm, x="Time of markers (s)", y= "Number of markers")
I would appreciate any help or hints :)
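A side note on the question's code, offered as a sketch rather than a drop-in replacement: the shift-and-subtract step that builds `markerdistance` gives the same values as `np.diff` with a zero prepended, and the two near-identical `if`/`else` branches can be folded into one helper. The function names `marker_distances` and `line_stats`, the sample timestamps and the resolution value are made up for illustration; `resolution` stands in for `header['MeasDesc_GlobalResolution']`.

```python
import numpy as np


def marker_distances(markertime, resolution):
    """Return the time between consecutive markers, in seconds.

    np.diff(..., prepend=0) gives [t0, t1 - t0, t2 - t1, ...], the same
    values as the shift-and-subtract arrays in the question.
    """
    # Assumption: timestamps fit into signed 64-bit integers.
    markertime = np.asarray(markertime, dtype=np.int64)
    return np.diff(markertime, prepend=0) * resolution


def line_stats(markerdistance):
    """Statistics of the group (even or odd positions) with the larger maximum."""
    removed = markerdistance[1:]  # drop the first marker, as in the question
    group1, group2 = removed[::2], removed[1::2]
    group = group1 if group1.max() > group2.max() else group2
    average = np.average(group)
    variance = np.var(group)
    outlier = max(average - group.min(), group.max() - average)
    return variance, variance / average, outlier, outlier / average


# Made-up marker timestamps (in clock ticks) and a made-up resolution.
ticks = [5, 105, 115, 215, 225, 327]
distances = marker_distances(ticks, resolution=0.5)
stats = line_stats(distances)
```

Inside the question's loop, the four returned values would take the place of `variance`, `relvariance`, `outlier` and `reloutlier`.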
A:
0 votes
Tusher
11/14/2023
#1
There seem to be a few issues in your code.
Here is the corrected code:
# import libraries
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import pandas as pd
import ptu_reader_JHU as ptu
import csv
from glob import glob
import seaborn as sns
# read all .ptu files in folder
files = glob("P:/schlaefke/WRS_Test/New_Data/Multirange_Grid_smallest_Area4/*.ptu")
# create empty list to store all marker distances
all_marker_distances = []
for file in files:
    # ... (your existing code to process each file)
    # append marker distances to the list
    all_marker_distances.extend(markerdistance)
# determine the number of bins for the combined data
binwidth = average_line / 10
binnr = int(np.floor((max(all_marker_distances) - min(all_marker_distances)) / binwidth))
# create and save the plot
plt.hist(all_marker_distances, bins=binnr)
plt.xlabel('Time of markers (s)')
plt.ylabel('Number of markers')
plt.show()
# export data to a CSV file
with open("P:/schlaefke/WRS_Test/PTU_Data.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([variance, relvariance, outlier, reloutlier])
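One caveat with the snippet above: after the loop, `average_line` holds the value from the last file only, so the bin width depends on the order in which the files are read. Below is a minimal sketch of one alternative, assuming the bin width is taken from the mean of the per-file `average_line` values. That averaging rule is an assumption, not something stated in the question. The arrays are synthetic stand-ins for `.ptu` data, and `pooled_bins` and `plot_pooled` are hypothetical helper names.

```python
import numpy as np


def pooled_bins(per_file_distances, per_file_average_line):
    """Pool distances from all files and derive one common bin count.

    The bin width is 10% of the mean long-line duration across files
    (an assumed rule; the question uses 10% of a single file's value).
    """
    pooled = np.concatenate(per_file_distances)
    binwidth = np.mean(per_file_average_line) / 10
    binnr = int(np.floor((pooled.max() - pooled.min()) / binwidth))
    return pooled, max(binnr, 1)  # at least one bin, even for tiny ranges


def plot_pooled(pooled, binnr):
    """Draw one histogram for all files (needs matplotlib)."""
    import matplotlib.pyplot as plt

    plt.hist(pooled, bins=binnr)
    plt.xlabel('Time of markers (s)')
    plt.ylabel('Number of markers')
    plt.show()


# Synthetic stand-ins for the marker distances of two files.
file_a = np.array([0.1, 1.0, 0.1, 1.1])
file_b = np.array([0.2, 0.9, 0.1, 1.0])
pooled, binnr = pooled_bins([file_a, file_b], per_file_average_line=[1.05, 0.95])

# np.histogram returns the counts that plt.hist would draw.
counts, edges = np.histogram(pooled, bins=binnr)
```

Calling `plot_pooled(pooled, binnr)` would then show the single combined histogram; in the real script, each file's `markerdistance` and `average_line` would be appended to two lists inside the loop.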
Now create a DataFrame for Seaborn:
data_histogram = pd.DataFrame({'Time of markers (s)': all_marker_distances})
and plot the histogram with Seaborn:
sns.histplot(data_histogram, x="Time of markers (s)", bins=binnr)
plt.show()
Be sure to adjust the Seaborn part to your needs, since Seaborn has further parameters you may want to use for customization.
Comments
0 votes
Lavy
11/14/2023
Thank you very much so far! So is it correct that determining the number of bins and the rest of the code are no longer inside the for loop?
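On the question in this comment: in the answer's layout, only the per-file processing stays inside the `for` loop, while the bin count and the plot come after it. One side effect is that `writer.writerow(...)` then runs once, with the statistics of the last file. Below is a minimal sketch of the loop structure, assuming one CSV row per file is still wanted. `per_file_stats`, the file names and the arrays are hypothetical stand-ins, the statistics are computed over all distances of a file for brevity, and the rows go to an in-memory buffer instead of `PTU_Data.csv` so the example runs anywhere.

```python
import csv
import io

import numpy as np


def per_file_stats(distances):
    """Variance, relative variance, outlier and relative outlier of one file."""
    average = np.average(distances)
    variance = np.var(distances)
    outlier = max(average - distances.min(), distances.max() - average)
    values = [variance, variance / average, outlier, outlier / average]
    return [float(v) for v in values]


# Hypothetical stand-ins for the long-line distances of each .ptu file.
fake_files = {
    "a.ptu": np.array([1.0, 1.0, 1.2]),
    "b.ptu": np.array([0.9, 1.1, 1.0]),
}

all_distances = []  # pooled data for the single histogram
rows = []           # one row of statistics per file

for name, distances in fake_files.items():
    # Inside the loop: everything that belongs to one file.
    all_distances.extend(distances)
    rows.append([name] + per_file_stats(distances))

# After the loop: everything that needs the data of all files.
buffer = io.StringIO()  # swap for open(path, "a", newline="") to write a real file
writer = csv.writer(buffer)
writer.writerows(rows)
```

The histogram call would go in the same place as the CSV export, after the loop, using `all_distances`.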