How do I get data from several files in one histogram? [duplicate]

Asked by: Lavy Asked: 11/14/2023 Last edited by: Trenton McKinney, Lavy Updated: 11/15/2023 Views: 40

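Q:

For data analysis, I currently display the data in a folder as individual histograms. To get a better overview of the dispersion of the measured values, I would now like to display all the data in the folder in one histogram. Unfortunately, this is beyond my technical ability. The code is not mine; a colleague sent it to me. I have already modified it somewhat to fit my problem, but my Python knowledge is not sufficient to go any further.

Here is the code so far: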

# import libraries
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import itertools
import pandas as pd
import scipy.interpolate as interp
import ptu_reader_JHU as ptu
import csv
from scipy.optimize import curve_fit
from glob import glob

# read all .ptu files in folder
files = glob("P:/schlaefke/WRS_Test/New_Data/Multirange_Grid_smallest_Area4/*.ptu")
for file in files:
    
    # create Array with the timestamps of all markers
    header, markertime = ptu.ptu_markertime(file)

    # create array with the distances of the markers
    markertimeshift=np.empty_like(markertime)
    markertimeshift[:1]=0
    markertimeshift[1:]=markertime[:-1]
    markerdistance=np.subtract(markertime, markertimeshift)

    # convert markerdistance into seconds
    markerdistance = markerdistance*header['MeasDesc_GlobalResolution']

    # remove the first marker (sometimes its very fast)
    markerdist_removed=markerdistance[1:]

    # create arrays with long and short distances
    markerdist_group1=markerdist_removed[::2]
    markerdist_group2=markerdist_removed[1::2]

    # calculate variance of the line length
    if max(markerdist_group1)>max(markerdist_group2):
            average = np.average(markerdist_group1)
            variance = np.var(markerdist_group1)
            relvariance = variance/average
            outlier = max((average-min(markerdist_group1), max(markerdist_group1)-average))
            reloutlier = outlier/average
        
    else:
            average = np.average(markerdist_group2)
            variance = np.var(markerdist_group2)
            relvariance = variance/average
            outlier = max((average-min(markerdist_group2), max(markerdist_group2)-average))
            reloutlier = outlier/average

    # find average of long distance
    average_line=max((np.average(markerdist_group1), np.average(markerdist_group2)))

    # determine the number of bins. its 10% of the larger distance (10% of the maximum duration of one line)
    binwidth = average_line/10

    binnr = np.floor((max(markerdistance)-min(markerdistance))/binwidth)

    # create and save plots

    for i in files:
        plt.hist(markerdistance, bins=int(binnr))
        plt.xlabel('Time of markers (s)')
        plt.ylabel('Number of markers')
    plt.show()

     # export data in csv file
    with open("P:\schlaefke\WRS_Test\PTU_Data.csv", "a", newline = "") as f:
        writer = csv.writer(f)
        writer.writerow([variance, relvariance, outlier, reloutlier])

I also tried using seaborn, but that did not work either. I think I am trying to call the wrong variables:

import seaborn as sns

data_histogramm = pd.DataFrame(markerdistance, bins=int(binnr))
sns.histplot(data_histogramm, x="Time of markers (s)", y= "Number of markers")

I would appreciate any help or hints :)

python matplotlib plot histogram data-analysis


A:

0 votes Tusher 11/14/2023 #1

There appear to be a few problems in your code.

Here is the corrected code:

# import libraries
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import pandas as pd
import ptu_reader_JHU as ptu
import csv
from glob import glob
import seaborn as sns

# read all .ptu files in folder
files = glob("P:/schlaefke/WRS_Test/New_Data/Multirange_Grid_smallest_Area4/*.ptu")

# create empty list to store all marker distances
all_marker_distances = []

for file in files:
    # ... (your existing code to process each file)

    # append marker distances to the list
    all_marker_distances.extend(markerdistance)

# determine the number of bins for the combined data
# note: average_line still holds the value from the last file processed in the loop
binwidth = average_line / 10
binnr = int(np.floor((max(all_marker_distances) - min(all_marker_distances)) / binwidth))

# create and save the plot
plt.hist(all_marker_distances, bins=binnr)
plt.xlabel('Time of markers (s)')
plt.ylabel('Number of markers')
plt.show()

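# export data to a CSV file
# note: placed after the loop, this writes only the last file's statistics;
# keep this block inside the loop to get one row per file
with open("P:/schlaefke/WRS_Test/PTU_Data.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([variance, relvariance, outlier, reloutlier])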
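
One way to organize the loop is sketched below: the per-file statistics are computed and written inside the loop, while the bins for the pooled histogram are computed once after it. This is only a sketch under assumptions. `fake_markertimes`, `marker_distances`, and `process_files` are hypothetical helpers, the synthetic timestamps stand in for `ptu.ptu_markertime(file)`, and the bin width is taken as 10% of the mean of the per-file long-line averages, which is one possible choice and not necessarily what the original author intended.

```python
import csv
import os
import tempfile

import numpy as np


def fake_markertimes(seed, n=201):
    """Hypothetical stand-in for ptu.ptu_markertime: alternating long/short gaps."""
    rng = np.random.default_rng(seed)
    gaps = np.where(np.arange(n - 1) % 2 == 0, 1.0, 0.2)  # long, short, long, ...
    gaps = gaps + rng.normal(0.0, 0.01, size=n - 1)       # small jitter
    return np.concatenate(([0.0], np.cumsum(gaps)))


def marker_distances(markertime):
    """Time between consecutive markers; np.diff already omits the first marker's offset."""
    return np.diff(markertime)


def process_files(seeds, csv_path):
    all_distances = []   # pooled across files, for the single histogram
    long_averages = []   # one long-line average per file
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for seed in seeds:  # in real code: for file in files
            dist = marker_distances(fake_markertimes(seed))
            group1, group2 = dist[::2], dist[1::2]
            longer = group1 if group1.max() > group2.max() else group2
            average = longer.mean()
            variance = longer.var()
            outlier = max(average - longer.min(), longer.max() - average)
            # per-file statistics: written inside the loop, one row per file
            writer.writerow([variance, variance / average, outlier, outlier / average])
            all_distances.extend(dist)
            long_averages.append(average)
    # pooled binning: computed once, after the loop
    all_distances = np.asarray(all_distances)
    binwidth = np.mean(long_averages) / 10
    binnr = max(1, int(np.floor((all_distances.max() - all_distances.min()) / binwidth)))
    return all_distances, binnr


# usage (plotting left commented out so the sketch runs without a display):
# import matplotlib.pyplot as plt
# distances, binnr = process_files([0, 1, 2], "stats.csv")
# plt.hist(distances, bins=binnr); plt.show()
```

With this layout the CSV gets one row per file, and the histogram is drawn once from the pooled distances.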

Now create a DataFrame for Seaborn:

data_histogram = pd.DataFrame({'Time of markers (s)': all_marker_distances})

And plot the histogram with Seaborn:

sns.histplot(data_histogram, x="Time of markers (s)", bins=binnr)
plt.show()

Make sure to adjust the Seaborn part to your needs, as Seaborn may have additional parameters you want to use for customization.
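
If the contribution of each file should stay visible in the combined plot, one option is a long-format DataFrame with one row per marker distance plus a column naming the source file. The sketch below uses synthetic data; `build_long_frame`, `plot_stacked`, the `file` column, and the file names are hypothetical. It relies on the `hue` and `multiple="stack"` parameters of `sns.histplot`.

```python
import numpy as np
import pandas as pd


def build_long_frame(distances_by_file):
    """One row per marker distance, labeled with the file it came from."""
    frames = [
        pd.DataFrame({"Time of markers (s)": np.asarray(dist), "file": name})
        for name, dist in distances_by_file.items()
    ]
    return pd.concat(frames, ignore_index=True)


def plot_stacked(frame, bins):
    """Stacked histogram, one color per file (plotting libraries imported lazily)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.histplot(frame, x="Time of markers (s)", hue="file", bins=bins, multiple="stack")
    plt.ylabel("Number of markers")
    plt.show()


# synthetic stand-ins for the per-file markerdistance arrays
rng = np.random.default_rng(0)
distances_by_file = {
    "a.ptu": rng.normal(1.0, 0.01, size=100),
    "b.ptu": rng.normal(1.0, 0.02, size=150),
}
frame = build_long_frame(distances_by_file)
# plot_stacked(frame, bins=20)  # uncomment to draw the figure
```

The column name passed as `x` has to exist in the DataFrame, which is why the frame is built from a dict keyed by that name.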

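Comments

0 votes Lavy 11/14/2023
Thank you very much so far! So is it correct that determining the number of bins and the rest of the code are no longer inside the for loop?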
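
On the question in this comment: the bin count depends on the minimum and maximum over all files, so it can only be computed once the loop has collected everything. A minimal sketch with synthetic arrays (the variable names and the fixed `binwidth` are assumptions, not values from the thread) illustrates this, and also shows that with shared bin edges the per-file counts add up to the pooled counts:

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic stand-ins for the markerdistance array of each file
per_file = [rng.normal(1.0, 0.02, size=200), rng.normal(1.0, 0.05, size=300)]

pooled = np.concatenate(per_file)   # only complete after the loop
binwidth = 0.01                     # assumed fixed width, in seconds
binnr = int(np.floor((pooled.max() - pooled.min()) / binwidth))
edges = np.linspace(pooled.min(), pooled.max(), binnr + 1)  # shared bin edges

# histogram of the pooled data and of each file on the same edges
pooled_counts, _ = np.histogram(pooled, bins=edges)
file_counts = [np.histogram(d, bins=edges)[0] for d in per_file]
```

Anything that describes a single file (variance, outlier, the CSV row) still belongs inside the loop; only the pooled binning and the final plot move after it.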