Trouble generating randomly distributed points within bin bounds from 2D histogram

Asked by: Victoria · Asked: 7/19/2023 · Last edited by: Reinderien, Victoria · Modified: 7/19/2023 · Viewed: 22 times

Question:

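My goal is to generate a scatterplot from a 2D histogram: if a bin has a count of n, then n points are generated at random within that bin's bounds. Something like this: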

Image of 2D histogram and corresponding scatterplot
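The resampling described above can also be done without a per-point helper call. A minimal NumPy sketch, assuming H, xedges and yedges come straight from np.histogram2d (so H[i, j] is the count for x-bin i and y-bin j); the function name sample_points_from_hist2d and the sample data are hypothetical:

```python
import numpy as np


def sample_points_from_hist2d(H, xedges, yedges, rng=None):
    """Draw H[i, j] uniformly distributed points inside every bin (i, j).

    Assumes H, xedges, yedges come from np.histogram2d, so bin (i, j) spans
    [xedges[i], xedges[i + 1]) x [yedges[j], yedges[j + 1]).
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.clip(np.asarray(H).astype(int), 0, None)  # whole, non-negative point counts
    ix, iy = np.nonzero(counts)            # indices of the non-empty bins
    n = counts[ix, iy]                     # how many points each of those bins gets
    ix = np.repeat(ix, n)                  # one x-bin index per output point
    iy = np.repeat(iy, n)                  # one y-bin index per output point
    # uniform draw inside each point's own bin interval (low/high are per-point arrays)
    x = rng.uniform(xedges[ix], xedges[ix + 1])
    y = rng.uniform(yedges[iy], yedges[iy + 1])
    return x, y


# Example with made-up data standing in for df[xAxisLabel], df[yAxisLabel]
rng = np.random.default_rng(0)
data_x = rng.normal(size=500)
data_y = data_x + rng.normal(scale=0.5, size=500)
H, xedges, yedges = np.histogram2d(data_x, data_y, bins=(10, 10))
new_x, new_y = sample_points_from_hist2d(H, xedges, yedges, rng)
```

Reading the boundaries from xedges and yedges, rather than recomputing them from the overall min and max, keeps the sampler correct even if the bins are not all the same width.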

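However, I'm having trouble generating the points within the bin bounds so that they are spread out the way the histogram says they should be. For example, the following heatmap does not produce the scatterplot it should.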

2D histogram:

2D histogram

Scatterplot that does NOT reflect the 2D histogram:

scatterplot that does NOT reflect 2D histogram

I've added my code and the function call below. How can I fix my code so that it generates the points correctly?

```python
from random import uniform

import matplotlib.pyplot as plt
import numpy as np


def repopulateScatterHelper(x, y, m):
    """
    Generate a random point within the bounds of bin (x, y).
    Uses the module-level xedges and yedges returned by np.histogram2d.
    """
    # compute x and y axis min and max
    maxX = max(xedges)  # the max x value from edges
    maxY = max(yedges)

    minX = min(xedges)
    minY = min(yedges)

    # compute bin boundaries (valid because histogram2d made m equal-width bins)
    x1 = float(x) / m * (maxX - minX) + minX
    x2 = float(x + 1) / m * (maxX - minX) + minX

    y1 = float(y) / m * (maxY - minY) + minY
    y2 = float(y + 1) / m * (maxY - minY) + minY

    # generate random point within bin boundaries
    the_x = uniform(x1, x2)
    the_y = uniform(y1, y2)
    return the_x, the_y


def repopulateScatter(H, m):
    """
    @params
      H - 2D array of counts (H[i][j] = count for x-bin i, y-bin j)
      m - number of bins along each axis
    @returns
      new_x, new_y - generated x and y coordinates of the points
    """
    new_x = []
    new_y = []
    for i in range(m):  # x bins (first axis of H)
        for j in range(m):  # y bins (second axis of H)
            if H[i][j] > 0:  # if count is greater than zero, generate points
                for _ in range(int(H[i][j])):
                    x_i, y_i = repopulateScatterHelper(i, j, m)
                    new_x.append(x_i)
                    new_y.append(y_i)

    return new_x, new_y


def plotHistToScatter(new_x, new_y):
    """
    new_x, new_y - x, y coordinates to plot
    Uses the module-level xAxisLabel, yAxisLabel and epsilon for the labels.
    """
    new_x = np.array(new_x)
    new_y = np.array(new_y)

    # plot data points
    fig, ax = plt.subplots()
    ax.scatter(new_x, new_y)

    # add LOBF to plot - https://www.statology.org/line-of-best-fit-python/
    a, b = np.polyfit(new_x, new_y, 1)

    ax.plot(new_x, a * new_x + b, color="red")
    print("DP LOBF:", a, "*(x) +", b)

    # label the plot
    ax.set_xlabel(xAxisLabel)
    ax.set_ylabel(yAxisLabel)
    ax.set_title("heatmap to scatterplot for " + xAxisLabel + " vs " + yAxisLabel
                 + ", epsilon = " + str(epsilon))

    plt.show()
```
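repopulateScatterHelper rebuilds each bin's boundaries from the overall min and max, which matches reading xedges[x] and xedges[x + 1] only when all bins have equal width (the case for bins=(m, m)). A small check of that equivalence, using hypothetical helper names bounds_from_formula / bounds_from_edges and made-up sample data:

```python
import numpy as np


def bounds_from_formula(k, m, edges):
    """Bin k's (low, high) computed the way repopulateScatterHelper does it."""
    lo, hi = min(edges), max(edges)
    return float(k) / m * (hi - lo) + lo, float(k + 1) / m * (hi - lo) + lo


def bounds_from_edges(k, edges):
    """Bin k's (low, high) read directly from a NumPy edges array."""
    return edges[k], edges[k + 1]


# Made-up data; histogram2d with bins=(m, m) gives m equal-width bins per axis
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 2 * x + rng.normal(size=300)
m = 8
H, xedges, yedges = np.histogram2d(x, y, bins=(m, m))

# Equal-width bins: both ways agree up to floating-point rounding
same = all(
    np.allclose(bounds_from_formula(k, m, e), bounds_from_edges(k, e))
    for e in (xedges, yedges)
    for k in range(m)
)

# Hand-picked unequal edges: only the edges-based bounds are still right
custom = np.array([-4.0, -1.0, 0.0, 0.5, 4.0])
print(bounds_from_formula(1, 4, custom), bounds_from_edges(1, custom))
```

For the histogram in the question the two agree, so the helper's arithmetic is not where the mismatch comes from; indexing the edges arrays directly is simply the more robust choice.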

My function call is:

```python
H, xedges, yedges = np.histogram2d(df[xAxisLabel], df[yAxisLabel], bins=(m, m))  # compute 2D histogram counts and bin edges
new_x, new_y = repopulateScatter(H, m)
plotHistToScatter(new_x, new_y)
```
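np.histogram2d puts x along the first axis of H and y along the second, so H[i, j] is the count for x-bin i and y-bin j, which is how repopulateScatter passes i and j to the helper. An image-style heatmap draws rows along the vertical axis, though, so plotting H directly shows it transposed relative to the scatterplot. A tiny NumPy-only illustration with three made-up points:

```python
import numpy as np

# Three made-up points, all with small x and large y
x = np.array([0.1, 0.2, 0.3])
y = np.array([9.7, 9.8, 9.9])
H, xedges, yedges = np.histogram2d(x, y, bins=(2, 2), range=[[0, 10], [0, 10]])

# First axis of H follows x, second follows y: all three land in x-bin 0, y-bin 1
print(H)    # [[0. 3.]
            #  [0. 0.]]

# An image-style heatmap puts rows on the vertical (y) axis, so it needs H.T
# (and origin="lower" in matplotlib's imshow so that y increases upward)
print(H.T)  # [[0. 0.]
            #  [3. 0.]]
```

If the heatmap is drawn from H without the transpose, it looks mirrored across the diagonal compared with the scatterplot even when the generated points are correct.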

I tried changing the repopulateScatterHelper() function to fix this, but without success.

Tags: python, scatter-plot, data-generation

Comments

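0 votes · Reinderien · 7/19/2023
It's not clear to me how a scatterplot benefits you here. A heatmap or contour plot would be more informative.
1 vote · Reinderien · 7/19/2023
That aside, the more obvious question is: why lie about the points? Why not plot the original data that the histogram was built from?
0 votes · spo · 7/19/2023
I agree with @Reinderien's comment. But I tried your code anyway with the values below, and it gave me the expected scatterplot. Are you sure you are passing the variables correctly? m = 70; randx = np.random.randint(0, 20, m); randy = np.random.randint(0, 20, m); xAxisLabel = 'xAxis'; yAxisLabel = 'yAxis'; epsilon = '0.01'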
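1 vote · Victoria · 7/19/2023
I'm implementing a differentially private mechanism for scatterplots. To get the utility I need, I have to produce something that looks like a scatterplot. It's less about lying and more about protecting each person's individual contribution to the chart.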
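The comment doesn't say which mechanism is used. As one possible sketch, assuming the Laplace mechanism on bin counts with add/remove-one-record neighbours (L1 sensitivity 1, noise scale 1/epsilon), the noisy counts can be rounded and clipped to non-negative integers before being handed to repopulateScatter. The name laplace_noisy_counts and the example data are hypothetical:

```python
import numpy as np


def laplace_noisy_counts(H, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism on histogram counts, post-processed into point counts.

    Assumes add/remove-one-record neighbours, where a count histogram has
    L1 sensitivity 1, so each bin gets Laplace(0, sensitivity / epsilon) noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    H = np.asarray(H, dtype=float)
    noisy = H + rng.laplace(0.0, sensitivity / epsilon, size=H.shape)
    # post-processing: round to whole points and drop negative counts
    return np.clip(np.rint(noisy), 0, None).astype(int)


# Example: fixed, public bin bounds instead of the data's own min/max
rng = np.random.default_rng(3)
x = rng.uniform(0, 20, 70)
y = rng.uniform(0, 20, 70)
H, xedges, yedges = np.histogram2d(x, y, bins=(10, 10), range=[[0, 20], [0, 20]])
H_dp = laplace_noisy_counts(H, epsilon=1.0, rng=rng)
```

Rounding and clipping are post-processing, so they do not weaken the privacy guarantee. Note also that np.histogram2d's default bin edges come from the data's own min and max, which the noise does not protect; passing a fixed range from public bounds, as above, avoids that leak.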
1 vote · Victoria · 7/19/2023
@spo, I re-checked my variables and found I had passed in the wrong counts. Thanks for pointing that out!
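Since the problem turned out to be the input counts, a quick round-trip check can catch that kind of mismatch: re-bin the generated points with the same edges and compare the result against H. A minimal sketch, assuming whole-number counts as produced by np.histogram2d; counts_match is a hypothetical helper:

```python
import numpy as np


def counts_match(H, xedges, yedges, new_x, new_y):
    """True if re-binning the generated points with the same edges gives back H.

    Assumes H holds whole-number counts (as np.histogram2d returns).
    """
    H_back, _, _ = np.histogram2d(new_x, new_y, bins=[xedges, yedges])
    return len(new_x) == int(np.sum(H)) and np.array_equal(H_back, np.asarray(H))
```

Calling counts_match(H, xedges, yedges, new_x, new_y) right after new_x, new_y = repopulateScatter(H, m) should return True; a False result means either the counts passed in are not the ones that were plotted, or some points were drawn outside their bins.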

Answers: none yet.