Processing a shared array by a passed index in a subroutine in a parallel loop

Asked by: DJNZ  Asked: 11/10/2023  Last edited by: Ian Bush, DJNZ  Updated: 11/22/2023  Views: 79

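Q:

In a parallel loop I process a shared array with a subroutine, passing the array and the current private DO index to it as arguments, but the program crashes with an array out-of-bounds error. How do I call the subroutine correctly so that it processes the shared array and receives the parallel loop index?

Code in main.F: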

      PROGRAM TESTER
         USE OMP_LIB
         USE PRINTER
       
         INTEGER, PARAMETER:: N = 5
                  
         REAL*4,DIMENSION(:),ALLOCATABLE, SAVE :: ARG_1, ARG_2
         REAL*4,DIMENSION(:),ALLOCATABLE:: RES
          
C=======================================================================
C$OMP THREADPRIVATE(ARG_1, ARG_2)
C=======================================================================         

         ALLOCATE(RES(N))
         PRINT *,'MAIN: "RES" IS ALLOCATED = ', 
     >      ALLOCATED(RES)

C$OMP PARALLEL PRIVATE(I) SHARED(RES) NUM_THREADS(2)  

         ALLOCATE(ARG_1(N))
         PRINT *,'MAIN: "ARG_1" IS ALLOCATED = ', 
     >      ALLOCATED(ARG_1)
         
         ALLOCATE(ARG_2(N))
         PRINT *,'MAIN: "ARG_2" IS ALLOCATED = ', 
     >      ALLOCATED(ARG_2)

C Step 1:Initialize working arrays:        
         CALL WORK1(ARG_1,N, ARG_2,N) 
         CALL WORK2(ARG_1,N, ARG_2,N)
        
C Step 2: Print working arrays: 
         CALL PRINT_ARR(ARG_1,N)
         CALL PRINT_ARR(ARG_2,N)

         PRINT *,'===================================='
         
C Step 3: Parallel Loop:
c-----------------------------------------------------------------------
C$OMP DO 
         DO I=1,N
            CALL WORK3(RES,I,ARG_1(I),ARG_2(I))
         ENDDO
C$OMP END DO
         CALL PRINT_ARR(RES,N)
c-----------------------------------------------------------------------
C$OMP END PARALLEL
         DEALLOCATE(ARG_1,ARG_2)
         DEALLOCATE(RES)
    
      END PROGRAM TESTER   

Code in work.F:

      SUBROUTINE WORK1(ARG_ARR_1,DIM_1,ARG_ARR_2,DIM_2)
         INTEGER DIM_1, DIM_2,I,J
         REAL*4 ARG_ARR_2(DIM_2)
         REAL*4 ARG_ARR_1(DIM_1)
         REAL*4 ARG1, ARG2
         REAL*4,DIMENSION(:),ALLOCATABLE:: ARG_ARR_3
         
         SAVE
c-----------------------------------------------------------------------
C$OMP THREADPRIVATE (I)         
c-----------------------------------------------------------------------
         DO I=1,DIM_1
            ARG_ARR_1(I)= 1.0
         ENDDO
         RETURN
      ENTRY WORK2 (ARG_ARR_1,DIM_1,ARG_ARR_2,DIM_2)  
         DO I=1,DIM_2
            ARG_ARR_2(I)= 2.0
         ENDDO
         RETURN
      ENTRY WORK3 (ARG_ARR_3,J,ARG1,ARG2)
         ARG_ARR_3(J)= ARG1+ARG2
         RETURN
      END SUBROUTINE WORK1

And the code in module.f:

      MODULE PRINTER
     
         CONTAINS 

            SUBROUTINE PRINT_ARR(ARR_VAR,SIZE)
               REAL*4,DIMENSION(:),ALLOCATABLE:: ARR_VAR
               INTEGER SIZE
               INTEGER,SAVE:: J
c-----------------------------------------------------------------------
C$OMP THREADPRIVATE(J)               
c-----------------------------------------------------------------------
               DO J=1,SIZE
                  PRINT *,'ARR_VAR(',J,')=',ARR_VAR(J)
               ENDDO    
               FLUSH(6)            
            END SUBROUTINE PRINT_ARR

      END MODULE PRINTER

My compile and run commands:

gfortran -fopenmp -O0 -g -fcheck=all -fbacktrace -c module1.f work.F main.F
gfortran -fopenmp *.o -o a.x
./a.x

My output:

 MAIN: "RES" IS ALLOCATED =  T
 MAIN: "ARG_1" IS ALLOCATED =  T
 MAIN: "ARG_2" IS ALLOCATED =  T
 ARR_VAR(           1 )=   1.00000000    
 ARR_VAR(           2 )=   1.00000000    
 ARR_VAR(           3 )=   1.00000000    
 ARR_VAR(           4 )=   1.00000000    
 ARR_VAR(           5 )=   1.00000000    
 ARR_VAR(           1 )=   2.00000000    
 ARR_VAR(           2 )=   2.00000000    
 ARR_VAR(           3 )=   2.00000000    
 ARR_VAR(           4 )=   2.00000000    
 ARR_VAR(           5 )=   2.00000000    
 ====================================
 MAIN: "ARG_1" IS ALLOCATED =  T

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
 MAIN: "ARG_2" IS ALLOCATED =  T
 ARR_VAR(           1 )=   1.00000000    
 ARR_VAR(           2 )=   1.00000000    
 ARR_VAR(           3 )=   1.00000000    
 ARR_VAR(           4 )=   1.00000000    
 ARR_VAR(           5 )=   1.00000000    
 ARR_VAR(           1 )=   2.00000000    
 ARR_VAR(           2 )=   2.00000000    
 ARR_VAR(           3 )=   2.00000000    
 ARR_VAR(           4 )=   2.00000000    
 ARR_VAR(           5 )=   2.00000000    
 ====================================
At line 21 of file work.F
Fortran runtime error: Index '4' of dimension 1 of array 'arg_arr_3' above upper bound of 2

Error termination. Backtrace:
#0  0x7f1e90ed3ad0 in ???
#1  0x7f1e90ed2c35 in ???
#2  0x7f1e90c8051f in ???
    at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x55d38d47e43d in master.0.work1
    at .../work.F:21
#4  0x55d38d47e04f in work3_
    at .../work.F:20
#5  0x55d38d47dae6 in MAIN__._omp_fn.0
    at .../main.F:44
#6  0x7f1e90e7aa15 in ???
#7  0x55d38d47d45b in tester
    at .../main.F:18
#8  0x55d38d47d58d in main
    at .../main.F:2
Segmentation fault (core dumped)

I am using gfortran: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

Fortran OpenMP

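Comments

2 upvotes PierU 11/10/2023
1) I don't know why you need the SAVE statement in WORK1(), but saved variables don't play well with multithreading. Without SAVE, you don't need to declare I as THREADPRIVATE; 2) Did you put the WORK1() routine in a module? A routine cannot have allocatable dummy arguments unless it is called through an explicit interface.
0 upvotes DJNZ 11/10/2023
Thank you very much for your comment! All of this code is a toy model of a large legacy code that I am modernizing, and I use the model to plan future changes. 1) I had to declare I as private, because practice showed that everything that indexes an array inside a called procedure must be marked threadprivate to avoid the run-time error "loop index has changed". 2) I did not know that dummy arguments can be allocatable only in modules, thank you. But the whole point of the question is that I don't know how to pass the RES array to the WORK3 subroutine. No, the WORK1 subroutine is not in a module.
1 upvote Ian Bush 11/10/2023
Do yourself a big favour and get rid of the alternate Entry points before you even think about allocatables, let alone parallelism. Even if this is defined by the standard (which I suspect it is, but hope it isn't), the mix of ancient and early-modern features is very likely to lead to compiler bug hell.
1 upvote Ian Bush 11/10/2023
For more details on passing allocatable objects, see stackoverflow.com/questions/13496510/...
2 upvotes PierU 11/10/2023
"Practice showed that everything that indexes an array inside a called procedure must be marked threadprivate to avoid the run-time error 'loop index has changed'": this is wrong, the problem lies in the SAVE statement. Without SAVE, you don't need THREADPRIVATE.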

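A:

1 upvote PierU 11/10/2023 #1

I am not offering a complete solution here, but this is hard to spell out in comments.

  • First, you must remove the SAVE statement in WORK1(); it is a potential killer for multithreading.
  • Then there is no longer any need to make I threadprivate.
  • You don't need the allocatable attribute on the ARG_ARR_3 argument (it would not work anyway unless you put the routine in a module):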
      SUBROUTINE WORK1(ARG_ARR_1,DIM_1,ARG_ARR_2,DIM_2)
         INTEGER DIM_1, DIM_2,I,J
         REAL*4 ARG_ARR_2(DIM_2)
         REAL*4 ARG_ARR_1(DIM_1)
         REAL*4 ARG1, ARG2
         REAL*4 ARG_ARR_3(*)

         DO I=1,DIM_1
            ARG_ARR_1(I)= 1.0
         ENDDO
         RETURN
      ENTRY WORK2 (ARG_ARR_1,DIM_1,ARG_ARR_2,DIM_2)  
         DO I=1,DIM_2
            ARG_ARR_2(I)= 2.0
         ENDDO
         RETURN
      ENTRY WORK3 (ARG_ARR_3,J,ARG1,ARG2)
         ARG_ARR_3(J)= ARG1+ARG2
         RETURN
      END SUBROUTINE WORK1

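Also, in your main program, THREADPRIVATE(ARG_1, ARG_2) is overkill: threadprivate is meant for private variables that persist between parallel regions, and I don't see any need for that here. Keep it simple and declare this instead: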

C$OMP PARALLEL PRIVATE(I,ARG_1,ARG_2) SHARED(RES) NUM_THREADS(2)  

Finally, DEALLOCATE(ARG_1,ARG_2) should be placed before the end of the parallel region.
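
The PRIVATE clause and the DEALLOCATE placement can be sketched together as follows. This is a minimal sketch rather than the poster's code: it is written in free-form source, the program name private_sketch is made up, and the sum is computed inline instead of going through WORK3. It assumes a compiler that supports private allocatable arrays (OpenMP 3.0 or later).

```fortran
! Sketch only (free-form source, e.g. saved as private_sketch.f90).
! The program name and the inlined arithmetic are illustrative assumptions.
program private_sketch
   implicit none
   integer, parameter :: n = 5
   integer :: i
   real, allocatable :: arg_1(:), arg_2(:), res(:)

   ! RES is shared, so it is allocated once, outside the region.
   allocate(res(n))

!$omp parallel private(i, arg_1, arg_2) shared(res) num_threads(2)
   ! ARG_1 and ARG_2 are unallocated on entry, so each thread starts
   ! with its own unallocated copy and allocates it here.
   allocate(arg_1(n), arg_2(n))
   arg_1 = 1.0
   arg_2 = 2.0

!$omp do
   do i = 1, n
      ! Each iteration writes a distinct element of the shared array.
      res(i) = arg_1(i) + arg_2(i)
   end do
!$omp end do

   ! Free the per-thread copies while still inside the region.
   deallocate(arg_1, arg_2)
!$omp end parallel

   print *, res
   deallocate(res)
end program private_sketch
```

Built with gfortran -fopenmp, every element of RES should come out as 3.0, and no SAVE or THREADPRIVATE is involved.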

Try that... But this is definitely poor design (ENTRY is a relic of the past, and so is fixed-form source).
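
For comparison, here is a hedged sketch of how the same work might look without ENTRY, with the routines placed in a module so that callers get an explicit interface. The module name workers and the routine names fill and add_at are hypothetical stand-ins for WORK1/WORK2/WORK3, and the input arrays are simply shared and read-only here, which is a simplification of the original setup.

```fortran
! Sketch only (free-form source). WORKERS, FILL and ADD_AT are
! hypothetical names, not taken from the original code.
module workers
   implicit none
contains
   subroutine fill(arr, val)
      ! Assumed-shape dummy: legal because the module supplies an
      ! explicit interface to every caller that USEs it.
      real, intent(out) :: arr(:)
      real, intent(in)  :: val
      arr = val
   end subroutine fill

   subroutine add_at(res, j, a, b)
      real, intent(inout) :: res(:)
      integer, intent(in) :: j
      real, intent(in)    :: a, b
      ! No SAVE here, so nothing needs to be THREADPRIVATE.
      res(j) = a + b
   end subroutine add_at
end module workers

program no_entry_sketch
   use workers
   implicit none
   integer, parameter :: n = 5
   integer :: i
   real :: arg_1(n), arg_2(n), res(n)

   call fill(arg_1, 1.0)
   call fill(arg_2, 2.0)

!$omp parallel do shared(res, arg_1, arg_2)
   do i = 1, n
      ! The loop index is passed by argument; each call touches res(i) only.
      call add_at(res, i, arg_1(i), arg_2(i))
   end do
!$omp end parallel do

   print *, res
end program no_entry_sketch
```

Because each iteration writes a different element of res, the shared array needs no further synchronisation in this sketch.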

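Comments

0 upvotes DJNZ 11/22/2023
Thank you for the note about placing deallocate before the end of the parallel region.

0 upvotes DJNZ 11/22/2023 #2

Thank you very much for your answers and comments, @PierU and @IanBush! I apologize for the late reply! I would like to add a few words of my own about the keywords threadprivate and save:

  • The code I am working with has ~100k lines in about 70+ files, with subroutines that run to hundreds or thousands of lines, and almost all of these subroutines contain a standalone SAVE statement with no variable list, which therefore applies to every local variable in scope. In my example I simulated this environment, and for key variables such as loop counters/array iterators I am forced to use the threadprivate statement.
  • However, you are absolutely right that this construct (SAVE + variable list) should be used as little as possible. In my target code these subroutines are called inside large OpenMP loops, and some of their local variables are explicitly marked THREADPRIVATE because making them private is not possible.
  • Everything said above also applies to the use of entry: it is part of the legacy environment. I would like to put all the code in modules and avoid a lot of problems, but right now I don't have time to do that.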
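
The SAVE plus THREADPRIVATE combination described in the list above can be illustrated with a small sketch. It is not taken from the legacy code: the module legacy_like and the routine count_up are hypothetical, and the source is free-form.

```fortran
! Sketch only (free-form source). LEGACY_LIKE and COUNT_UP are
! hypothetical names used to mimic a routine with a saved loop counter.
module legacy_like
   implicit none
contains
   subroutine count_up(n, total)
      integer, intent(in)  :: n
      integer, intent(out) :: total
      ! A saved local has a single copy for the whole program, so
      ! without the directive below all threads would share I.
      integer, save :: i
!$omp threadprivate(i)
      total = 0
      do i = 1, n
         total = total + i
      end do
   end subroutine count_up
end module legacy_like

program threadprivate_sketch
   use legacy_like
   implicit none
   integer :: k, t(4)

!$omp parallel do shared(t)
   do k = 1, 4
      ! Each call runs its inner loop on the calling thread's own I.
      call count_up(100, t(k))
   end do
!$omp end parallel do

   print *, t
end program threadprivate_sketch
```

With the directive in place each element of t should be 5050. Removing both SAVE and the directive gives the same result with less machinery, which is the point made in the comments.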

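So I was able to work out my own solution, which I present below. The key is the correct description of the shared variable: array Y must be shared (as a module variable).

main.F: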

      PROGRAM TESTER
         USE OMP_LIB
         USE PRINTER
                  
         REAL*4,DIMENSION(:),ALLOCATABLE, SAVE :: ARG_1, ARG_2
         REAL*4,DIMENSION(:),ALLOCATABLE:: RES
          
C=======================================================================
C$OMP THREADPRIVATE(ARG_1, ARG_2)
C=======================================================================         

         ALLOCATE(RES(N))
         ALLOCATE(Y(N))
         PRINT *,'MAIN: "RES" IS ALLOCATED = ', 
     >      ALLOCATED(RES)
c-----------------------------------------------------------------------
C$OMP PARALLEL PRIVATE(I) SHARED(Y) NUM_THREADS(2)  
c-----------------------------------------------------------------------
         ALLOCATE(ARG_1(N))
         PRINT *,'MAIN: "ARG_1" IS ALLOCATED = ', 
     >      ALLOCATED(ARG_1)
         
         ALLOCATE(ARG_2(N))
         PRINT *,'MAIN: "ARG_2" IS ALLOCATED = ', 
     >      ALLOCATED(ARG_2)

C Initialize working arrays:        
         CALL WORK1(ARG_1,N, ARG_2,N) 
         CALL WORK2(ARG_1,N, ARG_2,N)
        
C Step 1: Print working arrays: 
         CALL PRINT_ARR(ARG_1,N)
         CALL PRINT_ARR(ARG_2,N)

         PRINT *,'===================================='
         FLUSH(6)

C Step 2: Parallel Loop:
c-----------------------------------------------------------------------
C$OMP DO 
         DO I=1,N
c            RES(I)=ARG_1(I) + ARG_2(I)
            CALL WORK3(I,ARG_1(I),ARG_2(I))
         ENDDO
C$OMP END DO
         CALL PRINT_ARR(Y,N)
         DEALLOCATE(ARG_1,ARG_2)
c-----------------------------------------------------------------------
C$OMP END PARALLEL
c-----------------------------------------------------------------------
         
         DEALLOCATE(RES)
    
      END PROGRAM TESTER  

work.F:

      SUBROUTINE WORK1(ARG1_W1,DIM_1,ARG2_W2,DIM_2)
         USE PRINTER
c------ Input arguments: -----------------------------------------------         

         INTEGER DIM_1, DIM_2, J
         REAL*4 ARG2_W2(DIM_2)
         REAL*4 ARG1_W1(DIM_1)
c dummy arguments for WORK3:         
         REAL*4 ARG1_W3, ARG2_W3
c------ Locals: -------------------------------------------------------- 
        
         INTEGER I
         SAVE I
c------ OpenMP spells: -------------------------------------------------         
c$OMP THREADPRIVATE (I)         
c-----------------------------------------------------------------------
         DO I=1,DIM_1
            ARG1_W1(I) = 1.0
         ENDDO
         RETURN
      ENTRY WORK2 (ARG1_W1,DIM_1,ARG2_W2,DIM_2)  
         DO I=1,DIM_2
            ARG2_W2(I) = 2.0
         ENDDO
         RETURN
      ENTRY WORK3 (J,ARG1_W3,ARG2_W3)

         Y(J)= ARG1_W3 + ARG2_W3
         RETURN
      END SUBROUTINE WORK1

module1.f:

      MODULE PRINTER
         INTEGER, PARAMETER:: N = 5
c NB: array Y is shared!
         REAL*4,DIMENSION(:),ALLOCATABLE::Y
         
         CONTAINS 

            SUBROUTINE PRINT_ARR(ARR_VAR,SIZE)
               REAL*4,DIMENSION(:),ALLOCATABLE:: ARR_VAR
               INTEGER SIZE
               INTEGER,SAVE:: J
c------ OpenMP spells: -------------------------------------------------               
c$OMP THREADPRIVATE(J)       
c-----------------------------------------------------------------------        
               DO J=1,SIZE
                  PRINT *,'ARR_VAR(',J,')=',ARR_VAR(J)
               ENDDO    
               FLUSH(6)            
            END SUBROUTINE PRINT_ARR

      END MODULE PRINTER

My compile and run commands:

sudo rm -R -f {*.o,*.x,*.mod}
gfortran -fopenmp -O0 -g -fcheck=all -fbacktrace -c module1.f work.F main.F
gfortran -fopenmp *.o -o a.x
./a.x

My output (because several threads print at once, the lines may be interleaved):

 MAIN: "RES" IS ALLOCATED =  T
 MAIN: "ARG_1" IS ALLOCATED =  T
 MAIN: "ARG_2" IS ALLOCATED =  T
 ARR_VAR(           1 )=   1.00000000    
 ARR_VAR(           2 )=   1.00000000    
 ARR_VAR(           3 )=   1.00000000    
 ARR_VAR(           4 )=   1.00000000    
 ARR_VAR(           5 )=   1.00000000    
 ARR_VAR(           1 )=   2.00000000    
 ARR_VAR(           2 )=   2.00000000    
 ARR_VAR(           3 )=   2.00000000    
 ARR_VAR(           4 )=   2.00000000    
 ARR_VAR(           5 )=   2.00000000    
 ====================================
 MAIN: "ARG_1" IS ALLOCATED =  T
 MAIN: "ARG_2" IS ALLOCATED =  T
 ARR_VAR(           1 )=   1.00000000    
 ARR_VAR(           2 )=   1.00000000    
 ARR_VAR(           3 )=   1.00000000    
 ARR_VAR(           4 )=   1.00000000    
 ARR_VAR(           5 )=   1.00000000    
 ARR_VAR(           1 )=   2.00000000    
 ARR_VAR(           2 )=   2.00000000    
 ARR_VAR(           3 )=   2.00000000    
 ARR_VAR(           4 )=   2.00000000    
 ARR_VAR(           5 )=   2.00000000    
 ====================================
 ARR_VAR(           1 )=   3.00000000    
 ARR_VAR(           2 )=   3.00000000    
 ARR_VAR(           3 )=   3.00000000    
 ARR_VAR(           4 )=   3.00000000    
 ARR_VAR(           5 )=   3.00000000    
 ARR_VAR(           1 )=   3.00000000    
 ARR_VAR(           2 )=   3.00000000    
 ARR_VAR(           3 )=   3.00000000    
 ARR_VAR(           4 )=   3.00000000    
 ARR_VAR(           5 )=   3.00000000