Troubleshooting: MPI Application rank 0 exited before MPI_Finalize() with status 2
2016-09-23 by: CAE仿真在線 Source: Internet
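This error comes from Fluent's parallel (multi-process) solver. Once it appears, the whole Fluent session dies and none of the data can be saved, so it is a serious problem.
In most cases, however, the parallel machinery itself is not at fault. The error means that the computation running inside one of the compute processes has failed.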
1. MPI_Finalize() with status 2, cause one: negative cell volume
As soon as a negative cell volume appears, the compute process cannot continue, so it raises an error and terminates:
Error at Node 3: Update-Dynamic-Mesh failed. Negative cell volume detected.
WARNING: 2 cells with non-positive volume detected.
MPI Application rank 0 exited before MPI_Finalize() with status 2
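To make "negative cell volume" concrete, here is a minimal sketch of how a signed volume can go negative when a moving mesh pushes a node through the opposite face of a cell. It uses a single tetrahedron and the standard triple-product formula. The function name and the coordinates are illustrative assumptions, and this is not a description of how Fluent performs its own check.

```python
def signed_tet_volume(a, b, c, d):
    """Signed volume of tetrahedron (a, b, c, d): (b-a) . ((c-a) x (d-a)) / 6."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # Cross product v x w
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    # Scalar triple product divided by 6
    return sum(u[i] * cross[i] for i in range(3)) / 6.0


# Hypothetical cell: the base triangle stays fixed, only the apex moves.
a, b, c = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
valid = signed_tet_volume(a, b, c, (0.0, 0.0, 1.0))      # apex above the base
inverted = signed_tet_volume(a, b, c, (0.0, 0.0, -0.5))  # apex pushed through the base
print("valid cell volume:   ", valid)
print("inverted cell volume:", inverted)
```

The sign flips as soon as the apex crosses the base plane, which is the kind of situation that too large a mesh displacement per time step can produce.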
2. MPI_Finalize() with status 2, cause two: the solution diverges for any reason, or the flow velocity or mesh-motion speed becomes too high, for example a very large Courant number
112 more time steps
Updating solution at time level N...
Global Courant Number [Explicit VOF Criteria] : 471.06
Error at Node 0: Global courant number is greater than 250.00 The
velocity field is probably diverging. Please check the solution
and reduce the time-step if necessary.
Error at Node 1: Global courant number is greater than 250.00 The
velocity field is probably diverging. Please check the solution
and reduce the time-step if necessary.
Error at Node 2: Global courant number is greater than 250.00 The
velocity field is probably diverging. Please check the solution
and reduce the time-step if necessary.
Error at Node 3: Global courant number is greater than 250.00 The
velocity field is probably diverging. Please check the solution
and reduce the time-step if necessary.
MPI Application rank 0 exited before MPI_Finalize() with status 2
===============Message from the Cortex Process================================
Fatal error in one of the compute processes.
==============================================================================
Error: Cortex received a fatal signal (unrecognized signal).
Error Object: ()
Error: There is no active application.
Error Object: (case-modified?)
Error: No journal response to dialog box message:'There is no active application.'
. Internally, cancelled the dialog.
Error: There is no active application.
Error Object: (rp-var-value 'physical-time-step)
Error: There is no active application.
Error Object: (rp-var-value 'delta-time-sampled)
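The log above asks for a smaller time step. As a rough sketch, assuming the usual definition of the Courant number (C = u * dt / dx) and a velocity field that stays the same, C scales linearly with the time step, so a first guess for the new step is the old one multiplied by target/observed. The function name, the current time step, and the target value of 100 are hypothetical. Only 471.06 and the limit of 250 come from the log.

```python
def suggested_time_step(current_dt, observed_courant, target_courant):
    """Scale the time step so the Courant number would drop to the target.

    Assumes C is proportional to dt for a fixed velocity field and mesh.
    """
    if current_dt <= 0 or observed_courant <= 0 or target_courant <= 0:
        raise ValueError("all inputs must be positive")
    if observed_courant <= target_courant:
        return current_dt  # already at or below the target, keep the step
    return current_dt * target_courant / observed_courant


dt = 1.0e-3  # hypothetical current time step in seconds, not taken from the log
new_dt = suggested_time_step(dt, observed_courant=471.06, target_courant=100.0)
print("suggested time step:", new_dt)
```

If the velocity field is genuinely diverging, a smaller time step alone may not rescue the run, so treat the result as a starting point and check the solution as the message advises.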
3. MPI_Finalize() with status 2, cause three: memory is critically short. In a parallel run, if any single compute process cannot allocate enough memory, the run terminates
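Usually an error message is printed before the MPI_Finalize() with status line, as in the English log output above, but in some cases there is none:
[Figure: console output with no error message before the exit line]
In that run, a normal dynamic-mesh remeshing had just completed, and the failure occurred immediately on entering the next solver iteration.
I checked the machine and found memory usage at 92%. To test this hypothesis further, I took a case that normally runs without problems and ran it while memory usage was above 90%.
It completed one step, and on the second step it went straight to MPI_Finalize() with status 2.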
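A check like the one described above can be scripted before launching a run. The sketch below assumes a Linux machine, where `/proc/meminfo` reports `MemTotal` and `MemAvailable` in kB. The function names and the sample numbers are illustrative, and the 90% threshold is taken from the experiment described here rather than from any documented Fluent limit.

```python
def memory_used_percent(meminfo_text):
    """Percent of RAM in use, computed from the text of Linux /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def safe_to_start(meminfo_text, limit_percent=90.0):
    """True if memory usage is below the limit (90% in the test described above)."""
    return memory_used_percent(meminfo_text) < limit_percent


# Hypothetical sample chosen to reproduce the 92% usage mentioned in the text.
sample = (
    "MemTotal:       16000000 kB\n"
    "MemFree:          500000 kB\n"
    "MemAvailable:    1280000 kB\n"
)
print(memory_used_percent(sample), safe_to_start(sample))

# On a real Linux machine, read the live values instead:
# with open("/proc/meminfo") as f:
#     print(safe_to_start(f.read()))
```

On other operating systems the same idea applies, but the memory figures have to come from a different source.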
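Abnormal MPI exits in Fluent parallel runs also appear with several other status values, -1 being the most common. In practice there are only two kinds of error: script errors and physical-model data errors.
The first kind covers journal file scripts and UDF scripts, and these errors usually produce -1 or another negative value. The second kind is termination caused by abnormal physical quantities, such as divergence or a value exceeding its limit.
There are many possible situations, so analyze each case on its own terms. First read the log just before the failure. If the log contains an error message, that message is the root cause. If the log gives no hint, the problem is most likely memory.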
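The triage order described here (read the log before the exit line first, suspect memory only if nothing is logged) can be sketched as a small log scanner. The patterns are copied from the error messages quoted in this article. The function name, the 40-line window, and the wording of the returned labels are assumptions for illustration.

```python
import re

EXIT_RE = re.compile(r"exited before MPI_Finalize\(\) with status (-?\d+)")

# Message fragments taken from the logs quoted in this article.
KNOWN_CAUSES = [
    (re.compile(r"negative cell volume|non-positive volume", re.I),
     "negative cell volume"),
    (re.compile(r"global courant number is greater than", re.I),
     "divergence (Courant number too large)"),
]


def triage(log_text, window=40):
    """Return (status, likely_cause) for the first MPI exit line, or None."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        match = EXIT_RE.search(line)
        if not match:
            continue
        # Inspect the lines just before the exit message, plus the line itself.
        context = "\n".join(lines[max(0, i - window): i + 1])
        status = int(match.group(1))
        for pattern, cause in KNOWN_CAUSES:
            if pattern.search(context):
                return status, cause
        return status, "no error logged before exit; check memory usage"
    return None


diverged = (
    "Error at Node 0: Global courant number is greater than 250.00 The\n"
    "velocity field is probably diverging. Please check the solution\n"
    "MPI Application rank 0 exited before MPI_Finalize() with status 2\n"
)
silent = (
    "Updating solution at time level N...\n"
    "MPI Application rank 0 exited before MPI_Finalize() with status 2\n"
)
print(triage(diverged))
print(triage(silent))
```

Only the two causes documented above are recognized. Other messages, such as journal or UDF errors, would need their own patterns.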
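There is one more case. In a parallel run the compute processes communicate with each other continuously, so if the network environment changes during the computation, the run fails immediately. Changing the network configuration, for example the IP address or the network adapter properties, is not allowed while a computation is running.
[Figure: inter-process communication during the computation]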