
issue opened NVIDIA/AMGX

Unsteady solver using AMGX prints "Using Normal MPI (Hostbuffer) communicator..." repeatedly

Hi marsaev

I think this message is printed far too many times and clutters the log file. I believe we talked about this before: is there any update on moving this print to a higher verbosity level? Here is what I get when I run a simulation:

[ 0.00%] ncyc 1; Crank -146.995 (deg); dt= 6.250000000e-07
Using Normal MPI (Hostbuffer) communicator...
[ 0.00%] ncyc 2; Crank -146.989 (deg); dt= 7.812500000e-07
Using Normal MPI (Hostbuffer) communicator...
[ 0.01%] ncyc 3; Crank -146.982 (deg); dt= 9.765625000e-07
Using Normal MPI (Hostbuffer) communicator...
[ 0.01%] ncyc 4; Crank -146.972 (deg); dt= 1.220703125e-06
Using Normal MPI (Hostbuffer) communicator...
[ 0.01%] ncyc 5; Crank -146.961 (deg); dt= 1.525878906e-06
Using Normal MPI (Hostbuffer) communicator...
[ 0.02%] ncyc 6; Crank -146.946 (deg); dt= 1.907348633e-06
Using Normal MPI (Hostbuffer) communicator...
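
As a possible stopgap until the message is silenced or moved, here is a minimal sketch, assuming the line is routed through the AMGX C API print callback (AMGX_register_print_callback, as used in the bundled examples); AMGX_SAFE_CALL is the same error-checking macro used elsewhere in this thread:

    #include <stdio.h>
    #include <string.h>
    #include <amgx_c.h>

    /* Route AMGX output through a custom callback and drop the repeated
       communicator line; everything else is passed through unchanged. */
    static void quiet_print_callback(const char *msg, int length)
    {
        (void)length;
        if (strstr(msg, "Using Normal MPI (Hostbuffer) communicator") != NULL)
            return;
        printf("%s", msg);
    }

    void install_quiet_logger(void)
    {
        /* register once, right after AMGX_initialize() */
        AMGX_SAFE_CALL(AMGX_register_print_callback(&quiet_print_callback));
    }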

created time in 25 days

issue comment NVIDIA/AMGX

feature request: enable multiple instances of the resource handle

No sir, the previous issue is resolved; I modified the title as well to reflect the need. All the questions you asked are related to previous issues. Basically, my remaining issue is that I need to be able to create multiple resources handles and free them without getting weird errors.
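
A minimal sketch of the requested pattern, assuming the C API's AMGX_resources_create_simple and placeholder fluid.json / solid.json config files; whether two independent handles can be created and destroyed cleanly like this is exactly what the request asks for:

    /* Two independent resources handles, each backed by its own config,
       created and destroyed without errors. fluid.json and solid.json
       are placeholder config files. */
    void two_resource_handles(void)
    {
        AMGX_config_handle    cfg_fluid, cfg_solid;
        AMGX_resources_handle rsc_fluid, rsc_solid;

        AMGX_SAFE_CALL(AMGX_config_create_from_file(&cfg_fluid, "fluid.json"));
        AMGX_SAFE_CALL(AMGX_config_create_from_file(&cfg_solid, "solid.json"));
        AMGX_SAFE_CALL(AMGX_resources_create_simple(&rsc_fluid, cfg_fluid));
        AMGX_SAFE_CALL(AMGX_resources_create_simple(&rsc_solid, cfg_solid));

        /* ... build matrices, vectors and solvers against each handle ... */

        AMGX_SAFE_CALL(AMGX_resources_destroy(rsc_fluid));
        AMGX_SAFE_CALL(AMGX_resources_destroy(rsc_solid));
        AMGX_SAFE_CALL(AMGX_config_destroy(cfg_fluid));
        AMGX_SAFE_CALL(AMGX_config_destroy(cfg_solid));
    }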

Jaberh

comment created time in 2 months

push event GEM3D/PittPack

jaber

commit sha 2f5e2ff494ca3464f75a791121ee9784df2b5815

WIP: clean up and fix compiler crashes for pgi/19.10


push time in 2 months

issue comment NVIDIA/AMGX

multiple solver instance error in clean up

Hi Marat, unfortunately the communicator can change between runs, as some ranks may not participate in the communication because they have zero elements. If this did not also have to be unique, it would give much more flexibility for multi-physics simulations without the need to add logic for the workaround.

Jaberh

comment created time in 3 months


issue closed NVIDIA/AMGX

multiple solver instance error in clean up

Hi Marat, I had to open a new issue as I was not sure if you get notifications for an issue that is closed. I have one more question. So the solver supports multi-stream solves (such as solving for solid and fluid in the same setup; not related to multi-stream in CUDA). I have an interface class and hence construct several objects using different configs. The solution is correct, but now that I free the resources (although they belong to different instances of the same class), I get an error in the clean-up phase as follows. If I have a single object, there are no issues.

AMGX_solver_destroy() 
!!! detected some memory leaks in the code: trying to free non-empty temporary device pool !!!

If I comment out AMGX_SAFE_CALL(AMGX_resources_destroy(m_resources)); then the error changes to the following, which makes sense as it detects the non-freed resources handle.

*** Process received signal ***
 Signal: Segmentation fault (11)
Signal code:  (128)
Failing at address: (nil)
[ 0] /lib64/libpthread.so.0(+0xf630)[0x7f85af5b5630]
[ 1] /lib64/libcuda.so.1(+0x1f3b8d)[0x7f856b8e1b8d]
[ 2] /lib64/libcuda.so.1(+0x1ddbc7)[0x7f856b8cbbc7]
[ 3] /lib64/libcuda.so.1(+0xf9b4b)[0x7f856b7e7b4b]
[ 4] /lib64/libcuda.so.1(cuEventDestroy_v2+0x59)[0x7f856b969ae9]
[ 5] centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(+0x5e8dd0)[0x7f858178cdd0]
[ 6] /tools/centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(+0x61cea4)[0x7f85817c0ea4]
[ 7] /tools/centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(+0x29aad)[0x7f85811cdaad]
[ 8] /tools/centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(+0x2ae16)[0x7f85811cee16]
[ 9] /centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(cublasDestroy_v2+0xe7)[0x7f858125cf77]
[10] libamgxsh.so(_ZN4amgx6Cublas14destroy_handleEv+0x25)[0x7f8588d15085]
[11] libamgxsh.so(_ZN4amgx9ResourcesD1Ev+0x5d)[0x7f8588d14ead]
[12] lib/libamgxsh.so(_ZNSt15_Sp_counted_ptrIPN4amgx9ResourcesELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv+0x12)[0x7f8587dcff32]
[13] libamgxsh.so(_ZNSt15_Sp_counted_ptrIPN4amgx11CWrapHandleIP28AMGX_resources_handle_structNS0_9ResourcesEEELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv+0xba)[0x7f8587dd308a]
[14] libamgxsh.so(_ZNSt8_Rb_treeIPN4amgx11CWrapHandleIP28AMGX_resources_handle_structNS0_9ResourcesEEESt4pairIKS6_St10shared_ptrIS5_EESt10_Select1stISB_ESt4lessIS6_ESaISB_EE8_M_eraseEPSt13_Rb_tree_nodeISB_E+
[15] libamgxsh.so(_ZN4amgx10MemManagerIJNS_11CWrapHandleIP28AMGX_resources_handle_structNS_9ResourcesEEEEED1Ev+0x2c)[0x7f8587e42e5c]
[16] /lib64/libc.so.6(+0x39ce9)[0x7f8586377ce9]
[17] /lib64/libc.so.6(+0x39d37)[0x7f8586377d37]
[18] /lib64/libc.so.6(__libc_start_main+0xfc)[0x7f858636055c]
[19]
*** End of error message ***

Hopefully this is the last test case. I would like your advice on this before debugging further. I highly doubt that the order of destruction is the issue, as that would also affect the single-object case, but I am listing it here just for the sake of completeness:

      AMGX_SAFE_CALL(AMGX_vector_destroy(m_rhs));
      AMGX_SAFE_CALL(AMGX_vector_destroy(m_solution));
      AMGX_SAFE_CALL(AMGX_matrix_destroy(m_matrix));
      AMGX_SAFE_CALL(AMGX_solver_destroy(m_solver));
      AMGX_SAFE_CALL(AMGX_resources_destroy(m_resources));
      AMGX_SAFE_CALL(AMGX_config_destroy(m_config));

It is probably related to

  size_t n_erased = get_mode_bookkeeper<Envelope>().erase(envl);
  bool flag = get_mem_manager<LetterW>().template free<LetterW>(letter);

closed time in 3 months

Jaberh

issue comment NVIDIA/AMGX

multiple solver instance error in clean up

Thanks for the refs! Yes, the suggestion from the first email was to use "one resource" for every instance ...

Jaberh

comment created time in 3 months

issue comment NVIDIA/AMGX

AMGX_matrix_upload_all_global fails with 'out of memory' error

The OS can limit the amount of pinned memory.

joconnor22

comment created time in 3 months

issue comment NVIDIA/AMGX

AMGX_matrix_upload_all_global fails with 'out of memory' error

Possibly an irrelevant question: are you using pinned memory?

joconnor22

comment created time in 3 months

issue comment NVIDIA/AMGX

AMGX_matrix_upload_all_global fails with 'out of memory' error

Just a friendly note: you can call cudaMemGetInfo(&free_mem, &total); before that call to see the memory actually available rather than the theoretical capacity.
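
A minimal sketch of that check, assuming the CUDA runtime API; the variable and function names are illustrative:

    #include <cuda_runtime.h>
    #include <stdio.h>

    /* Query the memory actually free on the device right before
       AMGX_matrix_upload_all_global, instead of trusting the card's
       theoretical capacity. */
    void report_device_memory(void)
    {
        size_t free_mem = 0, total_mem = 0;
        if (cudaMemGetInfo(&free_mem, &total_mem) == cudaSuccess)
            printf("device memory: %zu MiB free of %zu MiB total\n",
                   free_mem >> 20, total_mem >> 20);
    }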

joconnor22

comment created time in 3 months

issue comment NVIDIA/AMGX

multiple solver instance error in clean up

Yes, the m_ prefix denotes members of each instance of the class. I arrange the global initialize and finalize like a singleton, so they are called only once; only the handles cause the issue. The config files for the solvers are different, as one solves for flow and the other for structure, so each needs a separate file; hence m_config, m_resources, etc. are object-specific.
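
A minimal sketch of that arrangement, assuming the standard AMGX C API entry points; the reference-count guard and function names are illustrative, not part of AMGX, and not thread-safe:

    /* Global initialize/finalize handled once per process, like a singleton;
       each solver object still owns its m_config / m_resources / m_matrix /
       m_solver handles. */
    static int s_amgx_users = 0;

    void amgx_global_acquire(void)
    {
        if (s_amgx_users++ == 0) {
            AMGX_SAFE_CALL(AMGX_initialize());
            AMGX_SAFE_CALL(AMGX_initialize_plugins());
        }
    }

    void amgx_global_release(void)
    {
        if (--s_amgx_users == 0) {
            AMGX_SAFE_CALL(AMGX_finalize_plugins());
            AMGX_SAFE_CALL(AMGX_finalize());
        }
    }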

Jaberh

comment created time in 3 months

issue opened NVIDIA/AMGX

multiple solver instance error in clean up

created time in 3 months

issue comment NVIDIA/AMGX

new version gives error on parsing *.JSON inputs (cuda/10.2.89)

Jaberh

comment created time in 3 months

issue comment NVIDIA/AMGX

new version gives error on parsing *.JSON inputs (cuda/10.2.89)

I think the most robust way to handle this is via communicators. It happens a lot in internal combustion engine simulations, as the number of mesh elements in the different phases changes drastically.

Jaberh

comment created time in 3 months

issue comment NVIDIA/AMGX

new version gives error on parsing *.JSON inputs (cuda/10.2.89)

Hi Marat, I had one more question on the previous comment. Most importantly, I have one more issue to resolve: in certain cases some of my ranks have zero elements, which leads to a failure at matrix construction. What is the best way to go about this? I can generate a communicator that only includes ranks with non-zero elements, which adds some collective-call overhead, or I can fake it by pretending that a rank with zero elements has only one neighbor, itself. Since I don't know enough about what AMGX does under the hood, I would like your opinion on this; it happens frequently in our simulations due to lots of refinement/de-refinement. Can AMGX handle solving several disconnected graphs? In my example it deadlocks.
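
A minimal sketch of the first option, assuming plain MPI (MPI_Comm_split); n_local_elements is a placeholder for the per-rank element count, and whether AMGX will accept such a sub-communicator per instance is the open question here:

    #include <mpi.h>

    /* Build a sub-communicator containing only the ranks that own a
       non-zero number of elements, and run the solve on it. MPI_Comm_split
       is collective, so this costs one synchronization whenever the
       partition changes. */
    MPI_Comm make_solver_comm(long n_local_elements)
    {
        MPI_Comm solver_comm;
        int world_rank;
        int color = (n_local_elements > 0) ? 1 : MPI_UNDEFINED;

        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &solver_comm);
        return solver_comm;   /* MPI_COMM_NULL on the empty ranks */
    }

    /* usage: if (solver_comm != MPI_COMM_NULL) { create AMGX resources on
       it, solve, then MPI_Comm_free(&solver_comm); } */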

Jaberh

comment created time in 3 months

issue comment NVIDIA/AMGX

new version gives error on parsing *.JSON inputs (cuda/10.2.89)

One more question: is it possible to disable the printout of "Using Normal MPI (Hostbuffer) communicator..."? It is unnecessary for realistic big-case runs and just clutters the log file. Thanks again for your support.

Jaberh

comment created time in 3 months

issue comment NVIDIA/AMGX

new version gives error on parsing *.JSON inputs (cuda/10.2.89)

Thanks for the follow-up. I have one more issue to resolve: for certain cases some of my ranks have zero elements, which leads to a failure at matrix construction. What is the best way to go about this? I can generate a communicator that only includes ranks with non-zero elements, which adds some collective-call overhead, or I can fake it by pretending that a rank with zero elements has only one neighbor, itself. Since I don't know enough about what AMGX does under the hood, I would like your opinion on this; it happens frequently in our simulations due to lots of refinement/de-refinement. Spasibo for your help.

Jaberh

comment created time in 3 months
