Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#2688 closed user story (fixed)

PETSc 3.6 Support

Reported by: GaryM
Owned by:
Priority: normal
Milestone: Iteration I3
Component: infrastructure
Keywords:
Cc: robinsonm
Estimated pair-hours remaining: 0
Pair-hours expended on ticket: 5
Initial estimate of effort: 4
Editable by public: yes

Description

PETSc 3.6 was released on 11/6/15.

It looks like there could be a few changes we need to cope with: http://www.mcs.anl.gov/petsc/documentation/changes/36.html

Change History (21)

comment:1 follow-ups: Changed 5 years ago by louiecn

Dealing with

Removed -pc_hypre_type euclid due to bit-rot

might be annoying. I asked Shameer in Feb what to do about it but he didn't have any suggestions.

comment:3 follow-up: Changed 5 years ago by jmpf@…

  • Estimated pair-hours remaining changed from 4 to 3
  • Pair-hours expended on ticket changed from 0 to 1

Next:

  • Update hostconfig machine files to include this version

comment:4 in reply to: ↑ 3 Changed 5 years ago by jmpf@…

Replying to jmpf@…:

  • Update hostconfig machine files to include this version

Config done in r25204

Next:

linalg/src/LinearSystem.cpp: In member function 'void LinearSystem::RemoveNullSpace()':
linalg/src/LinearSystem.cpp:466:13: error: 'KSPSetNullSpace' was not declared in this scope
linalg/src/LinearSystem.cpp: In member function '_p_Vec* LinearSystem::Solve(Vec)':
linalg/src/LinearSystem.cpp:809:13: error: 'KSPSetNullSpace' was not declared in this scope

comment:5 Changed 5 years ago by jmpf@…

  • Estimated pair-hours remaining changed from 3 to 2
  • Pair-hours expended on ticket changed from 1 to 3

Fixed compilation problems in r25209 by changing the way null spaces are set from 3.3 upwards.
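
A minimal sketch of the kind of version switch described above. It assumes the replacement call is MatSetNullSpace (the PETSc 3.6 change notes list it as the successor to the removed KSPSetNullSpace) and that it is usable from 3.3, as the comment implies. It does not call PETSc: the macro values are stand-ins for what petscversion.h would provide, and NullSpaceCallForThisBuild is a hypothetical helper, not Chaste code.

```cpp
#include <cassert>
#include <string>

// Stand-in values so the sketch builds without PETSc installed.
// In real code these macros come from petscversion.h.
#ifndef PETSC_VERSION_MAJOR
#define PETSC_VERSION_MAJOR 3
#define PETSC_VERSION_MINOR 6
#endif

// Hypothetical helper: reports which null-space call a build against
// this PETSc version would compile in.
std::string NullSpaceCallForThisBuild()
{
#if (PETSC_VERSION_MAJOR > 3) || (PETSC_VERSION_MAJOR == 3 && PETSC_VERSION_MINOR >= 3)
    // 3.3 and later: attach the null space to the matrix,
    // e.g. MatSetNullSpace(matrix, null_space)
    return "MatSetNullSpace";
#else
    // Older releases: attach the null space to the solver,
    // e.g. KSPSetNullSpace(ksp, null_space)
    return "KSPSetNullSpace";
#endif
}
```

In the real source the two branches would hold the PETSc calls themselves rather than strings; the point is only that the choice is made at compile time from the version macros.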

There are still failing tests though.

comment:6 Changed 5 years ago by jmpf@…

r25210 has new logic for PETSc citations

comment:7 Changed 5 years ago by jameso@…

This makes everything compile on OSX with the latest PETSc. Thanks a million!!

comment:8 Changed 5 years ago by jmpf@…

  • Pair-hours expended on ticket changed from 3 to 5

Failure of pde/test/TestSimpleNonlinearEllipticSolver.hpp which looks like

PETSC ERROR: Vec is locked read only

is caused by the boundary-conditions class used inside the PETSc SNES solver. PETSc SNES locks the solution vector as read-only. The boundary-conditions class only reads from it, but its use of VecGetArray within DistributedVector now causes an error.

  • r25222 adds a failing unit test to show what VecGetArray within DistributedVector does when a vector has been locked.
  • r25223 makes a read-only VecGetArrayRead implementation of DistributedVector.
  • r25224 uses the new implementation to fix the boundary-conditions class and hence the PDE test.
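
The locking behaviour behind this fix can be illustrated with a toy model. This is a sketch only: MockVec and SumEntries are hypothetical stand-ins written for this note, not PETSc or Chaste types. The model assumes that write access (the role VecGetArray plays) is refused once a vector is locked, while read access (the role VecGetArrayRead plays) is always allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy stand-in for a vector that a solver can lock as read-only.
// Hypothetical class; it only mimics the behaviour described in the ticket.
class MockVec
{
public:
    explicit MockVec(std::vector<double> values)
        : mValues(std::move(values)), mLocked(false) {}

    // What the solver does before handing the vector to user callbacks.
    void LockReadOnly() { mLocked = true; }

    // Plays the role of VecGetArray: grants write access, so it fails when locked.
    double* GetArray()
    {
        if (mLocked)
        {
            throw std::runtime_error("Vec is locked read only");
        }
        return mValues.data();
    }

    // Plays the role of VecGetArrayRead: const access, allowed even when locked.
    const double* GetArrayRead() const { return mValues.data(); }

    std::size_t Size() const { return mValues.size(); }

private:
    std::vector<double> mValues;
    bool mLocked;
};

// A consumer that only reads, so it asks for read-only access.
double SumEntries(const MockVec& rVec)
{
    const double* p_data = rVec.GetArrayRead();
    double total = 0.0;
    for (std::size_t i = 0; i < rVec.Size(); ++i)
    {
        total += p_data[i];
    }
    return total;
}
```

Code that only reads, like the boundary-conditions class here, keeps working on a locked vector once it requests const access; only callers that genuinely need to write hit the error.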

comment:9 Changed 5 years ago by jmpf@…

r25225/r25226 provide a refinement to the implementation of PetscVecTools::DoInterleavedVecScatter, which is now aware that the first argument should be treated as read-only. There is a unit test which emulates this behaviour.

This fixes the remaining tests for PETSc 3.6:

  1. linalg/test/TestLinearSystem.hpp
  2. linalg/test/TestPCLDUFactorisation.hpp
  3. linalg/test/TestPCBlockDiagonal.hpp
  4. notforrelease/test/TestBidomainParaParaProblem.hpp

comment:10 Changed 5 years ago by jmpf@…

  • Estimated pair-hours remaining changed from 2 to 0.1

Next step: add to a robert/lofty cron.

comment:11 Changed 5 years ago by rafb@…

Another confirmation of PETSc 3.6 working on the latest OS X/homebrew setup. Thanks Joe!

comment:12 Changed 5 years ago by jmpf@…

Put in a new cron line as of r25237 and uploaded to /etc/cron.d/chaste

Note: we should drop back on the number of variants when PETSc 2.3.3 support can be stopped.

comment:14 Changed 5 years ago by jmpf@…

  • Estimated pair-hours remaining changed from 0.1 to 0
  • Resolution set to fixed
  • Status changed from new to closed

The 3.6 variant has now been run in the lofty/robert rotation. A couple of test suites (non-cached mesh and a ventilation tutorial) had errors on Sunday but these are spurious or have failed in other configurations.

comment:15 Changed 5 years ago by jmpf@…

(A note that Gary made InstallGuides/DependencyVersions reflect the regression testing.)

comment:16 in reply to: ↑ 1 ; follow-up: Changed 5 years ago by louiecn

Replying to louiecn:

Removed -pc_hypre_type euclid due to bit-rot

I can't see this was addressed anywhere - why wasn't it a problem? Two preconditioners use it:

source:trunk/linalg/src/PCLDUFactorisation.cpp?rev=23343#L276
source:trunk/linalg/src/PCBlockDiagonal.cpp?rev=23343#L219

Two tests also set this directly; do they complain on 3.6?

source:trunk/notforrelease/test/performance/TestBenchmarkKSPIterationCount.hpp?rev=23380#L128
source:trunk/notforrelease/test/TestFastChasteBenchmarksForPreDiCT.hpp?rev=24885#L337

Looking at this lofty nightly the linalg tests that are supposed to be using LDU factorisation don't, because HYPRE isn't installed (note all the warnings and unused options):

TestLinearSystem.hpp
TestPCLDUFactorisation.hpp

In fact from a cursory glance the second test in the latter is very weak - it times without and "with" LDU, but obviously it's not actually using LDU (I don't know what PETSc's fallback is if you ask for HYPRE and it isn't installed).

Same story with TestPCBlockDiagonal.hpp for PCBlockDiagonal.

comment:17 in reply to: ↑ 16 ; follow-up: Changed 5 years ago by jmpf@…

Replying to louiecn:

Replying to louiecn:

Removed -pc_hypre_type euclid due to bit-rot

I can't see this was addressed anywhere - why wasn't it a problem? Two preconditioners use it:

We aren't fully testing HYPRE. See #2676

comment:18 in reply to: ↑ 17 Changed 5 years ago by louiecn

Replying to jmpf@…:

We aren't fully testing HYPRE. See #2676

Plans to do a PETSc 3.6 + HYPRE build?

comment:19 in reply to: ↑ 1 ; follow-up: Changed 5 years ago by louiecn

Replying to louiecn:

Removed -pc_hypre_type euclid due to bit-rot

Archer is upgrading to 3.6 at the end of the month so I asked the petsc-maint mailing list for advice on this today and got back:

The replacement for Euclid in Hypre is Pilut: http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html

Pilut was added way back in 2.1.2, so I wonder if simply replacing euclid with pilut would make sense?

comment:20 in reply to: ↑ 19 Changed 5 years ago by louiecn

Replying to louiecn:

Pilut was added way back in 2.1.2, so I wonder if simply replacing euclid with pilut would make sense?

I tried naively swapping euclid for pilut and got DIVERGED_ITS. This will take some expertise.

I asked migb about it. He suggested using the adjacent commented-out boomeramg bit instead. I tested it and it does work, but it's a lot slower in my test case (about +25%). As a stopgap, the simplest solution would be using one or the other block depending on PETSc version. In the longer term it would be good to try pilut.
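
The stopgap mentioned here, picking one block or the other by PETSc version, might look roughly like the sketch below. HypreTypeForPetsc is a hypothetical helper invented for illustration. It assumes "euclid" stays in use before 3.6 and "boomeramg" takes over from 3.6 onwards; pilut is deliberately left out, given the DIVERGED_ITS result above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: choose the value to pass as -pc_hypre_type
// for a given PETSc version. PETSc 3.6 removed "euclid", so from 3.6
// onwards this falls back to "boomeramg".
std::string HypreTypeForPetsc(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    const bool euclid_removed = (major > 3) || (major == 3 && minor >= 6);
    return euclid_removed ? "boomeramg" : "euclid";
}
```

The returned string would be supplied wherever the preconditioner currently sets its -pc_hypre_type option. The real code could equally make this choice at compile time from the PETSc version macros.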

In other news there was a reply to the quote above...

Nonsense. From the Hypre Users Manual:

6.14 PILUT: Parallel Incomplete Factorization

Note: this code is no longer supported by the hypre team. We recommend to use Euclid instead, which is more versatile and in general more efficient, especially when used with many processors.

Barry removed Euclid about a year ago, due to "bit rot", despite it ostensibly still being supported by Hypre while PILUT is not.

followed by a reply from Barry himself...

If -pc_hypre_type boomeramg works well for your problem then you should by all means use it since it is hard to beat.

For problems where it does not work well then PCASM is likely the way to go with multiple choices for the preconditioner on each block. The default, which is PCILU on each block, could be fine.

The issue is that there are no definitive recipes for choice of preconditioners, just some rules of thumb and experimentation to see what works well for your problems (and even if you change your problem a bit the best preconditioner would change).

comment:21 Changed 5 years ago by louiecn

I went for migb's suggestion in r26352.

I double checked the speed and it seems to be the same, so I think I was mistaken before about a slowdown.
