ALADIN and ARPEGE: Major code changes and strategy for code porting in the coming months

(Claude FISCHER)

I will try here to list and comment on the major code changes that will affect us, Aladinists, in the coming months. This text should be an incentive for choosing a code porting strategy, and for sharing useful porting results among us.

1.  ARPEGE CY23 (June 2000)

* new Observation Data Base (ODB)

This feature is the successor of CMA-arrays for handling the observation data sets in the code. For the time being, ODB has only replaced CMA in the core of the model (in particular, robsar has disappeared), but the ODB structure will progressively replace CMA also in scripts and observation file formats.

* removal of I/O options from obsolete Cray SSD/memory systems:

Some of you know of the "LIO" logicals, of which there are many kinds (see namct0). For instance, they control I/O to files stored on the fast-access SSD memory available on Cray machines. A recent investigation by Ryad El Khatib showed that nobody in our community uses these options anymore. They will therefore be removed, which will simplify the reading of the spectral transform code (removal of the LIO keys) and eliminate the corresponding module "yomio".

* removal of multitasking option LMLTSK:

This was the old Cray multitasking facility, still used in Brussels and Casablanca. However, Morocco should change its platform quite soon, in 2000, so only Belgium would be unable to run the new cycles efficiently after this summer. Luc Gérard is therefore in contact with Cray staff about the possible installation of an openMP library, but this will be a rather medium-term issue.

* openMP (OMP):

The OMP directives appeared in the code in cy22. They were adapted blindly into ALADIN (although the parallel code structures allowed a rather automatic transposition of the arp/ifs parts into the ald parts). The compiler can interpret OMP directives to produce microtasked code at loop level, provided one compiles with the appropriate OMP options. No OMP tests have been done in ALADIN so far, but a basic strategy is readily defined:
a/ analyze OMP directives in arp/ifs, understand the microtasking,
b/ verify the adaptations already performed in aladin, correct if needed,
c/ run and debug !
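
As a rough illustration of what microtasking at loop level means, the sketch below shares the iterations of a loop over blocks of NPROMA points among several threads. It is written in Python, with a thread pool standing in for the OMP directives, which it does not reproduce. The function names and the per-block computation are hypothetical and are not taken from the arp/ifs code; the sketch only shows how the work is divided and says nothing about speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(npoints, nproma):
    """Return (start, stop) index pairs covering npoints in chunks of nproma."""
    return [(start, min(start + nproma, npoints))
            for start in range(0, npoints, nproma)]


def process_block(field, start, stop):
    """Hypothetical per-block computation: here, a simple sum of squares."""
    return sum(x * x for x in field[start:stop])


def parallel_loop(field, nproma, num_threads):
    """Run the block loop with its iterations shared among num_threads workers."""
    blocks = split_into_blocks(len(field), nproma)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Each block is independent, so the iterations may run in any order.
        partial = list(pool.map(lambda b: process_block(field, *b), blocks))
    return sum(partial)


if __name__ == "__main__":
    field = list(range(1000))
    print(parallel_loop(field, nproma=255, num_threads=4))
```

The point to check in step a/ is the same as here: the iterations of the loop must be independent of each other before they can be distributed among threads.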

Some performance figures, kindly provided by Mats Hamrud from ECMWF, are appended. It turns out that OMP runs the code in almost the same CPU time as multitasking, but with a slight increase in memory, of 5-10 %.

2.  CY24 (end 2000)

* externalisation (modularisation) of transforms:

The intention is to modularize the spectral transforms much further, so that they become a "toolkit" useful for different types of applications. The principle of the transforms will not change, but the interface with the upper-level model routines will, so this modification implies some heavy recoding for everyone. Phasing ALADIN with ARPEGE will require careful analysis because our north/south transforms are not 100% similar to the ARPEGE ones (mean/wind, conversion between u,v and vor,div).

* message passing version/distributed memory version:

The way the distributed memory version was introduced into arp/ald in the old days makes it impossible to draw any precise interface between the shared and distributed memory versions. Differences are scattered throughout the code, from setup level down to scan2m, with some repercussions on low-level routines. Very understandably, ECMWF and some developers elsewhere do not wish to go on maintaining two parallel codes. As a consequence, the shared memory (SM) code will disappear very soon, through the complete removal of the key LMESSP.

This cleaning is planned for cy24, so it should appear in ARPEGE in late 2000 and in ALADIN sometime in early 2001 (cycle AL15).

3.  Tests performed so far

* openMP: none

* removal of LMESSP:

Tests running aladin with LMESSP=TRUE/NPROC=1 but no MPI library (because there "should" be no active message passing if only 1 processor is provided!) have been carried out by Jean-Marc Audoin, Andrey Bogatchev and Jozef Vivoda.

What should we retain from these three experiments? Firstly, Jozef's tests indicate that it is possible to run the basic configurations without MPI libraries but with lmessp=true. However, in the absence of OMP tests, this code version on 1 processor might perform somewhat less efficiently than the good old SM code. The installation of MPI libraries is encouraged on all platforms with 4 processors or more. You will certainly face difficulties when starting up (installation of mpe/mpi, checking communications on your platform ...). The Hungarian and Slovenian portings show that this is not impossible even if you are running on neither a VPP nor an SX! Standard, free versions of MPI exist (MPICH).

For the time being, we have introduced a logical LMPOFF that can be switched on to deactivate MPI calls in e001 and fullpos, in order to run the single-processor code without an MPI library (cycle AL13).
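
The idea behind such a switch can be sketched as follows. This is only a minimal Python illustration, not the actual ALADIN implementation: the RunConfig class, the global_sum function and the communicator object are hypothetical, and the communicator is merely assumed to offer an allreduce method in the style of mpi4py.

```python
from dataclasses import dataclass


@dataclass
class RunConfig:
    nproc: int = 1
    lmessp: bool = True    # distributed-memory code path selected
    lmpoff: bool = False   # when True, every message-passing call is skipped


def global_sum(local_value, cfg, comm=None):
    """Sum a value over all processors, or return it unchanged when MPI is off."""
    if cfg.lmpoff:
        # Without message passing there is nobody to exchange data with,
        # so the switch only makes sense on a single processor.
        if cfg.nproc != 1:
            raise ValueError("lmpoff requires nproc == 1")
        return local_value
    if comm is None:
        raise RuntimeError("a communicator is required when lmpoff is False")
    # Assumed mpi4py-like call; the default reduction there is a sum.
    return comm.allreduce(local_value)


if __name__ == "__main__":
    cfg = RunConfig(nproc=1, lmessp=True, lmpoff=True)
    print(global_sum(3.5, cfg))  # no MPI library is touched on this path
```

With the switch on, the communication routine returns the local value directly, so the code can be linked and run without any MPI library.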

Some more details on the tests can be found hereafter (experiments of J.-M. Audoin and A. Bogatchev).

4.  Strategy and further tests for 2000

Each center is strongly encouraged to think about its own goals and strategy. The questions one should answer are:
a/ is it enough for me to run a "minimum" version, that is nproc=1 with neither message passing nor openMP?
b/ if speed-up is required, which one: openMP, message passing or both (both is for clusters of processors, each cluster being SM/openMP while the clusters communicate via MPI)?
c/ dates for tests and plans for switching to new environment.

Here we can add that the message passing version has now proven its reliability and efficiency on multiprocessor platforms, even for a modest number of processors. The experience in Toulouse, Ljubljana and Budapest is good. Prague could also switch to the distributed memory (DM) version if required.

5.  Coordination aspects

It would be great to receive regular information about your future tests and plans. I propose that this information be sent via the alabobo list, so that we all have a good overview of the difficulties.

In section (1) I gave you the main time frame for the code changes, which will certainly influence your own decisions (though you probably wished to work on something else, but such is life!).

6.  Appendix: Performance figures (provided by Mats Hamrud, ECMWF)

conf 001, arpege/ifs, T63. Tests were run on a J90. The execution times (not shown) were very similar using Cray macro-tasking, OpenMP or message passing.

All runs use nproma=255 and nrproma=255.

Macro-tasked

    ntasks    Maximum memory used (MWords)
         1     20.6035
         4     30.2031
         8     42.0977
        16     64.2773
        32    109.5898

OpenMP

    omp_num_threads    Maximum memory used (MWords)
                  1     45.4883
                  4     54.5547
                  8     63.5859
                 16     78.3633
                 28     82.5313

MPI

The two layout columns reproduce the two processor decompositions listed for each run.

    layout 1    layout 2    Maximum memory used (MWords)
    1x1         1x1          52.5430
    4x1         4x1          71.5469
    4x1         2x2          71.3398
    8x1         8x1         102.1016
    8x1         4x2          95.2773
    4x2         2x4          94.4844
    4x2         4x2          95.0117
    16x1        16x1        139.2031
    16x1        1x16        144.0352
    16x1        4x4         139.8086
    16x1        8x2         140.3438
    8x2         4x4         136.0781
    4x4         4x4         135.6367
    15x2        15x2        216.0547
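
To read the memory figures listed above more easily, one can express each run as a multiple of the single-task (or single-thread) run of the same kind. The short Python sketch below does this arithmetic for the macro-tasked and OpenMP figures; the values are copied from the appendix, while the function names are mine and purely illustrative.

```python
# Maximum memory used (MWords), copied from the macro-tasked and OpenMP figures.
MACRO_TASKED = {1: 20.6035, 4: 30.2031, 8: 42.0977, 16: 64.2773, 32: 109.5898}
OPENMP = {1: 45.4883, 4: 54.5547, 8: 63.5859, 16: 78.3633, 28: 82.5313}


def growth_factors(memory_by_count):
    """Memory of each run divided by the memory of the single-task run."""
    baseline = memory_by_count[1]
    return {n: mem / baseline for n, mem in memory_by_count.items()}


def report(name, memory_by_count):
    """Print one line per run: count, memory and growth factor."""
    for n, factor in growth_factors(memory_by_count).items():
        print(f"{name:12s} n={n:2d}  {memory_by_count[n]:9.4f} MWords  x{factor:.2f}")


if __name__ == "__main__":
    report("macro-tasked", MACRO_TASKED)
    report("openMP", OPENMP)
```

The same function can be applied to the MPI figures by keying the dictionary on the total number of processors, once one decomposition per processor count has been chosen.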



