Toward the assimilation of radar
in AROME and ALADIN : a discussion paper

F. Bouttier
Météo-France/CNRM/GMAP, September 2003

This paper has benefited from substantial contributions by M. Jurasek (SHMI), V. Ducrocq (Météo-France, CNRM/GMME) and P. Tabary (Météo-France, DSO - direction of observing systems).

1. Introduction

Radars are essential for mesoscale data assimilation, as they are the only operational network that can provide information about the structure of clouds and of the boundary layer in the presence of precipitation or deep convection. The assimilation of radar data is not implemented at all in the ARPEGE/IFS/ALADIN/AROME software so far; it could only be performed in a very approximate way via the retrieval of humidity or winds. Within a few years, radar data processing will be developed to a point of technical and scientific complexity similar to that of satellite radiances. This paper discusses the best way forward for the next few years in the ALADIN community. A strategic objective is to improve the initialization of the future AROME model with the operational radar network available over Europe and North Africa (subject to acquisition constraints) during the 2004-2010 period.

2. Which radar data ?

Radar information can be presented to the data assimilation system in several forms :

- reflectivities,

- instantaneous rain rates,

- cumulated rainfall,

- Doppler radial wind-component and related quantities (shear / turbulence),

- VAD vertical profile of wind vectors inferred from Doppler information,

- microphysical content information using multiple polarization.

Each form has advantages and drawbacks, is appropriate to different kinds of atmospheric models and analysis algorithms, and may or may not be used depending on the type of radar available.

Doppler data is relatively easy to assimilate in 3d-var and is being thoroughly studied in HIRLAM, but it is not available at most European sites. Hence, it will not be discussed in this paper; the reader is referred to recent HIRLAM Newsletters to see that radar winds would be rather straightforward to implement once available.

Multiple polarization is very useful to improve the quality and robustness of radar data at the producer level, but the direct use of polarization information in NWP is still rather a research topic and requires the availability of detailed microphysics (e.g. as in the future AROME model).

Reflectivities are available almost everywhere in Europe, are not much used in 3d-var yet, and pose some interesting problems (see Fig. 1). It is therefore suggested to concentrate first on the introduction of radar reflectivities in ALADIN 3d-var. Radar networks will keep improving, so the software will be extended whenever new kinds of radar data become available in a nearly operational configuration.

The physical interpretation of reflectivities is much easier if they are produced at several elevations, i.e. with volumic radar scanning. This is not available everywhere, so we should have a general approach that will allow the use of single-elevation data ("PPI" images) as well as multiple-site data.

[Figure 1 image : F_Bouttier_b_Fig.gif]

Figure 1 : International composite radar-reflectivity image over France on 28 August 2003, 13:30 UTC. The image reveals some of the problems of radar data : occurrence of gross errors (an electronic problem corrupting a whole radar disc), pseudo-random spurious echoes in clear areas around the Mediterranean sea, inconsistencies between the French and UK data, inconsistencies between two neighbouring radars (visible as a bow of echoes South of Paris). There are some less visible problems (orographically masked areas displayed as no-rain pixels, over- and underestimation of reflectivity depending on the distance to the radar). All these problems must be solved automatically in real time before radar data can be considered available for NWP data assimilation.

3. The philosophy : learning from satellites

The situation with radars is similar to that of TOVS radiances 15 years ago, so it seems wise to try to apply the lessons learned by the TOVS community :

  • The remote-sensing process is complex and nonlinear, so we should try to assimilate something close to the measured quantity (reflectivities) instead of using partial inversions to something that seems easier to assimilate in the model (rain rates or derived humidity profiles for instance). One of the reasons is that inverted data contain errors that are very difficult to correct, because they are a mixture of real data, model a-priori information, interpolation errors and empirical assumptions about the atmosphere. In 3d- or 4d-var, it is not really more difficult to use real data than inverted data.
  • Assimilating reflectivities requires an observation operator that will simulate reflectivities for each radar, so the first part of the job is to develop a system to monitor radar data against model output, and ensure that the observation operator works well. Then we need to write the tangent-linear and adjoint versions of the observation operator, and check the accuracy of the linearization.
  • Biases and gross errors cannot be handled by 3d- and 4d-var. We need to develop automated software that will detect and remove data that are corrupt, too difficult to simulate, or biased (perhaps with the help of the model and/or independent observed data, e.g. from satellites). Bias correction requires a study of the space-and-time structure of the biases between the simulated and the observed data; each radar must be considered independently (a minimal monitoring sketch is given after this list). Bias correction and gross-error removal are normally done before the minimization process (e.g. in the observation preprocessing and the screening). Overly dense data must also be thinned consistently with the resolution of the analysis.
  • The physical process of the observation must be modelled as accurately as possible. For radars, at the planned resolutions, it means that we need to interpolate/average the model variables precisely along the radar-beam path (more on this below).
  • The physics of the instrument and its interaction with the atmosphere are complex. Fortunately, many people have studied them in detail, so we do not want to redo their work : we must rely on radar specialists to provide us with the physical part of the observation operator. Some interaction with them is necessary to make sure we are speaking about the same kind of instruments and resolutions (most research studies are done with sophisticated radars at very high resolutions, which are not relevant to our NWP plans). Ideally, they should write the physical part of the observation operator, and we should just plug that software into the (suitably interpolated) model fields. That is the way it already works with satellite radiances (the RTTOV software), and we shall try to make it work for radars, through international multi-disciplinary collaboration.
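
To fix ideas on the monitoring and bias study mentioned above, here is a minimal sketch (in Python, purely illustrative : the array names, units and thresholds are assumptions, not features of any existing ALADIN or ODB tool) of per-radar statistics of observed-minus-simulated reflectivity departures :

# Minimal per-radar monitoring of observed-minus-simulated reflectivity departures.
# Purely illustrative : input arrays, dBZ units and thresholds are assumptions.
import numpy as np

def radar_departure_stats(obs_dbz, sim_dbz, radar_id, gross_limit=30.0):
    """Per-radar bias and standard deviation of (obs - sim) departures,
    after discarding departures larger than 'gross_limit' dBZ as gross errors."""
    stats = {}
    departures = obs_dbz - sim_dbz
    for rid in np.unique(radar_id):
        d = departures[radar_id == rid]
        d = d[np.abs(d) <= gross_limit]          # crude gross-error check
        stats[rid] = {"count": int(d.size),
                      "bias": float(d.mean()) if d.size else float("nan"),
                      "std": float(d.std()) if d.size else float("nan")}
    return stats

# Example use : flag radars whose long-term bias exceeds an (assumed) 3 dBZ tolerance.
# for rid, s in radar_departure_stats(obs, sim, ids).items():
#     if abs(s["bias"]) > 3.0:
#         print(f"radar {rid}: suspicious bias of {s['bias']:.1f} dBZ")

Such statistics, accumulated over long periods and stratified in space and time, are what the bias correction and gross-error checks would be tuned on.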

4. Which model and data assimilation ?

    Reflectivities are sensitive to the cloud properties, so their processing is in principle affected by the way clouds are represented in the model (e.g. subgrid-scale diagnostic clouds in large-scale physics, or detailed prognostic mesoscale microphysics). Whether or not cloud-related fields are part of the control variable may also be important to obtain a good impact from the data. The constraint is that we will need to work simultaneously on ALADIN and AROME for the next few years. This is not a genuine problem, because we will in any case want to use radar data in large-scale models (ARPEGE and perhaps IFS), so the use of radar data should not be tied too closely to a particular set of parameterisations.

    Whatever the model, it seems fair to assume that the radar observation operator only needs to know about a few microphysical fields along the beam path (cloud and precipitation liquid water and ice, primarily) in order to work. The radar observation operator does not need to know how these fields are produced, so it can be developed independently from the physics. A physics-specific piece of software will be needed to interface with the physics, extract these fields, and interpolate them between the model and observation geometries.
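
    To illustrate this separation of concerns, here is a minimal sketch of such an operator (Python, for exposition only). The power-law coefficients and the input profile format are assumptions; the actual formulae are to be taken from the Meso-NH radar simulation code and from the radar specialists.

import numpy as np

# Minimal reflectivity observation operator : it only sees microphysical contents
# (here rain and snow, in kg/m3) already interpolated along the beam, plus the
# beam-filling weights. The power-law coefficients below are illustrative
# assumptions, not the Meso-NH values.
A_RAIN, B_RAIN = 3.6e9, 1.75    # assumed Z = a * M**b, Z in mm6/m3, M in kg/m3
A_SNOW, B_SNOW = 3.0e8, 1.66    # assumed coefficients for the ice/snow phase

def simulate_reflectivity(rain_content, snow_content, weights):
    """Simulate one radar pixel from microphysical contents along the beam.
    'weights' describe the beam-filling average (they sum to 1)."""
    z_lin = A_RAIN * rain_content**B_RAIN + A_SNOW * snow_content**B_SNOW
    z_pixel = np.sum(weights * z_lin)                    # beam-filling average
    return 10.0 * np.log10(max(float(z_pixel), 1.0e-3))  # convert to dBZ, with a floor

    Note that the conversion to reflectivity is independent of the parameterisation that produced the contents : only the interface that extracts and interpolates the fields is physics-specific.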

    Whether the analysis corrects microphysical fields or not is not relevant for radar reflectivity simulation : we assume that the fields are provided by the physics. In the tangent-linear and adjoint observation operators, it is not relevant either if (and only if) the fields are not part of the 3d- or 4d-var control variables, i.e. if their perturbation is kept to zero. If there are microphysical fields in the control variable, which should only be done if we are confident that we will know how to correct them in the analysis, then we will have issues of size and definition of the control variable (i.e. choosing variables with a Gaussian distribution of errors), and of multivariate coupling in the background constraint term (i.e. the "Jb" balance between cloud variables and other variables such as divergence, temperature and water vapour). This is a difficult and complex issue, which can hardly be studied until we have the radar observation operator and the prognostic cloud microphysics working in AROME : it will be studied later, probably around 2006-2008.

    5. Things to do

    The list of things to do before we can start radar-reflectivity assimilation experiments in ALADIN 3d-var is the following :

    1. Get some radar-reflectivity data samples of reasonably good quality. This means we need to be very careful about radar data that was originally designed for human visualisation : it probably requires very strict screening of suspicious pixels.
    2. Get an idea of the fields needed to simulate reflectivities. One can look at the simple radar simulation code in Meso-NH. It would be interesting to check the RSM (DWD, HIRLAM) model as well. It is essential to read a bit of scientific literature to learn what is available and what is the validity of the formulae. Discussing with the radar specialists is essential to check we are not going to start with something too silly. The Eumetnet/OPERA web database contains useful references and discussions on the available European radars.
    3. Decide on some simple (to start with) reflectivity simulation formulae following the work in (2). Check them approximately, e.g. by applying them to current ALADIN historical files, to get an impression of the problems to solve.
    4. Specify the observation-operator software completely by making a list of the necessary model fields to use (= to be interpolated for the observation operator), and of the necessary observation information (= measured data, complete identification of its expected quality, description in space and time of the relevant radar beam, all meta-data useful for the monitoring, quality control and bias correction) to be provided to the ALADIN screening and minimization. An important data access pattern is likely to be by beam, i.e. in polar form, rather than the regular gridded pixel matrices normally used for imagery. If we are confident that all along-the-beam effects (like attenuation by precipitation) are corrected before the analysis, then each pixel can be considered independently.
    5. Specify carefully the technical implementation of these specifications. At this stage one can concentrate on the observation operator (leaving the screening and quality control for later) : how will the model fields be extracted from the physics and interpolated/averaged along the radar beam ? (there are parallelization and adjoint issues to study) How will the observed data be implemented into ODB, with specific codes and meta-data ? (there are data volume issues to study)
    6. It is extremely important to propose code structures that will logically fit with the existing code (fields and observations). So one needs to understand very well the existing observation-operator code, to minimize the disturbance to other developers (on satellite data notably), and to think of the future by leaving room for volumic, Doppler and polarimetric radar data, and for other similar observation operators (= with a slanted interpolation path) : GPS-occultation, limb sounders, GPS-ground slant delays, line-of-sight wind for Doppler wind lidar, radiometers with a large footprint. The aim is not to code all of them, but to think how it could be done, and to write a reflectivity-simulation code that will be compatible with them. Doppler data may become available very soon after reflectivity data (it is already the case in many countries), so it may make sense to implement Doppler data processing at the same time as reflectivities.
    7. Document/write the detailed technical proposal, and have it read by the IFS /ARPEGE /ALADIN /AROME community before starting the heavy coding. It is usually nice to make some quick-and-dirty prototype code to see how things could work, but dirty code should not be merged with an official library release until it is cleaned and approved by everyone.
    8. Implement the "radar data" type into the observation processing : ODB generation, screening, minimization, monitoring. At this stage, the data is simply loaded into ALADIN, but not compared with the model. Check what it costs in CPU and memory with the planned data volumes.
    9. Implement the direct interpolation of model fields. This involves extracting the local gridpoint fields from the physics (or from the historical fields if they are available) and interpolating them along each relevant radar path, or just at the radar pixel location if there is no effect along the path, or if this effect (anomalous propagation ...) can be corrected without using model fields. The beam aperture will require some consistent averaging of model fields over several levels ("beam filling" problem : at 100 km of distance, the beam may be more than 1000 m wide in the vertical; see the beam-geometry sketch after this list). The result is model data at the time and place of each radar pixel. The code must be parallelized.
    10. Convert the interpolated model data into simulated reflectivities, compute the difference with the observed data, store it into ODB and include it into the "Jo" cost function computation. The result is a computation of the "Jo" component for radar data in the screening run, and the ability to monitor the "obs-model" departures in the ECMASCR ODB (using obstat or mandaodb).
    11. Study the monitoring statistics, and design bias correction, screening, thinning and quality control procedures. Implement them either in the screening (if they require model fields) or in a preprocessing program. Check the speed of execution : it has to be very quick, and probably run in parallel mode. This is very interesting scientific work, and it is crucial for the success of radar assimilation.
    12. Code the tangent-linear and adjoint versions of the observation operator, check the quality of the linearization in a few representative cases, and check the correctness of the adjoint (a standard test is sketched after this list). The result is that you can do a 3d-var minimization that uses radar data. Check that the speed and quality of the minimization are not badly affected. Check how much closer to the data the analysis is, compared to the first guess.
    13. Simulate one radar pixel, and check how it is used by 3d-var to correct the atmospheric fields.
    14. Study how one analysis with radar data modifies the ALADIN and AROME (or Meso-NH) forecasts, under several situations (fronts, scattered showers, strong convection) and resolutions (10 km and 2.5 km at least). There should be some improvement to the rain and clouds. Check that the spin-up is not badly affected.
    15. Run several cycles of data assimilation, to see if radar data has a cumulative (and beneficial !) effect on the forecasts : normally, they should be closer to radar (and other) data than when radars are monitored but not assimilated.
    16. Retune the preprocessing and analysis parameters, improve the observation operator, test how new radar types and new physics can make the assimilation more efficient. Run tests on field experiments to detect possible problems. Try better formulations of the background-error constraints to improve the structure of analysis increments.
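
    As a complement to step 9, a minimal sketch of the beam-geometry computation is given below (Python, using the standard 4/3 effective Earth-radius propagation model; the 1-degree beam width and the numbers in the comment are illustrative assumptions). It shows why the beam-filling averaging over several model levels is unavoidable at long range.

import numpy as np

# Height and vertical width of a radar beam as a function of range,
# with the standard 4/3 effective Earth-radius propagation model.
EARTH_RADIUS = 6.371e6           # m
KE = 4.0 / 3.0                   # effective Earth-radius factor

def beam_height_and_width(range_m, elevation_deg, beamwidth_deg=1.0, antenna_height=0.0):
    """Return (height of the beam centre above the antenna, vertical beam width)."""
    re = KE * EARTH_RADIUS
    el = np.radians(elevation_deg)
    height = (np.sqrt(range_m**2 + re**2 + 2.0 * range_m * re * np.sin(el))
              - re + antenna_height)
    width = range_m * np.radians(beamwidth_deg)   # small-angle approximation
    return height, width

# At 100 km range and 0.5 degree elevation, the beam centre is about 1.5 km above
# the antenna and the beam is about 1.7 km wide, so it spans many model levels.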
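
    For step 12, the standard correctness checks can be sketched as follows (Python; a random matrix stands in for the tangent-linear reflectivity operator, and the tolerance is an assumption) :

import numpy as np

# Standard adjoint and tangent-linear tests, written here for a generic operator.
# In practice H is the nonlinear reflectivity operator and H_tl / H_ad are its
# tangent-linear and adjoint; the random matrix below is only a stand-in.
rng = np.random.default_rng(0)
H_matrix = rng.standard_normal((5, 8))
H = lambda x: H_matrix @ x                      # "nonlinear" operator (linear here)
H_tl = lambda dx: H_matrix @ dx                 # tangent-linear
H_ad = lambda dy: H_matrix.T @ dy               # adjoint

# Adjoint test : <H'dx, dy> must equal <dx, H'*dy> to machine precision.
dx, dy = rng.standard_normal(8), rng.standard_normal(5)
lhs, rhs = np.dot(H_tl(dx), dy), np.dot(dx, H_ad(dy))
assert abs(lhs - rhs) <= 1.0e-10 * (1.0 + max(abs(lhs), abs(rhs)))

# Tangent-linear test : (H(x+eps*dx) - H(x)) / eps should converge to H'dx.
x = rng.standard_normal(8)
for eps in (1e-2, 1e-4, 1e-6):
    ratio = (np.linalg.norm((H(x + eps * dx) - H(x)) / eps)
             / np.linalg.norm(H_tl(dx)))
    print(f"eps={eps:g}  ratio={ratio:.12f}")   # exactly 1 here (linear stand-in);
                                                # tends to 1 for the real operator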

    6. Conclusion : the ALADIN work plan

    As one can see from the above list, there is a lot of work to do so it should be shared between several people. The main work processes are :

    scientific input : V. Ducrocq and her collaborators (O. Caumont, JP. Pinty, the radar labs and experts) give advice on the first simulation formulae to use. Basically, there are scalar formulae available in the Meso-NH software that can be duplicated (they will be improved by GMME, LA, etc. later, but that should not change the basic data requirements that drive the design of the observation-operator software). Getting a simple simulation model to start with is important, in order to understand the issues and do the technical prototyping work; we will upgrade ALADIN when the scientists have new methods to recommend, but we do not need to wait for them to start the technical work. The simplest formulae only require cloud liquid water and ice at the radar pixel location. These fields can already be taken from the ALADIN physics (the interface already exists for the ECMWF use of radiances). contacts : V. Ducrocq, J.P. Pinty, O. Caumont, P. Tabary (at Météo-France).
    interpolation stencil : A difficult question to answer quickly is : can we use each radar pixel with model fields at the pixel location only, or does each pixel require fields along the radar beam path ? One must check with the literature and the specialists to get the right answer. If we only need pixel data, the interpolation is much simpler, but getting this part of the software strategy wrong may compromise the entire effort on radar data. Studying this should be the first part of the work. One also needs to understand some of the science behind the development project. (same contacts as above - some literature reading is required).
    ingestion : In parallel, one can probably start working on introducing radar data into ODB. This is rather long and tedious work, so it is best to start early. One needs to create codes to identify potentially useful radar data and meta-data, and to design a simple layout for ODB. The new code needs to be approved at GMAP and ECMWF, and then one should develop software to write some sample radar data into ODB, and to read it from inside IFS /ARPEGE /ALADIN and from the monitoring software. contacts : P. Caille, P. Moll, D. Puech, ALADIN ODB specialists, (E. Andersson at ECMWF for approval).
    observation operator design : The big part is the interpolation part of the observation operator. This is where most of the technical development and problems will be. The issues are (1) the parallelization : a radar beam will in general be scattered across several processors. What should be reorganized : the field data or the observation data ? (2) the memory : there will be many pixels (think of volumic scanning !). Is it best to extract the gridpoints needed to interpolate each beam, or to get all the gridpoints for a group of beams and access them directly as needed on a single processor ? (3) the efficiency : computations must be organized in an efficient way inside the model code. Current observation operators have a horizontal interpolation (using the SL code), and a vertical interpolation/physical simulator code. The optimal organization that saves CPU, communications and memory at the same time is probably different. contacts : the design should be done in collaboration between an ALADIN specialist (M. Jurasek at SHMI was suggested) and someone from GMAP (probably E. Wattrelot) who knows the code well. Design decisions must be checked by code experts like P. Moll, C. Fischer and M. Hamrud.
    development : The rest of the work (explained in the previous section) is rather linear once the interpolation and ODB parts are worked out. It should always involve regular contacts between
    - at least one dedicated "ALADINist" (e.g. M. Jurasek but more are possible),
    - at least one dedicated "GMAPist" (e.g. E. Wattrelot),
    - the "radar science team" of GMME (V. Ducrocq and colleagues), who may provide contacts with other members of the science community when necessary.
    collaboration : There will certainly be a long, but crucial, exchange of information with ECMWF to reach agreements on technical aspects of ODB modifications and of the new observation operator(s), and to debug the first common cycles with these modifications.

    Scientific publications can be written as soon as the monitoring works. Ideally the ALADIN work should start by providing quasi-operational monitoring of some radars over long periods, with code that can be validated against special test cases studied at GMME. This goal can probably be reached sometime in 2004. Then, we can start work on actual assimilation, which will provide the first experiments showing quasi-operational radar impact, probably in 2006. It is too early to foretell whether the first successful results will be obtained with the ALADIN-10 km model or with one of the AROME models (10 km or 2.5 km).

    In parallel with the operationally oriented work, which is the priority for AROME and ALADIN, there will be some "moving target" evolution with the deployment of new radars and the development of better observation-operator software (for preprocessing - e.g. propagation of the radar beam - and physical simulation - e.g. interaction with microphysics). We will need to decide from time to time whether we want to continue quickly with the old data/software, or whether it is more productive to redo some of the work with new and better data/software.

    Comments and initiatives on these issues are welcome from all ALADIN participants who are willing to embark quickly on this complex and time-constrained project.