The Mars Analysis Correction Data Assimilation (MACDA) Dataset V1.0

The Mars Analysis Correction Data Assimilation (MACDA) dataset version 1.0 contains the reanalysis of fundamental atmospheric and surface variables for the planet Mars covering a period of about three Martian years (a Martian year is about 1.88 terrestrial years). This has been produced by data assimilation of observations from NASA's Mars Global Surveyor (MGS) spacecraft during its science mapping phase (February 1999–August 2004). In particular, we have used retrieved thermal profiles and total dust optical depths from the Thermal Emission Spectrometer (TES) on board MGS. Data have been assimilated into a Mars global climate model (MGCM) using the Analysis Correction scheme developed at the UK Meteorological Office. The MGCM used is the UK spectral version of the Laboratoire de Météorologie Dynamique (LMD, Paris, France) MGCM. MACDA is a joint project of the University of Oxford and The Open University in the UK.


Introduction
Ever-increasing numbers of atmospheric observations from orbiting spacecraft, and increasingly sophisticated numerical models, have recently permitted data assimilation techniques to be applied to planets beyond Earth. A meteorological 'reanalysis' is the application of a single consistent scheme to assimilate data spanning an extended historical period. Mars is the first extraterrestrial planet for which reanalyses of the atmospheric state are now available (Montabone et al., 2006a; Lewis et al., 2007; Greybush et al., 2012; Lee et al., 2013).
The Thermal Emission Spectrometer (TES) (Christensen, 2001) on board NASA's Mars Global Surveyor (MGS) has produced an extensive atmospheric dataset during its scientific mapping phase between February 1999 and August 2004. The well-sampled spatial and temporal coverage given by the 2-h, sun-synchronous polar orbit has permitted the observation of Mars at local times centred around 2 AM and 2 PM (at tropical and mid-latitudes), while displacing about 30° in longitude at each new orbit, corresponding to about 12 complete orbits per mean solar day.
Thermal profiles for the atmosphere up to about 40 km altitude and infrared column dust optical depths (almost entirely limited to daytime) have been retrieved from TES absorption spectra in nadir view, among other products (Smith, 2004). These data cover almost three complete Martian seasonal cycles, from late northern summer in MY 24 to late northern spring in MY 27. This dataset of global atmospheric observations is ideal for data assimilation into a global climate model. We have, therefore, used it to produce the 4-year (MY 24-27) reanalysis of the atmosphere of Mars which is described in this article.
The use of this reanalysis for scientific studies has already led to several publications. In particular, here we mention those on the interannual variability in dust storms (Montabone et al., 2005) and their impact on the landing of NASA's Mars Exploration Rovers as well as ESA's Beagle 2 (Montabone et al., 2006b), on the interannual variability in thermal tides (Lewis and Barker, 2005), on a teleconnection event during the planet-encircling dust storm in 2001 (MY 25, Martínez-Alvarado et al., 2009), on the radiative effects of tropical water ice clouds (Wilson et al., 2008), and on Martian weather predictability (Rogberg et al., 2010). More recent ongoing work involves the use of the MGS/TES Mars Analysis Correction Data Assimilation (MACDA) reanalysis for the study of the solstitial pause in the intensity of high latitude baroclinic waves, studies of the Martian boundary layer, and the interannual variability in polar vortex dynamics and atmospheric angular momentum.
In Section 2, we summarize key information about the MGS/TES observations, the global climate model and the data assimilation scheme that we have used. Section 3 is devoted to the description of the MACDA MGS/TES v1.0 dataset. In Section 4 we describe the available web interface for the dataset visualization. How to access the database and the visualization tool is detailed in Section 5. Finally, we mention future improvements of the database in Section 6.

MGS/TES Observations and Data Assimilation
TES nadir retrievals of thermal profiles and total (i.e. integrated over the whole atmospheric column) dust optical depths have been analysed by assimilation into a Mars global climate model (MGCM), making use of a sequential procedure known as the Analysis Correction (AC) scheme. This is a form of successive corrections method which was originally developed for Earth data assimilation at the Meteorological Office (Met-Office) in the UK (Lorenc et al., 1991).
Only a limited number of TES limb profiles are available, and these are not used in the current assimilation. Our reanalysis of TES retrievals, therefore, does not include observations of temperature above about 40 km altitude.
TES retrievals of absorption-only column dust optical depth are in the infrared (wavenumber around 1075 cm⁻¹, i.e. a wavelength of about 9.3 μm), whereas the GCM radiation scheme computes dust heating rates based on mean visible opacities (about 670 nm). To convert to equivalent visible values, infrared dust opacities from TES have been multiplied by a factor of 2.0. This factor includes the value for the conversion from absorption-only to full extinction (absorption and scattering), which Smith (2004) and Wolff (2006) indicate as roughly 1.3. Clancy et al. (1995, 2003), Lemmon et al. (2004), and Wolff (2006) provide values for the conversion factor from infrared extinction to visible optical depth, measured in several observational campaigns. For dust particle sizes in the range 1.5-2.0 μm, the average of these values is 2.5 ± 0.6, which has a large associated uncertainty. By choosing a single factor 2.0 to convert from infrared absorption to visible extinction optical depth, we might underestimate the mean visible opacities, but given the large uncertainties on particle sizes at different seasons and locations, this might not be the case at all times and places. Montabone et al. (2006a), for instance, showed that there are no significant differences in the results of the assimilation during the 2001 (MY 25) planet-encircling dust storm when using visible extinction/infrared absorption factors between 1.5 and 2.5.
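As a minimal illustration of the conversion described above, the following Python sketch applies the single factor of 2.0 to a TES infrared absorption optical depth and shows the spread obtained for factors between 1.5 and 2.5. The function and constant names are hypothetical and chosen only for this example; the numerical factors are those quoted in the text, and the "implied" infrared extinction to visible factor is simple arithmetic (2.0 divided by 1.3), not a value taken from the retrieval literature.

```python
# Factors quoted in the text; all names below are illustrative only.
IR_ABS_TO_VIS_EXT = 2.0  # single factor adopted in the reanalysis
IR_ABS_TO_IR_EXT = 1.3   # approximate absorption-to-extinction factor


def visible_from_tes(tau_ir_abs, factor=IR_ABS_TO_VIS_EXT):
    """Convert a TES infrared absorption optical depth to an equivalent
    visible extinction optical depth by a single multiplicative factor."""
    if tau_ir_abs < 0.0:
        raise ValueError("optical depth must be non-negative")
    return factor * tau_ir_abs


def implied_ir_ext_to_vis(total=IR_ABS_TO_VIS_EXT, abs_to_ext=IR_ABS_TO_IR_EXT):
    """Infrared-extinction-to-visible factor implied by the two numbers
    above (plain arithmetic, not an observed value)."""
    return total / abs_to_ext


if __name__ == "__main__":
    tau_ir = 0.4  # arbitrary example value
    for f in (1.5, 2.0, 2.5):
        print(f"factor {f:.1f}: visible optical depth = {visible_from_tes(tau_ir, f):.2f}")
    print(f"implied IR extinction -> visible factor: {implied_ir_ext_to_vis():.2f}")
```

Because the conversion is a single multiplication, users who prefer a different factor can rescale the dataset values after the fact; the dynamical fields, however, were produced with the factor of 2.0.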
In the version of the reanalysis described in this article, we have not used the dust lifting, transport and sedimentation model available in the MGCM to carry out the complete assimilation of dust observations. We have just continuously updated the prescribed column-integrated dust optical depth field in the MGCM with increments from the analysis of total dust optical depth retrievals, when observations are available. When there are no dust observations available, the dust field is simply kept constant until new observations become available again. The vertical distribution of the dust is analytically prescribed in the model using the Conrath distribution (Conrath, 1975), see also details in Montabone et al. (2006a).
It is also worth mentioning that the MGCM used to produce the reanalysis described in this article does not include the microphysical modelling of carbon dioxide condensation, particularly under supersaturated conditions. Instead, this version of the MGCM uses a simple scheme for condensation and sublimation of carbon dioxide, based on not exceeding saturation. To avoid too much condensation as a result of supersaturation, and therefore too much seasonal and interannual variation in surface pressure with respect to observations by Viking landers, we have not assimilated TES temperature profiles that exhibit values below the carbon dioxide condensation temperature (see also Montabone et al., 2006a). The number of such profiles represents about 8.2% of the total number of available retrieved profiles (over 50 million).
The number of observations available to assimilate after the quality control procedure detailed in Montabone et al. (2006a) is shown in Figure 1 for temperature (day and night sides) and Figure 2 for total dust optical depth (only day side). We also provide these data as supporting information of the article (Data S1, file in NetCDF format, which has a self-descriptive header). There are gaps in the data coverage, particularly in the dust optical depth observations at polar latitudes during polar nights, where the thermal contrast between surface and atmosphere makes it difficult to retrieve this variable. When the gap in temperature observations is of the order of or longer than the Martian radiative time scale (1-2 sols on average), the state of the atmosphere is no longer constrained by observations, particularly during the 'dusty season' in the second half of each MY. Lack of coverage in dust optical depth observations is less critical, except for the column-integrated dust optical depth field we provide in the database, which is obviously affected. Users of the MGS/TES MACDA v1.0 database should, therefore, refer to Figures 1 and 2, and check the provided NetCDF file, to verify the observation coverage. This is particularly the case when dataset variables show sudden changes, which might originate from transitions to free-running GCM states. Data gaps in the NetCDF file provided as supporting information are clearly identified as zeros (white) in the number of available observations.
The MGCM used to produce the dataset described in this article is an earlier version of the spectral GCM. Lewis and Read (1995) and Lewis et al. (1996, 1997) first tested the implementation of the AC data assimilation scheme in the MGCM. It has since been adapted to assimilate TES retrievals using observations made during the less-than-ideal MGS aerobraking period between September 1997 and January 1998 (Lewis et al., 2007). The reanalysis we present here is based on the assimilation of TES retrievals using observations made during the subsequent MGS science mapping phase. Montabone et al. (2006a) describe both the assimilation procedure and the validation of the mapping phase reanalysis. One main difference between the reanalysis dataset described in this article and the one described in Montabone et al. (2006a) is that TES retrievals have since been revised. The revision has been characterized by four basic improvements: (1) surface temperature has been retrieved simultaneously with aerosol optical depth, (2) the model for the spectral dependence of dust and ice absorption has been updated, (3) the absorption from minor 'hot bands' of carbon dioxide has been treated by reading from a map instead of attempting their retrieval from each individual spectrum and (4) water ice has been restricted to form above the water condensation level instead of assuming a well-mixed profile. In relation to point (4), there is no impact of this change on the temperature retrievals, but there is some potential impact (albeit small) on the dust retrievals, because dust and ice optical depths are retrieved simultaneously.

Dataset Description
The MGS/TES reanalysis version 1.0 is available from 141° solar longitude in MY 24 through 86° solar longitude in MY 27.
The reanalysis dataset is divided into 63 data files, each one including data for 30 Martian sols. All 30-sol periods are consecutive, with no interruption. With the assistance of the British Atmospheric Data Centre (BADC), the data files have been made available in CF-NetCDF format, where the metadata used conform to the international 'Climate and Forecast' (CF) standard. The advantage of producing standard-compliant data files is that it promotes easy access using several types of software, data reuse, compatibility, and cross-disciplinarity. Only two variables included in the database and specifically related to the Martian calendar are not (yet) standard CF variable names. These are the 'Martian year' and the 'sol' (or Martian mean solar day).
The name of each file includes the approximate (integer) solar longitude and MY of the first and last available sols within the file. The format for the file names is as follows: mgs-tes-reanalysis_mars_MY*_Ls*_MY*_Ls*_v1-0.nc, where the asterisks correspond to the values of MY and solar longitude of the first and last sols. Each NetCDF file contains the same header with detailed information about the variable dimensions, a short description of all the variables that are present in the file (including units and CF standard names), and general information about the dataset (i.e. global attributes of the NetCDF file). The 63 data files, each about 295 MB in size, have been added to the BADC archive, where they are freely available for download following the procedure explained in Section 5. The total size of the uncompressed dataset is about 18.6 GB.
We briefly describe here the variables included in the dataset, and provide information that we consider useful for potential users.

Dimensions
Each NetCDF file includes variables which can depend on up to three spatial dimensions and one temporal dimension (see Table 1). Dimensions are integers with no units.
There are 6 one-dimensional variables (longitude, latitude, model sigma level, sol, solar longitude and MY), 4 three-dimensional variables (amount of deposited carbon dioxide ice, surface pressure, surface temperature, and total dust optical depth), and 3 four-dimensional variables (atmospheric temperature, zonal and meridional wind components). This gives a total of 13 variables.

One-dimensional spatial variables
The three spatial variables are reported in Table 2.
Longitude and latitude values are provided with 5° spacing. All variables that depend on the longitude and latitude dimensions are, therefore, provided on a 5° × 5° horizontal grid. Given Mars' mean radius (3389 km), this corresponds to 296 km resolution at the Equator.
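The resolution figure quoted above follows from simple spherical geometry. The short Python sketch below reproduces it and shows how the east-west spacing shrinks away from the Equator. It assumes a perfectly spherical planet with the mean radius given in the text; the function names are hypothetical and used only for illustration.

```python
import math

MARS_MEAN_RADIUS_KM = 3389.0  # mean radius quoted in the text
GRID_SPACING_DEG = 5.0        # horizontal grid spacing of the dataset


def meridional_spacing_km(spacing_deg=GRID_SPACING_DEG):
    """North-south distance (km) between adjacent latitude points."""
    return 2.0 * math.pi * MARS_MEAN_RADIUS_KM * spacing_deg / 360.0


def zonal_spacing_km(latitude_deg, spacing_deg=GRID_SPACING_DEG):
    """East-west distance (km) between adjacent longitude points
    at the given latitude, on a sphere."""
    return meridional_spacing_km(spacing_deg) * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    for lat in (0.0, 45.0, 60.0):
        print(f"latitude {lat:4.1f} deg: zonal spacing {zonal_spacing_km(lat):6.1f} km")
```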
The vertical grid is determined by the model sigma levels, which are non-dimensional terrain-following levels, with values between 1 at the ground and 0 at infinite distance from the ground. The sigma value at a particular model level is defined as the ratio between the atmospheric pressure at that level and the surface pressure, for each horizontal grid point. The atmospheric pressure at each model level and grid point can, therefore, be calculated by using the formula p(i, j, k) = p_surf(i, j) · lev(k), where i, j, k are indices of longitude, latitude, and level, p is the atmospheric pressure, p_surf is the surface pressure value and lev is the sigma value. One can also associate a pseudo-altitude above the local surface to each model level, using the formula z_p = −H ln(lev(k)), where z_p is the pseudo-altitude value, and H is the Martian scale height (about 10 km).
The vertical levels are not evenly spaced. They are denser closer to the ground and more widely spaced when they are closer to the top of the model. The first (lowermost) level has a pseudo-altitude of about 5 m; the last (uppermost) level has a pseudo-altitude of about 98 km. On average, they correspond to pressures ranging between 610 Pa and 0.034 Pa. The last three levels are also used as 'sponge levels' in the MGCM, to inhibit the reflection of vertically propagating waves (see also Forget et al., 1999).
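The two formulas for the vertical grid can be sketched in a few lines of Python. The sigma values used in the example are arbitrary illustrations, not the actual model levels (which should be read from the 'lev' variable in the data files), and the function names are hypothetical.

```python
import math

SCALE_HEIGHT_KM = 10.0  # approximate Martian scale height quoted in the text


def pressure_at_level(p_surf_pa, sigma):
    """Atmospheric pressure (Pa) at a sigma level: p = p_surf * lev."""
    return p_surf_pa * sigma


def pseudo_altitude_km(sigma, scale_height_km=SCALE_HEIGHT_KM):
    """Pseudo-altitude (km) above the local surface: z_p = -H ln(lev)."""
    return -scale_height_km * math.log(sigma)


if __name__ == "__main__":
    p_surf = 610.0  # Pa, a typical mean surface pressure
    # Illustrative sigma values only (NOT the actual model levels).
    for sigma in (0.9995, 0.5, 0.01):
        print(f"sigma {sigma:6.4f}: p = {pressure_at_level(p_surf, sigma):7.2f} Pa, "
              f"z_p = {pseudo_altitude_km(sigma):6.3f} km")
```

With a surface pressure of 610 Pa, a sigma value of exp(−9.8) gives a pseudo-altitude of 98 km and a pressure of about 0.034 Pa, consistent with the range quoted for the uppermost level.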
The pseudo-altitude value is only a rough approximation to the real altitude. To calculate the precise altitude of a particular model level at a required grid point and time, the user needs to integrate the hydrostatic equation using the appropriate atmospheric temperature profile for that grid point and time.
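A minimal sketch of such a hydrostatic integration is given below, written in Python. It assumes an ideal gas, a constant gravitational acceleration of 3.72 m s⁻², a specific gas constant of 191 J kg⁻¹ K⁻¹ for the carbon dioxide dominated atmosphere, and a layer-mean temperature taken as the average of adjacent levels; the layer between the surface and the lowest level is assigned the temperature of the lowest level. These constants and simplifications are assumptions of this example, not values taken from the dataset or the MGCM, and the function name is hypothetical.

```python
import math

R_GAS = 191.0   # J kg^-1 K^-1, assumed specific gas constant for Mars air
G_MARS = 3.72   # m s^-2, assumed constant with height


def heights_above_surface(sigmas, temps):
    """Integrate dz = -(R T / g) d(ln sigma) upward from the surface.

    sigmas: sigma values ordered from the lowermost level upward.
    temps:  atmospheric temperatures (K) at the same levels, for one
            grid point and one time.
    Returns the height (m) of each level above the local surface.
    """
    if len(sigmas) != len(temps):
        raise ValueError("sigmas and temps must have the same length")
    heights = []
    z = 0.0
    prev_sigma, prev_temp = 1.0, temps[0]  # surface layer uses lowest-level T
    for sigma, temp in zip(sigmas, temps):
        layer_mean_temp = 0.5 * (prev_temp + temp)
        z += R_GAS * layer_mean_temp / G_MARS * math.log(prev_sigma / sigma)
        heights.append(z)
        prev_sigma, prev_temp = sigma, temp
    return heights


if __name__ == "__main__":
    # Illustrative profile only (NOT actual model levels or temperatures).
    example_sigmas = [0.99, 0.8, 0.5, 0.1]
    example_temps = [215.0, 205.0, 190.0, 160.0]
    for s, z in zip(example_sigmas, heights_above_surface(example_sigmas, example_temps)):
        print(f"sigma {s:4.2f}: z = {z / 1000.0:6.2f} km")
```

For an isothermal profile this reduces to the pseudo-altitude formula with a scale height of R T / g, which is about 10 km for temperatures near 200 K.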

One-dimensional temporal variables
The three temporal variables are reported in Table 3.
Our main continuous time variable in the dataset is the Martian mean solar day (sol), which does not reset to zero at the beginning of a new year. The solar longitude value, instead, resets to zero each time Mars crosses the position of the northern hemisphere spring equinox, thus defining the beginning of a new year.
The integer part of each time value defines the sol, and the decimal part defines the fraction of the sol, from which one can calculate the corresponding Mars Universal Time (MUT), i.e. the local time at the Prime Meridian. The time origin in the dataset (sol = 0.0) corresponds to midnight MUT at L_s = 0° in MY 24. It is important to remark here that it is only our convention to start the GCM with assimilation at midnight MUT at L_s = 0° in MY 24. The astronomical MY 24 northern spring equinox did not precisely occur when it was midnight MUT. There is, therefore, a constant bias of about 6 h between the model solar longitudes reported in the dataset and the astronomical solar longitudes. The bias is present to ensure that all observations are assimilated in the model using both their precise local time and solar declination, which are important parameters to calculate heating rates in the GCM, and have the correct relative time difference. The only way to remove the small offset in L_s, while retaining these much more important features of the assimilation, would be to introduce a more complex and complete ephemeris to the GCM. If one requires more precise values of L_s than those reported in the dataset, a good approximation consists in subtracting a constant offset of 0.12°.
Sols are divided into 24 Martian hours, and output fields are provided every 2 h (a period deemed to be useful for capturing large-scale waves and tides in Martian climate data without generating excessively large files, Lewis et al., 1999), beginning at 2 AM MUT on sol 301 (first available sol in the dataset, corresponding to L_s = 141.5° in MY 24) and ending at midnight MUT on sol 2190 (last available sol in the dataset, corresponding to L_s = 86.3° in MY 27; note that TES retrievals practically end after L_s = 82.5° in MY 27). Because the variable 'Sol' is not yet included in the standard CF variable list, there is no corresponding unit related to a standard Martian calendar.
In the dataset, we use the standard time unit for the Earth calendar referred to a reference date of 0000-00-0 00:00:00, mainly for ease of use of software that automatically recognizes time variable units. The values of the time variable, though, are intended as 'sols since 0.0' where 0.0 is the time reference in the dataset as mentioned above. The scientific community primarily uses a combination of MY and solar longitude when referring to the Martian calendar. This combination, though, is not satisfactory in a reanalysis dataset, because of the requirement to keep track of the fraction of a sol at the beginning of each new year, when the solar longitude resets to zero at sol 668.6. There is, therefore, the need to standardize a Martian calendar based on MYs, months and sols, which can be used for precise temporal determination in reanalysis datasets. One possible choice could be to divide the Mars orbit into 12 or 24 months of approximately 30° or 15° solar longitude each, and have leap years to accommodate the fraction of the sol left at the end of each year. We note that, because of the eccentricity of Mars' orbit, the relationship between solar longitude and sol number is not linear throughout a year, i.e. 1° of solar longitude corresponds to 2.04 sols at spring equinox, 2.15 sols at summer solstice, 1.66 sols at autumn equinox, and 1.58 sols at winter solstice. The number of sols in each month of such a Martian calendar would, therefore, need to change accordingly. Lewis et al. (1999) have adopted the convention to divide the MY into 12 months ('seasons') of 30° solar longitude, but they have approximated the number of sols in a year to 669. This convention simplifies the Martian calendar, but is not ideal to keep track of the fractions of sol at the end of each year in multiannual reanalyses.
At the time of writing, the CF community is engaged in discussions to standardize the Martian calendar and introduce the variable 'Sol' in the CF standard variable list. The 'Martian year' is not a standard CF variable either, and could be simply calculated from the time and solar longitude variables. We decided to explicitly include it for ease of use of the dataset. Future releases of the MACDA reanalysis may be able to be fully CF-compliant in relation to the Martian calendar.
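The time conventions described in this section can be sketched in Python as follows. The example splits a fractional sol value into the integer sol and the Mars Universal Time in Martian hours, and applies the constant 0.12° offset suggested in the text to the model solar longitude. The function names are hypothetical, and the code operates on plain numbers rather than on the NetCDF time variable itself.

```python
import math

LS_OFFSET_DEG = 0.12  # constant offset suggested in the text


def sol_to_mut(sol_value):
    """Split a fractional sol value into (integer sol, MUT in Martian hours).

    A sol is divided into 24 Martian hours, so the fractional part of the
    time value multiplied by 24 gives the local time at the Prime Meridian.
    """
    whole_sol = math.floor(sol_value)
    mut_hours = (sol_value - whole_sol) * 24.0
    return int(whole_sol), mut_hours


def astronomical_ls(model_ls_deg, offset_deg=LS_OFFSET_DEG):
    """Approximate astronomical solar longitude (degrees, in [0, 360))
    obtained by subtracting the constant offset from the model value."""
    return (model_ls_deg - offset_deg) % 360.0


if __name__ == "__main__":
    # First output field described in the text: 2 AM MUT on sol 301.
    sol, mut = sol_to_mut(301.0 + 2.0 / 24.0)
    print(f"sol {sol}, MUT = {mut:.2f} h")
    print(f"model Ls 141.50 deg -> approx. astronomical Ls {astronomical_ls(141.5):.2f} deg")
```

The modulo operation keeps the corrected solar longitude within 0° to 360° for values close to the northern spring equinox; in that case the Martian year associated with the corrected value may also need to be decreased by one.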

Three-dimensional variables
The four variables that provide time-evolving surface data are described in Table 4. Note that variable dimensions in a NetCDF file are usually reported in reverse order, e.g. when interrogating the file with the command ncdump. The proper order of the dimensions in every dataset array is the one indicated in Tables 4 and 5.
The 'surface carbon dioxide ice' variable refers to the mass per square metre of carbon dioxide deposited as ice on the surface. Surface pressure and surface temperature reflect the seasonal condensation and sublimation of carbon dioxide in the Martian atmosphere. In particular, in the presence of carbon dioxide ice on the ground, the surface temperature assumes the value equal to the carbon dioxide condensation temperature at the corresponding surface pressure. The surface pressure at each grid point shows a seasonal behaviour compatible with the cycle of condensation and sublimation of the carbon dioxide, which is the main component (about 95%) of the Martian atmosphere.
The 'total dust optical depth' variable is related to how much radiation (at average visible wavelengths of about 670 nm) would be removed from a beam during its path through the entire atmosphere by absorption and scattering due to airborne mineral dust. The total dust optical depth at each horizontal grid point in the dataset can be referred to a specific pressure level (e.g. 610 Pa) by dividing its value by the surface pressure and multiplying by the value of the reference pressure (assuming that dust is well mixed and the dust properties are consistent throughout). This interpolation/extrapolation eliminates the effects of topographical inhomogeneity.
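The normalization to a reference pressure level described above amounts to a simple rescaling, sketched here in Python. The function name is hypothetical, and the example values are arbitrary; the well-mixed dust assumption stated in the text is built into the formula.

```python
REFERENCE_PRESSURE_PA = 610.0  # example reference level quoted in the text


def normalise_dust_optical_depth(tau, p_surf_pa, p_ref_pa=REFERENCE_PRESSURE_PA):
    """Refer a column dust optical depth to a reference pressure level,
    assuming well-mixed dust: tau_ref = tau * p_ref / p_surf."""
    if p_surf_pa <= 0.0:
        raise ValueError("surface pressure must be positive")
    return tau * p_ref_pa / p_surf_pa


if __name__ == "__main__":
    # The same dust loading seen over high terrain (low surface pressure)
    # and low terrain (high surface pressure); values are arbitrary.
    for p_surf in (305.0, 610.0, 900.0):
        tau_ref = normalise_dust_optical_depth(0.3, p_surf)
        print(f"p_surf = {p_surf:5.1f} Pa: tau(610 Pa) = {tau_ref:.3f}")
```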
It should be noted that surface temperature, surface pressure, surface carbon dioxide amount, as well as the horizontal wind components, although not directly modified by observations, are indirectly adjusted by the data assimilation procedure, which dynamically modifies these fields according to the analysis of atmospheric temperature and the update of the model column-integrated dust optical depth. Because we have not used the dust lifting, transport, and sedimentation model available in the MGCM to carry out the complete assimilation of dust observations, the 'total dust optical depth' variable should not be considered as a fully analysed variable. We include it in the dataset to specify the dust distribution which has been used in the MGCM. We provide the total dust optical depth variable also at those times and locations where there is no information from TES total dust optical depth observations (see Figure 2). These values correspond to the observed values retained after the last available assimilation update.

Four-dimensional variables
The three variables that provide time-evolving atmospheric data are described in Table 5.
We remind users that atmospheric temperatures are directly modified by TES observations only below about 40 km altitude. Atmospheric temperatures in the boundary layer (where TES vertical resolution does not allow accurate retrievals) and above about 40 km altitude are indirectly modified by dynamical adjustment in the MGCM during the analysis procedure.
The zonal wind component is positive for eastward winds (westerlies), and the meridional wind component is positive for northward winds (southerlies).

Dataset Visualization
We have developed a web visualization tool (the 'MACDA Plotter'), coded in Python, that allows exploration of the variables included in the MACDA dataset. Figure 3 shows the graphical user interface which is displayed when the web interface (version 1.0) is opened, as explained in Section 5.
This version of the tool does not perform interpolation in time or space. The variables included in the dataset can only be displayed as maps at the times and sigma levels provided, which coincide with those described in Section 3. When a specific value of solar

Access to the Dataset, the Documentation and the Visualization Tool
The MACDA reanalysis dataset for MGS/TES v1.0 is archived at the British Atmospheric Data Centre (BADC, http://badc.nerc.ac.uk), Harwell Campus, Didcot (UK). This data centre is based in the Centre for Environmental Data Archival (CEDA) group.
The CF-NetCDF data files are freely available to all registered BADC users, as this allows the BADC to monitor the use of these data.
The registration to obtain a BADC username and password can be requested at http://badc.nerc.ac.uk/reg/user_register_info.html.
BADC MACDA webpage. This webpage can be accessed at http://dx.doi.org/10.5285/78114093-E2BD-4601-8AE5-3551E62AEF2B (or, alternatively, at the shortened URL: http://bit.ly/165Ulxd). It provides information about the dataset and its use (including copyright and disclaimer issues), references and documentation, as well as the hyperlinks to the data file archive, the visualization tool, and other related items (scroll down the webpage to the section 'Online References' to find the hyperlinks).
BADC MACDA Archive. Click on the 'MACDA: MGS/TES v1.0 data directory' hyperlink in the 'Online References' section on the webpage to access the archive and download the data files. BADC username and password will be required at this stage. The user can download a single data file or multiple files. The 'Download multiple files' box at the top right of the screen allows the user to select a set of files for downloading as a single gzipped tar file, but there is a limitation to 1 GB at a time. To download multiple files it may be easier to directly use the CEDA FTP service. If you want to use this option, connect to ftp://ftp.ceda.ac.uk/badc/mgs/data/macda/v1-0/ using your BADC username and password. There is no size limitation via the FTP server.
MACDA Visualisation tool. Click on the 'Data Visualisation' hyperlink in the 'Online References' section to access the web visualization tool, which does not require BADC username and password. The MACDA Plotter can also be directly accessed at http://macdap.physics.ox.ac.uk. See Figure 4 for a typical example of a plot (surface pressure at the beginning of the dataset). When using the tool, do not tick the option 'Prevent this page from creating additional dialogues' if your browser makes it appear in a dialogue box, otherwise the help message is disabled, but no access is allowed anyway until all mistakes are corrected.
For any data access problems, please contact the BADC helpdesk directly (http://badc.nerc.ac.uk/help/contact.html) or send an email to the corresponding author of this paper.
Scientific use of the data included in the MACDA dataset is freely allowed provided that the origin of the data is appropriately acknowledged in any publications. The correct reference to the dataset is provided at the beginning of this article, in the Section 'Dataset', or at the end in the Section 'References' (Montabone et al., 2011). The authors provide no warranties regarding the reliability, validity or accuracy of the data, and bear no responsibility for any use made of such data.

Future versions
Updated versions of the dataset will become available in future. People interested in the most recent version are encouraged to contact the corresponding author of this article to ascertain the status of the work in progress.
Future improvements of the MACDA reanalysis will include the following:
1. Update of the MGCM to the latest available version (Forget et al., 2011).
2. Full assimilation of dust observations using the lifting, transport and sedimentation model for dust particles available in the MGCM.
3. Parameterization of the carbon dioxide condensation under supersaturation conditions, to allow the assimilation of supersaturated temperature profiles.
4. Extension of the reanalysis period to cover observations from the Mars Reconnaissance Orbiter/Mars Climate Sounder radiometer (2006 to date, i.e. from late northern summer of MY 28 to beyond MY 31).
5. Inclusion of other available observations, particularly related to dust opacity.
6. Possible release of higher order diagnostic variables, such as vorticity and stream function.
7. Release of a 'control simulation', i.e. a GCM simulation that does not assimilate temperature observations. Such a simulation already exists for MACDA v1.0, but it is not currently publicly available. Interested people can nevertheless contact the corresponding author to request access to it.