The Madrigal data model and metadata files | Doc home | Madrigal home |
Understanding the Madrigal data model is an important step in understanding how Madrigal works. There is a correspondence between each level of the data model and the metadata files that are found in the MADROOT/metadata directory. In this section we describe each level of the Madrigal data model and the corresponding metadata file.
The highest level of Madrigal is a Madrigal site. A Madrigal site is one particular web site controlled by one particular group, that holds all their own data. While each Madrigal site stores their own data locally, they also share metadata with all the other sites. This makes it possible for users to search for data at all the Madrigal sites at once no matter which site they visit, and simply follow links to the Madrigal site that has the data they are interested in.
Metadata about all sites is stored in MADROOT/metadata/siteTab.txt. When new Madrigal sites are added, this table is updated and all Madrigal sites are notified so they can update this file. If a site is running Madrigal 2.5 or higher, this file will be automatically updated unless the file has been manually modified by the administrator in a way not reported to OpenMadrigal administrator. This file contains the following comma-separated fields:
The next layer of the Madrigal data model is the instrument. All data in Madrigal is associated with one and only one instrument. Any given Madrigal site will hold data from one or more instruments. Since Madrigal focuses on ground-based instruments, most instruments have a particular location associated with them. However, some Madrigal data is based on measurements from multiple instruments, and so have no particular location. Some examples are "EISCAT Scientific Association IS Radars" which combine data from the multiple EISCAT radars, and "World-wide GPS Receiver Network", which consists of over a thousand individual GPS receivers distributed around the globe.
Metadata about all instruments is stored in MADROOT/metadata/instTab.txt. When new Madrigal instruments are added, this table is updated and all Madrigal sites are notified so they can update this file. If a site is running Madrigal 2.5 or higher, this file will be automatically updated unless the file has been manually modified by the administrator in a way not reported to OpenMadrigal administrator. This instrument code list is usually consistent with the Cedar instrument list. This file contains the following comma-separated fields:
The instrument type table lists categories of instruments, to allow the user to search instruments more easily. If a site is running Madrigal 2.5 or higher, this file will be automatically updated unless the file has been manually modified by the administrator in a way not reported to OpenMadrigal administrator. This file contains the following comma-separated fields:
All the data from a given instrument is organized into experiments. An experiment consists of data from a single instrument covering a limited period of time, and, as a rule, is meant to address a particular scientific goal. Madrigal makes the assumption that instruments may be run in different modes, and so the data generated may vary from one experiment to another. By organizing one instrument's data into experiments, the purpose and limitations of each experiment can be made clearer. As a Madrigal administrator, you can also provide users with supplemental plots and documentation about that experiment, in addition to the standard Madrigal data files. See the section on creating and updating Madrigal experiments for more information.
Madrigal has a number of security codes for different types of experiments. Most experiments are public, which allows them to be accessed by everyone. An experiment can also be made private, which means that only users with set ranges of ip addresses can access them. (See Set private versus public access.) Private experiments are never shared with other Madrigal sites. There is also a hidden experiment state to completely remove an experiment from access by Madrigal (if, for example, the data is discovered to be corrupt, but might be fixed in the future).
There are also two experiment states for archiving experiments. These are public archive and private archive. These states are meant to support the archiving of Madrigal data at a central Madrigal site, such as the one at NCAR. These archived experiments are duplicates of experiments found at other Madrigal sites. In general these archived experiments are ignored by any part of the user interface that searches all sites, because the user will only want to find the main data source. However, when the user interface is only accessing local data, these archived experiments will appear. A private archived experiment is subject to the same restriction as a regular private experiment.
Metadata about all experiments is stored in MADROOT/metadata/expTab.txt. This file is automatically generated from individual expTab.txt files located in each experiment directory, as will be described in the next section on experiment organization. This file contains the following comma-separated fields:
There is also a file called MADROOT/metadata/expTabAll.txt which is also automatically generated. If differs from expTab.txt in that it contains experiment metadata from all Madrigal sites, not just the local one. Any remote experiment with a non-zero security code will be excluded.
The data from a given experiment is stored in one or more experiment files. There are a number of reasons there may be more than one file for a given experiment. The first is that different kinds of data may be stored in different files. Also, the experimental data may be analyzed in more than one way, leading to files with different sets of measured parameters. For these two cases, each file should have its own kindat code (see below). Another reason for multiple files is that older, historical files can be kept on-line for reference purposes.
With Madrigal 3, the format of these files is Hdf5 as defined by the CEDAR Madrigal Hdf5 file format. Each file may contain only one kindat. The category field is used to distinguish files which are of historical interest only, e.g. a file which have been superseded by a file with an improved electron density calibration. In some cases there may be more than one up-to-date variant of a file, e.g. when different analysis options have been chosen. In this case one of these files is designated the default, and the others are designated as variants.
Metadata about all experiment files is stored in MADROOT/metadata/fileTab.txt. This file is automatically generated from individual fileTab.txt files located in each experiment directory, as will be described in the next section on experiment organization. This file contains the following comma-separated fields:
There is also a file called MADROOT/metadata/fileTabAll.txt which is also automatically generated. If differs from fileTab.txt in that it contains experiment file metadata from all Madrigal sites, not just the local one.
Any given file is made up a series of records holding measured parameters. Note that based on which parameters are in the file, Madrigal will automatically derive a large number of other parameters such as Kp and Magnetic field strength that aren't in the file itself. In the web browser, measured parameters are shown in bold, derived parameters in normal font.
The metadata file parmCodes.txt contains information about what Madrigal or Cedar parameters are supported. It replaces the old format file parmCodes.txt used in Madrigal 2 because that old file limited the size of id codes with its column delimited format. . If a parameter has a parameter code of 0, it cannot be stored in a Cedar file and is meant the be a derived value only. All Madrigal mnemonics must be unique. All non-zero parameter codes must be unique. If a new parameter is desired, it should be done in coordination with Bill Rideout at brideout@haystack.mit.edu.
The file parmCodes.txt is comma delimited.
Parameter explanations
For parameters that cannot be fully described in the simple string in the parmCodes.txt file, additional explanation about the parameter or its corresponding error parameter can be added to the file madroot/doc/parmDesc.html. Simply create a new named anchor in that file, where the anchor name is the parameter mnemonic in all capitals. Following that, a description of arbitrary length can be given using html. Change one of the last two columns from 0 to 1 for that parameter in parmCodes.txt to let Madrigal know that this explanation exists. In general, the parameter order in parmDesc.html matches that of parmCodes.txt, but that is not a functional requirement.
The Madrigal category metadata file(madCatTab.txt) contains information about what categories Madrigal parameters belong in. The categories are similar to the Cedar categories, but do not follow them exactly. This file does not change. This file contains the following comma-separated fields:
The Madrigal data type (also called kind of data or kindat) metadata file(typeTab.txt) contains a list of all data types in the database. The purpose of the kindat is to uniquely identify the data processing algorithm used to compute the parameters in the associated Madrigal file. Each individual Madrigal site is free to update this table, as long as there are no duplicate kindats in your own file. It is best to simply keep the kindat codes in ascending order to aviod any chance of a duplicate. The kindat description must not contain a comma.
The instrument parameter metadata file (instParmTab.txt) contains information about what measured parameters are found in the data for any given instrument. This data is used to support the global database query web page, and is rebuild by updateMaster. This file contains the following comma-separated fields:
The instrument kindat metadata file (instKindatTab.txt) contains information about what kindat codes are used with any given instrument. This data is used to support the global database query web page, and is rebuild by updateMaster. This file contains the following comma-separated fields:
The instrument data metadata files (instData.txt and instDataPriv.txt) contain information about for what years and instruments each Madrigal site has data. Effectively these file are summaries of data in expTabAll.txt, and only exist to speed UI performance. Note that there will be only one line per kinst value. If muliple sites have data for the same instrument, the rules as to which siteID is used are:
The two files instData.txt and instDataPriv.txt diff only in that private data is included in instDataPriv.txt, whereas public data isd included in both.
This data is used to enhace performance of various web pages, and is rebuild by updateMaster.
This file contains the following comma-separated fields:
The bottom level of the Madrigal data model is of course the data itself. With Madrigal 3.0 and beyond, the actual file format is Hdf5 that follows the rules defined in the CEDAR Madrigal Hdf5 format document. A Madrigal file is made up of a series of records, each with a start and stop time, representing the integration period of measurement (Madrigal tries to enforce the idea that all measurements take a finite time, but sometimes the start time = the stop time).
Each Cedar parameter can also have an associated error value. This error value can have the special values "missing", "assumed", or "known bad". If an error parameter is "assumed", the implication is that the measured value itself is assumed, and does not represent a measured value. If the error value is "known bad", the measured data is known to have a problem.
The Madrigal data model and metadata files | Doc home | Madrigal home |