Six stream phenology sites were selected to evaluate the variability of springtime aquatic insect emergence and phenology (SA025) across the Andrews Forest. Temperature sensors were installed in these six streams, and temperatures were measured year-round so that thermal accumulation and degree days could be calculated.
At low elevation, a gaged stream through old-growth forest and a stream through young forest were selected. At high elevation, a similar pair of gaged streams was selected. Prior information on insect community composition and seasonal emergence (SA022) was used to select these sites.
Two additional sites were selected. One was a cold-water stream whose hydrology and thermal regime are strongly influenced by a headwater spring. The other was located near the cold-water spring, in a stream that goes dry in late summer.
The high-resolution temperature data (15- and 20-minute intervals) were evaluated, and questionable data were flagged. Before hourly averages were calculated, flagged data were removed and missing values were estimated using regression relationships with other temperature data from the Andrews Forest for that period.
Script name: WaTeR.py
This script was designed to flag, clean, average by hour, and fill data from air temperature sensors deployed at the HJ Andrews.
Datasets: The script currently creates folders: flagged, cleaned, reference, and filled. These contain:
Requires: Python, SciPy and NumPy
Note: This program was written for data loggers started during June (daylight saving time). Since ONSET uses the computer clock for time stamps, raw times are in PDT, not PST. Hourly averaging converts times to PST and matches the reference file format (where the hour represents the average of temperatures in the preceding hour). If the original logger start dates are not in PDT, there is a line of code (currently 313) which can be turned on.
Settings: The script may be run for a single file or for a folder, which should be located in the same directory as the script. Input the file or folder name (e.g. INPUTFOLDER="Folder") and comment out the unused line (e.g. #INPUTFILE). Input file names should contain the site name, as this name is retained through processing. Date limits should be specified under the '#Date limits' heading. These form the bookends within which the program will attempt to fill gaps using the available reference files. Reference files are stored together in a folder (e.g. REFERENCE_DIR = "RS data for PC sites") and are labeled with *reformatted* to distinguish them from reference files that have not been modified to match the required input format. Sites are added to the reference folder as they are run. If sites in a batch are not already in the reference folder and you want them to serve as reference files for other sites in the same batch, run the script twice. To avoid using processed data, site files are not used if they have *cleaned*, *filled* or *flagged* in their file name. Reference files must be in the correct format and include *reformatted* in their file name, or they will not be used. These labels are consistent with the output files from this script as well as from the script that converts reference data downloaded from the Andrews website (convert_reference_data.py).
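The file-name rules in the Settings can be sketched as a small filter. This is an illustration only, not code taken from WaTeR.py: the function name, the constant name, and the example file names are hypothetical, and the comparison is assumed to be case-insensitive.

```python
from pathlib import Path

# Labels that mark processed outputs; such files are never used as references.
EXCLUDED_LABELS = ("cleaned", "filled", "flagged")


def is_usable_reference(filename):
    """Return True if the file name qualifies as a reference file.

    Assumed rule, following the Settings text: the name must contain
    'reformatted' and must not contain any of the excluded labels.
    """
    name = Path(filename).name.lower()
    if any(label in name for label in EXCLUDED_LABELS):
        return False
    return "reformatted" in name


# Hypothetical file names, for illustration only.
candidates = [
    "SITE1_06-2010_reformatted.csv",
    "SITE1_cleaned_06-2010.csv",
    "SITE2_raw_download.csv",
]
usable = [name for name in candidates if is_usable_reference(name)]
print(usable)
```

Only the first hypothetical file passes the filter; the second carries a processed-data label and the third lacks the reformatted label.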
Description: This script serves to flag, prune, average (by hour), and fill air temperature data as detailed below.
Step 1: Flagging (Original time steps); Output file – (input file name)_flagged_00-0000.csv, where 00-0000 is the month-year of the last data point
Flagging identifies for each line (date/time) entry:
Step 2: Pruning (Original time steps); Output file – (input file name)_cleaned_00-0000.csv
Pruning removes lines flagged as extreme, air_past, air, jump, or nodata.
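A minimal sketch of the pruning rule follows. The flag names come from the text above. The record layout (a dictionary with a "flags" list) and the function name are assumptions made for illustration, and the column layout of the flagged output file may differ.

```python
# Flag names taken from the pruning description.
PRUNE_FLAGS = {"extreme", "air_past", "air", "jump", "nodata"}


def prune(rows):
    """Keep only rows that carry none of the pruning flags.

    Each row is assumed to be a dict with a 'flags' list; this layout is
    hypothetical and only serves the example.
    """
    return [row for row in rows if not PRUNE_FLAGS.intersection(row["flags"])]


rows = [
    {"time": "2010-06-01 10:15", "temp": 9.8, "flags": []},
    {"time": "2010-06-01 10:30", "temp": 24.1, "flags": ["air"]},
    {"time": "2010-06-01 10:45", "temp": 9.9, "flags": []},
]
cleaned = prune(rows)
print(len(cleaned))
```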
Step 3: Averaging (Hourly time step); Output file – (input file name)_00-0000_reformatted.csv
Averaging uses only values remaining after pruning. The number of values used to calculate the average is included as a new column. Notes: The command for saving this output file includes the path for the reference folder. If that folder is changed, the path should also be changed in this section (otherwise a new folder will be made with the files, but they will not be used for filling). Averaging follows the convention used for Andrews weather stations, where the hour represents the average of temperatures in the preceding hour. The output is in PST (while all previous outputs are in PDT, matching the raw input).
Step 4: Filling (Hourly time step); Output file – (input file name)_filled_00-0000.csv
The script compares the cleaned data to each reference file (see Settings) using a linear regression. The R2 of each regression is written to the text file corresponding to the input file name. Prior to filling, the script creates placeholder hours, set to 1000 degrees, for the date range specified under “Date limits”, which is the range in which filling is attempted. The script fills missing (1000 degree) data by moving sequentially through the reference files in order of fit (R2). The linear regression equation is applied to the reference value for that hour, and the result replaces the 1000 degree placeholder. The reference file used to fill each value is listed in a neighboring column. If all reference files are examined and no data are found to replace the placeholder, the placeholder is retained; 1000 degrees should therefore be treated as "no data".
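The preceding-hour convention and the PDT to PST shift can be sketched as follows. This is an illustration, not the script's code. The function name and record layout are hypothetical, and it is assumed that a reading stamped exactly on the hour belongs to the hour that ends at that time.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_average(records, shift_pdt_to_pst=True):
    """Average readings by hour, labelling each hour by its end time.

    records: iterable of (datetime, temperature) pairs at the original
    time step. Returns {hour_label: (mean_temperature, count)}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for stamp, temp in records:
        if shift_pdt_to_pst:
            stamp = stamp - timedelta(hours=1)  # PDT is one hour ahead of PST
        floor = stamp.replace(minute=0, second=0, microsecond=0)
        # Readings after HH:00 count toward the hour labelled HH+1:00.
        # Assumption: a reading exactly on the hour closes the preceding hour.
        label = floor if stamp == floor else floor + timedelta(hours=1)
        sums[label][0] += temp
        sums[label][1] += 1
    return {label: (total / n, n) for label, (total, n) in sorted(sums.items())}


# Hypothetical 15-minute readings stamped in PDT.
records = [
    (datetime(2010, 6, 1, 10, 15), 10.0),
    (datetime(2010, 6, 1, 10, 30), 11.0),
    (datetime(2010, 6, 1, 10, 45), 12.0),
    (datetime(2010, 6, 1, 11, 0), 13.0),
    (datetime(2010, 6, 1, 11, 15), 20.0),
]
for label, (mean, count) in hourly_average(records).items():
    print(label, mean, count)
```

The count returned with each mean corresponds to the column recording how many values went into the average.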
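The filling logic described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the script's implementation (which relies on SciPy and NumPy). The function names, the dictionary layout, and the reference names are hypothetical, and the least-squares fit is written out by hand.

```python
PLACEHOLDER = 1000.0  # value the text treats as "no data"


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r2


def fill_gaps(target, references):
    """Fill placeholder hours in target from the best-fitting reference.

    target: {hour: temperature}, with PLACEHOLDER for missing hours.
    references: {name: {hour: temperature}}.
    Returns {hour: (temperature, name_of_reference_used_or_empty_string)}.
    """
    fits = []
    for name, ref in references.items():
        shared = [h for h, t in target.items() if t != PLACEHOLDER and h in ref]
        xs = [ref[h] for h in shared]
        if len(set(xs)) < 2:
            continue  # not enough variation to fit a line
        slope, intercept, r2 = fit_line(xs, [target[h] for h in shared])
        fits.append((r2, name, slope, intercept))
    fits.sort(key=lambda fit: fit[0], reverse=True)  # best fit first

    filled = {}
    for hour, temp in target.items():
        source = ""
        if temp == PLACEHOLDER:
            for _r2, name, slope, intercept in fits:
                if hour in references[name]:
                    temp = slope * references[name][hour] + intercept
                    source = name
                    break
        filled[hour] = (temp, source)
    return filled


# Hypothetical data: hours are plain integers to keep the example short.
target = {0: 5.0, 1: 6.0, 2: 7.0, 3: PLACEHOLDER, 4: PLACEHOLDER, 5: PLACEHOLDER}
references = {
    "ref_a": {0: 10.0, 1: 12.0, 2: 14.0, 3: 16.0},
    "ref_b": {0: 1.0, 1: 3.0, 2: 2.0, 3: 9.0, 4: 9.0},
}
filled = fill_gaps(target, references)
print(filled)
```

In this example, hour 3 is filled from ref_a (the better fit), hour 4 falls through to ref_b because ref_a has no value for that hour, and hour 5 keeps the placeholder because no reference covers it.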
Step 5: Max, min, mean (Daily time step); Output file – (input file name)_daily_00-0000.csv
The script ignores 1000 degree data and calculates daily maximum, minimum, and mean temperature values from the filled dataset. The number of records (hours) used in the calculation is listed in the column 'count'.
Figures: Data points over time, with flagged (and cleaned) data shown in red. Saved as PDF files in the "flagged" folder.
After sensors were downloaded, high-resolution data were put through a series of programs for quality control and for filling missing values before the hourly averages were generated. A major concern of quality control was to detect when the sensors were out of the stream and recording air temperatures. When data anomalies or gaps were found, data were filled using regression relationships with other sensors from this project and with stream temperature data from stream gages. Regressions were calculated using the best fit with other sensors during periods when complete data were available.
After the averaging and filling, data were also visually evaluated for outliers. If outliers were found, estimates were calculated and inserted.
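The daily summary can be sketched as below. The function name and record layout are hypothetical. Hours are grouped by the calendar date of their time stamp, which is an assumption: under the preceding-hour convention the script may assign the midnight hour differently.

```python
from collections import defaultdict
from datetime import datetime

PLACEHOLDER = 1000.0  # "no data" marker left by the filling step


def daily_summary(hourly):
    """Return {date: {'max', 'min', 'mean', 'count'}} from hourly records.

    hourly: iterable of (datetime, temperature) pairs. Placeholder values
    are skipped, so 'count' is the number of hours actually used.
    """
    by_day = defaultdict(list)
    for stamp, temp in hourly:
        if temp != PLACEHOLDER:
            by_day[stamp.date()].append(temp)
    return {
        day: {
            "max": max(values),
            "min": min(values),
            "mean": sum(values) / len(values),
            "count": len(values),
        }
        for day, values in sorted(by_day.items())
    }


# Hypothetical hourly values, including one unfilled placeholder.
hourly = [
    (datetime(2010, 6, 1, 1), 8.0),
    (datetime(2010, 6, 1, 2), 10.0),
    (datetime(2010, 6, 1, 3), PLACEHOLDER),
    (datetime(2010, 6, 2, 1), 9.0),
]
summary = daily_summary(hourly)
print(summary)
```

A day with a low count has its statistics based on few hours, which is why the count column is worth checking before the daily values are used.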