Air temperature at core phenology sites and additional bird monitoring sites in the Andrews Experimental Forest, 2009 to present

CREATOR(S): Mark D Schulze, Matthew G Betts, Sarah J. K. Frey
PRINCIPAL INVESTIGATOR(S): Matthew G Betts, Mark D Schulze
ORIGINATOR(S): Sarah J. K. Frey
OTHER RESEARCHER(S): Adam M Kennedy, Sherri L. Johnson, Nina C. Ferrari
3 Jun 2016
24 Apr 2023
phenology, microclimate, air temperature
These air temperature data were collected to support phenological research focused on increasing our understanding of plant, insect, and bird responses to climate across the Andrews Experimental Forest. Species, taxa, and communities within and across trophic levels are likely responding differently to climatic drivers.
Experimental Design - MS045:

Sixteen core phenology sites were selected based on locations of long-term air temperature and vegetation data at reference stands, and the distribution of study sites was augmented with a few additional sites. At these Core sites, air temperature was measured year-round, and plant phenology and insect and bird abundances were measured during springtime from 2009 to 2014.

Additional sensors were placed at 40 Core Bird sites and 124 Auxiliary Bird sites. These locations were stratified across elevation, forest type, and distance to roads to ensure that the full environmental gradient was sampled, with a minimum distance between sampling points of 300 m.

Field Methods - MS045:
Description: At all the phenology air temperature sites, an Onset brand temperature sensor was placed 1.5 m above ground and shielded from direct sun. Shields were 8 inches long, made from 3.5-inch-diameter schedule 40 PVC pipe split in half lengthwise.
Instrumentation: At the 16 Phenology Core sites, Onset Hobo U22-001 sensors (accuracy 0.2 deg C) were programmed to record instantaneous air temperatures at 15-minute intervals. At the 40 Core Bird sites and the 128 Auxiliary Bird sites, Onset Hobo Pendant temperature and light sensors (UA002-64; accuracy 0.5 deg C) recorded instantaneous data at 20-minute intervals.
Statistics - MS045:

The high-resolution temperatures (15- and 20-minute intervals) were evaluated and questionable data were flagged. Before hourly averages were calculated, flagged data were removed and missing values were estimated using regression relationships with other temperature data from the Andrews Forest for that period.

Script name: (used to create entities 2 and 4)

This script was designed to flag, clean, average by hour, and fill data from air temperature sensors deployed at the HJ Andrews.

A revised Python program was used to flag the raw data in entity 5. This program also cleans and fills the data, but those products are not provided. A researcher can download the Python program from Bitbucket and the raw data from entity 5, and run the program. There is also a visualization program.

Datasets: The script creates folders: flagged, cleaned, reference, and filled. These contain:

  • flagged – original data with flags
  • cleaned – data with the flagged records removed
  • reference – cleaned data averaged into hourly timesteps
  • filled – cleaned data which has been filled using other files in the reference folder based on their respective regressions
  • summary – daily values with min, max and mean

Requires: Python, SciPy and NumPy

Note: This program was written for data loggers started during June (daylight saving time). Because Onset loggers use the computer clock for time stamps, raw times are in PDT, not PST. Hourly averaging shifts times to PST and matches the reference file format (where the hour represents the average of temperatures in the preceding hour). If original logger start dates are not in PDT, there is a line of code (currently line 313) that can be enabled.

Settings: The script may be run on a single file or a folder, which must be in the same directory as the script. Enter the file or folder name (e.g., INPUTFOLDER="Folder") and comment out the unused line (e.g., #INPUTFILE). Input files should contain the site name, as this name is retained through processing. Date limits should be specified under the '#Date limits' heading; these bookend the period in which the program will attempt to fill gaps using the available reference files. Reference files are stored together in a folder (e.g., REFERENCE_DIR = "RS data for PC sites") and are labeled with *reformatted* to distinguish them from reference files that have not been modified to match the required input format. Sites are added to the reference folder as they are run, so the script should be run twice if sites are not already in the reference folder and you want them available as references for other sites in the same batch. Site files with *cleaned*, *filled*, or *flagged* in their file names are skipped to avoid using processed data. Reference files must be in the correct format and include *reformatted* in their file names or they will not be used. These labels are consistent with the output files from this script as well as the script that converts reference data downloaded from the Andrews website.
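As a sketch of the file-naming rules above, reference-file selection could look like the following (the function name and directory layout are illustrative; the actual script's internals may differ):

```python
from pathlib import Path

def usable_reference_files(ref_dir):
    """Return reference files matching the naming rules described above:
    the name must contain 'reformatted', and files already marked as
    processed ('cleaned', 'filled', 'flagged') are skipped."""
    skip = ("cleaned", "filled", "flagged")
    return sorted(p for p in Path(ref_dir).glob("*.csv")
                  if "reformatted" in p.name
                  and not any(s in p.name for s in skip))
```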

Description: This script serves to flag, prune, average (by hour), and fill air temperature data as detailed below. It was applied to the raw data to create entities 2 and 4.

Step 1: Flagging (Original time steps); Output file – (input file name)_flagged_00-0000.csv, where 00-0000 is the month-year of the last data point

Flagging identifies for each line (date/time) entry:

  • nodata – Date/time recorded on the logger but contains no data.
  • Interval does not equal 15 or 20 minutes – The time between samples was not as programmed.
  • extreme – Any temperature exceeding 20 deg C or less than -20 deg C (outside of the sensor range).
  • jump – The change in temperature exceeds 5 deg C in one time interval.
  • air – A forward rolling window; flags when the variation in temperatures within the 24 hr period is greater than TVAR_MAX (specified at the opening of the code, currently 1.5 deg C) and the temperatures are below TMIN (currently 0.2 deg C).
  • air_past – Looks back and flags if a 'Snow' flag is present in the past 24 hours.
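The per-record checks above can be sketched as follows. This is illustrative, not the script's actual code: the function name and data layout are invented, and the thresholds are named after the constants mentioned in the text.

```python
import numpy as np

TVAR_MAX = 1.5   # deg C: variation threshold for the 24 h snow ('air') check
TMIN = 0.2       # deg C: snow check applies only when temperatures are below this
EXTREME = 20.0   # deg C: sensor range is +/- 20
JUMP = 5.0       # deg C: maximum plausible change in one time step

def flag_record(prev_temp, temp, next_24h):
    """Return flags for one reading; next_24h is the forward 24 h window."""
    if temp is None:
        return ["nodata"]
    flags = []
    if abs(temp) > EXTREME:
        flags.append("extreme")
    if prev_temp is not None and abs(temp - prev_temp) > JUMP:
        flags.append("jump")
    # 'air' check as stated above: window variation exceeding TVAR_MAX
    # with all temperatures below TMIN
    w = np.asarray(next_24h, dtype=float)
    if w.size and (w.max() - w.min()) > TVAR_MAX and w.max() < TMIN:
        flags.append("air")
    return flags
```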

Step 2: Pruning (Original time step); Output file – (input file name)_cleaned_00-0000.csv

Pruning removes lines flagged as extreme, air_past, air, jump, or nodata.
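Assuming each record carries a list of its flags, the pruning step reduces to a simple filter (a hypothetical sketch, not the script's code):

```python
# Flags whose presence causes a record to be dropped, per the list above.
REMOVE_FLAGS = {"extreme", "air_past", "air", "jump", "nodata"}

def prune(records):
    """Keep only records whose flags do not intersect the removal set."""
    return [r for r in records if not (set(r["flags"]) & REMOVE_FLAGS)]
```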

Step 3: Averaging (Hourly time step); Output file – (input file name)_00-0000_reformatted.csv

Averaging uses only values remaining after pruning. The number of values used to calculate each average is included as a new column. Notes: The command for saving this output file includes the path of the reference folder; if that folder is changed, it should also be changed in this section (or a new folder will be made with the files, but they will not be used for filling). Averaging follows the convention used for Andrews weather stations, where the hour represents the average of temperatures in the preceding hour. The output is in PST (while all previous outputs are in PDT, matching the raw input).
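The preceding-hour convention can be sketched as follows (illustrative only; the PDT-to-PST adjustment of minus one hour would be applied to the resulting timestamps separately):

```python
from collections import defaultdict
from datetime import timedelta

def hourly_average(records):
    """records: (timestamp, temp) pairs at the native 15- or 20-minute step.
    Each output hour H is the mean of readings in (H - 1h, H], so the stamp
    labels the end of the hour it summarizes; the count of contributing
    readings is returned alongside the mean."""
    buckets = defaultdict(list)
    for ts, temp in records:
        end = ts.replace(minute=0, second=0, microsecond=0)
        if ts > end:                      # round up to the end of the hour
            end += timedelta(hours=1)
        buckets[end].append(temp)
    return {h: (sum(v) / len(v), len(v)) for h, v in sorted(buckets.items())}
```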

Step 4: Filling (Hourly time step); Output file – (input file name)_filled_00-0000.csv

The script uses cleaned data and compares remaining entries to reference files (see Settings). This is done as a linear regression of the cleaned data against each reference file; the output includes the R2, which can be found in the text file corresponding to the input file name. Prior to filling, the script creates placeholder hours bounded by the date range specified under "Date limits", which is the range in which filling is attempted. These placeholder values are set at 1000 degrees. The script fills missing (1000 degree) data by moving sequentially through the reference data in order of fit (R2). The linear regression equation is applied to the reference value for that data point, and the result replaces the 1000 degree placeholder. The reference file used to fill each temperature value is listed in a neighboring column. If all reference files are examined and no data are found to replace the missing value placeholder, the placeholder is retained; thus 1000 degrees should be treated as "no data".
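Under the assumption that the series and each reference file are aligned hour-for-hour, the filling step can be sketched as below (function and variable names are illustrative; NumPy's least-squares fit stands in for whatever regression routine the script actually uses):

```python
import numpy as np

NODATA = 1000.0  # placeholder retained when no reference can supply a value

def fill_gaps(series, references):
    """series: 1-D array with NODATA placeholders; references: dict mapping
    reference name -> array aligned with series. A line (series ~ reference)
    is fit over hours where both are valid; gaps are then filled from the
    best-fitting (highest R2) reference first, and the name of the reference
    that supplied each value is recorded in a parallel array."""
    series = np.asarray(series, dtype=float).copy()
    fits = []
    for name, ref in references.items():
        ref = np.asarray(ref, dtype=float)
        ok = (series != NODATA) & (ref != NODATA)
        if ok.sum() < 2:
            continue
        slope, intercept = np.polyfit(ref[ok], series[ok], 1)
        r2 = np.corrcoef(ref[ok], series[ok])[0, 1] ** 2
        fits.append((r2, name, slope, intercept, ref))
    source = np.full(series.shape, "", dtype=object)
    for r2, name, slope, intercept, ref in sorted(fits, key=lambda f: -f[0]):
        gap = (series == NODATA) & (ref != NODATA)
        series[gap] = slope * ref[gap] + intercept
        source[gap] = name
    return series, source
```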

Step 5: Max, min, mean (Daily time step); Output file – (input file name)_daily_00-0000.csv

The script ignores 1000 degree data and calculates daily max, min and mean temperature values from the filled dataset. The number of records (hours) used in the calculation is listed in the column 'count.'
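The daily summary step reduces to grouping by date while skipping the 1000 degree placeholders, roughly as follows (a sketch, not the script's code):

```python
from collections import defaultdict

NODATA = 1000.0

def daily_summary(hourly):
    """hourly: (date, temp) pairs from the filled file. Placeholders are
    ignored; each date maps to (min, max, mean, count of hours used)."""
    by_day = defaultdict(list)
    for day, temp in hourly:
        if temp != NODATA:
            by_day[day].append(temp)
    return {d: (min(v), max(v), sum(v) / len(v), len(v))
            for d, v in sorted(by_day.items())}
```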

Figures: Data points over time with flagged (and cleaned) data shown in red. Located within the "flagged" folder as .pdfs.

Quality Assurance - MS045:

After data were downloaded from the sensors, the high-resolution data were put through a series of programs for quality control and for filling missing values before the hourly averages were generated (entities 2 and 4). A major concern of quality control was detecting when sensors were buried by snow, because temperatures would then represent the snowpack rather than the air. When burial was detected, data were filled using regression relationships with other sensors. Regressions were calculated using the best fit with other sensors during periods when full data were available.

Note: Spikes in data associated with direct light impacting the sensors were not evaluated as part of the QC programs.

Note: For Phenology Core sites additional manual QA/QC was conducted to evaluate and correct snow flags and temperature spikes, resulting in data filling or reverting to original data, as needed.

For these entities (1-4), quality assurance and quality control were conducted on all temperature data collected. All data were averaged into hourly segments and run through a Python script to identify and flag impossible values, periods of missing data, and times when sensors were buried by snow. Data were further checked via manual QA/QC, and values were compared to those from nearby temperature stations to identify any erroneous snow flags (i.e., data flagged as snow burial when there was no snow at that site), as well as temperature spikes, missing data, and other questionable values not identified by automated QA/QC.

Raw data for entity 5 were flagged, but not filled, using the hja_hobo_clean Python programs. These data were checked for burial by snow, extreme values and jumps, and the influence of high/extreme light intensity.

Raw data for entity 6 were flagged, but not filled, using the GCE Toolbox workflow, based on the flagging algorithms in the hja_hobo_clean workflow. Like entity 5, these data were checked for burial by snow, extreme values and jumps, and the influence of high/extreme light intensity. Rather than having multiple data columns for each flag, as in entity 5, these data use an aggregated flagging system for each data variable (temperature and light).

Temperature data at bird monitoring sites used in publication (entities 3 and 4; 2009-2014) are available from PASTA:
HJ Andrews Phenology sites
HJ Andrews Experimental Forest