Prerequisite for Imputing Non-detects among Airborne Samples in OSHA's IMIS Databank: Prediction of Sample's Volume
This research aimed to develop a predictive model for the volume of air sampled in non-detected measurements in OSHA's IMIS databank. The results showed that the sampled air volume was right-skewed and varied based on substance, industry, and year. The developed model can help estimate the limit of detection for non-detected measurements, but separate models should be developed for each substance, industry, and year.
Abstract
Introduction
The US Integrated Management Information System (IMIS) contains workplace measurements collected by Occupational Safety and Health Administration (OSHA) inspectors. Its use for research is limited because no limit of detection (LOD) is recorded for non-detected measurements; the LOD is needed to set the censoring point in statistical analysis. We aimed to remedy this by developing a predictive model of the volume of air sampled (V) for non-detected airborne measurements, so that the LOD can be estimated from the instrument detection limit (IDL) as IDL/V.
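As a small illustration of the IDL/V relation, the following Python sketch converts an instrument detection limit and a sampled (or predicted) air volume into an LOD. The function name, the choice of units (IDL in micrograms, V in litres), and the example numbers are assumptions made for illustration; they are not values from the study.

```python
def lod_from_idl(idl_ug: float, volume_l: float) -> float:
    """Estimate the LOD as IDL / V.

    Assumed units: IDL in micrograms, V in litres. Because 1 ug = 1e-3 mg
    and 1 l = 1e-3 m^3, the ratio ug/l is numerically equal to mg/m^3.
    """
    if idl_ug < 0:
        raise ValueError("instrument detection limit cannot be negative")
    if volume_l <= 0:
        raise ValueError("sampled air volume must be positive")
    return idl_ug / volume_l


# Hypothetical non-detect: IDL of 10 ug and a predicted volume of 400 l.
example_lod = lod_from_idl(10.0, 400.0)
print(f"Estimated LOD: {example_lod:.3f} mg/m^3")
```

Because V appears in the denominator, an error in the predicted volume propagates inversely to the LOD: underestimating V overstates the censoring point, and overestimating V understates it.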
Methods
We obtained the Chemical Exposure Health Data from OSHA’s central laboratory in Salt Lake City; this dataset partially overlaps IMIS and contains information on V. We used classification and regression trees (CART) to develop a predictive model of V for all measurements where the two datasets overlapped. The analysis was restricted to 69 chemical agents with at least 100 non-detected measurements and to records whose calculated sampling air flow rates were consistent with workplace measurement practices; undefined types of inspections were excluded, leaving 412,201/413,515 records. CART models were fitted on a randomly selected 70% of the data using 10-fold cross-validation and validated on the remaining 30%. A separate CART model was fitted to styrene data.
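The data partitioning described above (a random 70% for model fitting with 10-fold cross-validation, and the remainder held out for validation) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the seed, and the record count of 1000 are hypothetical, and the tree fitting itself is omitted because the abstract does not say which software or settings were used.

```python
import random


def split_train_validation(n_records, train_fraction=0.7, seed=0):
    """Shuffle record indices and split them into training and validation sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_records))
    rng.shuffle(indices)
    n_train = int(round(train_fraction * n_records))
    return indices[:n_train], indices[n_train:]


def k_folds(indices, k=10):
    """Partition already-shuffled indices into k disjoint folds of near-equal size."""
    return [indices[i::k] for i in range(k)]


# Hypothetical dataset of 1000 records.
train_idx, valid_idx = split_train_validation(1000)
folds = k_folds(train_idx, k=10)

# In each cross-validation round, one fold is held out and the other nine
# are used for fitting; the validation set is never touched at this stage.
for held_out in range(len(folds)):
    fit_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    assert len(fit_idx) + len(folds[held_out]) == len(train_idx)
```

Keeping the 30% validation set outside the cross-validation loop means the reported agreement statistics describe records the tree never saw during fitting or tuning.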
Results
Sampled air volume had a right-skewed distribution with a mean of 357 l and a median (M) of 318 l, and ranged from 0.040 to 1868 l. There were 173,131 measurements described as non-detects (42% of the data). For the non-detects, V tended to be greater (M = 378 l) than for measurements characterized as either ‘short-term’ (M = 218 l) or ‘long-term’ (M = 297 l). The CART models were complex and not easy to interpret, but substance, industry, and year were among the top three most important classifiers. They predicted V well overall (Pearson correlation (r) = 0.73, P < 0.0001; Lin’s concordance correlation (rc) = 0.69) and among records captured as non-detects in IMIS (r = 0.66, P < 0.0001; rc = 0.60). For styrene, CART built on measurements for all agents predicted V among 569 non-detects poorly (r = 0.15; rc = 0.04), but styrene-specific CART predicted it well (r = 0.87, P < 0.0001; rc = 0.86).
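To make the two agreement statistics concrete, the sketch below computes the Pearson correlation and Lin's concordance correlation coefficient for paired observed and predicted volumes. It uses the standard definition of Lin's coefficient with 1/n variances, rc = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²); the function names and the four example values are made up for illustration and are not data from the study.

```python
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (1/n variance convention)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical volumes in litres: predictions are perfectly correlated with
# the observations but biased upward by a constant 50 l.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [150.0, 250.0, 350.0, 450.0]

r = pearson_r(observed, predicted)
rc = lin_ccc(observed, predicted)
print(f"r = {r:.3f}, rc = {rc:.3f}")
```

In this toy case r equals 1 while rc falls to about 0.91, because Lin's coefficient penalizes systematic departure from the line of identity as well as scatter. That is why rc can be no larger than r in absolute value, consistent with the pairs of values reported above.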
Discussion
Among the limitations of our work is the fact that samples may have been collected on different workers and processes within each inspection, each with its own V. Furthermore, we lack measurement-level predictors because classifiers were captured at the inspection level. We did not study all substances that may be of interest and did not use the information that substances measured on the same sampling media should have the same V. We must note that CART models tend to over-fit data and that their predictions depend on the data selected for fitting, as illustrated by the contrast between predictions from the model built on all data and those from the model limited to styrene.
Conclusions
We developed predictive models of sampled air volume that should enable the calculation of LOD for non-detects in IMIS. Our predictions may guide future work on handling non-detects in IMIS, although it is advisable to develop separate predictive models for each substance, industry, and year of interest, while also considering other factors, such as whether the measurement evaluated long-term or short-term exposure.