Bring Your Own Location Data: Use of Google Smartphone Location History Data for Environmental Health Research
The study evaluates whether Google Location History (GLH) data can capture long-term retrospective time-activity data for environmental health studies. Of 378 individuals contacted, 61 provided GLH data spanning 2010 to 2021. Exposure estimates for pollutants such as nitrogen dioxide differed when assigned at GLH locations rather than at home addresses, suggesting that GLH data could improve exposure assessment in environmental epidemiology, provided privacy is carefully managed.
Abstract
Background
Environmental exposures are commonly estimated using spatial methods, with most epidemiological studies relying on home addresses. Passively collected smartphone location data, like Google Location History (GLH) data, may present an opportunity to integrate existing long-term time–activity data.
Objectives
We aimed to evaluate the potential use of GLH data for capturing long-term retrospective time–activity data for environmental health research.
Methods
We included 378 individuals who participated in previous Global Positioning System (GPS) studies within the Washington State Twin Registry. GLH data consists of location information that has been routinely collected since 2010 whenever location sharing was enabled within the Android operating system or Google apps. We created instructions for participants to download their GLH data and provide it through secure data transfer. We summarized the GLH data provided, compared it to available GPS data, and conducted an exposure assessment for nitrogen dioxide (NO2) air pollution.
Results
Of 378 individuals contacted, we received GLH data from 61 individuals (16.1%), and 53 (14.0%) indicated interest but did not have historical GLH data available. The provided GLH data spanned 2010–2021 and included 34 million locations, capturing 66,677 participant days. The median number of days with GLH data per participant was 752, capturing 442 unique locations. When we compared GLH data to 2-wk GPS data (∼1.8 million points), 95% of GPS time–activity points were within 100 m of GLH locations. We observed important differences between NO2 exposures assigned at home locations compared with GLH locations, highlighting the importance of GLH data to environmental exposure assessment.
Discussion
We believe collecting GLH data is a feasible and cost-effective method for capturing retrospective time–activity patterns for large populations that presents new opportunities for environmental epidemiology. Cohort studies should consider adding GLH data collection to capture historical time–activity patterns of participants, employing a “bring-your-own-location-data” citizen science approach. Privacy remains a concern that needs to be carefully managed when using GLH data.
Google Location History for Environmental Health Research: FAQ
What is Google Location History (GLH) data and how can it be used for environmental health research?
GLH data comprises location information routinely collected by Google since 2010 from smartphones with location services enabled. This data, owned and managed by individuals, is a valuable resource for environmental health research. It allows researchers to study the impact of environmental exposures, such as air pollution, noise, walkability, and green space, on individuals' health. By analyzing individuals' movement patterns and the environments they frequent, researchers can gain a more detailed understanding of their exposure levels.
How does GLH data compare to traditional methods of exposure assessment?
Traditional methods, like relying on home addresses, often lead to exposure misclassification and biases. GLH data offers several advantages:
- Long-term retrospective data: GLH provides historical location data, offering insights into long-term exposure patterns.
- Detailed time-activity patterns: It captures individuals' movements throughout the day, allowing for a more accurate assessment of exposures beyond their residence.
- Large-scale data collection: GLH data can be readily collected from large populations, facilitating large-scale environmental health studies.
- Passive data collection: GLH data is collected passively, reducing the compliance burden and biases associated with active data collection methods like surveys or GPS trackers.
How accurate is GLH data compared to GPS data?
Comparisons between GLH and GPS data show a high degree of spatial correspondence. In this study, 95% of GPS time-activity points from 2-week GPS monitoring (about 1.8 million points) fell within 100 meters of GLH locations. This suggests that GLH data can capture individuals' movement patterns well enough to support exposure assessment.
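As an illustration of this kind of comparison, the sketch below computes the share of GPS points that lie within a distance threshold of any GLH point, using the haversine great-circle distance. It is a minimal example, not the study's actual procedure: the function names, the brute-force search, and the choice to ignore timestamps are all assumptions made for clarity.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_within(gps_points, glh_points, threshold_m=100.0):
    """Fraction of GPS points with at least one GLH point within threshold_m.

    Hypothetical helper: purely spatial matching, brute force over all pairs.
    """
    if not gps_points:
        return float("nan")
    hits = sum(
        1
        for g_lat, g_lon in gps_points
        if any(haversine_m(g_lat, g_lon, lat, lon) <= threshold_m
               for lat, lon in glh_points)
    )
    return hits / len(gps_points)


# Made-up coordinates for illustration only.
gps = [(47.6062, -122.3321), (47.6205, -122.3493)]
glh = [(47.6063, -122.3321)]
print(share_within(gps, glh))
```

With millions of points, the pairwise loop would be too slow; a spatial index would be the usual replacement, but the logic stays the same.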
Can GLH data be used to study specific environmental exposures?
Yes, GLH data can be used to study various environmental exposures. Researchers can utilize GLH data to:
- Estimate air pollution exposure: By combining GLH data with air pollution models, researchers can assess individuals' exposure levels to pollutants like nitrogen dioxide (NO2) at different locations and times.
- Evaluate exposure to green spaces: GLH data can determine the amount of time individuals spend in green spaces, which can be linked to health benefits.
- Assess noise pollution exposure: Researchers can estimate individuals' exposure to noise pollution based on their location history and noise level data.
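One common way to turn a location history into an exposure estimate is a time-weighted average: each visited location contributes its modelled concentration in proportion to the time spent there. The sketch below shows that arithmetic for NO2. The `Stay` type, the concentration values, and the units are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Stay:
    hours: float    # time spent at the location
    no2_ppb: float  # modelled NO2 at that location (made-up values)


def time_weighted_mean(stays):
    """Time-weighted average concentration over a list of stays."""
    total_hours = sum(s.hours for s in stays)
    if total_hours == 0:
        raise ValueError("no time recorded")
    return sum(s.hours * s.no2_ppb for s in stays) / total_hours


# One illustrative day: home, workplace, commute.
day = [Stay(14, 8.0), Stay(8, 20.0), Stay(2, 14.0)]

home_only = day[0].no2_ppb            # exposure assigned at the home address
activity_based = time_weighted_mean(day)

print(home_only, activity_based)
```

In this invented example the home-only estimate is 8.0 ppb while the time-weighted estimate is 12.5 ppb, which is the kind of gap between home-based and location-based assignment that the study describes.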
What are the privacy concerns related to using GLH data?
GLH data raises significant privacy concerns as it contains detailed information about individuals' movements and locations over extended periods. To mitigate these concerns:
- Informed consent is crucial: Participants must be fully informed about the nature of GLH data and provide explicit consent for its use in research.
- Data security measures are essential: Researchers need to implement robust data security protocols to protect participants' privacy.
- Data de-identification techniques: Removing personal identifiers from GLH data is crucial to reduce the risk of re-identification.
- Transparent data management practices: Clear guidelines and protocols regarding data access, storage, and sharing should be established and communicated to participants.
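Two simple de-identification steps are sketched below: replacing participant identifiers with keyed pseudonyms and coarsening coordinates before analysis. This is an assumption about what such a step might look like, not a description of the study's protocol. The function names, the 16-character pseudonym length, and the three-decimal rounding are hypothetical choices, and coarsening lowers but does not remove re-identification risk.

```python
import hashlib
import hmac


def pseudonym(participant_id, secret_key):
    """Keyed hash of an identifier; the key must be stored separately from the data."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def coarsen(lat, lon, decimals=3):
    """Round coordinates; 3 decimals is roughly 100 m in latitude."""
    return round(lat, decimals), round(lon, decimals)


key = b"example-key-kept-off-the-analysis-server"  # placeholder value
print(pseudonym("twin-0001", key))
print(coarsen(47.606209, -122.332071))
```

Coarsening trades spatial precision for privacy, so the rounding level would need to be chosen against the resolution of the exposure surface being used.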
What are the limitations of using GLH data for research?
While GLH data presents promising opportunities, it's essential to acknowledge its limitations:
- Potential for selection bias: Individuals willing to share GLH data might not represent the general population.
- Inconsistent data collection intervals: Unlike GPS loggers, GLH data isn't collected at fixed intervals, leading to potential gaps in location information.
- Accuracy variations: GLH data accuracy can vary depending on factors like signal strength and device type.
- Data retention policies: Google's data retention policies could limit the availability of historical data.
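Because GLH points arrive at irregular intervals, a useful first check is to count the days that have any data and to list the gaps between consecutive points. The sketch below does both. The one-hour gap threshold and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(hours=1)):
    """Return (start, end) pairs where consecutive points are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def days_with_data(timestamps):
    """Number of distinct calendar days that contain at least one point."""
    return len({t.date() for t in timestamps})


# Made-up timestamps for illustration only.
points = [
    datetime(2020, 1, 1, 8, 0),
    datetime(2020, 1, 1, 8, 5),
    datetime(2020, 1, 1, 12, 0),
    datetime(2020, 1, 3, 9, 0),
]

for start, end in find_gaps(points):
    print("gap:", start, "->", end)
print("days with data:", days_with_data(points))
```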
How can GLH data be integrated into existing health studies?
Researchers can leverage GLH data in existing health studies by:
- Adding GLH data collection: Asking participants for consent to share their GLH data, framed as a citizen science contribution to health research, extends an existing study with historical time-activity data.
- Linking GLH data to existing cohorts: Connecting GLH data to cohort studies with existing surveys, biomarkers, and medical records allows for comprehensive analysis.
- Developing analytical tools: Building efficient data processing pipelines and analytical tools is essential for handling large-scale GLH datasets.
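A processing pipeline usually starts by parsing the exported file into clean records. The sketch below assumes a Google Takeout style JSON export with a top-level `locations` list whose entries carry `latitudeE7`, `longitudeE7`, an optional `accuracy`, and either an ISO 8601 `timestamp` or an older `timestampMs` field. Export formats have changed over time, so these field names should be checked against the files actually received; the function names are hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_record(rec):
    """Convert one raw record to (timestamp, lat, lon, accuracy)."""
    lat = rec["latitudeE7"] / 1e7  # E7 fields store degrees times 10^7
    lon = rec["longitudeE7"] / 1e7
    if "timestamp" in rec:
        ts = datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00"))
    else:
        ts = datetime.fromtimestamp(int(rec["timestampMs"]) / 1000, tz=timezone.utc)
    return ts, lat, lon, rec.get("accuracy")


def load_locations(text):
    """Parse an export and return records sorted by time, skipping entries without coordinates."""
    data = json.loads(text)
    rows = [
        parse_record(rec)
        for rec in data.get("locations", [])
        if "latitudeE7" in rec and "longitudeE7" in rec
    ]
    rows.sort(key=lambda r: r[0])
    return rows


# Tiny made-up export covering both timestamp styles.
sample = json.dumps({"locations": [
    {"timestamp": "2021-03-01T12:00:00Z", "latitudeE7": 476062000,
     "longitudeE7": -1223321000, "accuracy": 20},
    {"timestampMs": "1577836800000", "latitudeE7": 476205000,
     "longitudeE7": -1223493000},
    {"timestamp": "2021-03-02T12:00:00Z"},
]})

rows = load_locations(sample)
for row in rows:
    print(row)
```

For real exports with millions of records, a streaming JSON parser would keep memory use down, but the per-record conversion would be unchanged.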
What future research directions are possible with GLH data?
GLH data opens up numerous avenues for future research:
- Personalized exposure assessment: Developing models that consider individual time-activity patterns, microenvironments, and other relevant factors for a more accurate and personalized exposure assessment.
- Exposome research: Integrating GLH data with other exposure data to capture the totality of environmental exposures and their combined health impacts.
- Understanding mobility patterns: Investigating how factors like socioeconomic status, built environment, and health conditions influence individuals' mobility patterns.
- Health interventions: Using GLH data to design targeted interventions aimed at reducing environmental exposures and promoting healthier behaviors.