Augmented Functional Analysis of Variance (A-fANOVA): Theory and Application to Google Trends for Detecting Differences in Abortion Drugs Queries
Fabrizio Maturo
2022
Abstract
The World Wide Web (WWW) has become a popular and readily accessible source of big data in recent decades. It offers information in many forms, one of which is Google Trends, which provides deep insights into people's queries in the Google Search engine. Analysing this kind of data is not straightforward because, being collected over extensive periods, it is usually high-dimensional. Comparing the mean Google Trends of different groups of people or countries can help explain many phenomena and reveal populations' interests in specific periods and areas. However, because of the well-known curse of dimensionality, appropriate statistical techniques are needed to inspect and test differences in such data. This paper proposes an original approach to Google Trends data, concentrating on searches for the abortion drug “Cytotec”. The final purpose of the application is to determine whether different countries' abortion legislation influences search trends. The research relies on Functional Data Analysis (FDA) to deal with high-dimensional data and proposes a generalisation of the classical functional analysis of variance model, namely the Augmented Functional Analysis of Variance (A-fANOVA). To test for statistically significant differences among groups of countries, A-fANOVA considers additional characteristics of the curves, given by the velocity and acceleration of the original Google queries over time. The proposed methodology appears promising for capturing additional information about the curves' behaviour, with the final aim of offering a monitoring tool for policy-makers.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.