Theoretical and Natural Science

- The Open Access Proceedings Series for Conferences


Vol. 25, 20 December 2023


Open Access | Article

Evaluating random sampling bias in sentiment analysis of social media data

Xuanting Xiong * 1
1 University of Rochester

* Author to whom correspondence should be addressed.

Theoretical and Natural Science, Vol. 25, 36-42
Published 20 December 2023. © 2023 The Author(s). Published by EWA Publishing
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Citation: Xuanting Xiong. Evaluating random sampling bias in sentiment analysis of social media data. TNS (2023) Vol. 25: 36-42. DOI: 10.54254/2753-8818/25/20240895.

Abstract

In an age of abundant information, the relevance of social networks has grown in a manner analogous to exponential growth, and the content shared on these platforms can reflect the emotional states of the people who post it. This study uses a review of the relevant literature to analyze random sampling bias and its effects on sentiment analysis. The findings indicate that random sampling bias is neither sporadic nor insignificant, and that it takes multiple forms. Because a distorted sentiment analysis can have serious repercussions, a careful and rigorous sampling methodology is necessary.
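As a purely illustrative sketch (not the method of this study), the following Python snippet shows one way a uniform random sample of posts can misrepresent the sentiment of the underlying users. Everything in it is an assumption made for the example: the synthetic population, the share of heavy posters, the sentiment scores, and the function names `make_population`, `post_level_estimate`, and `user_level_estimate`.

```python
import random
from statistics import mean


def make_population(seed=0, n_users=1000):
    """Build synthetic (user_id, sentiment) posts with scores in [-1, 1].

    Assumption for illustration: 10% of users are heavy posters whose
    posts lean negative; everyone else posts rarely and leans positive.
    """
    rng = random.Random(seed)
    posts = []
    for user in range(n_users):
        heavy = user < n_users // 10
        n_posts = 50 if heavy else 2
        centre = -0.5 if heavy else 0.3
        for _ in range(n_posts):
            score = max(-1.0, min(1.0, rng.gauss(centre, 0.2)))
            posts.append((user, score))
    return posts


def post_level_estimate(posts, k, rng):
    """Mean sentiment of k posts drawn uniformly at random.

    Heavy posters contribute most posts, so they dominate this sample.
    """
    return mean(score for _, score in rng.sample(posts, k))


def user_level_estimate(posts, k, rng):
    """Mean of per-user mean sentiment for k users drawn uniformly."""
    by_user = {}
    for user, score in posts:
        by_user.setdefault(user, []).append(score)
    users = rng.sample(sorted(by_user), k)
    return mean(mean(by_user[u]) for u in users)


rng = random.Random(1)
posts = make_population()
post_est = post_level_estimate(posts, 2000, rng)
user_est = user_level_estimate(posts, 500, rng)
print(f"post-level sample mean: {post_est:+.2f}")
print(f"user-level sample mean: {user_est:+.2f}")
```

Under these assumptions the post-level sample suggests negative overall sentiment while the user-level sample suggests positive sentiment, even though both samples are drawn at random. The difference comes only from the choice of sampling unit.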

Keywords

Social Media Data, Random Sampling Bias, Sentiment Analysis

Data Availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Volume Title: Proceedings of the 3rd International Conference on Computing Innovation and Applied Physics
ISBN (Print): 978-1-83558-233-6
ISBN (Online): 978-1-83558-234-3
Published Date: 20 December 2023
Series: Theoretical and Natural Science
ISSN (Print): 2753-8818
ISSN (Online): 2753-8826
DOI: 10.54254/2753-8818/25/20240895