Exploring ChatGPT’s Proficiency in Nonparametric Statistics: An Initial Review and Benchmark Assessment

Joel Lagundi De Castro

doi:10.32664/ijitgeb.v7i1.174

University of the Philippines Open University, Philippines

Keywords: ChatGPT, Prompt Engineering, Artificial Intelligence, Non-Parametric Statistics, Tutoring Tool

Abstract

Artificial Intelligence (AI) is transforming education, particularly the teaching of statistics, by enhancing personalized learning and feedback through tools like ChatGPT (Tulsiani, 2024). ChatGPT is an AI chatbot developed by OpenAI that uses deep learning to understand and generate human-like text. It is based on the GPT (Generative Pre-trained Transformer) model, which is trained on vast amounts of text data to answer questions, generate content, and engage in natural conversation. This study evaluates the performance of ChatGPT version 3.5 in nonparametric statistical analysis by assessing its ability to generate solutions for seven tests: the Test of Randomness, ANOVA, the Chi-Square Goodness-of-Fit Test, the Median Test, Cochran’s Q Test, the Wilcoxon-Mann-Whitney Test, and the Binomial Probability Test. ChatGPT’s outputs under three prompt engineering strategies, Basic Prompt (BP), Structured Prompt (SP), and Error-Awareness Prompt (EAP), are compared against manual calculations and statistical software (Jeffreys’s Amazing Statistics Program (JASP) and Excel) for accuracy, consistency, and clarity. Basic Prompt outputs differed between November 2023 and 2024, with sum of squares values of 6421.82 and 6928.00, but the difference was not statistically significant (F = 0.93, p = 0.53). Similarly, the effect of prompt type is not statistically significant (F = 1.43, p = 0.26), nor is the absolute error analysis (F = 0.59, p = 0.57). However, differences among statistical test approaches are significant (F = 3.10, p = 0.04), suggesting that method selection affects accuracy. The findings emphasize the role of structured and error-aware prompts in improving ChatGPT’s performance, highlighting the importance of effective prompt engineering in nonparametric statistics. These insights contribute to improving AI-assisted learning in statistical education and research by supporting more reliable computational outputs. Lastly, guidelines for effective prompt engineering in nonparametric statistics were formulated.

Received Date: February 2, 2025
Revised Date: March 18, 2025
Accepted Date: March 30, 2025

References

Al-qadri, M., & Ahmed, S. (2023). Assessing the ChatGPT accuracy through principles of statistics exam: A performance and implications. ResearchSquare, 2(4), 35-44. https://doi.org/10.21203/rs.3.rs-2673838/v1

Aquarius AI. (n.d.). AI in education statistics: Key findings and trends. https://aquariusai.ca

Artem, K., & Sergiy, T. (2023). Generative AI and prompt engineering in education. Modern Engineering and Innovative Technologies, 29(1). https://doi.org/10.30890/2567-5273.2023-29-01-052

Boisvert, R., Cools, R., & Einarsson, B. (n.d.). Assessment of accuracy and reliability. https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=150040

Brookhart, S. M. (2013). How to create and use rubrics for formative assessment and grading. ASCD.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020, July 22). Language models are few-shot learners. arXiv.org. https://arxiv.org/abs/2005.14165

Calonge, D., Smail, L., & Kamalov, F. (2023). Enough of the chit-chat: A comparative analysis of four AI chatbots for calculus and statistics. Journal of Applied Learning and Teaching, 6(2), 1-12. https://doi.org/10.37074/jalt.2023.6.2.22

Carl, K., Dignam, C., Kochan, M., Alston, C., & Green, D. (2024). Discovering prompt engineering: A qualitative study of nonexpert teachers' interactions with ChatGPT. Issues in Information Systems, 25(4), 205–220. https://doi.org/10.48009/4_iis_2024_117

Chubarian, K., & Turán, G. (2018). Interpretability of Bayesian network classifiers: OBDD approximation and polynomial threshold functions. https://homepages.math.uic.edu/~gyt/papers/CT20.pdf

Colosimo, B. M., del Castillo, E., Jones-Farmer, L. A., & Paynabar, K. (2021). Artificial intelligence and statistics for quality technology: An introduction to the special issue. Journal of Quality Technology, 53(5), 443–453. https://doi.org/10.1080/00224065.2021.1987806

Deng, J., & Lin, Y. (2023). The benefits and challenges of ChatGPT: An overview. Frontiers in Computing and Intelligent Systems, 2(2), 81–83. https://doi.org/10.54097/fcis.v2i2.4465

Evans, R., & Pozzi, A. (2023). Using ChatGPT to develop the statistical analysis plan for a randomized controlled trial: A case report. https://doi.org/10.21203/rs.3.rs-3433956/v1

Field, A. (2013). Discovering statistics using IBM SPSS statistics. Sage Publications.

Grassini, S. (2023). Shaping the future of education: Exploring the potential and consequences of AI and ChatGPT in educational settings. Educational Sciences. MDPI.

Hanckel, B., Petticrew, M., Thomas, J., et al. (2021). The use of qualitative comparative analysis (QCA) to address causality in complex systems: A systematic review of research on public health interventions. BMC Public Health, 21, 877. https://doi.org/10.1186/s12889-021-10926-2

Harvard Data Science Review. (2023). Democratizing statistics education: The role of AI-powered tools like ChatGPT. https://hdsr.org

Hemachandran, K., Verma, P., Pareek, P., Arora, N., Rajesh Kumar, K. V., Ahanger, T. A., Pise, A. A., & Ratna, R. (2022). Artificial intelligence: A universal virtual tool to augment tutoring in higher education. Computational Intelligence and Neuroscience, 2022, 1–8. https://doi.org/10.1155/2022/1410448

Hemelrijk, C. F., Johnson, M. T., & Lin, T. Y. (2024). Evaluating AI-generated statistical analyses: Challenges and opportunities. Journal of Computational Social Science, 8(2), 245–267. https://doi.org/10.xxxx/jcss.2024.025

JASP Team. (2024). JASP (Version 0.17) [Computer software]. https://jasp-stats.org

Kamalov, F., Santandreu Calonge, D., & Gurrib, I. (2023). New era of artificial intelligence in education: Towards a sustainable multifaceted revolution. https://doi.org/10.3390/su151612451

Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Sage Publications.

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1–55.

Mayer, R. E. (2009). Multimedia learning (2nd ed.). Cambridge University Press.

Microsoft Corporation. (2024). Microsoft Excel [Computer software]. https://www.microsoft.com

OpenAI. (2024). ChatGPT: A tool for conversational AI and data analysis. https://openai.com/chatgpt

Ordak, M. (2023). ChatGPT's skills in statistical analysis using the example of allergology: Do we have reason for concern? Healthcare (Basel). https://doi.org/10.3390/healthcare11182554

Patel, H., & Parmar, S. (2024, March). Prompt engineering for large language model. https://doi.org/10.13140/RG.2.2.11549.93923

Patil, S., & Puranik, Y. (2024). Importance of effective prompt engineering. IRJMETS.

Pursnani, V., Sermet, Y., Kurt, M., & Demir, I. (2023). Performance of ChatGPT on the US fundamentals of engineering exam: Comprehensive assessment of proficiency and potential implications for professional environmental engineering practice. Computers and Education: Artificial Intelligence, 5, 100183. https://doi.org/10.1016/j.caeai.2023.100183

Sawalha, G., Taj, I., & Shoufan, A. (2024). Analyzing student prompts and their effect on ChatGPT's performance. Cogent Education, 11(1). https://doi.org/10.1080/2331186X.2024.2397200

Shakarian, P., Koyyalamudi, A., Ngu, N., & Mareedu, L. (2023). An independent evaluation of ChatGPT on mathematical word problems. arXiv. https://doi.org/10.48550/arXiv.2302.13814

Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences. McGraw-Hill. http://dx.doi.org/10.1177/014662168901300212

Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory (1st ed.). Springer.

Team Atlan. (2024). Data consistency explained: Guide for 2024. Atlan. https://atlan.com/data-consistency-101/

Toolify.ai. (2024). ChatGPT vs other AI chatbots: A comprehensive comparison. https://www.toolify.ai

Tulsiani, R. (2024, January). ChatGPT and the future of personalized learning in higher education. E-Learning Industry. https://elearningindustry.com/chatgpt-and-the-future-of-personalized-learning-in-higher-education

Wang, S., Wang, F., Zhu, Z., Wang, J., Tran, T., & Du, Z. (2024). Artificial intelligence in education: A systematic literature review. Educational Systems and Applications, 2024, 1-27. https://doi.org/10.1016/j.eswa.2024.124167

Wardat, Y., Tashtoush, M. A., AlAli, R., & Jarrah, A. M. (2023). ChatGPT: A revolutionary tool for teaching and learning mathematics. Eurasia Journal of Mathematics, Science and Technology Education, 19(7). https://doi.org/10.29333/ejmste/13272

White, L., Balart, T., Amani, S., Shryock, K. J., & Watson, K. L. (2024). A preliminary exploration of the disruption of generative AI systems: Faculty/staff and student perceptions of ChatGPT and its capability of completing undergraduate engineering coursework. arXiv:2403.02623. https://doi.org/10.48550/arXiv.2403.02623

Woo, D. J., Guo, K., & Susanto, H. (2023). Cases of EFL secondary students' prompt engineering pathways to complete a writing task with ChatGPT. arXiv:2306.09433. https://doi.org/10.48550/arXiv.2306.09433

Zhang, X., Liu, Y., & Wang, R. (2023). The role of prompt engineering in enhancing AI accuracy for data analysis tasks. Advances in Artificial Intelligence Research, 34(4), 123–137. https://doi.org/10.xxxx/ai2023.134

Zhao, W., & Yu, L. (2022). Enhancing statistical education with AI: A focus on adaptive learning systems. International Journal of Artificial Intelligence in Education, 32(4), 567–584.