Evaluating Google Neural Machine Translation from Chinese to English: Technical vs. Literary Texts

Authors

  • Zhongming Zhang, Faculty of Modern Languages and Communication, Universiti Putra Malaysia
  • Syed Nurulakla Syed Abdullah, Faculty of Modern Languages and Communication, Universiti Putra Malaysia
  • Muhammad Alif Redzuan Abdullah, Faculty of Modern Languages and Communication, Universiti Putra Malaysia
  • Wenqi Duan, Faculty of Modern Languages and Communication, Universiti Putra Malaysia

DOI:

https://doi.org/10.17576/gema-2025-2503-09

Keywords:

Google Neural Machine Translation (GNMT), Translation Quality Evaluation, Technical and Literary Texts, Multidimensional Quality Metrics (MQM), COMET Metric

Abstract

As the global need for translation increases, machine translation (MT) has made information dissemination and cross-cultural communication considerably more efficient. However, its quality remains constrained by limitations that vary across language pairs and text genres. These discrepancies lead to distinct MT performance on technical and literary texts, which is the gap this study addresses. This study compares the quality of Google Neural Machine Translation (GNMT) on literary and technical texts, investigating error disparities and establishing the abilities and limits of MT across diverse linguistic contexts. The research examined the English-Chinese language pair, using the Multidimensional Quality Metrics (MQM) framework for manual annotation. The COMET automatic evaluation metric was also applied to validate the observed quality differences. The study selected five excerpts each from Apple product manuals (33 aligned units) and the novel The Old Man and the Sea (32 aligned units). The findings were as follows: (1) GNMT performed well with technical texts but was less effective with literary texts; technical texts exhibited notable terminology errors, whereas literary texts showed more stylistic inconsistencies; (2) MQM scores differed markedly, with technical texts outperforming literary texts by 18.57%; and (3) COMET evaluation validated these observations, confirming a significant difference between the two text types. Although GNMT faced challenges with both text types, the quality remained acceptable within this study. The results suggest that GNMT algorithms should be improved to enhance accuracy and address the observed error patterns and distributions.
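As a rough illustration of how an MQM-style score and a gap between two text types can be computed, the following minimal Python sketch may help. It is not the authors' procedure: the severity weights, the normalization by word count, the `Annotation` type, and the sample annotations are all assumptions made for illustration, and the abstract does not state whether the reported 18.57% is a relative or an absolute difference.

```python
from dataclasses import dataclass

# Assumed severity weights (hypothetical; the weights used in the study
# are not given in the abstract).
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass
class Annotation:
    """One manually annotated error (illustrative structure)."""
    category: str  # e.g. "terminology", "style", "accuracy"
    severity: str  # a key of SEVERITY_WEIGHTS


def mqm_score(annotations, word_count):
    """Return a 0-100 score: weighted penalties normalized by word count."""
    penalty = sum(SEVERITY_WEIGHTS[a.severity] for a in annotations)
    return max(0.0, 100.0 * (word_count - penalty) / word_count)


def relative_gap(score_a, score_b):
    """Percentage by which score_a exceeds score_b, relative to score_b."""
    return 100.0 * (score_a - score_b) / score_b


# Made-up annotations for two 100-word samples (not data from the study).
technical = [
    Annotation("terminology", "major"),
    Annotation("terminology", "minor"),
]
literary = [
    Annotation("style", "major"),
    Annotation("style", "major"),
    Annotation("accuracy", "minor"),
    Annotation("style", "minor"),
]

tech_score = mqm_score(technical, word_count=100)
lit_score = mqm_score(literary, word_count=100)
print(f"technical: {tech_score:.2f}")
print(f"literary:  {lit_score:.2f}")
print(f"relative gap: {relative_gap(tech_score, lit_score):.2f}%")
```

Normalizing by word count keeps scores comparable when the two samples differ in length, which matters here because the two corpora contain different numbers of aligned units.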

Author Biographies

Zhongming Zhang, Faculty of Modern Languages and Communication, Universiti Putra Malaysia

Zhang Zhongming is a PhD candidate in Translation Studies at UPM, Malaysia, and a university lecturer in China, where he teaches courses in translation and language education. His research interests focus on translation assessment, particularly machine translation, and its applications in education. He also explores innovative teaching methods to enhance students’ practical and academic skills. With a strong interdisciplinary background, his work aims to improve translation quality and modernize teaching approaches, bridging theory and practice in both fields.

Syed Nurulakla Syed Abdullah, Faculty of Modern Languages and Communication, Universiti Putra Malaysia

Syed Nurulakla bin Syed Abdullah is an Assistant Professor at Universiti Putra Malaysia (UPM) and a Senior Lecturer at the Department of Foreign Languages, Faculty of Modern Languages and Communication, Universiti Putra Malaysia. He is renowned for translating the world masterpiece Rihlah Ibn Battutah into Malay as Pengembaraan Ibn Battutah: Pengembara Agung, Karya Terulung, Menyingkap Wajah Dunia, launched by Sultan Selangor in 2004. He earned a Ph.D. from the University of Malaya in 2015, specializing in translation. Widely recognized as an instructor by language institutions, he translated Roger T. Bell’s Translation and Translating into Malay in 2012 and recently translated Iktibar daripada Kehidupanku (2021). Actively involved in national translation activities, he contributes to organizations like Bank Negara Malaysia, RMK-12, and PR agencies. His extensive publications encompass journal articles, book chapters, books, and conference proceedings. At UPM, he strengthens translation initiatives through teaching, research, supervision, and publication while guiding international Ph.D. students from the Middle East and China.

Muhammad Alif Redzuan Abdullah, Faculty of Modern Languages and Communication, Universiti Putra Malaysia

Muhammad Alif Redzuan Abdullah is currently a Senior Lecturer in the Faculty of Modern Languages and Communication at Universiti Putra Malaysia. He has published articles in research journals in the area of his studies. His research interests include Translation, Interpretation, and Comparative Applied Linguistics. Furthermore, he is actively shaping the next generation as he supervises a cadre of Chinese Ph.D. students in Translation and Interpreting at UPM, showcasing his commitment to cross-cultural communication and academic mentorship.

Wenqi Duan, Faculty of Modern Languages and Communication, Universiti Putra Malaysia

Duan Wenqi is a PhD candidate in Translation Studies at Universiti Putra Malaysia (UPM) and a university teacher in China, where she teaches English Interpretation, College English, and Academic Writing. Her research interests include translation and culture, classical literature, and applied linguistics. She also has extensive experience in education and teaching. Her work aims to explore the intersection of translation and cultural studies, particularly in the context of classical literature, while enhancing translation strategies that bridge linguistic and cultural gaps and improving pedagogical approaches in language and translation education.

References

Alenezi, A. M. (2024). Error analysis of neural machine translation in technical texts: Google Translate as a case study. Journal of the North for Humanities, 9(2, Part 1), 167–181. https://doi.org/10.12816/0061799

Alzain, E., Nagi, K. A., & Algobaei, F. (2024). The Quality of Google Translate and ChatGPT English to Arabic Translation: The Case of Scientific Text Translation. In Forum for Linguistic Studies (Vol. 6, No. 4, pp. 837-849). http://dx.doi.org/10.30564/fls.v6i3.6799

Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Baker, M. (2011). Corpus linguistics and translation studies—implications and applications. In Text and technology: In honour of John Sinclair (pp. 233-250). John Benjamins Publishing Company.

Cai, L. (2024). How does ChatGPT Compare with Conventional Neural Machine Translation Systems in Performing a Chinese to English Translation Task? Journal of Translation Studies, 4(1), 25-45. http://dx.doi.org/10.3726/JTS012024.02

Chéragui, M. A. (2012). Theoretical Overview of Machine translation. ICWIT, 160-169.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and psychological measurement, 20(1), 37-46.

Dunder, I., Seljan, S., & Pavlovski, M. (2021). What Makes Machine-Translated Poetry Look Bad? A Human Error Classification Analysis. In Central European conference on information and intelligent systems (pp. 183-191). Faculty of Organization and Informatics Varazdin.

Fakih, A., Ghassemiazghandi, M., Fakih, A. H., & Singh, M. K. (2024). Evaluation of Instagram’s Neural Machine Translation for Literary Texts: An MQM-Based Analysis. GEMA Online Journal of Language Studies, 24(1). http://dx.doi.org/10.17576/gema-2024-2401-13

Fang, Q. A Comparative Analysis on Wu Lao’s and Yu Guangzhong’s Chinese Versions of The Old Man and the Sea. Journal of Innovation and Social Science Research, 9(9), 504–507.

Freitag, M., Foster, G., Grangier, D., Ratnakar, V., Tan, Q., & Macherey, W. (2021). Experts, errors, and context: A large-scale study of human evaluation for machine translation. Transactions of the Association for Computational Linguistics, 9, 1460-1474. http://dx.doi.org/10.1162/tacl_a_00437

Guerberof-Arenas, A., & Toral, A. (2022). Creativity in translation: Machine translation as a constraint for literary texts. Translation Spaces, 11(2), 184-212. http://dx.doi.org/10.1075/ts.21025.gue

He, L., Ghassemiazghandi, M., & Subramaniam, I. (2024). Comparative assessment of Bing Translator and Youdao Machine Translation Systems in English-to-Chinese literary text translation. In Forum for Linguistic Studies (Transferred) (Vol. 6, No. 2, pp. 1189-1189). http://dx.doi.org/10.59400/fls.v6i2.1189

Hu, K., & Li, X. (2023). The creativity and limitations of AI neural machine translation: A corpus-based study of DeepL’s English-to-Chinese translation of Shakespeare’s plays. Babel, 69(4), 546-563. http://dx.doi.org/10.1075/babel.00331.hu

Hutchins, W. J. (1986). Machine translation: past, present, future (p. 66). Chichester: Ellis Horwood.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.

Ji, B., Duan, X., Zhang, Y., Wu, K., & Zhang, M. (2024). Zero-shot prompting for llm-based machine translation using in-domain target sentences. IEEE/ACM Transactions on Audio, Speech, and Language Processing. http://dx.doi.org/10.1109/TASLP.2024.3519814

Jinfang, Y., Kasuma, S. A., & Moindjie, M. A. (2025). Translator’s Style in Fiction Translation: A Ten-Year Systematic Literature Review. Journal of Language Teaching and Research, 16(1), 125-133. http://dx.doi.org/10.17507/jltr.1601.14

Koehn, P. (2020). Neural machine translation. Cambridge University Press.

Kostikova, I., Shevchenko, A., Holubnycha, L., Popova, N., & Budianska, V. (2019). Use of machine translation technology for understanding scientific and technical texts. Journal of Theoretical and Applied Information Technology, 97(4), 1350-1361.

Kuzman, T., Vintar, Š., & Arcan, M. (2019, August). Neural machine translation of literary texts from English to Slovene. In Proceedings of the qualities of literary machine translation (pp. 1-9).

Liu, J. (2020). Comparing and analyzing cohesive devices of SMT and NMT from Chinese to English: a diachronic approach. Open Journal of Modern Linguistics, 10(6), 765. http://dx.doi.org/10.4236/ojml.2020.106046

Liu, M., Zhang, H., & Wu, G. (2021). Fine grained human evaluation for English-to-Chinese machine translation: A case study on scientific text. arXiv preprint arXiv:2110.14766.

Lommel, A. (2013). Multidimensional quality metrics: a flexible system for assessing translation quality. In Proceedings of Translating and the Computer 35.

Lommel, A., Uszkoreit, H., & Burchardt, A. (2014). Multidimensional quality metrics (MQM): A framework for declaring and describing translation quality metrics. Tradumàtica, 12, 455-463.

Long, X., Chen, K., Bamigbade, O. O., & Swenson, D. L. (2023, September). In-Depth Analysis of Machine Translation and Human Translation of Literary Book Chinese Traditional Culture and a Community with a Shared Future for Mankind. In 3rd International Conference on Internet, Education and Information Technology (IEIT 2023) (pp. 1163-1170). Atlantis Press. http://dx.doi.org/10.2991/978-94-6463-230-9_139

Lu, Y. (2023, July). An Analysis of Error Types in Chinese to English Translation by Google Neural Machine Translation. In Proceedings of the 2023 International Joint Conference on Robotics and Artificial Intelligence (pp. 148-154).

Lyu, C., Du, Z., Xu, J., Duan, Y., Wu, M., Lynn, T., ... & Wang, L. (2023). A paradigm shift: The future of machine translation lies with large language models. arXiv preprint arXiv:2305.01181.

McIntosh, T. R., Susnjak, T., Arachchilage, N., Liu, T., Xu, D., Watters, P., & Halgamuge, M. N. (2025). Inadequacies of large language model benchmarks in the era of generative artificial intelligence. IEEE Transactions on Artificial Intelligence. http://dx.doi.org/10.1109/TAI.2025.3569516

Maxmudjanovna, Y. N., & Xamidjanovna, A. N. (2021). Technical translation as a type of specialized translation. Central Asian Journal of Literature, Philosophy and Culture.

Mohsen, M. (2024). Artificial intelligence in academic translation: A comparative study of large language models and google translate. PSYCHOLINGUISTICS, 35(2), 134-156. http://dx.doi.org/10.31470/2309-1797-2024-35-2-134-156

Naveen, P., & Trojovský, P. (2024). Overview and challenges of machine translation for contextually appropriate translations. iScience, 27(10), 110878. https://doi.org/10.1016/j.isci.2024.110878

Ng, Y. L. E. (2009). A Systemic Approach to Translating Style: A Comparative Study of Four Chinese Translations of Hemingway’s The Old Man and the Sea. (Doctoral dissertation, University College London). UCL Discovery.

Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318).

Peng, Z., & Yvon, F. (2023). Document-level Machine Translation for Scientific Texts (Doctoral dissertation, ISIR, Université Pierre et Marie Curie UMR CNRS 7222).

Ploeger, E., Lai, H., Van Noord, R., & Toral, A. (2024). Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation. arXiv preprint arXiv:2408.17308.

Rei, R., Stewart, C., Farinha, A. C., & Lavie, A. (2020). COMET: A neural framework for MT evaluation. arXiv preprint arXiv:2009.09025. http://dx.doi.org/10.18653/v1/2020.emnlp-main.213

Shahnazaryan, L., & Beloucif, M. (2024). Defining Boundaries: The Impact of Domain Specification on Cross-Language and Cross-Domain Transfer in Machine Translation. arXiv preprint arXiv:2408.11926.

Siu, S. C. (2023). ChatGPT and GPT-4 for Professional Translators: Exploring the Potential of Large Language Models in Translation. Available at SSRN 4448091. http://dx.doi.org/10.2139/ssrn.4448091

Stewart, C., Rei, R., Farinha, C., & Lavie, A. (2020, October). COMET-Deploying a New State-of-the-art MT Evaluation Metric in Production. In AMTA (2) (pp. 78-109).

Tahseen, W., & Hussein, S. H. (2024). Investigating Machine translation errors in rendering English literary texts into Arabic. Integrated Journal for Research in Arts and Humanities, 4(1), 68-81. http://dx.doi.org/10.55544/ijrah.4.1.11

Tan, Z., Wang, S., Yang, Z., Chen, G., Huang, X., Sun, M., & Liu, Y. (2020). Neural machine translation: A review of methods, resources, and tools. AI Open, 1, 5-21. http://dx.doi.org/10.1016/j.aiopen.2020.11.001

Toral, A., Van Cranenburgh, A., & Nutters, T. (2023). Literary-adapted machine translation in a well-resourced language pair: Explorations with more data and wider contexts. In Computer-Assisted Literary Translation (pp. 27-52). Routledge. http://dx.doi.org/10.4324/9781003357391-3

Ulitkin, I., Filippova, I., Ivanova, N., & Poroykov, A. (2021). Automatic evaluation of the quality of machine translation of a scientific text: the results of a five-year-long experiment. In E3S Web of Conferences (Vol. 284, p. 08001). EDP Sciences. http://dx.doi.org/10.1051/e3sconf/202128408001

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.

Wang, X., & Wang, T. (2019). A comparative study of human translation and machine translation post-editing in EC Translation: Translation speed, quality and translators’ attitude. Foreign Languages and Cultures, 3(4), 83-93.

Way, A., Youdale, R., & Rothwell, A. (2023). Why more literary translators should embrace translation technology. Revista Tradumática, 21, 87-102. https://doi.org/10.5565/rev/tradumatica.344

Weaver, W. (1952). Translation. In Proceedings of the Conference on Mechanical Translation.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J. R., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G. S., Hughes, M., & Dean, J. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

Xie, Y. (2008). Hemingway’s Language Style and Writing Techniques in “The Old Man and the Sea”. English language teaching, 1(2), 156-158. http://dx.doi.org/10.5539/elt.v1n2p156

Ying, C., Shuyu, Y., Jing, L., Lin, D., & Qi, Q. (2021). Errors of machine translation of terminology in the patent text from English into Chinese. ASP Transactions on Computers, 1(1), 12-17.

Zhang, B., Haddow, B., & Birch, A. (2023, July). Prompting large language model for machine translation: A case study. In International Conference on Machine Learning (pp. 41092-41110). PMLR.

Zhao, Y., Zhang, H., & Yang, Y. (2024). Comparative Study on the Translation Quality of Large Language Models—Taking the Translation of “Fan Hua” as an Example. Technology Enhanced Foreign Language Education, 4(109), 60-66.

Published

2025-08-28