A Comparison of Mainstream Large Language Models’ Performance in Chinese-to-Japanese Political Text Translation: An Empirical Analysis Based on BLEU and TER

Authors

  • Qi Shi
  • Kui Zhu

DOI:

https://doi.org/10.54691/szsj5w82

Keywords:

Generative AI; Chinese-to-Japanese Translation; Political Text; Machine Translation Evaluation; BLEU; TER.

Abstract

With the rapid development of generative artificial intelligence, Large Language Models (LLMs) are increasingly being applied to machine translation. However, their performance in high-difficulty domains such as political text translation still requires systematic evaluation. Using the Report to the 20th National Congress of the Communist Party of China (CPC) as the research corpus, this study examines four mainstream LLMs (DeepSeek, Doubao, ChatGPT, and Gemini). Taking the official Japanese version produced by the Institute of Party History and Literature of the CPC Central Committee as the reference text, it quantitatively evaluates each model's Chinese-to-Japanese translations with two automated metrics, BLEU (Bilingual Evaluation Understudy) and TER (Translation Edit Rate), supplemented by qualitative analysis through case comparisons. The results indicate that Gemini performed best on both BLEU and TER, with its translations approaching human standards in structural restoration, terminology handling, and stylistic conformity. ChatGPT and DeepSeek showed moderate overall performance, with differences that were not statistically significant. Doubao performed worst on both metrics, with its main problems being inappropriate use of honorifics (keigo) and mistranslation of specific technical terms. These conclusions provide empirical evidence for the application of generative AI in professional translation and offer a reference for optimizing models for political text translation.
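As a rough illustration of the two metrics named above, the sketch below implements a smoothed single-reference BLEU and a simplified TER (word-level edit distance without the phrase-shift moves of full TER) over pre-tokenized sentences. This is a minimal approximation, not the scoring setup used in the study: evaluating Japanese output would first require morphological segmentation (e.g. with a tokenizer such as MeCab), and production evaluations typically rely on an established library such as sacreBLEU.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n=4):
    """Smoothed single-reference BLEU: geometric mean of modified
    n-gram precisions (add-one smoothing) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        overlap = sum((h & r).values())      # clipped n-gram matches
        total = sum(h.values())
        precisions.append((overlap + 1) / (total + 1))
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def ter(hyp, ref):
    """Simplified TER: word-level edit distance (insert/delete/substitute)
    divided by reference length; full TER also allows phrase shifts."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n] / max(n, 1)
```

A lower TER and a higher BLEU both indicate closer agreement with the reference; an identical hypothesis scores TER = 0.0 and BLEU = 1.0 under this sketch.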

References

[1] Chen, Y. (2023). Beyond ChatGPT: Opportunities, Risks, and Challenges of Generative AI. Journal of Shandong University (Philosophy and Social Sciences Edition), (03), 127-143.

[2] Feng, Z. (2018). Parallel Development of Machine Translation and Artificial Intelligence. Foreign Languages (Journal of Shanghai International Studies University), 41(06), 35-48.

[3] Feng, Z., Zhang, D., & Rao, G. (2023). From Turing Test to ChatGPT: Milestones and Enlightenments of Human-Computer Dialogue. Language Strategy Research, 8(02), 20-24.

[4] Wang, Y. (2025). Quality Assessment of Generative AI in Japanese-to-Chinese Translation: A Case Study of GPT-4o. Japanese Learning and Research, (02), 12-24.

[5] Xie, L., & Wang, Y. (2018). A Study of Political Discourse Translation from the Perspective of China’s International Image Construction. Foreign Language Education, 39(05), 7-11.

[6] Zhang, Q. (2017). International Communication and Translation of Chinese Political Discourse. In Foreign Language Research Papers. Tianjin: School of Foreign Languages, Tiangong University, 42-47.

[7] Obeidat, M. M., et al. (2024). Analyzing the performance of Gemini, ChatGPT, and Google Translate in rendering English idioms into Arabic. FWU Journal of Social Sciences, 18(4), 1-18.

[8] Papineni, K., et al. (2002). BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th ACL, 311-318.

[9] Snover, M., et al. (2006). A study of translation edit rate with targeted human annotation. Proceedings of AMTA, 223-231.

Published

20-04-2026

Section

Articles

How to Cite

Shi, Q., & Zhu, K. (2026). A Comparison of Mainstream Large Language Models’ Performance in Chinese-to-Japanese Political Text Translation: An Empirical Analysis Based on BLEU and TER. Frontiers in Humanities and Social Sciences, 6(4), 10-15. https://doi.org/10.54691/szsj5w82