Adaptive Structural Routing: Hybrid Chunking Architecture for RAG in Informatics Engineering Learning Documents
Abstract
Retrieval-Augmented Generation (RAG) has been examined in a growing number of studies, but research specifically addressing adaptive chunking strategies for Indonesian-language Informatics learning documents remains scarce. This study designs a hybrid adaptive chunking system that routes each document section to an appropriate chunking strategy based on structural signals detected during preprocessing. It follows a Design and Development Research (DDR) approach comprising document analysis, system architecture design, and expert validation by three experts in Informatics and Natural Language Processing (NLP). Data were collected through structured expert review instruments and scenario walkthrough sessions. The results show that rule-based structural detection reliably distinguished heading, narrative, list, and code block sections, supported by a confidence-based fallback mechanism for low-confidence sections. The study concludes that hybrid adaptive chunking plays an important role in preserving the semantic coherence of learning materials in RAG systems. These findings contribute to research on adaptive information retrieval and broaden understanding of how RAG can be designed around pedagogical needs in Indonesian-language academic contexts. In practice, the study offers a reusable design framework for Indonesian-language technical documents and guidance for developers of educational AI systems.
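The abstract describes the pipeline only at the level of structural detection, strategy routing, and a confidence-based fallback. The Python sketch below illustrates one way such a router could work, under stated assumptions: the regular expressions, the 0.6 confidence threshold, the "share of matching lines" confidence score, sentence packing for narrative text, and fixed-size overlapping windows as the fallback are all hypothetical choices, not the rules or parameters reported in the study.

```python
import re
from typing import Callable, Dict, List, Tuple

# Section labels: the four structural types named in the abstract.
HEADING, NARRATIVE, LIST, CODE = "heading", "narrative", "list", "code"
FALLBACK = "fallback"
CONFIDENCE_THRESHOLD = 0.6  # hypothetical cut-off that triggers the fallback

# Illustrative detection rules (assumptions, not the study's actual rule set).
HEADING_RE = re.compile(r"^(#{1,6}\s+\S|BAB\s+[IVXLC\d]+|\d+(\.\d+)*\s+[A-Z])")
LIST_RE = re.compile(r"^\s*([-*•]|\d+[.)]|[a-z][.)])\s+")
CODE_FENCE_RE = re.compile(r"^\s*`{3}")
CODE_HINT_RE = re.compile(
    r"(;\s*$|[{}]\s*$|^\s*(def|class|import|for|while|if|return|public|int)\b)"
)


def split_blocks(document: str) -> List[str]:
    """Split on blank lines, but never inside a fenced code block."""
    blocks, current, in_fence = [], [], False
    for line in document.splitlines():
        if CODE_FENCE_RE.match(line):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def classify(block: str) -> Tuple[str, float]:
    """Return (label, confidence) from simple structural signals."""
    lines = [ln for ln in block.splitlines() if ln.strip()]
    if not lines:
        return NARRATIVE, 0.0
    if CODE_FENCE_RE.match(lines[0]):
        return CODE, 1.0
    if len(lines) == 1 and len(lines[0]) < 80 and HEADING_RE.match(lines[0]):
        return HEADING, 0.9
    # Confidence = share of lines that carry the signal.
    list_ratio = sum(bool(LIST_RE.match(ln)) for ln in lines) / len(lines)
    code_ratio = sum(bool(CODE_HINT_RE.search(ln)) for ln in lines) / len(lines)
    if list_ratio > 0 and list_ratio >= code_ratio:
        return LIST, list_ratio
    if code_ratio > 0:
        return CODE, code_ratio
    return NARRATIVE, 0.8  # no list/code signal: prose, fixed illustrative score


def chunk_sentences(text: str, max_chars: int = 500) -> List[str]:
    """Narrative: pack whole sentences into chunks of at most ~max_chars."""
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def chunk_fixed(text: str, size: int = 400, overlap: int = 50) -> List[str]:
    """Fallback: fixed-size character windows with overlap."""
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def keep_whole(text: str) -> List[str]:
    """Lists and code stay in one chunk so items and syntax are not split."""
    return [text.strip()]


STRATEGIES: Dict[str, Callable[[str], List[str]]] = {
    NARRATIVE: chunk_sentences,
    LIST: keep_whole,
    CODE: keep_whole,
}


def route(document: str) -> List[Dict[str, str]]:
    """Route each block to a strategy; low-confidence blocks use the fallback."""
    chunks, heading = [], ""
    for block in split_blocks(document):
        label, confidence = classify(block)
        if label == HEADING:
            heading = block.strip()  # carried as context for later chunks
            continue
        if confidence < CONFIDENCE_THRESHOLD:
            label, pieces = FALLBACK, chunk_fixed(block)
        else:
            pieces = STRATEGIES[label](block)
        chunks.extend({"heading": heading, "label": label, "text": p} for p in pieces)
    return chunks
```

Calling `route(text)` on a Markdown-like learning module returns a list of dictionaries with `heading`, `label`, and `text` keys. Attaching the most recent heading to each chunk is one simple way to retain section context, and keeping lists and code blocks whole avoids cutting an enumeration or a program in the middle, which is presumably the kind of coherence loss the structural routing targets. A block with mixed signals (for example, one list item followed by plain prose) scores below the threshold and is handed to the fallback instead.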

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.