As expected, the set of criteria underwent several revisions and adjustments. Some criteria were modified, some were removed, and new ones were added to the list. This section discusses each of these changes in detail.
Which evaluative criteria should the AI self-assessment writing toolkit include?
At the outset, we devised three guiding questions for ourselves, centered on “realism,” “cognitive challenge,” and “authentic evaluative judgment,” to ensure that the toolkit remains innovative, adaptable, and proactive in addressing the evolving needs and challenges our students face. Table 2 provides an overview of how each question aligns with these criteria.
We tailored the toolkit to students’ experience by thoroughly analyzing their reflections on using AI, keeping the above questions in mind throughout. For example, an analysis of medical students’ reflections led to the formulation of the question, “Have you understood the feedback provided by AI?” This question emerged from difficulties reported by seven students learning English as a foreign language, who struggled to comprehend certain feedback. One of them, for instance, had written, “Some of the suggestions were too advanced for me to understand and implement fully. I wish there were a simpler explanation or more guidance provided.”
Feedback literacy, introduced by Sutton [60], refers to the ability to read, interpret, and utilize written feedback effectively. This skill enables students to use feedback to improve their work and develop their evaluative judgment [61]. Therefore, when we evaluated this question against the authentic assessment criteria, we categorized it under ‘realism,’ as it encourages students to honestly assess their understanding of the AI-generated feedback.
Another reason for including this question is to emphasize the importance of academic integrity and the responsible use of AI tools. When students do not fully understand the feedback provided by AI and simply copy and paste the suggested text without critical evaluation, they risk committing plagiarism and academic misconduct. Encouraging students to reflect on whether they genuinely understand the feedback helps prevent this passive acceptance of AI suggestions, promoting ethical behavior in academic writing. According to Sefcik et al. [62], effective academic integrity education not only addresses the prevention of dishonest behaviors but also fosters the development of ethical decision-making skills. Integrating such reflective questions within the toolkit helps ensure that students engage with AI-generated content critically and use it to enhance their learning rather than replace their independent efforts.
By prompting students to critically assess AI-generated feedback and reflect on its application, the toolkit aims to build a strong foundation for ethical practices and the long-term development of writing skills. Guerrero-Dib et al. [63] emphasize that promoting academic integrity within educational settings has a positive influence on ethical behavior in professional contexts, making it crucial to incorporate such considerations into AI-supported learning tools.
Another key question designed for the toolkit was: ‘Was there any specific feedback that caught your attention and was particularly helpful in your life-long improvement process? Please use a highlighter to mark them.’ This question emerged because 11 students used adjectives or adverbs such as ‘interesting,’ ‘surprisingly,’ and ‘exciting’ when describing feedback that caught their attention and helped improve their writing process. One student, for example, remarked, “The feedback on the organization of my essay was incredibly valuable in highlighting the need for a logical flow of ideas.” This question is grounded in Krashen’s input hypothesis and constructivist theory. The former suggests that language learning happens when students are exposed to comprehensible input slightly above their current proficiency level; feedback that provides clear and easily understandable input can therefore align with this theory [64]. Constructivist theory posits that learners actively construct their knowledge and understanding through interaction with the environment; feedback that encourages students to reflect on their writing and language learning processes and take ownership of their learning can be viewed through this lens [65].
We also devised the following question to enhance students’ critical thinking skills: “How has the feedback from AI helped you think more critically about your writing? Please explain.” This question was included because some students (9 students) had made remarks such as “the feedback forced me to question my assumptions and gather more evidence” or “it pushed me to evaluate the evidence supporting my decisions and consider alternative approaches.” As noted in [66], writing at the university level differs greatly from writing in secondary school, since it requires students to write in a more critical academic style. Academic writing relies heavily on critical thinking and analysis, which are essential for evaluating information, questioning assumptions, analyzing arguments, synthesizing data, and constructing a coherent argument [67]. However, when students are not taught how to use AI tools and assess the information they provide, relying on them may reduce their creativity and critical thinking skills [5]. We therefore expected this question to help student writers develop a more nuanced perspective on their writing abilities and on how technology can support their growth as writers.
We formulated another question: “Have you identified your weaknesses (e.g., any recurring mistakes or patterns) as a writer and taken measures to improve them?” For instance, one student pointed out, “I have a tendency to use complex medical jargon that may confuse my readers,” or “I have realized that I neglect proper transitions between paragraphs. Addressing this weakness has significantly improved my health-related essays’ coherence and flow.” This question was posed in light of a study by Kristensen, Torkildsen and Andersson [68], which found that students who do not pay attention to the feedback they receive are more likely to repeat their mistakes in the future; breaking this cycle is essential for progress. Previous research has linked error-monitoring efficiency to working memory [69], self-regulation skills [70], and overall academic performance [71]; error monitoring has therefore been hypothesized to reflect improved cognitive control or a compensatory effort to prevent errors [72].
Turning to writing pedagogy and focusing on the positive side, we designed a specific question for our writing course: “Have you noticed any particular areas in which your writing has improved due to feedback from AI? Explain and provide specific examples.” This question is particularly relevant because all 22 students reported that certain areas of their writing improved when they received feedback from AI. For instance, one student mentioned, “ChatGPT has pointed out areas where I could be more precise in my language and eliminate unnecessary fluff.” Although almost all students reported improvements in their work, it is important to acknowledge that AI-generated feedback is direct and often lacks the nuanced, personalized insights provided by human evaluators. Consequently, students may become over-reliant on AI tools, which can hinder the development of critical and creative thinking skills over time [73]. According to Barrot [5], while tools like ChatGPT can help students complete writing assignments quickly, this ease of use raises concerns about potential learning loss, particularly in higher-order cognitive skills. As Kanungo, Gupta [74] emphasize, this dependency can weaken metacognitive skills by discouraging active reflection on the writing process. Excessive reliance on AI-generated feedback may also hinder students’ ability to engage in authentic self-reflection and self-assessment [35].
To counter these challenges, our toolkit incorporates structured reflection prompts specifically designed to mitigate over-reliance on AI. These prompts encourage students to evaluate their work critically and identify areas for improvement before consulting AI-generated feedback. For example, a prompt such as “How does the AI feedback challenge the assumptions you’ve made in your arguments, and how might alternative perspectives improve your analysis?” helps students engage with the underlying assumptions of their writing, fostering deeper reflection.
Over-reliance on AI, as Barrot [5] warns, can hinder the development of critical thinking and creativity by encouraging surface-level engagement with feedback rather than analytical and reflective interaction. For instance, students relying solely on AI for grammar corrections or structural suggestions may miss the opportunity to learn underlying principles, such as constructing coherent arguments or synthesizing diverse ideas, which are foundational for academic and professional success [75]. Our toolkit directly addresses these risks by embedding reflection questions that promote active engagement with AI feedback. For example, the prompt, “How does feedback about coherence and cohesion influence the way you connect ideas and structure paragraphs in future assignments?” helps students recognize areas requiring improvement, fostering long-term learning rather than reliance on immediate fixes. This active engagement aligns with Zimmerman’s [76] framework of self-regulated learning, which highlights the importance of self-monitoring and iterative improvement in sustained academic growth.
The toolkit integrates AI feedback into the reflective process, encouraging students to engage with the feedback critically and independently. By using AI’s insights in conjunction with self-assessment prompts, students are guided to identify areas for improvement and develop a deeper understanding of their writing strengths and weaknesses. This approach not only fosters critical thinking and creativity but also nurtures metacognitive skills, empowering students to become more autonomous and reflective learners. As Denisova-Schmidt [77] emphasizes, fostering ethical and reflective engagement with technological tools in education supports the development of transferable skills critical for professional success.
The final two questions identified as essential for fulfilling authentic assessment objectives related to strategic goal setting. The first was: “Have you set specific writing goals for yourself based on the feedback received from AI?” and the second: “How do you plan to apply the feedback received on this assignment to future writing tasks to continue enhancing your language and critical thinking skills?” Answering these two questions involves creating specific, measurable, achievable, relevant, and time-bound (SMART) goals and devising a detailed plan that outlines the steps and actions needed to achieve those goals. According to goal-setting theory, planning and strategizing act as mediating factors [78] and can help students stay organized, focused, and on track toward reaching the desired outcomes [79]. According to Troia, Harbaugh [80], mastery goals focus on acquiring knowledge and skills and attaining a sense of competence, aligning with Ryan and Deci’s [81] cognitive evaluation theory. Creating mastery goals can also lead to increased self-efficacy, self-regulation, and academic achievement. By setting mastery goals, students can view writing as purposeful and meaningful [82]. Teaching students to create writing objectives and track their progress toward those goals promotes creativity and engagement [83]. Please find below the list of questions from the first draft:
1) Have you understood the feedback AI has given to you? If not, circle the number of the feedback you did not fully understand, or underline it.
2) Was there any specific feedback that caught your attention and was particularly helpful in your life-long improvement process? Please mark them.
3) How has AI’s feedback helped you think more critically about your writing? Please explain.
4) Have you noticed any particular areas in which your writing has improved due to AI feedback?
5) Have you been able to identify your weaknesses (e.g., any recurring mistakes or patterns, such as having more than one S-V agreement error) as a writer and take measures to improve?
6) Have you set specific writing goals for yourself based on the feedback received from AI?
7) How do you plan to apply the feedback received on this assignment to future writing tasks to continue enhancing your language and critical thinking skills?
Once the initial version of the self-assessment toolkit was created, the questions were presented to the experts in a focus group discussion. At the same time, anonymized sample reflection papers from students (Appendix C) were shared with the experts. This process allowed them to better understand the connection between the questions in the toolkit and the reflection papers the students had produced.
As a result of this feedback, certain questions were modified, and new ones were added. For instance, two experts suggested that question 3 could be too difficult for some students and recommended that it be rephrased or that critical thinking be broken down into specific factors students could learn from. The original question, “How has the feedback from AI helped you to think more critically about your writing?” remained the same; however, the following factors were added to enhance critical thinking:
a) Critically evaluate the sources of information.
b) Consider alternative perspectives and think more critically about the underlying assumptions.
c) Analyze the arguments presented in writing more thoroughly.
d) Provide more context for the data presented.
e) Construct more coherent arguments in writing.
The experts made a similar suggestion for the question “Have you noticed any particular areas in which your writing has improved as a result of feedback from AI?” to help students pinpoint specific areas in their writing. Again, the question itself was kept, but the following items were added:
a) Grammar: Ensure sentences are well-structured and relevant; eliminate grammatical errors.
b) Sentence Structure: Vary sentence structure (simple, compound, complex, and compound-complex) to maintain reader interest and clarity.
c) Academic Vocabulary: Utilize appropriate and rich terminology and language specific to the field of study.
d) Coherence and Cohesion: Ensure that ideas are connected logically using conjunctions and sentence connectors and that transitions between paragraphs are smooth.
e) Clarity: Communicate ideas clearly and avoid using language that may confuse the reader. Utilize appropriate tone and register.
f) Mechanics: Correctly use mechanics of writing such as indentation, punctuation, capitalization, word endings, etc.
In addition, the experts believed that providing students with specific criteria can raise their awareness of areas for improvement in their writing. According to the cognitive theory of consciousness-raising, learners must be conscious of their language production to make progress [84]. By breaking writing down into grammar, sentence structure, academic vocabulary, coherence and cohesion, and clarity, students can more easily identify where to focus their attention to enhance their writing skills. This type of explicit feedback helps students develop a metacognitive awareness of their writing abilities, ultimately improving their proficiency [85].
These subtopics are aligned with the principles of formative assessment, which highlight the significance of offering constructive feedback to students to guide their learning process [54]. By specifically pointing out areas for improvement and offering suggestions for enhancement, students can engage in self-reflection and actively work towards strengthening their writing skills [86]. This targeted feedback can also boost students’ confidence in their abilities and motivate them to continue developing as writers [53].
The experts also suggested that each set of questions in the self-assessment toolkit be given an appropriate title so that students can easily understand them. For example, the first two questions (1 and 2) were titled ‘Understanding and Incorporation of Feedback.’ Similarly, questions 4 and 5 were titled ‘Improvement in Writing Skills’ and moved to the second set of questions in the toolkit. Likewise, question 3 was titled ‘Critical Thinking and Analysis,’ and questions 6 and 7 were titled ‘Strategic Goal Setting and Planning.’ As their final suggestion, the experts recommended providing examples for some questions, such as questions 6 and 7, to help students learn how to set detailed academic goals and plan strategically to achieve them. Özlem’s [87] study shows that when learners receive advice on goal setting, goal-achievement tactics, and goal reflection, personal goal setting may support them in EFL writing situations.
The revised version of the self-assessment toolkit after experts’ suggestions is presented in Table 3.
To what extent do experts consider the questions in the self-assessment toolkit to be important and comprehensive for evaluating students’ writing skills?
As shown in Table 4, the experts provided ratings for the seven questions included in the toolkit. The mean scores ranged from 4.33 to 5.00, with most questions receiving a mean score of 4.83 or higher. This result suggests that the experts collectively viewed the questions as highly important, with over 90% of the questions rated as significant or very significant. The detailed feedback from the experts offered valuable insights that were used to refine and finalize the academic writing toolkit, ensuring the instrument effectively addressed the key areas of concern identified in the earlier research stages.
Overall, the experts rated the toolkit questions highly, with most questions considered very important. This finding indicates that the toolkit is well-designed to assess writing self-assessment skills.
To what extent does the academic writing self-assessment toolkit demonstrate content validity, face validity, and reliability in measuring students’ academic writing skills?
The 22 medical student participants reviewed the academic writing self-assessment toolkit and provided feedback on its content validity and face validity.
Content Validity: The students rated 95% of the toolkit items as highly relevant (score of 4 or 5 on a 5-point scale) for evaluating academic writing proficiency. Additionally, 88% of the items were deemed comprehensive, covering the essential elements necessary for a thorough assessment of writing abilities. However, the students recommended rephrasing a few items for clarity, such as:
Item 3a: “Ensure that sentences are properly structured and free of grammatical errors” could be changed to “Demonstrate proper sentence structure and grammatical accuracy.” The latter phrasing was considered more natural and intuitive in the Persian language.
Item 3e: “Communicate ideas clearly and avoid using language that may not be very clear to the reader” could be revised to “Express ideas clearly and concisely, using language that is accessible to the reader.” The suggested change was intended to better align with common Persian language usage and conventions.
These minor wording changes were incorporated to enhance the content validity of the toolkit. Content validity is an important aspect of assessment design, as it ensures the instrument measures what it intends to measure [88].
Face Validity: Face validity refers to the extent to which an assessment appears to measure the intended construct at face value [89]. Establishing face validity is crucial for ensuring the acceptability and usability of an assessment tool from the perspective of the target population [88].
The students rated 92% of the toolkit items as highly clear and unambiguous (score of 4 or 5 on a 5-point scale). Furthermore, 90% of the items were deemed appropriate and relevant for a self-assessment of academic writing skills. A small number of items were identified as somewhat confusing or unclear, such as:
Item 5b: “Consider alternative perspectives and think more critically about the underlying assumptions” was rephrased as “Critically examine different viewpoints and underlying assumptions.”
Item 7: The example “Enhance my academic vocabulary knowledge by using a dictionary and thesaurus more and studying root words and prefixes” was revised to “Expand my academic vocabulary by regularly consulting reference materials and studying word roots and affixes.” These changes aimed to better align with common Persian language usage and study practices.
The research team incorporated this valuable student feedback to improve the clarity and interpretability of the self-assessment tool, enhancing its face validity. This student-centered evaluation of content validity and face validity was instrumental in strengthening the academic writing self-assessment toolkit, ensuring it was a comprehensive and user-friendly instrument for measuring writing proficiency.
Reliability: The toolkit’s reliability was evaluated using two methods. First, Cronbach’s alpha coefficient—a measure of internal consistency—was calculated based on the responses of 20 participants who did not take part in the main phase of the study. The resulting coefficient of 0.91 indicates an excellent level of reliability, demonstrating that the toolkit consistently measures academic writing skills. Additionally, a test–retest reliability method was employed, in which the same toolkit was administered to the same participants at two different time points. The results showed a high degree of reliability (r = 0.93), suggesting that the toolkit provides consistent results over time. The high Cronbach’s alpha coefficient and Pearson’s r values attest to the toolkit’s dependability, making it a valuable resource for students to assess their academic writing skills. The comprehensive evaluation process, including content validity, face validity, and reliability checks, contributes significantly to the credibility and utility of the academic writing self-assessment toolkit. The finalized English version of the reflective toolkit can be found in the supplementary material.
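For readers who wish to see how these two statistics are conventionally obtained, the following is a minimal sketch in Python, assuming a participants-by-items score matrix; the data, shapes, and variable names are illustrative placeholders, not the study’s actual analysis script or data.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(scores):
    """Internal-consistency estimate for a participants x items score matrix."""
    k = scores.shape[1]                         # number of toolkit items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of participants' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
# Hypothetical pilot data: 20 participants rating 7 items on a 1-5 scale.
pilot = rng.integers(3, 6, size=(20, 7)).astype(float)
alpha = cronbach_alpha(pilot)

# Test-retest reliability: Pearson r between total scores from the same
# participants at two administrations (the second administration is simulated here).
time1 = pilot.sum(axis=1)
time2 = time1 + rng.normal(0, 1, size=20)
r, p_value = stats.pearsonr(time1, time2)

print(f"Cronbach's alpha = {alpha:.2f}, test-retest r = {r:.2f}")
```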
To what extent does the academic writing self-assessment toolkit influence students’ self-evaluations of their writing abilities compared to self-assessments conducted without the toolkit?
To explore the influence of the academic writing self-assessment toolkit on students’ self-evaluations, a comparison was made between self-assessment scores from the pre-toolkit and post-toolkit phases. The findings (Table 5) revealed significant differences in students’ scoring patterns, indicating that the toolkit fostered more critical and reflective evaluations of their writing abilities.
The comparison revealed intriguing trends that underscore the impact of the toolkit on students’ self-assessment processes. Quantitative results indicated decreases in scores for certain areas after using the toolkit. For example, average scores for Understanding Feedback decreased from 4.1 to 3.5, and for Improvement in Writing Skills from 4.0 to 3.3. These decreases reflect a shift toward more critical self-evaluation, consistent with research suggesting that structured tools encourage students to reassess their performance more rigorously [90, 91].
Conversely, Goal Setting saw a notable increase from 3.9 to 4.4, suggesting that the toolkit’s structured approach to goal setting was effective in helping students create more specific and actionable writing objectives. Additionally, the increase in Usefulness of Feedback scores (from 4.3 to 4.8) highlights that students found the toolkit instrumental in using feedback more effectively.
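As a rough illustration only, and not a reproduction of the study’s analysis, pre/post differences of this kind are typically examined with a paired test on per-student ratings. The sketch below runs a Wilcoxon signed-rank test and a paired t-test on simulated placeholder scores for a single dimension; the sample size and score distributions are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_students = 22

# Simulated 1-5 self-ratings for one dimension (e.g., "Understanding Feedback")
# before and after using the toolkit; these are fabricated stand-ins.
pre = np.clip(np.round(rng.normal(4.1, 0.6, n_students)), 1, 5)
post = np.clip(np.round(rng.normal(3.5, 0.6, n_students)), 1, 5)

# The Wilcoxon signed-rank test suits ordinal, paired ratings;
# a paired t-test is the common parametric alternative.
w_stat, w_p = stats.wilcoxon(pre, post)
t_stat, t_p = stats.ttest_rel(pre, post)

print(f"pre mean = {pre.mean():.2f}, post mean = {post.mean():.2f}")
print(f"Wilcoxon p = {w_p:.3f}, paired t-test p = {t_p:.3f}")
```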
To provide additional context and deepen understanding, follow-up interviews were conducted with ten students, selected through purposive sampling to represent diverse perspectives while working toward data saturation. During these interviews, students were asked to compare their pre- and post-toolkit self-assessment scores and explain the rationale for the differences. Their writing samples were reviewed alongside the interviews so that scores could be checked against the actual quality of their work.
The qualitative data from these interviews highlighted three primary themes, which further explain the toolkit’s influence on students’ self-evaluations.
Increased awareness of strengths and weaknesses
The toolkit enhanced students’ ability to critically evaluate their writing, fostering a nuanced understanding of strengths and areas for improvement. Several students reflected on how their initial self-assessments lacked the nuance needed for meaningful improvement. As one student noted, “Before using the toolkit, I’d rate myself high in everything because I thought my essay was fine. But after going through the checklist, I realized my paragraphs don’t always flow well, and sometimes my examples are weak or unclear.” Another added, “I didn’t realize how important transitions were. Now, I see how crucial it is to make ideas flow smoothly.”
This deeper awareness was reflected in the drops in the Understanding Feedback scores (4.1 to 3.5) and the Improvement in Writing Skills scores (4.0 to 3.3). The toolkit prompted students to evaluate writing tasks more critically, especially in areas like argument structure and clarity. One student commented, “I thought my grammar was my main issue, but now I understand that organizing my ideas clearly is just as important.” However, some students also acknowledged the emotional discomfort of identifying weaknesses they had previously ignored. One student remarked, “It was hard to realize how much I was missing, but it gave me a direction to improve.” This reflects the toolkit’s dual role as both a developmental tool and a mirror for self-awareness.
Clearer goal setting
The toolkit not only made students more aware of their writing weaknesses, but also helped them set more specific and actionable goals for improvement. Many students noted that their pre-toolkit goals were vague, while post-toolkit goals were more targeted. One participant explained, “Before, I just said I wanted to improve my grammar, but I didn’t know where to start. The toolkit made me realize I need to focus on subject-verb agreement or comma usage.” Another added, “Now, I can break down goals like improving introductions or clarifying arguments.”
The toolkit also helped students think long-term about their improvement. One said, “Now I feel like the goals I set will help me in future assignments too.” This improvement in goal-setting clarity aligns with the quantitative increase in Goal Setting scores (3.9 to 4.4) and reflects the greater clarity students gained in setting specific, measurable objectives. The toolkit’s structure made goal setting a more focused and deliberate process, enhancing students’ self-regulated learning.
Alignment with teacher feedback
Interviews also revealed closer alignment between post-toolkit self-assessments and teacher evaluations, suggesting that the toolkit enabled students to evaluate themselves more realistically. The teacher-interviewer noted, “Before the toolkit, students often overestimated their grammar and vocabulary. After using the toolkit, their self-assessments were much more aligned with what I saw in their writing.” Also, one participant stated, “After the toolkit, my self-assessment matched my teacher’s feedback more closely, which made me feel more confident that I was evaluating myself accurately.” However, some discrepancies remained, particularly in grammar and vocabulary, where students still tended to overrate their abilities. The teacher-interviewer commented, “Students still tend to rate their grammar high, despite noticeable errors. This area could benefit from more focused prompts.” Despite these gaps, students found the toolkit valuable in clarifying academic expectations. As one student noted, “The toolkit showed me what teachers are looking for—clarity and structure, not just fancy words.”
This qualitative data supports the conclusion that the toolkit prompts students to critically reflect on their abilities, set strategic goals, and align their self-assessments with academic standards, all of which support their writing development. This aligns with existing research on self-assessment, which suggests that structured tools can reduce overestimation and encourage more critical reflection [92]. Rickert [90] argues that students often overestimate their abilities due to a lack of structured evaluation guidelines, while Zhang’s research [91] found that students struggle to assess areas like coherence and grammar accurately.
Furthermore, students’ reflections showed how the toolkit encouraged them to address overlooked aspects of writing, such as logical flow and coherence, aligning with Nicol and Macfarlane-Dick’s [75] recommendation that self-assessment tools should guide students in breaking down complex tasks into manageable components for critical evaluation. The emotional discomfort some students expressed in recognizing gaps in their writing skills also highlights the toolkit’s role in fostering critical self-reflection, as noted by Boud and Falchikov [93], who stress that self-assessment can challenge students’ perceptions, which is necessary for growth. For example, in nursing or medical education, the toolkit could be used to help students improve their reflective practices in clinical documentation by prompting them to assess the clarity, accuracy, and structure of patient notes. This would encourage them to critically reflect on their documentation skills and identify areas for improvement, much like how the toolkit helps students in this study evaluate their writing. Similarly, in pharmacy education, the toolkit could assist students in evaluating the quality of medication histories, prescription notes, and other written communications. Studies have shown that documentation errors in these fields—such as inaccurate medication records or incomplete patient histories—are common and can have serious consequences [94, 95]. The toolkit could help students in these fields develop stronger documentation skills by prompting more accurate and reflective assessments.
The closer alignment between students’ post-toolkit self-assessments and teacher-interviewer evaluations suggests that the toolkit improved the accuracy of students’ evaluations. This is particularly important given the findings of León, Panadero and García-Martínez [96], who note that self-assessments are most effective when they closely mirror external evaluations. The alignment observed in areas such as coherence and argumentation highlights the toolkit’s success in bridging the gap between student perceptions and academic standards [97].
Students also expressed increased confidence in their self-assessments post-toolkit. However, persistent discrepancies in grammar and vocabulary assessments suggest that these areas remain challenging for students to evaluate independently. This aligns with findings by Topping [98], who highlights the complexity of self-assessing technical aspects of writing, such as grammar, without additional scaffolding. The teacher-interviewer’s observation that students continued to overestimate their grammar skills underscores the need for further refinement of the toolkit. Adding more targeted prompts or detailed rubrics for technical aspects could enhance students’ ability to assess these areas accurately [91].