Elena Kkese (1*)

(1) Cyprus University of Technology; UCLan
(*) Corresponding Author


Two central issues in the assessment of direct writing tests, especially in L2 contexts, concern task development and scoring procedures. These allow inferences to be made about test takers’ writing ability and provide useful diagnostic information about which aspects of writing matter most to raters. This study was concerned with constructing rating scales based on written samples produced by Cypriot-Greek students in an EFL classroom context, in an effort to determine which aspects of writing are most important in L2 writing and how they are weighted. The written samples were examined using two different approaches in order to arrive at valid and reliable ways of evaluating L2 writing. The resulting rating scales addressed overall writing ability and spelling accuracy. The findings suggest that, in L2 writing, more emphasis may be placed on accuracy than on communicative effectiveness, drawing attention to the need for tasks that give students the opportunity to reflect on content or topical knowledge.


second language writing; spelling accuracy; holistic scales; analytic scales; Cypriot Greek









Copyright (c) 2018 Elena Kkese

This work is licensed under a Creative Commons Attribution 4.0 International License.