Profiling Collocation Use in English Textbooks for Vietnamese Students

Collocations have been shown to be highly important for language learners. For Vietnamese learners, the English textbooks compiled by the Ministry of Education and Training are the major source of language exposure. This study therefore investigates (1) the collocational profiles of the English textbook series for students from elementary to high school, (2) the relevance of the targeted collocations to high-frequency collocation lists suggested in the literature, and (3) the recycling of the targeted collocations. The study is a corpus-based analysis of verb-noun and adjective-noun collocations. A corpus of 312,770 word tokens was built from the textbook series, from which 13,292 verb-noun and 11,079 adjective-noun collocations were identified. The frequencies of occurrence of collocation tokens and types increase from one grade level to the next. The collocations targeted in the textbooks cover only 10.5% of the collocations recommended in an academic collocation list, and 31% of them are either duplicated across units or not high-frequency collocations. Furthermore, 76% of the targeted collocations are either not recycled at all or not recycled often enough for learning to be likely. Implications for the learning and teaching of collocations and for materials design are discussed.


A. Introduction
It is widely accepted that collocations play an integral part in language teaching and learning. Research in this area has shown that teaching collocations improves not only students' overall vocabulary knowledge but also their communicative competence. With collocations at their disposal, language learners can communicate more confidently and sound more native-like. However, previous studies have found that collocational knowledge is not easy to develop and that many language learners, even advanced ones, have problems with collocations, especially verb-noun and adjective-noun collocations (Author, 2019 [blinded for review]; Nesselhauf, 2003; Siyanova & Schmitt, 2008).
In English as a foreign language (EFL) contexts, where learners have little access to native speakers of the target language, textbooks are one of the major sources of exposure to the target language. In Vietnam, the series of English textbooks for students from elementary to high school is compiled by the Ministry of Education and Training. As specified in Circular 32/2015 of the Ministry of Education and Training, by the time students graduate from high school they should have acquired around 2,500 to 2,800 English words. This figure roughly corresponds to the 2,801 words of the New General Service List (NGSL) built by Browne et al. (2013).
Recognizing the importance of collocations, textbook writers tend to include them when introducing lexical items in language textbooks, and the series of English textbooks for Vietnamese students from elementary to high school is no exception. However, little is known about whether the textbooks present high-frequency collocations as featured in existing general and academic collocation lists. It is also unclear whether the collocations targeted in the series are recycled systematically to facilitate students' learning. As research in second language acquisition has shown, mere exposure to a language feature is not sufficient for it to be acquired; rehearsal or repetition is also needed for short-term memory to be "transferred to long-term memory" (Hummel, 2020, p. 76). Thus, the present study, with a particular focus on verb-noun and adjective-noun collocations, aims to address the following questions:
1. What are the collocational profiles of the series of English textbooks for Vietnamese students from elementary school to high school?
2. To what extent are the collocations introduced in this series of English textbooks relevant to the collocation lists suggested in the literature?
3. Are the collocations targeted in this series of English textbooks recycled so as to facilitate their acquisition?

Collocation identification
Collocations are defined slightly differently depending on whether they are viewed as a statistical or a phraseological phenomenon. Scholars adopting the statistical approach define collocations as the co-occurrence of words at some defined level of frequency (Clear, 1993; Durrant, 2008; Sinclair, 1991; Stubbs, 2002b). Mutual Information (MI), which Hunston (2002, p. 71) describes as measuring 'the amount of non-randomness present when two words co-occur', is used in many studies as a statistical measure for identifying collocations. On its own, however, it is not considered a very useful statistic, since it favours rare words (Baker, 2006; Gablasova et al., 2016; Kilgarriff & Kosem, 2012). For example, the co-occurrence of prophesy and disaster gains a very high MI score of 8.21 despite occurring only once in the British National Corpus (BNC) (Tsai, 2015). For scholars of this tradition, the syntactic relation between the elements of a combination plays no role in determining whether it is a collocation. Instead, the elements have to co-occur within a 'span', that is, a given number of words before and after the node, the word being investigated (Stubbs, 2002a). Although there is no consensus on the span or the MI threshold, a span of 4, following Jones and Sinclair (1974), and an MI score of 3.0 are widely adopted (Durrant & Schmitt, 2010). Aware of the weakness of the MI score, some researchers use the t-score (> 2.0) as an additional criterion to ensure that a combination is frequent enough in language use to be considered a collocation (Tsai, 2015).
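As a rough illustration of why MI favours rare words while the t-score does not, the sketch below computes both measures from raw frequencies using the standard simplified formulas: expected co-occurrence E = f(node) × f(collocate) / N, MI = log2(O / E), and t = (O - E) / √O. It is a minimal sketch, not the exact computation used by Sketch Engine or in the cited studies: it ignores the span size (some implementations scale E by the window width), and all frequencies are invented for illustration.

```python
import math

def expected_cooccurrence(f_node: int, f_collocate: int, corpus_size: int) -> float:
    """Co-occurrence count expected if the two words were independent."""
    return f_node * f_collocate / corpus_size

def mi_score(observed: int, f_node: int, f_collocate: int, corpus_size: int) -> float:
    """Mutual Information: log2 of observed over expected co-occurrence."""
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size)
    return math.log2(observed / expected)

def t_score(observed: int, f_node: int, f_collocate: int, corpus_size: int) -> float:
    """t-score: (observed - expected) / sqrt(observed)."""
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size)
    return (observed - expected) / math.sqrt(observed)

N = 100_000_000  # hypothetical corpus size, roughly BNC-sized

# (observed co-occurrences, node frequency, collocate frequency): invented values
rare_pair = (1, 50, 2_000)          # two rare words seen together once
common_pair = (500, 50_000, 100_000)  # two common words seen together often

for label, (o, fn, fc) in [("rare", rare_pair), ("common", common_pair)]:
    print(label, round(mi_score(o, fn, fc, N), 2), round(t_score(o, fn, fc, N), 2))
```

With these invented counts, the pair seen only once scores an MI of about 9.97 but a t-score of about 1.0, whereas the frequent pair scores a modest MI of about 3.32 but a t-score of about 20.1, which mirrors the prophesy/disaster case described above.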
In contrast, for scholars adopting the phraseological approach, a word combination must first be of a particular grammatical pattern to be considered a collocation (Bahns & Eldaw, 1993; Cowie, 1994; Nesselhauf, 2005). A collocation consists of a base, which is always the noun head in patterns containing nouns such as verb + noun (e.g., pursue studies) and adjective + noun (e.g., heavy smoker), and a collocate (e.g., pursue and heavy). Identifying collocations from this angle often involves distinguishing them from free combinations and idioms (Benson, 1989). Restricted substitutability is a widely used criterion among scholars of this tradition for distinguishing collocations from free combinations, in which the substitutability of elements depends solely on their semantic properties (Cowie, 1994; Laufer, 2010; Nesselhauf, 2003). The distinction between collocations and idioms rests on semantic transparency: perform a task is a collocation because its meaning is the combination of the meanings of its individual words, whereas kick the bucket is an idiom because its meaning is not.
Given the discussion above, collocations in this study are identified by combining the statistical and phraseological approaches. This combination allows the researchers to exploit the frequency measures of the statistical approach while considering only combinations of specific syntactic patterns. In this way, combinations with very high MI scores and t-scores that are not pedagogically valuable, such as and respectively and between and (Durrant, 2008) or any other and each other (Shin & Nation, 2008), are not counted as collocations.
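The combined criterion can be expressed as a simple filter: a candidate must first match one of the target syntactic patterns and only then pass both statistical cut-offs. The sketch below assumes that MI scores and t-scores have already been computed (for example, from BNC frequencies). The `Candidate` class, the `is_collocation` function, and the use of Universal POS tag names are hypothetical; the defaults of MI ≥ 3 and t ≥ 4 are the cut-offs adopted in the Research Design, and the scores in the example are invented.

```python
from dataclasses import dataclass

# Syntactic patterns retained under the phraseological criterion.
ALLOWED_PATTERNS = {("VERB", "NOUN"), ("ADJ", "NOUN")}

@dataclass
class Candidate:
    collocate: str
    base: str
    collocate_pos: str
    base_pos: str
    mi: float  # precomputed MI score (assumed available)
    t: float   # precomputed t-score (assumed available)

def is_collocation(c: Candidate, min_mi: float = 3.0, min_t: float = 4.0) -> bool:
    """Keep a pair only if it has an allowed pattern AND passes both statistical cut-offs."""
    if (c.collocate_pos, c.base_pos) not in ALLOWED_PATTERNS:
        return False  # e.g. "each other": strong association, but not a target pattern
    return c.mi >= min_mi and c.t >= min_t

# Invented scores, for illustration only.
examples = [
    Candidate("pay", "attention", "VERB", "NOUN", mi=6.0, t=20.0),
    Candidate("each", "other", "DET", "PRON", mi=9.0, t=30.0),
    Candidate("make", "thing", "VERB", "NOUN", mi=1.5, t=5.0),
]
for c in examples:
    print(c.collocate, c.base, is_collocation(c))
```

Checking the pattern before the scores is what excludes high-scoring but pedagogically empty pairs such as each other, while the score check still removes well-formed but loosely associated pairs.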

Collocation lists
Several collocation lists have been built with the aim of focusing teaching and learning activities on high-frequency collocations. Shin and Nation's (2008) list of 100 collocations seems to be the first such list; according to its authors, it is intended for beginning and low-intermediate learners of English with a vocabulary size of around 1,000 words. The authors used the 10-million-word spoken section of the BNC as their data source and applied four criteria: (1) the base must be a noun, a verb, an adjective, or an adverb; (2) it must be among the 1,000 most frequent content words in the BNC; (3) the collocation must occur at least 30 times in the corpus; and (4) the collocation must be grammatically well-formed. Since only the base is required to be a content word, most collocations in the list are not pedagogically meaningful to introduce to learners (e.g., a bit, as well, said to). Close scrutiny of the list revealed that only six of its collocations are of the verb-noun or adjective-noun pattern. This list is therefore not a suitable benchmark for the present study, which focuses on collocations whose base and collocate are both content words.
Besides Shin and Nation's (2008) list, Durrant's (2008) list of 1,000 collocations appears to be the first list focusing on academic collocations. To build it, Durrant compiled a corpus of 25 million word tokens from research articles in scholarly journals and filtered high-frequency collocations from it through rigorous criteria and procedures. First, he used log-likelihood-based 'keyness' techniques to extract pairs that appear significantly more frequently in academic than in non-academic texts. To be considered further, a pair had to occur at least once per million words in the academic corpus and have an MI score above 4.0. Despite this rigorous procedure, the list has some weaknesses. First, it includes structurally incomplete combinations (e.g., identified been, effect long, factor such). Second, it contains many free combinations, such as these features and these conditions, which are of little pedagogical value to language learners. Another academic collocation list in the literature was developed by Chon and Shin (2013). Although it was also built through a thorough procedure with precise steps, it suffers from the same weaknesses as Durrant's (2008) list: free combinations that present little difficulty to learners, such as these data and this data, abound in it.
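The log-likelihood keyness step can be illustrated with the widely used two-corpus G2 formula, in which an item's observed frequencies in corpora A and B are compared with the frequencies expected if the item were equally likely in both. The sketch below is an assumption about how such a step might look, not Durrant's (2008) implementation: the function names, the 3.84 cut-off (the chi-square critical value for p < .05 with one degree of freedom), and the example counts are illustrative only. Because G2 alone does not indicate direction, the hypothetical helper also checks that the pair is relatively more frequent in the academic corpus and applies the once-per-million-words floor.

```python
import math

def log_likelihood(freq_a: int, freq_b: int, size_a: int, size_b: int) -> float:
    """Two-corpus log-likelihood (G2) for one item."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def is_academic_key(freq_acad, freq_gen, size_acad, size_gen,
                    critical=3.84, min_per_million=1.0):
    """Hypothetical filter: key in the academic corpus and frequent enough there."""
    more_frequent_in_acad = freq_acad / size_acad > freq_gen / size_gen
    per_million = freq_acad / size_acad * 1_000_000
    return (more_frequent_in_acad
            and per_million >= min_per_million
            and log_likelihood(freq_acad, freq_gen, size_acad, size_gen) >= critical)

# Invented counts for one word pair: 25M-word academic vs. 25M-word general corpus.
print(round(log_likelihood(300, 60, 25_000_000, 25_000_000), 2))
print(is_academic_key(300, 60, 25_000_000, 25_000_000))
```

Pairs passing this keyness filter would then still need to meet the MI threshold described above before being considered for the list.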
One other academic collocation list was developed by Ackermann and Chen (2013). This list is considered to overcome the limitations of the two academic lists discussed above in that it includes only structurally complete collocations (Lei & Liu, 2018). Its collocations follow the common syntactic patterns that most language educators and lexicographers focus on (e.g., verb-noun, adjective-noun, adverb-adjective, verb-adverb). To develop the list, the authors employed a five-step procedure: (1) taking the 1,300 most frequent content words as collocation bases and extracting combinations of specific syntactic patterns using part-of-speech (POS) tagging; (2) retaining combinations that meet the statistical cut-offs of MI score ≥ 3 and t-score ≥ 4; (3) manually excluding combinations based on several criteria (completeness, degree of fixedness, meaning, and hyphenated forms); (4) having experts review the appropriateness and pedagogical relevance of each entry; and (5) systematizing the list (e.g., by adding articles, lemmatizing entries, and adding dominant prepositions). The process resulted in a list of 2,468 entries, of which 1,773 are adjective-noun and 310 are verb-noun collocations. Free of the shortcomings of the other two lists, Ackermann and Chen's (2013) list is a valuable resource for learners who aim to acquire collocations for academic purposes. It is therefore used as the benchmark in this study.

The role of recycling in learners' collocation acquisition
It is widely held that, to facilitate vocabulary learning, vocabulary should be taught and learned deliberately (Ellis, 2008; Laufer, 2005; P. Nation, 2007). This does not mean, however, that vocabulary, once learned, will be retained and ready for use throughout one's life. Research has shown that it is crucial for taught and learned vocabulary to be recycled so that it is retained in learners' long-term memory (Brown et al., 2008). Nation (2005) even warns that vocabulary teaching without proper recycling will not bring the desired results and will instead be wasted effort. Because EFL learners have fewer opportunities than ESL learners to practise newly learned vocabulary naturally, they need more recycling (Kojic-Sabo & Lightbown, 1999), and the recycling should take place over an extended period of time (Harwood, 2000; Lewis, 1993). Likewise, other studies have shown that multiple encounters with lexical items have positive effects on learners' vocabulary acquisition (Peters, 2016; Rott, 1999; Webb et al., 2013; Webb & Kagimoto, 2010). Although these studies have yielded mixed results regarding the number of encounters that best facilitates learning, the general consensus is that the more frequently a string of words is encountered, the more likely it is to be recalled. These studies suggest that a collocation should be recycled at least three times to secure some learning gain (Peters, 2016; Webb & Kagimoto, 2009).

B. Research Design
This study is a corpus-based study of verb-noun and adjective-noun collocations in a series of English textbooks for Vietnamese students.

The corpus of English textbooks
The corpus was compiled from the new series of English textbooks used by students from Grade 1 to Grade 12 in Vietnam and comprises almost 313,000 words. The textbooks were developed for the curriculum of Vietnamese schools from elementary to high school. At the elementary level, there are five textbooks for Grades 1 to 5. Each book at this level consists of 20 units published in two volumes (Units 1 to 10 in Volume 1 and Units 11 to 20 in Volume 2), with a review unit after every five units. At the secondary level, there are four textbooks for Grades 6 to 9. Each book at this level consists of 12 units published in two volumes (Units 1 to 6 in Volume 1 and Units 7 to 12 in Volume 2), with a review unit after every three units. At the high school level, there are three textbooks for Grades 10 to 12. Each book at this level consists of 10 units published in two volumes (Units 1 to 5 in Volume 1 and Units 6 to 10 in Volume 2), with a review unit after every three units.
The textbook series was designed on the basis of the communicative language teaching approach and focuses mainly on developing students' listening and speaking skills. This new series has recently been implemented across all grades in some leading schools in Vietnam.

Corpus analysis
To address the first research question, we first ran a concordance to extract all nouns in the corpus. Next, we used part-of-speech (POS) tagging in Sketch Engine to extract all adjectives and verbs that collocate with the extracted noun bases. After extracting all verb-noun and adjective-noun combinations, we checked their statistical measures in the BNC to identify collocations. The thresholds of MI score ≥ 3 and t-score ≥ 4 used by Ackermann and Chen (2013) were adopted, since their list served as the benchmark in this study. The collocational profiles of the textbooks comprise collocations occurring not only in the vocabulary sections but also elsewhere in the textbooks, on the grounds that collocations can be learned incidentally even when they are not targeted (Webb et al., 2013).
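The extraction itself was carried out in Sketch Engine, whose internals are not reproduced here. As a rough, tool-independent illustration of what extracting verb-noun and adjective-noun candidates from POS-tagged text involves, the sketch below assumes the corpus is already lemmatized and tagged with Universal POS tags, and scans forward from each verb or adjective to the next noun within a small window. The window size, the tags allowed between a collocate and its noun, and the function name are hypothetical choices; the resulting candidates would still need to be scored against the BNC with the MI ≥ 3 and t-score ≥ 4 thresholds.

```python
from collections import Counter

# Tags that may intervene between the collocate and the noun base (hypothetical choices):
# a verb may be separated from its object noun by determiners, adjectives, numerals or
# possessive pronouns ("ride a bike", "do their homework"); an adjective only by adjectives.
SKIPPABLE = {"VERB": {"DET", "ADJ", "NUM", "PRON"}, "ADJ": {"ADJ"}}
PATTERN = {"VERB": "verb-noun", "ADJ": "adj-noun"}

def extract_candidates(tagged, max_gap=3):
    """Count (pattern, collocate, base) candidates in a list of (lemma, upos) pairs."""
    counts = Counter()
    for i, (lemma, pos) in enumerate(tagged):
        if pos not in SKIPPABLE:
            continue
        # Scan at most max_gap intervening tokens for the first noun.
        for j in range(i + 1, min(i + 2 + max_gap, len(tagged))):
            next_lemma, next_pos = tagged[j]
            if next_pos == "NOUN":
                counts[(PATTERN[pos], lemma.lower(), next_lemma.lower())] += 1
                break
            if next_pos not in SKIPPABLE[pos]:
                break  # a word of another class ends the search
    return counts

sentence = [("drink", "VERB"), ("strong", "ADJ"), ("tea", "NOUN"),
            ("after", "ADP"), ("school", "NOUN")]
print(extract_candidates(sentence))
```

Here drink is linked to tea across the intervening adjective, while school is not paired with anything because it is preceded by a preposition rather than a verb or adjective.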
To address the second question, namely the extent to which the collocations targeted in the series are relevant to the collocation lists suggested in the literature, we first identified which collocations are targeted. According to the textbooks' preface, these are the collocations presented in the vocabulary section of each unit, together with the topic-related collocations embedded in the first listening or reading section of each unit. After the targeted verb-noun and adjective-noun collocations were identified and extracted into a separate spreadsheet, we checked them against the list built by Ackermann and Chen (2013), which, as discussed above, is the most valuable list of academic collocations. However, comparison with the academic list alone would not give a full picture of how effectively the textbooks introduce collocations to learners, since general collocations that are worth in-class teaching time (e.g., free time, pay attention, strong tea) do not occur in Ackermann and Chen's (2013) list, which was built from a corpus of academic written texts. As the literature offers no general collocation list suitable for the purpose of this study, we went one step further and examined whether the targeted collocations are high-frequency collocations worth introducing to learners. Instead of developing a general collocation list, we assessed the targeted collocations of the textbook series through a three-step procedure. First, we extracted the nouns from the New General Service List (NGSL) of 2,801 words, which is the basis for curriculum design prescribed by the Ministry of Education and Training. Second, we checked whether the noun bases of the targeted collocations in our textbook corpus occur in this list of NGSL nouns. If not, the collocations containing those nouns were not considered high-frequency collocations and were therefore judged not worth focusing on. If so, the collocations went through the final step: checking their frequency of occurrence in the BNC. Those occurring at least 30 times in the BNC were considered high-frequency collocations and hence worth deliberate teaching.
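Once the NGSL nouns have been extracted (step one), the remaining two steps amount to a two-stage filter. The sketch below is an illustration under stated assumptions: `classify_targeted` is a hypothetical helper, the NGSL subset and the BNC counts in the example are invented placeholders rather than real figures, and BNC frequencies are assumed to have been looked up beforehand and stored in a dictionary keyed by (collocate, base).

```python
def classify_targeted(targeted, ngsl_nouns, bnc_freq, min_bnc_freq=30):
    """Split targeted (collocate, base) pairs into high-frequency and other collocations."""
    high_frequency, not_high_frequency = [], []
    for collocate, base in targeted:
        pair = (collocate, base)
        if base not in ngsl_nouns:
            # Step 2: noun base outside the NGSL, so not treated as high frequency.
            not_high_frequency.append(pair)
        elif bnc_freq.get(pair, 0) >= min_bnc_freq:
            # Step 3: base in the NGSL and the pair occurs often enough in the BNC.
            high_frequency.append(pair)
        else:
            not_high_frequency.append(pair)
    return high_frequency, not_high_frequency

# Placeholder data: a tiny stand-in for the NGSL noun list and invented BNC counts.
ngsl_nouns = {"attention", "homework", "bike"}
bnc_freq = {("pay", "attention"): 900, ("do", "homework"): 250, ("wash", "bike"): 5}
targeted = [("pay", "attention"), ("do", "homework"), ("wash", "bike"), ("fly", "kite")]

high, low = classify_targeted(targeted, ngsl_nouns, bnc_freq)
print("high-frequency:", high)
print("not high-frequency:", low)
```

Ordering the checks this way means the BNC lookup is only needed for pairs whose noun base has already passed the NGSL check.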
The last research question concerns whether the collocations targeted in the series are recycled in a systematic and principled manner to facilitate learning. To this end, we manually looked up and counted the units in which each targeted collocation re-occurs throughout the textbook series.
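The manual count can be mirrored by a small script once each unit's collocations are listed in teaching order. The sketch below assumes, as a simplification, that a collocation counts as recycled in a unit if it occurs anywhere in a unit later than the one where it is first targeted; the data layout, the function names, and the example units are hypothetical. The banding follows the three-encounter minimum discussed in the literature review.

```python
def count_recycling(first_targeted, units):
    """
    first_targeted: {collocation: index of the unit where it is first targeted}
    units: list of sets, one per unit in teaching order across the whole series;
           units[k] holds the collocations that occur anywhere in unit k.
    Returns {collocation: number of later units in which it re-occurs}.
    """
    return {
        colloc: sum(1 for unit in units[start + 1:] if colloc in unit)
        for colloc, start in first_targeted.items()
    }

def recycling_band(n_units):
    """Band a re-occurrence count, taking three encounters as the minimum for learning gain."""
    if n_units == 0:
        return "not recycled"
    if n_units <= 2:
        return "recycled in 1-2 units"
    return "recycled in 3+ units"

# Hypothetical five-unit sequence.
units = [
    {"ride a bike"},
    {"pay attention"},
    {"ride a bike", "pay attention"},
    {"pay attention", "take turns"},
    {"pay attention"},
]
first_targeted = {"ride a bike": 0, "pay attention": 1, "take turns": 3}
for colloc, n in count_recycling(first_targeted, units).items():
    print(colloc, n, recycling_band(n))
```

Counting units rather than raw occurrences matches the way re-occurrence is reported in the Findings (e.g., a collocation "occurring in 20 units").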

C. Findings
The first research question concerns the collocational profiles of the English textbook series for students from elementary to high school in Vietnam. To address it, a corpus of 312,770 word tokens was built from the series. A total of 26,589 verb-noun and 11,700 adjective-noun combinations were extracted and then checked against the BNC (MI score ≥ 3, t-score ≥ 4). Of these, 13,292 verb-noun and 11,079 adjective-noun combinations met the thresholds and were identified as collocations. Table 1 presents the numbers of collocation types and tokens in the textbook for each grade. The numbers of collocation tokens and types increase gradually from one grade to the next, reflecting the progression from beginning to upper-intermediate level, except for English 7 and English 10, which contain fewer collocation types and tokens than the textbooks for the lower grades. Table 2 presents the frequencies of occurrence of collocation tokens and types per 1,000 words for both verb-noun and adjective-noun patterns at the three school levels. The frequency of collocation tokens per 1,000 words rises from 52.08 at the elementary level to 76.70 at the secondary level and 84.40 at the high school level. The frequency of collocation types per 1,000 words shows a similar trend, rising from 22.92 at the elementary level to 49.97 at the secondary level and 53.89 at the high school level. This suggests that, overall, students are exposed to increasing numbers of collocational exemplars over the course of the curriculum.
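The per-1,000-word figures are normalized frequencies: a raw count divided by the number of running words and multiplied by 1,000, which makes textbooks of different lengths comparable. As a worked check using only the corpus totals reported above, and assuming that the 13,292 and 11,079 figures are token counts, the overall collocation density of the whole corpus comes out at roughly 78 per 1,000 words. The per-level values in Table 2 cannot be recomputed here, because the word count of each level is not given in this section.

```python
def per_thousand_words(count: int, corpus_tokens: int) -> float:
    """Normalize a raw frequency to occurrences per 1,000 running words."""
    return count / corpus_tokens * 1000

corpus_tokens = 312_770
collocation_tokens = 13_292 + 11_079  # verb-noun + adjective-noun, assumed to be token counts
print(round(per_thousand_words(collocation_tokens, corpus_tokens), 2))  # 77.92
```

This whole-corpus value falls between the secondary and high school densities, as a length-weighted average of the three levels would be expected to.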
To address the question of how relevant the targeted collocations are to the collocation lists recommended for language learners, we extracted the collocations that the textbooks target from the collocations identified through the procedure described in the Research Design section above. They are the collocations introduced in the vocabulary sections and those typographically enhanced (underlined, highlighted, or italicized) in the first listening or reading passage of each unit. In total, 542 verb-noun and 536 adjective-noun collocations were identified as targeted collocations across the textbook series. They were then checked against the Academic Collocation List (ACL) built by Ackermann and Chen (2013), which contains 1,773 adjective-noun and 310 verb-noun collocations. Table 3 shows the number of targeted collocations and the percentage of them appearing in the ACL. As the table shows, 45 of the 542 targeted verb-noun collocations and 174 of the 536 targeted adjective-noun collocations occur in the ACL. In total, about one fifth of the targeted collocations of both patterns appear in the ACL; in terms of coverage, however, the targeted collocations account for only 10.5% of the collocations in the ACL, suggesting that the textbooks do not prepare learners well for academic language.
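The two percentages in this paragraph use different denominators, which is easy to miss. The short calculation below, using only the counts reported above, shows that "about one fifth" is the share of the 1,078 targeted collocations found in the ACL, while the 10.5% coverage figure is reproduced when the 219 matches are divided by the 2,083 verb-noun and adjective-noun entries of the ACL (1,773 + 310) rather than by all 2,468 entries. The variable names are illustrative.

```python
targeted = {"verb-noun": 542, "adj-noun": 536}
found_in_acl = {"verb-noun": 45, "adj-noun": 174}
acl_entries = {"verb-noun": 310, "adj-noun": 1_773}

matches = sum(found_in_acl.values())  # 219
share_of_targeted = matches / sum(targeted.values())  # out of 1,078 targeted collocations
coverage_of_acl = matches / sum(acl_entries.values())  # out of 2,083 ACL entries

print(f"share of targeted collocations in the ACL: {share_of_targeted:.1%}")
print(f"coverage of the ACL: {coverage_of_acl:.1%}")
for pattern in targeted:
    print(pattern, f"{found_in_acl[pattern] / acl_entries[pattern]:.1%} of ACL entries")
```

Broken down by pattern, coverage is higher for the small verb-noun section of the ACL than for its much larger adjective-noun section.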
To obtain a fuller picture of how well the textbooks prepare learners in terms of general collocations, we examined whether the targeted collocations are frequent in language use, on the basis that high-frequency collocations are essential and therefore worth deliberate teaching. Surprisingly, we found that a considerable number of the targeted collocations (68 verb-noun and 55 adjective-noun) are duplicated across different units of different grade books. For example, ride a bike is first introduced in English 2 Unit 3 and is introduced again as a new collocation in English 4 Unit 7 and English 6 Unit 12.
To identify whether a collocation is a high-frequency collocation, we first extracted the noun lemmas from the 2,801-word NGSL, which covers several parts of speech. This resulted in 1,455 nouns. Checking the noun bases of the targeted collocations against this list, we found that 36 noun bases of the targeted verb-noun collocations and 26 of the targeted adjective-noun collocations do not occur in the NGSL (see Appendix A). Since these noun bases do not occur in the high-frequency list, the collocations containing them are arguably not highly frequent and should not be prioritized for learners at this vocabulary level. The remaining targeted collocations were then checked for their frequency in the BNC (at least 30 occurrences). Table 4 presents the number of targeted collocations that are not highly frequent or are duplicated across units. Adding the duplicated collocations to those identified as not highly frequent yields 331 of the 1,078 targeted collocations (31%). This means that a considerable number of collocations that do not warrant deliberate teaching are introduced as targets in the textbooks.
To answer the last research question, we counted and tallied the re-occurrences of the targeted collocations. Table 5 summarizes the number of targeted collocations recycled in different units across the textbook series. As the table shows, 658 (61%) of the targeted verb-noun and adjective-noun collocations are not recycled at all. A further 159 collocations are recycled in only one or two units; as discussed in the literature review, such limited repetition does little to consolidate learners' knowledge of previously learned collocations. If three re-occurrences are taken as the minimum needed to secure some learning gain, as suggested in the literature, only 261 collocations (24%) meet that threshold. Collocations that recur in more than ten units include pay attention (20 units), do homework (20 units), take turns (14 units), social media (11 units), and future generations (11 units).
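The three recycling bands reported above sum to the full set of targeted collocations, and the 76% figure given in the abstract corresponds to the two bands that fall below the three-encounter minimum. The short calculation below, using only the counts given in this paragraph, makes that arithmetic explicit; the band labels are illustrative.

```python
bands = {"not recycled": 658, "recycled in 1-2 units": 159, "recycled in 3+ units": 261}
total = sum(bands.values())  # 1,078 targeted collocations

for band, n in bands.items():
    print(f"{band}: {n} ({n / total:.0%})")

below_minimum = bands["not recycled"] + bands["recycled in 1-2 units"]
print(f"below three re-occurrences: {below_minimum} ({below_minimum / total:.0%})")
```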

D. Discussion
The study found that the numbers of collocation tokens and types of both patterns increase from one grade to the next across the twelve grades, except for Grades 7 and 10. The substantial rise in the number of collocations from the elementary to the secondary grades is understandable given the text length of the teaching materials. At the elementary level, especially in Grades 1 to 3, the textbooks contain mostly single words and very short, simple sentences and therefore have low collocational density. At all three school levels, the collocation density and diversity of the textbook series are much higher than those of the native-speaker corpus in Tsai's (2015) study, which reported 19.35 collocation tokens and 10.88 collocation types per 1,000 words. This is probably a good sign that the textbook authors purposefully provide more intensive exposure to collocations in the target language by including them in vocabulary sections and exercises. However, given the low coverage of the Academic Collocation List, learners are not well equipped with collocations for academic use at the tertiary level. To better prepare learners for further study, it is advisable to add more academic collocations, especially to the high school textbooks, since this is the level at which learners are assumed to become more familiar with essay writing in different genres.
The study also found that, although the textbook series deliberately introduces a large proportion of high-frequency collocations, a considerable share of the targeted collocations (31%) are either duplicated across units or not highly frequent. This suggests that the textbook authors lack a clear and effective plan for controlling and regulating the targeted collocations across units within each grade book and throughout the series. Recycling learned collocations is claimed to help learners consolidate their collocational knowledge and is strongly recommended in the literature. Given limited and valuable class time, however, targeting the same collocations repeatedly as new items in different units is wasted effort. A collocation taught in one unit can instead be recycled by embedding it in different exercises or activities in later units throughout the series, while deliberate teaching time is spent on other high-frequency collocations. One possible solution is for textbook authors to plan explicitly which targeted collocations are introduced in which unit. In this way, they can not only avoid re-targeting the same collocations in different units but also ensure that targeted collocations are distributed proportionately across units.
Regarding the noun bases of the targeted collocations, the study found that 62 of them do not appear in the high-frequency NGSL. In addition, 146 targeted collocations whose noun bases do belong to the NGSL are not established collocations, falling below the BNC frequency threshold. This means that the selection of collocations for explicit attention is not optimal. This seems to be a problem shared by materials writers in general and should therefore be taken into account when designing lexical syllabi. In particular, collocations with high pedagogical value should be preselected in the same way as individual lexical items are. The selection of typical collocations should also take into account learners' target vocabulary level, because in some cases a collocation is highly typical, with a high MI score, but its base lies outside the target vocabulary list for learners at a specific level.
Despite the empirical evidence of the importance of recycling for vocabulary retention, the frequency distribution of collocations in the textbooks does not seem to respond to the call for collocation recycling. Fewer than one fourth of the targeted collocations are recycled three times or more throughout the textbooks, suggesting that the authors did not pay due attention to collocation recycling. Admittedly, it is unrealistic to expect EFL textbook authors to present a wide range of collocations and at the same time recycle them as many times as recommended in the literature. Also, collocations congruent with learners' L1 (e.g., do housework, have time, spend time) tend to pose little difficulty and hence might not need to be revisited as often as incongruent ones. Authors who share the learners' first language, as the authors of this series do, could use this advantage to recycle more frequently the collocations that are likely to pose problems. These arguments notwithstanding, it is concerning that the vast majority of targeted collocations are not recycled to the point where learning is likely to occur, while those that are not worth the investment of valuable class time are used repeatedly. To make up for the inadequate repetition of targeted collocations and to increase learners' overall exposure to collocations, teachers are advised to use the textbooks in combination with substantial extensive listening and reading materials.
Another possible way to remedy this shortcoming of the textbook series is to raise teachers' and learners' awareness of the importance of recycling taught and learned lexical items. Once aware of the significance of revisiting lexical items, teachers, who are in direct contact with learners and use the textbooks, can recycle them creatively through productive activities. Including collocations in test questions should also be encouraged, so as to maintain awareness of collocation in vocabulary teaching and learning. In this way, collocations can be treated on a par with other lexical items in lesson plans and syllabi, for the sake of learners' language competence and performance.

E. Conclusion
This study found that the textbook authors paid some attention to collocations, increasing the numbers of collocation tokens and types across grades and thus providing learners with an amount of collocations appropriate to their language level. However, the authors seem to underestimate the significance of the Academic Collocation List in selecting targeted collocations. The study therefore calls for a revision of the textbooks across all grades that includes more academic collocations from the Academic Collocation List and recycles high-frequency collocations more often. With the current textbook series, teachers are expected to play a more proactive role in creating activities that enable systematic and engaging recycling of targeted collocations. Textbooks are important because they can maximize learners' exposure to useful vocabulary, and so are teachers, since they play a decisive role in how effectively the textbooks are used. Guidance on how to use the textbooks efficiently therefore needs to be clearly communicated to teachers.