Syntactic and Lexical Complexity of Undergraduate Students’ Essays: A Comparison Study between L1 and L2 Writings
Indonesian Journal of English Language Teaching and Applied Linguistics, 5(2), 2021

ABSTRACT


Introduction
In recent years, lexical and syntactic complexity have become an important concern for psycholinguists, particularly SLA researchers, who conceptualize them as two dimensions of the evolving complexity of learner language (Casal & Lee, 2019). Most researchers investigate lexical complexity and syntactic complexity separately, since each is measured with its own distinct set of indices. Researchers who have investigated syntactic complexity have used various syntactic complexity measures as potential indicators of language competence or development (Bulté & Housen, 2018; Crossley & McNamara, 2014; Mazgutova & Kormos, 2015). According to Ortega (2003), syntactic complexity concerns the range of forms that appear in language production and their degree of sophistication. Based on this definition, Norris & Ortega (2009) proposed measuring syntactic complexity with length, the most commonly employed type of measure. Another common measure is the mean amount of subordination, computed by counting all clauses and dividing them by a chosen production unit, for example, the T-unit (Elder & Iwashita, 2005) or the C-unit (Skehan & Foster, 2012). A less common measure, suggested for low-proficiency learners, is the mean number of coordinate clauses per T-unit (Bardovi-Harlig, 1992).
Lexical complexity, on the other hand, refers to the choice of words in spoken or written language. According to Lu (2012), lexical complexity describes the writer's skill in expressing ideas effectively in written form. In second language acquisition, this skill can be assessed by measuring three components of lexical complexity: lexical density, variation, and sophistication. Lexical density is the ratio of lexical words (nouns, verbs, adjectives, and adverbs) to the total number of words in the essay. Lexical variation concerns how varied the words in the essay are and is measured with the type-token ratio (TTR), for which the researchers counted the word types and divided them by the number of word tokens. The last measure, lexical sophistication, is the ratio of advanced or unusual words to the total number of lexical words in the essay.
Lexical and syntactic complexity are commonly used to measure learners' productivity, especially in writing and speaking. Objective measures are essential for investigating writing and speaking, since judgments of these skills are otherwise prone to bias and subjectivity. This study is therefore important, because measuring students' lexical and syntactic complexity can give teachers a trustworthy picture of how successful the teaching-learning process in class is. Until now, teachers have judged their students' success by reading their writing or listening to their speaking, which is a highly subjective and potentially biased process. These measures provide a more objective way of looking at students' productivity, especially in writing.
There are several genres of text that learners need to master; one of them is argumentative writing. This genre trains learners, in this case English Literature students, to reason logically about a particular topic. It is necessary not only in preparation for writing an undergraduate thesis but also for their regular writing courses, primarily academic writing. The researchers chose argumentative essays because they depict learners' logical reasoning and are expected to contain sophisticated or technical words more often than other writing genres.
Recent studies have shown considerable interest in investigating lexical and syntactic complexity in the L2 context (Arya et al., 2011; Douglas & Miller, 2016; Johnson, 2017; Lahmann et al., 2019; Mazgutova & Kormos, 2015; Vaezi & Kafshgar, 2012; Yoon, 2017), but none of them has compared lexical and syntactic complexity in L1 and L2 writings. Most of the previous studies obtained their data from written texts (Arya et al., 2011; Johnson, 2017; Mazgutova & Kormos, 2015; Vaezi & Kafshgar, 2012; Yoon, 2017), speaking performance (Lahmann et al., 2019), or students' reading comprehension (Douglas & Miller, 2016). This study therefore investigates lexical and syntactic complexity from a different point of view, using undergraduate students' L1 and L2 writings. The present study differs from previous studies in its research design and source of data: a comparative design is used to determine whether lexical and/or syntactic complexity differs between students' L1 and L2 writings.
With respect to comparing L1 and L2 writings, the researchers draw support from the assumption that most people consider "writing fluency" in a foreign language to be more difficult to achieve than in their native tongue (Waes & Leijten, 2015). Even when they possess a high level of competence in the second language, most language learners realize, and are frustrated by, the fact that their L2 writing is not as fluent as their L1 writing (Segalowitz, 2010). Several studies have shown that L2 learners produce L1 texts with greater fluency than L2 texts. This finding is not surprising in itself, but it raises the intriguing question of which variables better depict and explain these differences (Lindgren et al., 2008; Ong & Zhang, 2010; Segalowitz, 2010).
According to Waes & Leijten (2015), writing in L1 is more productive in both process and product, but their research focused on the processes that characterize writing fluency. These processes are described through four components: (1) production, (2) process variance, (3) revision, and (4) pausing behavior. Although their study aimed to capture the writers' production, it did so only coarsely, by counting how many words were produced in a given time. The present researchers argue that production needs to be investigated further through syntactic and lexical complexity, which is expected to provide new evidence of how L1 and L2 writers differ. Waes & Leijten (2015) themselves noted that, to complement their research, L1 and L2 writings need to be compared from another aspect, such as productivity. The present study is therefore intended as a complement to their work.

Literature Review
2.1 Second Language Acquisition
SLA has become increasingly important for humankind, as learning a new language has become an ever larger part of everyday life. In many places in the world, people are required to adapt to their environment, which commonly forces them to acquire a new language (Doughty & Long, 2008). As the phrase 'the survival of the fittest', associated with Charles Darwin's The Origin of Species, suggests, people also face a new language as a threat; those who can adapt to their environment linguistically will fare better than those who cannot. The same is happening worldwide, as English as an international language must be acquired by billions of people who already have a first language, which requires them to become bilingual or even multilingual (ibid.), and most of the 'victims' here are children or students.
Several researchers in the late 1960s and early 1970s found that second language (L2) learner language is systematic rather than a set of random errors, a view from which the concept of interlanguage was derived. According to a more recent claim about interlanguage, learning involves internalizing a mental grammar, that is, a system of linguistic rules and principles. This claim explicitly addresses the nature of interlanguage proficiency, the extent to which interlanguage grammars are similar to other grammars, and the role of Universal Grammar (Doughty & Long, 2008).
In sum, second language acquisition remains an essential subject in psycholinguistics and is the focus of the present researchers. Within SLA, this study concerns productivity in writing, specifically lexical and syntactic complexity, which are explained in further detail below.

Lexical Complexity
The term "lexical complexity" refers to how students select words in spoken or written language. According to Lu (2012), lexical complexity indicates a writer's skill in communicating effectively in writing. Its three aspects are lexical density, lexical sophistication, and lexical variation. Lexical density is obtained by counting the lexical words (i.e., nouns, verbs, adjectives, and adverbs) and comparing this count with the total number of words in the essay. Lexical variation concerns how varied the words in the essay are; it is calculated by dividing the total number of word types by the total number of word tokens, the so-called type-token ratio (TTR). The last measure, lexical sophistication, is the ratio of advanced or unusual words to the total number of lexical words in the essay.
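The three ratios described above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure or the Web-Based Lexical Complexity Analyzer: it assumes the essay has already been split into (word, tag) pairs by some part-of-speech tagger, and the tag names (`NOUN`, `VERB`, `ADJ`, `ADV`), the function `lexical_profile`, and the set of sophisticated words are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tag names for the four lexical (content) word classes
# counted in the study: nouns, verbs, adjectives and adverbs.
LEXICAL_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


@dataclass
class LexicalProfile:
    density: float         # lexical words / all words
    sophistication: float  # sophisticated lexical words / lexical words
    variation: float       # word types / word tokens (TTR)


def lexical_profile(tagged_tokens, sophisticated_words):
    """Compute the three lexical complexity ratios for one essay.

    tagged_tokens: list of (word, tag) pairs from some POS tagger.
    sophisticated_words: hypothetical set of lower-cased words treated
    as advanced or unusual.
    """
    words = [word.lower() for word, _ in tagged_tokens]
    lexical = [word.lower() for word, tag in tagged_tokens if tag in LEXICAL_TAGS]
    if not lexical:
        raise ValueError("essay must contain at least one lexical word")
    n_sophisticated = sum(1 for word in lexical if word in sophisticated_words)
    return LexicalProfile(
        density=len(lexical) / len(words),
        sophistication=n_sophisticated / len(lexical),
        variation=len(set(words)) / len(words),
    )


# Toy example: a six-word "essay" with hypothetical tags.
essay = [("The", "DET"), ("students", "NOUN"), ("write", "VERB"),
         ("the", "DET"), ("essays", "NOUN"), ("carefully", "ADV")]
profile = lexical_profile(essay, {"carefully"})
```

In the toy essay, four of the six words are lexical (density 4/6), one of the four lexical words is treated as sophisticated (0.25), and, since "The" and "the" count as one type, five of the six tokens are distinct (TTR 5/6).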
According to some recent models of lexical richness, lexical complexity is multi-dimensional and should therefore be measured with a variety of complementary lexical measures. The present study draws on the concepts of Skehan & Foster (2012) and Bulté & Housen (2012) together with Read's (2000) lexical richness framework. Bulté & Housen (2012) rightly indicate that language-related complexity ought to be considered from at least three perspectives, which leads to the same conclusion as Read (2000), who stated that lexical complexity can be measured through three subcomponents: lexical diversity, lexical density, and lexical sophistication. A fourth subcomponent, the compositionality of words (morpheme and syllable structure), can also be included in the analysis of lexical complexity; it is more useful for measuring the writing of lower-proficiency or younger English learners but is less suitable for measuring texts produced by highly proficient learners (Bulté & Housen, 2012).
The researchers argue that, to determine a writer's lexical complexity, measurement should include at least the three subcomponents mentioned above (Bulté & Housen, 2012; Read, 2000). Each subcomponent can be captured by a variety of measures, some of which are more stable across different text samples than others. For example, lexical diversity can be measured by TTR, lexical density by the ratio of content words to function words or to the total word count, and lexical sophistication by the ratio of less frequent words.
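TTR is one of the measures usually described as less stable across text samples: because frequent words recur, the number of types grows more slowly than the number of tokens, so TTR tends to fall as a text gets longer. The Python sketch below only illustrates that point and is not part of the study's method; it contrasts TTR with Guiraud's root TTR (types divided by the square root of tokens), one commonly cited length adjustment, on two made-up texts.

```python
import math


def ttr(tokens):
    """Type-token ratio: distinct word forms divided by all word forms."""
    return len(set(tokens)) / len(tokens)


def root_ttr(tokens):
    """Guiraud's root TTR: types divided by the square root of tokens.

    Shown only for comparison; it is not one of the study's measures.
    """
    return len(set(tokens)) / math.sqrt(len(tokens))


short_text = "the cat sat on the mat".split()
# The longer text adds new words but also repeats "the", "sat" and "on".
long_text = short_text + "and the dog sat on the rug".split()

for name, text in (("short", short_text), ("long", long_text)):
    print(f"{name}: tokens={len(text)} TTR={ttr(text):.3f} "
          f"root TTR={root_ttr(text):.3f}")
```

Here TTR falls from about 0.83 to about 0.62 even though the longer text contains more distinct words (8 versus 5), whereas root TTR rises slightly. This length sensitivity is worth keeping in mind whenever essays of different lengths are compared with TTR.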

Syntactical Complexity
In second language (L2) research, syntactic complexity can be seen in terms of syntactic variation and sophistication or, more precisely, the variety of the syntactic structures produced and their levels of sophistication. Syntactic complexity has been identified as a crucial construct in L2 writing teaching and research. Expanding a learner's syntactic repertoire is essential for progress in the target language (Ortega, 2003).
The view that syntactic complexity is a multi-dimensional construct and should be operationalized accordingly (Norris & Ortega, 2009) is gaining traction, with global (e.g., mean length of T-unit), clausal (e.g., subordinate or coordinate clauses per T-unit), and phrasal (e.g., mean length of clause, complex nominals per clause) subconstructs being included (e.g., Lu, 2011, 2017; Norris & Ortega, 2009). This extension provides a conceptualization of how linguistic features are used to highlight and structure complexity in order to construct various meanings (Casal & Lee, 2019). It supports Lu's (2010) definition of syntactic complexity as the use of sophisticated and varied structures, which allows learners to use the additional language in ever more mature and skillful ways, tapping the full range of linguistic resources offered by its grammar to fulfill various communicative goals successfully (Ortega, 2015).
Furthermore, Ortega (2003) explains that syntactic complexity is the range of forms that appear in language production and their degree of sophistication. Syntactic complexity is characterized by the length of the production unit, the amount of clausal embedding, the range of structural types, and the sophistication of particular structures. This characterization has given rise to a variety of global measures. Based on this definition, Norris & Ortega (2009) proposed measuring syntactic complexity with length, the most commonly used type of measure. Length-based measures are also commonly used in studies of child language acquisition (Atkinson-King & Brown, 1975). Another common metric is the mean amount of subordination, calculated by dividing the total number of clauses by the number of production units, such as T-units (Elder & Iwashita, 2005) or C-units (Skehan & Foster, 2005). A less common measure, suggested for low-proficiency learners, is the mean number of coordinate clauses per T-unit (Bardovi-Harlig, 1992).
A wide range of measures has been proposed for describing the syntactic complexity of L2 writing. These measures cover the length of production units, the number of subordinate and coordinate clauses, the variety of syntactic structures used in the text, and the use of specific sophisticated syntactic structures. In several L2 development studies, these complexity and fluency measures were examined to identify valid and reliable developmental indices that L2 teachers and researchers could use to accurately identify and describe a learner's developmental level or overall fluency in the second language (Larsen-Freeman, 1978; Ortega, 2003; Wolfe-Quintero et al., 1998). Most earlier researchers examined syntactic complexity using one or more of six measures: mean length of sentence, mean length of T-unit (Stockwell & Harrington, 2003), mean length of clause (Beers & Nagy, 2009), T-units per sentence, clauses per T-unit (Beers & Nagy, 2009; Ellis & Yuan, 2004), and dependent clauses per clause (Beers & Nagy, 2009; Ellis & Yuan, 2004). Lu (2011) compared the full range of measures of interest in a single study using a common set of learner data in order to identify the most reliable indices of syntactic complexity in L2 writing. Based on the results, Lu proposed that complex nominals per clause (CN/C) and mean length of clause (MLC) are the best candidates for measuring syntactic complexity, since they distinguished learners' writing at all levels and increased across all four levels of English proficiency tested. The next group includes complex nominals per T-unit (CN/T), mean length of sentence (MLS), and mean length of T-unit (MLT), all of which distinguished two adjacent levels and generally progressed with school level.
As a result, this research will focus on college students who are assumed to have proficiency levels ranging from low to intermediate. The researchers will investigate syntactic complexity using dependent clauses per clause (DC/C), clauses per sentence (C/S), coordinate phrases per clause (CP/C), mean length of sentence (MLS), and mean length of clause (MLC). Only the Indonesian essays will be analyzed manually, while the English essays will be analyzed with the Syntactic Complexity Analyzer (Lu, 2010).
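Because the Indonesian essays are scored by hand, each of these five indices reduces to a ratio of manual counts. The sketch below shows that arithmetic, assuming hypothetical field and function names and made-up counts for a single essay; it does not reproduce the Syntactic Complexity Analyzer, which identifies clauses and phrases automatically.

```python
from dataclasses import dataclass


@dataclass
class SyntacticCounts:
    """Manual counts for one essay (field names are hypothetical)."""
    words: int
    sentences: int
    clauses: int
    dependent_clauses: int
    coordinate_phrases: int


def syntactic_indices(c: SyntacticCounts) -> dict:
    """Return the five indices used in the study, each a simple ratio."""
    if c.sentences == 0 or c.clauses == 0:
        raise ValueError("essay needs at least one sentence and one clause")
    return {
        "MLS": c.words / c.sentences,              # mean length of sentence
        "MLC": c.words / c.clauses,                # mean length of clause
        "DC/C": c.dependent_clauses / c.clauses,   # dependent clauses per clause
        "CP/C": c.coordinate_phrases / c.clauses,  # coordinate phrases per clause
        "C/S": c.clauses / c.sentences,            # clauses per sentence
    }


# Made-up counts for one essay, not data from the study.
counts = SyntacticCounts(words=360, sentences=20, clauses=40,
                         dependent_clauses=12, coordinate_phrases=10)
indices = syntactic_indices(counts)
print(indices)
```

For these hypothetical counts the indices are MLS = 18, MLC = 9, DC/C = 0.3, CP/C = 0.25, and C/S = 2. Note that MLS always equals MLC × C/S, because words/sentences = (words/clauses) × (clauses/sentences), so these three indices are not independent of one another.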

Research Methodology
3.1 Research Design
The present research is a quantitative study based on a corpus of 134 essays written by undergraduate English Literature students at the University of Mulawarman. It is a comparative study designed to determine whether the lexical and syntactic complexity of the students' L1 and L2 writings differ.

Participants
The participants were 67 second-semester undergraduate students of the English Literature study program at the University of Mulawarman. Each student was asked to write two argumentative essays on the same topic, one in English and one in Indonesian. This yielded 134 essays; the Indonesian and English corpora contain 23,317 and 27,540 words, respectively. The researchers did not prescribe the topic, so that the respondents could write freely about their own interests and display their lexical and syntactic complexity naturally.

Instruments
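As the researchers collected writings in two different languages, they needed two different instruments to determine lexical and syntactic complexity. For the English writings, the researchers used the Web-Based Lexical Complexity Analyzer and the Web-Based Syntactic Complexity Analyzer developed by Lu (2010). For the Indonesian writings, the researchers computed lexical complexity manually using the following formulas:
$$\text{Lexical Density} = \frac{\text{number of lexical words}}{\text{total number of words}}$$
$$\text{Lexical Sophistication} = \frac{\text{number of sophisticated lexical words}}{\text{total number of lexical words}}$$
$$\text{Lexical Variation} = \frac{\text{number of word types}}{\text{number of word tokens}}$$
Afterwards, the researchers computed syntactic complexity using the following formulas:
$$\text{MLS} = \frac{\text{number of words}}{\text{number of sentences}} \qquad \text{MLC} = \frac{\text{number of words}}{\text{number of clauses}}$$
$$\text{DC/C} = \frac{\text{number of dependent clauses}}{\text{number of clauses}} \qquad \text{CP/C} = \frac{\text{number of coordinate phrases}}{\text{number of clauses}} \qquad \text{C/S} = \frac{\text{number of clauses}}{\text{number of sentences}}$$
After obtaining these measurements, the researchers analyzed them with a paired-sample t-test to determine whether lexical and syntactic complexity differed between the writings in the two languages.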

Data Analysis Procedures
The 134 essays written by the 67 English Literature students were analyzed to obtain the lexical and syntactic complexity of each essay: the English essays with the Web-Based Lexical Complexity Analyzer and the Web-Based Syntactic Complexity Analyzer (Lu, 2010), and the Indonesian essays manually, following the formulas described under Instruments. The researchers then tabulated the results to facilitate computation with the paired-sample t-test. The values obtained from the paired-sample t-test were displayed in a table and interpreted to draw conclusions.
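The paired-sample t-test used in this step can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' procedure (the paper does not say which software was used). It assumes one pair of scores per student, one from the English essay and one from the Indonesian essay, and computes t from the mean and sample standard deviation of the within-student differences, with df = n - 1. The two-tailed p-value uses the standard identity p = I_x(df/2, 1/2) with x = df/(df + t²) for Student's t distribution, where the regularized incomplete beta function I is evaluated by a continued fraction in the style of Numerical Recipes. All function names and the five pairs of scores are hypothetical.

```python
import math
import statistics


def _beta_continued_fraction(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def two_tailed_p(t, df):
    """Two-tailed p-value of Student's t with df degrees of freedom."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def paired_t_test(first, second):
    """Paired-sample t-test on matched scores; returns (t, df, p)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [x - y for x, y in zip(first, second)]
    sd = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    return t, df, two_tailed_p(t, df)


# Hypothetical lexical density scores for five students (not study data).
english = [0.52, 0.55, 0.49, 0.58, 0.51]
indonesian = [0.45, 0.47, 0.46, 0.50, 0.44]
t_value, df, p_value = paired_t_test(english, indonesian)
```

In practice, a library routine such as SciPy's `scipy.stats.ttest_rel` performs the same test. The hand-written version only makes explicit that the test operates on within-student differences, which is why each student's two essays must stay paired in the analysis.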

Findings
4.1 Lexical Complexity in L2 Writings (English) and L1 Writings (Indonesian)
The lexical complexity of the undergraduate students' writings is measured through three variables: lexical density, lexical sophistication, and lexical variation. The researchers computed the lexical complexity of the L2 writings with the Web-Based Lexical Complexity Analyzer (Lu, 2010). Table 1 compares the lexical complexity of the students' English essays with that of their Indonesian essays. Descriptively, the students' lexical complexity is higher in their L2 writings than in their L1 writings, since all three aspects (lexical density, lexical sophistication, and lexical variation) have higher mean scores in the English writings than in the Indonesian ones. The mean difference in lexical density between the English and Indonesian writings is 0.0761.
The table shows a significant difference between the students' lexical complexity in their English and Indonesian writings. The Sig. (2-tailed) value for lexical density is 0.000, lower than 0.05. The Sig. (2-tailed) values for lexical sophistication (LS) and lexical variation (LV) are 0.036 and 0.000, respectively, both also lower than 0.05. These results show a significant difference between the two types of writing. Comparing the mean scores, we can conclude that the students produce higher lexical complexity in their English writings than in their Indonesian ones. This finding is surprising, since the initial assumption was that there would be no difference between the two sets of essays, and if there were a difference, the initial hypothesis predicted higher lexical complexity in the Indonesian writings.
This anomaly occurs because the students tended to write redundant and ineffective sentences. Repeated words were also frequent in their Indonesian writings, which were less straightforward. This lack of straightforwardness led the students to produce redundant words and to use more function words than content words in their Indonesian writings.
The Sig. (2-tailed) value for lexical sophistication (0.036) is higher than those of the other two variables. Although lexical sophistication still differs significantly between the two sets of writings, the difference is less pronounced than for lexical density and lexical variation, which suggests that the students' use of sophisticated or rare words is relatively similar in both languages. In other words, the students' word choice in the two languages is fairly similar.

Syntactical Complexity in L2 Writings (English) and L1 Writings (Indonesian)
Syntactic complexity describes how complex the sentences are that students construct in their English and Indonesian writings. The researchers chose five indices to measure the students' syntactic complexity: MLS, MLC (Atkinson-King & Brown, 1975), DC/C, CP/C, and C/S (Skehan & Foster, 2005). The researchers used the Web-Based Syntactic Complexity Analyzer (Lu, 2010) to compute the syntactic complexity of the L2 writings, and computed the syntactic complexity of the L1 writings manually by counting the number of words, sentences, clauses, dependent clauses, and coordinate phrases.
After computing all indices, the researchers analyzed them with a paired-sample t-test to examine the difference between the English and Indonesian writings produced by the undergraduate students. The results are displayed in Table 2. The table shows that the English writings have higher values than the Indonesian writings on four of the syntactic complexity indices. Only CP/C (coordinate phrases per clause) is higher in the Indonesian writings than in the English writings, which shows that the students used more coordinate phrases in their Indonesian writings.
The paired-sample t-test shows that the syntactic complexity of the students' English and Indonesian writings differs significantly on four indices: mean length of sentence (MLS), dependent clauses per clause (DC/C), coordinate phrases per clause (CP/C), and clauses per sentence (C/S). The Sig. (2-tailed) values of these four indices are all below 0.05: 0.006, 0.000, 0.038, and 0.002 for MLS, DC/C, CP/C, and C/S, respectively. However, there is no significant difference between the two writings in mean length of clause (MLC), whose Sig. (2-tailed) value is 0.607 (p > 0.05). This finding shows that the students construct their English and Indonesian essays differently. Combining the Sig. (2-tailed) results with the mean scores shows that the students have higher syntactic complexity in their English writings than in their Indonesian ones.
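The Findings decide significance by comparing each Sig. (2-tailed) value with 0.05. The short sketch below restates that decision rule using only the values reported above, with no new data. It assumes the values were rounded to three decimals, as statistical packages such as SPSS display them, so a printed "0.000" is treated as an upper bound of 0.0005 rather than as p = 0 and is best reported as p < .001. The names `REPORTED_P`, `significant`, and `format_p` are hypothetical.

```python
ALPHA = 0.05

# Sig. (2-tailed) values as reported in the Findings section. Values printed
# as "0.000" are stored as the upper bound 0.0005, on the assumption that
# the software rounded to three decimals.
REPORTED_P = {
    "Lexical density": 0.0005,
    "Lexical sophistication": 0.036,
    "Lexical variation": 0.0005,
    "MLS": 0.006,
    "DC/C": 0.0005,
    "CP/C": 0.038,
    "C/S": 0.002,
    "MLC": 0.607,
}


def significant(p_values, alpha=ALPHA):
    """Names of the indices whose p-value falls below alpha."""
    return sorted(name for name, p in p_values.items() if p < alpha)


def format_p(p):
    """APA-style string: a printed '.000' is reported as p < .001, not p = 0."""
    return "p < .001" if p < 0.001 else "p = " + f"{p:.3f}"[1:]


for name in REPORTED_P:
    verdict = "significant" if name in significant(REPORTED_P) else "not significant"
    print(f"{name}: {format_p(REPORTED_P[name])} ({verdict})")
```

Only MLC falls above the 0.05 threshold, matching the conclusion drawn in the text.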

Discussions
The present study examines the lexical complexity of students' writing using three measures: lexical density, lexical variation, and lexical sophistication. Lexical density allows the researchers to measure how many content words the students used in their writing, while lexical variation shows how varied the words in the essay are. Lexical sophistication indicates how advanced the words used by the students are. Together, these three measures describe the students' lexical productivity, which is essential in producing a good composition.
The study began with the hypothesis that the students' writing ability would be higher in their L1 than in their L2, since they are accustomed to and familiar with their mother tongue (Lindgren et al., 2008; Ong & Zhang, 2010; Segalowitz, 2010; Waes & Leijten, 2015). Previous research suggests a major difference between the quality of students' L1 and L2 writings (Waes & Leijten, 2015). In line with this, the present study reveals a significant difference between L1 and L2 writings in both lexical and syntactic complexity. However, contrary to previous studies (Lindgren et al., 2008; Segalowitz, 2010; Treffers-Daller et al., 2018; Waes & Leijten, 2015), the present study found, surprisingly, that the students' L2 writings have higher lexical and syntactic complexity. This result runs counter to the common assumption that the students' writing in Indonesian, their first language, should show higher lexical density because they are accustomed to using it. The anomaly arose because the students tended to write redundant and ineffective sentences, which led them to use more tokens but fewer content words. This can account for the low lexical density and variation scores. In addition, the students repeated words more often in their Indonesian writings, which lowered their lexical variation and lexical sophistication. For lexical sophistication, the learners may have used a dictionary or thesaurus to find more advanced words when writing in English than when writing in their L1. Because Indonesian is their mother tongue, learners do not bother to consult a dictionary to vary their words. This assumption is our explanation of why their lexical sophistication is lower in their L1 than in their L2 writings.
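A small worked example, using hypothetical round numbers rather than data from this study, shows how redundancy alone can lower both ratios. Suppose a concise essay has 100 tokens, 50 of them lexical words and 60 of them distinct types, and the writer then adds 25 redundant function-word tokens that only repeat words already used, so the numbers of lexical words and types stay the same:

```latex
\text{LD} = \frac{50}{100} = 0.50 \;\longrightarrow\; \frac{50}{100 + 25} = 0.40
\qquad
\text{TTR} = \frac{60}{100} = 0.60 \;\longrightarrow\; \frac{60}{100 + 25} = 0.48
```

The content of the essay is unchanged, yet both lexical density and lexical variation fall, which is consistent with the explanation that redundant, repetitive sentences depress these scores.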
With respect to the comparison of syntactic complexity between L1 and L2 writings, the researchers used five measures: dependent clauses per clause (Beers & Nagy, 2009; Ellis & Yuan, 2004), clauses per sentence, coordinate phrases per clause (Norris & Ortega, 2009), mean length of sentence (Stockwell & Harrington, 2003), and mean length of clause (Beers & Nagy, 2009). The students' L2 writings differ significantly from their L1 writings on four of these measures (dependent clauses per clause, clauses per sentence, mean length of sentence, and coordinate phrases per clause). As surprisingly as in the lexical complexity results, the students' L2 writings score higher than their L1 writings. In contrast, the mean length of clause is similar in the students' L1 and L2 writings, and the difference is not significant. This similarity may be explained by the students using a similar structure in both writings: they tend to combine two or more coordinate clauses to make a long sentence. Other researchers have also found that L2 learners use coordinate phrases in their L2 writings as frequently as in their L1 writings (Kuiken & Vedder, 2019). This is why their mean length of clause is similar: they construct their clauses from a similar structure, namely coordinate phrases. The students also had difficulty arranging ideas within a sentence, which led them to use more coordinate phrases to accommodate their ideas.
Finally, the present study found a result in the opposite direction to previous studies regarding which writings show the L2 learners' better writing ability. It should be noted, however, that the previous studies examined the writing process more, whereas the present study investigated its products. This study should therefore be seen as complementary evidence to what previous studies have found, and further investigation involving both processes and products needs to be conducted.

Conclusions
Given the shortage of comparative research on linguistic complexity, this study aims to fill the gap by investigating whether students' lexical and syntactic complexity differ between their L1 and L2 writings. The study began by borrowing the hypothesis that L2 learners are more comfortable writing L1 texts than L2 texts. The researchers therefore expected to see higher lexical and syntactic complexity in the students' L1 writings than in their L2 writings.
In summary, the present research does not support the findings of previous studies (Lindgren et al., 2008; Segalowitz, 2010; Treffers-Daller et al., 2018; Waes & Leijten, 2015) that learners' L1 writings should be better than their L2 writings. Instead, this study reveals that the lexical complexity of the students' L2 writings is significantly higher than that of their L1 writings, as shown by all three measures: lexical density, lexical variation, and lexical sophistication. A similar result emerges for syntactic complexity, where the students' L2 writings score higher than their L1 writings on four of the five measures. Only MLC (mean length of clause) shows no difference between L1 and L2 writing, which suggests that the students construct their clauses in a similar way in both languages.
Nevertheless, lexical and syntactic complexity measures are still highly recommended as a more objective complement to teachers' writing assessment rubrics, since they rely on quantitative measurements that common writing rubrics lack. They can help teachers quantify learners' writing quality and reduce bias and subjectivity in scoring writing assessments.
Despite the potential contribution of these findings to research on writing productivity, the researchers acknowledge several limitations. First, this study is product-based and did not monitor the process while the students wrote their essays. As a result, the students may have used an online dictionary or other online aids while writing. A monitored writing task is therefore highly recommended for future research. Second, the researchers did not restrict the topic of the two writings, so students may have translated their L1 writing into L2 or vice versa. Thus, assigning a specific topic, or different topics for the L1 and L2 writings, is suggested. Third, to obtain broader evidence of how L2 learners' lexical and syntactic complexity differs across languages, future studies could investigate other forms of language production, such as oral presentations and translations.