
Abstract: This article applies a corpus-based method to a stylistic interpretation of O. Henry’s short story collection The Four Million. The overall statistics computed by corpus software provide a more detailed descriptive basis for widely accepted literary interpretations of his stories. With regard to story settings and general themes, the collocation and frequency information of recurrent sequences reveals linguistic features that earlier critics have not noticed.

1. Introduction

O. Henry has been called the American Guy de Maupassant. Both authors wrote twist endings, but O. Henry’s stories are much more playful and optimistic. Previous studies of O. Henry’s short stories agree that his works are marked by surprise endings, the use of coincidence or chance to create humor, ingenious plot construction, smile-in-tears irony, and so forth. Despite this detailed literary discussion, little work has been done on his linguistic style, and none of it offers quantitative data as evidence. Given the established description of his style, it may seem unlikely that a corpus-based method can find anything original. However, the stylistic analysis in the present paper aims to illustrate the value of the empirical corpus method in exploring literary style. On the one hand, statistical data help to confirm the canonical view of O. Henry’s short stories; on the other hand, they tie stylistic claims to concrete linguistic features of his works.

2. Data and Methodology

This paper investigates the linguistic style of O. Henry’s works empirically, applying both quantitative and qualitative methods. The study adopts two corpora.
The first is O. Henry’s The Four Million, a collection of short stories published in 1906 and set in New York City in the early years of the 20th century. A machine-readable version available on the internet is used to build a small working corpus for investigation (https://www.literaturepage.com/read/thefourmillion.html). The second is the Brown corpus, which serves as the reference corpus. The concordance software used in this study is WordSmith Tools, which supports detailed frequency analysis of concordance items and extracts collocational information. Using this software, words with significant keyness in The Four Million are identified first, and then concordance lines containing each keyword and its collocates are extracted. The resulting corpus data are then analysed statistically.

3. Overall Statistics

Overall statistics are an essential starting point for a systematic corpus-based textual analysis. WordSmith Tools is used to compute the overall statistics of the two corpora, which are compared in Table 3.1.

Table 3.1 Comparison of Overall Statistics between the Two Corpora

| Statistic | Mini corpus (The Four Million) | Brown corpus |
| --- | --- | --- |
| tokens (running words) in text | 52,770 | 1,390,505 |
| types (distinct words) | 8,251 | 47,146 |
| type/token ratio (TTR) | 16 | 4 |
| standardised TTR | 46.97 | 39.07 |
| standardised TTR basis | 1,000.00 | 1,000.00 |
| mean sentence length (in words) | 15 | 23 |
| mean word length (in characters) | 4 | 5 |
| word length std. dev. | 2.24 | 2 |
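As a rough illustration of how statistics like those in Table 3.1 can be derived, the following Python sketch computes tokens, types, TTR, standardised TTR, mean sentence length, and word-length statistics for a plain-text corpus. It is not the WordSmith Tools implementation. The tokenisation rule, the sentence split on ".", "!" and "?", the use of the population standard deviation, and the names `tokenize` and `overall_statistics` are all assumptions made here for illustration, so the values it produces may differ somewhat from those reported by the software.

```python
import re
import statistics


def tokenize(text):
    """Return lowercase word tokens (letters, with optional internal apostrophes).

    Assumption: this simple rule stands in for the software's own tokeniser.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def overall_statistics(text, basis=1000):
    """Compute overall corpus statistics for a plain-text string."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    types = set(tokens)

    # Raw TTR as a percentage: distinct words / running words * 100.
    ttr = 100 * len(types) / len(tokens)

    # Standardised TTR: mean TTR over consecutive full chunks of `basis` tokens.
    chunk_ttrs = [
        100 * len(set(tokens[i:i + basis])) / basis
        for i in range(0, len(tokens) - basis + 1, basis)
    ]
    sttr = statistics.mean(chunk_ttrs) if chunk_ttrs else None

    # Assumption: a sentence ends at '.', '!' or '?'; empty fragments are ignored.
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]

    word_lengths = [len(t) for t in tokens]
    return {
        "tokens": len(tokens),
        "types": len(types),
        "ttr": ttr,
        "sttr": sttr,
        "mean_sentence_length": len(tokens) / len(sentences),
        "mean_word_length": statistics.mean(word_lengths),
        "word_length_sd": statistics.pstdev(word_lengths),
    }


# Toy input; a real run would read the full corpus text from a file.
sample = "The cat sat. The dog sat down!"
print(overall_statistics(sample, basis=3))
```

The raw TTR of the mini corpus follows directly from the table: 8,251 / 52,770 × 100 ≈ 15.6, which is reported as 16. Because raw TTR falls as a text grows longer, the standardised TTR, averaged over chunks of 1,000 tokens, is the more suitable figure when comparing two corpora of such different sizes.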