Research

Learner corpora

JEFLL Corpus

A corpus of English compositions written by Japanese secondary school learners. It currently covers more than 10,000 learners from Year 7 to Year 12, with a total size of approximately 650,000 running words. The online query system is now available free of charge for research.

Please visit the JEFLL Corpus online query page.

International Corpus of Crosslinguistic Interlanguage (ICCI)

Collections of English essays by younger learners from seven nations and regions (China, Taiwan, Hong Kong, Spain, Austria, Poland, Israel), comparable in design with the JEFLL Corpus. The total size is about half a million tokens.

Official website for ICCI (part of Global COE Program at TUFS)

	Automatic identification of learner errors using edit distance & parallel LC
	A proofread version of JEFLL has been prepared, and the differences between the original writings and their corrected versions are extracted automatically from aligned sentence pairs using Levenshtein distance. The results can be tagged as omission, addition, or misformation errors. This heuristic can help avoid time-consuming manual error annotation. More information will be available soon!
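
	As a rough illustration of the idea, and not the project's actual implementation, the sketch below aligns a learner sentence with its proofread version at the word level using Levenshtein distance, then reads the error tags off the alignment. The function name `align_errors`, the whitespace tokenization, and the uniform edit cost of 1 are all assumptions made for this example. A word present only in the corrected sentence is tagged as an omission, a word present only in the original as an addition, and a replaced word as a misformation.

```python
from typing import List, Tuple


def align_errors(original: List[str], corrected: List[str]) -> List[Tuple[str, str, str]]:
    """Return (error_type, learner_word, corrected_word) triples.

    Hypothetical helper for illustration only: word-level Levenshtein
    alignment with unit costs, followed by a backtrace over the table.
    """
    n, m = len(original), len(corrected)

    # dist[i][j] = edit distance between original[:i] and corrected[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if original[i - 1] == corrected[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # learner word has no counterpart
                dist[i][j - 1] + 1,         # corrected word has no counterpart
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back from the bottom-right cell and collect the non-matching steps.
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and original[i - 1] == corrected[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1  # identical words, no error
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            errors.append(("misformation", original[i - 1], corrected[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            errors.append(("addition", original[i - 1], ""))
            i -= 1
        else:
            errors.append(("omission", "", corrected[j - 1]))
            j -= 1
    errors.reverse()
    return errors


if __name__ == "__main__":
    learner = "He is student".split()
    proofread = "He is a student".split()
    print(align_errors(learner, proofread))
```

	When several alignments have the same minimal cost, the backtrace above prefers a substitution over an insertion or deletion. That is one arbitrary tie-breaking choice, and a real system would probably need linguistically informed costs.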

	CEFR and corpus-based identification of criterial features
	This is a new project to identify criterial features for CEFR levels by examining learner corpora at different CEFR levels. I am a collaborator with the English Profile Programme, together with Professor Masashi Negishi at TUFS. Official website for EPP

Lexicography

	Experimental approaches toward dictionary look-up behaviour
	This was my primary research interest before I shifted my focus to corpus linguistics. I conducted many experimental studies on L2 dictionary use. Special page on dictionary use (coming soon!)

	Corpus lexicography
	Since the COBUILD project started, I have been caught in the web of words stored on computer! Special page on corpus lexicography (coming soon!)
