This Web site was created and is supported by the Computational Linguistics Laboratory
(CLL) at Katanov State University of Khakasia (KSU), located in Abakan, Russia. Its aim is to provide information
about the laboratory's activities and products.
This site is outdated. The latest information about the CLL's activities is available at http://vetsky.narod2.ru
The CLL conducts non-commercial theoretical and applied research in the fields of information retrieval, text summarization, data mining, computer-assisted language learning (CALL), and corpus linguistics. The research is supported by federal and local grants.
The CLL at KSU was founded in 2002 to conduct work in the following areas.
- Applied linguistics research and development of computer systems for Computer
Assisted Language Learning and Instruction. To date, 10 such systems have been created (see Products section). A classification of software used in foreign language learning and teaching is given in .
- Automatic text summarization research. V. Yatsko (last name also spelt "Iatsko"),
the head of the CLL, is the author of the symmetric summarization conception that underlies
PASS and ETS and makes it possible to produce coherent and adequate summaries. For details see Our Publications [1-4]. The ROS system, which summarizes Web pages indicated by the user in a continual mode, is being developed. In 2008 we released the Universal Summarizer (UNIS), which has an automatic text classification function. Once the text is classified as scientific, publicistic, or fiction, UNIS applies algorithms specially optimized for that text type to significantly increase the quality of the resulting summaries.
- Evaluation of Internet information retrieval systems. The depth of user's search and reference dictionary conceptions are being developed to evaluate automatic text summarization systems as
well as Internet information retrieval systems.
- Discourse analysis. The integrational discourse analysis conception [6-8] distinguishes between surface and deep levels of discourse structure.
Currently we are investigating various types of possessive discourse and the linguistic
features of possessive relations, differentiating between alienable and inalienable possession.
- Computer learner corpora research project. This ongoing project is aimed at 1)
creating corpora of texts (dictations, expositions, compositions, etc.) produced
by Russian-speaking learners of English; 2) creating tools for error tagging and
automatic analysis of these corpora; 3) contrastive analysis of Russian learner
corpora and corpora produced by speakers of other languages. The project is in
line with research done by Granger et al.
- Linguistic Toolbox (LIT).
LIT provides the user with a set of instruments for linguistic analysis, such as a tokenizer, text splitter, tagger, dictionary comparer, wordlist, and concordancer. By means of these instruments the user can obtain statistical data about the text, annotate it with POS tags, and conduct various types of searches. LIT supports English and Russian; its prototype version was ready in March 2008. A Beta version is to be released by the end of the year.
- Data/text mining. We are developing algorithms for mining chat logs and
blogs with the aim of preventing undesirable events, for example acts of violence.
The TEXOR system, which
performs such mining, is available online. Recently we completed a commercial
project on sentiment mining, creating a system that recognizes and analyzes
users' opinions about commercial products. The system is based on an ontology
and a linear grammar that we developed specially for this project.
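
To illustrate the kind of instruments listed for LIT above (tokenizer, wordlist, concordancer), here is a minimal Python sketch. It is not LIT's actual implementation: the function names `tokenize`, `wordlist`, and `concordance`, the regular expression, and the context-window parameter `width` are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    The pattern matches runs of Unicode letters, so it handles both
    English and Russian; an internal apostrophe (as in "don't") is kept.
    """
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())


def wordlist(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context triples (left context, keyword, right context).

    `width` is the number of tokens shown on each side of the keyword.
    """
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    sample = "The cat sat. The dog sat too."
    toks = tokenize(sample)
    print(wordlist(toks))
    for left, kw, right in concordance(toks, "sat", width=2):
        print(f"{left:>12} [{kw}] {right}")
```

A real toolbox would add sentence splitting and POS tagging on top of the token stream; the sketch only shows how a wordlist and a concordance can both be derived from the same tokenizer output.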