Lucia Kocincová (2015). Interactive visualization methods for Sketch Engine. Master thesis. Masaryk University, Faculty of Informatics.
Abstract: Visualization is undoubtedly one of the most desired methods for displaying data, especially when dealing with so-called big data. It can uncover unnoticed and hidden relationships within the data, and it enables users to understand and interpret the data with less effort. This thesis focuses on interactive visualizations generated from corpus data. First, it introduces state-of-the-art tools for corpus visualization and a corpus management system named Sketch Engine, for which numerous design concepts were created. Four of them (corpora overview, thesaurus, word sketch and word sketch difference) were then implemented as an online application, mainly using the Data-Driven Documents library. Finally, these visualizations were evaluated through user testing, which revealed that the implemented concepts were not only graphically very appealing but also helpful. The interactive visualizations will therefore be incorporated into the Sketch Engine online interface in the near future.
Matouš Ejem (2015). English learner corpora [in Czech]. Bachelor thesis. Masaryk University, Faculty of Arts.
Abstract: Learner corpora bring together second language acquisition research, foreign language teaching and corpus linguistics. In this work I present the available English learner corpora.
Lucie Kaplanová (2015). Collection of linguistically motivated examples of CQL [in Czech]. Bachelor thesis. Masaryk University, Faculty of Arts.
Abstract: This bachelor thesis deals with CQL (Corpus Query Language), a query language for corpora. It explains the use of the individual operators, attributes, and structures that can be used in a CQL search. The thesis also includes a set of linguistically oriented CQL queries for Czech and English.
Monika Močiariková (2015). Methods for Automatic Acquisition of Dictionary Definitions [in Slovak]. Bachelor thesis. Masaryk University, Faculty of Arts.
Abstract: The thesis explains the term "definition" and why it is difficult to decide whether a given sentence is a definition or not. It also describes the Sketch Engine system and the CQL language. The practical part is dedicated to the design, implementation and evaluation of queries for automatic definition search.
Dominika Talianová (2014). Corpus Data Visualization. Bachelor thesis. Masaryk University, Faculty of Informatics.
Abstract: The aim of this thesis is to study approaches used in concurrent processing and to apply them to the evaluation of queries in the Manatee system. The work includes not only a detailed evaluation of query processing speed with varying numbers of cores available during the evaluation, but also a comparison of code length between the old and the new implementation.
Abstract: From a natural language corpus, word usage data over time can be extracted. To detect and quantify change in this data, automatic procedures can be employed. In this work, the theory of ordinary and robust regression methods is discussed and applied to real-world data with great success. A Python implementation is included. Smoothing of time series and detection of seasonality are also examined, but ultimately this path does not seem to give satisfactory results for the data explored.
Abstract: This thesis proposes and implements an algorithm for evaluating sentences with respect to their understandability and informativeness. It can be embedded into a variety of applications, such as corpus querying tools or automated dictionaries. The proposed algorithm is highly customizable, since it employs a variety of criteria approximating the similarity of sentences to good dictionary examples. It was optimized using machine learning algorithms on a set of manually labelled concordances. The algorithm is usable in practical applications; however, it is still being developed.