Semato-Ling
Pierre Plante, Lucie Dumas and Élias Rizkallah
Faculté des sciences humaines
Université du Québec à Montréal
The Semato software implements a morphological, syntactic and semantic analyzer for French and English. These three levels work together to extract sets of emerging themes from the texts gathered in a corpus.
Semato-Ling gathers a set of files built during the linguistic analysis (called indexing) of your corpus. These files contain, in a form that other software can import, all the morphological, syntactic and semantic information that Semato adds to your text data.
On this page, you will find four links (corresponding to four steps) that let you obtain the linguistic descriptions Semato produces for your corpus.
A description of the files produced by Semato-Ling is available at this link: Semato-Ling - Help.
Step 1: Opening a Semato project
If you already have a Semato project and it is activated (its name appears at the top right), go straight to Step 2.
If you want to activate another project that has already been opened, follow this link, then return to this page for Step 2.
You will be prompted for a project name and a password.
Project names and passwords are strings that must follow specific formation rules:
- the string has a minimum of 4 characters;
- the string must begin with an alphabetic character;
- all alphabetic characters in the string are lowercase;
- the string contains no accented characters, cedillas or other diacritics;
- the string contains no characters other than the 26 lowercase letters (abcdefghijklmnopqrstuvwxyz) and the 10 digits (0 1 2 3 4 5 6 7 8 9).
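The rules above can be checked locally before you submit a name or password. The following is a minimal sketch in Python, not part of Semato itself; the function name `is_valid_semato_string` is hypothetical, and it assumes the listed rules are the complete set (no maximum length is stated on this page, so none is enforced).

```python
import re

# Assumption: the rules listed on this page are exhaustive.
# - at least 4 characters
# - first character is a lowercase ASCII letter
# - every other character is a lowercase ASCII letter or a digit
# No accents, cedillas, uppercase letters, spaces or punctuation can match.
SEMATO_STRING = re.compile(r"[a-z][a-z0-9]{3,}")


def is_valid_semato_string(candidate: str) -> bool:
    """Return True if the candidate satisfies the rules listed above.

    Hypothetical helper for local checking; Semato performs its own validation.
    """
    return SEMATO_STRING.fullmatch(candidate) is not None


if __name__ == "__main__":
    for sample in ["sotus", "SOTUS", "abc", "1abcd", "projet2", "français"]:
        print(sample, is_valid_semato_string(sample))
```

A single regular expression is used here because every rule constrains either the length or the allowed character set, so they combine naturally into one pattern.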
Follow this link to open a project, and then return to this page for Step 2.
Step 2: Uploading your corpus
Note that you cannot upload a corpus to Semato unless you have a project open (Step 1).
If you have already uploaded a corpus to your project, whatever data entry method you used, you can proceed to Step 3.
The easiest way to get Semato-Ling files is the spreadsheet data entry method. This link explains the spreadsheet method. For the demonstration, we use a corpus of 42 speeches by American presidents (1961-2005). The name of the project is SOTUS (State of the Union Speeches).
This link opens the spreadsheet, in Excel format, for one of these speeches (Bush 1990). The SOTUS corpus will serve as the example for the linguistic files produced by Semato-Ling.
Note that all files transferred to Semato must be encoded in Windows ANSI (WINDOWS-1252). You should always transfer .txt files, never their Excel (.xls, .xlsx) or Word (.doc, .docx) versions, or any other format that is not .txt.
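If your text file was saved in another encoding, it can be re-encoded before upload. The sketch below is one possible way to do this in Python and is not a Semato tool; the function names are hypothetical, and it assumes the source file is UTF-8 (adjust `source_encoding` otherwise). Python's `cp1252` codec corresponds to Windows-1252.

```python
from pathlib import Path


def to_windows_1252_bytes(text: str) -> bytes:
    """Encode text as Windows-1252 (Python codec name: cp1252).

    Raises UnicodeEncodeError if the text contains a character that
    Windows-1252 cannot represent, so nothing is silently dropped.
    """
    return text.encode("cp1252")


def convert_file_to_windows_1252(
    source: Path, target: Path, source_encoding: str = "utf-8"
) -> None:
    """Read a text file in source_encoding and write it as Windows-1252.

    Hypothetical helper; the UTF-8 default is an assumption about your file.
    """
    text = source.read_text(encoding=source_encoding)
    target.write_bytes(to_windows_1252_bytes(text))


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    convert_file_to_windows_1252(Path("corpus_utf8.txt"), Path("corpus.txt"))
```

Encoding strictly, rather than replacing unsupported characters, means a failure points you to the characters that need to be changed by hand before the file is uploaded.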
To upload your spreadsheet file to your Semato project, follow this link.
If you must edit your file following an indexing error, overwrite the previously uploaded file by uploading the edited file under the same name.
You will find here more information on handling your input files.
Once the upload is complete, come back to this page to continue with Step 3.
Step 3: Indexing your data
Semato-Ling files are produced by the process of indexing text in Semato.
If you have already indexed your corpus, you can retrieve the Semato-Ling files by going directly to Step 4.
To perform indexing of your text, follow this link.
The time required for indexing depends on the size of your corpus and on the number of other corpora waiting to be processed. In total, this can vary from minutes to hours.
When indexing is finished, you can go to Step 4.
Step 4: Retrieving the Semato-Ling files
Once indexing is finished, you can retrieve the files produced by Semato-Ling by following this link.