Differences

This shows the differences between the selected revision and the current version of the page.

czesl:czesl [2020/12/10 17:56]
rosen [CzeSL – a Learner Corpus of Czech]
czesl:czesl [2021/03/11 19:53]
rosen [Tools]
Line 1: Line 1:
 +{{ :​czesl:​logolink_op_vvv_hor_barva_eng.jpg?​600 |}}
 +
 ======= CzeSL – a Learner Corpus of Czech ======= ======= CzeSL – a Learner Corpus of Czech =======
  
Line 9: Line 11:
     * 2012–2016:​ Ministry of Education, Youth and Sports – //Czech National Corpus//, no. LM2011023     * 2012–2016:​ Ministry of Education, Youth and Sports – //Czech National Corpus//, no. LM2011023
     * 2016–2018 (extended to mid-2020): Grant Agency of the Czech Republic – [[https://​ufal.mff.cuni.cz/​czesl|Non-native Czech from the Theoretical and Computational Perspective]],​ no. 16-10185S     * 2016–2018 (extended to mid-2020): Grant Agency of the Czech Republic – [[https://​ufal.mff.cuni.cz/​czesl|Non-native Czech from the Theoretical and Computational Perspective]],​ no. 16-10185S
-    * 2018–2022: ​KREAS, Faculty of Arts, Charles University; Structural and Investment Funds of the European Union –[[https://​kreas.ff.cuni.cz/​en/​]]+    * 2018–2022:​ [[https://​kreas.ff.cuni.cz/​en/​|KREAS]], Faculty of Arts, Charles University; Structural and Investment Funds of the European Union
   * Alternative address of this site: [[http://​utkl.ff.cuni.cz/​learncorp/​]]   * Alternative address of this site: [[http://​utkl.ff.cuni.cz/​learncorp/​]]
  
Line 15: Line 17:
 ===== Available versions ===== ===== Available versions =====
  
-| ^  Thousands of tokens in  ^^^^  ​annotation ​ ^^^^^ Metadata ^  Access ​ ^  Year  ^ +| ^  Thousands of tokens in  ^^^^  ​Annotation ​ ^^^^^ Metadata ^  Access ​ ^  Year  ^ 
-| ::: ^  non-native ​ ^^  ethnolect ​ ^  𝚺  ^  ​Error  ​^^  ​Linguistic ​ ​^^^:::​^:::​^:::​^+| ::: ^  non-native ​ ^^  ethnolect ​ ^  𝚺  ^  ​error  ​^^  ​linguistic ​ ​^^^:::​^:::​^:::​^
 | ::: ^  essays ​ ^  theses ​ ^ ::: ^ ::: ^  Tags  ^  TH  ^  T0  ^  T1  ^  T2  ^ ::: ^ ::: ^ ::: ^ | ::: ^  essays ​ ^  theses ​ ^ ::: ^ ::: ^  Tags  ^  TH  ^  T0  ^  T1  ^  T2  ^ ::: ^ ::: ^ ::: ^
 ^ CzeSL-plain |  1,315 |  732 |  428 |  2,475 |  --  |  --  |  --  |  --  |  --  |  --  |  SD  |  2012  | ^ CzeSL-plain |  1,315 |  732 |  428 |  2,475 |  --  |  --  |  --  |  --  |  --  |  --  |  SD  |  2012  |
Line 89: Line 91:
     * Each text with its annotation consists of several related files.     * Each text with its annotation consists of several related files.
     * Some of the texts are independently annotated twice.     * Some of the texts are independently annotated twice.
 +    * Includes also flat version (files named *.vert), see CzeSL-man v2 below.
   * **CzeSL-man v1 searchable**:​   * **CzeSL-man v1 searchable**:​
     * Searchable by KonText: https://​kontext.korpus.cz/​first_form?​corpname=czesl-man     * Searchable by KonText: https://​kontext.korpus.cz/​first_form?​corpname=czesl-man
Line 104: Line 107:
   * Apart from the error annotation, the content and metadata are the same as in CzeSL-man v1.   * Apart from the error annotation, the content and metadata are the same as in CzeSL-man v1.
   * Linguistic annotation (tags and lemmas) is provided for all tokens at Tier 0 and Tier 2.   * Linguistic annotation (tags and lemmas) is provided for all tokens at Tier 0 and Tier 2.
 +  * Downloadable from https://​bitbucket.org/​czesl/​czesl-man/​ (files named *.vert).
  
 === CzeSL-TH === === CzeSL-TH ===
Line 153: Line 157:
   * Multi-level concordancer [[http://​utkl.ff.cuni.cz/​czesl/​selaq.html|SeLaQ]],​ used for basic searching in CzeSL-man   * Multi-level concordancer [[http://​utkl.ff.cuni.cz/​czesl/​selaq.html|SeLaQ]],​ used for basic searching in CzeSL-man
   * Standard concordancer [[http://​wiki.korpus.cz/​doku.php/​en:​manualy:​kontext:​index|Manatee/​KonText]],​ used for searching in CzeSL-plain and CzeSL-SGT   * Standard concordancer [[http://​wiki.korpus.cz/​doku.php/​en:​manualy:​kontext:​index|Manatee/​KonText]],​ used for searching in CzeSL-plain and CzeSL-SGT
-  * General corpus tool [[http://beta.clul.ul.pt/​teitok/​site/​|TEITOK]], currently used for building, editing and viewing learner corpora hosted by the Institute of Theoretical and Computational linguistics (see [[http://​utkl.ff.cuni.cz/​teitok/​|Learner corpora at ICTL]])+  * General corpus tool [[http://www.teitok.org|TEITOK]], currently used for building, editing and viewing learner corpora hosted by the Institute of Theoretical and Computational linguistics (see [[http://​utkl.ff.cuni.cz/​teitok/​|Learner corpora at ICTL]])
  
 ===== Bibliography ===== ===== Bibliography =====
Line 164: Line 168:
 //Compiling and annotating a learner corpus for a morphologically rich language – CzeSL, a corpus of non-native Czech.// [[https://​karolinum.cz|Karolinum,​ Charles University Press, Praha]]. [[https://​karolinum.cz/​knihy/​rosen-compiling-and-annotating-a-learner-corpus-for-a-morphologically-rich-language-23802|Print copy, e-book]] [[https://​dspace.cuni.cz/​handle/​20.500.11956/​123103|CU Digital Repository]] //Compiling and annotating a learner corpus for a morphologically rich language – CzeSL, a corpus of non-native Czech.// [[https://​karolinum.cz|Karolinum,​ Charles University Press, Praha]]. [[https://​karolinum.cz/​knihy/​rosen-compiling-and-annotating-a-learner-corpus-for-a-morphologically-rich-language-23802|Print copy, e-book]] [[https://​dspace.cuni.cz/​handle/​20.500.11956/​123103|CU Digital Repository]]
  
 +===== Acknowledgement =====
  
 +This work was supported by the European Regional Development Fund project “Creativity and Adaptability as Conditions of the Success of Europe in an Interrelated World” (reg. no.: CZ.02.1.01/​0.0/​0.0/​16_019/​0000734).
