From recording to corpus, i.e. on the method of archiving the spoken language of rural inhabitants using digital linguistics tools

Helena Grochola-Szczepanek

doi:10.24917/20831765.16.5

PDF (Polish)

Published: Dec 29, 2021

Keywords:

corpus, spoken language, dialect, transcription

Helena Grochola-Szczepanek

Institute of Polish Language, Polish Academy of Sciences, Cracow

https://orcid.org/0000-0002-1511-0486

Abstract

The article presents a method of archiving the speech of rural inhabitants, developed in the course of building an electronic language corpus. It focuses on how spoken data are obtained and how the non-standard dialect code is transcribed. It also discusses the problems and limitations that arise from non-normative spoken data, together with the solutions adopted. Recording spoken language data and converting them for a corpus is a complex, multi-stage process. The data are obtained from recorded interviews with respondents. The transcription system developed for the spoken data takes into account the properties of the non-standard code, the capabilities of the tools, and the needs of the corpus.

How to Cite

Grochola-Szczepanek, H. (2021) “From recording to corpus, i.e. the method of archiving of the rural speech with using digital linguistics tools”, Annales Universitatis Paedagogicae Cracoviensis. Studia Linguistica, (16), pp. 54–67. doi: 10.24917/20831765.16.5.

Issue

No. 16 (2021)

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

By submitting a text to the editorial board of the journal “Annales Universitatis Paedagogicae Cracoviensis. Studia Linguistica”, the author certifies that the article has not been published before, that the work does not in any way infringe the copyright or related rights of other persons or any other rights of third parties, and that no one's rights to the work (or any part of it) have been omitted. Upon signing the contract, the property rights to the published materials are transferred to the University of the National Education Commission, Krakow.

“Annales Universitatis Paedagogicae Cracoviensis. Studia Linguistica” is an open access journal: all its content is made available free of charge to users and institutions under the Creative Commons CC-BY-NC-ND 4.0 license (attribution, non-commercial use, no derivative works). Under this license, the authors agree that their work may be lawfully reused for any non-commercial purpose without the prior consent of the author or publisher. Anyone may read, download, copy, print, distribute and process these works, provided that the author and the original place of publication are correctly attributed. Published texts may not be used to create derivative works (e.g. translated and published in another language without the publisher's consent). This is in line with the BOAI (Budapest Open Access Initiative) definition. “Studia Linguistica” does not charge fees for submitting or processing articles.

References

Bańko M., Kłosińska A., 1994, Polszczyzna mówiona nieobecna w słownikach, in: Współczesna polszczyzna mówiona w odmianie opracowanej (oficjalnej), eds. Z. Kurzowa, W. Śliwiński, Kraków, pp. 89–96.

Dunaj B., 1986, Dialektologia a socjolingwistyka, „Folia Linguistica” 12, pp. 15–23.

Grochola-Szczepanek H., Górski R.L., Waldenfels R. von, Woźniak M., 2019, Korpus języka mówionego mieszkańców Spisza, „LingVaria” LV/1, pp. 165–180.

Grochola-Szczepanek H., Woźniak M., 2018a, Badania korpusowe języka mieszkańców Spisza a czynnik pokoleniowy, in: Dialog pokoleń w języku i językoznawstwie, ed. E. Wierzbicka-Piotrowska, Warszawa, pp. 79–90.

Grochola-Szczepanek H., Woźniak M., 2018b, Transkrypcja języka mieszkańców wsi w aplikacji ELAN w Korpusie Spiskim, in: Historia języka, dialektologia i onomastyka w nowych kontekstach interpretacyjnych, eds. R. Przybylska, M. Rak, A. Kwaśnicka-Janowicz, Kraków, pp. 267–278.

Klessa K., Wagner A., Oleśkowicz-Popiel M., Karpiński M., 2013, Paralingua – A New Speech Corpus for the Studies of Paralinguistic Features, „Procedia – Social and Behavioral Sciences” 95, pp. 48–58.

Labocha J., 2012, Pragmatyczne mechanizmy składni języka mówionego, „Slavia Occidentalis” 69, pp. 139–145.

Lewaszkiewicz T., 2017, O zapisach fonetycznych polskiej i słowiańskiej mowy ludowej i potocznej, „Gwary Dziś” 9, pp. 183–197.

Przybylska R., 2009, Badania nad polszczyzną mówioną a leksykografia, in: Polszczyzna mówiona ogólna i regionalna, eds. B. Dunaj, M. Rak, Kraków, pp. 33–39.

Sierociuk J., 2009, Zasoby fonograficzne Zakładu Dialektologii Polskiej Uniwersytetu im. Adama Mickiewicza i ich przydatność w badaniach procesów rozwojowych polszczyzny mówionej, in: Polszczyzna mówiona ogólna i regionalna, eds. B. Dunaj, M. Rak, Kraków, pp. 179–188.

Wagner A., Bachan J., Klessa K., Demenko G., 2015, Przegląd wybranych aspektów analizy prozodii mowy spontanicznej na potrzeby technologii mowy, „Prace Filologiczne” LXVI, pp. 271–298.

Waldenfels R. von, Woźniak M., 2016, SpoCo – a simple and adaptable web interface for dialect corpora, „Journal for Language Technology and Computational Linguistics” 31, pp. 155–170.

Baza Mazak, Akustyczna baza danych gwar mazowieckich. Wokalizm, http://www.bazamazak.uw.edu.pl/ (accessed: 07.02.2021).

Český národní korpus, http://ucnk.ff.cuni.cz (accessed: 07.02.2021).

GOS – Referenčni govorni korpus slovenskega jezika, http://korpus-gos.net (accessed: 07.02.2021).

Korpus Spiski, Język mieszkańców Spisza. Korpus tekstów i nagrań gwarowych, http://spisz.ijp.pan.pl (accessed: 07.02.2021).

NKJP – Narodowy Korpus Języka Polskiego, http://nkjp.pl (accessed: 07.02.2021).

Pęzik P., 2012, Język mówiony w NKJP, in: Narodowy Korpus Języka Polskiego, eds. A. Przepiórkowski, M. Bańko, R. Górski, B. Lewandowska-Tomaszczyk, Warszawa, pp. 37–47, http://nkjp.pl/index.php?page=3&lang=0 (accessed: 27.02.2021).

Pęzik P., 2014, Spokes – a search and exploration service for conversational corpus data, https://clarin-pl.eu/dspace/bitstream/handle/11321/47/spokes_pezik.pdf?sequence=5&isAllowed=y (accessed: 10.01.2021).

Przepiórkowski A., Bańko M., Górski R.L., Lewandowska-Tomaszczyk B. (eds.), 2012, Narodowy Korpus Języka Polskiego, Warszawa, http://nkjp.pl/index.php?page=3&lang=0 (accessed: 27.02.2021).

Spokes-CLARIN, http://spokes.clarin-pl.eu/ (accessed: 07.02.2021).