Towards an Empirically-based Valency Lexicon of Latin

  • Marco Passarotti, Università Cattolica del Sacro Cuore di Milano
  • Berta González Saavedra, Università Cattolica del Sacro Cuore di Milano

Abstract

Despite a centuries-old lexicographic tradition, Latin still lacks state-of-the-art computational lexical resources. This is closely tied to the limited availability of linguistically annotated Latin corpora, which would provide the empirical basis for building new lexical resources. Over the last decade, however, a number of projects aimed at developing advanced linguistic resources for Latin (including several treebanks) have been launched. In this article, we present Latin Vallex, a valency lexicon for Latin built in close connection with the semantic-pragmatic annotation of two Latin treebanks comprising texts of different eras and genres. This makes it possible to link the valency structures recorded in the lexicon one-to-one with their occurrences in the textual data of the treebanks.

Published
2016-12-30
How to Cite
Passarotti, M., & González Saavedra, B. (2016). Towards an Empirically-based Valency Lexicon of Latin. RiCOGNIZIONI. Rivista Di Lingue E Letterature Straniere E Culture Moderne, 3(6), 51-68. https://doi.org/10.13135/2384-8987/1832
Section
CrOCEVIA