ATILF

Computer Processing and Analysis of the French Language
12 Projects, page 1 of 3
  • Funder: French National Research Agency (ANR) Project Code: ANR-22-CE38-0002
    Funder Contribution: 412,460 EUR

    The CODIM project focuses on the two main linguistic resources for organizing monologues or conversations in human languages: D(iscourse) M(arkers) (therefore/donc, well/ben, bon, etc. in English/French) and prosody (in particular intonation). It will evaluate their status with respect to two major views on communication: compositionality (the possibility of combining meaningful expressions into more complex meaningful expressions) and pattern- or construction-based approaches (the idea that language users exploit partly ‘frozen’ strings of words). We will compare the semantic and prosodic properties of simple and complex French DMs (e.g. ah + bon) found in corpora of written and spoken French, using a variety of complementary approaches for DM identification (category-driven text mining), clustering (statistics and machine learning) and research in prosody (ToBI representation, speech analysis/synthesis). This will foster or reinforce strong collaborations between linguists and computer scientists.

  • Funder: French National Research Agency (ANR) Project Code: ANR-22-CE54-0013
    Funder Contribution: 462,332 EUR

    The PREFAB project aims to identify and analyze the prefabricated patterns of French interactions (e.g. comment dirais-je ‘how shall I put it’) in spoken corpora, interactional written corpora, and dialogues in fiction (these resources are already available). The project, initiated by LIDILEM, brings together researchers from ICAR, ATILF and BCL, research units with complementary skills. The modeling is based on construction grammars and includes syntactic, semantic, pragmatic and interactional dimensions. The innovative aspects of the project are: (a) the study of a very wide range of prefabricated patterns, from expressive expressions to metadiscursive patterns; (b) an integrated approach to levels of linguistic processing (in a model based on "constructicons"); (c) an innovative inductive methodology of corpus exploration (including treebanks); (d) the study of variation between sub-genres and mediums, including a comparison between French and German c. The data compiled will be freely available. They will contribute to the linguistic heritage and will be useful for language teaching as well as for computer applications.

  • Funder: French National Research Agency (ANR) Project Code: ANR-12-CORP-0017
    Funder Contribution: 240,000 EUR

    This project starts from the consideration that it is time to combine inquiry into a scientific corpus with information technology research tools. Until now, the building of information technology infrastructures for scientific corpora has mainly been devoted to making the images and transcriptions of the texts available. AMPERE2014 starts from the existing electronic resource “@Ampère et l’histoire de l’électricité” (www.ampere.cnrs.fr) and intends to exploit Ampère’s corpus, which is qualitatively and quantitatively impressive. Ampère wrote thousands of pages, discussed the most important scientific subjects of the first third of the nineteenth century, proposed a vast inquiry into philosophy, and analyzed both knowledge and its creative process. The main aim of AMPERE2014 is to empower analysis of and research on Ampère’s corpus through IT applications. The project is designed to support analysis, comparison and connection among elements within the different texts of Ampère’s corpus (publications, manuscripts, correspondence, private writings). It will proceed from indexing, through the production of semantic interrelations, to actual research: a real synthesis between scholarship and information technology, which is a novelty in the history of science and in inquiry into scientific corpora.

  • Funder: French National Research Agency (ANR) Project Code: ANR-23-CE38-0007
    Funder Contribution: 536,351 EUR

    Learners in the same language class often have very different proficiency levels, and handling this heterogeneity is a major problem for language teachers, who should provide personalised resources to each learner. The STAR-FLE project therefore aims to propose innovative digital solutions from the field of Natural Language Processing (NLP) that may improve the text comprehension of French L2 learners and help teachers handle multiple learner levels. We propose context-based aids for the comprehension of lexical difficulties, including multiword expressions (MWEs) found in original texts. Our system provides MWE identification, generation of definitions addressed to a specific learner’s profile, synonym search, word sense disambiguation, and the possibility of choosing simpler synonyms for a better comprehension of a text. In addition, we build original NLP resources such as an annotated CEFR corpus and lexicons, and an MWE-annotated corpus.

  • Funder: French National Research Agency (ANR) Project Code: ANR-13-BSH3-0009
    Funder Contribution: 239,948 EUR

    In the ninth century, the rich Arabic tradition of adab found its way to Spain, to al-Andalus, which then played a central role in the exchange of knowledge from the Orient, later relayed to the West by monasteries in the north of the Iberian Peninsula in the 11th and 12th centuries. In al-Andalus, adab literature met the Jewish sapiential tradition of midrashic literature. New collections were composed, including original works in the 10th and 11th centuries, and from the 12th century on, exempla and philosophers’ sayings were translated into Hebrew, Latin, and Romance languages. Much of this complex heritage is found in the extensive Spanish paremiological literature, which reached its peak in the 16th and 17th centuries, and in current Spanish, Judeo-Spanish and Maghrebian collections of proverbs. Although the main lines of these exchanges are known, we lack specific information on the circulation of these short sapiential statements (our basic research unit), on the successive choices made by the translators, on the cultural reinterpretations, and on the weight of one borrowing relative to another. If sapiential textual filiations and translation sequences should be treated cautiously, this is particularly true for the sapiential statements contained in these texts. Because they are difficult to grasp, these volatile elements, whose categorization varies with the period and culture considered, have never been the subject of comprehensive textual studies recounting their sources, circulation and evolution through the different languages spoken or written by the three cultures of the Iberian Peninsula during the Middle Ages. Paremiological studies have principally produced compilations of proverbs (thesauri), editions, and erudite studies dedicated to a single work, a single language or a single culture, with the exception of D. Gutas’ remarkable groundbreaking work on the Philosophical Quartet (1975).

    The few existing databases take into account contemporary “paremiae” corpora, most often monolingual or approached from a translation-studies perspective. The aim of the ALIENTO project is therefore to compute matches, including partial ones and close or distant connections, in order to reassess intertextual relations by comparing a large quantity of data and cross-referencing encoded texts written in different languages. This is why the project, which requires close interdisciplinary collaboration between computational researchers (ATILF) and linguists and literature specialists (MSH Lorraine + INALCO and the international network of collaborators), will develop software transferable to other similar texts, using a large reference corpus composed of 8 related texts which circulated in the Iberian Peninsula (in Latin, Arabic, Hebrew, Spanish and Catalan), representing 582 pages and an estimated 9,570 sapiential statements. The software will extract and connect brief sapiential units through matching driven by a purpose-built encoding system documented in an XML-TEI encoding manual. The choice and type of annotations result from collaborative reflection, during dedicated scientific sessions, among the project members: specialists in linguistic paremiology and ancient texts, design engineers of textual databases, and computational researchers. This annotation scheme will evolve collaboratively during the matching processes. At the end we will have:
    - a body of texts belonging to a multilingual corpus, digitized, tagged in XML/TEI and publicly accessible, linked to a set of data on each text and its author;
    - a set of brief sapiential units with their XML/TEI annotations, accessible free of charge;
    - a trilingual query interface that displays the matched statements contained in these works, with information that can be used to study them regardless of the language;
    - an encoding methodology and software for matching data, transferable to other similar corpora.
