Search results in the DIPF database of publications

Your query:

(Keywords: "Datenverarbeitung")

Author(s): Nasir, Jamal A.; Görnitz, Nico; Brefeld, Ulf
Title: An off-the-shelf approach to authorship attribution
In: Dublin City University and Association for Computational Linguistics (eds.): Proceedings of COLING 2014: Technical papers, Stroudsburg, PA: Association for Computational Linguistics, 2014, pp. 895-904
URL: http://www.aclweb.org/anthology/C14-1085
Publication Type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Algorithm; Automation; Author; Computer-assisted method; Data mining; Data processing; Information retrieval; Method
Abstract: Authorship detection is a challenging task due to the many design choices the user has to make. The performance highly depends on the right set of features, the amount of data, in-sample vs. out-of-sample settings, and profile- vs. instance-based approaches. So far, the variety of combinations renders off-the-shelf methods for authorship detection inappropriate. We propose a novel and generally deployable method that does not share these limitations. We treat authorship attribution as an anomaly detection problem where author regions are learned in feature space. The choice of the right feature space for a given task is identified automatically by representing the optimal solution as a linear mixture of multiple kernel functions (MKL). Our approach allows labelled as well as unlabelled examples to be included to remedy the in-sample and out-of-sample problems. Empirically, we observe our proposed novel technique either to be better or on par with baseline competitors. However, our method relieves the user from critical design choices (e.g., feature set) and can therefore be used as an off-the-shelf method for authorship attribution. (DIPF/Orig.)
DIPF-Departments: Informationszentrum Bildung

Author(s): Tzouridis, Emmanouil; Nasir, Jamal A.; Brefeld, Ulf
Title: Learning to summarise related sentences
In: Association for Computational Linguistics (eds.): Proceedings of COLING 2014: Technical papers, Stroudsburg, PA: Association for Computational Linguistics, 2014, pp. 1636-1647
URL: http://www.aclweb.org/anthology/C14-1155
Publication Type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Abstract; Algorithm; Automation; Computational linguistics; Data processing; Semantics; Text
Abstract: We cast multi-sentence compression as a structured prediction problem. Related sentences are represented by a word graph so that summaries constitute paths in the graph (Filippova, 2010). We devise a parameterised shortest path algorithm that can be written as a generalised linear model in a joint space of word graphs and compressions. We use a large-margin approach to adapt parameterised edge weights to the data such that the shortest path is identical to the desired summary. Decoding during training is performed in polynomial time using loss augmented inference. Empirically, we compare our approach to the state-of-the-art in graph-based multi-sentence compression and observe significant improvements of about 7% in ROUGE F-measure and 8% in BLEU score, respectively. (DIPF/Orig.)
DIPF-Departments: Informationszentrum Bildung

Author(s): Eckle-Kohler, Judith; Gurevych, Iryna; Hartmann, Silvana; Matuschek, Michael; Meyer, Christian M.
Title: UBY-LMF. Exploring the boundaries of language-independent lexicon models
In: Francopoulo, Gil (ed.): LMF Lexical Markup Framework, Hoboken, NJ: Wiley, 2013 (Computer engineering and IT series), pp. 145-156
Publication Type: 4. Contributions to collected works; Edited volume (no specific category)
Language: English
Keywords: Computational linguistics; Data processing; Lexicography; Lexicon; Multilingualism; Model; Semantics; Syntax
Abstract: Following a long series of successful scientific projects and collaborations, the community responsible for developing lexicons for Natural Language Processing (NLP) and Machine Readable Dictionaries (MRDs) decided to jump-start their International Organization for Standardization (ISO) standardization activities in 2003. A group of 60 researchers (cited herein as the "LMF team") spent 5 years gathering requirements and developing the ideas which resulted in the LMF standard. The LMF specification is a success. Numerous lexicon managers currently use LMF in different languages and contexts. This book is dedicated to reporting on a number of these applications. It is structured as follows: Chapter 1 presents the historical context of LMF. Chapter 2 provides an overview of the LMF model. Chapter 3 deals with the Data Category Registry, which provides a flexible means for applying constants like /grammatical gender/ in a variety of different settings. The remaining chapters present concrete applications and experiments on real data, which are important for developers who want to learn about the use of LMF. Despite this success, we do not claim that LMF is perfect. Indeed, several chapters describe a number of limitations and/or proposals for its improvement.
DIPF-Departments: Informationszentrum Bildung

Author(s): Upsing, Britta; Rölke, Heiko; Ferrari, Andrea; Dept, Steve
Title: Case study: XLIFF in a large-scale international OECD-study
In: Anastasiou, Dimitria; Vázquez, Lucia Morado (eds.): First International XLIFF Symposium, Limerick: Univ., 2010, pp. 17-19
Publication Type: 4. Contributions to collected works; Conference volume/conference paper/proceedings
Language: English
Keywords: Computer; Data exchange; Data processing; Germany; Adult; Case study; International comparison; Competence; Software; Test; Tools; Translation
Abstract: The Programme for the International Assessment of Adult Competencies (PIAAC) is a study commissioned by the OECD (Organization for Economic Co-operation and Development) and funded by the participating countries. The study collects data from adults aged 16 to 65. In addition to an extensive background questionnaire, it comprises cognitive tests in the domains of literacy, numeracy, and problem solving with computers. XLIFF was used for the entire process of translating, adapting, and verifying the test materials. To ensure that the tests are internationally comparable, text and layout were kept strictly separate. The Open Language Tool (OLT) was used for translation. The overall process was managed online through a portal, which also offered a preview of translated tests. In total, thousands of different files and tens of thousands of intermediate versions were managed. Our process revealed advantages and disadvantages of using XLIFF. One bottleneck, for example, was the central server hosting the portal at times of heavy use. Errors in XLIFF documents could sometimes be detected only late, so that several steps had to be repeated. Installing the OLT was difficult in some cases. On the positive side, the strict separation of text and layout proved extremely valuable, especially compared with the translation of the paper-based tests (carried out at the same time), where this separation was not in place.
Abstract (english): The Programme for the International Assessment of Adult Competencies (PIAAC) is a study organised by the Organization for Economic Co-operation and Development (OECD) and funded by the participating countries. It assesses skills of adults in 27 countries (35 national versions totalling 24 languages). This is done by administering tests to people aged 16 to 65. The tests are delivered on computer or on paper, depending on the participant's familiarity with information technology. The survey measures literacy and numeracy skills in the participants and collects their background data. In addition, the computer-based version measures how well the participants can solve problems in technology-rich environments, e.g. problems that involve finding information on a web page. XLIFF was used for the entire translation, adaptation and verification process of the computer-based test material. The rationale for this was to completely separate the text from the layout. The translation of paper-based files was done in Word documents. The Open Language Tool (OLT) was used to translate, edit and verify computer-based test materials and background questionnaires. Every upload of a translated XLIFF file made it possible to preview the translated item online. All in all, several thousand different XLIFF files with tens of thousands of distinct versions have been processed so far in the PIAAC study. The translation and adaptation process showed strengths and weaknesses of our workflow as well as of the XLIFF approach.
As examples of the drawbacks: the central generation of previews was slow in times of heavy usage of the translation portal; there was a relatively high occurrence of crashes in using the OLT translation editor; corrupt XLIFF files were sometimes detected relatively late in the process, so that a series of steps had to be repeated; maintenance of TMs posed a variety of unexpected challenges; inline formatting that involves tag editing seemed difficult to handle for translators who were not sufficiently familiar with the tool; and the installation and use varied across operating systems. On the positive side: the strict separation of layout and text content was extremely useful in comparison to the translation of MS Word files mentioned above, where some translators introduced errors and layout changes; the spellchecking options worked well for those languages for which dictionaries were available; different scripts and alphabets were processed smoothly; propagation of 100% matches across XLIFF files worked in a satisfactory way; and, in general, the format was deemed suitable for a mix of players with advanced knowledge of CAT tools and others with virtually no experience with this type of interface.
DIPF-Departments: Informationszentrum Bildung

Author(s): Cabac, Lawrence; Dörges, Till; Rölke, Heiko
Title: A monitoring toolset for PAOSE
In: Hee, Kees Max van; Valk, Rüdiger (eds.): Applications and theory of Petri nets: 29th International Conference, PETRI NETS 2008, Xi'an, China, June 23-27, 2008, proceedings, Berlin: Springer, 2008 (Lecture notes in computer science, 5062), pp. 399-408
DOI: 10.1007/978-3-540-68746-7_26
URL: http://dx.doi.org/10.1007/978-3-540-68746-7_26
Publication Type: 4. Contributions to collected works; Conference volume/conference paper/proceedings
Language: English
Keywords: Petri net; Software development; Computer program; Network architecture; Model; Modelling; Data processing; Data analysis; Computer science; Information science
Abstract (english): Paose (Petri net-based Agent-Oriented Software Engineering) combines the paradigm of AOSE (Agent-Oriented Software Engineering) with the expressive power of Petri nets - reference nets, to be more precise. While AOSE is a powerful approach when it comes to designing and developing distributed (agent) applications, it does not address the problems specific to debugging, monitoring, and testing of these applications, i.e. no global state of the system and very dynamic operating conditions. To tackle these problems, two tools have been developed in the context of Paose, which are presented in this work. Firstly, this paper will give a short overview over the interrelated set of tools, which exists already and supports Petri net-based AOSE. The tools are centered around the Petri net-based multi-agent system development and runtime environment Renew/Mulan/Capa. Secondly, Mulan-Viewer and Mulan-Sniffer will be presented in more detail - two tools to address the issues encountered during debugging, monitoring, and testing agent applications. Both tools are first class members of the aforementioned family. The first tool, Mulan-Viewer, deals with the introspection of agents and agent behaviors, while it also offers rudimentary features for controlling the agent-system. The Mulan-Sniffer as the second tool places emphasis on tracing, visualizing, and analyzing communication between all parts of the multi-agent application and offers interfaces for more advanced methods of analysis, such as process mining. Both Mulan-Viewer and Mulan-Sniffer are realized as Renew plugins that can also be extended by other plugins. (Author)
DIPF-Departments: Informationszentrum Bildung