Research

Lymba's talented team of research scientists and natural language experts has collectively published more than 300 papers in the areas of semantics, ontologies, question answering, and reasoning. Our deep roots in research continue to drive the high quality of our products.

For more than 15 years, dating back to the company's years as Language Computer Corporation, Lymba employees have participated in numerous government projects sponsored by ARDA/IARPA, the Air Force, the National Science Foundation, and the Intelligence Community. State-of-the-art technologies resulting from these programs enable Lymba products to deliver best-in-class solutions.

Check out our selected publications:

A Semantically Enhanced Approach to Determine Textual Similarity

This paper presents a novel approach to determine textual similarity. A layered methodology to transform text into logic forms is proposed, and semantic features are derived from a logic prover. Experimental results show that incorporating the semantic structure of sentences is beneficial. When training data is unavailable, scores obtained from the logic prover in an unsupervised manner outperform supervised methods.
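To make the unsupervised setting concrete, the sketch below (an illustration under our own assumptions, not the paper's implementation) treats the prover as a black box that scores how strongly one sentence's logic form entails the other's, and averages the two directional scores into a symmetric similarity; prover_score is a hypothetical stand-in.

    # Illustrative sketch only; prover_score is a hypothetical stand-in for a
    # logic prover that scores how well one logic form entails another (0 to 1).
    def prover_score(premise_lf: str, hypothesis_lf: str) -> float:
        """Hypothetical prover call; replace with a real prover backend."""
        raise NotImplementedError

    def unsupervised_similarity(lf_a: str, lf_b: str) -> float:
        """Symmetric similarity: average of the two directional entailment scores."""
        return 0.5 * (prover_score(lf_a, lf_b) + prover_score(lf_b, lf_a))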

Retrieving Implicit Positive Meaning from Negated Statements

This paper introduces a model for capturing the meaning of negated statements by identifying the negated concepts and revealing the implicit positive meanings. A negated sentence may be represented logically in different ways depending on the scope and focus of the negation. The novel approach introduced here identifies the focus of negation and thus eliminates erroneous interpretations. Furthermore, negation is incorporated into a previously proposed framework for composing semantic relations, yielding a richer semantic representation of text, including hidden inferences. Annotations of negation focus were performed over PropBank, and learning features were identified. The experimental results show that the models introduced here obtain a weighted F-measure of 0.641 for predicting the focus of negation and 78 percent accuracy for incorporating negation into the composition of semantic relations.
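As a constructed illustration (not an example or data from the paper), the sentence "John didn't leave early" yields different implicit positive meanings depending on which constituent is taken as the focus of negation:

    # Constructed illustration of negation focus; not data from the paper.
    # Each candidate focus for "John didn't leave early" licenses a different
    # implicit positive meaning once the negation is restricted to that focus.
    focus_to_implicit_positive = {
        "early": "John left, but not early",
        "John": "Someone left early, but it was not John",
        "leave": "John did something early, but he did not leave",
    }
    for focus, positive in focus_to_implicit_positive.items():
        print(f"focus = {focus!r:<8} -> implicit positive: {positive}")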

Semi-Automatic Domain Ontology Creation from Text Resources

Analysts in various domains, especially intelligence and finance, must constantly extract useful knowledge from large amounts of unstructured or semi-structured data. Keyword-based search, faceted search, and question answering are some of the automated methods that have been used to help analysts in their tasks. General-purpose and domain-specific ontologies have been proposed to help these automated methods organize data and provide access to useful information. However, difficulties in ontology creation and maintenance have made it expensive to expand and maintain the ontology libraries needed to support the growing and evolving needs of analysts. In this paper, we present a generalized and improved procedure to automatically extract deep semantic information from text resources and rapidly create semantically rich domain ontologies while keeping manual intervention to a minimum. We also present evaluation results for the intelligence and financial ontology libraries semi-automatically created by our proposed methodologies using freely available textual resources from the Web.
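To give a concrete sense of the end product, the sketch below shows the kind of small domain-ontology fragment such a pipeline might emit, expressed as RDF with the rdflib library. The namespace, classes, and relations are invented for illustration; they are not output from Lymba's ontology builder.

    from rdflib import Graph, Namespace, RDF, RDFS, Literal

    # Hypothetical finance namespace and terms, for illustration only.
    FIN = Namespace("http://example.org/finance#")
    g = Graph()
    g.bind("fin", FIN)

    # Concept hierarchy as it might be extracted from a definition sentence
    # such as "A hedge fund is an investment fund that ...".
    g.add((FIN.InvestmentFund, RDF.type, RDFS.Class))
    g.add((FIN.HedgeFund, RDF.type, RDFS.Class))
    g.add((FIN.HedgeFund, RDFS.subClassOf, FIN.InvestmentFund))
    g.add((FIN.HedgeFund, RDFS.label, Literal("hedge fund")))

    # A semantic relation between two domain concepts.
    g.add((FIN.HedgeFund, FIN.employs, FIN.Leverage))

    print(g.serialize(format="turtle"))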

Automatic Building of Semantically Rich Domain Models from Unstructured Data

The availability of massive amounts of raw domain data has created an urgent need for sophisticated AI systems that can find complex and useful information in big-data repositories in real time. Such systems should be able to process and extract significant information from natural language documents, search and answer complex questions, make sophisticated predictions about future events, and generally interact with users in much more powerful and intuitive ways. To be effective, these systems need a significant amount of domain-specific knowledge in addition to general-domain knowledge. Ontologies and knowledge bases represent knowledge about domains of interest and serve as the backbone for semantic technologies and applications. However, creating such domain models is time consuming and error prone, and the end product is difficult to maintain. In this paper, we present a novel methodology to automatically build semantically rich knowledge models for specific domains using domain-relevant unstructured data from resources such as web articles, manuals, e-books, and blogs. We also present evaluation results for our automatic ontology/knowledge-base generation methodology using freely available textual resources from the World Wide Web.
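As a toy sketch of the extraction step such a pipeline performs, the snippet below uses the off-the-shelf spaCy parser (a generic stand-in, not Lymba's tools) to pull candidate domain concepts from noun chunks and candidate relations from subject-verb-object dependency paths:

    import spacy

    nlp = spacy.load("en_core_web_sm")  # generic parser, stand-in for a domain pipeline

    def candidate_triples(text):
        """Very rough subject-verb-object extraction as seed knowledge-base facts."""
        doc = nlp(text)
        concepts = {chunk.text.lower() for chunk in doc.noun_chunks}
        triples = []
        for tok in doc:
            if tok.pos_ == "VERB":
                subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in tok.children if c.dep_ in ("dobj", "obj", "attr")]
                for s in subjects:
                    for o in objects:
                        triples.append((s.text, tok.lemma_, o.text))
        return concepts, triples

    concepts, triples = candidate_triples("Routers forward packets between networks.")
    print(concepts)   # candidate domain concepts
    print(triples)    # e.g. [('Routers', 'forward', 'packets')]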

Polaris: Lymba’s Semantic Parser

Semantic representation of text is key to text understanding and reasoning. In this paper, we present Polaris, Lymba’s semantic parser. Polaris is a supervised semantic parser that, given text, extracts semantic relations. It extracts relations from a wide variety of lexico-syntactic patterns, including verb-argument structures, noun compounds, and others. The output can be provided in several formats: XML, RDF triples, logic forms, or plain text, facilitating interoperability with other tools. Polaris is implemented as eight separate modules. Each module is explained, and a detailed example of processing a sample sentence is provided. Overall results on a benchmark are discussed. Per-module performance, including the errors made and pruned by each module, is also analyzed.
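For a sense of what the different output formats convey, here is a constructed illustration of semantic relations for a sample sentence, rendered as plain text and as RDF-style triples; the sentence and relation names are invented and are not actual Polaris output.

    # Constructed illustration of semantic-relation output (not Polaris output).
    # Sentence: "The company acquired a startup in Austin."
    relations = [
        ("acquire", "AGENT", "company"),
        ("acquire", "THEME", "startup"),
        ("acquire", "LOCATION", "Austin"),
    ]

    # Plain-text rendering
    for head, rel, arg in relations:
        print(f"{rel}({head}, {arg})")

    # RDF-triple-style rendering (subject, predicate, object)
    for head, rel, arg in relations:
        print(f"<{head}> <hasRelation:{rel}> <{arg}> .")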

Lymba’s PowerAnswer 4 in TREC 2007

This paper reports on the participation of Lymba Corporation (a spinoff of Language Computer Corporation) in the TREC 2007 Question Answering track. An overview of the PowerAnswer 4 question answering system and a discussion of new features added to meet the challenges of this year’s evaluation are detailed. Special attention was given to methods for incorporating blogs into the searchable collection; statistical and knowledge-driven methods for improving answer precision; new mechanisms for recognizing named entities, events, and time expressions; and updated pattern-driven approaches to answering definition questions. Lymba’s results in the evaluation are presented at the end of the paper.