The Lymba K-Extractor is a unique and powerful solution for semantically processing, searching, and extracting information from unstructured text. Harness the power of Big Data with our flexible unstructured-to-structured knowledge extraction capabilities.

Lymba has created a powerful bundle of knowledge enrichment and ontology-building products.

The Lymba K-Extractor semantically parses stores of unstructured text and automatically extracts actionable information into either a traditional RDBMS or an RDF store.

Our newest version includes the following features:

  • Put structure into unstructured content by using the K-Platform's NLP pipeline to extract semantic RDF triples from text that can be efficiently indexed and stored in a triple store.
  • Index structured content into a standard RDBMS with appropriate schemas.
  • Infer relationships and concepts from ontologies created using Jaguar, our automatic Ontology Building Tool.
  • Align the structured and unstructured content with the ontology stored in the semantic model.
  • Query knowledge stored in a data store easily and efficiently using our built-in natural language query interface.
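To make the RDBMS indexing and querying features more concrete, here is a minimal sketch of one common way to keep RDF triples in a relational database: a single three-column triple table with a supporting index. The table layout, the `ex:` identifiers, and the helper functions are hypothetical illustrations built on Python's standard `sqlite3` module; they are not the schemas Lymba provides.

```python
import sqlite3

# Hypothetical extracted triples (subject, predicate, object); not real K-Extractor output.
TRIPLES = [
    ("ex:Lymba", "ex:develops", "ex:K-Extractor"),
    ("ex:K-Extractor", "rdf:type", "ex:Software"),
    ("ex:Jaguar", "rdf:type", "ex:Software"),
    ("ex:Lymba", "ex:develops", "ex:Jaguar"),
]


def build_store(triples):
    """Create an in-memory triple table (assumed layout) and load the triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        " subject TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " object TEXT NOT NULL,"
        " PRIMARY KEY (subject, predicate, object))"
    )
    # Secondary index so lookups by (predicate, object) do not scan the whole table.
    conn.execute("CREATE INDEX idx_pred_obj ON triples (predicate, object)")
    # The composite primary key plus OR IGNORE silently drops duplicate triples.
    conn.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


def objects_of(conn, subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    rows = conn.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


def subjects_with(conn, predicate, obj):
    """Return every subject that has `predicate` pointing at `obj`, sorted."""
    rows = conn.execute(
        "SELECT subject FROM triples WHERE predicate = ? AND object = ? ORDER BY subject",
        (predicate, obj),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = build_store(TRIPLES)
    print(objects_of(store, "ex:Lymba", "ex:develops"))     # ['ex:Jaguar', 'ex:K-Extractor']
    print(subjects_with(store, "rdf:type", "ex:Software"))  # ['ex:Jaguar', 'ex:K-Extractor']
```

A single triple table is the simplest possible layout; dedicated triple stores typically add term dictionaries and more index permutations, so treat this only as a conceptual model.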

Lymba's K-Platform Extractor is a state-of-the-art tool for natural language processing and ontology building that works with any database, as shown in the figure below. The K-Extractor automatically processes free-text documents (blog entries, PDF documents, Word documents, tweets, and more) through the Lymba NLP pipeline and transforms the unstructured content into RDF triples, which are stored either in a Spatial semantic index or in a standard RDBMS using the provided schemas.
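RDF triples need a concrete serialization before they can be loaded into a store. The sketch below writes triples in the W3C N-Triples syntax, one `subject predicate object .` statement per line. The `IRI` and `Literal` classes and the `http://example.org/` IRIs are assumptions for illustration, and this page does not say which RDF serialization the K-Extractor actually emits. Blank nodes, language tags, and datatypes are omitted to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str  # assumed to be an absolute, already well-formed IRI


@dataclass(frozen=True)
class Literal:
    value: str  # plain string literal (no language tag or datatype)


Term = Union[IRI, Literal]


def _escape(text: str) -> str:
    """Escape the characters N-Triples requires inside a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def term_to_nt(term: Term) -> str:
    """Render one term: IRIs in angle brackets, literals in double quotes."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    return f'"{_escape(term.value)}"'


def to_ntriples(triples: Iterable[Tuple[IRI, IRI, Term]]) -> str:
    """Serialize (subject, predicate, object) tuples as N-Triples text."""
    return "".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} .\n" for s, p, o in triples
    )


if __name__ == "__main__":
    ex = "http://example.org/"
    rdfs_label = IRI("http://www.w3.org/2000/01/rdf-schema#label")
    # Hypothetical triples for the sentence "Lymba develops the K-Extractor."
    print(to_ntriples([
        (IRI(ex + "Lymba"), IRI(ex + "develops"), IRI(ex + "K-Extractor")),
        (IRI(ex + "K-Extractor"), rdfs_label, Literal("K-Extractor")),
    ]), end="")
```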

Diagram of Oracle 11gR2 K-Platform Extractor

Lymba's K-Extractor is the only semantic processing product that extracts knowledge from unstructured data and pushes it into structured resources, using a common ontology when required. Other products extract events, named entities, and some relations, but the Lymba K-Extractor draws on years of research and a deep NLP pipeline to understand, organize, and align the data.

The figure below depicts an example integration architecture of the Lymba K-Platform with Oracle 11gR2 for semantically indexing unstructured data. The architecture relies on a socket-based service to pass semantic knowledge extraction requests from Oracle 11gR2's Semantic Technologies components to the NLP pipeline in the K-Platform.

First, the Oracle 11gR2 user installs and configures the K-Platform Extractor services in the database's Semantic Technologies components, using the installation scripts and documentation provided by Lymba. Once these services are configured, the user can request that Lymba's NLP pipeline extract and index rich semantic knowledge from the unstructured documents stored in the database tables.

Diagram of Oracle 11gR2 K-Platform Extractor

A semantic index creation request triggers the K-Platform Extractor interface, which passes the input document to the NLP pipeline in the K-Platform via a socket service. The NLP pipeline extracts rich semantic knowledge from the document and returns it, in the W3C RDF format, to the K-Platform Extractor interface via the same socket service. The semantic index is then created from the extracted RDF.
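The request/response exchange between the Extractor interface and the NLP pipeline can be sketched with an ordinary socket. In this hypothetical version, each message is framed with a 4-byte big-endian length header, the document travels one way and an N-Triples reply comes back, and `stub_pipeline` stands in for the real NLP pipeline by emitting a single word-count triple. The framing scheme, function names, and stub output are all assumptions; the actual K-Platform socket protocol is not described on this page. `socket.socketpair()` lets both ends run in one process for demonstration.

```python
import socket
import struct
import threading


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send a 4-byte big-endian length header followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return less."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Read one length-prefixed frame."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def stub_pipeline(text: str) -> str:
    """Hypothetical stand-in for the NLP pipeline: one word-count triple."""
    n_words = len(text.split())
    return f'<http://example.org/doc> <http://example.org/wordCount> "{n_words}" .\n'


def serve_one(sock: socket.socket) -> None:
    """Pipeline side: read one document, reply with RDF, then close."""
    with sock:
        document = recv_frame(sock).decode("utf-8")
        send_frame(sock, stub_pipeline(document).encode("utf-8"))


def extract_rdf(sock: socket.socket, document: str) -> str:
    """Extractor side: send a document and block until the RDF reply arrives."""
    send_frame(sock, document.encode("utf-8"))
    return recv_frame(sock).decode("utf-8")


if __name__ == "__main__":
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    with client:
        print(extract_rdf(client, "Lymba builds the K-Extractor."), end="")
    worker.join()
```

Length-prefixed framing is used here because a raw TCP-style stream has no message boundaries; a real service might instead use delimiters or an existing RPC layer.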

All steps of the semantic index creation process, including K-Platform Extractor requests, NLP pipeline execution, and loading the extracted RDF triples into the semantic index, are performed in parallel. The expected throughput is 1 KB/second/CPU; that is, one kilobyte of user input text is semantically indexed per second for each CPU used by the K-Platform.
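The stated throughput supports a quick back-of-the-envelope estimate: indexing time ≈ input size in KB / (1 KB/s × number of CPUs). The helper below is a sketch that assumes 1 KB = 1024 bytes and ideal linear scaling across CPUs, which a real deployment may not achieve.

```python
def estimated_index_seconds(
    input_bytes: int, cpus: int, kb_per_sec_per_cpu: float = 1.0
) -> float:
    """Estimate semantic indexing time, assuming perfectly linear CPU scaling."""
    if cpus <= 0 or kb_per_sec_per_cpu <= 0:
        raise ValueError("cpus and kb_per_sec_per_cpu must be positive")
    kilobytes = input_bytes / 1024  # assumption: 1 KB = 1024 bytes
    return kilobytes / (kb_per_sec_per_cpu * cpus)


if __name__ == "__main__":
    secs = estimated_index_seconds(100 * 1024 * 1024, cpus=16)  # 100 MB corpus
    print(f"{secs:.0f} s ({secs / 3600:.2f} h)")  # 6400 s (1.78 h)
```

For example, a 100 MB corpus (102,400 KB) on 16 CPUs works out to 102,400 / 16 = 6,400 seconds, or roughly 1.8 hours.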