Lymba and Oracle have partnered to offer a unique and powerful solution for semantically processing, indexing, and searching unstructured content in an enterprise semantic database.

Lymba has also worked closely with the Oracle Semantic Technologies team to streamline the communication between the database and the K-Platform, and to introduce new interfaces to make the bundled product easy to use and administer.

Together, these two products fill the need to:

  • Put structure into unstructured content by using the K-Platform's NLP Pipeline to extract semantic RDF triples from text; these triples can be efficiently indexed and stored in the Oracle 11gR2 semantic index.
  • Index structured content with the Oracle 11gR2 database.
  • Build ontologies from the set of documents using the K-Platform's Ontology Building Tools, and store them in the Oracle 11gR2 semantic model.
  • Align the structured and unstructured content with the ontology stored in the semantic model.
  • Query the structured and unstructured content at the same time using SPARQL.
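As a rough illustration of the last point, the sketch below shows why a single query can span both kinds of content once everything is expressed as triples. It is a toy in-memory matcher written in Python, not the Oracle or K-Platform API; every identifier and data value (`ex:acme`, `ex:hasCustomerId`, `ex:acquired`, and so on) is made up for the example, and a real deployment would issue SPARQL against the database instead.

```python
# Toy illustration only: all identifiers and data below are hypothetical.

# Triples as they might be derived from a structured table.
structured = [
    ("ex:acme", "ex:hasCustomerId", "C-1001"),
    ("ex:acme", "rdf:type", "ex:Company"),
]

# Triples as they might be extracted from free text.
extracted = [
    ("ex:acme", "ex:acquired", "ex:widgetco"),
    ("ex:widgetco", "rdf:type", "ex:Company"),
]


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple; variables start with '?'."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or reject if it is already bound differently.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly equivalent to the SPARQL pattern:
#   SELECT ?id ?target WHERE { ?c ex:hasCustomerId ?id . ?c ex:acquired ?target }
results = query(
    [("?c", "ex:hasCustomerId", "?id"), ("?c", "ex:acquired", "?target")],
    structured + extracted,
)
```

The first pattern is satisfied by a structured triple and the second by an extracted one; the shared variable `?c` joins them because both sources use the same identifier for the company, which is the role the common ontology plays in the bundled product.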

Lymba's K-Platform Extractor works with Oracle's 11gR2 database as a state-of-the-art third-party product for natural language processing and ontology building, as seen in the figure below. The K-Platform Extractor automatically processes all free-text documents (blog entries, PDF documents, Word documents, tweets, and more) through the Lymba NLP pipeline and transforms the unstructured content into RDF triples for storage in the Oracle Spatial semantic index.

Diagram of Oracle 11gR2 K-Platform Extractor

Lymba's Third Party Extractor for 11gR2 is the only semantic processing product for organizing unstructured data into a rich and complete set of semantic triples that can be aligned with structured resources using a common ontology. Other products extract events, named entities, and some relations, but the Lymba Extractor uses the product suite of the K-Platform to understand, organize, and align the data each time a new document is added to the Oracle 11gR2 database.

The figure below depicts the Lymba K-Platform and Oracle 11gR2 integration architecture for semantically indexing unstructured data. The architecture relies on a socket-based service to pass semantic knowledge extraction requests from Oracle 11gR2's Semantic Technologies components to the NLP pipeline in the K-Platform.

First, the Oracle 11gR2 user installs and configures the K-Platform Extractor services in the database's Semantic Technologies components using the installation scripts and documentation provided by Lymba. Once the K-Platform Extractor services have been configured, the user can request that Lymba's NLP pipeline extract and index rich semantic knowledge from the unstructured documents present in the database tables.

A semantic index creation request triggers the K-Platform Extractor interface, which passes the input document to the NLP pipeline in the K-Platform via a socket service. The NLP pipeline extracts rich semantic knowledge from the user document and returns it, in W3C's RDF format, to the K-Platform Extractor interface via the same socket service. The semantic index is then created from the extracted RDF.
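The request and response exchange described above can be sketched as a simple socket round trip. The Python code below is an illustration only: the actual wire protocol between the Extractor interface and the K-Platform is not described here, so the length-prefixed framing, the function names, and the placeholder triple returned by `fake_pipeline` are all assumptions made for the example.

```python
import socket
import struct
import threading


def send_msg(sock, payload: bytes) -> None:
    """Send one message with a 4-byte big-endian length prefix (assumed framing)."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def recv_msg(sock) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def fake_pipeline(sock) -> None:
    """Hypothetical stand-in for the NLP pipeline: replies with one placeholder triple."""
    text = recv_msg(sock).decode("utf-8")
    triple = '<urn:example:doc> <urn:example:charCount> "%d" .\n' % len(text)
    send_msg(sock, triple.encode("utf-8"))
    sock.close()


def extract_rdf(document: str) -> str:
    """Play the Extractor-interface role: send a document, return the RDF reply."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=fake_pipeline, args=(server,))
    worker.start()
    try:
        send_msg(client, document.encode("utf-8"))
        return recv_msg(client).decode("utf-8")
    finally:
        client.close()
        worker.join()


rdf = extract_rdf("Lymba and Oracle have partnered.")
```

Here `rdf` holds the text that an indexing step would then load. A connected socket pair keeps the sketch self-contained; a real service would listen on a network address, and its reply would contain the triples actually extracted from the document.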

All the steps involved in the semantic index creation process, including K-Platform Extractor requests, NLP pipeline execution, and loading of the extracted RDF triples into the semantic index, are performed in parallel. The expected throughput is 1 KB/second/CPU; that is, one kilobyte of user input text is semantically indexed per second for each CPU used by the K-Platform.
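For capacity planning, the stated throughput gives a quick back-of-the-envelope estimate. The Python helper below is a sketch: the function name is hypothetical, and it assumes throughput scales linearly with the number of CPUs, which the figure above suggests but real workloads may not achieve.

```python
def estimated_indexing_seconds(text_kb: float, cpus: int,
                               kb_per_sec_per_cpu: float = 1.0) -> float:
    """Estimate indexing time, assuming perfectly linear scaling across CPUs."""
    if cpus < 1:
        raise ValueError("at least one CPU is required")
    if kb_per_sec_per_cpu <= 0:
        raise ValueError("throughput must be positive")
    return text_kb / (kb_per_sec_per_cpu * cpus)


# Example: 10 MB (10,240 KB) of input text on 8 CPUs.
seconds = estimated_indexing_seconds(text_kb=10_240, cpus=8)
```

With these inputs the estimate is 10,240 / 8 = 1,280 seconds, or a little over 21 minutes.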