
Ontology & Graph Creation

White Paper

Ontology and Domain Specific Model Generators

 

Executive Summary

As more systems enter the marketplace, there is a growing need for these systems to interact with one another. Each Internet of Things (IoT) device has its own way of passing data to other systems, and interoperability is a need we hear about from our clients with increasing urgency. How can these systems operate together? Lymba's method leverages intelligent document processing (IDP) to derive schema information from technical documents as well as from the system itself. We then automatically generate an ontological representation of that information before translating it into domain-specific models (DSMs) in machine-readable formats using Large Language Models (LLMs).

In previous Natural Language Understanding (NLU) work, Lymba faced a similar challenge: how do you process diverse unstructured text? Our approach was to treat each document as its own system.

We automatically created domain ontologies to represent knowledge from a corpus of documents. Ontologies are similarly used today to represent the schema of a system. By combining automated ontology generation from text with structured data from diverse sources, we capture the relevant named entities and relationships needed to populate graph databases. Using this same framework, we can create DSMs per system rather than per document.
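To illustrate the population step, the sketch below (Python, using the open-source rdflib library) turns a handful of extracted entities and relations into RDF triples ready for loading into a graph database. The entity names, relation labels, and namespace are illustrative placeholders, not output from Lymba's pipeline.

    # Minimal sketch: loading extracted entities and relations into an RDF
    # graph with rdflib. All names and the namespace are illustrative.
    from rdflib import Graph, Literal, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/dsm/")  # hypothetical namespace

    # Hypothetical results of the entity/relation extraction stage.
    entities = [("Radar_AN_SPY_1", "Sensor"), ("Aegis", "CombatSystem")]
    relations = [("Radar_AN_SPY_1", "feedsDataTo", "Aegis")]

    g = Graph()
    g.bind("ex", EX)

    for name, cls in entities:
        g.add((EX[name], RDF.type, EX[cls]))          # type each entity
        g.add((EX[name], RDFS.label, Literal(name)))  # keep a readable label

    for subj, pred, obj in relations:
        g.add((EX[subj], EX[pred], EX[obj]))          # assert the relation

    # Serialize to Turtle, ready for bulk-loading into a graph database.
    print(g.serialize(format="turtle"))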

Key innovations include the use of advanced document-processing technologies, such as natural language processing (NLP), multimodal AI, and Visual Document Understanding (VDU), to analyze and extract structured data from diverse sources, including PDFs and XML schemas. The extracted data then undergoes ontology creation and refinement through tools such as OWL and LLMs, enabling machine-interpretable representations of complex relationships. These ontologies are then converted into DSMs and machine-readable interface documents, such as JSON or XML schemas, which enhance data-exchange protocols across systems. Each stage pairs AI capabilities with human oversight to validate outputs and ensure compliance with industry standards such as JC3IEDM for the U.S. Navy. The framework also includes rigorous validation and verification to measure performance against benchmarks such as data interoperability and integration times, ensuring the robustness of the approach. The iterative process incorporates domain-expert feedback for continuous refinement, addressing scalability and adaptability challenges.
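The conversion from ontology to interface document can be sketched in the same vein. The fragment below declares a small, hypothetical OWL class in rdflib and emits a skeletal JSON Schema for it; a production converter would also map XSD datatypes, cardinality restrictions, and JC3IEDM alignments, which are omitted here.

    # Minimal sketch: emitting a JSON Schema interface document from an OWL
    # class. The Sensor class and its properties are hypothetical.
    from rdflib import Graph, Namespace, OWL, RDF, RDFS

    EX = Namespace("http://example.org/dsm/")

    g = Graph()
    g.add((EX.Sensor, RDF.type, OWL.Class))
    for prop in (EX.frequencyBand, EX.maxRangeKm):
        g.add((prop, RDF.type, OWL.DatatypeProperty))
        g.add((prop, RDFS.domain, EX.Sensor))

    def class_to_json_schema(graph: Graph, cls) -> dict:
        """Emit a skeletal JSON Schema covering the datatype properties
        whose rdfs:domain is `cls`. Every property defaults to "string"
        here; a real converter would map XSD ranges to JSON types."""
        props = {
            str(p).rsplit("/", 1)[-1]: {"type": "string"}
            for p in graph.subjects(RDFS.domain, cls)
        }
        return {"title": str(cls).rsplit("/", 1)[-1],
                "type": "object", "properties": props}

    print(class_to_json_schema(g, EX.Sensor))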

1. Introduction

Modern systems, such as the U.S. Navy's Integrated Combat System (ICS), face a critical challenge in achieving seamless interoperability across their diverse arrays of sensors, weapons, and communications platforms. With over 40 unique system elements aboard a single surface ship, each employing disparate data models, the Navy struggles to normalize information into a unified, machine-readable framework capable of powering artificial intelligence (AI) and machine learning (ML) applications. Manual efforts to construct a DSM and ontology from technical documentation are prohibitively resource-intensive, leading to prolonged integration timelines, soaring costs, and operational bottlenecks that restrict the Navy's ability to harness AI/ML for real-time decision-making. This fragmentation not only impedes rapid adaptation to emerging threats but also undermines the potential of netted-force operations to leverage shared, real-time data across platforms.

Lymba's framework addresses this gap: by automating knowledge extraction from technical documents and transforming the extracted information into standardized, machine-readable formats, it ensures seamless integration, operational efficiency, and scalability.

2. The Opportunity

To address these challenges, Lymba introduces an innovative, automated solution for generating a surface DSM and ontology from technical documents. The proposed framework harnesses the latest advancements in AI, including LLMs such as GPT-4 for semantic analysis of technical manuals, reasoning models to deduce hierarchical relationships and logical dependencies, and an automated ontology-generation framework to convert unstructured data into formal, machine-actionable ontologies. The key innovation lies in an AI-driven framework that surpasses conventional manual methods by systematically extracting, formalizing, and standardizing knowledge from unstructured technical documentation. At its core, a multi-stage AI pipeline processes diverse inputs, both structured schemas and unstructured technical documentation, to generate a surface DSM and an ontology aligned with standards like JC3IEDM. Specialized NLP modules, trained or fine-tuned with domain-specific vocabulary, enable accurate entity extraction (e.g., radar parameters, weapon-system dependencies) and data normalization, resolving ambiguities such as conflicting sensor naming conventions. For instance, the framework dynamically aligns a new hypersonic sensor's data format with legacy systems like Aegis, ensuring plug-and-play interoperability via standardized JSON/XML interfaces. This innovation builds on Lymba's existing expertise in NLP, knowledge management, ontology creation, and generative AI.
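A minimal sketch of the normalization step is shown below. The canonical field names and the alias table are invented for illustration, but they convey how conflicting sensor naming conventions can be resolved into a single DSM vocabulary.

    # Minimal sketch of data normalization: rewriting a new sensor's native
    # field names into a canonical DSM vocabulary. The alias table and all
    # field names are invented for illustration.
    CANONICAL_ALIASES = {
        "track_id": {"trackId", "TRK_NUM", "track_number"},
        "bearing_deg": {"brg", "bearing", "azimuth_deg"},
        "range_km": {"rng_km", "slant_range"},
    }

    # Invert the table once: alias -> canonical name.
    ALIAS_TO_CANONICAL = {
        alias: canon
        for canon, aliases in CANONICAL_ALIASES.items()
        for alias in aliases
    }

    def normalize_record(record: dict) -> dict:
        """Map known aliases to canonical names; pass unknown fields through."""
        return {ALIAS_TO_CANONICAL.get(k, k): v for k, v in record.items()}

    raw = {"TRK_NUM": 4711, "brg": 271.5, "slant_range": 82.3}
    print(normalize_record(raw))
    # -> {'track_id': 4711, 'bearing_deg': 271.5, 'range_km': 82.3}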

Solution Impact. By automating the creation of the surface DSM and ontology, this solution directly tackles the challenges inherent in integrating disparate data models from sensors, weapons, and communications platforms. Leveraging advanced AI/ML algorithms, NLP techniques, and ontological analysis, the proposed framework normalizes disparate data formats into a unified common ontology, enabling seamless interoperability. Once established, the AI-driven DSM allows new sensors, weapons, and communications elements to be integrated with minimal changes to integration software, reducing both time and acquisition costs. Real-time data exchange and standardized machine-readable interfaces (e.g., JSON/XML) not only accelerate threat-response timelines but also enhance decision-making by ensuring a single, consistent understanding of tactical information. Moreover, continuous refinement of the ontologies by LLMs keeps the solution adaptable to emerging threats, advanced technologies, and evolving mission requirements, ultimately future-proofing your systems. This streamlined approach to data standardization lays the foundation for robust AI/ML workflows, fostering greater situational awareness, human-machine collaboration, and overall effectiveness in contested environments, in line with your strategic goals of agility and technological dominance.
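As a concrete example of such a standardized machine-readable interface, the following sketch validates a normalized track message (continuing the illustrative field names above) against a JSON Schema using the open-source jsonschema package. The schema is a stand-in for a generated DSM interface document, not an actual artifact of the framework.

    # Minimal sketch: validating a normalized message against a standardized
    # JSON interface with the jsonschema package. The schema below is an
    # illustrative stand-in for a generated DSM interface document.
    from jsonschema import ValidationError, validate

    TRACK_SCHEMA = {
        "type": "object",
        "properties": {
            "track_id": {"type": "integer"},
            "bearing_deg": {"type": "number", "minimum": 0, "maximum": 360},
            "range_km": {"type": "number", "minimum": 0},
        },
        "required": ["track_id", "bearing_deg", "range_km"],
    }

    message = {"track_id": 4711, "bearing_deg": 271.5, "range_km": 82.3}

    try:
        validate(instance=message, schema=TRACK_SCHEMA)
        print("message conforms to the interface")
    except ValidationError as err:
        print(f"rejected: {err.message}")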

 

3. Architecture

 

Fig. 1: High-level conceptual diagram

 

The architecture diagram in Figure 1 depicts the workflow of the proposed framework.

Our solution introduces a transformative approach to system interoperability, addressing critical operational challenges through three key innovations: (1) a standardized data model that streamlines the integration of new components and ensures interoperability across platforms; (2) automated ontology and DSM generation that reduces acquisition and maintenance costs and enables you to integrate the latest AI/ML advancements into your systems; and (3) an adaptive architecture designed to evolve with emerging technologies, mitigating obsolescence risks and eliminating costly system overhauls. By prioritizing interoperability, cost efficiency, and scalability, this solution directly strengthens system readiness in rapidly evolving operational environments.
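The adaptive-architecture principle in point (3) can be illustrated with a simple adapter registry: each new system element contributes a small translation function into the common DSM, so the downstream integration software itself never changes. All system and field names below are hypothetical.

    # Minimal sketch of the adaptive architecture: new system elements
    # register adapters that translate native payloads into the common DSM.
    from typing import Callable, Dict

    ADAPTERS: Dict[str, Callable[[dict], dict]] = {}

    def register_adapter(system_name: str):
        """Decorator that plugs a new system element into the pipeline."""
        def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
            ADAPTERS[system_name] = fn
            return fn
        return wrap

    @register_adapter("hypersonic_sensor_x")  # hypothetical new element
    def hypersonic_to_dsm(raw: dict) -> dict:
        # Translate the sensor's native fields into canonical DSM fields.
        return {"track_id": raw["tid"], "range_km": raw["r_m"] / 1000.0}

    def ingest(system_name: str, raw: dict) -> dict:
        return ADAPTERS[system_name](raw)

    print(ingest("hypersonic_sensor_x", {"tid": 7, "r_m": 120000}))
    # -> {'track_id': 7, 'range_km': 120.0}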

4. Lymba’s Capabilities and Track Record

Lymba has extensive expertise in NLP and LLMs. Lymba has successfully augmented its award-winning NLP pipeline, KExtractor, by integrating LLMs at the module level, enhancing its capabilities for both general-purpose and client-specific tasks. Lymba previously worked with the Air Force Research Laboratory (AFRL) to create an unsupervised ontology-creation tool called Jaguar; in that project, Lymba developed the NLP tools to extract concepts and semantic relations from text. For a banking client, Lymba developed a system that crawls the web to ingest and process multimodal data sources, using LLMs to extract themes, sentiments, and key facts; the results were presented in a dashboard highlighting regional public sentiment on topics that could affect earnings. Similarly, for a multinational automotive company, Lymba created a system to aggregate vehicle reports for the National Highway Traffic Safety Administration (NHTSA). This involved classifying multi-source feedback (e.g., emails and questionnaires) into categories such as "electric problems" or "seatbelt issues" and filing reports based on NHTSA requirements.

Lymba’s other products include: (1) Genny®, a system that automatically generates question-and-answer pairs from textbooks to help students prepare for exams; and (2) TextGraphAI®, an ontology and knowledge-graph generator that processes unstructured information to uncover relations and hidden insights.

5. Conclusion

By combining LLMs, automated ontology generation, and NLP, Lymba helps stakeholders in both the defense and commercial sectors increase efficiency, shorten technology-adoption time, and build robust systems. Backed by Lymba’s proven expertise in NLP, ML, and enterprise-grade systems, Domain Specific Model Generation is poised to become an indispensable tool for addressing the complex challenges of modern information environments.

 

If you are interested in partnering with Lymba or learning more about how this solution can be customized to meet your specific needs, please contact us to schedule a discussion or demonstration. We look forward to collaborating with you and advancing the frontiers of strategic, data-driven insight.